Javatools

The Javatools are a collection of Java classes for a variety of small tasks, such as parsing, database interaction or file handling. They were developed by Fabian M. Suchanek for the YAGO-NAGA project. The Javatools are licensed under a Creative Commons Attribution 3.0 License by the YAGO-NAGA team.

Downloads

The tools require Java version 1.6 or later (download here). You can:

  • Download the Java tools (version: 20-03-2014)
  • Browse the full documentation
  • Browse the short descriptions of the classes below

Tools

    Parsing

Char Decodes, encodes and normalizes UNICODE, UTF8, HTML and URI/URL strings
DateParser Parses and normalizes different date formats (e.g. "January 5th, 2000" or "500 BC")
Name Provides primitive heuristics to recognize and parse person names and organization names
NounGroup Splits a noun group (given by a String) into its modifiers and its head
NumberParser Parses and normalizes complex number expressions (e.g. "10 million meters")
NumberFormatter A simple number formatter
PlingStemmer Stems an English noun to singular. Knows nearly all exceptions.
RegularExpression Parses a regular expression, converts it to an automaton, and allows inverting it
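
To illustrate the kind of normalization that NumberParser describes, here is a minimal, self-contained sketch. It does not use the Javatools API: the class name `NumberExpressionSketch`, the method `normalize`, and the three supported scale words are assumptions made for this example, and units such as "meters" are simply ignored.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the Javatools API. */
final class NumberExpressionSketch {

    // Scale words this sketch understands, mapped to their multipliers.
    private static final Map<String, Double> SCALES = new HashMap<>();
    static {
        SCALES.put("thousand", 1e3);
        SCALES.put("million", 1e6);
        SCALES.put("billion", 1e9);
    }

    // A decimal number, optionally followed by one scale word.
    private static final Pattern NUMBER = Pattern.compile(
            "(\\d+(?:\\.\\d+)?)(?:\\s+(thousand|million|billion))?\\b",
            Pattern.CASE_INSENSITIVE);

    /** Returns the first number found in the text, scaled, or null if there is none. */
    static Double normalize(String text) {
        Matcher m = NUMBER.matcher(text);
        if (!m.find()) {
            return null;
        }
        double value = Double.parseDouble(m.group(1));
        String scale = m.group(2);
        if (scale != null) {
            value *= SCALES.get(scale.toLowerCase(Locale.ROOT));
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(normalize("10 million meters"));
        System.out.println(normalize("2.5 thousand"));
    }
}
```

A real parser would also handle units, spelled-out numbers and separators such as "1,000"; the sketch only shows the basic idea of matching a number and applying a scale factor.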

    Database

Database An abstract class that provides a simple wrapper for an SQL database, including a bulk inserter
MySQLDatabase Implements the Database interface for a MySQL database
OracleDatabase Implements the Database interface for an Oracle SQL database
PostgresDatabase Implements the Database interface for a PostgreSQL database
ResultIterator An Iterator across an SQL database ResultSet
SQLType Implements SQL datatypes in a database-specific way
WordNet Provides a (non-database) wrapper for WordNet
DBWordNet Provides a database wrapper for WordNet

    Datatypes and Iterators

ArrayQueue Implements a simple non-blocking queue
BitVector Implements a bit vector, i.e. a list of bits, a set of small integers
CombinedIterator Combines multiple iterators to one iterator
CompressedString Compresses a String in a potentially lossy way to 7, 6 or fewer bits per character (instead of 16)
DirectedGraph Implements a directed graph with ancestor finding
FilteredIterator Implements an iterator that allows filtering out certain elements
FinalMap Provides a nicer constructor for a TreeMap
FinalSet Provides a very simple container implementation with zero overhead
FrequencyVector Provides recall and precision measures on bags of words, including fuzzy recall and fuzzy precision, and Wilson interval computation
Immutable Wraps a list or a set so that it becomes immutable.
IntSet Implements a set of small integers as a bit vector with constant-time access
IterableForIterator Wraps an iterator so that it can be used in a for-each-loop
IterableForEnumeration Wraps an untyped enumeration into a typed iterator
MappedIterator Implements an iterator that maps each element by a function before yielding it
Pair For the simple datatype Pair
PeekIterator An Iterator that can look ahead (peek) one element
SmallStack Implements a fast stack for int, long, double
SparseVector Represents a Sparse Vector, i.e. a vector that has only few non-zero entries. Implements k-means
SmallStack Represents an efficient stack for primitive datatypes (int, long, double, boolean)
SVMModel Implements an SVM-light-Model
Tree For the simple datatype Tree
Trie Implements a trie (an efficient set of strings based on prefixes)
UndirectedGraph Implements an undirected graph
Visitor For the common visitor design pattern
Visitable For the common visitor design pattern
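
Several of the iterator classes above wrap an existing iterator to add behavior. As an illustration of the look-ahead idea described for PeekIterator, the following is a minimal sketch under stated assumptions: the class `PeekingSketch` and its `peek` method are hypothetical names chosen for this example, and the code is not the Javatools implementation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical class for illustration; not the Javatools PeekIterator. */
final class PeekingSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private T buffered;          // element fetched ahead of time, if any
    private boolean hasBuffered; // separate flag so that null elements work

    PeekingSketch(Iterator<T> source) {
        this.source = source;
    }

    /** Returns the next element without consuming it. */
    T peek() {
        if (!hasBuffered) {
            if (!source.hasNext()) {
                throw new NoSuchElementException();
            }
            buffered = source.next();
            hasBuffered = true;
        }
        return buffered;
    }

    @Override
    public boolean hasNext() {
        return hasBuffered || source.hasNext();
    }

    @Override
    public T next() {
        T result = peek();   // fills the buffer if it is empty
        buffered = null;
        hasBuffered = false;
        return result;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        PeekingSketch<String> it =
                new PeekingSketch<>(Arrays.asList("a", "b").iterator());
        System.out.println(it.peek()); // looks at "a" without consuming it
        System.out.println(it.next()); // consumes "a"
        System.out.println(it.next()); // consumes "b"
        System.out.println(it.hasNext());
    }
}
```

The one-element buffer is the whole trick: `peek` pulls an element from the underlying iterator and holds it until `next` hands it out.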

    Administrative

Announce Provides an easy log writer with timed progress bars
D Provides convenience methods for input/output. Supports basic I/O through simple procedure calls, nearly like in normal programming languages. The class also provides basic set operations for EnumSets.
CallStack Retrieves the method name and the source code line number of the current code position at runtime
Parameters Provides an interface for an initialization/configuration/properties file
Tracer Provides a tracer that can figure out where a program hangs if a method runs for longer than a given time period
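
The idea behind CallStack, finding out the current code position at runtime, can be sketched with the standard `StackTraceElement` API. The class `CallSiteSketch` and its methods `here` and `demo` are hypothetical names for this example and say nothing about how the Javatools class is actually implemented.

```java
/** Hypothetical helper for illustration; not the Javatools CallStack class. */
final class CallSiteSketch {

    /** Describes the code position from which this method was called. */
    static String here() {
        // Frame 0 is here() itself; frame 1 is its caller.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        return caller.getClassName() + "." + caller.getMethodName()
                + ":" + caller.getLineNumber();
    }

    /** Example caller, used to show what here() reports. */
    static String demo() {
        return here();
    }

    public static void main(String[] args) {
        // Prints the class, method and line of the call inside demo().
        System.out.println(demo());
    }
}
```

Line numbers are only available if the class was compiled with debug information, which is the default for most build setups.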

    File Handling

CSVFile Writes to a comma-separated file (CSV file).
CSVLines Can iterate through the columns of a comma-separated file (CSV file).
DeepFileSet Represents a set of files as given by a wildcard string. Can recurse subfolders.
FigureProducer Produces Latex tables and JPG plots for table data
FileLines Provides an iterator over the lines in a file
FileSet Represents a set of files as given by a wildcard string. Does not include folders, is not case-sensitive.
HTMLReader Reads characters from an HTML-file
MatchReader Provides an iterator over Regular Expression matches in a file
SimpleInputStreamReader Reads characters from a file, regardless of the encoding
SimpleOutputStreamWriter Writes characters to a file, regardless of the encoding (see here for a problem description)
UTF8Reader Reads characters from a UTF8-encoded file
UTF8Writer Writes UTF8-encoded characters to a file
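
As an illustration of what an iterator over the lines of a file (as described for FileLines) might look like, here is a minimal sketch. It is not the Javatools implementation: the class name `LineIteratorSketch` is an assumption for this example, the code wraps any `Reader` so that it can be tried without a file, and it relies on `UncheckedIOException`, which requires Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical class for illustration; not the Javatools FileLines class. */
final class LineIteratorSketch implements Iterator<String> {
    private final BufferedReader reader;
    private String nextLine; // line read ahead; null once the input is exhausted

    LineIteratorSketch(Reader in) {
        this.reader = new BufferedReader(in);
        advance();
    }

    // Reads one line ahead and closes the reader at the end of the input.
    private void advance() {
        try {
            nextLine = reader.readLine();
            if (nextLine == null) {
                reader.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String line = nextLine;
        advance();
        return line;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        Iterator<String> lines =
                new LineIteratorSketch(new StringReader("first\nsecond\n"));
        while (lines.hasNext()) {
            System.out.println(lines.next());
        }
    }
}
```

To read an actual file, a reader obtained from `java.nio.file.Files.newBufferedReader(path, StandardCharsets.UTF_8)` can be passed to the constructor.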