Javatools
The Javatools are a collection of Java classes for a variety of small tasks, such as parsing, database interaction or file handling. They were developed by Fabian M. Suchanek for the YAGO-NAGA project. The Javatools are licensed under a Creative Commons Attribution 3.0 License by the YAGO-NAGA team.
Downloads
The tools require Java version 1.6 or later. You can:
- Download the Java tools (version: 20-03-2014)
- Browse the full documentation
- Browse the short descriptions of the classes below
Tools
Parsing
| Char | Decodes, encodes and normalizes Unicode, UTF-8, HTML and URI/URL strings |
| DateParser | Parses and normalizes different date formats (e.g. "January 5th, 2000" or "500 BC") |
| Name | Provides primitive heuristics to recognize and parse person names and organization names |
| NounGroup | Splits a noun group (given by a String) into its modifiers and its head |
| NumberParser | Parses and normalizes complex number expressions (e.g. "10 million meters") |
| NumberFormatter | A simple number formatter |
| PlingStemmer | Stems an English noun to its singular form. Knows nearly all exceptions. |
| RegularExpression | Parses a regular expression, converts it to an automaton and can invert it |
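To illustrate the kind of task PlingStemmer addresses, here is a toy singularizer. It is not PlingStemmer's algorithm or API: the class name `ToySingularizer` and its rules are hypothetical, and it knows only a handful of irregular plurals. It only shows the general shape of such a stemmer, which consults an exception table first and then falls back to suffix rules.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy singularizer; NOT the PlingStemmer algorithm or API.
final class ToySingularizer {
    // Irregular plurals that no suffix rule can handle.
    private static final Map<String, String> IRREGULAR = new HashMap<String, String>();
    static {
        IRREGULAR.put("mice", "mouse");
        IRREGULAR.put("children", "child");
        IRREGULAR.put("feet", "foot");
        IRREGULAR.put("people", "person");
    }

    static String stem(String noun) {
        String w = noun.toLowerCase();
        // 1. Exception table first.
        String irregular = IRREGULAR.get(w);
        if (irregular != null) return irregular;
        // 2. Words that already look singular (glass, bus).
        if (w.endsWith("ss") || w.endsWith("us")) return w;
        // 3. Suffix rules, most specific first.
        if (w.endsWith("ies") && w.length() > 4) {
            return w.substring(0, w.length() - 3) + "y";   // cities -> city
        }
        if (w.endsWith("ches") || w.endsWith("shes") || w.endsWith("xes") || w.endsWith("sses")) {
            return w.substring(0, w.length() - 2);         // boxes -> box, glasses -> glass
        }
        if (w.endsWith("s")) return w.substring(0, w.length() - 1); // cats -> cat
        return w;
    }
}
```

A real stemmer needs a far larger exception list: with only these rules, "series" becomes "sery" and "buses" becomes "buse".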
Database
| Database | Abstract class that provides a simple wrapper for an SQL database, including a bulk inserter |
| MySQLDatabase | Implements the Database class for a MySQL database |
| OracleDatabase | Implements the Database class for an Oracle SQL database |
| PostgresDatabase | Implements the Database class for a PostgreSQL database |
| ResultIterator | An Iterator across an SQL database ResultSet |
| SQLType | Implements SQL datatypes in a database-specific way |
| WordNet | Provides a (non-database) wrapper for WordNet |
| DBWordNet | Provides a database wrapper for WordNet |
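The bulk inserter mentioned for Database rests on a simple idea: instead of sending one INSERT per row, rows are buffered and sent as a single multi-row statement. The sketch below is hypothetical (`BulkInsertSketch` and `SqlSink` are invented names, not Javatools classes) and assumes the target database accepts multi-row `INSERT ... VALUES (...), (...)` syntax, as MySQL and PostgreSQL do. For brevity it inlines quoted string literals; production code should prefer JDBC `PreparedStatement` batching (`addBatch`/`executeBatch`), which also avoids dialect-specific escaping pitfalls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sink; a JDBC-backed version would call Statement.executeUpdate(sql).
interface SqlSink {
    void execute(String sql);
}

// Hypothetical bulk inserter sketch (not the Javatools API): buffers rows and
// emits one multi-row INSERT per batch instead of one statement per row.
final class BulkInsertSketch {
    private final SqlSink sink;
    private final String table;
    private final int batchSize;
    private final List<String> pendingRows = new ArrayList<String>();

    BulkInsertSketch(SqlSink sink, String table, int batchSize) {
        this.sink = sink;
        this.table = table;
        this.batchSize = batchSize;
    }

    // Quotes a value as an SQL string literal by doubling embedded single quotes.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Buffers one row; flushes automatically once batchSize rows are pending.
    void insert(String... values) {
        StringBuilder row = new StringBuilder("(");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) row.append(", ");
            row.append(quote(values[i]));
        }
        pendingRows.add(row.append(")").toString());
        if (pendingRows.size() >= batchSize) flush();
    }

    // Sends all buffered rows as one statement; call once more after the last insert.
    void flush() {
        if (pendingRows.isEmpty()) return;
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" VALUES ");
        for (int i = 0; i < pendingRows.size(); i++) {
            if (i > 0) sql.append(", ");
            sql.append(pendingRows.get(i));
        }
        sink.execute(sql.toString());
        pendingRows.clear();
    }
}
```

With a batch size of 2, three inserts followed by a final `flush()` produce two statements: one carrying the first two rows and one carrying the last row.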
Datatypes and Iterators
| ArrayQueue | Implements a simple non-blocking queue |
| BitVector | Implements a bit vector, i.e. a list of bits, which can also serve as a set of small integers |
| CombinedIterator | Combines multiple iterators into one iterator |
| CompressedString | Compresses a String, potentially lossily, to 7, 6 or fewer bits per character (instead of 16) |
| DirectedGraph | Implements a directed graph with ancestor finding |
| FilteredIterator | Implements an iterator that allows filtering out certain elements |
| FinalMap | Provides a nicer constructor for a TreeMap |
| FinalSet | Provides a very simple container implementation with zero overhead |
| FrequencyVector | Provides recall and precision measures on bags of words, including fuzzy recall and fuzzy precision, and Wilson interval computation |
| Immutable | Wraps a list or a set so that it becomes immutable. |
| IntSet | Implements a set of small integers as a bit vector with constant-time access |
| IterableForIterator | Wraps an iterator so that it can be used in a for-each-loop |
| IterableForEnumeration | Wraps an untyped enumeration into a typed iterator |
| MappedIterator | Implements an iterator that maps each element through a function before yielding it |
| Pair | Implements the simple datatype Pair |
| PeekIterator | An Iterator that can look ahead (peek) one element |
| SmallStack | Implements a fast stack for primitive datatypes (int, long, double, boolean) |
| SparseVector | Represents a sparse vector, i.e. a vector with only a few non-zero entries. Also implements k-means clustering |
| SVMModel | Implements an SVM-light model |
| Tree | Implements the simple datatype Tree |
| Trie | Implements a trie (an efficient set of strings based on prefixes) |
| UndirectedGraph | Implements an undirected graph |
| Visitor | For the common visitor design pattern |
| Visitable | For the common visitor design pattern |
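Several entries above are classic data structures. As one example, the following hypothetical `TrieSketch` (not the Javatools Trie API) stores a set of strings as a tree of characters: strings that share a prefix share nodes, so membership and prefix queries each follow a single path from the root. It avoids language features newer than Java 1.6.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie sketch (not the Javatools Trie API): a set of strings
// stored as a tree of characters, where shared prefixes share nodes.
final class TrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<Character, Node>();
        boolean isWord; // true if a stored string ends at this node
    }

    private final Node root = new Node();
    private int size;

    // Adds a string; returns false if it was already present.
    boolean add(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        if (node.isWord) return false;
        node.isWord = true;
        size++;
        return true;
    }

    // Follows the characters of s from the root; null if the path does not exist.
    private Node find(String s) {
        Node node = root;
        for (int i = 0; i < s.length() && node != null; i++) {
            node = node.children.get(s.charAt(i));
        }
        return node;
    }

    boolean contains(String s) {
        Node node = find(s);
        return node != null && node.isWord;
    }

    // True if at least one stored string starts with the given prefix.
    boolean hasPrefix(String prefix) {
        Node node = find(prefix);
        return node != null && (node.isWord || !node.children.isEmpty());
    }

    int size() {
        return size;
    }
}
```

After adding "car" and "cart", the trie holds two strings: `contains("ca")` is false, but `hasPrefix("ca")` is true because both strings pass through the "c" and "a" nodes.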
Administrative
| Announce | Provides an easy-to-use log writer with timed progress bars |
| D | Provides convenience methods for input/output, so that basic I/O can be done with simple procedure calls, nearly as in other programming languages. The class also provides basic set operations for EnumSets. |
| CallStack | Retrieves the method name and the source code line number of the current code position at runtime |
| Parameters | Provides an interface to an initialization/configuration/properties file |
| Tracer | Provides a tracer that can determine where a program hangs when a method runs longer than a given time period |
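The runtime information that CallStack exposes is available from the JVM's stack trace. The sketch below assumes that the JVM records full stack traces (the `Throwable.getStackTrace` documentation permits omitting frames) and that classes are compiled with line-number debug information, which javac emits by default. `CallSiteSketch` and `CallSiteDemo` are hypothetical names, not the Javatools CallStack API.

```java
// Hypothetical helper (not the Javatools CallStack API): reports the class,
// method name and source line of whoever calls here().
final class CallSiteSketch {
    static String here() {
        // Frame 0 is here() itself (where the Throwable was created);
        // frame 1 is the method that called here().
        StackTraceElement[] frames = new Throwable().getStackTrace();
        StackTraceElement caller = frames[1];
        return caller.getClassName() + "." + caller.getMethodName() + ":" + caller.getLineNumber();
    }
}

// Usage example: a method that asks where it is.
final class CallSiteDemo {
    static String reportFromHere() {
        return CallSiteSketch.here(); // e.g. "CallSiteDemo.reportFromHere:<line>"
    }
}
```

Creating a `Throwable` just to read its stack trace is comparatively expensive, so this approach is better suited to logging and debugging than to hot code paths.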
File Handling
| CSVFile | Writes to a comma-separated values (CSV) file |
| CSVLines | Iterates through the columns of a comma-separated values (CSV) file |
| DeepFileSet | Represents a set of files as given by a wildcard string. Can recurse subfolders. |
| FigureProducer | Produces LaTeX tables and JPG plots for table data |
| FileLines | Provides an iterator over the lines in a file |
| FileSet | Represents a set of files as given by a wildcard string. Does not include folders, is not case-sensitive. |
| HTMLReader | Reads characters from an HTML-file |
| MatchReader | Provides an iterator over Regular Expression matches in a file |
| SimpleInputStreamReader | Reads characters from a file, regardless of the encoding |
| SimpleOutputStreamWriter | Writes characters to a file, regardless of the encoding |
| UTF8Reader | Reads characters from a UTF-8-encoded file |
| UTF8Writer | Writes UTF-8-encoded characters to a file |
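FileSet and DeepFileSet select files by a wildcard string. One way to match such wildcards, sketched here under the assumption that `*` stands for any run of characters and `?` for exactly one character, is to translate the wildcard into a case-insensitive regular expression. `WildcardSketch` is a hypothetical name, not the Javatools API. Note that `Pattern.CASE_INSENSITIVE` on its own folds only US-ASCII letters.

```java
import java.util.regex.Pattern;

// Hypothetical sketch (not the Javatools FileSet API): turns a wildcard such as
// "*.txt" into a case-insensitive regular expression over file names.
final class WildcardSketch {
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                regex.append(".*");                               // any run of characters
            } else if (c == '?') {
                regex.append('.');                                // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));   // literal, e.g. '.'
            }
        }
        // CASE_INSENSITIVE without UNICODE_CASE folds ASCII letters only.
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // The whole file name must match, not just a substring of it.
    static boolean matches(String wildcard, String fileName) {
        return compile(wildcard).matcher(fileName).matches();
    }
}
```

With this translation, `WildcardSketch.matches("*.txt", "Notes.TXT")` is true, while `"notes.txt.bak"` does not match, because `matches()` requires the entire name to fit the pattern. Listing a folder would then mean filtering `File.listFiles()` by this test and skipping entries that are not plain files.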