Javatools

The Javatools are a collection of Java classes for a variety of small tasks, such as parsing, database interaction or file handling. They were developed by Fabian M. Suchanek for the YAGO-NAGA project. The Javatools are licensed under a Creative Commons Attribution 3.0 License by the YAGO-NAGA team.

    Downloads

    The tools require Java version 1.6+ (download here). You can

    • Download the Java tools (version: 20-03-2014)
    • Browse the full documentation
    • Browse the short descriptions of the classes below

    Tools

        Parsing

    Char Decodes, encodes and normalizes Unicode, UTF8, HTML and URI/URL strings
    DateParser Parses and normalizes different date formats (e.g. "January 5th, 2000" or "500 BC")
    Name Provides primitive heuristics to recognize and parse person names and organization names
    NounGroup Splits a noun group (given by a String) into its modifiers and its head
    NumberParser Parses and normalizes complex number expressions (e.g. "10 million meters")
    NumberFormatter A simple number formatter
    PlingStemmer Stems an English noun to singular. Knows nearly all exceptions.
    RegularExpression Parses a regular expression, converts it to an automaton, and allows inverting it
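To give a feel for the kind of normalization the NumberParser entry describes, here is a minimal sketch that expands a scale word such as "million" into plain digits. It does not use the Javatools API: the class `NumberSketch`, the method `normalize`, the three supported scale words and the output format are all assumptions made for illustration, and the real class presumably covers many more cases.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the Javatools NumberParser.
class NumberSketch {
    // Scale words this sketch understands, mapped to their multipliers.
    private static final Map<String, BigDecimal> SCALES = new LinkedHashMap<String, BigDecimal>();
    static {
        SCALES.put("thousand", new BigDecimal("1000"));
        SCALES.put("million", new BigDecimal("1000000"));
        SCALES.put("billion", new BigDecimal("1000000000"));
    }

    // A decimal number, whitespace, then one of the scale words.
    private static final Pattern SCALED =
        Pattern.compile("(\\d+(?:\\.\\d+)?)\\s+(thousand|million|billion)\\b");

    /** Replaces every "<number> <scale word>" in the text with the plain number. */
    static String normalize(String text) {
        Matcher m = SCALED.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            BigDecimal value = new BigDecimal(m.group(1)).multiply(SCALES.get(m.group(2)));
            // stripTrailingZeros + toPlainString avoids output such as "1500000.0" or "1.5E+6".
            String plain = value.stripTrailingZeros().toPlainString();
            m.appendReplacement(out, Matcher.quoteReplacement(plain));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("10 million meters")); // prints 10000000 meters
    }
}
```

The unit ("meters") is left untouched here; converting or normalizing units would be a separate step that this sketch does not attempt.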

     

        Database

    Database This abstract class provides a simple wrapper for an SQL database, including a bulk inserter.
    MySQLDatabase Implements the Database interface for a MySQL database
    OracleDatabase Implements the Database interface for an Oracle SQL database
    PostgresDatabase Implements the Database interface for a Postgres SQL database
    ResultIterator An Iterator across an SQL database ResultSet
    SQLType Implements SQL datatypes in a database-specific way
    WordNet Provides a (non-database) wrapping for WordNet
    DBWordNet Provides a database wrapping for WordNet

     

        Datatypes and Iterators

    ArrayQueue Implements a simple non-blocking queue
    BitVector Implements a bit vector, i.e. a list of bits or a set of small integers
    CombinedIterator Combines multiple iterators into one iterator
    CompressedString Compresses a String in a potentially lossy way to 7, 6 or fewer bits per character (instead of 16)
    DirectedGraph Implements a directed graph with ancestor finding
    FilteredIterator Implements an iterator that allows filtering out certain elements
    FinalMap Provides a nicer constructor for a TreeMap
    FinalSet Provides a very simple container implementation with zero overhead
    FrequencyVector Provides recall and precision measures on bags of words, including fuzzy recall and fuzzy precision, and Wilson interval computation
    Immutable Wraps a list or a set so that it becomes immutable.
    IntSet Implements a set of small integers as a bit vector with constant-time access
    IterableForIterator Wraps an iterator so that it can be used in a for-each loop
    IterableForEnumeration Wraps an untyped enumeration into a typed iterator
    MappedIterator Implements an iterator that maps each element by a function before yielding it
    Pair For the simple datatype Pair
    PeekIterator An Iterator that can look ahead (peek) one element
    SmallStack Implements a fast stack for int, long, double
    SparseVector Represents a sparse vector, i.e. a vector that has only a few non-zero entries. Implements k-means
    SmallStack Represents an efficient stack for primitive datatypes (int, long, double, boolean)
    SVMModel Implements an SVM-light model
    Tree For the simple datatype Tree
    Trie Implements a trie (an efficient set of strings based on prefixes)
    UndirectedGraph Implements an undirected graph
    Visitor For the common visitor design pattern
    Visitable For the common visitor design pattern
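The one-element lookahead mentioned for PeekIterator can be sketched with nothing but `java.util.Iterator`. The class below is not the Javatools implementation; `PeekingSketch` and its `peek` method are hypothetical names chosen for this example, and the actual class may differ in naming and behavior.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration only; this is not the Javatools PeekIterator.
class PeekingSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private T buffered;          // the element fetched ahead of time, if any
    private boolean hasBuffered; // true while 'buffered' holds a pending element

    PeekingSketch(Iterator<T> source) {
        this.source = source;
    }

    /** Returns the next element without consuming it. */
    T peek() {
        if (!hasBuffered) {
            buffered = source.next(); // throws NoSuchElementException if the source is exhausted
            hasBuffered = true;
        }
        return buffered;
    }

    @Override
    public boolean hasNext() {
        return hasBuffered || source.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        if (hasBuffered) {
            T result = buffered;
            buffered = null;
            hasBuffered = false;
            return result;
        }
        return source.next();
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("remove is not supported in this sketch");
    }

    public static void main(String[] args) {
        PeekingSketch<String> it = new PeekingSketch<String>(Arrays.asList("a", "b").iterator());
        System.out.println(it.peek()); // looks at "a" without consuming it
        System.out.println(it.next()); // now consumes "a"
        System.out.println(it.next()); // consumes "b"
    }
}
```

A separate boolean flag is used instead of testing `buffered` against `null`, so that sources containing `null` elements are still handled correctly.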

     

        Administrative

    Announce Provides an easy log writer with timed progress bars
    D Provides convenience methods for input/output. Allows basic I/O with easy procedure calls -- nearly like in normal programming languages. Furthermore, the class provides basic set operations for EnumSets.
    CallStack Retrieves the method name and the source code line number of the current code position at runtime
    Parameters Provides an interface for an initialization/configuration/properties file
    Tracer Provides a tracer that can figure out where a program hangs if a method runs for longer than a given time period
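Retrieving the current method name and line number at runtime, as described for CallStack, is possible with the standard `Throwable.getStackTrace()` call. The following is only a sketch of that general technique, not the Javatools class; `WhereAmI`, `here` and `demo` are hypothetical names, and how CallStack actually obtains its information is not stated on this page.

```java
// Hypothetical illustration only; this is not the Javatools CallStack class.
class WhereAmI {
    /** Describes the code position of whoever called this method. */
    static String here() {
        // Frame 0 is here() itself, where the Throwable is created; frame 1 is its caller.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        return caller.getClassName() + "." + caller.getMethodName() + ":" + caller.getLineNumber();
    }

    /** A sample caller, so that here() has something to report. */
    static String demo() {
        return here();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The reported line number is negative when the class was compiled without line number information, so callers should not rely on it being present.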

     

        FileHandling

    CSVFile Writes to a comma-separated file (CSV file).
    CSVLines Iterates through the columns of a comma-separated file (CSV file).
    DeepFileSet Represents a set of files as given by a wildcard string. Can recurse into subfolders.
    FigureProducer Produces LaTeX tables and JPG plots for table data
    FileLines Provides an iterator over the lines in a file
    FileSet Represents a set of files as given by a wildcard string. Does not include folders, is not case-sensitive.
    HTMLReader Reads characters from an HTML file
    MatchReader Provides an iterator over regular expression matches in a file
    SimpleInputStreamReader Reads characters from a file, regardless of the encoding
    SimpleOutputStreamWriter Writes characters to a file, regardless of the encoding (see here for a problem description)
    UTF8Reader Reads characters from a UTF8-encoded file
    UTF8Writer Writes UTF8-encoded characters to a file
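An iterator over the lines of a file, as in the FileLines entry, can be approximated with a `BufferedReader` that always reads one line ahead. This is a sketch under stated assumptions rather than the Javatools code: the class name `LinesSketch` is hypothetical, and decoding the file as UTF-8 is a choice made for this example only.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration only; this is not the Javatools FileLines class.
class LinesSketch implements Iterator<String>, Closeable {
    private final BufferedReader reader;
    private String nextLine; // the line to return next, or null at end of file

    LinesSketch(File file) throws IOException {
        // Decoding as UTF-8 is an assumption of this sketch.
        reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
        nextLine = reader.readLine();
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) throw new NoSuchElementException();
        String current = nextLine;
        try {
            nextLine = reader.readLine(); // read one line ahead for the next call
        } catch (IOException e) {
            // Iterator.next() cannot throw checked exceptions, so wrap it.
            throw new RuntimeException(e);
        }
        return current;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("remove is not supported in this sketch");
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    public static void main(String[] args) throws IOException {
        LinesSketch lines = new LinesSketch(new File(args[0])); // path given on the command line
        try {
            while (lines.hasNext()) System.out.println(lines.next());
        } finally {
            lines.close();
        }
    }
}
```

Reading one line ahead keeps `hasNext()` free of I/O, at the cost of opening and touching the file as soon as the iterator is constructed.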