Aiaioo Labs

Data Annotation Tool (DAT)

The DAT is a product that lets a distributed team annotate textual data.

It lowers the cost and improves the scalability of corpus annotation.

It can be used to create corpora for parsing (grammar annotations), transliteration, entity/relation/event extraction, and sentiment analysis.



Corpus Development

Corpora can refer to unprocessed data extracted from the internet or to manually curated information.

We can help you automatically extract or manually build the following linguistic resources:

Gazetteers

Gazetteers are word lists. Each one contains words, terms or names that satisfy some condition, for example place names or actors' names. They can be useful in classifiers and rules engines.
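As an illustration of how a gazetteer can feed a rules engine, here is a minimal Python sketch that looks up gazetteer entries in a sentence. The gazetteer contents, the function name `gazetteer_matches` and the whitespace tokenization are all assumptions made for this example. It prefers the longest entry at each position, so a multi-word name such as "New Delhi" is matched as one unit.

```python
# Hypothetical gazetteer of place names, stored lowercased; real gazetteers are much larger.
PLACE_NAMES = {"bangalore", "mumbai", "new delhi"}

def gazetteer_matches(text, gazetteer, max_len=3):
    """Return (start, end, phrase) token spans whose text appears in the gazetteer.

    Tokens are split on whitespace (an assumption; punctuation is not handled).
    At each position the longest entry of up to max_len tokens wins.
    """
    tokens = text.split()
    matches = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first so "new delhi" beats a shorter entry.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in gazetteer:
                match = (i, i + n, phrase)
                break
        if match:
            matches.append(match)
            i = match[1]  # skip past the matched tokens
        else:
            i += 1
    return matches

print(gazetteer_matches("She flew from Mumbai to New Delhi", PLACE_NAMES))
# [(3, 4, 'mumbai'), (5, 7, 'new delhi')]
```

A rules engine could then label each match as a location, and a classifier could use "contains a gazetteer match" as one of its features.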

Lexicons

Lexicons (or lexica) are essentially dictionaries. They can map words to meanings, to root forms (morphological dictionaries), or to synonyms and antonyms.
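A morphological dictionary can be as simple as a table from inflected forms to root forms. The Python sketch below uses a tiny hypothetical lexicon (`MORPH_LEXICON`) and a hypothetical helper `lemmatize`; it only shows the lookup idea, not how a production lexicon is built.

```python
# Hypothetical morphological lexicon: inflected form -> root form (lemma).
MORPH_LEXICON = {
    "ran": "run",
    "runs": "run",
    "running": "run",
    "mice": "mouse",
}

def lemmatize(tokens, lexicon):
    """Map each token to its root form; words missing from the lexicon are returned lowercased."""
    return [lexicon.get(token.lower(), token.lower()) for token in tokens]

print(lemmatize(["The", "mice", "ran", "home"], MORPH_LEXICON))
# ['the', 'mouse', 'run', 'home']
```

A plain table like this cannot resolve ambiguous forms (for example "saw" as a noun or as the past tense of "see"); that needs extra context such as a part-of-speech tag.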

Labelled Training Data

For many Machine Learning applications, running text must first be marked/tagged with information, such as grammatical categories or entity labels. The resulting labelled corpus is then used to train algorithms to perform those tasks automatically.
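To make "marked/tagged" concrete, the sketch below shows one common labelling scheme for entity extraction, BIO tagging (B = beginning of an entity, I = inside it, O = outside any entity), together with a helper that groups the tags back into entity phrases. The sentence, the label names and `bio_to_spans` are hypothetical; the DAT's own export format is not described on this page and may differ.

```python
# One sentence from a hypothetical entity-extraction corpus: each token carries a BIO tag.
labelled_sentence = [
    ("Asha", "B-PER"), ("works", "O"), ("at", "O"), ("Acme", "B-ORG"),
    ("Corp", "I-ORG"), ("in", "O"), ("New", "B-LOC"), ("Delhi", "I-LOC"),
]

def bio_to_spans(tagged):
    """Group BIO-tagged tokens into (label, phrase) entity spans."""
    tokens = [token for token, _ in tagged]
    spans = []
    label, start = None, None
    for i, (_, tag) in enumerate(tagged):
        # An I- tag with the same label as the open entity extends it.
        if tag.startswith("I-") and tag[2:] == label:
            continue
        # Any other tag closes the open entity, if there is one.
        if label is not None:
            spans.append((label, " ".join(tokens[start:i])))
        # B- (or a stray I-) opens a new entity; O leaves none open.
        label, start = (tag[2:], i) if tag != "O" else (None, None)
    if label is not None:
        spans.append((label, " ".join(tokens[start:])))
    return spans

print(bio_to_spans(labelled_sentence))
# [('PER', 'Asha'), ('ORG', 'Acme Corp'), ('LOC', 'New Delhi')]
```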

We are in the process of establishing a network of linguists and computer science professionals who can help in corpus development tasks.

Unlabelled Training Data

We are in the process of collecting large corpora of unlabelled text in a number of languages. These resources can be made available for use in unsupervised machine learning tasks.

Testing Data

Machine Learning algorithms are evaluated by running them on marked/tagged text that was not used for training and comparing their output with the existing tags. The amount of marked text required for testing is usually less than the amount required for training.
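A minimal sketch of this evaluation step, assuming an 80/20 split between training and test data (a common convention, not a figure from this page) and plain accuracy as the metric. `split_labelled_data`, `accuracy` and `keyword_rule` are hypothetical helpers written for this example, not calls from any particular library, and the labelled data is made up.

```python
import random

def split_labelled_data(examples, test_fraction=0.2, seed=0):
    """Shuffle labelled (input, label) pairs and hold out a fraction for testing.

    Returns (train, test); with the default fraction the test set is the smaller one.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, test_set):
    """Fraction of held-out examples whose predicted label matches the gold label."""
    correct = sum(1 for text, gold in test_set if predict(text) == gold)
    return correct / len(test_set)

# Hypothetical sentiment-labelled data.
data = [("good service", "pos"), ("bad food", "neg"), ("good value", "pos"),
        ("bad seats", "neg"), ("good staff", "pos"), ("bad wifi", "neg"),
        ("good view", "pos"), ("bad noise", "neg"), ("good coffee", "pos"),
        ("bad parking", "neg")]
train, test = split_labelled_data(data)

def keyword_rule(text):
    # Stands in for a trained model; a real model would be fitted on `train`.
    return "pos" if "good" in text else "neg"

print(len(train), len(test))  # 8 2
print(accuracy(keyword_rule, test))  # 1.0, since the rule is right on every example here
```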

We can help develop test corpora and test strategies.

Ontologies

Ontologies are collections of words and concepts together with the relationships between them. Some relationships, such as hyponymy and hypernymy (the "is-a" relation between a specific term and a more general one), might be possible to extract from documents by pattern analysis.
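One well-known form of such pattern analysis is Hearst-style lexico-syntactic patterns, where a phrase like "X such as Y and Z" suggests that Y and Z are hyponyms of X. The Python sketch below implements only the "such as" pattern with regular expressions; the pattern set and the function name `extract_hyponyms` are assumptions, and real text would need many more patterns plus filtering of noisy matches.

```python
import re

# One Hearst-style pattern: "<hypernym> such as <hyponym>, <hyponym>, and <hyponym>".
SUCH_AS = re.compile(
    r"(\w+),? such as "
    r"(\w+(?:(?:,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+)\w+)*)"
)
# Separators between list items; ", and" / ", or" are tried before a bare comma.
LIST_SEP = re.compile(r",\s*(?:and|or)\s+|,\s*|\s+(?:and|or)\s+")

def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for m in SUCH_AS.finditer(text):
        hypernym = m.group(1)
        for hyponym in LIST_SEP.split(m.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs

print(extract_hyponyms("The market sold fruits such as apples, pears, and plums."))
# [('apples', 'fruits'), ('pears', 'fruits'), ('plums', 'fruits')]
```

The extracted pairs are only candidates: a pattern like this also fires on sentences where the relation does not hold, so the results would typically be reviewed before being added to an ontology.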