PetaMem Language Technology Online Demo
Robust automatic identification of a text's language is one of the basic
prerequisites for natural language processing. Only once the language is known
can higher-level, language-specific processes carry out further analysis.
PetaMem has a total of four procedures for determining the language of a
text. They are based on statistical, syntactic and semantic
analysis. The procedures differ in throughput and in the
characteristics of their classification, and thus cover all cases that occur.
Let our system determine the language
of your text. (List of the available languages)
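As a rough illustration of the statistical approach only, the following sketch scores a text against small lists of very frequent function words. It is not PetaMem's procedure: the `PROFILES` table, its word lists and the function name `identify_language` are assumptions made up for this example, and a real system would use far larger profiles.

```python
import re
from collections import Counter

# Hypothetical toy profiles: a handful of very frequent function words
# per language. These lists are illustrative assumptions, not PetaMem data.
PROFILES = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "un", "une", "des"},
}


def identify_language(text):
    """Return the code of the best-matching profile, or None if nothing matches."""
    # Keep runs of letters only (Unicode-aware); digits and punctuation are dropped.
    tokens = Counter(re.findall(r"[^\W\d_]+", text.lower()))
    # Score = number of tokens in the text that are function words of the language.
    scores = {
        lang: sum(tokens[word] for word in words)
        for lang, words in PROFILES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(identify_language("Der Hund ist nicht in dem Haus und die Katze"))
```

A word-list score like this is fast but weak on very short inputs, which is one reason several procedures with different characteristics can be useful side by side.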
The dictionary is an essential component of machine translation systems.
PetaMem has detailed syntactic and semantic information for every word entry.
Each entry also contains translation information into other languages.
The deep semantic information for every word entry allows disambiguation
and thus, ultimately, useful machine translations.
Look up a word in our electronic dictionary.
(reduced dataset for non-members)
Our datasets are based on data gained from processing approx. 240 GB of corpora
in several natural languages. Among other information, PetaMem has extensive
histograms of the statistical distribution of words in the respective languages.
Together with rule-based processes that map various character classes onto
each other, this information makes it possible to reconstruct various datasets.
You'll find a demonstration of reconstruction of diacritics of various languages
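The idea of combining word histograms with a character-class mapping can be sketched as follows. This is a minimal example under stated assumptions, not PetaMem's implementation: the mapping used here is simply "accented letter to base letter", the frequency counts are invented, and the function names are hypothetical.

```python
import unicodedata


def strip_diacritics(word):
    """Map accented letters onto their base letters.

    Characters without a canonical decomposition (e.g. Polish "ł")
    are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_restoration_table(word_counts):
    """For each stripped form, remember the most frequent original spelling."""
    best = {}
    for word, count in word_counts.items():
        key = strip_diacritics(word)
        if key not in best or count > best[key][1]:
            best[key] = (word, count)
    return {key: word for key, (word, _) in best.items()}


def restore_diacritics(text, table):
    """Replace each whitespace-separated token by its most likely spelling."""
    return " ".join(table.get(token, token) for token in text.split())


if __name__ == "__main__":
    # Invented toy histogram (word -> corpus frequency), for illustration only.
    histogram = {"příliš": 120, "žluťoučký": 15, "kůň": 40, "kun": 2}
    table = build_restoration_table(histogram)
    print(restore_diacritics("prilis zlutoucky kun", table))
```

Choosing the most frequent candidate per word ignores context, so ambiguous forms are always resolved the same way; a context-aware model would be needed to do better.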
Convert numbers in the range 0 to 999 999 999 into (cardinal) numerals in
word form, or convert such numerals back into number form.
Go to the numerical conversions here.
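For the direction from number to word form, a conversion over the same range can be sketched like this. The example covers English cardinals only and is an illustrative assumption, not the code behind the demo; `number_to_words` is a hypothetical name.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def _below_thousand(n):
    """Spell out 1..999; returns an empty string for 0."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        # Tens and units are joined with a hyphen, e.g. "twenty-one".
        parts.append(TENS[n // 10] + ("-" + ONES[n % 10] if n % 10 else ""))
    elif n > 0:
        parts.append(ONES[n])
    return " ".join(parts)


def number_to_words(n):
    """Spell out an integer in the range 0..999 999 999 as an English cardinal."""
    if not 0 <= n <= 999_999_999:
        raise ValueError("number must be between 0 and 999 999 999")
    if n == 0:
        return "zero"
    parts = []
    # Split into groups of three digits: millions, thousands, remainder.
    for scale, name in ((1_000_000, "million"), (1_000, "thousand")):
        if n >= scale:
            parts.append(_below_thousand(n // scale) + " " + name)
            n %= scale
    if n:
        parts.append(_below_thousand(n))
    return " ".join(parts)


if __name__ == "__main__":
    print(number_to_words(1_234_567))
```

Other languages need their own rules for word order, compounding and agreement, so the three-digit grouping shown here is only the common skeleton.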
Data Conversion (coming soon)
Language data exists in a variety of formats. To give our technology the
highest possible flexibility for integration into existing business
processes, our systems have conversion mechanisms that cover a wide
range of formats.
Convert your text documents into another format