Data Pool



PetaMem maintains extensive corpora whose volume is equivalent to about 48,000 books the size of the Bible. These corpora are the basis for acquiring knowledge about languages as such, as well as world knowledge in general.


From the extensive statistical data gained from the corpora, we can derive generic statements about the probability distributions of characters, words and phrases (collocations). These raw statistical data in turn serve as the basis for the decision trees of various stochastic algorithms.
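As an illustration only, the kind of raw statistics described above can be sketched in a few lines of Python. This is a minimal sketch, not PetaMem's implementation: the function name `ngram_distribution` and the sample sentence are assumptions made for the example, and word bigrams stand in here as a simple proxy for collocations.

```python
from collections import Counter


def ngram_distribution(items, n=1):
    """Return the relative frequency of every n-gram in a sequence.

    `items` may be a list of characters or a list of words.
    (Hypothetical helper, introduced for illustration only.)
    """
    grams = [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Tiny stand-in for a corpus; a real corpus would be many orders larger.
text = "the cat sat on the mat and the cat slept"
words = text.split()

char_dist = ngram_distribution(list(text.replace(" ", "")), n=1)  # characters
word_dist = ngram_distribution(words, n=1)                        # words
bigram_dist = ngram_distribution(words, n=2)                      # word pairs
```

On realistic corpora, raw bigram frequency is usually replaced by an association measure (for example pointwise mutual information) to separate genuine collocations from pairs that are frequent only because their parts are frequent.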


Extracting data from the corpora yielded one of the most extensive wordlists available, in some cases with more than one million full forms per language. In purely quantitative terms, the vocabulary of our systems exceeds that of a human.


By deploying a new generation of tagger, it was possible to extract morphosyntactic information from the raw material largely automatically (manual postprocessing was necessary in about 4% of all cases).
This information is itself the foundation of semantic IR. Our systems now support various interpretation modes for natural languages. As a result, dialects and slang can be analyzed out of the box, which in turn eases the analysis of transcriptions of spoken language.


An outstanding unique selling point of our systems is the deep semantic information attached to nearly every entry. This draws a clear qualitative distinction between our systems and today's conventional commercially available systems.
This information opens up completely new application areas and delivers results in information retrieval, data mining and machine translation that have been unequalled so far.


Specific ontologies are an essential component of, for example, dialogue systems. PetaMem has two dedicated ontologies, covering the areas of pre-sales and support. They cover the general concepts of their respective fields and can of course be extended or adapted to individual requirements.

World Knowledge

The basis for general reasoning and inference in our systems is world knowledge with a ruleset that is presently very rudimentary. This knowledge primarily serves to provide context-sensitive disambiguation of the possible interpretations of a text or a dialogue. It is a first attempt to implement "commonsense knowledge".
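To make the idea of context-sensitive disambiguation concrete, here is a deliberately simple sketch in Python. It is not PetaMem's ruleset: it substitutes a plain word-overlap heuristic (in the spirit of the simplified Lesk approach), and the `SENSES` inventory, the sense labels and the function name `disambiguate` are all hypothetical.

```python
def disambiguate(word, context, senses):
    """Pick the sense of `word` whose cue words overlap most with `context`.

    `senses` maps a word to {sense_label: set_of_cue_words}.
    Ties are resolved in favour of the sense listed first.
    """
    context_words = {w.lower().strip(".,;:!?") for w in context.split()}
    best_sense, best_score = None, -1
    for sense, cues in senses[word].items():
        score = len(context_words & cues)  # number of shared cue words
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical, hand-written sense inventory for illustration only.
SENSES = {
    "bank": {
        "financial_institution": {"money", "account", "loan", "deposit"},
        "river_side": {"river", "water", "shore", "fishing"},
    }
}

first = disambiguate(
    "bank", "She opened an account at the bank to deposit money.", SENSES
)
second = disambiguate(
    "bank", "They went fishing on the bank of the river.", SENSES
)
```

Here `first` resolves to the financial reading and `second` to the riverside reading, because the surrounding words share cue words with only one of the two senses. A real system would draw these cues from world knowledge rather than from a hand-written table.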

New World Knowledge

By means of the information already present and the continuous extension and refinement of our inference mechanisms, new world knowledge is opened up steadily. Although this happens automatically, the results are verified manually. The extended datasets are of course available to our customers as part of the continuous maintenance of their systems.

PetaMem Group

Copyright (C) 2002-present PetaMem, s.r.o. Last changed: March 2, 2016, 9:40 am.