PetaMem has extensive corpora whose volume is equivalent to about 48,000
Bibles. These corpora are the basis for acquiring various kinds of
knowledge, both about language itself and about the world in general.
The extensive statistical data gained from these corpora allows us to
make general statements about the probability distributions of
characters, words, or phrases (collocations). This raw statistical data
in turn serves as the basis for the decision trees of various stochastic
algorithms.
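The kind of corpus statistics described above can be illustrated with a small sketch. It assumes whitespace-tokenized text, uses relative frequencies as probability estimates, and scores adjacent word pairs with pointwise mutual information (PMI), one common collocation measure. PMI and the function names `relative_frequencies` and `pmi_collocations` are illustrative choices, not necessarily what PetaMem's systems use.

```python
import math
from collections import Counter


def relative_frequencies(items):
    """Map each item to its relative frequency (an estimate of its probability)."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def pmi_collocations(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ); a high score means that
    a and b occur next to each other more often than chance would predict.
    Pairs seen fewer than min_count times are skipped as unreliable.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), n_ab in bigrams.items():
        if n_ab < min_count:
            continue
        p_ab = n_ab / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    # Best-scoring candidate collocations first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    text = "new york is big . new york is old . the city is big ."
    tokens = text.split()
    char_dist = relative_frequencies(text.replace(" ", ""))  # character distribution
    word_dist = relative_frequencies(tokens)                  # word distribution
    print("most frequent char:", max(char_dist, key=char_dist.get))
    print("P(is) =", round(word_dist["is"], 3))
    for pair, score in pmi_collocations(tokens):
        print(pair, round(score, 2))
```

On real corpora the same counts would be collected per language and at much larger scale; the `min_count` cut-off matters there, because PMI strongly overrates pairs of rare words.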
Extracting the corpus data yielded some of the most extensive word
lists available, in some cases with more than one million full forms
per language. We can therefore claim that, in purely quantitative terms,
the vocabulary of our systems exceeds that of a human.
By deploying a new generation of tagger, we were able to extract most of
the morphosyntactic information from the raw material automatically
(manual postprocessing was necessary in about 4% of …). This
information is itself the foundation of semantic IR (information
retrieval). Our systems now also offer various interpretation modes for
natural language, which makes it possible to analyze dialects and slang
out of the box. This in turn simplifies the analysis of transcriptions
of spoken language.
An outstanding USP of our systems is the deep semantic information
attached to almost every entry. It draws a clear qualitative line
between our systems and today's conventional commercial systems. This
information opens up completely new application areas and delivers
results in information retrieval, data mining, and machine translation
that are unequalled to date.
Specific ontologies are an essential component of, for example, dialogue
systems. PetaMem has two dedicated ontologies, one for pre-sales and one
for support. They cover the general concepts of the respective fields
and can, of course, be extended or adapted to individual requirements.
The general reasoning and inference of our systems are based on world
knowledge combined with a ruleset that is currently still very
rudimentary. This knowledge serves primarily for context-sensitive
disambiguation among the possible interpretations of a text or a
dialogue. It is a first attempt to implement "commonsense".
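Context-sensitive disambiguation can be pictured with a very small sketch in the spirit of the simplified Lesk algorithm: each reading of an ambiguous word is linked to cue words taken from world knowledge, and the reading that shares the most cues with the surrounding text wins. The `KNOWLEDGE` table, the `Sense` type and `disambiguate` are hypothetical, and cue-word overlap is a swapped-in stand-in for the ruleset described above, not PetaMem's actual mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    label: str
    cues: frozenset  # words that (hypothetical) world knowledge ties to this sense


# Hypothetical toy knowledge base: two readings of the ambiguous word "bank".
KNOWLEDGE = {
    "bank": [
        Sense("bank/finance", frozenset({"money", "account", "loan", "deposit"})),
        Sense("bank/river", frozenset({"river", "water", "shore", "fishing"})),
    ],
}


def disambiguate(word, context):
    """Choose the sense whose cue words overlap most with the context words.

    Returns None if the word is unknown or no sense shares any cue with the
    context, i.e. the context offers no evidence for any reading.
    On a tie, max() keeps the first-listed sense.
    """
    senses = KNOWLEDGE.get(word)
    if not senses:
        return None
    context_words = {w.lower() for w in context}
    best = max(senses, key=lambda s: len(s.cues & context_words))
    if not best.cues & context_words:
        return None
    return best.label


if __name__ == "__main__":
    print(disambiguate("bank", "She opened an account at the bank".split()))
    print(disambiguate("bank", "We went fishing by the river bank".split()))
```

Returning `None` instead of guessing leaves room for a later stage, such as dialogue context or a richer ruleset, to resolve the ambiguity.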
New World Knowledge
Using the information already present, together with the continuous
extension and refinement of our inference mechanisms, we steadily
acquire new world knowledge. Although this happens automatically, the
results are verified manually. The extended datasets are, of course,
available to our customers as part of the continuous maintenance of
their systems.
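The automatic extension of world knowledge can be pictured as forward chaining: rules are applied to the known facts until nothing new appears, and only the newly derived facts are passed on for manual verification. The triple format, the relation names `is_a` and `has`, the function `infer` and both rules are assumptions for illustration, not PetaMem's actual inference mechanisms.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def infer(facts: Set[Fact]) -> Set[Fact]:
    """Forward-chain two illustrative rules until no new facts appear.

    Rule 1 (transitivity): (a is_a b), (b is_a c)  =>  (a is_a c)
    Rule 2 (inheritance):  (a is_a b), (b has c)   =>  (a has c)
    Returns only the newly derived facts, e.g. to queue them for manual review.
    """
    known = set(facts)
    while True:
        derived = set()
        for (a, rel1, b) in known:
            if rel1 != "is_a":
                continue
            for (b2, rel2, c) in known:
                if b2 != b:
                    continue
                if rel2 == "is_a":
                    derived.add((a, "is_a", c))
                elif rel2 == "has":
                    derived.add((a, "has", c))
        new = derived - known
        if not new:  # fixed point reached: no rule yields anything new
            return known - set(facts)
        known |= new


if __name__ == "__main__":
    base = {
        ("sparrow", "is_a", "bird"),
        ("bird", "is_a", "animal"),
        ("bird", "has", "wings"),
        ("animal", "has", "metabolism"),
    }
    for fact in sorted(infer(base)):
        print("to verify:", fact)
```

The loop always terminates, because rules only combine entities that already occur in the facts, so the set of possible triples is finite. The nested scan is quadratic per round, which is fine for a sketch but not for a large knowledge base.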