Basic Features


This section gives an overview of the basic features of the PetaMem LangSuite system.

Basic Features of the NLP-Core

The NLP-Core is the central component of our natural language processing system. It provides basic functionality and is necessary in every configuration.

The architecture of the system is a distributed client/server model. The system runs on all relevant UNIX derivatives (Solaris, AIX, HP-UX, Linux) and scales from a classical workstation through Beowulf clusters up to modern mainframe architectures. The internal encoding and all system-relevant parameters are designed for processing arbitrary natural languages (including Arabic and Asian writing systems).

Identification of the Language of a Text

A central feature of NLP systems is the robust classification of input data. This includes correctly identifying the language of the text itself.

PetaMem uses no fewer than four methods to identify the language of a given text. They range from fast statistical methods with broad language coverage, through robust dictionary-based identification, up to deep semantic analysis of the text.

These methods are applied as a cascade to ensure fast and reliable operation. Deep semantic analysis handles cases that defeat simpler methods: a German text containing an English quotation that is much longer than the surrounding German text, and that itself contains a short German excerpt, is classified as German rather than English.
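The cascade idea can be sketched as follows. This is a minimal illustration only, not PetaMem's implementation: the detector functions, the tiny word lists, and the `min_margin` threshold are all assumptions made for the example. Each stage either returns a language code or declines, and a slower stage runs only when the faster ones were not confident.

```python
from typing import Callable, Optional, Sequence

# A detector returns a language code, or None when it is not confident.
Detector = Callable[[str], Optional[str]]

# Hypothetical, deliberately tiny word lists; real systems use large dictionaries.
STOPWORDS = {
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "mit", "auf"},
    "en": {"the", "and", "is", "not", "a", "of", "to", "with", "on"},
}

def fast_character_hint(text: str) -> Optional[str]:
    """Stage 1 (cheap toy heuristic): German-specific characters hint at German."""
    return "de" if any(c in "äöüß" for c in text.lower()) else None

def dictionary_lookup(text: str, min_margin: int = 2) -> Optional[str]:
    """Stage 2 (slower): count dictionary hits and require a clear winner."""
    tokens = text.lower().split()
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    return best if top - runner_up >= min_margin else None

def identify_language(text: str, detectors: Sequence[Detector], default: str = "unknown") -> str:
    """Run the detectors in order and stop at the first confident answer."""
    for detect in detectors:
        lang = detect(text)
        if lang is not None:
            return lang
    return default

CASCADE = (fast_character_hint, dictionary_lookup)
```

A deeper (for example semantic) stage would simply be appended to `CASCADE`; because it runs last, its cost is only paid for texts the cheaper stages could not decide.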

Communication Interfaces & Data Formats

The functionality of the central NLP components can be made accessible via various interfaces. The following list gives an overview of the currently available interfaces and the resulting application areas of the whole system:
Mail
The mail interface connects to any SMTP-capable mail system and manages an arbitrary number of mail peers (folders). Incoming mail is accepted in all formats conforming to RFC 822, RFC 2822 and RFC 1341. In addition, the system tries to process malformed emails as robustly as possible. Accepted data formats include ASCII, DOC, RTF, UTF-8, HTML, PDF, PS and others.

Reply mails are generated in strict accordance with the RFCs, using the mail system installed on the host.

Moreover, the mail interface takes care of efficient archiving of previous correspondence and of access to it by various criteria, which allows a continuous discourse based on the previous dialogue history.
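To illustrate what accepting RFC 822/2822 mail and tolerating malformed input can look like in practice, here is a small sketch using Python's standard `email` package. It is not the LangSuite mail interface; the sample message, the addresses, and the `extract_text` helper are invented for the example.

```python
from email import message_from_string, policy

# Hypothetical sample message, used only for illustration.
RAW_MAIL = (
    "From: alice@example.com\n"
    "To: langsuite@example.com\n"
    "Subject: Frage zum Produkt\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Guten Tag, ich habe eine Frage.\n"
)

def extract_text(raw: str) -> dict:
    """Parse a raw message and return its headers, body text, and parser defects."""
    msg = message_from_string(raw, policy=policy.default)
    # Prefer a plain-text body, fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "text": text.strip(),
        # The parser records problems here instead of raising, which is
        # what makes lenient handling of malformed mail possible.
        "defects": list(msg.defects),
    }

parsed = extract_text(RAW_MAIL)
```

The extracted `text` is what would be handed to language identification and further analysis; the `defects` list lets a caller decide how much to trust a damaged message.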

Web
The web interface is also designed for bidirectional data streams. Input takes place via classical HTML forms, JavaScript or Java; outgoing data is served as compliant HTML by a web server.

GUI/CLI
Detailed documentation of the APIs allows access to the system functionality from the command line as well as from graphical user interfaces. This enables a broad spectrum of applications, from the integration of powerful spell checkers and thesauri up to human-quality machine translation.
For all available interfaces, the underlying control logic allows a matrix-like assignment of the data streams, so all combinations are possible. For example, web input can generate an appropriate reply email, or incoming emails can update web content accordingly.
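One way to picture the matrix-like assignment is a table that maps each input interface to the set of output interfaces it feeds. The sketch below is an assumption-laden illustration: the `ROUTING` table, the handler functions, and `dispatch` are hypothetical names and do not describe the actual control logic.

```python
from typing import Callable, Dict, List, Set

# Hypothetical routing matrix: input interface -> output interfaces.
ROUTING: Dict[str, Set[str]] = {
    "web": {"web", "mail"},   # web input may be answered on the web and by email
    "mail": {"mail", "web"},  # incoming mail may be answered and may update web content
    "cli": {"cli"},
}

# Hypothetical output handlers; real ones would send mail, render HTML, etc.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "mail": lambda payload: f"mail:{payload}",
    "web": lambda payload: f"web:{payload}",
    "cli": lambda payload: f"cli:{payload}",
}

def dispatch(source: str, payload: str) -> List[str]:
    """Send a processed payload to every output assigned to its input interface."""
    targets = sorted(ROUTING.get(source, set()))  # sorted for a stable order
    return [HANDLERS[target](payload) for target in targets]

results = dispatch("web", "answer")
```

Changing which combinations are active is then a matter of editing the table, not the handlers.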

The system uses Unicode natively and is thus compatible with all national character encoding systems. Conversions to and from specific encodings are done if required.
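Conversion at the system boundary can be sketched as decoding incoming bytes to Unicode text and encoding again only on output. The `transcode` helper below is a hypothetical illustration using Python's built-in codecs, not a LangSuite API.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert bytes between encodings by going through Unicode text."""
    text = data.decode(source_encoding)  # legacy bytes -> Unicode string
    return text.encode(target_encoding)  # Unicode string -> target bytes

# Example: a Latin-1 encoded German word converted to UTF-8.
latin1_bytes = "Grüße".encode("iso-8859-1")
utf8_bytes = transcode(latin1_bytes, "iso-8859-1", "utf-8")
```

Keeping all internal processing on Unicode strings means each national encoding only needs one decoder and one encoder at the edges.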

Summary Technical Data

Hardware & Architecture
Operating System    Solaris, AIX, HP-UX, Linux
Hardware            UltraSparc64, x86, PA-RISC, p-series
Clustering          Beowulf cluster
Architecture        Distributed client/server


 
PetaMem Group

  legal disclaimer Copyright (C) 2002-present PetaMem, s.r.o. last changed: March 2, 2016 9:40 am, Feedback: webmaster@petamem.com