Syntax analyzer

DictaScope Syntax

A parser that builds a dependency tree for an input sentence in a natural language (Russian).

While building the tree, the parser resolves morphological homonymy, assigns a grammatical meaning to each word (token), and determines the type of every subordinate relation. It also segments the sentence: it extracts simple sentences from complex ones, extracts phrases (participial, adjectival, etc.), including nested ones, and detects series of homogeneous members. The functional homonymy of punctuation marks is resolved and their roles are determined. Taking punctuation into account allows long, complex sentences to be analyzed correctly.

Some multi-part objects (organizations, dates, etc.) are also extracted. Each compound object is represented in the tree as a single vertex with its own syntactic relations.
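As a rough illustration of the tree described above, the following Python sketch models a dependency tree in which every vertex carries its grammatical meaning and the type of its relation to the head, and a multi-part object occupies one vertex. The class name `Vertex`, its fields, the relation labels and the English stand-in sentence are all hypothetical and chosen only for illustration; this is not the DictaScope Syntax API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Vertex:
    """One vertex of a dependency tree (hypothetical structure, not the DictaScope API)."""
    text: str                       # surface form; spans several tokens for a compound object
    grammar: str                    # grammatical meaning chosen after disambiguation
    relation: Optional[str] = None  # type of the link to the head; None for the root
    children: List["Vertex"] = field(default_factory=list)

    def add(self, child: "Vertex") -> "Vertex":
        """Attach a subordinate vertex and return it."""
        self.children.append(child)
        return child


def render(vertex: Vertex, depth: int = 0) -> List[str]:
    """Return one indented line per vertex, in depth-first order."""
    label = vertex.relation or "root"
    lines = [f"{'  ' * depth}{label}: {vertex.text} [{vertex.grammar}]"]
    for child in vertex.children:
        lines.extend(render(child, depth + 1))
    return lines


# English stand-in for a Russian sentence: "Bank of Russia raised the rate on 15 March 2024".
root = Vertex("raised", "verb, past")
root.add(Vertex("Bank of Russia", "organization", "subject"))  # compound object, one vertex
rate = root.add(Vertex("rate", "noun, singular", "object"))
rate.add(Vertex("the", "determiner", "determiner"))
root.add(Vertex("15 March 2024", "date", "time"))              # compound object, one vertex

tree_lines = render(root)
print("\n".join(tree_lines))
```

Keeping the organization and the date as single vertices means that later processing steps see one node with one incoming relation instead of several loosely connected tokens.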

Optional:

  • the input text can be split into separate sentences;
  • some spelling mistakes can be corrected;
  • surface-semantic analysis is performed: the action, the subject and the object are identified for each sentence, including the simple sentences within complex ones.
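The surface-semantic step in the list above can be pictured as a walk over the dependency tree that emits one action/subject/object frame per simple sentence. The sketch below assumes a hypothetical tuple representation of a vertex and hypothetical relation labels (`root`, `clause`, `subject`, `object`); the real analyzer's labels and algorithm are not documented here and may differ.

```python
from typing import Dict, List, Optional, Tuple

# A vertex is (text, relation_to_head, children). The labels are hypothetical.
Node = Tuple[str, str, list]


def extract_frames(node: Node) -> List[Dict[str, Optional[str]]]:
    """Collect one action/subject/object frame per predicate, including nested clauses."""
    text, relation, children = node
    frames: List[Dict[str, Optional[str]]] = []
    if relation in ("root", "clause"):  # this vertex heads a simple sentence
        frame: Dict[str, Optional[str]] = {"action": text, "subject": None, "object": None}
        for child_text, child_relation, _ in children:
            if child_relation in ("subject", "object"):
                frame[child_relation] = child_text
        frames.append(frame)
    for child in children:              # descend to find simple sentences inside complex ones
        frames.extend(extract_frames(child))
    return frames


# English stand-in: "The engineer said that the parser builds a tree."
tree: Node = (
    "said", "root", [
        ("engineer", "subject", []),
        ("builds", "clause", [
            ("parser", "subject", []),
            ("tree", "object", []),
        ]),
    ],
)

frames = extract_frames(tree)
for frame in frames:
    print(frame)
```

The main clause yields a frame without an object, and the subordinate clause yields its own complete frame, which mirrors the statement that simple sentences inside complex ones are analyzed separately.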

The result can be output as an XML document.
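The XML schema used by the analyzer is not described in this text, so the following Python sketch only shows the general idea of serializing a dependency tree to XML with the standard `xml.etree.ElementTree` module. The element names `sentence` and `vertex`, the attribute names, and the dictionary layout of the input tree are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def to_xml(vertex: dict) -> ET.Element:
    """Convert a vertex (hypothetical dict layout) into a nested <vertex> element."""
    element = ET.Element(
        "vertex",
        {"text": vertex["text"], "relation": vertex.get("relation", "root")},
    )
    for child in vertex.get("children", []):
        element.append(to_xml(child))  # subordinate vertices become child elements
    return element


# English stand-in: "parser builds tree".
tree = {
    "text": "builds",
    "children": [
        {"text": "parser", "relation": "subject"},
        {"text": "tree", "relation": "object"},
    ],
}

sentence = ET.Element("sentence")
sentence.append(to_xml(tree))
xml_text = ET.tostring(sentence, encoding="unicode")
print(xml_text)
```

Nesting subordinate vertices inside their head keeps the tree structure visible in the document itself, so a consumer can recover the dependencies with any standard XML parser.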

The program requires a morphological dictionary and the lexical analyzer in order to operate.

The kernel of the DictaScope Syntax analyzer implements universal language dependencies, which makes it possible to develop analyzers for different languages on a single platform. Experimental versions of the analyzer for English and German have been released. The Russian version is currently the most developed.

The program is supplied as a dynamic link library for the Windows or FreeBSD operating systems.