Keywords:

textual corpora, semantic analysis, distribution, search mask, query language, data mining


Despite the rapid development of text corpora and of the tools for processing them, many potential users are not fully aware of how useful corpora are for solving a wide range of text-composition problems. Beyond straightforward strategies such as wildcard (asterisk) searches and checking collocations, modern corpus tools also have great potential for resolving a wide range of semantic issues in grammar and vocabulary. Knowing how to use search masks and part-of-speech, morphological and semantic tags is of great help in formulating pertinent queries. Although semantic tagging is still rare in existing corpora, it is a very promising feature; its application, however, is still hindered by the polysemy of semantic tags. Before a problem is “translated” into a formal query language, a logical solution should be found on the basis of the formal properties of linguistic signs, by applying distributional (colligational and collocational) analysis, substitution, calque, and morphological analysis. Substitution makes it possible to extrapolate properties from one unit to another within the same semantic group; distribution makes it possible to reveal several semantic components from the context and, vice versa, to find an expected lexeme from its hypothetical surroundings; calque is a powerful tool within a trial-and-error strategy for finding potential equivalents; frequency analysis is helpful when interpreting results and evaluating their reliability. Combining these methods allows users to solve orthographic, punctuation, morphological, syntactic and lexical problems arising in both monolingual communication and translation, as well as in data mining.
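As a rough illustration of how an asterisk search mask and a simple distributional check might be combined, the sketch below translates a mask into a regular expression and counts the immediate neighbours of each match. It is not tied to any particular corpus tool or query language: the function names `mask_to_regex` and `collocates`, the sample sentences, and the one-token window are all assumptions made for the example.

```python
import re
from collections import Counter

def mask_to_regex(mask):
    """Turn a search mask into a whole-word regex; '*' stands for any run of word characters."""
    parts = [re.escape(part) for part in mask.split("*")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)

def collocates(tokens, pattern, window=1):
    """Count the tokens found within `window` positions of every token matching `pattern`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if pattern.fullmatch(token):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(t.lower() for t in left + right)
    return counts

# Hypothetical mini-corpus, used only to show the calls.
text = "She made a decision. He makes a decision quickly. They make decisions."
tokens = re.findall(r"\w+", text)
pattern = mask_to_regex("mak*")
neighbours = collocates(tokens, pattern)
print(neighbours.most_common())
```

In a real corpus the frequency of each neighbour would then be weighed against the size of the corpus before any conclusion about typical collocations is drawn.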


