Keywords:

textual corpora, semantic analysis, distribution, search mask, query language, data mining


In spite of the rapid development of textual corpora and of the tools for processing them, many potential users are not fully aware of their usefulness for solving a wide range of text-composition problems. Beyond straightforward strategies such as using asterisks as wildcards and checking collocations, modern corpus tools have considerable potential for solving a wide range of semantic issues in grammar and vocabulary. Knowing how to use search masks and part-of-speech, morphological and semantic tags is of great help in formulating pertinent queries. Although semantic tagging is still rare in existing corpora, it is a very promising feature; its application is still hindered by the polysemy of semantic tags. Before a problem is “translated” into a formal query language, a logical solution should be found on the basis of the formal properties of linguistic signs, by applying analysis of distributional (colligational and collocational) potential, substitution, calque, and morphological analysis. Substitution makes it possible to extrapolate properties from one unit to another within the same semantic group; distribution makes it possible to reveal several semantic components in the context and, conversely, to find an expected lexeme from its hypothetical surroundings; calque is a powerful tool within a trial-and-error strategy for finding potential equivalents; frequency analysis is helpful when interpreting the results and evaluating their reliability. Combining these methods allows users to solve orthographic, punctuation, morphological, syntactic and lexical problems arising in monolingual communication, in translation, and in data mining.
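The two simplest strategies mentioned above, an asterisk search mask and a check of the right-hand distribution of the matched word, can be sketched roughly as follows. This is only an illustrative sketch in Python and not the query language of any particular corpus tool: the toy corpus is invented, and the helper names `mask_to_regex`, `tokenize` and `right_collocates` are hypothetical.

```python
import re
from collections import Counter

# Toy corpus invented for illustration; not taken from any real corpus.
CORPUS = (
    "She expressed interest in the project. "
    "He showed interest in corpora. "
    "They expressed an interest in translation. "
    "Her interest in grammar grew."
)


def mask_to_regex(mask):
    """Turn a search mask with '*' wildcards into a whole-word regex.

    Assumption: '*' stands for any run of word characters (possibly empty).
    """
    parts = [re.escape(part) for part in mask.split("*")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)


def tokenize(text):
    """Split text into lower-case word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def right_collocates(tokens, pattern, span=1):
    """Count the tokens found up to `span` positions right of each mask match."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if pattern.fullmatch(token):
            counts.update(tokens[i + 1 : i + 1 + span])
    return counts


tokens = tokenize(CORPUS)
# Which word most often follows "interest" in the toy corpus?
print(right_collocates(tokens, mask_to_regex("interest*")).most_common(3))
# Which words follow any form matching the mask "express*"?
print(right_collocates(tokens, mask_to_regex("express*")).most_common(3))
```

On a real corpus the same question would normally be put through the corpus interface itself; the sketch only shows the logic of moving from a mask to a frequency-ranked list of neighbouring words.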

