DOI: https://doi.org/10.11649/cs.2010.013

Application of multilingual corpus in contrastive studies (on the example of the Bulgarian-Polish-Lithuanian parallel corpus)

Ludmila Dimitrova, Violetta Koseska-Toszewa, Danuta Roszko, Roman Roszko

Abstract


In this paper we present applications of a trilingual corpus in language research. Comparative and contrastive studies of Polish and Bulgarian, as well as of Polish and Lithuanian, have already been conducted, but to the best of our knowledge no such studies exist for Bulgarian and Lithuanian. On the one hand, it is interesting to note that two Slavic languages are compared with a Baltic language (Lithuanian). On the other hand, the three languages have only a marginal presence in the EU because of the later accession of the three countries to the EU. The paper briefly describes the first electronic Bulgarian–Polish–Lithuanian experimental corpus, which is currently under development for research purposes only. We also focus on the morphosyntactic annotation of the parallel trilingual corpus according to the Corpus Encoding Standard: we review the Part-of-Speech (POS) classification of the participle in the three languages (Bulgarian, Polish, and Lithuanian) in comparison with another POS, the adjective. With some examples, we briefly discuss tagsets for corpus annotation from the point of view of their possible future unification.


Keywords


multilingual electronic corpora; parallel and comparable corpora; corpus annotation; lexical databases; multilingual electronic dictionaries


Copyright (c) 2015 Ludmila Dimitrova, Violetta Koseska-Toszewa, Danuta Roszko, Roman Roszko

License URL: http://creativecommons.org/licenses/by/3.0/pl/