ON THE BENEFITS OF FOREIGN LANGUAGE LEARNING BASED ON PARALLEL LANGUAGE CORPUS

The recent strong interest in language corpora, which can be defined as collections of texts in electronic format, together with my work within the European project Clarin on 'The Parallel Polish-Bulgarian-Russian Corpus', prompted this text on the use of a parallel language corpus in learning a foreign language. The article discusses the benefits of using such a corpus in foreign language learning, describes selected corpus tools that support the learning process, and indicates some threats arising from the wrong use of the corpus.

1. Language corpora, owing to the dynamic development of corpus linguistics and computer technologies, are extremely important research material, used among others in applied linguistics (cf. Hunston, 2002) and lexicography (cf. Ooi, 1998). In my view, language corpora should play a bigger role in developing language skills than they have to date. In recent years many researchers (Maia, 2003, p. 52), academic teachers and a relatively small group of foreign language teachers who attended conferences dedicated to the issue have taken to these tools.
The incorporation of corpus methodology and the introduction of corpus tools into foreign language learning should become a fact. The creative search for words and the possibility of contrasting units, constructions and utterances in language A with those in language B and others would probably lead to many individual discoveries and answer many questions which are essential at a given stage of learning, questions which so far almost only foreign language teachers have been able to answer.
In learning a foreign language, both monolingual corpora and parallel corpora can be useful. Annotated and unedited (electronic) text collections are both important, since both types of material provide linguistic data worth looking at.
The benefits of teaching Polish, Bulgarian or Russian on the basis of a parallel corpus are merely signalled in the article. 'The Parallel Polish-Bulgarian-Russian Corpus', to which the author of the article contributed together with the corpus linguistics and semantics team, became a fundamental source and, among other lexicographic aids, helped in writing the first volume of 'The Contemporary Bulgarian-Polish Dictionary' (cf. Satoła-Staśkowiak & Koseska-Toszewa, 2014). It is also the basis of a trilingual Russian-Bulgarian-Polish dictionary which is being compiled (Kisiel, Koseska, Sosnowski).
1.1. 'The Parallel Polish-Bulgarian-Russian Corpus' contains text collections exceeding 6 million forms (a joint figure for the three languages). The collected materials are made up of fiction, technical instructions, legal and other texts in Polish, Bulgarian and Russian. The texts were obtained in three different ways: (1) texts available under free access, (2) texts used under a licence obtained from the author, (3) copyright-exempt texts.
1.2. In theoretical and methodological terms, the Corpus has been designed so as to maintain the proportions of the above-mentioned kinds of texts. The possibilities of its use are diverse. It can be helpful, among other things, in writing dictionaries, grammar textbooks and scientific articles, in building specialist thematic sub-corpora, and in teaching Polish, Bulgarian or Russian.
2. In learning a foreign language, any language corpus whose resources contain a representative number of examples together with their language context can be helpful. Ideally it is an annotated collection which makes it possible to ask detailed questions and enables users to find the information that interests them. From the corpus user's point of view, both morphosyntactic markers and stylistic markers arising from the metadata attached to every text (containing information on the author, the translator, and the time and place of publication of the work) can be essential. In its first version, the trilingual corpus discussed here contains only semantic annotation, which covers only some examples chosen from the corpus (about one tenth of all examples), as well as metadata selected according to the pattern discussed above.
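The role of metadata in narrowing a query can be illustrated with a minimal Python sketch. It is not the corpus's actual data model: the class `TextRecord`, its field names, the function `select_texts` and the placeholder records are all assumptions introduced here, based only on the metadata categories listed above (author, translator, time and place of publication).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class TextRecord:
    """Metadata attached to one corpus text (field names are assumptions)."""
    title: str
    author: str
    translator: Optional[str]  # None for an original, untranslated text
    year: int                  # time of publication
    place: str                 # place of publication
    language: str              # "pl", "bg" or "ru"


def select_texts(records: Iterable[TextRecord],
                 language: Optional[str] = None,
                 translated: Optional[bool] = None,
                 year_from: Optional[int] = None) -> List[TextRecord]:
    """Return the records that satisfy every criterion that was supplied."""
    selected = []
    for rec in records:
        if language is not None and rec.language != language:
            continue
        if translated is not None and (rec.translator is not None) != translated:
            continue
        if year_from is not None and rec.year < year_from:
            continue
        selected.append(rec)
    return selected


# Invented placeholder records, not actual texts from the corpus.
RECORDS = [
    TextRecord("Text A", "Author A", None, 1998, "Warszawa", "pl"),
    TextRecord("Text A", "Author A", "Translator B", 2003, "Sofia", "bg"),
    TextRecord("Text C", "Author C", "Translator D", 1985, "Moskva", "ru"),
]

# Translations published in 2000 or later.
recent_translations = select_texts(RECORDS, translated=True, year_from=2000)
print([rec.language for rec in recent_translations])
```

A learner interested, for instance, only in recent translated prose could restrict the search space in this way before looking for individual units.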
A lack of annotation of lexical units in the corpus is not an obstacle to using its resources. Such a corpus can be a practical tool supporting foreign language learning, especially as, in the case of 'The Parallel Polish-Bulgarian-Russian Corpus', three languages are collated in it simultaneously at the level of the sentence, and the user of the parallel corpus should be very familiar with at least one of them.
The benefits of using material which is not annotated were described by M. Wilkinson (n.d.; the article was published on the Internet without a date of publication), who lists, among other things: the possibility of confirming one's intuitive decisions; affirming or changing decisions (on the choice of unit and construction) based on other sources such as dictionaries; obtaining information about possible collocations; broadening knowledge of patterns in the target language; and learning how to use new expressions. Another advantage of learning a language on the basis of language corpora is that they present lexical units in a broader context, enabling advanced learners to carry out their own basic linguistic analysis.
2.1. In the future, intelligent concordance programs supporting advanced analysis of the texts included in 'The Parallel Polish-Bulgarian-Russian Corpus' will be used for tracking specific units and constructions. At present only ordinary filters facilitate searching for a particular word or derivational or morphological element in the text. The electronic tools used in the corpus show the frequency of occurrence of specific units or constructions, thus aiding learners in creating their own lists (which is especially important for them), e.g. lists in alphabetical order, groups of synonyms or antonyms, groups of neologisms or archaisms, and other units important to the individual user. In accordance with the adopted methodology, the filters also support the segmentation of language material into sentences (in the case of the corpus discussed here).
3. The Parallel Polish-Bulgarian-Russian Corpus (henceforth: PPBRC) shows the user the collocations of particular units and the number of positions they determine in three different languages, thus marking the dissimilarity of the collated systems or, conversely, confirming a common way of explication of chosen constructions. By following examples in three languages together with their contexts, the user of PPBRC can recognise the semantic value of the analysed units. The user has access to knowledge concerning each of the three languages which is not restricted in any way according to how difficult or easy the collated material is to understand. This distinguishes the corpus material from educational material for individual languages (e.g. foreign language textbooks), which is intended for a student with elementary, intermediate or advanced knowledge of a specific language. The third language collated in the corpus (it can be Russian, Bulgarian or Polish) constitutes a kind of additional linguistic background.
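The key-word filter and the sentence-level collation of the three languages described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the software used in the project: the names `AlignedSentence` and `kwic` are hypothetical, and the two sample sentence triples were invented for this sketch rather than taken from the corpus.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class AlignedSentence:
    """One corpus unit: the same sentence in the three collated languages."""
    pl: str
    bg: str
    ru: str


def kwic(corpus: Iterable[AlignedSentence], pattern: str,
         lang: str = "pl", width: int = 25
         ) -> List[Tuple[str, str, str, AlignedSentence]]:
    """Key word in context: return (left context, match, right context, unit).

    `pattern` is a regular expression, so it can target a whole word
    or a derivational or morphological element such as a suffix.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for unit in corpus:
        sentence = getattr(unit, lang)
        for match in regex.finditer(sentence):
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            hits.append((left, match.group(0), right, unit))
    return hits


# Invented illustrative sentences, not actual corpus material.
SAMPLE = [
    AlignedSentence("To jest na zawsze.", "Това е завинаги.", "Это навсегда."),
    AlignedSentence("Na dodatek padał deszcz.", "Освен това валеше дъжд.",
                    "Вдобавок шёл дождь."),
]

# Search for a multi-word unit in Polish and show both equivalents.
for left, key, right, unit in kwic(SAMPLE, r"\bna zawsze\b"):
    print(f"{left:>25} [{key}] {right}")
    print("  bg:", unit.bg)
    print("  ru:", unit.ru)
```

Because every hit carries the whole aligned unit, the learner sees the Bulgarian and Russian equivalents next to the Polish context. A search for a morphological element works the same way, e.g. `kwic(SAMPLE, r"\w+ał\b")` finds Polish word forms ending in -ał.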
PPBRC provides the user with information on word order, exchangeability of individual lexical elements and changes of meaning that some exchanges in lexical constructions carry with them (Bogusławski, 1976, 1994).
PPBRC allows the user to look at the equivalents of such lexical units as Polish na domiar (on top of all that), na skutek (as a result), na zawsze (for ever), chodzi o (the point is), na dodatek (in addition), do diabła (to hell): (3) Pol.
PPBRC aids the analysis of forms whose formal equivalents exist in language A but do not exist or are becoming extinct in language B or C, cf. e.g. the perfective participle in Polish and Bulgarian (Satoła-Staśkowiak, 2009).
PPBRC imparts information on inflectional, semantic and even pragmatic properties. It illustrates the frequency of use of anglosemantisms or internationalisms in each of the collated languages (Satoła-Staśkowiak, 2014).
The observed frequency of the use of particular units in a specific kind of text in PPBRC can also be checked (just to be sure) against other corpora the user is familiar with, e.g., for Polish, the National Corpus of Polish. Perhaps PPBRC will become a fairly serious source of information, at least half as important as a classic electronic or printed dictionary. This is of great importance for learners of Bulgarian, since the only 'complete' Polish-Bulgarian and Bulgarian-Polish dictionaries, by S. Radewa and F. Sławski, were written 26 and 27 years ago, and the lack of newer lexicographical works comprehensively describing general contemporary Bulgarian is still noticeable. An added advantage is the possibility of becoming acquainted with translation techniques (cf. Satoła-Staśkowiak, 2014).
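Comparing frequencies across corpora of different sizes, as suggested above, requires normalising the raw counts. A minimal Python sketch follows; the function names are hypothetical, the tokenisation is deliberately naive, and the counts and corpus sizes in the last lines are invented figures for illustration only, not measurements from PPBRC or any other corpus.

```python
import re
from collections import Counter
from typing import Iterable


def tokenize(text: str) -> list:
    """Naive tokenisation: lower-cased runs of letters and digits."""
    return re.findall(r"\w+", text.lower())


def frequency_list(sentences: Iterable[str]) -> Counter:
    """Count how often each word form occurs in the given sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts


def per_million(count: int, corpus_size: int) -> float:
    """Relative frequency: occurrences per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


# A learner's own list built from two invented sentences.
freq = frequency_list(["To jest na zawsze.", "Na dodatek padał deszcz."])
print(freq.most_common(1))   # most frequent word form
# sorted() orders by code point, so letters such as "ł" come after "z";
# a locale-aware collation would be needed for true Polish alphabetical order.
print(sorted(freq))

# Invented counts and sizes: the same unit in a small and a large corpus.
small = per_million(40, 2_000_000)
large = per_million(900, 300_000_000)
print(small, large)
```

Only the normalised values are comparable: in the invented figures the unit is relatively more frequent in the smaller corpus, although its raw count is much lower there.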
4. It seems that the possibilities offered by corpora together with the programs supporting them (looking for key words, collocations, or the suitability of expressions or constructions) far outweigh the potential threats that some researchers indicate in the literature on the subject (cf. Ball, 1997; Stewart, 2000). After all, these threats are described mainly in connection with the translator's work and the possibilities that corpus translation memory brings. The translator, backed by advanced tools, can treat the solutions suggested by translation memory as an authority, forgetting about their own creative input into the translated work (Satoła-Staśkowiak, 2014). However, this is not a rule.
Instead, the multitude of translation solutions observed in the corpus is the best incentive to learn a language and to understand subtle semantic differences between examples. It is important that no corpus should constitute the only source of information on a language. The corpus described here has to be treated as an aid in the process of education which, in conjunction with other existing sources (e.g. dictionaries, grammar textbooks), will ensure deeper and more reliable knowledge.
In the near future (at the end of 2015) everyone interested in Polish, Bulgarian or Russian will be able to use PPBRC, as it will then be available on the Internet. Unlimited access to the corpus, and the fact that the digitised and parallelised text resources will be consistently expanded, will make it possible to verify its actual usefulness in learning the three Slavonic languages discussed in the article.

Figure 1. The operation of a filter indicating a key word in context in 'The Parallel Polish-Bulgarian-Russian Corpus'.