SEMANTICS, CONTRASTIVE LINGUISTICS AND PARALLEL CORPORA

In view of the ambiguity of the term “semantics”, the author shows the differences between traditional lexical semantics and contemporary semantics in the light of various semantic schools. She treats semantics differently in connection with contrastive studies, where the description must necessarily go from the meaning towards the linguistic form, whereas in traditional contrastive studies the description proceeded from the form towards the meaning. This requirement regarding theoretical contrastive studies necessitates the construction of a semantic interlanguage, rather than only singling out universal semantic categories expressed with various language means. Such studies can be strongly supported by parallel corpora. However, in order to make them useful for linguists in manual and computer translations, as well as in the development of dictionaries, including online ones, we need not only formal, often automatic, annotation of texts, but also semantic annotation, which is unfortunately manual. In the article we focus on semantic annotation concerning time, aspect and quantification of names and predicates in the whole semantic structure of the sentence, using the example of the “Polish-Bulgarian-Russian parallel corpus”.

1 Semantics
Because "semantics" is understood as a discipline dealing with meaning, i.e. the assignment of linguistic symbols to extra-language objects in a broad sense, even the term "meaning" itself has acquired "various meanings", depending on the individual semantic schools. As we know, traditional linguistics defined semantics as the study of the meaning of words. In contemporary linguistics, as Kazimierz Polański wrote, the focus shifted from words to the sentence (Polański, 1999, p. 341). This implies that words mean something in the whole structure of a sentence or an utterance, but not outside the sentence (utterance), i.e. separately. "Even if", writes Polański, "we use a single word as a sentence or utterance, this will be a sentence or utterance consisting of a single word". It is worth emphasizing that the above applies to semantic phenomena expressed formally in the language rather than to extra-linguistic phenomena, see Item 2 below. As one of the experts and scientific advisors assisting the development of the theoretical concept of the Bulgarian-Polish Contrastive Grammar (from now on, the BPCG), see e.g. (BPCG 1990), K. Polański was of the opinion that semantic phenomena in a natural language can be described in the most precise way using logical theories and the conceptual apparatus of mathematical logic, a view I completely agree with, see (Studia, 1984).
2 Direct approach semantics
As I have repeatedly stressed, the semantic volumes of the BPCG are based on selected logical-mathematical theories: namely, quantification theory or the contemporary theory of processes known as "Petri net" theory. Semantics is understood here as in the "direct approach semantics" works of B. Russell and H. Rasiowa, and in later works on situation semantics by Barwise and Perry (see Russell, 1967; Rasiowa, 1975; Barwise & Perry, 1983; Cooper, 1996). Scholars subscribing to this semantic trend were interested in a fact which, though in principle known, was not fully realized by many: that the same linguistic forms (words, expressions, sentences) can be carriers of quite different information contents, see in more detail (Koseska, 2013). While in other semantic schools the meaning of a sentence is defined with the help of two abstract objects (in the case of Frege (Frege 1892), these are truth and falsity), the theory with a direct approach to semantics discussed here also introduces the notion of a situation into the definition of the meaning of a sentence, and defines that meaning as a set of abstract situations (Barwise & Perry, 1983; Cooper, 1996). Standard and classic examples of theories with a direct approach to semantics are Bertrand Russell's denotation and description theories (Russell, 1967), used in the second volume of the Bulgarian-Polish Contrastive Grammar (Koseska & Gargov, 1990), and the model theory of first-order predicate logic, which, being extensional, is applied to a natural language with considerable limitations, see (Rasiowa, 1975) and the exemplary sentence: Maria kończy studia [Mary is finishing her studies], which is understood as true information, in opposition to the sentence: Mówi się, że Maria kończy studia [Mary is said to be finishing her studies], which is understood as information that might be true, but whose truth is not certain.

2.1
It is worth stressing here that contemporary linguistic schools combine semantics with mathematical logic in relation to a sentence rather than its fragments, out of which a sentence can only be built. The branch of contemporary linguistics understood in this way decisively distinguishes between semantics and semantic syntax concerning the argument-predicate structure of the sentence (i.e., its semantic segments having different syntactic order in different languages). By way of example, the predicate x jest chory [x is sick] becomes a sentence only after assigning an argument value to x or assigning a quantifier to x; see, respectively, Jan jest chory [John is sick] and (ix)P(x), where i is the iota-operator, and (ix) jest chory [(ix) is sick] is a sentence. In some linguistic works, the above sentence with the iota-operator together with x is erroneously treated as a "predicate" rather than a sentence. Another problem, which can be observed e.g. in the syntactic volume of the academic grammar of Polish, is that in the part authored by S. Karolak a "predicate" is not understood as a propositional function, as it is in the part authored by M. Grochowski. There is no need to convince the reader that the ambiguity of such a commonly described notion as a "predicate" still confuses some scholars. The BPCG assumes that a "predicate" is to be understood as a propositional function, i.e. the way a "predicate" is understood by M. Grochowski, see the volume of the Contemporary Polish Grammar (Gramatyka, 1984) devoted to syntax.
3 Semantics versus contrastive studies - Theoretical contrastive studies
Since in the BPCG we understand semantics as above, why do we treat semantics differently in relation to contrastive studies? Contrastive studies are known to be a branch of synchronic linguistics with both theoretical and practical applications. When contrastive studies deal with the analysis of differences and similarities between languages for practical (didactic or translational) purposes, we speak of them as a branch of applied linguistics, connected first of all with the teaching of foreign languages. In turn, we speak of theoretical contrastive studies only when the studies concern universal language notions and employ methods of linguistic research aimed at isolating, in an equal way, what is common and what is different in the studied languages. From the viewpoint of the research methods employed, as well as the use of the synchronic approach, theoretical contrastive studies are close to typological studies, but differ from them in the purpose of description: typological studies lead to a classification of languages, while contrastive studies lead to a systemic analysis of the contrasted languages. Without doubt, a good solution in this case too would be an interlanguage, allowing for equal comparison of the meanings and forms of the studied languages. However, the development of such a tertium comparationis is not an easy task (Koseska, Korytkowska & Roszko, R., 2009), see also (Koseska, 2013; Selinker, 1972).

3.1
The process of developing a semantic interlanguage can be divided into a number of stages:
1. Selection of a universal semantic language category, e.g. definiteness/indefiniteness, time, communicant, etc.
2. Selection of a logical-semantic theory to be used for developing the notion system of the interlanguage, e.g. logical quantification, network-based description of time in a natural language, etc.
3. Defining notions in accordance with the selected theory, see below.
4. Developing a terminological vocabulary starting from the notions of the semantic interlanguage.
Ad. 1. Selection of semantic language categories must absolutely take into consideration the specifics of Slavic languages. For example, Reichenbach's logical-semantic theory, well known and popular in linguistics, which describes time in a natural language, in fact concerns English, which has no grammatical category of aspect. Hence it does not distinguish between the meanings of time and aspect, and without such a distinction, the description of the semantic issues of time in each of the Slavic languages where the grammatical category of aspect has become distinguished will be both incomplete and false.
Ad. 2. The interlanguage should be developed based on theories that do not lead to theoretical contradictions. For example, when creating the basic semantic units used in the interlanguage to describe the language category of definiteness/indefiniteness, we can use reference theory, but also the theory of definite description and quantification. However, a simultaneous use of both theories is not recommended, since it leads to internal contradictions in the notion system of the interlanguage. This can be observed in works that do not distinguish between the selected notions, such as reference and definite description. Already on the basis of Volume 2 of the Bulgarian-Polish Contrastive Grammar (Koseska & Gargov, 1990) we can see that a description that takes Bulgarian formal language means as its starting point is totally different from a description originating from Polish formal language means. This is determined even by the more expanded morphological plane of the means for expressing the notions of definiteness and indefiniteness in Bulgarian compared to Polish, see also (Koseska & Mazurkiewicz, 1988). Hence it would be a serious methodological error to replace the interlanguage by one of the languages being contrasted, together with its metalanguage. Nevertheless, this is how the issue is treated in most of the works we know, where the description of the language proceeds from the form to the contents. The latter approach is the basis of most grammars, which describe one language (most often, a foreign one) using another (native) language.

4 Direction of description
Distinguishing semantics in its relation to contrastive studies is connected with the direction of description of language phenomena and with going from the meaning towards the form rather than, as is the custom in traditional contrastive descriptions and descriptive presentations of a single language, from the form towards the contents.

4.1
In order to develop a description going from the meaning towards the forms in two or more languages, one should precisely distinguish between a form and its meaning. For example, the use of the term "definiteness" in cases where the so-called "definite article" expressed indefiniteness, i.e. universality, was an obvious error in interpreting the meaning of the article, and followed from a failure to distinguish between the form of the article and its meaning. In our works, the definiteness/indefiniteness category has been defined as a category with the semantic opposition uniqueness vs. non-uniqueness, whereby definiteness is understood solely as uniqueness of an element or a set (satisfying the predicate), and indefiniteness as non-uniqueness (both existentiality and universality) (Koseska, 1982; Koseska & Gargov, 1990).

4.2
The interlanguage for contrasting Polish and Bulgarian within the semantic category of definiteness/indefiniteness used in our works is based on the theoretical assumption about the quantificational character of this category. The basic notion of uniqueness (of an element and of a set) in that interlanguage can be written down using a linguistic iota-operator construction (in the text, in short, "iota-operator"), that of existentiality with the help of an existential quantificational expression (in the text, in short, "existential quantifier"), and that of universality with the help of a universal quantificational expression (in the text, in short, "universal quantifier"). For a description of the definiteness/indefiniteness category using the logical theory of scope-based quantification, see (Bellert, 1971; Barwise & Cooper, 1981; Cooper, 1996; Grzegorczyk, 1972, 1976; Koseska, 1982; Koseska & Gargov, 1990; Descles, 1999; Roszko, R., 2004; Koseska, 2006; Roszko, D., 2014).

4.3
The methodology presented above is applied in semantic annotation, which we add by hand in parallel corpora. In Bulgarian, as I have already written in many works, the article is deemed to be the most typical morphological means for expressing uniqueness and universality in the nomen group. Its absence, or morphological Ø, is an exponent of existentiality, or pure predication. The ambiguity of the Bulgarian article is a good illustration of the difficulties encountered by a scholar studying this category when classifying natural language expressions. As I have already mentioned, in Bulgarian the same form of the article expresses both uniqueness and universality (or, respectively, definiteness and indefiniteness). In the already quoted book (Koseska-Toszewa, 1982), I put forward a hypothesis concerning the development of the meaning of the Bulgarian article. In my opinion, the article initially expressed uniqueness of an element (object), and then also started expressing uniqueness of a set, which later, as a result of equating two completely different semantic-logical structures, i.e. structures with universal and with unique quantification, led to homonymy and to the article also expressing universality. I later confirmed the above observations, based first of all on semantic-logical aspects of the definiteness category, on the historical language material from Kodeks Supraski [the Supraśl Code], where the Bulgarian article does not yet appear in universally quantified nominal structures, but does appear in uniquely quantified nominal expressions, meaning satisfaction of a predicate by either a single element of a set or by the whole set treated as the only one, see (Zaimov, 1982, p. 5-9).

4.4
Let us return to the second important feature of semantics and contrastive linguistics, that is, distinguishing between language forms and their meanings. It should be stressed that without distinguishing between a language form and its meaning, contrasting material from several languages may lead to numerous substantive errors and to unscientific conclusions. The above issues would be only postulates, were it not for the 12-volume academic Bulgarian-Polish contrastive grammar (BPCG, 1990-2007), which solved the above-mentioned problems.
5 Form, meaning and corpora
Distinguishing between language forms and their meanings is of key importance for the work on unilingual and multilingual corpora, which once more brings the problems of semantics and contrastive linguistics closer to the use of parallel corpora. Without distinguishing between language forms and their meanings, it is difficult to imagine what the application of parallel corpora and their benefits will be. To distinguish between a language form and its meaning, it is not enough to write "language form" and "language form content". In order to know what content of the language form we have in mind, we need thorough research on the set of contents of the selected semantic language category that the given content belongs to. What is more, to contrast language material in parallel corpora of several languages, we also need to distinguish between contrastive and comparative studies. Contrastive studies are synchronic, while comparative ones are diachronic. This is important for selecting texts which are synchronic on the contrastive level, and in our case concern the contemporary development of the languages being contrasted in the corpora.
6 Semantic category of time

6.1
The semantic category of time in Slavic languages, selected here as an example, also contains information about the aspect of the verbum, and can be described in a detailed and precise way using the contemporary formal theory of processes known as Petri nets (Petri, 1962; Mazurkiewicz, 1986; Laskowski, 1986; Koseska & Mazurkiewicz, 1988). In my opinion, identifying the meaning of the Slavic aspect with the so-called "action type" is a fundamental substantive and methodological error. The Aktionsart theory, or semantic category of a verb, was distinguished in German linguistics according to the way in which an action is to run, in languages where, unlike in all Slavic languages, no grammatical category of aspect has developed. The Aktionsart theory has allegedly given rise to various meanings concerning the forms (kinds) of actions, e.g. so-called inchoative ones (representing the beginning of an action), or desiderative ones (representing the wish to perform some action). The grammatical category of aspect characterizes Slavic verbs only. This problem can be well illustrated using Bulgarian, which, in opposition to e.g. Polish or Russian, has a very expanded system of aspectual-temporal meanings, which allows us to understand why such renowned aspect specialists as S. Ivanchev (1971) insisted on consistent treatment of the temporal and aspectual meaning of the verbum. This is because in Bulgarian the aorist form is derived from both perfective and imperfective verbs, and likewise the imperfectum form is derived from both imperfective and perfective verbs. It is difficult to understand and translate the meaning of the aorist of imperfective verbs in isolation from the meaning of aspect combined with the temporal meaning of the verbal form. The meaning of the aorist of imperfective verbs is translated into Polish using the praeterite form of imperfective verbs. However, take care! The Bulgarian imperfectum form of imperfective verbs is also translated into Polish using the praeterite form of imperfective verbs, though the two cases involve different aspectual-temporal meanings, see (Koseska, 1982, 1985, 1995, 1977, 2006). The two meanings differ in being a placeholder for a quantifier expression which occurs in the semantic structure of the sentence next to the verbum, and this is totally independent of the so-called kind of action. The imperfectum form of imperfective verbs is a placeholder first of all for either a "universal quantifier" or an "existential quantifier" in the semantic structure of the sentence, and rarely appears next to an "iota-operator expression". In turn, the aorist form of imperfective verbs is a placeholder only for an iota-operator expression. In Polish sentences, the praeterite form of imperfective verbs is in one case a placeholder for either a "universal or existential quantifier", and in the other case for an "iota-operator". In Polish we very often have to do with so-called incomplete quantification (a term coined by Ajdukiewicz), see (Ajdukiewicz, 1974), (Koseska & Gargov, 1990). Incomplete quantification in Polish sentences can be "completed" only with either a semantic paraphrase or a set of situations concerning selected sentences.
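The correspondence between verb forms and quantifier placeholders described in this section can be sketched as a small lookup table. This is only an illustrative sketch in Python: the names Quant, PLACEHOLDERS, readings and is_incomplete, as well as the language and form labels, are hypothetical and are not the tag set of the corpus, and the rare iota-operator reading of the Bulgarian imperfectum is deliberately left out.

```python
from enum import Enum


class Quant(Enum):
    IOTA = "iota-operator"
    EXISTENTIAL = "existential quantifier"
    UNIVERSAL = "universal quantifier"


# Typical placeholder readings per (language, tense form, aspect), as stated in the text.
# All keys and labels are illustrative assumptions, not corpus tags.
PLACEHOLDERS = {
    # Bulgarian imperfectum of imperfective verbs (the rare iota reading is omitted here)
    ("bg", "imperfectum", "imperfective"): {Quant.UNIVERSAL, Quant.EXISTENTIAL},
    # Bulgarian aorist of imperfective verbs: iota-operator only
    ("bg", "aorist", "imperfective"): {Quant.IOTA},
    # Polish praeterite of imperfective verbs covers both Bulgarian cases
    ("pl", "praeterite", "imperfective"): {
        Quant.UNIVERSAL,
        Quant.EXISTENTIAL,
        Quant.IOTA,
    },
}


def readings(lang, tense, aspect):
    """Return the set of quantifier readings recorded for a verb form."""
    return PLACEHOLDERS[(lang, tense, aspect)]


def is_incomplete(lang, tense, aspect):
    """True when the form alone does not decide between the iota-operator
    and a universal or existential quantifier."""
    r = readings(lang, tense, aspect)
    return Quant.IOTA in r and len(r) > 1
```

On this toy table, is_incomplete("pl", "praeterite", "imperfective") is true, which mirrors the incomplete quantification of the Polish form, while neither Bulgarian form is ambiguous in this respect.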

Imperfectum of imperfective verbs
Appears next to iota-operator expressions (iota-operator): По това време тя спеше у тях. 'At that time she was sleeping at their place.' As we can see, the Bulgarian aorist of imperfective verbs appears only next to the iota-operator, similarly to the aorist of perfective verbs (Koseska, 2006).
In the above cases, the Bulgarian verb form has an inchoative meaning. However, this does not change the fact that both verb forms are perfective as well as imperfective, and have different aspectual meanings, see Bulg. Пчелата кацна за момент върху розата (event), избръмча няколко пъти (sequence of events and states ended with an event) и полетя (event). 'A bee sat down on a rose for a moment, buzzed a few times, and flew away.' From the above examples we can immediately see that the "aspect of a verb" and the "kind of action" are two different things. Combining the Aktionsart and aspect theories in Slavic languages, one erroneously identifies two mutually contradictory linguistic theories. This phenomenon resembles the erroneous combining of reference theory and scope-based quantification theory concerning the semantic definiteness/indefiniteness category, see e.g. the frequently encountered expression "referential quantification", from which we cannot tell whether we are dealing with quantification or reference, and which is hence the proverbial "mumbo jumbo". What we do know, however, is that the two theories are mutually contradictory and are related to different approaches to the semantics of a natural language. Identifying the meaning of the verb aspect with the type of the verb action is inadmissible in constructing the interlanguage in contrastive studies, because when developing a tertium comparationis we try to formulate notions and terms that are unambiguous and do not contradict each other.

6.3
When describing the semantic category of time selected by way of example, we adopt states and events as the fundamental units of time and aspect description. The basic characteristic distinguishing these notions is the temporal spread of states and the momentary character of events. In other words, states "last", while events can only "happen". An abstract counterpart of this distinction is the difference between a section of the real axis (a state) and a point lying on that axis (an event). The adopted postulate of model finiteness implies that in constructing our description we cannot limit ourselves to events only, and in consequence treat states as sets of events, as done e.g. by Reichenbach (1967). This is because when describing a state as a set of events we have to answer the question: "Set of what events? All or only some of them? And if some, then how to choose them?" Omitting events in the model and limiting it solely to states deprives us of the capability to consider such phenomena as "collision", "opening", "uncovering", "awakening" and the like. A characteristic feature of events is that we cannot speak of them in the present tense: this is because an event does not last, it has no time spread. Referring to the quoted analogy with points and sections, we can say that events correspond to points, states correspond to sections, and the mutual relationship of events and states is like the relation between points and sections: each point is either the beginning or the end of some section (or, in a special case, of a half-line); each event is either the beginning or the end of some state (e.g. the state preceding the occurrence of the event or the state following the event). The analogy can be continued: each section, like each state, has at most one beginning and one end, while each point (each event) can begin or end many sections that are interesting for us (many states). In other words, an event need not be the beginning or the end of only one state, and hence it cannot be treated as just an ordinary transition, a "transition of a state into another state".
It should be emphasized that imperfective verb forms express not only 1. a state, but also 2. states, that is, sequences of states and events finally ended with a state, while perfective verb forms express both 1. events and 2. sequences of events and states finally ended with an event.
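The rule just stated can be illustrated with a minimal sketch, assuming that the annotated meaning of a verb form is given as a list of states and events. The classes State and Event and the function aspect_reading are hypothetical names introduced only for illustration; the sketch looks solely at the last unit of the sequence, as in the formulation above, and does not model the Petri net description itself.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class State:
    """A unit that 'lasts' (a section of the time axis)."""
    label: str


@dataclass(frozen=True)
class Event:
    """A unit that 'happens' (a point on the time axis)."""
    label: str


Unit = Union[State, Event]


def aspect_reading(sequence: List[Unit]) -> str:
    """Classify a sequence of states and events by its final unit:
    ended with a state -> imperfective, ended with an event -> perfective."""
    if not sequence:
        raise ValueError("a verb form must express at least one state or event")
    return "imperfective" if isinstance(sequence[-1], State) else "perfective"


# Hypothetical encodings of two meanings discussed in the text.
single_event = [Event("landing")]
states_ended_with_state = [State("asleep"), Event("waking"), State("asleep")]
```

With these toy encodings, a single event is read as perfective and a sequence that finally ends with a state as imperfective; the labels inside the units are placeholders and carry no analysis of their own.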
7 Semantic annotation
We will need the above research results for manual semantic annotation, which we want to present on the material of parallel corpora in four languages under the "Clarin" 1 project. CLARIN is a pan-European scientific infrastructure that will allow scholars from the area of humanities and social sciences to work conveniently with very large text sets. The aim of CLARIN is to overcome a number of barriers in access to the so-called language technology by scholars who have no specialist knowledge in the area of IT or natural language engineering. CLARIN-PL is the Polish part of the great European network, tightly integrated with it, but directed specially at the Polish language, both written and oral.
The semantic annotation applied in our corpora also concerns the quantification of names and predicates, rather than only the meanings of aspectual-temporal verb forms. We are the first to use such annotation in the literature on the subject. It requires knowledge of the deep semantic structure of natural language sentences and a reliable theoretical apparatus, thanks to which the meanings of language forms can be written down in a precise way. Though contemporary software used for parallel corpora covers the problems of semantics, it concerns mainly lexical semantics, though not always and not only, see e.g. the works of M. Piasecki and his team on plWordNet, Polish "Słowosieć". "Słowosieć" is a unique dictionary providing a formalized description of the meanings of a huge number of Polish words. Each meaning is described through its semantic relations with many other meanings of other words (Piasecki, 2013).

7.1
The semantic annotation we propose can be divided into several levels:
1. annotation concerning time and aspect in Polish, Bulgarian, Russian and Lithuanian.
2. annotation concerning scope-based quantification of nomen at selected syntactical positions and predicates in the sentence.
3. semantic annotation combining annotation concerning the meaning of aspect and time contained in the predicate and quantification of the predicate in the sentence.
Polish and Bulgarian examples:
Ad 1. Pol. Maria oczekuje kogoś (state) i kiedy od czasu do czasu słyszy samochód (sequence of events and states, ended with a state), szybko biegnie (sequence of events and states, ended with an event) do okna. 'Maria is waiting for someone, and when from time to time she hears a car, she quickly runs to the window.'
(Incompleteness of any quantification is marked with a question mark preceding the formal notation.)
Ad 3. Pol. Ta służąca (ix)P(x) czasami spała (∃X)P(X) (sequence of states and events, finally ended with a state) u nich.
Bulg. Тази прислужница (ix)P(x) понякога спеше (∃X)P(X) (sequence of states and events, finally ended with a state) у тях. 'That maid sometimes slept at their place.'

7.2
Our notation of quantificational expressions in Polish, Bulgarian and Russian sentences:
(ix)P(x) - iota-operator preceding an Attribute expressed with a noun,
(iX)P(X) - iota-operator preceding the Predicate expressed with a verb form,
(∃x)P(x) - existential quantifier preceding an Attribute expressed with a noun,
(∃X)P(X) - existential quantifier preceding the Predicate expressed with a verb form,
(∀x)P(x) - universal quantifier preceding an Attribute expressed with a noun form,
(∀X)P(X) - universal quantifier preceding the Predicate expressed with a verb form.
Incompleteness of any quantification is marked with a question mark preceding the formal notation.
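The notation is regular enough to be generated mechanically. The following sketch, in which the class Quantification and its field names are assumptions made for illustration rather than part of the corpus tools, renders the six expressions and the question mark marking incomplete quantification.

```python
from dataclasses import dataclass

# Operator symbols as used in the notation: i (iota), ∃ (existential), ∀ (universal).
SYMBOLS = {"iota": "i", "existential": "∃", "universal": "∀"}


@dataclass(frozen=True)
class Quantification:
    kind: str                 # "iota", "existential" or "universal"
    on_predicate: bool        # False: noun (variable x); True: verb form (variable X)
    incomplete: bool = False  # True: prefix the expression with a question mark

    def render(self) -> str:
        """Return the formal notation, e.g. '(ix)P(x)' or '?(∃X)P(X)'."""
        var = "X" if self.on_predicate else "x"
        mark = "?" if self.incomplete else ""
        return f"{mark}({SYMBOLS[self.kind]}{var})P({var})"


# The two tags used in the example with 'Ta służąca ... czasami spała'.
noun_tag = Quantification("iota", on_predicate=False)
verb_tag = Quantification("existential", on_predicate=True)
```

Here noun_tag.render() gives (ix)P(x) and verb_tag.render() gives (∃X)P(X); an unknown kind raises a KeyError, which in an annotation tool would probably deserve a clearer message.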

8 Cognitive approach and translations
In view of the different meanings given to the term "cognitive", let me return here to its understanding in our studies. Our contrastive studies remove strict divisions into the "grammar" and the "lexis". We have chosen here universal language categories of material importance for the description of the language, which have not been described exhaustively in academic grammars of Polish and Bulgarian up to now, namely such semantic categories as time, modality or definiteness/indefiniteness, quantity, communicant, etc. The strict separation of the morphological, syntactic and lexical levels in traditional grammars of those languages did not allow for a comprehensive presentation of semantic language phenomena. In the BPCG we consciously abandoned a strict division into the grammatical and lexical levels, since the fact that selected "content" is expressed with grammatical means in one language, and lexically in another, does not imply that the content is absent in that other language. We refer to this new approach as "cognitive". Further, we understand cognitive studies on the one hand as theoretical semantic studies which allow for taking into consideration language means from different levels, lexical and grammatical ones, but perceived as a single whole. On the other hand, if necessary, we make use of broader language situations, where the phenomena we are interested in are unequivocally understood by language users. Those situations always also take into consideration the language user's state and attitude to the conveyed contents, especially modal ones.
8.1
A comprehensive description of cognitive language phenomena plays a very important role in translations from one language to another. Thanks to such an approach, a translator of Bulgarian will learn that, for example, in Polish there is a semantic category of definiteness/indefiniteness, which traditional grammar does not mention. He/she will also learn what lexical means in Polish convey the various meanings expressed via morphological means of that semantic category in Bulgarian. In other words, the translator will know how to translate the Bulgarian article. Moreover, he/she will learn that Polish does not lack the modality known as imperceptivity, and that this modality is not typical of Bulgarian only, contrary to what traditional grammars stress. Imperceptivity in Bulgarian grammar, known as преизказност and преизказно наклонение, is described as a morphological means for expressing the state of the speaker's lack of knowledge. In Polish similar contents are expressed lexically with the help of the expressions jakoby, ponoć, as well as using the praesens and praeteritum forms of the verb mieć appearing with an infinitive, see: Bulg. Той бил добър лекар. and Pol. On jest ponoć dobrym lekarzem. 'He is allegedly a good doctor.' Bulg. Той бил завършил филология. and Pol. On miał ukończyć filologię. 'He has allegedly completed philological studies.' A translation based on traditional grammar is often done by "groping in the dark", and its quality depends solely on the translator's intuition and experience. The comprehensive treatment of semantic phenomena in natural languages proposed here can have a large impact on translations, including machine ones. Regarding the latter, we refer the reader to our works in the EU grant "Mondilex" (Koseska & Mazurkiewicz, 2009; Dimitrova & Koseska, 2008, 2009), as well as in the project titled "Semantics and contrastive linguistics with special emphasis on bi- and multilingual electronic dictionaries" implemented through cooperation between IS PAN and IMI BAN, as well as an emerging Bulgarian-Polish electronic dictionary (Dimitrova, Koseska-Toszewa & Satoła-Staśkowiak, 2009).
9 International and interdisciplinary project "Clarin" (see footnote 1 above) and parallel corpora
Development of a trilingual "Polish-Bulgarian-Russian" corpus, together with a "Polish-Lithuanian parallel and topical corpus", is the scientific task of the IS PAN Corpus Linguistics and Semantics Group in the "Clarin" project 3.
Examples from the Polish-Bulgarian-Russian parallel corpus: Fragment from "Lord Jim" by Joseph Conrad 2