ABOUT CERTAIN SEMANTIC ANNOTATION IN PARALLEL CORPORA

The semantic notation analyzed in this work belongs to the second stream of semantic theories presented here, the direct approach to semantics. We used this stream in our work on the Bulgarian-Polish Contrastive Grammar. Our semantic notation distinguishes quantificational meanings of names and predicates, and indicates aspectual and temporal meanings of verbs. It relies on logical scope-based quantification and on the contemporary theory of processes known as “Petri nets”. Thanks to it, we can distinguish precisely between a language form and its contents. For example, a perfective verb form has two meanings: an event, or a sequence of events and states finally ended with an event. An imperfective verb form also has two meanings: a state, or a sequence of states and events finally ended with a state. In turn, names are quantified universally or existentially when they are “undefined”, and uniquely (using the iota operator) when they are “defined”. A fact worth emphasizing is the possibility of quantifying not only names but also the predicate, in which case quantification concerns time and aspect. This is a novum in elaborating sentence-level semantics in parallel corpora. For this reason, our semantic notation is manual. We hope that it will raise the interest of computer scientists working on automatic methods for processing the given natural languages. Semantic annotation defined as in this work will facilitate contrastive studies of natural languages, which in turn will verify the results of those studies and will certainly facilitate human and machine translation.


Direct approach to semantics
In order to specify what I understand by semantic annotation, I will dwell on a few short remarks concerning contemporary semantic theories.
1.1. It is well known that each semantic theory should take into consideration the relationship holding between knowledge and information. This relationship consists in the fact that information carriers include all processes changing our knowledge. As to the notion of knowledge, which is especially important for understanding linguistic modal phenomena, we should probably reconcile ourselves with B. Russell's thesis that knowledge is not a precise notion, and that it merges with what we understand by "probable opinion". The character of the above relationships is most often illustrated with semantically and structurally "simple" examples, such as: Eve lives in London. An analysis of such a sentence looks as follows: assuming that the person uttering this sentence (S) is not lying, the recipient (R) will, firstly, learn something about reality (about objects and about systems of such objects). Secondly, the recipient (R) will also learn something about the consciousness of the speaker (S), namely, that S claims that "Eve lives in London". The latter type of knowledge, provided by the uttered sentence, is very important for the communication act. If the recipient (R) does not accept that second type of knowledge, they will not be able to treat the sentence uttered by S as a source of knowledge. Consequently, in such a case the communication act will not be fulfilled.
1.2. Besides information, contemporary linguistic semantic theories also recognize a second notion, which we will term here the classification ability of a natural language. The ability of a language to distinguish (classify) states of the real world is most often used in meaning theories in the opposite direction, i.e. to classify language fragments themselves. Such classification reduces to distinguishing between two classes of expressions: a) the class of language expressions referring to states of the real world; and b) the class of language expressions referring to our consciousness.
Classes of linguistically distinguishable states of the material world form the domain of reality fragments relevant for the language. These are objects, their properties and relations between (among) them. Following traditional logic, properties and relations form the notions of predicates, whereby properties are represented by unary predicates, binary relations by binary predicates, etc. In turn, classes of linguistically indistinguishable states of consciousness are represented by all expressions which fall into a given class for each type of classification. Traditionally, following Locke (1948), they are commonly referred to as "ideas". Hence semantic theories can also be understood as theories of classification capabilities of a natural language.
2. Differences between meaning theories concerning the understanding and interpretation of the essence of meaning allow us to single out three groups.
2.1. The first group comprises semantic theories whose proponents stress the ability of the language to distinguish ideas. They see the ability of the language to distinguish states of the real world as secondary, derived from the former. These theories will be named here semantic mental theories. Following Locke, scholars have adopted the view that words may replace objects of the real world only because ideas as such may replace (symbolize) real objects. The basic methodological difficulty in this type of theory is the principal impossibility of classifying ideas formed in an individual consciousness (Locke, 1948). The second, no less complex problem is the relation holding between ideas and the real world. It reduces to the assumption that it is ideas rather than words that can mean anything. A drawback of these theories is that they reduce reasoning to an unending regress, in which ideas symbolize ideas which symbolize ideas, and so on. This leads to postulating a level of reasoning on which ideas directly symbolize objects of the real world. Hence we go back to the issue of the relation between the real world and ideas. These methodological difficulties have made many researchers abandon semantic mental theories in contemporary linguistics.
2.2. The second group of semantic theories comprises theories interested in the external capability of a natural language to denote the real world. Words are grouped according to the way in which they describe the real world rather than according to the ideas that they express. Such semantic theories are traditionally termed theories with direct relation to semantics. The direct relation to semantics consists in the assumption that the relation holding between the language and the real world poses no problem. The central point of the theory is a language-based classification of the real world, which obligatorily implies a certain classification of consciousness phenomena (ideas). In the compound sentence "Roman told me a moment ago that his daughter as a rule sleeps until twelve o'clock", the clause "that his daughter as a rule sleeps until twelve o'clock" does not reflect external objects, but Roman's state, which consists in his thinking that "his daughter as a rule sleeps until twelve o'clock". Hence the direct relation to semantics also allows for taking "ideas" into consideration. Scholars adhering to this semantic trend were also interested in the known but nontrivial fact that the same language forms (words, expression structures, sentences) can carry quite different information. Hence, while in other semantic schools the meaning of a sentence is defined with the help of two abstract objects (in the case of Frege, truth and falsity), this second trend of semantic theories also introduced the notion of a situation into the definition of the meaning of a sentence, and started to define that meaning as a set of abstract situations (Barwise & Perry, 1983). The standard and already classic examples of theories with direct relation to semantics are Bertrand Russell's denotation and description theories (Russell, 1967), used in the second volume of the Bulgarian-Polish contrastive grammar (Koseska & Gargov, 1990), and the model theory for first order predicate logic, which because of its extensional character is applied to natural language with considerable limitations, see (Rasiowa, 1975).

2.3. With gross simplification, we can say that the third stream of semantic theories has developed around Frege's works and his criticism of proponents of the direct relation to semantics. Frege charged the direct approach to semantics of natural languages with lacking a strict distinction between denotation, word and sense (Frege, 1892). In his opinion, under the direct approach the meanings of such names as the "Evening Star" and the "Morning Star" cannot be distinguished, since they denote the same object (denotation), namely the planet Venus. In addition, Frege noted that in a natural language there are sensible expressions that do not denote anything in the real world. These observations led to introducing into semantic theories a third class of notions, seen as a necessary one: the class of senses. In Frege's opinion, coherence of semantic theories of a natural language was determined not only by the classes of ideas and objects, but also by the class of their relations and structure: the class of senses. In this way, Frege initiated one of the directions of logical semantics, known later as intensional logic. This direction also came into existence in linguistics, as a direction of formal semantics of natural languages, presented most fully in the works of R. Montague (1974). Without doubt, when using the two descriptive expressions "Morning Star" and "Evening Star", we are speaking of Venus in two different ways, and the expressions differ in their meanings. If these expressions convey different meanings, then, as G. Ryle writes, "Venus, the planet described with help of these expressions, cannot be what they mean", whereby he refers to works by John Stuart Mill, "who admits this openly and takes into consideration." (Ryle, 1967). In our opinion, the expressions "Morning Star" and "Evening Star", chosen here by way of example and widely discussed in the subject literature, are classifiers of different states of our consciousness.
2.4. The above reasoning is of special importance for us, because the Bulgarian-Polish contrastive grammar adopted a description methodology close to the second direction of the meaning theory for natural languages presented in this chapter (Koseska & Gargov, 1990). The authors' choice was motivated, among others, by B. Russell's theory of description and denotation. In turn, as J. Barwise and J. Perry admit in their works, situation semantics is close both to B. Russell's and A. Mostowski's ideas and to the intuitions of linguists, especially those occupied with functional grammar, see (Barwise & Perry, 1983). Hence in the Grammar quoted above we rely on situation semantics as a theory not only consistent with B. Russell's theory, but also close to Petri net theory, which is a theory with direct approach to natural language semantics. See (Petri, 1962; Mazurkiewicz, 1986; Laskowski, 1986; Koseska & Mazurkiewicz, 1988; Mazurkiewicz & Koseska, 1991; Koseska & Mazurkiewicz, 1994).
3. The direction of description in contrastive studies of two or more languages is of immense importance for the results of those studies. Do we compare the languages taking as the starting point the form or the contents? Our long-term research on semantics and language confrontation has shown that language confrontation yields reliable results only if the research starts from the contents. Such a direction also enables development of an interlanguage for contrastive studies of two or more languages.
3.1. Form and meanings, and distinguishing between them. In order to be able to implement the postulate of theoretical contrastive studies, we should precisely distinguish between language forms and their meanings. In traditional grammars, there is no precise distinction between the form and the meaning. Even today we can commonly read that, for example, perfective verb forms express "perfectiveness", and imperfective forms "imperfectiveness". Similarly, when speaking about definiteness and indefiniteness and the forms expressing them, traditional grammars do not define their meanings.
3.2. The interlanguage should be semantic. The distinction between theoretical and applied contrastive studies is connected with the notion of interlanguage. This is a key issue for theoretical contrastive studies, since the latter developed only in the 1960s-1970s. We remember the strong entry, following Chomsky, of generative-transformational grammar in the 1970s, and the criticism that proponents of that grammar heaped on contrastive studies, charging them first and foremost with a lack of criteria for the foundations of contrastive analysis, that is, the absence of an interlanguage (tertium comparationis). The criticism had a positive influence on the development of theoretical contrastive studies. The interlanguage is the property of theoretical contrastive studies only. In Selinker's opinion, interlanguage is "the type of competence in the target language which is the resultant of a competence in the native language and the system of the target language" (Selinker, 1972). This definition does not tell us what type of competence in the target language is referred to. We also have a problem of another nature, but we will discuss this later. As we can see, both the term and the notion of interlanguage are relatively new. We can expect that as contrastive grammar theory develops further, they may be used not necessarily in line with Selinker's intention. Without doubt, also in this case a good methodological solution could be an interlanguage allowing for objective and equal comparisons of meanings and forms of the studied languages. However, development of such a language is an extremely difficult task, even if it is to be used for studying just two languages.

3.3. It should be noted that as studies progress the interlanguage develops, and is enriched with new meanings. We think that during its creation the most important requirement is that the interlanguage be formed on the basis of theories that do not lead to contradiction. For example, when creating basic semantic units used to describe the semantic categories of definiteness / indefiniteness in the interlanguage, we can use reference theory or definite description theory. However, simultaneous use of both theories is inadvisable, since it leads to internal inconsistencies in the notion systems of the interlanguage. This can be perceived in works which do not distinguish between the notions selected here by way of example, such as reference and definite description. Already on the basis of Volume 2 of the Bulgarian-Polish contrastive grammar (Koseska & Gargov, 1990) we can see that a description that chooses Bulgarian formal language means as the starting point is totally different from a description originating from Polish formal language means. This results even from the fact that the morphological plane of the means for expressing the notions of definiteness / indefiniteness is more expanded in Bulgarian than in Polish, see also (Koseska & Mazurkiewicz, 1988). Hence, among others, it would be a major methodological error to replace the interlanguage with one of the languages under comparison together with its metalanguage, and yet this is how the issue is treated in most of the contrastive works known to us. The interlanguage for comparing Polish and Bulgarian within the semantic category of definiteness / indefiniteness is based on the assumption of its quantificational character. Its basic notion of uniqueness (of an element or a set) can be expressed using the language construction based on the iota operator, existentiality with the help of an existential quantificational expression, and universality with the help of a universal quantificational expression, etc., see (Koseska & Gargov, 1990; Koseska, 2006). The interlanguage necessary for comparing Polish and Bulgarian within the semantic category of time and modality is based first of all on Petri net theory. For example, the notions of state and event are distinguished there as units of the interlanguage, in the way they are defined in net theory. As we have already mentioned, in our works we define the defined and non-defined meanings in the light of the logical theory of quantification, and aspectual-temporal meanings using the network-based theory of processes (Petri nets). The meanings described using a formal logic theory are not only strictly defined, but can be expressed in a formal way, and easily used in contrasting many languages. The meanings we have chosen for semantic annotation in our bilingual and trilingual parallel corpora are based on just those theories.
3.4. Volume 2 of "Studia gramatyczne bułgarsko-polskie" (1987) published a discussion on logical quantification and reference theory. After that discussion, in the Bulgarian-Polish contrastive grammar, the authors adopted a quantificational model for describing the definiteness / indefiniteness category, presented by V. Koseska-Toszewa and G. Gargov.
As we have mentioned, with regard to selecting a theory for describing the semantic category of definiteness / indefiniteness, there were two theories competing with each other in linguistics: P. Strawson's reference theory (at that time still very fashionable and dominant), and quantification theory originating from B. Russell's definite description. At that time, quantification theory was still relatively poorly known in linguistics. Today, the model based on logical quantification theory is increasingly often used in describing the definiteness / indefiniteness category, see: (Barwise & Cooper, 1981; Bellert, 1971; Grzegorczykowa, 1972, 1976; Koseska & Gargov, 1990; Cooper, 1996; Desclés, 1999; Roszko, 2004).
We should recall here B. Russell's response to Strawson's criticism, which is insufficiently known among linguists. The essence of Strawson's argumentation in his criticism of Russell's theory consisted in identifying two notions which Russell treated as completely separate: definite description and egocentrism, see (Strawson, 1950; Russell, 1970). In Russell's opinion, egocentrism and definite description were not the same issue. Strawson, who did not distinguish between the two issues, claimed that the only issue that needed to be solved was that of egocentrism. In turn, Russell dealt both with egocentrism and with definite description, but did not connect these two issues with each other: when writing about one of them, he omitted the other. Russell's response to Strawson's criticism was exhaustive. He explained in it the direct relation to semantics underlying his works. "If the basic words in each individual's vocabulary had no direct connection with facts, neither would the language in general. I doubt," Russell wrote, "if Mr. Strawson would be able to give the ordinary meaning to the word red if something that this word means didn't exist." (Russell, 1970, p. 270-279). He goes on: "He (Strawson) thinks that the word false has an unchanging meaning, and treating it as an appropriate one would be a mistake, but caution prevents him from divulging what that meaning is. As for me," continued Russell, "I think that it would be convenient to define the word falsity in such a way that each sentence were either true or false." In Russell's opinion, "The word this denotes anything which, at the time when this word is used, is at the centre of our attention. In case of words without an egocentric character, a constant component is something concerning the indicated object; however, the word this denotes different objects in each case when it is used. Here what is constant is not the denoted object, but its relation to the given specific use of that word. Each time this word is used, the person using it is occupied with some object, and it is just that object that the word indicates. When a word does not have an egocentric character, there is no need to distinguish between the multifarious cases of its application; however, such distinction must be made in case of egocentric words, since what they indicate is something being in a specific relation to the person using this word..." (Russell, 1948, p. 107).
Quantification of natural language expressions can concern names (first order logic), but also predicates (second order logic). A quantifier transforms a logical predicate into a logical sentence, so predication cannot under any circumstances be identified with quantification, though this is exactly the case in some linguistic works. By existentiality I mean here expressions of the form (∃x)P(x) which precede a predicate, i.e. a sentential function P, in the semantically-logical structure of the sentence, read using the expressions "there is x such that", "for some x". By universality I mean expressions of the form (∀x)P(x) which precede a predicate P in the semantically-logical structure of the sentence. Finally, by uniqueness I mean an expression of the form (ιx)P(x) which assumes that the given sentential function (P) is satisfied either by one and only one element of the universe under consideration or by one and only one set of elements. As already accepted in logical literature, I treat the iota operator as the uniqueness quantifier in my work. Quantificational expressions are not unambiguous (Koseska & Gargov, 1990). For example, "each such that" can be understood as "all elements satisfying P", which can be written down as (∀x)P(x), see: Dnes vsjako momche kara ski. / Dzisiaj każdy chłopiec jeździ na nartach. [Today each boy is skiing]. In a context different from the above, an expression written down as (∀x)P(x) may be understood as "the set (of usually many elements) which satisfies P as the only one". Then this quantificational meaning should be written down as (ιX)P(X), and in that case it would be an expression with unique meaning, e.g. Samo uchenicite ot V klas bjaha dnes nakazani ot Direktora. / Tylko uczniowie z V klasy zostali dzisiaj ukarani przez Dyrektora. [Only fifth form boys have been punished by the Director today], see (Koseska-Toszewa, 1982).
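The three quantificational meanings distinguished above can be illustrated on a finite universe. The following is only a minimal sketch: the function names, the toy data and the reading of (ιX)P(X) as "the whole set of elements satisfying P, taken as the only one" are our assumptions for illustration, not part of the notation of the Grammar.

```python
from typing import Callable, FrozenSet, Iterable, Optional, TypeVar

T = TypeVar("T")


def exists(universe: Iterable[T], pred: Callable[[T], bool]) -> bool:
    """(∃x)P(x): at least one element of the universe satisfies P."""
    return any(pred(x) for x in universe)


def forall(universe: Iterable[T], pred: Callable[[T], bool]) -> bool:
    """(∀x)P(x): every element of the universe satisfies P."""
    return all(pred(x) for x in universe)


def iota(universe: Iterable[T], pred: Callable[[T], bool]) -> Optional[T]:
    """(ιx)P(x): the one and only element satisfying P, or None if uniqueness fails."""
    matches = [x for x in universe if pred(x)]
    return matches[0] if len(matches) == 1 else None


def iota_set(universe: Iterable[T], pred: Callable[[T], bool]) -> Optional[FrozenSet[T]]:
    """(ιX)P(X), read here (an assumption) as the whole set of satisfiers
    treated as the only one; None if nothing satisfies P."""
    matches = frozenset(x for x in universe if pred(x))
    return matches or None


# Hypothetical toy universe: pupil name -> school form.
pupils = {"Ana": 5, "Boris": 5, "Vera": 6}
skiing = {"Ana", "Boris", "Vera"}
punished = {"Ana", "Boris"}

everyone_skis = forall(pupils, lambda p: p in skiing)
someone_punished = exists(pupils, lambda p: p in punished)
the_punished_set = iota_set(pupils, lambda p: p in punished)
the_sixth_former = iota(pupils, lambda p: pupils[p] == 6)
```

In this sketch the universal reading checks each element separately, while the set-level unique reading returns a single object (the set itself), which mirrors the difference between the two example sentences.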

3.5. Treating the definiteness / indefiniteness category as a sentence category, and attempting to include quantification also on the verbal phrase level, facilitates tracking the development of the meaning of the Bulgarian article. As I have already mentioned, in Bulgarian the same article form can express both uniqueness and universality (and hence, traditionally: definiteness and indefiniteness) (Koseska-Toszewa, 1982). The above observations, based first of all on semantically-logical aspects of the definiteness / indefiniteness category, were confirmed by the language material from the Supraśl Code (Zaimov, 1982, p. 5-9), where the Bulgarian article does not occur in universally quantified nominal structures, but in uniquely quantified nominal expressions, which represent satisfaction of a predicate either by one element of a given set, or by a whole set treated as the only one. See (Koseska-Toszewa, 1986).
4. In the Bulgarian-Polish contrastive grammar, Petri net theory is used as a basis for describing such semantic categories of the language as definiteness / indefiniteness, aspect and time, and modality, and in consequence also all other semantic categories that can be described using the notions of state, event, discrete process and quantification of states and events, see: (Mazurkiewicz, 1986; Laskowski, 1986; Koseska & Mazurkiewicz 1988, 2004, 2010). Petri net theory (Petri, 1962) is a tool independent of the existing natural languages, and indifferent with respect to them. Its simplicity (the theory is based on three prime notions only: state, event and their mutual succession), coupled with considerable expressive power, as we will try to show, predestines those notions to play the role of tertium comparationis in contrastive studies of natural languages. The Petri net, from now on called simply a net, is built of a finite number of objects symbolizing either states or events, and joined by the succession relation. The succession relation need not be a linear order; certain objects of the net may be incomparable with respect to the order if none of them precedes the other. Some states of the net, e.g. the speech state, may be distinguished.
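The description of a net given above (a finite number of states and events, a succession relation that need not be linear, and distinguished states) can be sketched as a small data structure. This is an illustration under our own assumptions: the class and method names and the toy example are hypothetical, and the sketch does not enforce the further constraints of formal Petri net definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # "state" or "event"


class Net:
    """A finite set of states and events joined by a succession relation."""

    def __init__(self) -> None:
        self.succ = {}            # node -> set of immediate successors
        self.distinguished = set()  # e.g. the speech state

    def add(self, name: str, kind: str, distinguished: bool = False) -> Node:
        if kind not in ("state", "event"):
            raise ValueError("kind must be 'state' or 'event'")
        node = Node(name, kind)
        self.succ.setdefault(node, set())
        if distinguished:
            self.distinguished.add(node)
        return node

    def follows(self, earlier: Node, later: Node) -> None:
        """Record that `later` immediately succeeds `earlier`."""
        self.succ[earlier].add(later)

    def precedes(self, a: Node, b: Node) -> bool:
        """True if b is reachable from a along the succession relation."""
        stack, seen = list(self.succ[a]), set()
        while stack:
            node = stack.pop()
            if node == b:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.succ[node])
        return False

    def comparable(self, a: Node, b: Node) -> bool:
        return self.precedes(a, b) or self.precedes(b, a)


# Hypothetical example: sleeping -> awakening -> being awake,
# plus an unrelated state and a distinguished speech state.
net = Net()
sleeping = net.add("sleeping", "state")
awakening = net.add("awakening", "event")
awake = net.add("awake", "state")
raining = net.add("raining", "state")
speech = net.add("speech", "state", distinguished=True)
net.follows(sleeping, awakening)
net.follows(awakening, awake)
```

Here `sleeping` precedes `awake` through the event `awakening`, while `raining` is incomparable with all three, which is the non-linear case mentioned in the text.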
4.1. States and events. We adopt states and events as fundamental units of time and aspect description. The basic feature distinguishing these two notions is the temporal spread of states and the momentary character of events. In other words, states "last", while events can only "happen". An abstract analogue of this distinction is the difference between a section of the real line (state) and a point lying on that line (event).
4.2. The adopted postulate of model finiteness implies that our description cannot be limited to events only, with states consequently treated as sets of events, in the manner of Reichenbach (1967). Indeed, describing a state as a set of events gives rise to a question: "of what events? All, or only some of them? And if only some, how should they be selected?" In turn, omitting events in the model and limiting it to states only deprives us of the possibility of considering such phenomena as "collision", "opening", "unveiling", "awakening", and the like. The characteristic feature of events is that we cannot speak about them in the present tense, because an event does not last: it has no spread in time. Referring to the analogy with points and sections, events correspond to points, and states to sections; the mutual relationship between events and states is the same as the relationship between points and sections; each point is either the beginning or the end of some section (or, in a special case, a half-line); each event is either the beginning or the end of some state (e.g., a state holding before occurrence of the event or a state after it). The analogy can be taken further: each section, like each state, has at most one beginning and one end, while each point (each event) may begin or end many sections (many states) that we are interested in. In other words, an event need not be the beginning or the end of one state only, and in consequence it cannot be treated solely as an ordinary transition, a "transition of a state into a state".

4.3. It should be stressed that imperfective forms of a verb can express not only 1. a state, but also 2. sequences of states and events finally ended with a state, while perfective forms of a verb can express both 1. events, and 2. sequences of events and states finally ended with an event.
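The four meanings listed above depend only on the length of the sequence and on the kind of its final element. A minimal sketch of this rule follows; the function name and the string labels are our assumptions, and the mapping runs from a meaning to the type of verb form that can express it, not from an actual verb form to its meaning.

```python
from typing import List, Tuple


def aspect_meaning(sequence: List[str]) -> Tuple[str, str]:
    """Map a non-empty sequence of 'state'/'event' items to
    (type of verb form, meaning label) as described in 4.3."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    for item in sequence:
        if item not in ("state", "event"):
            raise ValueError(f"unknown item: {item!r}")

    last = sequence[-1]
    # A final event corresponds to perfective forms, a final state to imperfective ones.
    form = "perfective" if last == "event" else "imperfective"
    if len(sequence) == 1:
        meaning = last
    else:
        meaning = f"sequence of states and events finally ended with {last}"
    return form, meaning


single_event = aspect_meaning(["event"])
habitual = aspect_meaning(["state", "event", "state"])
```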

5. The above semantic annotation is not easy. Hence initially we can just mark the quantification of the name in the subject position, and specify event 1, event 2 and state 1, state 2. Only at a subsequent stage is it also worthwhile to distinguish between adverbial forms and their quantificational meanings, as in the following sentence from the parallel corpus: Понякога майка им (ιx)P(x) ги водеше (∃X)P(X) (sequence of states and events finally ended with state) край пенливия планински поток.
Semantic annotation can also be initially limited just to aspectually-temporal meanings of verbal forms, and hence to distinguishing states and events and their quantification, and then gradually increase the number of semantic problems marked in sentences from the selected three languages, see (Garabík, Dimitrova & Koseska-Toszewa, 2011; Dimitrova & Koseska-Toszewa, 2012; Roszko, D., 2013; Roszko, D. & Roszko, R., 2013).
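One possible way of storing such manual annotation next to a corpus sentence is sketched below. The record layout, the field names and the inline rendering are hypothetical and serve only to show how quantificational and aspectual-temporal tags could be attached to spans of a sentence and added in stages.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    span: str                              # annotated fragment of the sentence
    quantifier: Optional[str] = None       # e.g. "(ιx)P(x)"
    aspect_meaning: Optional[str] = None   # e.g. "state", "event", or a sequence label


def render(sentence: str, annotations: List[Annotation]) -> str:
    """Insert the tags after the first occurrence of each annotated span."""
    out = sentence
    for ann in annotations:
        tags = [t for t in (ann.quantifier,
                            f"({ann.aspect_meaning})" if ann.aspect_meaning else None)
                if t]
        if tags:
            out = out.replace(ann.span, ann.span + " " + " ".join(tags), 1)
    return out


sentence = "Понякога майка им ги водеше край пенливия планински поток."
annotations = [
    Annotation(span="майка им", quantifier="(ιx)P(x)"),
    Annotation(
        span="водеше",
        quantifier="(∃X)P(X)",
        aspect_meaning="sequence of states and events finally ended with state",
    ),
]
annotated = render(sentence, annotations)
```

Because each tag is a separate optional field, an annotator could fill in only the aspectual-temporal meaning at first and add quantifiers later, in line with the gradual procedure described above.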

Summary
The semantic notation analyzed in this work belongs to the second stream of semantic theories presented here, the direct approach to semantics. We used this stream in our work on the Bulgarian-Polish Contrastive Grammar. Our semantic notation distinguishes quantificational meanings of names and predicates, and indicates aspectual and temporal meanings of verbs. This is a novum in elaborating sentence-level semantics in parallel corpora. For this reason, the semantic notation proposed here is manual. However, we hope that it will raise the interest of computer scientists working on automatic methods for processing the given natural languages. Semantic annotation defined as in this work will facilitate contrastive studies of natural languages, which will in turn facilitate both human and machine translation.