LEXICAL MEANS IN COMMUNICATING EMOTION IN SUICIDE NOTES: ON THE BASIS OF THE POLISH CORPUS OF SUICIDE NOTES

The Polish Corpus of Suicide Notes (PCSN) is a relatively large set of authentic suicide notes that are linguistically annotated on several levels. In order to identify features characteristic of this genre, we compared PCSN with a collected subcorpus of counterfeited suicide notes. In this paper we focus on the lexical means of expressing emotions. Our goal was to analyse the ways of expressing emotions in this specific genre. Our initial list of lexical markers was based on Markowski’s list of the lexis common to different genres. The list was then expanded with the help of plWordNet 2.0, a lexico-semantic network. The expansion was based on manually selected noun and verb hypernymy branches, chosen according to their correspondence to the elements of the initial list. For words from the extended list, a quantitative analysis was performed for both authentic and fake suicide notes. We have also analysed the use of lexical markers of emotions, feelings and emotional states, as well as emotion operators and ways of expressing personal evaluation, affection and hate.


The Polish Corpus of Suicide Notes
The Polish Corpus of Suicide Notes (henceforth PCSN) is a set of suicide notes that were scanned and then manually transcribed. PCSN includes 614 genuine notes from the years 1998-2008 that were acquired from public prosecutor's offices across Poland. The collected texts were written by 382 authors, the youngest of whom was 12 years old and the oldest 89. The collection includes:
• 456 suicide notes written by men (74.26% of all documents in the collection; written by 290 distinct male writers),
• 158 suicide notes written by women (25.73% of all notes; written by 92 different women).
This ratio is close to the proportion observed in the Polish statistical data on committed suicides in the studied period, i.e. 77.78% male cases vs 22.21% female cases. PCSN also includes the Subcorpus of Counterfeited Suicide Notes (SCSN), which consists of 117 notes acquired in experiments in which participants were asked to write a suicide note not in their own name, but on behalf of an imaginary person whose characteristics were presented to the participant as part of the task description. In order to select the most representative data for the experiment (to generate tasks for participants), we had to collect data that were as complete as possible about people who had committed suicide in Poland. We were especially interested in those who had left suicide notes. The data were extracted from the available statistical data concerning both the authors of suicide notes and people committing suicide in general, as well as from research results in this area (Zaśko-Zielińska, 2013). PCSN was built as a result of the research project "Suicide notes - linguistic methods of authorship attribution of text".

Goal
The main topic of this work is the description of emotions in Polish suicide notes. Our aim is not to recognise the speaker's emotional state, but to analyse the ways of expressing emotions in such a specific genre as suicide notes. We expected that the observed characteristic features would facilitate distinguishing between genuine and counterfeited notes, which is one of the goals of forensic linguistics. Supporting such research was one of the reasons for building PCSN.
In the work presented here, we focus on the lexical means of communicating emotions. We leave all other aspects aside, as they either require separate work (e.g. syntactic-semantic means) or have already been described (e.g. graphical means, including emotional punctuation) (Zaśko-Zielińska, 2013). We wanted to identify a list of words that can be used as lexical markers for recognising the emotional sentiment of the analysed letters. According to our initial assumption, we have been looking for:
• names of emotions or affects and descriptions of emotional states,
• emotion operators (i.e. dependent modifiers, see Sec. 5),
• emotion intensifiers,
• valuation words,
• terms of endearment and invectives.

Analysis Scheme
First, the initial list of lexical means for expressing emotions was constructed on the basis of Markowski's list of the lexis common to different genres (Markowski, 1992). The initial list included only one-word lemmas. Next, we used the created list as a point of departure for analysing the ways of communicating emotions on the lexical level in the suicide notes.
In the next step, we tried to expand the list with lexical units that are semantically associated in a way that suggests similar use in the context of communicating emotions. The list was extended on the basis of plWordNet 2.0 (Polish name: Słowosieć) (Maziarz, Piasecki, & Szpakowicz, 2012), which is briefly presented below. For the expansion we used the lexico-semantic relations of synonymy and hypernymy (Maziarz, Piasecki, & Szpakowicz, 2013). We assumed that lexical units which are linked by synonymy in plWordNet, or which belong to the same relatively small hypernymy subtree, share the emotional aspects of their lexical meanings with their synonyms and close hypernyms.
The acquired set of words, in fact lemmas, was compared with the lemma frequencies extracted from PCSN and SCSN. Those frequencies, merged from both corpora, were used to build a frequency ranking of the lemmas from the acquired set.
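The merging and ranking step described above can be illustrated with a minimal sketch. It assumes that each corpus is available as a flat sequence of lemmas; the function name `merged_frequency_ranking` and the toy lemma streams are hypothetical and are not taken from PCSN or SCSN.

```python
from collections import Counter


def merged_frequency_ranking(marker_lemmas, pcsn_lemmas, scsn_lemmas):
    """Rank marker lemmas by their combined frequency in both corpora."""
    markers = set(marker_lemmas)
    # Count only lemmas that belong to the marker list.
    merged = Counter(lemma for lemma in pcsn_lemmas if lemma in markers)
    merged.update(lemma for lemma in scsn_lemmas if lemma in markers)
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Toy, invented lemma streams (assumption: one lemma per token).
markers = ["kochać", "bać się", "żałować"]
pcsn = ["kochać", "ja", "kochać", "żałować", "bać się"]
scsn = ["kochać", "bać się", "nic"]

ranking = merged_frequency_ranking(markers, pcsn, scsn)
print(ranking)
```

Lemmas outside the marker list (here `ja` and `nic`) are ignored, so the ranking covers only the acquired set.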
plWordNet is a lexical semantic network in which lexical units (representing lexical meanings) are described by the lexico-semantic relations in which a given lexical unit participates, e.g.
Different plWordNet hypernymic subtrees can encompass hundreds, and in some cases even thousands, of lexical units. The exact number depends on the particular thematic domain and on how general the hypernym selected as the root node of the whole hypernymic subtree is. Hypernymic subtrees also include sets of synonyms, as can be observed in Figure 1: e.g. the lexical unit uczucie 5 'emotion' belongs to the set of synonyms {emocja 1, uczucie 5} 'emotion', which is linked by hypernymy with 236 other sets of synonyms (many of which, however, include only one lexical unit).
We assumed that hyponyms, direct and indirect, share the emotional aspect of their meanings with their hypernyms. Following this assumption, in order to expand the set of lemmas developed on the basis of Markowski's list of the lexis common to different genres (Markowski, 1992), we analysed the lemmas from this list from the perspective of expressing emotions. We tried to identify those lemmas that express a clear emotional polarity: positive or negative. Next, for each selected lemma we identified all lexical units from plWordNet that correspond to it. Starting with the identified lexical units and the sets of synonyms they belong to (called synsets, i.e. places or nodes in the plWordNet graph), we searched for hypernymic subtrees such that:
• they include as many of the identified synsets as possible,
• they encompass lexical units of the same emotional polarity,
• they are as large as possible in terms of the number of synsets included.
The initial subtrees were quite large and rooted in more general lexical units as hypernyms. They were then divided into smaller ones (defined by hypernyms dominating a smaller number of lexical units) that were more coherent from the point of view of the emotions expressed or the emotional polarity of their lexical units. We also explored the possibility of expanding the initial subtree, originating from the meaning of a lemma from Markowski's list (Markowski, 1992), by moving the subtree root to more general hypernyms. However, such a generalising expansion mostly resulted in an excessive enlargement of the subtree and a loss of coherence with respect to the emotions expressed. In this work we used version 2.0 of plWordNet, as version 2.2 was not yet available at the time the list was expanded.
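The expansion over a hypernymic subtree can be sketched as a simple graph traversal. This is only an illustration under stated assumptions: the graph below is a toy fragment, not plWordNet data. Only the two synsets {emocja, uczucie} and {miłość, uczucie, afekt} are mentioned in this paper; the remaining nodes, all identifiers and the function names are hypothetical.

```python
from collections import deque

# Hypothetical toy graph: synset id -> ids of its direct hyponym synsets.
HYPONYMS = {
    "emotion": ["love", "fear"],
    "love": ["adoration"],
    "fear": [],
    "adoration": [],
}

# Hypothetical synset contents: synset id -> lemmas of its lexical units.
SYNSETS = {
    "emotion": {"emocja", "uczucie"},
    "love": {"miłość", "uczucie", "afekt"},
    "fear": {"strach", "lęk"},
    "adoration": {"uwielbienie"},
}


def subtree_synsets(root, hyponyms):
    """Collect the root and all its direct and indirect hyponyms (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in hyponyms.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expand_lemmas(root, hyponyms, synsets):
    """Return every lemma found in the synsets of the subtree rooted at `root`."""
    lemmas = set()
    for synset_id in subtree_synsets(root, hyponyms):
        lemmas |= synsets.get(synset_id, set())
    return lemmas


narrow = expand_lemmas("love", HYPONYMS, SYNSETS)
broad = expand_lemmas("emotion", HYPONYMS, SYNSETS)
print(sorted(narrow))
print(sorted(broad))
```

In this toy graph, moving the root from `love` to the more general `emotion` also pulls in the lemmas for fear. This mirrors the loss of coherence observed when the subtree root is generalised too far.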

Names of emotions, affects and emotional states
[Figure fragment: synset {miłość 1, uczucie 4, afekt 1} 'love', hypernymy (5/24)]
As the first subset of words, we analysed lemmas that name emotions but are not emotionally polarised (Spagińska-Pruszak, 1994, p. 11). We do not equate the feeling of an emotion with a way of expressing it. We have been interested only in those emotions that were expressed in the text by lexical means. We decided that the presence of emotions in a text and the way of expressing them can be a feature that characterises the speaker and the text topics, and that can help us to distinguish between genuine and counterfeited suicide notes. Following A.
We started the search for names of emotions with a list of nouns and verbs selected from Markowski's list of the lexis common to different genres.

Verbal names of emotions
Verbs used as lexical markers of emotions are based on the group of 92 verbs from the 9th field of Markowski (1992, pp. 115-116): Uczucia, emocje, oceny uczuciowe i emocjonalne - Subpole 'uczucia' (Affects, emotions, affective and emotional assessments - the subfield 'affects'), which was then expanded with lemmas from a few identified hypernymic verb subtrees from plWordNet. Contrary to Markowski (1992), verbs from aspectual pairs have been treated as separate lexical units. In addition, we have expanded the verb set with morphological derivatives of the verbs already included in the set that were found in PCSN and SCSN merged together, e.g.: kochać 'to love' - ukochać 'to love (perf.)', odkochać 'to stop loving'; czuć 'to feel' - uczuć 'to feel (perf.)'; cierpieć 'to suffer' - nacierpieć się 'to suffer (perf., iteratively)'; nienawidzić 'to hate' - znienawidzić 'to hate (perf.)'; szanować 'to respect' - uszanować 'to respect (perf.)'. Of the created set, 46 verb lemmas (50%) were found on the PCSN frequency list, and 22 (23.91%) on the SCSN frequency list. The most frequent verb lemmas on the PCSN list are: kochać 'to love', czuć (się) 'to feel, to feel like', bać się 'to be afraid', żałować 'to regret', nienawidzić 'to hate'. The top positions on the SCSN verb frequency list are occupied by: kochać, czuć (się), bać się, cierpieć 'to suffer', martwić się 'to worry'. The first verb, kochać 'to love', dominates in both the genuine and the counterfeited notes: it can be found in 41.4% of genuine documents and in 36.75% of counterfeited documents. The second verb in the ranking, czuć (się) 'to feel, to feel like', has a slightly higher frequency in SCSN (11.11%) than in PCSN (6.19%). It seems that the use of this verb allows for speaking about emotions in a less direct way, e.g.

Jestem samotny. 'I am lonely'
vs
Czuję się samotny. 'I feel lonely'
In the first case the speaker names his emotional state, while in the second he is analysing it.
The third lemma in the ranking, bać się 'to be afraid', occurred in about 4% of both genuine and counterfeited notes. Similar frequencies are shown by the other verbs among the five most frequent verbs of SCSN, cierpieć 'to suffer' and martwić się 'to worry', as well as by żałować 'to regret' and nienawidzić 'to hate', which appear relatively frequently in PCSN.
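The percentages in this section can be read as simple ratios. The sketch below recomputes the coverage of the 92-verb set from the counts given above (46 and 22 lemmas). It also shows how a per-document share could be obtained, assuming that the reported frequency of a lemma is the share of notes containing it at least once. The helper names and the toy documents are hypothetical.

```python
def coverage(found, total):
    """Percentage of the lemma set attested in a corpus, rounded to 2 places."""
    return round(100.0 * found / total, 2)


def document_share(lemma, documents):
    """Percentage of documents whose lemma set contains the given lemma."""
    if not documents:
        return 0.0
    hits = sum(1 for doc in documents if lemma in doc)
    return round(100.0 * hits / len(documents), 2)


# Counts reported in the text: 92 verbs in the set, 46 found in PCSN, 22 in SCSN.
pcsn_coverage = coverage(46, 92)
scsn_coverage = coverage(22, 92)

# Toy, invented documents represented as sets of lemmas.
toy_notes = [{"kochać", "ja"}, {"bać się"}, {"kochać"}, {"nic"}]
toy_share = document_share("kochać", toy_notes)

print(pcsn_coverage, scsn_coverage, toy_share)
```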

Names of emotions - conclusions
On the basis of the analysis of names of emotions, affects and emotional states expressed by nouns and verbs, we can claim that:
1. Authors of genuine suicide notes write about emotions more often than authors of counterfeited notes.
2. In PCSN the most frequently used lemmas thematically related to emotions are the verb kochać 'to love' and the nouns miłość 'love', nadzieja 'hope' and serce 'heart'.
3. In SCSN, the verb kochać 'to love' is also the most frequently used verb thematically related to emotions, but the most frequent noun related to emotions is nadzieja 'hope', which occurs much more often in the counterfeited notes (including the collocation mieć nadzieję '≈to hope') than in the genuine suicide notes. Moreover, the noun nadzieja 'hope' is mostly used in the present tense in the counterfeited notes (only one past form of mieć nadzieję '≈to hope' has been found), while in the genuine notes it is mostly used with reference to both the past and the present, e.g. miałem nadzieję 'I had hope, I hoped', żyłem nadzieją 'I lived with the hope', wypaliłeś we mnie malutką nadzieję 'You have burnt the smallest hope out of me'.

Emotion operators
Emotions can be expressed not only explicitly, but also by means of emotion operators: dependent units (lexical or phrasal) used to modify independent units. Emotion operators do not have a meaning of their own, but are used to modify the meaning of other units, e.g. to express hierarchical relations between independent units (Lewiński, 2006, pp. 54-55). Emotion operators can be divided into three groups:
1. exclamations, encompassing primary exclamation words, parenthetical exclamation words (Grochowski, 1997, p. 14) and affective modifiers (Jodłowski, 1976, p. 21);
2. emotive phrases with positive polarisation and phrases "expressing emotions without any specified positive or negative profile" (Rodak, 2000, p. 196), e.g. na szczęście 'luckily', dzięki Bogu 'thank God';
3. generalising operators: generalising pronouns and negative pronouns, e.g. wszystko 'everything', żaden 'none', nikt 'nobody', zawsze 'always'.

Exclamation words
The most frequent exclamation words found in the suicide notes are affective modifiers, e.g. nawet 'even' or raczej 'rather'. Parenthetical exclamations are much rarer, as curses are not characteristic of the analysed texts, e.g. cholera 'damn' has a frequency of 0.24% in PCSN and kurwa 'shit, fuck' 1.59%. Primary exclamation words could be found only occasionally in the genuine notes from PCSN, e.g. ach 0.12%, oj 0.12%, hehe 0.49%. They are more characteristic of spoken texts.
Affective modifiers occur in both corpora (PCSN and SCSN) with similar frequency. Among them, the highest positions in the ranking are occupied by nawet 'even' and przecież 'but, yet, though'. However, the former is much more frequent: nawet 9.88% and przecież 3.90% in PCSN; nawet 11.11% and przecież 2.56% in SCSN. A larger variety of affective modifiers is a characteristic feature of the genuine suicide notes. We could not find the lemmas nareszcie 'at last', koniecznie 'necessarily', bynajmniej 'not at all, not in the least', na szczęście 'luckily' or raczej 'rather' in the counterfeited notes. This is definitely caused by the fact that those notes were written by a smaller number of different authors. However, it can be treated only as a signal that those utterances are more schematic.
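The comparison of modifier inventories amounts to a set difference. The sketch below uses only the lemmas named in this section, so the two sets are illustrative and are not the full inventories of either corpus. The function name `only_in_first` is hypothetical.

```python
# Affective modifiers named in the text; NOT the complete inventory of either corpus.
pcsn_modifiers = {
    "nawet", "przecież", "nareszcie", "koniecznie",
    "bynajmniej", "na szczęście", "raczej",
}
scsn_modifiers = {"nawet", "przecież"}


def only_in_first(first, second):
    """Lemmas attested in the first inventory but absent from the second."""
    return sorted(set(first) - set(second))


missing_in_scsn = only_in_first(pcsn_modifiers, scsn_modifiers)
shared = sorted(pcsn_modifiers & scsn_modifiers)

print(missing_in_scsn)
print(shared)
```

A difference of this kind shows only which lemmas are unattested in the smaller corpus. It does not separate the effect of corpus size from the effect of more schematic writing.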

Emotive phrases
The set of emotive phrases found in both analysed corpora is dominated by phrases of a religious character, e.g. mój Boże! 'my God!', Boże! 'God!' or Jezu! 'Jesus!'. However, in suicide notes they are mostly not emotion operators but genuine address forms related to prayers. As there is no content related to religion in the counterfeited notes, there are no religious address forms at all in those documents.
It is worth emphasising that the negative pronoun nic 'nothing' has a much higher frequency in SCSN (29.91%) than in PCSN (15.15%). This imbalance is probably caused by the situation of the writer at the moment of counterfeiting the note. The writer has little or no knowledge about the described events and has no other choice but to use generalising operators like 'nothing'. The use of a generalising operator simplifies the creation of the text when the writer knows very little about the described events. The comparison of the two frequency lists of PCSN and SCSN shows that the authors of the counterfeited notes use concrete words less frequently, and abstract words like problem 'problem', sens 'sense' or sytuacja 'situation' much more frequently. The use of generalising operators also makes it possible to avoid giving information about concrete events and persons.

Terms of endearment and invectives
The closest relation between the speaker and the receiver of the suicide note is expressed with the help of terms of endearment, i.e. appellatives occurring "in the form of words or expressions used in the situation of especial intimacy, mostly [...] in relationships between spouses, fiancés, lovers or in the relation parents - children" (Perlin & Milewska, 2000, p. 165). Terms of endearment are also defined as "intimate nicknames that people give to their life partners, family members or close friends" (Bańko & Zygmunt, 2010, p. 6). 120 terms of endearment were found in PCSN; they can be grouped into the following classes:
• names of family roles, e.g. żoneczko '(diminutive) wife', tatusiu '(diminutive) daddy'.
In SCSN we could find only 22 terms of endearment, and most of them belong to just two classes: lexemes with the meaning 'happiness, dear/beloved', e.g. Kochanie 'my dear', Kochana 'my dear (female)', Kochany 'my dear (male)', and lexemes with the meaning 'something precious', e.g. Drogi 'dear', Najdroższa 'my most precious'.
In the genuine notes, terms of endearment occur not only in expressions directed towards the addressee (address forms), in the characteristic vocative case (noun phrases), but also in the note content or signature, e.g. Twój Kotek 'your kitty'. Such phrases are traces of the intimate communication between the speaker and the addressee from the time before the note was written, e.g.
In SCSN, we can find members of only one of the above classes, namely words from the class of lexemes with the meaning 'happiness, dear/beloved', e.g. Kochanie 'My Dear'.
Invectives appear relatively rarely in the suicide notes, e.g. ty ochydna kłamco 'you awful liar', Ty IDIOTKO 'you idiot', Twoja zasługa /jest w tym Ty babiożu 'your merit / is in that, you awful woman (negative augmentative)'.
This group of words is associated with language aggression and is characteristic of the genuine suicide notes, in which the occurrence of invectives is motivated not only by the life situation but also by the acquaintance with the addressee. That enables a maximal shortening of the communicative distance to the addressee. Invectives do not occur at all in SCSN.

Conclusions and future work
Emotions, affects and emotional valuations appear significantly more often as topics of the genuine suicide notes than of the counterfeited notes. In PCSN we can observe a much larger variety of emotion-related vocabulary, in terms of both the means of expressing emotions and the use of different synonyms. This is related to the fact that the group of different writers is much bigger in the case of PCSN than in the case of SCSN, and also to the SCSN authors' lack of knowledge about the context of the described events. Moreover, in the case of SCSN there is no common knowledge (common communication context) shared between the speaker and the addressee. Thus the differences between the two corpora originate from the influence of the original situation on the process of writing the text.
References to love are present in the majority of texts in both corpora, e.g. miłość 'love', serce 'heart', kochać 'to love', kochany 'dear, beloved' and kochanie 'dear, beloved'. However, the counterfeited notes differ in the use of the word nadzieja 'hope': in these notes it mostly refers to the present. PCSN also differs from SCSN in the higher frequency of the intensifier bardzo 'very'. It is worth noting that identical or very similar frequencies of some lemmas in PCSN and SCSN may result from different motivations for their use; thus the lemmas may be used in slightly different meanings in the two corpora. For instance, the generalising operators have similar ranks in PCSN and SCSN, but in PCSN they are used to emphasise the commonality of events and their consistent character, whereas in SCSN they express the speaker's lack of, or limited, knowledge about the context of the described situation.