The Electronic Historical Latvian Dictionary Based on the Corpus of Early Written Latvian Texts

E. Andronova, R. Siliņa-Piņķe, A. Trumpa, P. Vanags

This article deals with the development of the Electronic Historical Latvian Dictionary (http://www.tezaurs.lv/lvvv) based on the Corpus of Early Written Latvian Texts (http://www.korpuss.lv/senie/). Some issues concerning the compilation and processing of the corpus data are discussed and the main sources added to the Corpus during the four-year project are described: the 16th c. Lord's Prayers, 17th c. dictionaries, texts of oaths and laws, religious texts and so-called dedication poetry. The aim of the project is to compile a pilot electronic dictionary of 16th–17th century Latvian where all parts of speech are represented among the entries. This dictionary will contain ca. 1,200 entries, including both proper names and common nouns. The main emphasis is on the description of the dictionary entries supplied with relevant practical and theoretical observations. Each part of the dictionary entry is discussed, followed by comments on various issues pertaining to that part (e.g., the choice of headword and the representation of spelling versions) and how these were resolved. Special attention is paid to the head of entry, explanation of meaning deduced from the examples found in the corpus, different types of collocations and their representation in the dictionary, as well as etymological information. Finally, we present a brief review of the dictionary writing software TLex 2013 based on our experience with this tool.


Introduction
The Corpus-based Electronic Historical Dictionary of Latvian (Korpusā balstīta elektroniska latviešu valodas vēsturiskā vārdnīca; henceforth LVVV) is the title of a four-year research project initiated in 2013 and financed by the Latvian Council of Science (Latvijas Zinātnes padome). The aim of the project is to compile a representative pilot of a modern electronic dictionary of 16th-17th century Latvian. This dictionary will contain approximately 1,200 entries and is based on data from the Corpus of Early Written Latvian Texts (Latviešu valodas seno tekstu korpuss, available at http://www.korpuss.lv/senie/). During the project the corpus has also been supplemented with texts not previously included, namely the earliest printed or handwritten Latvian texts.
The project was made possible through several years' work on the Corpus of Early Written Latvian Texts as well as work on defining a dictionary structure and producing several hundred sample entries during the period 2002-2009. During that period: 1) the early text corpus was compiled, with more than 40 publications from the 16th-18th centuries, 2) the editorial guidelines for the Historical Dictionary of Latvian were written on the basis of studies of historical dictionaries of other languages and 3) approximately 500 sample entries were written (300 common nouns and 200 proper names); about one third of them are published in Word format on the project website (http://www.tezaurs.lv/lvvv/#vardnicasskirkli). The objective of this article is to describe the elaboration process of LVVV from the development of the corpus to the moment when the dictionary entries were uploaded to the internet, and to highlight some problems related to the compilation of the dictionary.

Sources
The history of the Latvian written language dates back to the 16th century and is largely linked to the Reformation of the Church. Only a few individual records of words have been preserved from before the 16th century. The earliest texts from the 16th century are various versions of translations of the Lord's Prayer, as well as separate short records in the books of Riga trade associations. The first longer texts are translations for the needs of the Lutheran church: catechism, pericope, and hymn translations. For a long period of time they existed as manuscripts, but starting from 1586/1587 they appeared in printed versions. The first Catholic catechism, which is linked to the Counter-Reformation efforts in the Polish-Lithuanian Commonwealth, was also printed around the same time (1585).
The quantity and volume of Latvian texts gradually increased in the 17th century. The above-mentioned texts were reissued and improved together with the first Book of Sermons (1654), a complete Bible translation (1685-1694), and other texts of a religious content. The first dictionaries and grammar books were also compiled at this time.
There are two significant aspects of these early Latvian texts. The first is that most of the texts were translations from German, Latin and Polish, and there were very few original texts. One of the few exceptions is G. Mancelius's Book of Sermons. The second aspect is that the majority of the translators were not native speakers of Latvian, and some translators had a rather limited command of Latvian. Hence, it is assumed that the written Latvian language in the texts from the 16th-17th century is a far cry from the Latvian spoken at that time (for further reading see Rūķe-Draviņa, 1977; Ross & Vanags, 2008; Vanags, 2009).
The Historical Dictionary of Latvian is based on the Corpus of Early Written Latvian Texts, i.e. on texts dating from the 16th and 17th centuries. Thus, before writing the dictionary entries it was important to expand the corpus in order for it to be sufficiently representative. Additions to the corpus included the main Latvian lexicographical sources of the 17th century, as well as religious and secular texts. Unlike previously, the corpus currently encompasses not only printed but also handwritten texts.
Currently, the Corpus of Early Written Latvian Texts contains: 1) ten 16th-century sources: all the oldest known books in Latvian, as well as the earliest Latvian versions of the Lord's Prayer (both printed and handwritten); 2) 46 sources from the 17th century, including three dictionaries (both manuscripts of Christopher Fürecker's dictionary (2nd half of the 17th century), the dictionary part of the work Lettus and the so-called 'Ten conversations' (1638) by Georg Mancelius); 3) 17 sources from the 18th century (including one lexicographic source, Liborius Depkin's dictionary). The sources added to the corpus during the last three years were significant contributions from a chronological, thematic and also a stylistic point of view. Each of these sources is also a valuable complement to the LVVV.
We will now describe the main groups of sources in more detail.

The Lord's Prayers of the 16th century
The earliest known Latvian texts are several versions of the Lord's Prayer in various 16th-century sources. Within the framework of this project, the Corpus of Early Written Latvian Texts was supplemented with the following: 1) the Lord's Prayer by Ghisbertus (Gis1507_PN), written by hand in a Catholic Agenda printed in 1507 in Leipzig; furthermore, two more handwritten versions of the Lord's Prayer, dated to approximately 1520, namely: 2) the Lord's Prayer by Bruno (Br1520_PN) and 3) the Lord's Prayer by Grunau (Gr1520_PN), 4) the so-called Lord's Prayer of Hasentöter printed in 1550 (Has1550_PN), which is the oldest known printed Latvian text; 5) the Lord's Prayer by Lazius printed in 1557 (Laz1557_PN), and 6) the Lord's Prayer by Megiser, approx. 1593 (Meg1593_PN).
The earliest Latvian versions of the Lord's Prayer are denoted by the names of the individuals who either wrote them or owned the manuscripts. While most of these prayers exist in several copies, two of them, those by Ghisbertus and Bruno, are extant in only a single copy. The Lord's Prayer by Grunau, along with the chronicle of Prussia by the same author, exists in several slightly different copies. The version by Hasentöter was first included in the 1550 German edition of Sebastian Münster's 'Cosmographia', in the Latin edition of the same year, and later in other editions in other languages as well. S. Münster's book is the source from which the Latvian text of the Lord's Prayer was later copied to be included in the publications by Wolfgang Lazius, Hieronimus Megiser and others; therefore some authors (e.g. Breidaks, 1994) also speak of the Lord's Prayers by Megiser, etc. Due to the numerous versions, we had to choose which texts of the Lord's Prayer to include in the corpus. With the historical dictionary in mind, we decided that only the earliest version from each set of lexically and grammatically identical texts should be included. However, to meet the needs of researchers wishing to use the corpus to analyse spelling and other details, the other versions should also be added in the future.
The first Latvian translations of the Lord's Prayer are of varying quality. Nevertheless, these texts are very important because they are the first written attestations of the approximately 40 Latvian lexemes used in this prayer.

Dictionaries
In addition to C. Fürecker's dictionary manuscripts, which had been added at an earlier stage, the corpus is now supplemented with G. Mancelius's German-Latvian dictionary Lettus (1638), which is the oldest lexicographic source of Latvian. Lettus consists of three parts that together comprise the first textbook for learning Latvian: a dictionary (Manc1638_L), the part Phraseologia Lettica (Manc1638_PhL) comprising 51 sections of lexis on different topics, and 10 conversations about everyday issues (travelling, agriculture, etc.); these are considered the first Latvian original short stories (Manc1638_Run). Lettus is an extremely valuable source for the compilation of a historical Latvian dictionary because 1) the beginning of each LVVV entry contains data on 17th-century Latvian lexicographic sources, and 2) all three parts of Lettus contain a large amount of secular vocabulary. Moreover, for many words this was the first time they were recorded in print, and one of the tasks of the LVVV is to give information about this as well.
During the final year of the project, we will prepare Georg Elger's Polish-Latin-Latvian dictionary Polono-Latino-Lottauicum (1683) for inclusion in the corpus. This is the second printed Latvian dictionary, after G. Mancelius's Lettus.

Larger religious texts
The most voluminous source of 17th-century written Latvian, without which the compilation of LVVV would not be possible, is Ernst Glück's Latvian translation of the Bible. Until now the early text corpus contained only the New Testament part of this translation (JT1685). Due to the large size of this source, many LVVV entries take their usage examples chiefly from this text. Also, since the Bible was translated into Latvian in the late 17th century, the quotes from this New Testament text are often the final examples in the LVVV entries.
The quality and significance of E. Glück's Bible translation in the development of written Latvian is undeniable. A comparison with earlier translations of some parts of the Old Testament, e.g., with G. Mancelius's translation of Ecclesiastes (Manc1637_Sal) published in 1637, shows noticeable lexical differences. The 1689 translation of the Old Testament is richer and more nuanced.
Currently, the corpus contains the first three books of the Genesis, but intensive work on the rest of E. Glück's translation of the Old Testament (1689-1694) is under way. The digital version of the text is being compared with the original and structural markup is being added for inclusion in the corpus, and we plan to add the text to the corpus during 2016.
The corpus is gradually being supplemented with other religious texts, namely with the Lutheran handbook published in the region in 1685. Its first part, the hymnal (LGL1685_K1), was one of the first sources included in the Corpus of Early Written Latvian Texts. Later it was followed by the pericopes translated and edited by C. Fürecker (VLH1685), Ecclesiastes (VLH1685_Sal), the book of Sirach (VLH1685_Syr), and a catechism (VLH1685_Cat).

17th century oaths and laws
The texts of oaths and laws are considered minor sources but this does not make them less important. They are particularly valuable because the Corpus of Early Written Latvian Texts contains only a limited number of secular texts. Among the sources formerly added to the corpus are court-martial laws (SKL1696_KB; SKL1696_RA) and a law prohibiting infanticide (SL1684). During this project, a number of other texts were added to the corpus: tradesman oaths, witness oaths, and several other laws, the original versions of which are kept in the State Archives of Latvia. They were deciphered and copied digitally. This was a rather difficult task, especially deciding how to render certain 17th-century handwritten symbols in print.
At this moment, the Corpus of Early Written Latvian Texts contains eight handwritten documents: the statutes of linen-weavers from 1625 (LS1625); the informant's oath from the so-called plough revision (German: Haken-Revision) in the region of Vidzeme in 1638 (Zv1638_VAR); two witness oaths from 1681 (Zv1681_Liec_1; Zv1681_Liec_2); an oath of non-German timber sorters from 1681 (Zv1681_Kok); the tradesman oaths of the hemp manufacturers (Zv1689_Kan), the salt-carriers (Zv1689_Salsnes) and the warehouse workers (Zv1698_Lig).
The statutes of the Latvian linen-weavers' fraternity in Riga are particularly important because they contain both general vocabulary and specific professional and legal vocabulary. It is the earliest known Latvian text of this kind. Besides, its German original has also been preserved, which is very useful in the work on LVVV because it makes it possible to clarify problematic places in the Latvian text (the German text, however, is not included in the corpus). Most handwritten sources, including the statutes of the linen-weavers, are represented in the corpus by their scanned facsimiles as well, so that users can see their original appearance.

Dedication poems
In the 17th century a new genre of Latvian texts arose: the so-called dedication poetry. It was composed for all kinds of events in family life (weddings, baptisms, funerals), academic life (defenses of theses, acquisition of degrees), and literary life (publication of books). The dedication poems were composed in various languages and printed on separate sheets of paper or in sets of several pages. Quite often, the language of the poem was not the native language of either its author or the addressee. The Corpus of Early Written Latvian Texts contains some dedication poems that represent the secular poetry of the 17th century, e.g., a poem by Heinrich Fuhrmann from 1690 (Fuhr1690_LL), a poem by Michael Wittenburg from 1696 (Witt1696_MMID), and a poem from 1685 by an unknown author (ZP1685).

Possible future supplements to the corpus
The sources available in the Corpus of Early Written Latvian Texts of course do not cover all the data of 16th-17th-century Latvian. The corpus should eventually contain all the known texts, and will hopefully also be supplemented with newly found sources. That formerly unknown material can sometimes come to light quite nearby is shown by the recently discovered Latvian text in the Catholic handbook Rituale sacramentorum ac aliarum ecclesiae ceremoniarum for Lithuanian and Polish priests. Editions of this handbook that also contain some Latvian texts date from 1675 and 1685, as well as from the 18th century. Although they contain only about 250 word usages, they represent a valuable addition to the 17th-century Catholic Latvian texts.
This case highlights one of the problematic issues: how should the compilers of the corpus handle repeated editions which are identical or almost identical to the first ones? The current principle, using only the first publication, is not applicable in all cases. Reprints that have not been revised do not cause a problem (e.g., the editions of G. Mancelius's Sermon book in the late 17th century): they would not provide the corpus with new material, only add to the frequency of word usages. Thus it would be sufficient to upload only the earliest known edition of the above-mentioned Catholic texts. It is likewise clear that revised editions should be added to the corpus (as some already are), e.g., the editions of M. Luther's Small Catechism, beginning with 1586, and the reprints edited by G. Mancelius (1631) and Heinrich Adolphi (1685).
Apart from printed texts, the corpus would also benefit from the inclusion of several handwritten sources, which would also constitute an important supplement with regard to subject matter. Among them are the manuscripts of Andreas Gecelius' translations of the Psalms and Ecclesiastes, the anonymous late 17th-century manuscript of prayer texts (see Augstkalns, 2009, pp. 514-529) and Georg Elger's translations of the Gospels and Epistles from 1640.
A specific group of sources are Latvian inscriptions (separate words and phrases) in German and Latin texts, either printed or handwritten. These do not fully comply with the definition of 'early Latvian texts', and thus we are not currently considering uploading them to the corpus. Yet in the future they might become an interesting supplement. For instance, there are Latvian inscriptions in several late 17th- and early 18th-century church metric books (such words as Audſeckne 'ward, foster-child', Büſʄeneek[s] 'free peasant who has the right to carry weapons', etc.), or in printed German texts (maiſe ſemme 'lit.: "bread land", black soil with clay', Wavveering 'wild rosemary'; examples from Salomon Gubertus' agriculture guidebook Stratagema Oeconomicum oder Ackerstudent, denen jungen vngeübten Ackerleuten in Lieffland zum nöhtigen Vnterrichte … (Gubertus, 1645)).

The process of preparing source texts for inclusion in the corpus
Before uploading each source to the corpus, the texts are carefully prepared. They are scanned and processed with the OCR program FineReader, and the digital versions are proofread against the originals. During this process, problems sometimes arise with the identification of certain written symbols. For instance, in the text of the Old Testament it is sometimes unclear whether the diacritic above a letter is one big dot, two smaller dots, or a macron, or whether a letter is topped by a grave accent or a half-erased caron. The texts contain several versions of diacritics which were employed without much consistency. A question arises: how precisely should all these nuances be reflected, and is it possible to do some unification? The current principle is to reflect them as precisely as possible, while also taking into account the peculiarities of each source. For instance, in E. Glück's Old Testament what appears as a macron over a vowel is most probably a diaeresis, i.e. an umlaut sign, where the two dots have merged. The deciphering of manuscripts highlights other types of problems, e.g.: 1) should we follow the inconsistent and often non-differentiated usage of small and capital letters? 2) how should unclear parts of the text be interpreted? Due to the inconsistencies of the original, the text in the corpus retains capital letters only in word-initial position and only in proper names: Beth kad wairack no ʃkaitieʃchen tohs wahtzes gir buß liedcz tems wehlehtems buth. (LS1625, 3r13.)
Before the texts are added to the corpus, structural marking is carried out. This, for example, entails marking words in other languages, so that they are not automatically included in the index of Latvian words. The majority of texts in the corpus are not monolingual. Different kinds of code-switching are very typical in the writings of early modern times. Until the late 17th century, the readership of Latvian books consisted of Baltic German clergy, and Latvian text was basically inserted into German text, which functioned as a paratext. In such cases it is not difficult to distinguish the texts of different languages. Dictionary materials also do not cause problems. Problems arise, however, in cases of intrasentential code-switching. In such cases, it is not always possible to tell whether a particular form is in Latvian or another language. A proper name in a Latvian text can be followed by a Latin inflectional ending, as in ar weenu Mutt teitzeeta Deewu vnd to Thäwu muhʃʄa Kunga JEʃu Chriʃti 'all together praise the God and the Father of our Lord Jesus Christ' (Manc1631_LVM, 2515). Here the gen. sg. JEſu Chriſti can be regarded as Latin forms. However, when the inflectional forms of both languages look identical, as in Vnd JEʃus ʄatziya vs to 'and Jesus said to him' (Manc1631_LVM, 511), the word JEſus can be interpreted both as a Latin and a Latvian nom. sg. case form. In the corpus neither of these occasions is marked as non-Latvian text. Therefore they appear in the dictionary as well, including cases when the switched code is a single word, not a part of a sentence, e.g. tas brähts / Abba / myļais Tähws 'he cries / Abba/ dear Father' (Manc1631_LVM, 412). The word Abba 'my father' is left without marking in the corpus, and thus it also appears in the dictionary.
In the Latvian texts added to the corpus so far, one can observe words in German, Latin, Polish, Greek, Estonian, and Hebrew. The scale of structural annotation differs from source to source: in dictionaries it is more detailed than in other texts, e.g. @v{Allein/} @l{adverb:} tickai to ween. (@v introduces the German text, @l the Latin text). In the Bible text a relatively detailed marking is also used: each book and each chapter are marked, as well as comments on the text of the Holy Writ and notes, e.g. @p{* Greek: Wallod: Apʄmeets.}. Furthermore, indications of parallel passages at the end of a verse are marked, e.g. @t{w:3.2.Moʃ:Gr: 29,10.}. This structural marking is also necessary for determining the exact location of a word form. The Corpus of Early Written Latvian Texts, unlike Modern Latvian text corpora, can give precise information about the source and the number of each page and line. This considerably eases the researcher's work, since it provides a simple way to check the correctness of the text by comparing it with the facsimile of the original.
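The tag notation quoted above can be illustrated with a minimal Python sketch. It is not the corpus project's own tooling: the function name, the regular expression and the human-readable labels are assumptions made for illustration, and only the four tag letters mentioned in the text (@v, @l, @p, @t) are covered.

```python
import re

# Tag letters as quoted in the text; the labels are our own (assumed) wording.
TAG_LABELS = {
    "v": "German",
    "l": "Latin",
    "p": "comment",
    "t": "parallel passage",
}

# One tag letter after "@", then a brace-delimited span without nested braces.
TAG_PATTERN = re.compile(r"@(\w)\{([^{}]*)\}")


def split_annotated_line(line):
    """Return (tagged_spans, untagged_text) for one annotated corpus line.

    tagged_spans is a list of (label, content) pairs; untagged_text is what
    remains once the tagged spans are removed, i.e. the material that would
    go into the index of Latvian word forms.
    """
    spans = [
        (TAG_LABELS.get(tag, tag), content)
        for tag, content in TAG_PATTERN.findall(line)
    ]
    untagged = " ".join(TAG_PATTERN.sub(" ", line).split())
    return spans, untagged


spans, latvian = split_annotated_line("@v{Allein/} @l{adverb:} tickai to ween.")
print(spans)    # tagged German and Latin spans
print(latvian)  # remaining Latvian text
```

Separating tagged spans from the remaining text in this way mirrors the stated purpose of the markup: words in other languages stay out of the Latvian word index while remaining retrievable.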
Erroneous forms are also marked, i.e. obvious spelling mistakes or typos, e.g. uud instead of und 'and'. These errors are marked by first giving the reconstructed correct form and then the original erroneous form in curly brackets, e.g. und{uud}.
On each such occasion a question arises: how far should the compilers of the corpus interfere by correcting the centuries-old text? The current principle is to mark only the obvious spelling mistakes. However, even with this principle there is a risk of acting incompetently or misleadingly.
Some publications, e.g. G. Mancelius's Lettus, contain a list of corrected errors that were noticed by the author himself. Thus, G. Mancelius noticed that the form attghadaht 'to unravel' was misprinted as attghaghat and Rahm 'calm' as Rahn, and sometimes the author felt that a word should be replaced by another, more suitable one, e.g. in one case he wishes to replace Smilltis 'sand' by Pieʃchli 'dust'. In the corpus, this is represented as Pieʃchli{Smilltis}. The process would definitely be made easier by a morphological markup of the early text corpus, which has not yet been performed. The Institute of Mathematics and Computer Science of the University of Latvia has a certain amount of experience with working on old texts and has developed a law-based system to improve the optical character recognition and to correct typical mistakes caused by this, so that the word forms in the old texts could later be mapped to Modern Latvian word forms (Pretkalniņa, Paikens, Grūzītis, Rituma, & Spektors, 2012). Of course, some preliminary work is necessary before this system can be employed to analyse 16th-17th century Latvian texts and in the work on the historical dictionary. Most probably, a description of the "grammar" of each of these sources should be prepared, so that it can later be used for the markup.
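The correction notation described in this section (und{uud}, Pieʃchli{Smilltis}) lends itself to a simple mechanical treatment. The following Python sketch shows one way to recover both the corrected and the original reading from a marked line. The function names and the regular expression are assumptions for illustration only; in particular, the sketch assumes that a correction is a single word immediately followed by a single word in curly brackets, and that spans introduced by "@" are structural tags to be left alone.

```python
import re

# corrected{original}: one word, then one word in braces. The lookbehind
# keeps structural tags such as @v{...} from being read as corrections.
CORRECTION = re.compile(r"(?<![@\w])(\w+)\{(\w+)\}")


def corrected_text(line):
    """Reading with the editors' (or author's) corrections applied."""
    return CORRECTION.sub(r"\1", line)


def original_text(line):
    """Reading as found in the source, errors included."""
    return CORRECTION.sub(r"\2", line)


def list_corrections(line):
    """Pairs of (original form, corrected form) found in the line."""
    return [(m.group(2), m.group(1)) for m in CORRECTION.finditer(line)]


print(corrected_text("und{uud} tas"))
print(original_text("und{uud} tas"))
print(list_corrections("Pieʃchli{Smilltis}"))
```

Keeping both readings retrievable reflects the cautious principle stated above: the corrected form serves lookup, while the attested form remains available for checking against the facsimile.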

Writing and supplementing the LVVV entries
The central part of the project is the compilation of sample entries for the Historical Dictionary of Latvian. Therefore, along with adding new sources to the corpus, already in 2013 we supplemented the 150 previously compiled entries with new information from sources added to the corpus during 2009-2012. At the end of 2013, we uploaded 150 revised and 63 new entries to the LVVV website.
Since the beginning of 2014, intensive work has been carried out on producing new entries as well as on supplementing the existing entries with materials from the sources that were recently added to the corpus. Because the project plan entails the preparation of a relatively small number of entries (ca. 1,200, ca. 700 of which are already written and some of which have already been published on the above-mentioned website), the entries of common nouns are chosen according to the principle of representativity, i.e. so that they comprise all parts of speech and are different in size. Thus, the entries include both sparsely attested lexemes and hapax legomena, and also large entries where the frequency of the word usage reaches into the hundreds or thousands.
Certainly, this dictionary will by no means contain every written Latvian word of the 16th-17th century. The number of potential headwords is much larger. However, in order to prepare a complete dictionary, a larger team and much more time would be necessary. The 1,200 entries will serve as an example, a testing field and a basis for future work, but even such a limited-size dictionary can give some insight into the grammatical and semantic phenomena and processes of the language, as well as the general use of written Latvian during the 16th and 17th centuries. As noted at the beginning of this article, during the first stage of work on the Historical Dictionary of Latvian we elaborated its instructions, based on the experience of historical dictionaries of other languages, but also considering the specific character of the Latvian language and early Latvian texts. Still, it is impossible to envisage all possible difficulties in advance. Resuming the work on the dictionary entries in the new project, we encountered new problems which had to be solved. This was partly due to the new format of the dictionary, given the transition to the dictionary writing software TLex Suite 2013 (Joffe, de Schryver, & Prinsloo, 2003) and partly due to the fact that, as the dictionary expands, the entries become more and more diverse.
Henceforth in this paper, the examples of LVVV entries of common nouns and proper names will be followed by a brief description, as well as an analysis of the problems encountered.
Example of an LVVV entry of a common noun:

Head of entry (headword, grammatical information, frequency of usage) (I)
The choice of form for headwords is usually one of the most discussed issues in historical lexicography. For the time being, there is no universal principle regulating this choice. The authors of each historical dictionary take into account the lexicographical tradition of their respective country, as well as the specific nature of the data.
In LVVV the headword is usually a single word (in exceptional cases two or more words) transcribed in contemporary spelling. It is the basic form of the respective word (e.g., dibins 'ground' or Matužs 'Matthew'). The early texts provide rich material for studying the adaptation process of borrowed lexemes, their phonological form and visual representation. This is particularly true in the case of proper names, and it is therefore sometimes difficult to define the form of the headword. In most cases when there are two or more headwords, it is due to the spelling peculiarities of proper names (Atenas, Atēnas 'Athens') or unclear morphological stems (Manase, Manases, Manasis, Manasus). In some cases, however, the entries for adjectives also have more than one headword (niknis, nikns, nikna 'furious'): here the regular basic form is the masculine nikns, but the form niknis with an inserted vowel i and the feminine form nikna are also included as headwords. More than one headword is also found in entries for compounds with various connecting vowels (dusmapūķis, dusmupūķis, lit.: 'anger dragon', i.e. 'angry person') and entries for nouns that represent a nomen agentis if they have both gender forms in the corpus (adītājs, adītāja 'knitter').
In some cases, the entry contains only the headword and a cross-reference to another entry. This is often done where the spelling of a word in early texts or in a particular text deviates from the contemporary spelling, e.g. badadzeguse → badadzeguze, badadzeguse 'hoopoe' or Pamphilia → Pamvilija. The compilers of LVVV had to find a solution for representing variant forms in the entry or head of entry. The Corpus of Early Written Latvian Texts contains many spelling variants that cover at least two periods in the history of written Latvian (before and after G. Mancelius's reform in 1631), as well as examples of the individual spelling of some authors (e.g., G. Elger and J. Reuter). Moreover, the corpus includes printed texts and deciphered manuscripts. In LVVV it was decided to include different spelling and phonetic variants in one entry, but different morphological stems in separate entries. Unclear or reconstructed endings (usually the basic forms of proper names) are italicized, e.g. Amplia, Amplias (1) s. m. npers. Ampliu (1) 'Ampliats', or prieds (4) s. m. preedus (4) 'additional payment'. Italics are also used if the quality of a sound is unclear, e.g. mīkčaula (2) gen. mihkzaula (2) 'soft-shelled'.
In the head of the entry, the headword is followed by information about the total frequency of the lemma in the corpus, grammatical information, and all the attested word forms with the corresponding token frequency in the corpus. The electronic version of LVVV will be mapped to the corpus (with concrete location pointers (addresses) of word forms), which will allow the user to see the particular contexts. Following common practice, homonyms are placed in separate entries. In some cases it is difficult or impossible to determine the part of speech of a word if it is the result of conversion and the context does not give clear indications. For instance, nabags can be either the adjective 'poor' or the noun 'poor person'. To distinguish homonyms of this kind we also consulted the 17th-century lexicographic sources. For instance, nowadays the word bezdibenis 'abyss' is a noun, but bezdibens was recorded in G. Mancelius's dictionary Lettus as an adjective meaning 'endlessly deep'.
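The components of the head of entry listed above (headword or headwords, total lemma frequency, grammatical information, attested forms with token counts) can be modelled as a small data structure. The Python sketch below is purely illustrative: the class and field names are hypothetical and do not reflect the TLex schema or the LVVV database, and computing the total frequency as the sum of the form counts is an assumption that happens to agree with the examples quoted in this section.

```python
from dataclasses import dataclass, field


@dataclass
class AttestedForm:
    spelling: str  # word form as spelled in the source text
    tokens: int    # number of occurrences in the corpus


@dataclass
class EntryHead:
    headwords: list  # one or more headwords in contemporary spelling
    grammar: str     # grammatical information, e.g. "s. m."
    forms: list = field(default_factory=list)

    @property
    def total_frequency(self):
        # Assumption: lemma frequency equals the sum of form frequencies.
        return sum(form.tokens for form in self.forms)

    def render(self):
        """Plain-text head of entry in the layout quoted in the article."""
        forms = ", ".join(f"{f.spelling} ({f.tokens})" for f in self.forms)
        heads = ", ".join(self.headwords)
        return f"{heads} ({self.total_frequency}) {self.grammar} {forms}"


prieds = EntryHead(["prieds"], "s. m.", [AttestedForm("preedus", 4)])
print(prieds.render())
```

Allowing a list of headwords rather than a single string accommodates the multi-headword entries discussed above, such as niknis, nikns, nikna.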
In the course of the project we discussed several complex issues concerning headwords, sometimes modifying or changing the original principles we had set out.

Word fixation in lexicographic sources of the 17th century (II)
If a word appears in the lexicographic sources of the 17th century, the examples from the sources are given immediately after the head of the LVVV entry. Currently the examples are taken from three sources, but when the corpus is supplemented with newly found sources it will not be difficult to add the new information to the dictionary. Usually these lexicographic sources contain common nouns but sometimes there are proper names as well. For instance, chapters Nos. 49 and 50 of G. Mancelius's Phraseologia Lettica contain place names such as Liebaw/ Leepai 'Liepāja' (Manc1638_PhL, 41114), while C. Fürecker's dictionary manuscripts show sporadic occurrences of place names, personal names and other proper names, such as Eewa, Eva, it.faulbaum, Ewa kohks.id. 'Ieva (proper name, and also name of a tree)' (Fuer1650_70_2ms, 1132), Muhsa, die bach bei Baliske u.Namanna, die machen beide die Aa. 'the Mūsa river, which together with the Mēmele creates the Lielupe river' (Fuer1650_70_1ms, 16119) or Auʄeklis Spulgis.der Morgen stern.deeniņņa ne tahļu, auʄeklis jau uhslez, der tag ist õ weit, der morgen stern gehet schon auff. 'Morning star (Venus)' (Fuer1650_70_1ms, 1428).

Word meanings (III)
Here we conform to the principles elaborated during the previous projects: word meanings are deduced from the examples found in the corpus. They are explained as simply as possible, with the help of synonyms or periphrasis if necessary. For instance, the entry čuska, čūška (Modern Latvian čūska 'snake') contains the archaic transferred meaning 'devil'. Explanations of obsolete concepts sometimes include encyclopaedic information as well, e.g., the entry nauda 'money' contains the collocation Δ skēpju nauda (lit.: 'spear money', cf. Modern Latvian šķēps 'spear') and its explanation: 'the so-called Alberta dālderi ('Albert's thalers', named after the Dutch regent Albrecht II), used in 17th-century Livonia, Courland and Riga; they had a depiction of two crossed spears on the reverse'.
There are established patterns for interpreting parts of speech, criteria for distinguishing word meanings, and methods for determining the meanings in unclear cases. Still, the interpretation of meanings never becomes routine, because problematic and atypical cases appear from time to time. Some cases are presented below.
Since this is a historical dictionary, the historical principle is observed, i.e. the meanings are given in chronological order of usage. The first one usually is the meaning recorded in the 16th or 17th century. Quite often, it does not coincide with the modern primary meaning of the word. For example, the word skriet in LVVV is first defined as 'to flow': Te yʃʃtep' pe to kruʃte kòke: Tas dárgas aʃins ʃkräie bes galle.. 'They drag (Him) to the crucifix, The dear blood flowed without end' (Elg1621_GCG, 672); the second meaning is 'to fly': Vnd ka tee Puttni ʃkreen / ta ghroʃahs ʄöw tee Wehyi.. 'And when the birds fly / then the winds change.' (Manc1631_Syr, 59824), and only its third, most recent meaning (in the corpus first attested in a text by G. Mancelius from 1654) is the Modern Latvian meaning 'to run': Weens labbs Ghanns / kad wings räds to Willku ʃkreijam ʃtarrpan tahms Ahweems / tad eeʄahk taß ʄaukt.. 'A good shepherd / when he sees the wolf running among the sheep / then he starts to call to them.' (Manc1654_LP1, 45111).
The search for meanings and their explanation is meticulous work that demands a lot of attention, because even a seemingly clear usage can contain a meaning quite different from the contemporary one. For instance, the word čakls might seem to always have had the same meaning as nowadays, i.e. 'hardworking, industrious'. However, in G. Mancelius's dictionary the Latvian word čakls is used as an equivalent to German behend, geschwind, and schnellfüssig, which all mean 'quick, agile, nimble'. Besides, comparing the 16th-17th century Bible translation with the newest Latvian translation of the Bible can also be helpful in interpreting word meanings. For instance, the 1685 text uses the word čakls as follows: Wiņņo Kahjas irr tʄchaklas Aʄʄini isleet. 'Their feet are swift to shed blood' (JT1685, Rm 3:15), while the 2012 text uses the word žigls 'quick' in the same verse: viņu kājas ir žiglas, kad tie steidz izliet asinis 'Their feet are swift to shed blood' (Bībele, 2012, p. 2437). Thus, meaning No. 1 (i.e. the oldest meaning) in the LVVV entry čakl(i)s, čakla is: 'quick, agile, nimble'.
Nevertheless, the old dictionaries not only shed light but sometimes also confuse the compilers of LVVV. Thus, the only two instances of the word čākstēt (with the German equivalents knirschen, knarschen, but without examples) are from the two manuscripts of C. Fürecker's dictionary: Zahksteht, knirschen, knarschen (Fuer1650_70_1ms, 3116), Zahksteht, Knirschen, Knarschen (Fuer1650_70_2ms, 5385). The German word knirschen nowadays is translated as 'to crunch, to squeak', and we have given this meaning in the LVVV entry čākstēt. However, due to the lack of examples we cannot be completely certain about the meaning of this word in those times.
Although in most cases meanings are deduced from the examples found in the corpus, in unclear cases we turn to other sources as well. For instance, the collocation plikka cepure 'lit.: naked hat' in C. Fürecker's manuscript is translated as ein Hut 'hat'. Thus, if we were to rely only on C. Fürecker's translation, we would have to explain plika cepure simply as 'cepure' 'hat', which would not be very helpful. Luckily, ME contains a compound plikcepure defined as 'hat made of hairless leather'. We give this definition in LVVV as well. Since the meanings of collocations are sometimes interpreted on the basis of only a limited number of examples or unclear definitions in dictionaries, one can agree with O. Reichmann that in historical dictionaries the explanations of words sometimes are only "hypotheses about their meaning" (Reichmann, 2012, p. 251).

Examples (IV)
Since the online version of LVVV will be linked with the Corpus of Early Written Latvian Texts, making it possible to click on the word forms to see their usage in a broader context (this mapping is not yet provided in the dictionary writing software, but is planned for the future), the dictionary text contains only two illustrative examples of each meaning: the oldest one and the most recent one (in the period from the early 16th century to the late 17th century). However, this approach has some drawbacks. Sometimes, especially in entries with a small number of usages, both examples are almost identical or very similar because they are from translations of the same Bible verse or the same hymn or carol. In the period between these two there may have been examples from other sources as well, and sometimes they are more precise and illustrative, but due to the principle that we are following, we cannot include them in the dictionary. This problem will be partly solved by the already mentioned mapping between the dictionary and the corpus, but only partly, because the word forms are only given in the head of the entry, not in the sections presenting the separate meanings.
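The two rules involved here, ordering meanings by first attestation and keeping only the oldest and the most recent example of each meaning, can be sketched in a few lines. This is an illustration under stated assumptions, not the project's procedure: the function names are hypothetical, and the year is read from the first four-digit number in the source siglum (as in Elg1621_GCG), which is assumed to hold for every siglum. The three attestations are those quoted above for skriet.

```python
import re
from typing import NamedTuple


class Example(NamedTuple):
    meaning: str
    source: str  # siglum and location, e.g. "Elg1621_GCG, 672"


def year_of(source: str) -> int:
    """Read the year from the siglum (assumed: first four-digit number)."""
    siglum = source.split(",")[0]
    match = re.search(r"\d{4}", siglum)
    if match is None:
        raise ValueError(f"no year found in siglum: {siglum!r}")
    return int(match.group())


def illustrative_examples(examples):
    """Return (meaning, oldest, most recent), meanings ordered by first attestation."""
    by_meaning = {}
    for ex in examples:
        by_meaning.setdefault(ex.meaning, []).append(ex)
    selected = []
    for meaning, group in by_meaning.items():
        group.sort(key=lambda ex: year_of(ex.source))
        # With a single attestation the oldest and most recent coincide.
        selected.append((meaning, group[0], group[-1]))
    selected.sort(key=lambda item: year_of(item[1].source))
    return selected


# Attestations of skriet as quoted in this article, deliberately unordered.
skriet = [
    Example("to run", "Manc1654_LP1, 45111"),
    Example("to flow", "Elg1621_GCG, 672"),
    Example("to fly", "Manc1631_Syr, 59824"),
]
for meaning, oldest, newest in illustrative_examples(skriet):
    print(meaning, oldest.source, newest.source)
```

The sketch also makes the drawback visible: every example between the first and the last element of a group is discarded, however illustrative it may be.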

Collocations (V)
After the examples of each meaning we provide word collocations, if necessary. They are a significant element of LVVV as they show the words in actual use. The dictionary will distinguish three types of word collocations: 1) frequently used free collocations, which are included not because of their semantic opaqueness but due to their frequent use (in LVVV they are preceded by the sign -), 2) fixed collocations without transfer of meaning, i.e. collocations where the meaning of one of the components cannot be deduced from the semantics of the headword of the entry, but which still do not have a specific transfer of meaning (preceded by the sign Δ), 3) fixed collocations with transfer of meaning, i.e. idioms or phraseological units (preceded by the sign ◊). Both fixed collocations and idioms are provided with definitions and the oldest and most recent example of usage. Nevertheless, the identification and representation of word collocations in LVVV raise a number of problematic issues.
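For readers who process the entries programmatically, the three signs can be treated as type markers. The sketch below is purely illustrative: the enumeration and the function name are hypothetical, and accepting the increment sign U+2206 as a variant of the Greek capital delta U+0394 is a precaution assumed here, since the two characters look alike.

```python
from enum import Enum


class CollocationType(Enum):
    FREE = "-"        # frequently used free collocation
    FIXED = "\u0394"  # Δ: fixed collocation without transfer of meaning
    IDIOM = "\u25CA"  # ◊: fixed collocation with transfer of meaning


def split_marked(item: str):
    """Split 'SIGN text' into (CollocationType, text).

    Raises ValueError if the leading sign is not one of the three markers.
    """
    # Assumption: U+2206 (increment) may stand in for U+0394 (Greek delta).
    item = item.strip().replace("\u2206", "\u0394")
    sign, _, text = item.partition(" ")
    return CollocationType(sign), text


print(split_marked("\u25CA putna nags"))
print(split_marked("\u0394 zobu nauda"))
```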

Collocations and compounds
The peculiarities of Latvian spelling in the 16th and 17th century often pose a question: how can one tell collocations from compounds? Compounds in early Latvian texts were written either as one word, e.g. naudakalējs (1) s. m. naudakalleis (1) '(lit.: money smith) moneyer', or with the double hyphen typical of Gothic script, e.g. naudakaša (1) s. com. nauda=kaʃʃcha (1) '(lit.: money-scraper) miser; greedy person'. Similarly, collocations can also be written in two versions: either separately or with a double hyphen, as in the following two examples of the collocation Δ zobu nauda: sohbu nauda (1), sohbu=nauda (1) '(lit.: teeth money) a gift of money given by godparents at a christening'.
Currently we follow several criteria for distinguishing collocations and compounds: 1) if the first component agrees in gender, number and case with the second component, it is most probably a collocation, not a compound, as in the following example: Jo mums arri weens Leelas=Deenas Jehrs/ tas irr Kriʃtus par mums uppurehts. 'Because we also have a Paschal Lamb / that is Christ, sacrificed for us' (VLH1685, 3726); if the first component has no ending or the ending is reduced or changed, it can be regarded as a compound: Mihļajs Eņģļis/ krahʄchnajs Eņģļis/ All. All. Kur es atraʄchu ʄawu Kungu/ All. All. Tas irr augʄcham zehlees no ta Kappa/ All. All. ʄcho=deen ʄchinnî ʄwehtâ Leeladeenâ/ All. All. 'Dear Angel/ magnificent Angel/ Hallelujah. Hallelujah. Where will I find my Lord/ Hallelujah. Hallelujah. He has risen from the tomb/ Hallelujah. Hallelujah. today in this holy Easter/ Hallelujah. Hallelujah.' (LGL1685_K1, 498); 2) if both components of a word combination (e.g., allus=klappi 'beer tankard') can be observed separately as well, the collocation is quoted in both respective entries, in this case alus 'beer' and klape 'tankard'; 3) if one of the components does not exist separately, as with abla=sahle 'clover' or Baijero=semmmes 'Bavaria', we do not reconstruct forms such as *abls or *baijeri, but create separate entries for abla zāle and Baijeru zeme. These criteria are to some extent arbitrary, but making distinctions based on intuition would be even more so. Sometimes even in cases when two words are written without a space or double hyphen, it is doubtful whether it is a compound, e.g. Waʃʄaraszäppure 'summer hat'. Meanwhile, compounds are sometimes written with a space between their components, e.g. Mihkst zauļis most probably belongs in the entry mīkstčaulis 'soft-shelled egg'.

Phraseological units or idioms
An idiom is traditionally defined as "a phrase etc. which is understood by speakers of a particular language despite its meaning not being predictable from that of the separate words" (SOED, 1993, p. 1312). However, fixed collocations and idioms are not always attested frequently enough to make it clear whether they were typical of the time. This is due to the dominance of religious writings among the early Latvian texts. Thus, the idioms of this period are usually from the Bible or from spiritual poetry, and their frequency of usage may be due to the repeated editions, e.g.: ◊ dusēt Dieva rokā (2) 'to be under God's protection', ◊ neturēt mēli iekškan iemautu (2) 'not to keep silent', ◊ akmins pol ūz dūšu (2) 'unpleasant feelings begin', ◊ Abraama klēpis (3) 'the bosom of Abraham; paradise'. The language of G. Mancelius's texts and C. Fürecker's dictionary manuscripts is more diverse, with phraseological units that may stem from the actual Latvian language tradition of the time, e.g.: ◊ mieles tapa, mieļa tapa (5) 'drunkard', ◊ putna nags (2) 'miser', ◊ naudiņa spiež maciņā (1) 'said if somebody is wasteful', ◊ Antiņš, kur stabulīte / stabuliņ? (3) 'fool'; some of them are still used today. Nevertheless, even these fixed collocations with transferred meaning have a low frequency of usage in the Corpus of Early Written Latvian Texts. As pointed out by O. Reichmann in his Historische Lexikographie: the fewer examples of a possible idiom there are, the more difficult it is to define its meaning (Reichmann, 2012, p. 405). A question thus arises: can we regard as idioms collocations which obviously have a transferred meaning (e.g., naudas žurka 'miser') but very few or only one instance of usage? Until now, taking into account that the Corpus of Early Written Latvian Texts (and the number of Latvian texts in the 16th-17th century in general) is relatively small, such collocations have been treated as idioms and included in the dictionary. Even if they have a single instance of usage, we try to explain their meaning, judging from the context. The same applies to the fixed collocations.

Interpretation of meanings in fixed collocations and idioms
The problems here are basically the same as with the interpretation of meanings of headwords. Whenever possible, they are deduced from the context. The task is easier if the respective collocation is given with a translation in one of the lexicographical sources (although the translations in 17th-century dictionaries cannot always be deciphered), or if the Latvian translation can be compared with the original text. In some cases, however, more detailed research is necessary. A good example is the already mentioned fixed collocation skēpju nauda 'spear money' from the court-martial laws of 1696: Kad nelaulahts Wihrs ar ohtra Wihra laulatas Śeewas/ jeb nelaulata Śeewa ar ohtras Śeewas laulata Wihra Laulibu pahrkahp/ tad buhs tam kas nelaulahts/ jo wiņʃch pirmâ reiʃi tohp atraʃt/ 40. Dalderus Śuddraba=Naudu/ tas irr/ 20. Dalderus Śkehpju=Naudu ʃtrahpi doht.. 'When an unmarried man with another man's spouse / or an unmarried woman with another woman's husband breaks the marriage / then the one who is not married / because it is the first time (s)he has been found / must pay a penalty of 40 silver Thalers / that is/ 20 Thalers in spear money' (SKL1696_KB, 91). At first, this concept, which was probably obvious to the audience of the 17th century, seemed obscure. Then it was clarified by the research of Konstance Kļava on the Latvian legal text vocabulary of the late 17th century, where she has explained the meaning of this collocation as follows: "the so-called Alberta dālderi 'Albert's thalers' (named after the Dutch regent Albrecht II), used in the 17th-century Livonia, Courland and Riga and evaluated higher than Swedish money; they had a depiction of two crossed spears on the reverse" (Kļava, 1989, p. 103). It turned out that this type of money is also mentioned by K. Mühlenbach as šķēpu nauda (ME IV, 1932, p. 33), and also in E. Dunsdorfs's book "Latvijas vēsture 1600-1710" (Dunsdorfs, 1962, pp. 303-305). Based on this information, we elaborated the definition in LVVV.

Explanation of origin (VI)
An important task of a historical dictionary is to provide new data and information relevant for the study of the development of the vocabulary. Certainly, this does not mean that a historical dictionary should replace the already existing etymological dictionaries; it should rather supplement them. The corpus makes it possible to pinpoint more precisely the time when a respective word entered the written language, which in many cases may be the time when the word actually appeared in Latvian.
Most Latvian texts from the 16th-17th century are religious texts. When it comes to the Christian vocabulary in Latvian, we have to keep in mind that it has been under a substantial degree of impact from other languages. Also, since Christianity was itself borrowed from another culture, we can say that the whole Christian discourse is borrowed. Nevertheless, it has been developed with the help of elements both from Latvian and from other languages. The Christian vocabulary of the Latvian language contains all possible kinds of borrowings: lexical, derivative, and semantic (according to W. Betz's terminology: Lehnwörter; Lehnbildungen; Lehnbedeutungen (Betz, 1959, p. 128)). There are also borrowings on the phraseological level (German Lehnwendungen; see also Range, 1994). A historical dictionary might be of help to the research of this aspect of lexical history.
The explanations and references to the origins of words are based on various principles. Firstly, taking into account the digital format, it was decided to indicate the origin after each headword, not just include a cross-reference. Therefore some lexemes in the dictionary may have identical references of origin. Secondly, these references are as short as possible, and various symbols are employed ( introduces explanation of origin;  means "derived from"; : means "cf."; < means "borrowed from"). The references include text only to a limited extent, in those cases where more detailed explanations are necessary. The indications of origin are of secondary importance in LVVV, since it is not an etymological dictionary. Therefore we basically use the already existing definitions. The sources most often employed are the following: ME; EH; Karulis, 1992; Fraenkel, 1962-1965; Pokorny, 1959-1969; Sehwers, 1953; Summent, 1950. We also use the most recent etymological dictionaries of Lithuanian, such as Smoczyński, 2007 and ALEW (2015), and the Estonian etymological dictionary Eesti etümoloogia sõnaraamat (2012). The forms of loanword etymons (Estonian, Livonian, Lithuanian, (Middle) Low German, (Old) Russian words) are checked against recent dictionaries of the respective languages: SL; LB; СДЯ XI-XIV; LKŽ, the Livonian-Estonian-Latvian dictionary, etc. The sources of these explanations are not yet indicated but in the future they might be included in the log of theoretical sources.
In the second group, the sign : is followed by a reference to related words in Latvian, in other Baltic languages, or in other Indo-European languages. Such references are given when the word-building process is obscure and outside the scope of Latvian language history, as in the following cases: native words with direct equivalents in other Baltic languages: adata 'needle' : Lithuanian ãdata 'needle'; native words with equivalents in other Indo-European languages: agrs 'early' : Avestan agrō 'first', agrәm 'beginning'; alot 'to mistake' : Greek αλάομαι 'I wander around'.
In the third group the sign < is followed by a reference to the source, i.e. the etymon. Thus, it is used only with lexical borrowings: boķēt 'to thresh' < Middle Low German bōken 'to beat'; balandas 'goosefoot' < Curonian or Lithuanian balanda 'goosefoot'; aba 'or' < Belarusian or Polish abo 'or'.
This group covers most entries of proper names.These are mostly personal names and placenames from religious texts, and their original form in the source language is added.Usually this is from Luther's Bible translation (Luth1545), e.g., Laodikeja < Greek Laodíkeia; Liflante < German Livland; Miha < German Micha.
The fourth group consists of words denoted by the sign . Of course, such a brief description gives rise to problems that cannot be solved within the LVVV project. They refer either to etymological studies that are outside the scope of the compilation of LVVV, or to theories on morphology that may change in the course of time. It is crucial to use the same reference types both for common nouns and proper names.
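References introduced by the signs : and < have a regular shape, which makes them easy to split mechanically. The sketch below is only an illustration of that regularity: the function name and the returned labels are hypothetical, and it assumes that the sign is always surrounded by spaces, as in the examples quoted in this section.

```python
# Relation signs and their readings, as described in this section.
# "<" is checked first so that a borrowing is never read as a comparison.
RELATIONS = {"<": "borrowed from", ":": "cf."}


def parse_origin(reference: str):
    """Split 'word SIGN source' into (word, relation, source).

    Assumes the sign stands between spaces, so that a colon inside
    an abbreviation such as 'lit.:' is not mistaken for the sign.
    """
    for sign, relation in RELATIONS.items():
        left, separator, right = reference.partition(f" {sign} ")
        if separator:
            return left.strip(), relation, right.strip()
    raise ValueError(f"no relation sign found in: {reference!r}")


print(parse_origin("boķēt 'to thresh' < Middle Low German bōken 'to beat'"))
print(parse_origin("adata 'needle' : Lithuanian ãdata 'needle'"))
print(parse_origin("Liflante < German Livland"))
```

Because common nouns and proper names use the same reference types, one function of this kind serves both.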

Cross-references (VII)
At the end of the entry there are cross-references to other entries:  points from a derivation to the basic word,  denotes a reference concerning word derivation,  denotes a reference of comparison,  denotes a reference to an entry where the respective word can be found. When necessary, comparative references are given from a fixed collocation or idiom to another entry. Due to technical issues that are currently not yet solved, it is not possible to create references in the opposite direction, i.e. from one entry to a collocation quoted in another entry.

References to literature (VIII)
This section is included only in relatively few entries, and is not present in the entry examples shown at the beginning of this article. In cases where specific literature has been used, e.g. a book on plant names in Latvian (Ēdelmane & Ozola, 2003), the reference to this source is given at the end of the entry.

Dictionary writing software
As the project title shows, this dictionary is in a digital format. In order to facilitate the process of its compilation and uploading to the internet, where the dictionary will be supported with various types of search engines, we employ so-called dictionary writing software. The compilation of new entries is closely linked with mastering the dictionary compilation software TLex Suite 2013 and its adaptation for the specific needs of the Historical Dictionary of Latvian. The dictionary writing system helps to structure the entry and to automate various actions (thus, one no longer has to take care of the visual appearance of the entry each time, or to type the titles of sources, which are instead chosen from a menu). It also helps to prepare the dictionary for uploading to the internet with various search options (e.g., the opportunity to search for grammatical parameters, for a particular source, for etymological references, etc.). Nevertheless, this program has its drawbacks. For instance, during the work process one cannot mark problematic places that will need to be revisited. There is a comment log at the end of the entry, but it is not very suitable for our particular needs. Therefore the entries are first prepared in Word format, discussed in team meetings, and only then uploaded to the system. Still, the advantages of this lexicographic tool outweigh its disadvantages, and its usefulness is indisputable.

Conclusion
Although the four-year project 'Corpus-based Electronic Historical Dictionary of Latvian' is nearing its end, the work on the dictionary is far from finished. By the end of the project, only 1,200 sample entries will be completed, a small number compared to the number of all the possible entries. However, a large portion of the work will be accomplished, which will make it possible later to resume the compilation of this dictionary on a new level of quality, within the framework of new projects, with a much larger text corpus as the basis, with new experience with entries of various parts of speech and various sizes, and with modern lexicographic tools. And, most importantly, with the possibility to publish the entries online, so that they are freely available to users and comfortable to use.

10. Examples from marriage records made by Nathaniel Pommer in the metric book of Mālpils parish church during 1697-1706 (LVVA, record group No 235, description No 3, file No 173).
They have unclear or unknown origin. These entries contain more text describing these individual cases. Sometimes parallels in other languages are shown as well:  abuls 'clover'  From dābuols 'clover' with change of root and suffix.  bokstīties 'to stagger'  Unclear origin.  burkāns 'carrot'  Unclear origin; cf. Baltic German burkan, Estonian porgand, Russian dialects burkan, borkan 'carrot'.