Temporal Expressions in Polish Corpus KPWr

This article presents the results of recent research on the interpretation of Polish expressions that refer to time. These expressions are a source of information about when something happens, how often something occurs, or how long something lasts. Temporal information, which can be extracted from text automatically, plays a significant role in many information extraction systems, such as question answering, discourse analysis, event recognition and many more. We prepared PLIMEX, a broad description of Polish temporal expressions with annotation guidelines, based on state-of-the-art solutions for English, mainly the TimeML specification. We also adapted a solution, called LTIMEX, for capturing the local semantics of temporal expressions. Temporal description also supports further event identification and extends the event description model, focusing on anchoring events in time, ordering events and reasoning about the persistence of events. We prepared a specification designed to address these issues, and we annotated all documents in the Polish Corpus of Wroclaw University of Technology (KPWr) using our annotation guidelines.


Introduction
The recognition of temporal expressions and events is one of the major tasks in Information Extraction (henceforth IE) and plays a significant role in many natural language processing systems. It became an active area of research after the first attempts to apply temporal and event-based reasoning in language and text to IE.
We live in a dynamic world, full of information given to us mostly as unstructured data (text, sound, visual data). The analysis of information in that form (e.g. natural language) is still a challenging task for researchers. Here we focus on the aspect of change over time, which can be tracked in text written in natural language. Reasoning about how the world changes requires extracting information about temporally grounded events.
Temporal expressions refer to time and tell us when something happens, how long something lasts or how often something occurs. To capture the semantic meaning of a temporal expression we often have to analyse the whole context of the document. People are usually aware of their location in time: they know the current day and month of the year, or which part of the month it is (the beginning or the end). Thus people use expressions like 15th of December, the next week, 4 days before to refer to a specific point in time. Even newspaper articles, written mostly in formal language, often contain temporal expressions whose global semantic meaning can be deduced only by analysing the whole context of the document or even document metadata, e.g. the document creation or publication time. Knowing the full temporal context is essential to determine the exact date to which a temporal expression refers. These examples do not cover the full complexity of understanding temporal expressions. Sometimes a part of the text describes the past or a possible future without stating it explicitly, although clues in other parts of the text may reveal which tense is meant. Sometimes a temporal expression is used not to refer to the real world, but to describe fictional events. Another kind of problem is determining the temporal function of an expression: e.g. three days can describe a duration (how long something lasts) or be part of a reference to a point in time, as in in three days. Automatic systems should distinguish between different categories of temporal expressions in order to interpret their semantic meaning.
Another purpose is the normalization of temporal expressions, which is crucial for sharing information between different computer systems. For example, a software developer from Poland could create a system which recognizes the temporal expression dziewiątego grudnia (ninth of December) and normalizes it to 09.12.2014. Another developer from the USA could create a similar system which normalizes the same temporal expression to 12/9/2014. Dealing with two forms of the same date may then become a serious problem.
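The mismatch described above can be illustrated with a minimal sketch. It assumes the year 2014 from the example and uses only Python's standard `datetime` module; the variable names are illustrative and do not come from any system described in this article.

```python
from datetime import date

# The date behind "dziewiątego grudnia" (ninth of December); the year 2014 is assumed.
d = date(2014, 12, 9)

polish_style = d.strftime("%d.%m.%Y")      # day.month.year
us_style = f"{d.month}/{d.day}/{d.year}"   # month/day/year, no zero padding
iso_style = d.isoformat()                  # ISO 8601 calendar date

print(polish_style)
print(us_style)
print(iso_style)
```

Both local notations denote the same day, but only a shared notation such as the ISO 8601 form lets two systems exchange the value without knowing each other's conventions.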
This article gives an overview of existing specifications for marking up text with temporal expressions and points out what is missing in some of these concepts. It shows how to classify temporal expressions and how to represent and annotate them. It presents annotation and normalization guidelines with a special focus on the Polish language. Finally, it describes the annotation of the Polish Corpus of Wrocław University of Technology, validates the result of the annotation by comparison with other corpora annotated with temporal expressions, and summarizes the work planned for the future.

Related Works

Information Extraction
The Information Extraction task is a part of natural language engineering that focuses on extracting structured information from unstructured documents. In the processing of human language texts we can observe many directions of IE, with subtasks aimed at the recognition of:
• Named entities: recognition of known entity names, e.g. person names, organization names, location names. Several solutions were proposed for Polish as open source frameworks, e.g. Liner2 (Marcińczuk, Kocoń, & Janicki, 2013; Marcińczuk & Kocoń, 2013), which consists of several universal methods for sequence chunking such as dictionary look-up, statistical processing or pattern matching.
• Temporal expressions: many texts, mainly newspapers, describe the world by talking about people and the events and states they participate in. The exact temporal location of events is rarely explicit, and many temporal expressions are vague. Automatic extraction of temporal expressions identifies when something occurred through the recognition and normalization of expressions which refer to time. It is often used in automatic question answering systems (Pustejovsky, Knippen, Littman, & Saurí, 2005).
The above list is not exhaustive, and the number of IE subtasks keeps increasing as the topic attracts more researchers. IE allows the information in text to be expressed in a structured way and allows information from multiple document sources to be combined.

Temporal Information
Temporal expressions in text refer to time. According to Mazur (2012), these expressions are divided into two main categories: instants and intervals. These are atoms of time, which can be used to represent and reason about time. In the literature we can find many terms to describe instants, e.g. a time point, a point, a point in time, a moment. An interval is also sometimes called a period, as in (Benthem, 1983), where Benthem uses interval for something which lies between boundaries. On the other hand, in (Allen, 1995) interval temporal expressions in Benthem's sense are denoted by the term duration. The main difference between instants and intervals is that instants have no duration (duration is treated as a feature of a period).
One of the most widely used specifications for describing temporal information in English natural language corpora is TimeML (Saurí, Littman, Knippen, Gaizauskas, Setzer, & Pustejovsky, 2006). It was developed in the context of the TERQAS workshop, as a part of the ARDA-funded program AQUAINT, in a multi-project effort to improve the performance of question answering systems over documents written in natural language (Pustejovsky, Ingria, Saurí, Castano, Littman, Gaizauskas, Setzer, Katz, & Mani, 2005). The aim of this research was to improve access to information in text through content rather than keywords. The main problem was the recognition of events and their temporal anchorings.
Events are naturally anchored in time, and these temporally grounded events are the source of information from which we can reason about how the world changes. Other entities and their properties also change over time; therefore a database of assertions about entities will be incomplete if it does not capture how these entities are temporally updated (Pustejovsky, Ingria, et al., 2005).
Previous works focused on the task of temporal and event-based reasoning include the ACL Workshop on Spatial and Temporal Reasoning (2001) and the LREC Workshop "Annotation Standards for Temporal Information in Natural Language" (2002). Several papers from these workshops concerned time representation and identification (Saurí, Littman, Gaizauskas, Knippen, Setzer, & Pustejovsky, 2006). One of the results built on the workshops' conclusions is TimeML, the Markup Language for Temporal and Event Expressions. It is designed to address the issues which remained unresolved during the workshops. In (Pustejovsky, Ingria, et al., 2005) four major problems were introduced:
• Time stamping of events
• Ordering events with respect to one another
• Reasoning with contextually underspecified temporal expressions (such as last week, two weeks before)
• Reasoning about the persistence of events (how long does an event last)
We see that capturing temporal information is important in the process of reasoning about the changes described in text documents. This document describes the first stage of capturing temporal information and focuses on annotating temporal expressions in text.

PLIMEX
PLIMEX is a temporal annotation language suitable for describing temporal expressions in Polish text documents. It is based on the TIDES Instruction Manual for the Annotation of Temporal Expressions (Ferro, 2001), which describes the TIMEX2 annotation format. The TIDES manual is also the core of the TIMEX3 annotation format used in the TimeML specification (Saurí et al., 2006). Both documents present how to use special Standard Generalized Markup Language tags to annotate temporal expressions by inserting them directly into the text. In the section Annotation Process we describe our approach to annotating temporal information. We adopted the types of temporal expressions from TIMEX3: DATE, TIME, DURATION and SET. These types are described in the section Types of Temporal Expression in PLIMEX.
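As an illustration of inline markup, the sketch below wraps a character span in a TIMEX3-style tag. The attribute names `tid`, `type` and `value` follow the TimeML convention for TIMEX3; the helper `tag_timex3`, the identifier `t1` and the character offsets are hypothetical. It is not the annotation tooling used for KPWr, which is described in the section Annotation Process.

```python
from xml.sax.saxutils import escape, quoteattr


def tag_timex3(text, start, end, tid, timex_type, value):
    """Wrap text[start:end] in an inline TIMEX3-style tag (hypothetical helper)."""
    span = text[start:end]
    tag = (
        f"<TIMEX3 tid={quoteattr(tid)} type={quoteattr(timex_type)} "
        f"value={quoteattr(value)}>{escape(span)}</TIMEX3>"
    )
    # Escape the surrounding text so the result stays well-formed markup.
    return escape(text[:start]) + tag + escape(text[end:])


sentence = "W dniu 15.11.1999 r. doszło do wypadku"
expression = "15.11.1999 r."
start = sentence.index(expression)
end = start + len(expression)

annotated = tag_timex3(sentence, start, end, "t1", "DATE", "1999-11-15")
print(annotated)
```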
TimeML has been successfully adapted to many languages, and one of the most widely used rule-based systems, HeidelTime (Strötgen & Gertz, 2013; Strötgen, Zell, & Gertz, 2013), which uses the TIMEX3 annotation standard, currently supports 11 languages: English, German, Dutch, Vietnamese, Arabic, Spanish, Italian, French, Chinese, Russian, and Croatian. Our research opens the opportunity to create a cross-domain temporal tagger which supports Polish.

Types of Temporal Expression in PLIMEX
In this section we define the TimeML types of temporal expressions, adapted to Polish. All English translations of Polish examples are given in parentheses. The extent of the annotation in text (if needed) is marked with square brackets.

DATE
DATE is a type of temporal expression which denotes a point on a timeline, i.e., a unit of time greater than or equal to the day. The key question is when.
We also need to mention point-temporal expressions with vague boundaries: zima (winter), weekend (weekend). These units can be used like any other type of point expression.

DURATION
DURATION, in contrast to DATE, has two points on a timeline associated with it: a start and an end point. A different name used in the literature is period (Saquete et al., 2003). The key question is how long.
Sometimes range expressions are also included in this group (Mizobuchi, Sumitomo, Fuketa, & Aoe, 1998), but these expressions can be treated as separate points in time (Mani & Wilson, 2000). For example, Smith był tutaj (Smith stayed there):
If a specific piece of information which relates to the calendar occurs in the temporal expression, then DATE is the right type of annotation. This is true even if the context suggests that this type of temporal expression indicates the duration of an event, e.g. [Cały 1985] przebywał na emigracji ([The entire 1985] he lived in exile).

DURATION or DATE/TIME
Location in time takes two forms. The first is a temporal expression which directly indicates a point on a timeline (e.g. July 3, 20:30, Sunday). The signal of the temporal expression which appears in this case indicates the relation of the temporal expression to the described situation (the situation occurs at this time), for example at 20:30 on Sunday.
The second form of localization is associated with determining a period which separates the moment of the situation from another point in time. These temporal terms are prepositional phrases. The signal of the temporal expression indicates a different relation than the one carried by the expression itself. For example, in the expression za [sześć godzin] będę w domu (in [six hours] I'll be home), the preposition indicates that six hours separate the reference point from the point in time associated with the beginning of the given situation. The temporal expression itself in this situation is of the DURATION type. This problem can be solved as follows:
• If the temporal term answers the question when or how long and the temporal expression answers the question how long or how many (e.g. how many days, months), then we are dealing with DURATION;
• If the temporal term answers the question how long or when and the temporal expression answers the question when, what or another question unrelated to quantification, then we are dealing with DATE or TIME.
Examples:
(21) jak długo? - przez 4 dni (how long? - for 4 days), przez ile? - 4 dni (for how many? - 4 days) = DURATION
(22) kiedy? - za pięć dni (when? - in five days), but: za ile? - pięć dni (in how many (days)? - five days) = DURATION
(23) kiedy? - o ósmej (when? - at eight), o której? - ósmej (what (time)? - eight) = TIME
(24) jak długo? - od wczoraj (how long? - from yesterday), but: od kiedy? - wczoraj (since when? - yesterday) = DATE
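The question test in Examples 21-24 can be sketched as a simple lookup. This is only an illustration for a human-supplied question: it assumes the question answered by the bare temporal expression (after the preposition) is already known, and the mapping `QUESTION_TO_TYPE` covers just the question words that appear in the examples.

```python
# Question word answered by the bare temporal expression -> type (from Examples 21-24).
QUESTION_TO_TYPE = {
    "ile": "DURATION",   # (for/in) how many: przez ile?, za ile?
    "której": "TIME",    # what (time): o której?
    "kiedy": "DATE",     # (since) when: od kiedy?
}


def type_from_question(question):
    """Return the type implied by the question, or None if no known question word occurs."""
    for word in question.lower().rstrip("?").split():
        if word in QUESTION_TO_TYPE:
            return QUESTION_TO_TYPE[word]
    return None


for q in ["przez ile?", "za ile?", "o której?", "od kiedy?"]:
    print(q, "->", type_from_question(q))
```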

SET
The SET expression is a type of temporal expression relating to more than one instance of a time unit, either a point or a period. The key question is how often. This type of expression is named variously, but the most popular name is SET (Mani & Wilson, 2000). Han et al. also distinguished, within this type:
• recurrence, e.g. co poniedziałek (every Monday)
• rate, e.g. trzy razy na miesiąc (three times per month)
Other subtypes are given by Niemi and Koskenniemi (2007):
• parity expressions, e.g. nieparzyste poniedziałki miesiąca (odd Mondays of the month)
• ordinal expressions, e.g. co czwarty rok stulecia (every fourth year of the century)
• containment expressions, e.g. lata z 29 lutego (the years with a 29th of February)
Examples for Jan wraca pijany (John comes back drunk):
(25) dwa razy w tygodniu (twice a week)
(26) co dwa dni (every two days)
(27) każdej niedzieli (every Sunday)

Annotation Guidelines
The aim of these guidelines is to describe which temporal expressions are annotated, how to determine the semantic meaning of a temporal expression, and how to correctly identify the boundaries and the class of temporal expressions which occur in text.

What to Annotate
Our purpose is to annotate expressions which refer to time. We define three types of temporal expressions:
• obvious and precise
• obvious and not precise
• not obvious

Obvious and Precise Temporal Expressions
We call a temporal expression obvious and precise if it is possible to determine the exact meaning of the expression, for example:
(28) W dniu [15.11.1999 r.] doszło do wypadku (On [15.11.1999] there was an accident)
(29) Przez [dwa dni] nie mógł mówić (He could not speak for [two days])
These temporal expressions can be assigned to one of the following classes: DATE, TIME, DURATION or SET.

Obvious and Not Precise Temporal Expressions
We call a temporal expression obvious and not precise if it is not possible to determine the exact meaning of the expression (a fuzzy reference to time), but the type of time quantification is obvious (e.g. year, month, day, etc.), as underlined in the following example:
We annotate temporal expressions which are part of an event name without determining whether the name refers to a named entity or to a common expression, for example:
(32) Kampania [wrześniowa] była wielką klęską (The [September] Campaign was a huge failure)

Not Obvious Temporal Expressions
We call a temporal expression not obvious if we do not know the type of time quantification, for example:
(33) W przeszłym okresie byli do tego zdolni (In the past period they were capable of this)
These are mainly expressions like bieżący, przeszły, przyszły (current, former, future). For these expressions new categories were introduced: PRESENT_REF, PAST_REF, FUTURE_REF and DONT_KNOW. The last one is used if it is difficult to determine to which category the expression should be assigned. Later we decided not to annotate temporal expressions which are not obvious, and we focused only on the following types: DATE, TIME, DURATION and SET.

What Not to Annotate
We do not annotate temporal expressions referring to holidays, because they are taken into consideration at another annotation level (the name of an event), for example:
(34) Dzień Niepodległości (Independence Day)
(35) Wielkanoc (Easter)
We also decided not to annotate some temporal expressions which are relational, i.e. which describe temporal location with respect to an event or another point in time (Laskowski, 2003, 2005), mainly:
• expressions which determine the order of events, e.g. poprzednio, wcześniej, później, następnie (previously, earlier, later, next)
• prepositional phrases referring directly to other events, e.g. po wojnie, przed świtem (after the war, before dawn)
• relative anaphoric pronouns, e.g. tego dnia, w tym czasie, w tamtym roku (that day, at this time, in that year)

Exceptions
A relational expression can be part of a temporal expression as its modifier. The modifier is underlined in the following example:
(36) Spotkali się [następnego dnia] (They met the [next day])
In another example (37) we do not include the underlined expression (później, later), because it is not a modifier of the temporal expression.

Annotation Extent
This section describes how to identify the extent of the annotation. The possible grammatical categories of a temporal expression's constituents are defined in Table 1. In a temporal designation (Laskowski, 2003, 2005) a preposition is treated as a signal and the remaining part of the phrase as a temporal expression (interpreted as in (Saurí et al., 2006)).
The extent of the annotation cannot contain a preposition. Table 2 with examples is not exhaustive, but shows the most common occurrences of temporal expressions as constituents of prepositional phrases. For instance, for the phrase translated as on Thursday the temporal expression is czwartek (Thursday), and for przez ostatnie 10 lat (during the last 10 years) it is ostatnie 10 lat (last ten years). There are two exceptions where the preposition can be part of the temporal expression:
• The preposition (underlined) is an integral (and often internal) part of the temporal expression:
• The preposition in the following expressions: co rok, co miesiąc, co czwartek (each year, each month, each Thursday). It is currently the only option to indicate a temporal expression which refers to a recurring event (a set of times). The class of these temporal expressions is SET.
Footnote 4: In Polish literature the terms wyrażenie temporalne (temporal expression) and określenie temporalne (temporal designation) are often used interchangeably (Laskowski, 2003, 2005). We use temporal designation to describe the expression containing the full syntactic component which refers to time. The temporal expression in the TimeML specification (Laskowski, 2003, 2005) is only the part of the temporal designation which denotes time (without temporal relation signals, mainly prepositions).
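The basic extent rule can be sketched as follows. The preposition list is a small, hypothetical sample chosen for illustration, not the inventory from Table 2, and the sketch does not handle the first exception (prepositions that are an internal part of the expression). The word co is deliberately left out of the list, so it stays inside the extent as required for SET expressions.

```python
# Hypothetical, incomplete sample of Polish prepositions acting as temporal signals.
# "co" is intentionally absent: it remains part of SET expressions such as "co czwartek".
SIGNAL_PREPOSITIONS = {"w", "we", "na", "przez", "za", "od", "do", "o", "po", "przed"}


def annotation_extent(phrase):
    """Drop a leading signal preposition; return the phrase unchanged otherwise."""
    tokens = phrase.split()
    if len(tokens) > 1 and tokens[0].lower() in SIGNAL_PREPOSITIONS:
        return " ".join(tokens[1:])
    return phrase


print(annotation_extent("przez ostatnie 10 lat"))  # signal "przez" is removed
print(annotation_extent("co czwartek"))            # kept whole (class SET)
```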

Modifiers of Temporal Expressions
Temporal expressions can contain modifiers at any position in the expression. Table 3 contains examples of temporal expressions with modifiers as their constituents. We propose a new type of temporal expression, called RANGE. If the expression includes two temporal expressions denoting the start and the end point, we also annotate the whole extent as RANGE (underlined):
(41) W latach [1945]-[1954] w budynku mieścił się sąd (In the years [1945]-[1954] the building housed the court)

Joined Temporal Expressions
In the case of temporal expressions which contain conjunctions, we annotate them in a similar way as constituents of RANGE expressions, defined in TimeML (Saurí et al., 2006). In Example 44 we annotate two separate temporal expressions:
• Even if there are commas in a complex temporal expression, we annotate it as a single expression if a structured hierarchy of the description is preserved and it refers to a single point on a timeline:
(47) [24 września, godzina 19:00] ([September 24, 19:00])

Normalization Guidelines

The Interpretation of Temporal Expressions
The purpose of the interpretation of temporal expressions is to obtain the contextual meaning of those expressions. From the information processing point of view, this amounts to normalizing these temporal expressions: they need to be given in an unambiguous format that can be understood by a computer. A solution that takes both the local and the global semantics of an expression into account needs to be adopted. The local semantics is the result of an analysis of a temporal expression without taking the context of the document into account. Only the tokens building the temporal expression and the tokens forming the sentence in which the expression occurs are considered. One of the proposed specifications which takes semantics of this type into account is LTIMEX (Mazur, 2012).
When the global semantics are determined, the context of the document and, in some cases, other sources of knowledge (e.g. dates of holidays and well-known events) are taken into account. This level of semantic meaning is defined in the TimeML specification for both TIMEX2 and TIMEX3.
We need to adopt a normalization process for temporal expressions which takes into account interpretation at both the local and the global level. First, the local semantics is determined by an analysis of the structure of a temporal expression and of its local context. Then the expression is analyzed in the context of the entire document, taking into account other sources of knowledge (geolocation, calendar, etc.).

The Local and The Global Semantics
At the local level, the semantic meaning of a temporal expression is determined independently of information concerning the context of the expression. The local semantics reflect the meaning of the individual tokens forming the expression. Sometimes the local and the global semantics are the same, especially for unambiguous points in time (see Examples 48 and 49).
The local semantics are always the same, regardless of the broader context. For example, lipiec (July) always refers to the seventh month of the year and wczoraj (yesterday) always refers to the day before the current date. The global semantics depend on the context, so the value of such an expression may differ from one context to another. In the case of vague expressions, the interpretation process aims to determine a maximally precise meaning of the temporal expression. The global semantics must take into account the local information and all of the information that can be deduced from an analysis of the document. For example, at the local semantics level, the expression 7 rano w listopadowy poranek (7 am on a morning in November) specifies a month and an hour. To determine the global semantics, other information (beyond the hour and the month, e.g. the day and the year) must occur in the broader context and refer to the analyzed expression. Sometimes that information does not exist, and in such cases the local and the global semantics are the same. The components of the reference date (the creation date of the document or a date in the text) do not always specify a date directly. Consider the following sentence:
(52) Wyjeżdżamy do Nowego Jorku w [lutym] (We go to New York in [February])
If this sentence comes from a document created in April 2014, the temporal expression in this sentence refers to February 2015.
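The two levels in Example 52 can be sketched as below. This is a simplified illustration, not the LTIMEX notation or a rule from the guidelines: the local value borrows the X-placeholder style of TIMEX values, the month table `MONTHS_LOCATIVE_PL` is a tiny hypothetical sample, and the direction (future or past) is passed in explicitly because deciding it requires analysing the sentence.

```python
from datetime import date

# Tiny illustrative sample of Polish month names in the locative case.
MONTHS_LOCATIVE_PL = {"lutym": 2, "lipcu": 7, "listopadzie": 11}


def resolve_month(token, dct, future=True):
    """Return (local_value, global_value) for a bare month name.

    dct is the document creation date; future says whether the sentence
    points forward (next such month) or backward (previous such month).
    """
    month = MONTHS_LOCATIVE_PL[token]
    local_value = f"XXXX-{month:02d}"  # year unknown at the local level
    year = dct.year
    if future and month < dct.month:
        year += 1
    elif not future and month > dct.month:
        year -= 1
    return local_value, f"{year:04d}-{month:02d}"


local_value, global_value = resolve_month("lutym", date(2014, 4, 1), future=True)
print(local_value, global_value)
```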
Finding the right reference date is important in the normalization process of temporal expressions, for example:
• Reference to the date of the utterance, depending on the document's source. It can be the document creation time, the date of publication or the date of the last modification (especially in the case of newspaper articles), but also the date of sending an e-mail or the date of an instant messenger message, etc. In general, all these dates are referred to in the TimeML specification (TIMEX3) as DCT (document creation time).
• Reference to another temporal expression in the text (temporal expressions that directly relate to other temporal expressions used in the text), for example:
(53) Zbigniew urodził się 13 grudnia 1942 roku. Trzy dni wcześniej zmarł jego ojciec. (Zbigniew was born on December 13, 1942. His father had died three days earlier.)
It is necessary to determine the correct meaning of the temporal expression A in the first sentence and to relate the local semantics of the temporal expression B in the second sentence to the date set by A, in order to determine the global semantics of B.
• Reference to an event described in the text (temporal expressions that refer to elements described in the TimeML specification as descriptions of events), for example:
(54) [Trzy dni] po wypadku Zosia odzyskała przytomność (Zosia regained consciousness [three days] after the accident)
In this example, the term wypadek (accident) is an event description and refers to a change of state in reality. As long as it is not possible to directly determine when the accident occurred, it is not possible to determine the global semantics of the temporal expression trzy dni (three days). However, these types of expressions are not studied in this article.
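The anchoring in Example 53 reduces to simple date arithmetic once expression A has been normalized. The sketch below assumes that A has already been resolved to a calendar date and that the local semantics of B have been reduced to an offset in days with a direction; the function name `resolve_relative` is illustrative.

```python
from datetime import date, timedelta


def resolve_relative(anchor, days, earlier=True):
    """Shift the anchor date by a number of days, backward or forward."""
    delta = timedelta(days=days)
    return anchor - delta if earlier else anchor + delta


anchor_a = date(1942, 12, 13)  # A: 13 grudnia 1942 roku
resolved_b = resolve_relative(anchor_a, 3, earlier=True)  # B: trzy dni wcześniej
print(resolved_b.isoformat())
```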

Representation of The Global Semantics
In this article the global semantics are represented as in the TimeML specification for describing temporal expressions (TIMEX2 and TIMEX3, both of which use the same normalization format). The attribute VAL (value) is central to normalization: it specifies the normalized value of a temporal expression. VAL is a textual representation of a temporal expression, assigned using the guidelines described in the ISO-8601 standard. The notation uses the first letters of the English (but not only English) names of time units and related concepts, and letter case is significant (e.g. Year, Month, Day, hour, minute, BC - before Christ, AD - Anno Domini, Time, MOrning, Half, Quarter, SUmmer, Period).
Each temporal expression can be encoded according to its type, using the notation proposed in the ISO-8601 standard; example notations accompany Table 4. Both standards (TIMEX2 and TIMEX3) extend the ISO-8601 standard so that a normalized form of vague expressions can be stored, using an X sign for those elements of the notation which are unknown. For example, a sunny day in June can be written as XXXX-06-XX. The standard was also extended with new time tokens, such as dekada (DE, decade), wiek (CE, century) and tysiąclecie (ML, millennium). Normalization in the TimeML standard involves the determination of the global semantics of a temporal expression. There is no intermediate form of notation for the local semantics. Introducing an intermediate stage before the global semantics is reasonable from the normalization point of view. It can be seen in systems that recognize temporal expressions in English (e.g. HeidelTime (Strötgen & Gertz, 2013)), which often use their own intermediate normalization standard. In this article, the LTIMEX standard proposed by Mazur (2012) is adapted. It can be used to determine the local semantics of temporal expressions.
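The X-placeholder convention for calendar dates can be sketched as a small formatting helper. The function name `timex_value` is hypothetical, and the sketch covers only the year-month-day pattern, not the additional tokens (decades, centuries, seasons) mentioned above.

```python
def timex_value(year=None, month=None, day=None):
    """Format a calendar date, writing X for every unknown component."""
    y = f"{year:04d}" if year is not None else "XXXX"
    m = f"{month:02d}" if month is not None else "XX"
    d = f"{day:02d}" if day is not None else "XX"
    return f"{y}-{m}-{d}"


print(timex_value(month=6))        # a sunny day in June
print(timex_value(1914, 7, 28))    # a fully specified date
```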

Annotation Process

Inforex and KPWr
Inforex (Marcińczuk, Kocoń, & Broda, 2012) is a web-based system designed for managing and annotating text corpora at the semantic level, including the annotation of Named Entities (NE), anaphora, Word Sense Disambiguation (WSD) and relations between named entities. The system also supports manual text clean-up and automatic text pre-processing, including text segmentation, morphosyntactic analysis and word selection for WSD annotation.
KPWr, Korpus Języka Polskiego Politechniki Wrocławskiej (Polish Corpus of Wroclaw University of Technology) (Broda, Marcińczuk, Maziarz, Radziszewski, & Wardyński, 2012; Radziszewski, Maziarz, & Wieczorek, 2012), is a corpus of written and spoken documents available under a Creative Commons license. Documents are divided into 14 categories and annotated at the level of shallow syntactic chunks, selected predicate-argument relations, named entities, relations between named entities, anaphora relations, word senses, temporal expressions and events.

Annotation Statistics
Currently all documents from KPWr (1637) are annotated with temporal expressions. We compared the statistics obtained after the annotation process with other publicly available corpora annotated using the TIMEX3 annotation format. The evaluation data was prepared for the evaluation tasks of the Semantic Evaluation 2013 Workshop. One of the proposed tasks was TempEval-3. The aim of that task was to advance research on temporal information processing, which could increase the quality of applications like question answering, textual entailment, text summarization, etc. It was a three-part task covering event, temporal expression and temporal relation extraction, using the complete TimeML specification for English and Spanish. The result of the evaluation is presented in (UzZaman, Llorens, Allen, Derczynski, Verhagen, & Pustejovsky, 2012). Table 5 shows the result of the analysis of the dataset provided by the SemEval 2013 organizers, compared to KPWr data. The full description of these corpora is available in (UzZaman et al., 2012).
The results of the corpora analysis show that the distribution of temporal expressions in the corpora for both English and Spanish is similar to that in KPWr. It is worth mentioning that the T3Silver corpus is a machine-annotated and automatically merged dataset based on the outputs of multiple systems (UzZaman et al., 2012), whereas each document in KPWr was annotated manually with temporal expressions.

Agreement
The inter-annotator agreement was measured on 100 randomly selected documents from KPWr, annotated by two domain experts. We used the positive specific agreement (Hripcsak & Rothschild, 2005), the same measure that was used for the T3Platinum corpus (UzZaman et al., 2012). We calculated the value of positive specific agreement (PSA) for each category. The results are presented in Table 6.
According to (UzZaman et al., 2012), the best quality of data was achieved for the TempEval-3 platinum corpus (T3Platinum), which was annotated and reviewed by the organizers. Every file was annotated independently by at least two expert annotators. The overall T3Platinum inter-annotator positive specific agreement (PSA) at the level of annotating temporal expressions with types was 0.88. In our case, for 100 randomly selected documents, the PSA value achieved was 86.25% (annotating using the PLIMEX 1.0 specification). We manage to achieve positive specific agreement at the level of at least 0.9 in PLIMEX 2.0.
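For reference, positive specific agreement as defined by Hripcsak and Rothschild (2005) is 2a / (2a + b + c), where a is the number of annotations made by both annotators, and b and c are the numbers made only by the first or only by the second annotator. The sketch below implements this formula; the counts in the usage example are invented for illustration and are not the values from Table 6.

```python
def positive_specific_agreement(both, only_1, only_2):
    """PSA = 2a / (2a + b + c) for counts a=both, b=only_1, c=only_2."""
    denominator = 2 * both + only_1 + only_2
    if denominator == 0:
        raise ValueError("no annotations to compare")
    return 2 * both / denominator


# Invented counts, for illustration only.
psa = positive_specific_agreement(both=90, only_1=10, only_2=10)
print(f"{psa:.2f}")
```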

Conclusions and Further Work
In this paper we described the PLIMEX 1.0 specification for annotating Polish text documents with temporal expressions. We compared the annotated corpus with other corpora annotated using similar methods of describing temporal information. We evaluated the quality of the specification by annotating the whole KPWr corpus (1637 documents), and we calculated the positive specific agreement on a randomly selected subset of 100 documents annotated by two domain experts, achieving a very promising result. The next step is to construct a machine learning system which will use KPWr as the evaluation data (a source of both training and test datasets). At the same time we plan to construct a rule-based system to normalize the recognized temporal expressions and to evaluate the result using normalized values added to KPWr by domain experts.

(37) Spotkali się [dwie godziny] później (They met [two hours] later)

Control Questions
1. Are you able to determine the location in time or the duration of an event without referring to another event? NO: do not annotate. YES: go to 2.
2. Can the value of the temporal expression be precisely determined? NO: go to 3. YES: annotate as one of the following: DATE, TIME, DURATION, SET.
3. Does the expression contain an obvious type of time quantification (time unit)? NO: do not annotate. YES: annotate.
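The control questions form a small decision procedure, sketched below. The three boolean arguments stand for the annotator's answers to questions 1-3; the function and its return strings are illustrative, since answering the questions themselves is a human judgement that this sketch does not automate.

```python
def control_questions(locatable_without_other_event, precisely_determinable, has_time_unit):
    """Apply the three control questions in order and return the decision."""
    if not locatable_without_other_event:      # question 1
        return "do not annotate"
    if precisely_determinable:                 # question 2
        return "annotate as DATE, TIME, DURATION or SET"
    if has_time_unit:                          # question 3
        return "annotate"
    return "do not annotate"


# po wojnie (after the war): depends on another event.
print(control_questions(False, False, False))
# [dwa dni] (two days): locatable and precise.
print(control_questions(True, True, True))
```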

(48) Pierwsza wojna światowa wybuchła [28 lipca 1914 roku] (The First World War broke out on [July 28, 1914])
(49) Przysłał nam swoje zdjęcia z [lat dziewięćdziesiątych dwudziestego wieku] (He sent us his pictures from [the 1990's])
Expressions which are context-dependent do not contain enough information to obtain the global semantics. Only a contextual analysis of these expressions makes it possible to determine an exact point in time:
(50) Otwarcie gospodarcze stało się faktem na początku [lat dziewięćdziesiątych] (The opening up of the economy became a reality at the beginning of [the 90's])
(51) Byłem [wczoraj] w kinie na ciekawym filmie ([Yesterday] I was at the cinema for an interesting film)

Figure 4: Go to the Annotator bookmark, select one of the documents, and in the left menu set the display of TIMEX3 tokens (View configuration -> click on the drop-down menu TimeML -> select right for TIMEX3 -> press the Apply button at the bottom of the menu).

Figure 5: Highlight the desired string using the cursor and select the proper type of the temporal expression (t3_date, t3_time, t3_duration, t3_set) in the Annotation pad (in the right menu).

Figure 6: Go to the Annotation lemmas bookmark, select t3_date, t3_time, t3_duration and t3_set in the right menu, and then press the Apply button at the bottom of the menu.

Figure 7: Enter the normalized temporal expressions in the proper fields.

Figure 8: Use the flags at the top of the page to set the status of the document annotation process.

Table 1: Possible grammatical categories of a temporal expression's constituents.

Table 2: The most common occurrences of temporal expressions as constituents of prepositional phrases.

Table 3: Examples of temporal expressions with modifiers as their constituents.

Example ISO-8601 notations by type:
• calendar date: YYYY-MM-DD,
• week of the year: YYYY-Wxx,
• hour: hh:mm:ss,
• date and hour: YYYY-MM-DDThh:mm:ss,
• duration (the number of weeks): PxW.
The table below shows example values of the VAL attribute and the semantic meaning (Mazur, 2012):

Table 4: Examples of VAL attributes and the semantic meaning of temporal expressions.
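The ISO-8601 notations listed with Table 4 can be produced with Python's standard `datetime` module, as in the sketch below. The sample moment and the helper `weeks_duration` are chosen for illustration only.

```python
from datetime import datetime

moment = datetime(2014, 12, 9, 20, 30, 0)  # illustrative sample moment

calendar_date = moment.date().isoformat()        # YYYY-MM-DD
iso_year, iso_week, _weekday = moment.isocalendar()
week_of_year = f"{iso_year}-W{iso_week:02d}"     # YYYY-Wxx
time_of_day = moment.time().isoformat()          # hh:mm:ss
date_and_time = moment.isoformat()               # YYYY-MM-DDThh:mm:ss


def weeks_duration(weeks):
    """Duration expressed as a number of weeks: PxW."""
    return f"P{weeks}W"


print(calendar_date, week_of_year, time_of_day, date_and_time, weeks_duration(2))
```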

Table 5: The result of the analysis of the dataset provided by the SemEval 2013 organizers, compared to KPWr data. ALL is the sum of DATEs, TIMEs, DURATIONs and SETs in each corpus. SUM is the number of DATEs, TIMEs, DURATIONs and SETs in all corpora except KPWr. % is the share of all temporal expressions covered by a particular type. Languages: EN - English, SP - Spanish, PL - Polish.

Table 6: The value of positive specific agreement (PSA) calculated on the subset of 100 documents from KPWr, annotated independently by two domain experts using the PLIMEX 1.0 guidelines. 1 and 2 is the number of annotations on which annotators 1 and 2 agreed. Only 1 is the number of annotations made only by annotator 1, and only 2 is the number of annotations made only by annotator 2.