DOI: https://doi.org/10.11649/cs.2015.020

Temporal Expressions in Polish Corpus KPWr

Jan Kocoń, Michał Marcińczuk, Marcin Oleksy, Tomasz Bernaś, Michał Wolski

Abstract

This article presents the results of recent research on the interpretation of Polish expressions that refer to time. These expressions are a source of information about when something happens, how often it occurs, or how long it lasts. Temporal information, which can be extracted from text automatically, plays a significant role in many information extraction applications, such as question answering, discourse analysis, and event recognition. We prepared PLIMEX, a broad description of Polish temporal expressions with annotation guidelines, based on state-of-the-art solutions for English, mainly the TimeML specification. We also adapted the solution to capture the local semantics of temporal expressions (LTIMEX). The temporal description also supports further event identification and extends the event description model, focusing on anchoring events in time, ordering events, and reasoning about the persistence of events. We prepared a specification designed to address these issues, and we used our annotation guidelines to annotate all documents in the Polish Corpus of Wrocław University of Technology (KPWr).
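To make the three kinds of temporal information (when, how often, how long) concrete, the following sketch represents temporal expressions as TimeML-style TIMEX3 annotations. It assumes only the TIMEX3 types defined in TimeML (DATE, TIME, DURATION, SET) and its ISO 8601-based value notation; it is not the PLIMEX or LTIMEX scheme itself. The `normalize` function and its lookup tables are hypothetical toy examples that resolve a few fixed Polish phrases (wczoraj "yesterday", jutro "tomorrow", przez trzy dni "for three days", co tydzień "every week", codziennie "daily") against an assumed document creation time (`dct`). A real tagger would have to handle inflection and a far wider range of patterns.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# TIMEX3 types as defined in the TimeML specification.
TIMEX3_TYPES = {"DATE", "TIME", "DURATION", "SET"}


@dataclass(frozen=True)
class Timex3:
    text: str   # surface form as it appears in the document
    type: str   # one of TIMEX3_TYPES
    value: str  # normalized value in TimeML's ISO 8601-based notation

    def __post_init__(self) -> None:
        if self.type not in TIMEX3_TYPES:
            raise ValueError(f"unknown TIMEX3 type: {self.type}")


def normalize(text: str, dct: date) -> Optional[Timex3]:
    """Toy normalizer for a few Polish phrases (hypothetical lexicon).

    Deictic dates are resolved relative to the document creation time (dct).
    Returns None for phrases outside the lookup tables.
    """
    key = text.lower().strip()

    # "When": deictic dates, anchored by a day offset from the DCT.
    day_offsets = {"wczoraj": -1, "dzisiaj": 0, "jutro": 1}
    if key in day_offsets:
        day = dct + timedelta(days=day_offsets[key])
        return Timex3(text, "DATE", day.isoformat())

    # "How long": durations use the P<n><unit> notation (D = day, W = week).
    durations = {"przez trzy dni": "P3D", "przez dwa tygodnie": "P2W"}
    if key in durations:
        return Timex3(text, "DURATION", durations[key])

    # "How often": recurring expressions are annotated as SETs.
    recurrences = {"co tydzień": "P1W", "codziennie": "P1D"}
    if key in recurrences:
        return Timex3(text, "SET", recurrences[key])

    return None
```

For example, with `dct = date(2015, 3, 10)`, `normalize("wczoraj", dct)` yields a DATE annotation with value `2015-03-09`, while `normalize("co tydzień", dct)` yields a SET with value `P1W`. Anchoring relative expressions such as wczoraj to the document creation time is what allows the events mentioned alongside them to be placed on a timeline.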


Keywords


PLIMEX; TIMEX; temporal expressions; TimeML; machine learning; natural language processing; information extraction

References


Allen, J. (1995). Natural language understanding (2nd ed.). Redwood City, CA: Benjamin Cummings Publishing Co., Inc.

Andersen, P. M., Hayes, P. J., Huettner, A. K., Schmandt, L. M., Nirenburg, I. B., & Weinstein, S. P. (1992). Automatic extraction of facts from press releases to generate news stories. In Proceedings of the Third Conference on Applied Natural Language Processing (pp. 170–177). Stroudsburg, PA: Association for Computational Linguistics. http://doi.org/10.3115/974499.974531

Van Benthem, J. F. A. K. (1983). The logic of time: A model-theoretic investigation into the varieties of temporal ontology and temporal discourse. Dordrecht: Springer Netherlands. http://link.springer.com/10.1007/978-94-010-9868-7

Broda, B., Marcińczuk, M., Maziarz, M., Radziszewski, A., & Wardyński, A. (2012, May). KPWr: Towards a free corpus of Polish. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, & S. Piperidis (Eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). Istanbul, Turkey: European Language Resources Association (ELRA).

Busemann, S., Declerck, T., Diagne, A. K., Dini, L., Klein, J., & Schmeier, S. (1997). Natural language dialogue service for appointment scheduling agents. In Proceedings of the Fifth Conference on Applied Natural Language Processing (pp. 25–32). Stroudsburg, PA: Association for Computational Linguistics. http://doi.org/10.3115/974557.974563

Daniel, N., Radev, D., & Allison, T. (2003). Sub-event based multi-document summarization. In Proceedings of the HLT-NAACL 03 on text summarization workshop (Vol. 5, pp. 9–16). Stroudsburg, PA: Association for Computational Linguistics. http://doi.org/10.3115/1119467.1119469

Ferro, L. (2001). Instruction manual for the annotation of temporal expressions.

Filatova, E. & Hovy, E. (2001). Assigning time-stamps to event-clauses. In Proceedings of the workshop on temporal and spatial information processing (Vol. 13, pp. 13:1–13:8). Stroudsburg, PA: Association for Computational Linguistics. http://doi.org/10.3115/1118238.1118250

Han, B., Gates, D., & Levin, L. (2006). Understanding temporal expressions in emails. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of The Association of Computational Linguistics (pp. 136–143). Stroudsburg, PA: Association for Computational Linguistics. http://doi.org/10.3115/1220835.1220853

Hripcsak, G. & Rothschild, A. S. (2005). Agreement, the f-measure, and reliability in information retrieval. Journal of the American Medical Informatics Association, 12(3), 296–298. http://doi.org/10.1197/jamia.M1733

Laskowski, R. (2003). Wyrażenia przyimkowe o funkcji temporalnej w języku polskim [Prepositional expressions with a temporal function in Polish]. Studia Slavica Oldenburgensia, 11, 193–226.

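Laskowski, R. (2005). Temporalne frazy przyimkowe o funkcji prospektywnej i retrospektywnej [Temporal prepositional phrases with a prospective and retrospective function]. In M. Grochowski (Ed.), Przysłówki i przyimki: Studia ze składni i semantyki języka polskiego [Adverbs and prepositions: Studies in the syntax and semantics of Polish] (pp. 209–225). Toruń: Wydawnictwo Uniwersytetu Mikołaja Kopernika.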

Llorens, H., Saquete, E., & Navarro-Colorado, B. (2010). TimeML events recognition and classification: Learning CRF models with semantic roles. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 725–733). Stroudsburg, PA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1873781.1873863

Mani, I. & Wilson, G. (2000). Robust temporal processing of news. In Proceedings of the 38th annual meeting on Association for Computational Linguistics (pp. 69–76). Stroudsburg, PA: Association for Computational Linguistics. http://doi.org/10.3115/1075218.1075228

Marcińczuk, M. & Kocoń, J. (2013, August). Recognition of named entities boundaries in Polish texts. In Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing (pp. 94–99). Sofia: Association for Computational Linguistics. http://www.aclweb.org/anthology/W13-2414

Marcińczuk, M., Kocoń, J., & Broda, B. (2012, May). Inforex: A web-based tool for text corpus management and semantic annotation. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, & S. Piperidis (Eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). Istanbul, Turkey: European Language Resources Association (ELRA).

Marcińczuk, M., Kocoń, J., & Janicki, M. (2013). Liner2 — a customizable framework for proper names recognition for Polish. In R. Bembenik, L. Skonieczny, H. Rybiński, M. Kryszkiewicz, & M. Niezgódka (Eds.), Intelligent tools for building a scientific information platform (pp. 231–253). Berlin: Springer. (Studies in Computational Intelligence, 467). http://link.springer.com/10.1007/978-3-642-35647-6_17

Mazur, P. (2012). Broad-coverage rule-based processing of temporal expressions (Unpublished doctoral dissertation). Politechnika Wrocławska, Wrocław.

Mizobuchi, S., Sumitomo, T., Fuketa, M., & Aoe, J.-I. (1998, October). A method for understanding time expressions. In Proceedings of the 1998 IEEE International Conference on Systems, Man, and Cybernetics (Vol. 2, pp. 1151–1155). http://doi.org/10.1109/ICSMC.1998.727858

Negri, M. & Marseglia, L. (2005). Recognition and normalization of time expressions: ITC-irst at TERN 2004.

Niemi, J. & Koskenniemi, K. (2007). Representing calendar expressions with finite-state transducers that bracket periods of time on a hierarchical timeline. In J. Nivre, H.-J. Kaalep, K. Muischnek, & M. Koit (Eds.), Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007 (pp. 355–362). Tartu, Estonia.

Pustejovsky, J., Ingria, B., Saurí, R., Castaño, J., Littman, J., Gaizauskas, R., Setzer, A., Katz, G., & Mani, I. (2005). The specification language TimeML. In I. Mani (Ed.), The language of time: A reader (pp. 545–557). Oxford: Oxford University Press.

Pustejovsky, J., Knippen, R., Littman, J., & Saurí, R. (2005). Temporal and event information in natural language text. Language Resources and Evaluation, 39(2–3), 123–164. http://doi.org/10.1007/s10579-005-7882-7

Radziszewski, A., Maziarz, M., & Wieczorek, J. (2012). Shallow syntactic annotation in the Corpus of Wrocław University of Technology. Cognitive Studies | Études cognitives, 12, 129–147.

Saquete, E., Muñoz, R., & Martínez-Barco, P. (2003). TERSEO: Temporal Expression Resolution System Applied to Event Ordering. In V. Matoušek & P. Mautner (Eds.), Text, speech and dialogue (pp. 220–228). Berlin: Springer. (Lecture Notes in Computer Science, 2807). http://dx.doi.org/10.1007/978-3-540-39398-6_31

Saurí, R., Littman, J., Knippen, B., Gaizauskas, R., Setzer, A., & Pustejovsky, J. (2006). TimeML annotation guidelines, version 1.2.1.

Schilder, F. (2004). Extracting meaning from temporal nouns and temporal prepositions. ACM Transactions on Asian Language Information Processing (TALIP), 3(1), 33–50. http://doi.org/10.1145/1017068.1017071

Schilder, F. & Habel, C. (2001). From temporal expressions to temporal information: Semantic tagging of news messages. In Proceedings of the ACL-2001 Workshop on Temporal and Spatial Information Processing, Toulouse, France (Vol. 13, pp. 65–72). Stroudsburg, PA: Association for Computational Linguistics. http://doi.org/10.3115/1118238.1118247

Smith, C. (2009). Temporal structures in discourse. In R. P. Meier, H. Aristar-Dry, & E. Destruel (Eds.), Text, time, and context (pp. 285–302). Dordrecht: Springer Netherlands. (Studies in Linguistics and Philosophy, 87). http://link.springer.com/10.1007/978-90-481-2617-0_12

Strötgen, J., Zell, J., & Gertz, M. (2013, June). HeidelTime: Tuning English and developing Spanish resources for TempEval-3. In Second Joint Conference on Lexical and Computational Semantics (SEM) (Vol. 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pp. 15–19). Atlanta, Georgia: Association for Computational Linguistics. http://www.aclweb.org/anthology/S13-2003

Strötgen, J. & Gertz, M. (2013). Multilingual and cross-domain temporal tagging. Language Resources and Evaluation, 47(2), 269–298. http://doi.org/10.1007/s10579-012-9179-y

UzZaman, N., Llorens, H., Allen, J. F., Derczynski, L., Verhagen, M., & Pustejovsky, J. (2012). TempEval-3: Evaluating events, time expressions, and temporal relations. CoRR, abs/1206.5333. http://arxiv.org/abs/1206.5333

Vicente-Díez, M. T., Samy, D., & Martínez, P. (2008). An empirical approach to a preliminary successful identification and resolution of temporal expressions in Spanish news corpora. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC) (pp. 2153–2158).




Copyright (c) 2015 Jan Kocoń, Michał Marcińczuk, Marcin Oleksy, Tomasz Bernaś, Michał Wolski

License URL: http://creativecommons.org/licenses/by/3.0/pl/