DOI: https://doi.org/10.11649/cs.2015.018

Towards an event annotated corpus of Polish

Michał Marcińczuk, Marcin Oleksy, Tomasz Bernaś, Jan Kocoń, Michał Wolski

Abstract


The paper presents a typology of events built on the basis of the TimeML specification and adapted to the Polish language. Some changes were introduced to the definitions of the event categories, and a motivation for event categorization was formulated. The event annotation task is presented on two levels: the ontology level (language-independent) and the level of text mentions (language-dependent). The various types of event mentions in Polish texts are discussed. A procedure for annotating event mentions in Polish texts is presented and evaluated. In the evaluation, a randomly selected set of documents from the Corpus of Wrocław University of Technology (KPWr) was annotated by two linguists, and the inter-annotator agreement was calculated. The evaluation was done in two iterations. After the first iteration, we revised and improved the annotation procedure, and the second evaluation showed a significant improvement in the agreement between the annotators. The current work focuses on the annotation and categorization of event mentions in text. Future work will focus on describing events with a set of attributes, arguments and relations.
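To make the agreement calculation more concrete, here is a minimal Python sketch of one common way to compare two annotators' event mentions. It assumes, hypothetically, that each mention is identified by a document id, a token span and an event category, and it uses the pairwise F-measure described by Hripcsak & Rothschild (2005), which suits annotation tasks where the set of negative cases is not well defined. The exact measure and matching criteria used in the paper may differ. The names `Mention`, `agreement_f1` and `strip_category`, the category labels and the toy data are illustrative only.

```python
from typing import Set, Tuple

# Hypothetical representation of an annotated event mention:
# (document id, first token index, last token index, event category).
Mention = Tuple[str, int, int, str]
Span = Tuple[str, int, int]


def agreement_f1(a: Set[tuple], b: Set[tuple]) -> float:
    """Pairwise F-measure between two annotators' sets of mentions.

    With exact matching, F = 2 * |A & B| / (|A| + |B|). The value is
    symmetric, so neither annotator has to be treated as the gold standard.
    """
    if not a and not b:
        return 1.0  # assumed convention: two empty annotations agree fully
    return 2 * len(a & b) / (len(a) + len(b))


def strip_category(mentions: Set[Mention]) -> Set[Span]:
    """Drop the category to measure agreement on mention boundaries only."""
    return {(doc, start, end) for doc, start, end, _ in mentions}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from KPWr.
    annotator_a = {
        ("doc1", 3, 3, "occurrence"),
        ("doc1", 7, 8, "state"),
        ("doc2", 1, 1, "reporting"),
    }
    annotator_b = {
        ("doc1", 3, 3, "occurrence"),
        ("doc1", 7, 8, "occurrence"),
        ("doc2", 5, 5, "occurrence"),
    }
    strict = agreement_f1(annotator_a, annotator_b)
    spans_only = agreement_f1(strip_category(annotator_a),
                              strip_category(annotator_b))
    print(f"spans + categories: {strict:.3f}")
    print(f"spans only:         {spans_only:.3f}")
```

Running it prints 0.333 for span-plus-category agreement and 0.667 for span-only agreement: the two annotators mark the same boundaries for two of three mentions but agree on the category for only one of them. Reporting both scores separates disagreement about where an event mention is from disagreement about how it is categorized.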


Keywords


information extraction; event recognition; corpus annotation

References


Agerri, R., Agirre, E., Aldabe, I., Altuna, B., Beloki, Z., Laparra, E., de Lacalle, M. L., Rigau, G., Soroa, A., & Urizar, R. (2014). NewsReader project. In 30th Conference of the Spanish Society for Natural Language Processing (SEPLN).

Apresjan, J. D. (2000). Semantyka leksykalna: Synonimiczne środki języka. (Z. Kozłowska & A. Markowski, Trans.). Warszawa.

Bach, E. (1986). The algebra of events. Linguistics and Philosophy, 9, 5–16.

Bittar, A. (2010). Building a TimeBank for French: A Reference Corpus Annotated According to the ISO-TimeML Standard (Unpublished PhD thesis). Université Paris Diderot.

Broda, B., Marcińczuk, M., Maziarz, M., Radziszewski, A., & Wardyński, A. (2012). KPWr: Towards a free corpus of Polish. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, & S. Piperidis (Eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). Istanbul: European Language Resources Association (ELRA).

Caselli, T., Bartalesi Lenzi, V., Sprugnoli, R., Pianta, E., & Prodanof, I. (2011). Annotating events, temporal expressions and relations in Italian: The It-TimeML Experience for the Ita-TimeBank. In Proceedings of the 5th Linguistic Annotation Workshop, LAW V ’11 (pp. 143–151). Stroudsburg, PA, USA: Association for Computational Linguistics.

Comrie, B. (1989). Aspect: An introduction to the study of verbal aspect and related problems. Cambridge: Cambridge University Press.

Hripcsak, G. & Rothschild, A. S. (2005). Technical brief: Agreement, the f-measure, and reliability in information retrieval. Journal of the American Medical Informatics Association, 12(3), 296–298. http://dx.doi.org/10.1197/jamia.M1733

Jędrzejko, E. (1993). Nominalizacje w systemie i w tekstach współczesnej polszczyzny. Katowice: Uniwersytet Śląski. (Prace naukowe Uniwersytetu Śląskiego w Katowicach, 1335)

Jędrzejko, E. (2011). The problematics of describing periphrastic predication: Between word and image. Studies in Polish Linguistics, 6, 27–44.

Jespersen, O. (1965). A modern English grammar – on historical principles (Pt. 6: Morphology). London: Read Books.

Jodłowski, S. (1976). Podstawy polskiej składni. Warszawa: PWN.

Kenny, A. (1963). Actions, Emotions and Will. London: Routledge & Kegan Paul.

Kotsyba, N. (2014). How light are aspectual meanings? A study of the relation between light verbs and lexical aspects in Ukrainian. In K. Robering (Ed.), Events, arguments, and aspects: Topics in the semantics of verbs (pp. 261–299). Amsterdam: John Benjamins Publishing Company. (Studies in Language Companion Series, 152). Retrieved from https://benjamins.com/catalog/slcs.152.07kot

Langacker, R. W. (2010). Control and the mind/body duality: Knowing vs. effecting. In E. Tabakowska, M. Choiński, & Ł. Wiraszka (Eds.), Cognitive linguistics in action: From theory to application and back (pp. 165–207). Berlin: Mouton de Gruyter. (Applications of Cognitive Linguistics, 14)

Laskowski, R. (1998). Kategorie morfologiczne języka polskiego – charakterystyka funkcjonalna. In R. Grzegorczykowa, R. Laskowski, & H. Wróbel (Eds.), Gramatyka współczesnego języka polskiego: Morfologia. Warszawa: PWN.

Lyons, J. (1977). Semantics (Vol. 1). Cambridge: Cambridge University Press.

Marcińczuk, M., Kocoń, J., & Broda, B. (2012). Inforex – a web-based tool for text corpus management and semantic annotation. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, & S. Piperidis (Eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). Istanbul: European Language Resources Association (ELRA).

Maybury, M. T. (1995). Generating summaries from event data. Information Processing & Management, 31(5), 735–751. http://dx.doi.org/10.1016/0306-4573(95)00025-C

Mourelatos, A. P. D. (1978). Events, processes, and states. Linguistics and Philosophy, 2(3), 415–434. http://dx.doi.org/10.1007/BF00149015

Pease, A. (2011). Ontology: A practical guide. Angwin, CA: Articulate Software Press.

Piasecki, M., Szpakowicz, S., & Broda, B. (2009). A wordnet from the ground up. Wrocław: Oficyna Wydawnicza Politechniki Wrocławskiej.

Radziszewski, A. (2013). A tiered CRF tagger for Polish. In H. Rybiński, M. Kryszkiewicz, M. Niezgódka, R. Bembenik, & Ł. Skonieczny (Eds.), Intelligent tools for building a scientific information platform: Advanced architectures and solutions. Berlin: Springer Verlag. Retrieved from http://link.springer.com/10.1007/978-3-642-35647-6_16

Ryle, G. (1949). The Concept of Mind. London: Barnes & Noble.

Saurí, R., Batiukova, O., & Pustejovsky, J. (n.d.). Annotating events in Spanish: TimeML annotation guidelines.

Saurí, R. & Pustejovsky, J. (n.d.). Annotating events in Catalan: TimeML annotation guidelines.

Saurí, R., Littman, J., Knippen, B., Gaizauskas, R., Setzer, A., & Pustejovsky, J. (2006). TimeML Annotation Guidelines, Version 1.2.1.

Seibt, J. (2004). Process theories: Crossdisciplinary studies in dynamic categories. Studies in Philosophy and Religion. Dordrecht: Springer Netherlands.

Topolińska, Z. (1984). Składnia grupy imiennej. In Z. Topolińska (Ed.), Gramatyka współczesnego języka polskiego (pp. 301–384). Warszawa.

van Erp, M., Fokkens, A., & Vossen, P. (2014). Finding stories in 1,784,532 events: Scaling up computational models of narrative. In Workshop on Computational Models of Narrative (CMN’14), Quebec City, Canada, July 31 – August 2.

Vendler, Z. (1957). Verbs and times. Philosophical Review, 66(2), 143–160. http://dx.doi.org/10.2307/2182371

Vossen, P., Rigau, G., Serafini, L., Stouten, P., Irving, F., & van Hage, W. (2014). NewsReader: Recording history from daily news streams. In Proceedings of the 9th Language Resources and Evaluation Conference (LREC2014), Reykjavik, Iceland, May 26–31.

Zolotova, G. A., Onipenko, N. K., & Sidorova, M. I. (1999). Kommunikativnaia grammatika russkogo jazyka. Moskva: RAN.




Copyright (c) 2015 Michał Marcińczuk, Marcin Oleksy, Tomasz Bernaś, Jan Kocoń, Michał Wolski

License URL: http://creativecommons.org/licenses/by/3.0/pl/