DOI: https://doi.org/10.11649/cs.1717

Persistent semantic identity in WordNet

Eric Kafe

Abstract



Although rarely studied, the persistence of semantic identity in the WordNet lexical database is crucial for the interoperability of all the resources that use WordNet data. The present study investigates the stability of the two primary entities of the WordNet database (the word senses and the synonym sets) by following their respective identifiers (the sense keys and the synset offsets) across all the versions released between 1995 and 2012, while also considering "drifts" of identical definitions and semantic relations. Contrary to expectations, 94.4% of the WordNet 1.5 synsets persisted in the latest version (2012), compared to only 89.1% of the corresponding sense keys. Meanwhile, the splits and merges between synonym sets remained few and simple. These results are presented in tables that make it possible to estimate the lexicographic effort needed to update WordNet-based resources to newer WordNet versions. We discuss the specific challenges faced by both the dominant synset-based mapping paradigm (a moderate number of split synsets) and the recommended sense key-based approach (very few identity violations), and conclude that stable synset identifiers are viable but need to be complemented by stable sense keys in order to handle the split synonym sets adequately.

 

Trwała tożsamość semantyczna w WordNecie (Persistent semantic identity in WordNet)

Although rarely studied, the persistence of semantic identity in the WordNet lexical database is of key importance for the interoperability of all resources that use WordNet data. This work examines the stability of the two basic elements of the WordNet database (lexical units and synsets, i.e. sets of synonymous lexical units) by tracing their identifiers (lexical unit identifiers and synset identifiers) across all versions released between 1995 and 2012. Shifts of identical definitions and semantic relations were also taken into account. Contrary to expectations, 94.4% of the WordNet 1.5 synsets were preserved in the latest version from 2012, compared to 89.1% of the corresponding lexical unit identifiers. Meanwhile, splits and merges between synsets remained simple and few. The results are presented in tables that make it possible to estimate the lexicographic effort needed to update WordNet-based resources to newer WordNet versions. We discuss the specific challenges facing both the dominant synset mapping paradigm (a moderate number of split synsets) and the recommended approach based on lexical unit identifiers (very few identity violations), and conclude that stable synset identifiers can be created, but they must go hand in hand with stable lexical unit identifiers in order to handle split synsets adequately.


Keywords


wordnets; semantic identifiers; sense keys; key violations; synsets; mappings






Copyright (c) 2018 Eric Kafe

License URL: http://creativecommons.org/licenses/by/3.0/pl/