DOI: https://doi.org/10.11649/cs.1712

Inside Baseball: Coverage, quality, and culture in the Global WordNet

Martin Benjamin

Abstract


The Global WordNet is succeeding in producing relatively open linguistic data that is coordinated, to a degree, among numerous languages. The project has grown organically, with no overall plan or direction. The result is a certain amount of incoherence in determining which items should be treated in wordnets and how the various wordnets should aspire to consistent quality. Using the example of terms related to baseball, which constitute a non-trivial portion of the Princeton WordNet, this paper discusses problems of coverage selection both for English and for other languages, as well as methods to improve quality and depth through public review of current content and contribution of missing terms and definitions. It is proposed that proper names be removed entirely from WordNet and treated as a separate project, and that individual languages produce annexes of indigenous concepts that can be readily considered within sister projects as a supplement to the Anglo-American weighting of the current endeavor. To produce a consistent product that conveys mutually intelligible meaning at a high level across languages, it is proposed that an open committee of interested stakeholders convene to consider the project's goals and develop a roadmap for achieving them.

 

Baseball for the Advanced: Lexical coverage, quality, and culture in the Global WordNet

The Global WordNet is successfully creating relatively open linguistic data that is, to some degree, linked across many languages. The project has a life of its own, with no overall plan or direction. The result is a certain inconsistency in determining which items should be included in wordnets and how the various wordnets should strive to maintain the same quality. Using the example of baseball-related terms, which occupy a considerable part of the Princeton WordNet, this article discusses problems of selecting lexical coverage both for English and for other languages, as well as methods of improving quality through public review of current content and the addition of missing terms and definitions. It is proposed that proper names be removed entirely from WordNet and treated as a separate project, and that annexes of indigenous concepts be created for individual languages, which can be considered within sister projects as a supplement to the current Anglo-American undertaking. To create a coherent product characterized by a high level of mutual intelligibility across languages, it is proposed that an open committee of interested stakeholders be convened to consider the project's goals and develop a plan of action for achieving them.


Keywords


wordnet; lexicography; vocabulary; named entities; multilingual

Copyright (c) 2018 Martin Benjamin

License URL: http://creativecommons.org/licenses/by/3.0/pl/