PERSISTENT SEMANTIC IDENTITY IN WORDNET

Although rarely studied, the persistence of semantic identity in the WordNet lexical database is crucial for the interoperability of all the resources that use WordNet data. The present study investigates the stability of the two primary entities of the WordNet database (the word senses and the synonym sets) by following their respective identifiers (the sense keys and the synset offsets) across all the versions released between 1995 and 2012, while also considering drifts of identical definitions and semantic relations. Contrary to expectations, 94.4% of the WordNet 1.5 synsets still persisted in the latest 2012 version, compared to only 89.1% of the corresponding sense keys. Meanwhile, the splits and merges between synonym sets remained few and simple. These results are presented in tables that make it possible to estimate the lexicographic effort needed for updating WordNet-based resources to newer WordNet versions. We discuss the specific challenges faced by both the dominant synset-based mapping paradigm (a moderate number of split synsets) and the recommended sense key-based approach (very few identity violations), and conclude that stable synset identifiers are viable, but need to be complemented by stable sense keys in order to adequately handle the split synonym sets.


Sense keys and synset offsets
Wordnets cover an increasing number of languages, and interoperate by using identifiers from the Princeton WordNet (PWN) lexical database (Fellbaum, 1998). PWN represents words by their generic form (lemma), grouping words that share the same meaning in synonym sets (synsets), so that different senses of the same lemma belong to different synsets. Thus, in PWN, sense is equivalent to synset membership, which constitutes a mapping between words and the meaning of the unique synonym set that each particular word-sense pair belongs to.
While the identifier for each synonym set, the synset offset (WordNet-team, 2010, Wndb), changes between each version of the database, each individual word sense has a stable identifier (the sense key) which, in principle, does not change across different PWN versions. So, according to the WordNet manual, "A sense key is the best way to represent a sense in semantic tagging or other systems that refer to WordNet senses" (WordNet-team, 2010, Senseidx).
In the Entity-Relation Model (Chen, 1976), each database key needs to be a unique attribute, which allows the retrieval of a single database entity. Before WordNet 1.5SC (1995), a few sense keys were not unique (and thus not keys), but in all later versions each sense key denotes only one sense of each word. Each word sense is a member of one and only one synonym set, so each sense key maps to only one synset offset in a given WordNet version. Additionally, each synonym set contains one and only one sense of each word that shares this sense, i.e. each synset offset corresponds to only one sense key of each word.

Mappings and updates
Primary keys are used as external keys by foreign databases, so they need to denote the same object over time, to avoid violating the referential integrity of external links. However, instead of using the recommended sense keys as their foreign key, almost all semantic databases that use PWN data are still linked to PWN through the ever-changing synset offsets, and thus bound to one particular version of PWN. This choice of foreign database key makes upgrades of the WN links difficult, and hinders the interoperability between resources that are bound to different PWN versions, such as many large scale ontologies (Niles & Pease, 2003; Suchanek, Kasneci, & Weikum, 2008), semantic knowledge bases (Navigli & Ponzetto, 2010), and most foreign language wordnets (Bond et al., 2014).
On the other hand, the referential integrity of the recommended sense keys has never been investigated. Kafe (2012) noted that the default meaning, i.e. the lex_id 0 (WordNet-team, 2010, Senseidx), of the word "C" had changed from an "alphabetic character" to a "programming language", between PWN 1.6 and 3.0. This violation of sense identity was discovered by comparing the output of a sense key-based mapping (which produced a false target), with the mappings described in Daudé, Padró, and Rigau (2001), which ignored sense keys, and produced the correct mapping target. So far, no other such error has been reported, and since identity violations are not allowed in theory, their practical extent could be negligible. But this assumption lacks confirmation.
Also, updating foreign language wordnets to a newer version of PWN requires additional lexicographic efforts, because the changes (splits, merges, deletions) in the PWN synsets do not always correspond to the composition of the foreign language synonym sets. So, in order to improve the precision of the mappings when updating between PWN versions, foreign language lexicographers need an accurate picture of the changes that occurred between these versions. But previous analyses have been limited to one PWN source and target pair: WN 1.5-1.6 (Daudé et al., 2001), WN 1.6-3.0 (Gonzalez-Agirre, Laparra, & Rigau, 2012), WN 3.0-3.1 (Vossen, Bond, & McCrae, 2016).

The stability of WordNet identifiers
The present study investigates the stability of the two essential entities of the PWN databases (the word senses and the synonym sets), by following their respective identifiers (the sense keys and the synset offsets) across all modern versions, ranging from WordNet 1.5 to the latest WordNet 3.1.1 for SQL. When the sense keys are unique and persistent, they make it possible to observe their groupings in synonym sets across PWN versions, and to trace how these synsets evolve in the database over time. Even though synset offsets change between versions, we can follow the sense keys of their members, and obtain a precise inventory of all the splits, merges, additions and deletions that occurred between PWN versions, and thus estimate the lexicographic effort needed in order to achieve linguistically satisfying mappings.

k1 (4)
We study these identity violations by considering drifts of identical definitions in PWN updates:

Definitions
WN 1.6  c%1:10:00  a general-purpose programing language closely associated with the unix operating system
WN 1.7  c%1:10:00  the 3rd letter of the roman alphabet
WN 1.7  c%1:10:01  a general-purpose programing language closely associated with the unix operating system

In this example, the identity of the sense key c%1:10:00 is violated, because its definition changes so much that it does not define the same thing as previously. On the other hand, the identity of the new key c%1:10:01 is not violated, since this key did not exist previously, but the fact that its definition is identical to the source definition of the first key helps us discover the violation of that key.
We also study the same violation pattern in all the semantic relations included in the PWN data files, using a general two-step algorithm: find the set of persistent sense keys which are present in both the source and the target WN versions, and among the elements of this set, identify all drifts of semantic essence in the form of identical definitions or essential relations from a source key to any different target key of the same word, where the corresponding attribute of the source key is different in the target WN version. These two steps can also be applied in the reverse order, since the result is the intersection of two sets. But while there only exists one definition of the first set (SenseKeys_persist), several interpretations of SenseKeys_essencedrift are possible. In this study, we define essence drift as the drift of any of the following WN attributes: definitions, hypernyms, synonyms, antonyms, holonyms and pertainyms, because these relations emerge as more essential, after initially considering all WN relations.
Finding sense key identity violations: The output is a set of probable key violations that can be examined manually. For example, in the WN 1.7 update, the identity of c%1:10:00 is also violated by its hypernyms.
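The two-step search can be sketched in Python for the case of identical definitions. This is only an illustrative sketch: the function names and the dictionary layout (sense key to definition) are assumptions made for this example, and the same pattern would have to be repeated for each essential relation.

```python
def lemma_of(sense_key):
    """Return the lemma part of a sense key such as 'c%1:10:00'."""
    return sense_key.split("%", 1)[0]


def find_definition_drifts(source, target):
    """Return (violated_key, drift_target_key) pairs.

    source and target are hypothetical dictionaries mapping each
    sense key of one WN version to its definition.
    """
    # Step 1: sense keys present in both versions.
    persistent = source.keys() & target.keys()

    # Index the target definitions by (lemma, definition).
    by_lemma_def = {}
    for key, definition in target.items():
        by_lemma_def.setdefault((lemma_of(key), definition), set()).add(key)

    # Step 2: the definition of a persistent key has changed, and its
    # source definition reappears under a different key of the same word.
    violations = []
    for key in sorted(persistent):
        if source[key] == target[key]:
            continue
        candidates = by_lemma_def.get((lemma_of(key), source[key]), ())
        for other in sorted(candidates):
            if other != key:
                violations.append((key, other))
    return violations


PROGRAMMING = ("a general-purpose programing language closely "
               "associated with the unix operating system")
wn16 = {"c%1:10:00": PROGRAMMING}
wn17 = {
    "c%1:10:00": "the 3rd letter of the roman alphabet",
    "c%1:10:01": PROGRAMMING,
}
print(find_definition_drifts(wn16, wn17))
```

On the "C" example, the sketch reports c%1:10:00 as a probable violation, together with the new key c%1:10:01 that received its former definition. Such candidates still need manual examination.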

Sense and essence
To constitute an identity violation, a relation violation must violate the essence of a word sense, i.e. a necessary characteristic, which Aristotle distinguished from more contingent (or accidental) features. The Latin translators of Aristotle used the term essentia to designate this central metaphysical concept, which has generated considerable controversy and diverging terminology since then (Cohen, 2016). Most notably, the scholastic philosopher John Duns Scotus opposed the essence of an individual (haecceity, or haecceitas in Latin, literally "thisness"), and the related concept of quiddity (or quidditas, literally "whatness"), which is the essence common to a group (Cross, 2014). Although opposite according to Scotus, both terms are still synonyms in PWN 3.1, and hyponyms of {kernel, substance, core, center, centre, essence, gist, heart, heart and soul, inwardness, marrow, meat, nub, pith, sum, nitty-gritty}, and they share the same definition: "the essence that makes something the kind of thing it is and makes it different from any other". The notion of essence is necessary in order to identify violations of semantic identity, so these may remain elusive, until a precise and consensual definition of essence becomes available. Considering that this question has already remained open for almost twenty-five centuries, it seems safe to expect corresponding difficulties with the notion of semantic identity. In the meantime, the present study seeks to eschew the trap of a circular definition (violations of semantic identity are violations of essence and vice versa), by reviewing the violations of WordNet relations separately, in order to determine which ones violate the semantic identity of their arguments. As a practical example, even though the identity of the word "C" is violated, this does not affect the integrity of its hypernyms. The hypernyms express what "C" is, i.e. quiddity, which is a part of essence, while "C" does not add anything essential to the notion of "programming language". Thus, violations of superordinate relations like hypernymy are more likely to be perceived as essence violations, since they express a necessary characteristic shared by all narrower word senses, while hyponymy violations only affect a part of the word sense which is not essential, since different hyponyms do not need to share that characteristic.

The sense keys
After collapsing the part of speech and synset offset fields from the SKI database file into the 9-digit synset id format used in WNprolog (WordNet-team, 2010, Prologdb), we applied the built-in xtabs cross-tabulation function in the R statistical environment (R-team, 2017), to obtain a table containing all the PWN versions as columns and all the sense keys as rows, with the synset id corresponding to each sense key and each PWN version in the cells, and 0 when the sense key was absent from the corresponding PWN version.
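The shape of this table can be illustrated with a short Python sketch, used here as a stand-in for the R xtabs call. The record layout (version, sense key, part of speech, offset), the version labels and the offsets are hypothetical, since the layout of the SKI database file is not reproduced here. The 9-digit synset id follows the WNprolog convention of a leading part-of-speech digit followed by the 8-digit offset.

```python
# Leading digit of a WNprolog synset id for each part of speech
# (adjective satellites share the adjective digit).
POS_DIGIT = {"n": 1, "v": 2, "a": 3, "s": 3, "r": 4}


def synset_id(pos, offset):
    """Collapse a part of speech and an 8-digit offset into a 9-digit id."""
    return POS_DIGIT[pos] * 100_000_000 + int(offset)


def cross_tabulate(records, versions):
    """Build {sense_key: {version: synset_id or 0}} from flat records."""
    table = {}
    for version, key, pos, offset in records:
        row = table.setdefault(key, dict.fromkeys(versions, 0))
        row[version] = synset_id(pos, offset)
    return table


# Hypothetical records: (version, sense key, part of speech, offset).
records = [
    ("v1", "alpha%1:00:00", "n", "00001234"),
    ("v2", "alpha%1:00:00", "n", "00001300"),
    ("v1", "beta%2:30:00", "v", "00204567"),
]
table = cross_tabulate(records, ["v1", "v2"])
for key, row in sorted(table.items()):
    print(key, row)
```

A cell containing 0 marks a sense key that is absent from that version, as in the table described above.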
For each pair of consecutive PWN versions (see Table 2), we count the number of sense keys present in either the source version (WN_source) or the target version (WN_target), or both. Most sense keys persist in both versions, and their percentage expresses the recall of mappings that use only Rule 2. Sense keys that only appear in the source have been removed in the target, and those that only appear in the target have been added in the target. Violated sense keys are a special case: although present in both versions, they do not denote the same sense, and persist only in appearance. Here, we consider that their old sense is removed from the source WN version, while their new sense is added to the target WN, so that they are not counted as persistent.
The persistent and removed sense keys add up to Total_source, so we calculate their ratios as percentages of Total_source, which add up to 100. The persistent and added sense keys add up to Total_target, but their percentages do not add up to 100, because they are ratios of different totals. Both totals are identical to the Word-Sense Pairs reported in (WordNet-team, 2010, Wnstats).
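These counts can be sketched in Python as simple set operations. The function name and the toy sense keys are illustrative assumptions. Only the percentages relative to Total_source are computed, since they are the ones stated to add up to 100.

```python
def key_persistence(source_keys, target_keys, violated=frozenset()):
    """Count persistent, removed and added sense keys for one update.

    Violated keys are present in both versions, but are treated as
    removed from the source and added to the target.
    """
    source_keys, target_keys = set(source_keys), set(target_keys)
    persistent = (source_keys & target_keys) - set(violated)
    removed = source_keys - persistent
    added = target_keys - persistent
    total_source = len(source_keys)
    return {
        "persistent": len(persistent),
        "removed": len(removed),
        "added": len(added),
        "persistent_pct": 100 * len(persistent) / total_source,
        "removed_pct": 100 * len(removed) / total_source,
    }


# Toy data: c%1:10:00 is present in both versions but violated.
source = {"a%1:00:00", "b%1:00:00", "c%1:10:00", "d%1:00:00"}
target = {"a%1:00:00", "b%1:00:00", "c%1:10:00", "c%1:10:01"}
stats = key_persistence(source, target, violated={"c%1:10:00"})
print(stats)
```

In this toy update, the violated key counts once among the removals and once among the additions, so that persistent plus removed equals the source total, and persistent plus added equals the target total.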

Persistent, added and removed synonym sets
We analyse the evolution of the synonym sets, by considering whether their corresponding sense keys are present in either or both of the source and target PWN versions (see Table 3).
The source synset offsets of persistent sense keys have at least one translation in the target, and are counted as persistent synsets. Since violated sense keys are not considered persistent, they do not contribute to the persistent synsets. Source synset offsets that do not have a sense key present in the target correspond to removed synsets, while target synsets that do not have a sense key that was present in the source, have been added in the PWN update.
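The same classification can be sketched at the synset level in Python. The dictionaries (sense key to synset identifier) and all identifiers below are hypothetical toy values, chosen only to show the three categories.

```python
def synset_persistence(source, target, violated=frozenset()):
    """Classify synsets as persistent, removed or added.

    source and target map sense keys to synset identifiers in the
    source and target versions; violated keys are ignored when
    looking for persistent members.
    """
    persistent_keys = (source.keys() & target.keys()) - set(violated)
    persistent_source = {source[k] for k in persistent_keys}
    persistent_target = {target[k] for k in persistent_keys}
    removed = set(source.values()) - persistent_source
    added = set(target.values()) - persistent_target
    return persistent_source, removed, added


# Toy data: k4 is a violated key, k3 disappears, k5 is new.
source = {"k1": "s1", "k2": "s1", "k3": "s2", "k4": "s3"}
target = {"k1": "t1", "k2": "t2", "k4": "t4", "k5": "t3"}
persistent, removed, added = synset_persistence(source, target, {"k4"})
print(sorted(persistent), sorted(removed), sorted(added))
```

Here s1 is persistent, s2 and s3 are removed, and t3 and t4 are added. The synset s3 is counted as removed, although its only key still exists in the target, because that key is violated.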

These figures and their percentages are calculated as for Table 2: the persistent and removed synsets add up to Total_source, and their percentages add up to 100. The synset totals are identical to those from each corresponding WN Stats manual page (WordNet-team, 2010, Wnstats). But, because of splits and merges, the number of persistent synsets in the source (i.e. the figure we use here) is not identical to the number in the target, which together with the number of added synsets, would add up to Total_target.

Split and merged synsets
In a mapping with unique pairs of (source, target) synset offsets, split synsets are those appearing more than once in the source column, while merged synsets are those appearing more than once in the target. The number of times that these synsets appear is a measure of the complexity of the split or merge operation. We indicate this size in Table 4 with a subscript, so that split_2 and split_3 are the number of synsets that were split into respectively two or three different target synsets. Similarly, merged_2 and merged_3 are the number of merges from two or three different source synsets. Some synonym sets are both split and merged, and we indicate their frequency as split & merged. The synonym sets counted as persistent here satisfy a minimal condition of stability, because they have at least one sense key present in both PWN versions. Extending Mapping Rule 2 increases recall, by mapping removed sense keys to the target synset of their synonyms. The resulting Rule 7 generalizes the mapping previously established by Rule 2 for a single sense key k1 from s1, to predict that its synonyms in version v1 also belong to s2 in PWN version v2 by associativity. But Rule 7 produces fallacies when s1 was split into different target synsets, where k1 and k2 are no longer synonyms.
Studying the evolution of the sense keys allows us to detect all splits or merges, and to assess their frequency and complexity, i.e. the maximal number of synonym sets involved in one split or merge operation (see Table 4), which makes it possible to precisely identify and count the maximal number of false positives that Rule 7 can produce.
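Counting splits and merges by size can be sketched in Python as follows. The function name and the toy synset identifiers are assumptions for illustration. The sketch also assumes one possible reading of "both split and merged": a split source synset with at least one target that is itself a merge.

```python
from collections import Counter


def split_merge_sizes(pairs):
    """Summarize splits and merges in a synset mapping.

    pairs is an iterable of (source_synset, target_synset) links;
    duplicates are dropped so that each pair is unique.
    """
    pairs = set(pairs)
    source_counts = Counter(s for s, _ in pairs)
    target_counts = Counter(t for _, t in pairs)
    splits = {s: n for s, n in source_counts.items() if n > 1}
    merges = {t: n for t, n in target_counts.items() if n > 1}
    # Size histograms, e.g. {2: number of two-way splits, 3: ...}.
    split_sizes = Counter(splits.values())
    merge_sizes = Counter(merges.values())
    # Assumed reading: a split source with at least one merged target.
    split_and_merged = {s for s, t in pairs if s in splits and t in merges}
    return split_sizes, merge_sizes, len(split_and_merged)


# Toy mapping: s1 is split into t1 and t2, while t2 merges s1 and s2.
pairs = [("s1", "t1"), ("s1", "t2"), ("s2", "t2"), ("s3", "t3")]
split_sizes, merge_sizes, both = split_merge_sizes(pairs)
print(dict(split_sizes), dict(merge_sizes), both)
```

In the toy mapping there is one two-way split, one two-way merge, and one synset that is both split and merged, which corresponds to the migration of a single sense key from s1 to the synset of s2.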

Performance analysis
The mappings released by the SKI project (Kafe, 2017b, ski-mappings-pwn) apply only the Mapping Rules 2 and 7. The true performance of these mappings lies somewhere above a lower bound that can be calculated by finding the theoretical minimum of the number of correct mapping predictions, and the maximal number of possible fallacies.

          Mapped               Not mapped
True      tp = Keys_Persist    tn = 0

As reference, we use the imaginary performance of a hypothetical ideal mapping which would be able to map everything accurately, achieving 100% precision and 100% recall. In this ideal situation, there are no true negatives (tn = 0), so the sense keys pertaining to the removed synsets from Table 3, which our less ideal mapping cannot map, are false negatives (fn). Only the persistent sense keys from Table 2 are the true positives (tp), while all the rest of the mapping could be false positives (fp). The number of removed keys is equal to the sum of fp and fn, so fp can be obtained either by subtracting tp from the length of the mapping, or by subtracting fn from the number of SenseKeys_removed, and both results are expected to be identical, which is verifiable in practice. These values allow us to use standard formulas to calculate lower bounds for the precision and recall of the mappings.
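The lower bounds can be sketched in Python, using the standard definitions precision = tp / (tp + fp) and recall = tp / (tp + fn). The function name and the figures in the example are hypothetical and do not come from the tables of this study.

```python
def lower_bounds(mapping_length, keys_persist, keys_removed):
    """Return lower bounds (precision, recall) for a sense key mapping.

    tp: persistent sense keys, assumed to be mapped correctly.
    fp: every other mapped key, counted as a possible fallacy.
    fn: removed keys that the mapping could not map.
    """
    tp = keys_persist
    fp = mapping_length - tp
    fn = keys_removed - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical update: 980 persistent keys, 20 removed keys, and a
# mapping that also covers 10 of the removed keys via their synonyms.
precision, recall = lower_bounds(mapping_length=990,
                                 keys_persist=980,
                                 keys_removed=20)
print(f"precision >= {precision:.4f}, recall >= {recall:.4f}")
```

Counting every mapped non-persistent key as a false positive is deliberately pessimistic, which is why the result is a lower bound rather than an estimate of the true performance.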

Identity violations in WordNet updates
We started by investigating violation patterns in all WordNet relations and definitions, and found only 239 attribute violations in total, when adding all consecutive WordNet updates between the earliest version 1.5 and the latest version 3.1.1, and slightly fewer (230) in the direct update between the same two versions. The violations peaked during the update from WN 1.6 to WN 1.7, and have declined in absolute numbers since then.
As a consequence of the low total number of violations, we reviewed all these cases, and found that all the violations of hypernymy, definitions, synonymy, antonymy, holonymy and pertainyms resulted in genuine violations of semantic identity, which reveals that these relations participate in describing the essence of their first argument. Other relations (hyponymy, verb group, part meronym, derivations, similar to) were more rarely violated, and only denoted an essential violation when they accompanied a violation of one of the more essential relations, but not otherwise. The remaining WordNet relations (like instance hypernyms) were never violated, but from the violations of hypernymy which have become instance hypernyms in later WordNet versions (like the Armstrong example below), we may infer that instance hypernymy is also an essential relation, while instance hyponymy is not. The full list of essential violations found in all consecutive PWN updates is included in Appendix A.
Although the number of identity violations shown in Table 1 was globally low, it is still higher than expected, since essential violations are not allowed in theory. In these results, the relations are ordered from left to right by their decreasing number of violations. In this ordering, hypernymy emerges as the most frequent source of violations, which confirms the importance of hypernymy in expressing essential characteristics of word sense. The Totals reflect the cardinality of the set union of all violations, which is lower than their sum, since each violation can manifest itself in several relations simultaneously. Out of these totals, the Essential column shows the number of genuine identity violations involving the more essential relations mentioned earlier.
The most evident violations of sense identity concern proper names, so the inclusion of named entities in PWN is a major help for recognizing identity breaches, since persons born or dead at different times cannot be the same person: for example, gregory%1:18:01 is a synonym of Pope Gregory I (540?-604) in PWN 2.0 and of Gregory VII (1020-1085) in PWN 2.1. Likewise, places situated in different regions cannot be the same place, like worcester%1:15:00, which has Massachusetts as part holonym in PWN 1.6, but England in PWN 1.7. The first violation concerns the individual thisness of "Gregory", while the second example concerns the shared whatness of "Worcester". This shows that WordNet can also express haecceity through synonymy, and quiddity by subordination to a holonym. Likewise, the quiddity (astronaut vs. jazzman) of armstrong%1:18:00 was violated by hypernymy during the same PWN update, while his haecceity (Neil vs. Louis) was violated by synonymy.

Violation

Definitions
WN 2.0  soft%3:00:05  used chiefly as a direction or description in music
WN 2.1  soft%3:00:05  (of light) transmitted from a broad light source or reflected
WN 2.1  soft%3:00:07  used chiefly as a direction or description in music

Synonyms
WN 2.0  soft%3:00:05  piano%3:00:00
WN 2.1  soft%3:00:05  diffuse%3:00:00 diffused%3:00:00
WN 2.1  soft%3:00:07  piano%3:00:00

Antonyms
WN 2.0  soft%3:00:05  forte%3:00:00 loud%3:00:02
WN 2.1  soft%3:00:05  concentrated%3:00:01 hard%3:00:05
WN 2.1  soft%3:00:07  forte%3:00:00 loud%3:00:02

Between WN 1.5 and 3.1.1, the Essential violations concerned only 0.1% of the total number of sense keys, so their impact on the global sense stability of the PWN keys is almost imperceptible. These low absolute numbers are however only rough estimates and not strict upper bounds, because although this study considers all WordNet relations, it only compared identical definitions. So it is possible that additional violated definitions could be found among the modified definitions that are not accompanied by any relation violation. This eventuality seems less likely, since any essential violation in a definition could be expected to entail corresponding changes in the relations. However, 6 out of the 39 violations of identical definitions in the update between PWN 1.6 and 1.7 indeed occurred without any relation violation, which confirms that this eventuality is real. If a similar proportion (6/39 is 15%) holds among all definition violations (including the yet unknown ones), then 85% of these would also present relation violations, and thus already be identified in this study. Then, we may estimate the number of still unknown violations of edited definitions to be approximately 15/85 (17.6%) of the number of essential relation violations reported here. Since these numbers are always very low, and only amount to a tiny fraction of the total number of sense keys in any PWN version, the rounded stability percentages presented in the following sections can be expected to hold within very narrow confidence intervals.

Identity preservation in WordNet updates
Table 2 displays the number of persistent, added, and removed sense keys for the nine WordNet updates from version 1.5 to 3.1.1, and four typical long-distance updates between non-consecutive versions, which are relevant for some foreign language wordnets (Dziob, Piasecki, Maziarz, Wieczorek, & Dobrowolska-Pigoń, 2017; Kahusk & Vider, 2017), or studied in previous literature (Daudé et al., 2001; Gonzalez-Agirre et al., 2012; Vossen et al., 2016). Only a few wordnets are linked to other versions (Bond & Paik, 2012), so we did not study every possible combination of non-consecutive WordNet versions separately. However, approximate figures can be derived by adding the changes found in the intermediate consecutive versions. For example, a simple addition of the removals between all the versions from WN 2.0 to WN 3.1.1 is sufficient in order to obtain a reliable estimate of this long-distance stability result.
Table 2 shows a high persistence of the sense keys after version 1.6: less than 1% were typically removed between consecutive versions, so the percentage of persistent keys was generally above 99. But before version 1.6, the persistence was a little lower, with approximately 3% removals between versions. For long-distance updates, the lost sense keys accumulate: in total 18368 sense keys have been removed since PWN 1.5, so the ratio of keys from PWN 1.5 that persist in the latest PWN 3.1.1 drops to 89.1%. Most often, the number of additions has by far exceeded the deletions, the only exception being the latest WN 3.1.1 update, which mostly consisted of removals.
Overall, the rounded percentages found here are identical to the results reported by Kafe (2017a), which did not consider sense key violations, and the only effect of those is a 0.1% stability decrease in two old PWN updates, which is almost negligible.

The persistence of synonym sets
Table 3 shows that the synonym sets were always more persistent than the individual sense keys. The lowest persistence rate was 94.4% for the long-distance update from PWN 1.5 to 3.1.1. Again, the overall percentages were identical to those of Kafe (2017a), except for an almost negligible 0.1% stability decrease in a few PWN updates, after taking the key violations into account.
The superior stability of the synonym sets may actually be expected, considering that removed word senses are mapped to the target synset of their synonyms. For example, although the adjective sense key for "froward" disappeared between WN 3.1 and 3.1.1 because the orthography of the lemma was corrected to "forward", it is still mapped through synonyms like "headstrong". So mappings that link synset offsets have a higher recall than those that only link sense keys, because they cover whole sets of words, and thus avoid some of the losses incurred from the removal of individual sense keys. However, when synsets are split, mapping each key to all its synonyms causes a loss of precision, which we can quantify through a more precise analysis of the splits.

Splits and merges
The following example from PWN 2.1 displays an addition (medusoid), a deletion (medusa#2), a split (jellyfish), and a merge (medusan). The deletion of medusa#2 is implied by the fact that there is already a sense of medusa in the target synset. The next example shows that the adverb "observably" migrated to its antonym set, during the update from WordNet 2.0 to 2.1. In this case, applying the Mapping Rule 7 to its source synonyms "imperceptibly" and "unnoticeably" would aggravate the confusion between synonyms and antonyms, instead of resolving it. To avoid such errors, it is crucial to review all the splits manually. This example also shows that merges do not produce false positives, since the other merged source synset ("perceptibly" and "noticeably") is only mapped to the correct target.

False splits and merges
Many identity violations coincide with a synset being split, merged or both. These interactions reveal that the corresponding splits may only be apparent. For example, from the raw sense keys, it would seem that the synset containing soft%3:00:05 was split between WN 2.0 and WN 2.1.

Mapping link:
WN 2.0 soft%3:00:05 ↔ WN 2.1 soft%3:00:07

Then, if we apply this mapping, the source synset is not split, because the violated key remains a synonym of piano. Thus, the number of real splits and merges differs from their apparent number, and depends on the sense key violations. Applying this mapping before other mappings prevents errors that would occur otherwise: if we just map the raw sense keys, the Mapping Rule 2 discussed earlier will make soft%3:00:05 a synonym of diffuse, thus producing a false positive, and break its synonymy with piano, resulting in a false negative. Mapping Rule 7 produces a different set of errors (two false positives and zero false negatives), because the synonymy with piano would persist, so there would be no false negatives, but an additional false positive would be produced when piano also becomes a fallacious synonym of diffuse.
By contrast, the stable synset identifiers from the Inter-Lingual Index (ILI) (Vossen, 2002; Vossen et al., 2016) are not affected by this false split, since the ILI-WordNet mapping (UPC TALP, 2017) only maps the involved synsets to their correct targets, without need for any particular handling of the individual word senses. The stability of the ILI-based sense key soft#i6334 shows that the corresponding PWN key violation could have been avoided by keeping the lex_id of soft%3:00:05 unchanged as a synonym of piano while assigning the new key soft%3:00:07 to the synonym of diffuse. So the lexicographers' liberty to freely assign lex_id appears to be the probable cause of the sense key violations found in WordNet.
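The error counts for the soft%3:00:05 example can be reproduced with a small Python sketch. The two functions are one possible reading of Mapping Rules 2 and 7 as they are described in the prose, not the SKI implementation, and the synset labels (music_20, music_21, light_21) are hypothetical stand-ins for the real synset offsets.

```python
def rule2(source, target):
    """Link each persistent sense key to its own target synset."""
    return {(k, target[k]) for k in source if k in target}


def rule7(source, target):
    """Link every key of a source synset to the target synsets
    reached by any persistent member of that source synset."""
    targets_of = {}
    for key, synset in source.items():
        if key in target:
            targets_of.setdefault(synset, set()).add(target[key])
    return {(k, t) for k, s in source.items() for t in targets_of.get(s, ())}


# Sense key -> synset label in each version (labels are made up).
wn20 = {"soft%3:00:05": "music_20", "piano%3:00:00": "music_20"}
wn21 = {
    "soft%3:00:05": "light_21",
    "diffuse%3:00:00": "light_21",
    "diffused%3:00:00": "light_21",
    "soft%3:00:07": "music_21",
    "piano%3:00:00": "music_21",
}
# Correct links: the musical sense of soft stays with piano.
gold = {("soft%3:00:05", "music_21"), ("piano%3:00:00", "music_21")}

for name, rule in (("Rule 2", rule2), ("Rule 7", rule7)):
    links = rule(wn20, wn21)
    print(name, "false positives:", len(links - gold),
          "false negatives:", len(gold - links))
```

Under these assumptions, Rule 2 yields one false positive and one false negative, while Rule 7 yields two false positives and no false negative, which agrees with the counts given above.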

True splits and merges
In this study, as explained in the Methods section, we blocked the fallacious confusion of violated sense keys, by considering them as removals from the source WN and new additions to the target version. As a consequence, the false splits and merges resulting from sense violations are avoided, and the figures presented in Table 4 correspond to a more restrictive set of true splits and merges. For example, out of 223 apparent splits found by Kafe (2017a) for the update from WN 1.6 to 1.7, we only retain 196, which means that the difference, amounting to 27 (8%) false splits, constitutes an important part (42%) of the 64 essential sense key violations reported in Table 1. So our study shows that, although almost negligible in absolute numbers, the sense key violations have a significant impact on the number of split synsets, which is often approximately 10% lower than reported by Kafe (2017a).
Through all updates, merged_2 and merged_3 always add up to the total number of merges, so no target synset was ever merged from more than three source synsets. Similarly, after PWN version 1.5SC, split_2 and split_3 also add up to the total number of splits, so no source synset was split into more than three target synsets. Only in the mapping between WordNet 1.5 and 1.5SC, the total number of splits includes a very small number of four and five-way splits. The number and size of the splits and merges were generally low, and there were always more splits than merges. Almost all splits and merges only involved two synsets, and operations involving three synsets were very rare. Synsets that were split and merged at the same time most often resulted from the migration of a single sense key to another synset.

The performance of simple sense key mappings
Analysing the sense key-based mappings released by the SKI project (Kafe, 2017b, ski-mappings-pwn) shows, as expected, that applying Mapping Rule 7 increases recall but deteriorates precision (cf. Table 5). However, after version 1.6, both measures show excellent performance. Compared with Kafe (2017a), the sense key violations have no impact on the recall and lead to 0.1% decreased precision in only three PWN updates.
This analysis differs from human evaluations by considering the whole PWN dataset, instead of smaller samples, so it provides exact metrics, while human evaluations of limited samples add sample and evaluator biases that can yield higher standard error, resulting in wider confidence intervals.Larger human evaluations are needed, as well as deeper analyses, since both approaches have complementary merits, and allow meaningful comparisons.
Only a few partial studies have previously been conducted about the performance of mappings between PWN versions. Daudé et al. (2001) produced a complete synset offset mapping from PWN 1.5 to 1.6, by applying a relaxation labelling algorithm, with a set of constraints that involved all semantic relations, and additional heuristics such as gloss similarity. They evaluated the results manually, by applying different constraint sets on samples drawn from the monosemous vs. ambiguous nouns, verbs, adjectives and adverbs (4200 synsets in total), and found 98.8% precision and 98.9% recall for the nouns overall, when using the complete constraint set. In all cases, recall was higher than precision. A particular strength of these mappings is their ability to correct identity violations, which constitute a weakness for sense keys.
By comparison, the corresponding sense key-based mapping for this old PWN update also shows higher lower bounds for recall (97.6%) than for precision (95%). However, with the later versions this tendency was reversed, since precision was consistently higher than recall, and both figures stayed mostly above 99.5% in the later consecutive PWN mappings.

WordNet synsets are very stable
After establishing that the number of sense key violations in WordNet was almost negligible, we simply followed the stable sense keys between all WordNet versions, and saw that the synonym sets have remained very stable throughout every update. There were never more than a few hundred split or merged synonym sets between consecutive versions and, after version 1.6, the complexity of these changes was often the lowest possible, because each split or merge almost always involved only two synsets, and never more than three.
This is the first comprehensive analysis of the persistence of semantic identity in all the WordNet versions released between 1995 and 2012. Our results show that each of the two existing mapping paradigms (the recommended sense key-based, and the dominant synset-based) has its own specific challenge (the identity violations and the split synsets, respectively), which is often a strength of the other paradigm. Thus, all WordNet mappings can be improved by focusing on their specific sources of errors. In both cases, the number of problems is limited, which indicates that it will become possible to update wordnets to new PWN versions with greater confidence and less effort. So, although rarely studied, the persistence of semantic identity is an essential question that has acute consequences for the interoperability of all projects that use WordNet data.
Lexicographers can use Tables 1, 2, 3, 4 and 5 to estimate the effort required to update a resource between two PWN versions. For example, when updating to PWN 3.0, a resource that uses PWN 1.6 sense keys and just applies Rule 2 would obtain almost perfect precision (subtracting the 144 essential violations from Table 1), and 95.9% recall (Table 2). Recall can be improved by reviewing the 7208 removed sense keys, as well as the possible collapses of identical words (like medusa#2 in a previous example) resulting from the 231 merged synsets (Table 4). Mapping Rule 7 improves recall (97.3% in Table 5), which can be further improved by reviewing the same 231 merges, and the 4576 false negatives that remain from the 7208 removed sense keys and belong to the 3025 removed synsets (Table 3). The rest of these 7208 removed sense keys could be false positives produced by Rule 7, and also need to be reviewed in order to increase precision, in addition to the 506 splits, which cause a decrease of precision that does not affect sense keys.
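The review workload in this PWN 1.6 to PWN 3.0 example can be tallied with a few lines of arithmetic. The sketch below is only an illustration: the counts are the ones quoted in the paragraph above, while the function and constant names are our own assumptions. Note that the totals mix different units (sense keys and synsets), so they give only a coarse indication of effort.

```python
# Counts quoted in the text for updating from PWN 1.6 to PWN 3.0.
VIOLATIONS = 144               # essential sense key violations (Table 1)
REMOVED_KEYS = 7208            # removed sense keys (Table 2)
MERGES = 231                   # merged synsets (Table 4)
SPLITS = 506                   # split synsets, as quoted in the text
RULE7_FALSE_NEGATIVES = 4576   # removed keys that Rule 7 still cannot map


def rule2_review_items():
    """Items to review when only sense key identity (Rule 2) is applied."""
    return {
        "violations": VIOLATIONS,
        "removed_keys": REMOVED_KEYS,
        "merges": MERGES,
    }


def rule7_review_items():
    """Items to review when persistent synonymy (Rule 7) is also applied."""
    # Removed keys that Rule 7 does map are candidates for false positives.
    possible_false_positives = REMOVED_KEYS - RULE7_FALSE_NEGATIVES
    return {
        "merges": MERGES,
        "false_negatives": RULE7_FALSE_NEGATIVES,
        "possible_false_positives": possible_false_positives,
        "splits": SPLITS,
    }


if __name__ == "__main__":
    for name, items in (("Rule 2", rule2_review_items()),
                        ("Rule 7", rule7_review_items())):
        print(name, items, "total:", sum(items.values()))
```

The same tally can be repeated for any other pair of versions by substituting the corresponding rows of Tables 1 to 5.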
So these results confirm that "sense keys are the best way to represent a sense" (WordNet-team, 2010, Senseidx), but only by a small margin. Contrary to expectations, synset identifiers provide a reasonable alternative, since the splits between most versions are relatively few and simple. As a consequence, stable synset identifiers like the Inter-Lingual Index (ILI) appear viable, although they will need to be complemented by mapping links between ILI-based sense keys, in order to handle the split synsets.

Practical application
For the older wordnets that are still mapped to PWN 1.5, like Polish (Dziob et al., 2017) and Estonian (Kahusk & Vider, 2017), upgrading to PWN 3.1.1 requires reviewing the intersection of the source data with the 1120 PWN splits reported in Table 4, and the 214 sense key violations counted in Table 1 and listed in Appendix A. More recent projects linked to PWN 2.0 are in a luckier position, since adding up all the changes that have occurred in the intermediate WN versions yields only 194 splits and 36 violations.
Obviously, projects already linked to WN 3.0 enjoy a more fortunate situation. For example, updating the wordnets from MCR30-2016 (Gonzalez-Agirre et al., 2012) to PWN 3.1 is much simpler, since only 33 splits need to be checked. One of these is an example where Pluto was moved from the Greek to the Roman "gods of the underworld", while the corresponding Greek name Plouton is not in PWN yet. The ILI 3.1 mapping (GWA, 2017) provides correct identifiers at the synset level, but cannot help in mapping local translations of Pluto to their adequate PWN 3.1 synset, so any resulting local splits have to be resolved by local lexicographers. For example, the Spanish WordNet from MCR30-2016 (Gonzalez-Agirre et al., 2012) also includes the involved synsets. Thus, the Spanish lexicographers need to consider whether Plutón corresponds to the Greek name, to the Latin name, or possibly to both, and other WN 3.0-based resources face the same issue. In this particular example, applying Mapping Rule 7 would place Plutón in both the Roman and the Greek synsets, which seems adequate if both gods are the same.
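The effect of a synset split on local translations can be sketched as follows. This is a simplified illustration, not the formal definition of Mapping Rule 7: we assume that a source synset is linked to every target synset that still contains at least one of its sense keys. The synset identifiers, the plain lemma strings standing in for real sense keys, and the synset memberships are all hypothetical placeholders.

```python
from collections import defaultdict


def map_synsets_by_shared_keys(source, target):
    """Link each source synset to all target synsets sharing a sense key.

    Both arguments map a synset identifier to its set of sense keys.
    """
    key_to_target = {key: sid for sid, keys in target.items() for key in keys}
    mapping = defaultdict(set)
    for sid, keys in source.items():
        for key in keys:
            if key in key_to_target:
                mapping[sid].add(key_to_target[key])
    return {sid: sorted(targets) for sid, targets in mapping.items()}


def propagate_local_words(local_words, mapping):
    """Copy local words from each source synset to all its target synsets."""
    result = defaultdict(set)
    for sid, words in local_words.items():
        for target_sid in mapping.get(sid, []):
            result[target_sid].update(words)
    return dict(result)


# Hypothetical data: plain lemmas stand in for real sense keys.
wn30 = {"wn30:greek": {"hades", "pluto", "aidoneus"},
        "wn30:roman": {"dis", "orcus"}}
wn31 = {"wn31:greek": {"hades", "aidoneus"},
        "wn31:roman": {"dis", "orcus", "pluto"}}
spanish = {"wn30:greek": {"Plutón"}}  # local words attached to WN 3.0 synsets

links = map_synsets_by_shared_keys(wn30, wn31)
updated = propagate_local_words(spanish, links)
```

Under these assumptions, the Greek source synset is linked to both target synsets, so Plutón ends up in both, and a lexicographer still has to decide whether that is the intended result.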

Sense
Persistent synset identifiers like ILI are useful at the coarse synonym-set level, but they need to be complemented by sense keys to handle the more precise word level. Since words are unique within each synset, a sense key can be constructed by simply combining each word and its synset identifier. If persistent ILI identifiers are used for the synsets, the corresponding ILI-based sense keys are also persistent, except when they denote one of the few violated or migrated senses. Then, a mapping link between ILI-based sense keys can handle these exceptional cases and, for example, express the migration of Pluto to a different synset. This format can express mappings between any source and target word senses, and is thus also able to adequately handle the violations of semantic identity listed in Appendix A.
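A minimal sketch of this idea is given below. The "word@identifier" notation, the class and function names, and the identifier values are all hypothetical choices made for illustration; they are neither real ILI identifiers nor the notation of any existing resource.

```python
from typing import Dict, NamedTuple


class IliSenseKey(NamedTuple):
    """A word combined with the persistent identifier of its synset."""
    word: str
    ili: str

    def __str__(self):
        # Hypothetical textual form: word, separator, synset identifier.
        return f"{self.word}@{self.ili}"


def resolve(key: IliSenseKey, links: Dict[IliSenseKey, IliSenseKey]) -> IliSenseKey:
    """Return the target of an explicit mapping link, if there is one.

    Keys without a link are assumed to persist unchanged.
    """
    return links.get(key, key)


# Placeholder identifiers, not real ILI values.
pluto_greek = IliSenseKey("pluto", "ili-greek-underworld-god")
pluto_roman = IliSenseKey("pluto", "ili-roman-underworld-god")
hades_greek = IliSenseKey("hades", "ili-greek-underworld-god")

# One mapping link expresses the migration of Pluto to another synset.
mapping_links = {pluto_greek: pluto_roman}

print(resolve(pluto_greek, mapping_links))
print(resolve(hades_greek, mapping_links))
```

Only the exceptional senses need an entry in the link table; every other ILI-based sense key resolves to itself.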

Strengths and limitations
Objectively, the key violation mappings from Appendix A provide only 0.1% increased precision in just a few PWN versions. But this tiny quantitative gain may correspond to a larger qualitative improvement of the affected PWN links, since the more severe violations (like confusing clearly distinct cities or persons) have a damaging impact on the perceived quality of a lexical resource, and on the confidence in its use.
This study relies crucially on the low number of identity violations in WordNet. But, as mentioned earlier, we do not know whether all sense key violations were identified here, so the results in Table 1 are only estimates. Although the PWN Sensemap (WordNet-team, 2010, Sensemap) could be expected to map additional violations, this does not seem to be the case: on the contrary, applying our general formula for finding sense key violations (5) to Sensemap just reveals a few dubious mappings that could be avoided by considering the definitions. For the moment, we may estimate that the real violations are probably too few to change the rounded percentages reported here, but this will need to be confirmed in future studies. Some controversy must be expected, since we still lack a consensual and exhaustive definition of semantic essence, so diverging appreciations of the notion of semantic identity are inevitable.
Also, the present study is limited to only two primary mapping inference rules, based on sense key identity (2) and persistent synonymy (7).Additional mapping links can also be inferred automatically from gloss similarity and other relations, as in (Daudé et al., 2001).However, since these additional heuristics are more uncertain, they should be studied separately, and applied at a later stage.
Our results show that synset identifiers like the ILI could be stable in theory, but the actual stability of the ILI has not yet been investigated in practice. A similar study of the various ILI versions would be interesting.
Last but not least, we still do not know how to avoid future identity violations, both in PWN and in other wordnets. According to Christiane Fellbaum², "The WordNet lexicographers are free to change the sense inventory as they see fit", though in later versions, the PWN compiler (WordNet-team, 2010, grind) flags possible duplicate lex_ids for the same word within a lexical file. According to Randee Tengi², the grind program then "leaves it to the lexicographer to view the synsets and be sure that the correct lex_id is used to carry forward a synset with the same meaning that it had in the previous version". This ensures unique sense keys, but does not protect against identity violations. Perhaps a closer study of the examples in Appendix A can lead to a better understanding of this potential pitfall.
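The difference between key uniqueness and semantic identity can be illustrated with a small sketch. It is not the actual grind implementation: the function name, the input layout, and the example entries are assumptions made for illustration. The check reports every (word, lexical file, lex_id) combination that is used by more than one synset within a single version.

```python
from collections import defaultdict


def find_duplicate_lex_ids(entries):
    """Report (lemma, lex_filenum, lex_id) triples used by several synsets.

    Each entry is a tuple (lemma, lex_filenum, lex_id, synset_label).
    """
    seen = defaultdict(list)
    for lemma, lex_filenum, lex_id, synset_label in entries:
        seen[(lemma, lex_filenum, lex_id)].append(synset_label)
    return {key: labels for key, labels in seen.items() if len(labels) > 1}


# Hypothetical entries of one version: the first two collide.
entries = [
    ("pluto", 18, 0, "Greek god of the underworld"),
    ("pluto", 18, 0, "Roman god of the underworld"),
    ("pluto", 17, 0, "celestial body"),
]

duplicates = find_duplicate_lex_ids(entries)
```

Such a check only looks at one version at a time. If a lex_id is reused in the next version for a synset with a different meaning, each version passes the check on its own, which is why unique sense keys alone do not prevent identity violations.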

Table 1: Attribute Violations in WordNet Updates

Table 2: Persistence of the sense keys between WordNet versions
Columns: WN source, WN target, Total source, Total target, Added, %, Removed, %, Persist, %

Table 3: Persistence of the synonym sets between WordNet versions
Columns: WN source, WN target, Total source, Total target, Added, %, Removed, %, Persist, %

Table 4: Splits and Merges in the synonym sets between WordNet versions
Columns: WN source, WN target, Split, split 2, split 3, Merged, merged 2, merged 3, & merged

Table 5: Performance lower bounds of sense-key mappings between WordNet versions