
From Talking Animal to Talking Machine. Lexical semantic relations in WordNet

by Tatiana Valeria Kantorovich

Institution: University of Iceland
Year: 2015
Keywords: English (Enska)
Record ID: 1221329
Full text PDF: http://hdl.handle.net/1946/20386


A long time passed between the first word spoken by Homo sapiens and the first word spoken by a machine. The possibility of machines talking like humans confronts us with the richness and complexity of human linguistic competence and its cognitive underpinnings. Contemporary linguistics attempts to explain this complexity explicitly. One part of that effort is lexical semantics, which studies the meanings of words and the relations between them. This essay addresses the attempt to represent one aspect of lexical semantic competence, namely lexical semantic relations, in a major computational resource: WordNet. Lexical semantics recognises various kinds of relations: homonymy, synonymy, antonymy, hyponymy, meronymy, and troponymy. These are used in WordNet to represent the organisation of the human lexicon. The main building block of WordNet is the synset, a set of word forms that are close in meaning in context. In WordNet, nouns and verbs have taxonomic structures, and their word forms are divided into domains related to a specific subject and shared features. Adjectives have a structure based on the antonymy relation, in which bipolar adjectives are divided into clusters referring to a certain meaning. Adverbs are gathered in a single file. Psycholinguists have often attacked the WordNet structure as a representation of human linguistic competence. However, computational linguists have found the lexical semantic database useful for machine applications and natural language processing. WordNet has been translated into many languages, and the resulting wordnets have been combined into multilingual databases such as EuroWordNet. Each language has developed its own wordnet, but they are interconnected through interlingual links. Two approaches, expand and merge, are used for data acquisition. The expand approach relies on bilingual translation, with automatic, manual and hybrid methods used to fill gaps in the data. Linguistic bias between languages can be reduced with data from sources such as Wikipedia or with dictionary translation by professional interpreters. The merge approach relies on monolingual corpora for data acquisition. WordNet has moved from cognitive science to natural language processing. It is one of the remarkable developments that have helped scientists come closer to teaching machines to speak.
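The synset and taxonomy structure described above can be illustrated with a minimal sketch. It is a toy model and not the actual WordNet database or its API: the `Synset` class, the `hypernym_chain` helper, and the three example entries are all hypothetical names and data chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of word forms that are close in meaning in context."""
    name: str
    lemmas: frozenset
    # Links to more general synsets (the inverse of hyponymy).
    hypernyms: list = field(default_factory=list)


def hypernym_chain(synset):
    """Follow the first hypernym link upward until no hypernym remains."""
    chain = []
    current = synset
    while current.hypernyms:
        current = current.hypernyms[0]
        chain.append(current.name)
    return chain


# Hypothetical toy entries, not copied from the real WordNet data.
animal = Synset("animal", frozenset({"animal", "creature"}))
canine = Synset("canine", frozenset({"canine", "canid"}), [animal])
dog = Synset("dog", frozenset({"dog", "domestic dog"}), [canine])

print(hypernym_chain(dog))  # ['canine', 'animal']
```

In this sketch, synonymy is modelled by membership in the same `lemmas` set, and the noun taxonomy by the chain of `hypernyms` links. Other relations named above, such as meronymy or antonymy, would need additional link fields.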