
Natural language interaction with semantic web ontologies

by Gerasimos Lampouras




Institution: Athens University of Economics and Business (AUEB); Οικονομικό Πανεπιστήμιο Αθηνών
Department:
Year: 2015
Keywords: Natural language generation; Natural language processing; Ontologies; Semantic web; Integer linear programming; Pattern extraction from the Web
Record ID: 1155599
Full text PDF: http://hdl.handle.net/10442/hedi/35583


Abstract

The Semantic Web is an effort to establish standards and mechanisms that will allow computers to reason more easily about the semantics of Web resources (documents, data, etc.). Ontologies play a central role in this endeavour. An ontology provides a conceptualization of a knowledge domain (e.g., consumer electronics) by defining the classes and subclasses of the domain's entities, the types of possible relations between them, etc. The current standard for specifying Semantic Web ontologies is OWL, a formal language based on description logics and RDF, with OWL 2 being the latest OWL standard. Given an OWL ontology for a knowledge domain, one can publish on the Web machine-readable data pertaining to that domain (e.g., catalogues of products, their features, etc.), with the data having formally defined semantics based on the conceptualization of the ontology. Several OWL syntaxes have been developed, but people unfamiliar with formal knowledge representation often have difficulty understanding them. This thesis considered methods that allow end-users to view ontology-based knowledge representations of the Semantic Web in the form of automatically generated texts in multiple natural languages.
The first part of the thesis improved NaturalOWL, a Natural Language Generation system for OWL ontologies previously developed at AUEB. The system was modified to support OWL 2 and to produce higher quality texts. Experiments showed that the texts generated by the new version of NaturalOWL are indeed of high quality and significantly better than texts generated by simpler systems, often called ontology verbalizers, provided that appropriate domain-dependent linguistic resources (e.g., sentence plans to express relations) are available to NaturalOWL.
The second part of the thesis considered text mining and machine learning methods to extract from the Web, automatically or semi-automatically, the most important of the domain-dependent linguistic resources that NaturalOWL needs to produce high quality texts. Experiments showed that a semi-automatic approach, where a human inspects automatically produced linguistic resources, allows NaturalOWL to produce texts of almost the same quality as with linguistic resources authored manually from scratch.
The third part of the thesis aimed to further improve the quality of the generated texts by developing an Integer Linear Programming model that jointly considers content selection, lexicalization, sentence aggregation, and a limited form of referring expression generation. This contrasts with the pipeline architecture of most natural language generation systems, where the four stages are considered greedily, one after the other. Experiments indicated that the new model allows NaturalOWL to express more information per word, which is useful when space is limited (e.g., in advertising), with no deterioration in the perceived quality of the generated texts.
Throughout the thesis, ontologies from different domains (e.g., cultural heritage, consumer electronics, bioinformatics) were used. Using the methods…