Graph-based learning for information systems

by Xin Li




Institution: University of Arizona
Year: 2009
Keywords: Data mining; Graph-based learning; Graph-structured data; Information systems; Knowledge discovery; Knowledge management
Record ID: 1854342
Full text PDF: http://hdl.handle.net/10150/193827


Abstract

Advances in information technology (IT) make it possible to collect massive amounts of data in business applications and information systems. Growing data volumes require more effective knowledge discovery techniques to make the best use of the data. This dissertation focuses on knowledge discovery on graph-structured data, i.e., graph-based learning. In this study, graph-structured data refers to data instances accompanied by relational information that indicates their interactions. Graph-structured data exist in a variety of application areas related to information systems, such as business intelligence, knowledge management, e-commerce, and medical informatics. Developing knowledge discovery techniques for graph-structured data is critical to decision making and the reuse of knowledge in business applications.

In this dissertation, I propose a graph-based learning framework and identify four major knowledge discovery tasks using graph-structured data: topology description, node classification, link prediction, and community detection. I present a series of studies to illustrate these tasks and propose solutions for the example applications. For the topology description task, Chapter 2 examines the global characteristics of relations extracted from documents. These relations are extracted using different information processing techniques and aggregated to different analytical unit levels. For the node classification task, Chapter 3 and Chapter 4 study the patent classification problem and the gene function prediction problem, respectively. In Chapter 3, I model knowledge diffusion and evolution with patent citation networks for patent classification. In Chapter 4, I extend the context assumption in previous research and model context graphs in gene interaction networks for gene function prediction. For the link prediction task, Chapter 5 presents an example application in recommendation systems: I frame the recommendation problem as link prediction on user-item interaction graphs and propose capturing graph-related features to tackle it. For the community detection task, Chapter 6 examines online interactions. In this study, I propose to take advantage of the sentiments (agreements and disagreements) expressed in users' interactions to improve community detection effectiveness. All these examples show that the graph representation allows the graph structure and node/link information to be used more effectively in addressing the four knowledge discovery tasks.

In general, the graph-based learning framework contributes to the domain of information systems by categorizing related knowledge discovery tasks, promoting further use of the graph representation, and suggesting approaches for knowledge discovery on graph-structured data. In practice, the proposed framework can be used to develop a variety of IT artifacts that address critical problems in business applications.
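
The idea of treating recommendation as link prediction on a user-item interaction graph can be illustrated with a minimal sketch. It is not the method developed in the dissertation: the toy data, the function names `build_graph` and `score_candidates`, and the choice of counting length-3 paths as the graph-related feature are all assumptions made here for illustration only.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build a bipartite user-item graph from (user, item) pairs."""
    user_items = defaultdict(set)   # user -> items the user interacted with
    item_users = defaultdict(set)   # item -> users who interacted with it
    for user, item in interactions:
        user_items[user].add(item)
        item_users[item].add(user)
    return user_items, item_users


def score_candidates(user, user_items, item_users):
    """Rank items the user has not seen by a simple graph feature.

    The score of a candidate item is the number of length-3 paths
    user -> seen item -> other user -> candidate item. This is an
    illustrative stand-in for "graph-related features", not the
    dissertation's own feature set.
    """
    seen = user_items.get(user, set())
    scores = defaultdict(int)
    for item in seen:
        for other in item_users[item]:
            if other == user:
                continue
            for candidate in user_items[other]:
                if candidate not in seen:
                    scores[candidate] += 1
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy interactions (user, item).
interactions = [
    ("alice", "book_a"), ("alice", "book_b"),
    ("bob", "book_a"), ("bob", "book_b"), ("bob", "book_c"),
    ("carol", "book_b"), ("carol", "book_d"),
]

user_items, item_users = build_graph(interactions)
ranked = score_candidates("alice", user_items, item_users)
print(ranked)  # [('book_c', 2), ('book_d', 1)]
```

Each predicted link is a (user, item) pair that is absent from the graph but strongly connected through existing edges. Here `book_c` ranks first for `alice` because two distinct paths lead to it through `bob`, while only one path leads to `book_d` through `carol`.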