
A SYSTEM FOR AUTOMATED CONTENT ORGANIZATION

by Ye Tian




Institution: Case Western Reserve University
Department: Computer Engineering
Degree: PhD
Year: 2006
Keywords: ACOSys; FCA; Content Organization; Information Retrieval; hierarchical menu; concepts; context; Berkeley DB
Record ID: 1787467
Full text PDF: http://rave.ohiolink.edu/etdc/view?acc_num=case1151077709


Abstract

The main goal of Information Retrieval (IR) is to facilitate access to information in large document collections. Starting from a user's query, usually expressed in natural language, a classic IR system retrieves a set of items relevant to that query and displays them as a ranked list. Search engines are examples of IR systems. They are effective at finding specific items, but results for less specific queries tend to be off-target, overwhelming, and less useful. In this thesis, we report the design and prototyping of, and our experience with, an experimental system called ACOSys, which automatically organizes content into menu/folder hierarchies based on a mathematical theory called Formal Concept Analysis (FCA). ACOSys uses the concepts of FCA and the structure of a hierarchical menu to categorize search results into more specific groups. Users can then find the items they want by quickly zeroing in on the subfolders where those items may reside, saving the effort of browsing through thousands of off-target items. The technical contribution of this thesis consists of the design and implementation of algorithms in three related categories. First, we develop the principle and the implementation of a new algorithm for generating concepts and rules, and we show through both theoretical and practical study that it is efficient for both sparse and dense contexts. Second, we develop an algorithm for maintaining and updating concept lattices. Unlike other incremental algorithms, ours updates not only the concept set but also the menu/folder structure, so new items can be added incrementally rather than by rebuilding the whole hierarchy. Third, instead of relying on simple string matching, we provide a semi-automated process for keyword selection in which a user makes decisions based on measures such as the word-distribution statistics of the collection.
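The abstract does not spell out ACOSys's own concept-generation algorithm. For orientation only, the sketch below enumerates every formal concept (extent, intent) of a small context using Ganter's NextClosure algorithm, a standard FCA technique and explicitly not the thesis's algorithm. The toy context of documents described by keywords, and all function names, are hypothetical. Ordering the resulting concepts by extent inclusion gives the concept lattice from which a folder hierarchy can be derived.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

# Hypothetical toy representation of a formal context: object -> set of attributes.
Context = Dict[str, Set[str]]
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def extent(ctx: Context, attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Objects that possess every attribute in `attrs`."""
    return frozenset(g for g, own in ctx.items() if attrs <= own)


def intent_of(ctx: Context, objs: FrozenSet[str], attributes: List[str]) -> FrozenSet[str]:
    """Attributes shared by every object in `objs` (all attributes if `objs` is empty)."""
    common = set(attributes)
    for g in objs:
        common &= ctx[g]
    return frozenset(common)


def closure(ctx: Context, attributes: List[str], attrs: FrozenSet[str]) -> FrozenSet[str]:
    """The smallest concept intent containing `attrs`."""
    return intent_of(ctx, extent(ctx, attrs), attributes)


def next_closure_concepts(ctx: Context, attributes: List[str]) -> Iterator[Concept]:
    """Yield every formal concept, intents in lectic order (Ganter's NextClosure)."""
    rank = {m: i for i, m in enumerate(attributes)}
    current = closure(ctx, attributes, frozenset())
    while True:
        yield extent(ctx, current), current
        # Try attributes from last to first to find the lectically next closed set.
        for i in range(len(attributes) - 1, -1, -1):
            m = attributes[i]
            if m in current:
                continue
            prefix = frozenset(x for x in current if rank[x] < i)
            candidate = closure(ctx, attributes, prefix | {m})
            # Accept only if the closure adds no attribute ranked before m.
            if all(rank[x] >= i for x in candidate - current):
                current = candidate
                break
        else:
            return  # no lectic successor: current is the full attribute set


# Hypothetical toy context: documents described by keywords.
docs: Context = {
    "d1": {"python", "fca"},
    "d2": {"python", "db"},
    "d3": {"fca", "lattice"},
}
attributes = ["db", "fca", "lattice", "python"]
concepts = list(next_closure_concepts(docs, attributes))
for ext, itt in concepts:
    print(sorted(ext), sorted(itt))
```

NextClosure keeps only the current intent in memory and emits each concept exactly once; this naive version recomputes a full closure for every attribute it tries, so it illustrates the idea rather than the efficiency the thesis claims for its own algorithm.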
By using the indexing strategy of Berkeley DB (a database system), ACOSys constructs context-sensitive menu hierarchies in seconds, making it practical for large numbers of objects and attributes.
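The abstract credits Berkeley DB's indexing for fast menu construction without describing the scheme. The sketch below is a hypothetical, in-memory stand-in that uses a plain Python dict as an inverted index (keyword to document IDs) rather than Berkeley DB itself. Under that assumption, a folder's documents are the intersection of the posting sets along its menu path, and its subfolders are the keywords that split those documents. The names `build_index`, `folder_contents`, and `submenu` are illustrative, not ACOSys's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Hypothetical in-memory stand-in for a keyed store such as Berkeley DB.
Index = Dict[str, Set[str]]


def build_index(docs: Dict[str, Set[str]]) -> Index:
    """Inverted index: keyword -> IDs of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, keywords in docs.items():
        for kw in keywords:
            index[kw].add(doc_id)
    return dict(index)


def folder_contents(index: Index, all_docs: Iterable[str], path: Iterable[str]) -> Set[str]:
    """Documents under a menu path: intersect the posting set of each keyword on it."""
    found = set(all_docs)
    for kw in path:
        found &= index.get(kw, set())
    return found


def submenu(index: Index, all_docs: Iterable[str], path: Iterable[str]) -> Dict[str, int]:
    """Keywords that split the current folder, with counts, largest subfolder first."""
    path_list = list(path)
    here = folder_contents(index, all_docs, path_list)
    entries: Dict[str, int] = {}
    for kw, postings in index.items():
        if kw in path_list:
            continue
        n = len(postings & here)
        # Skip keywords absent from, or shared by, every document in this folder.
        if 0 < n < len(here):
            entries[kw] = n
    return dict(sorted(entries.items(), key=lambda kv: (-kv[1], kv[0])))


docs = {
    "d1": {"python", "fca"},
    "d2": {"python", "db"},
    "d3": {"fca", "lattice"},
}
index = build_index(docs)
print(submenu(index, docs, []))          # {'fca': 2, 'python': 2, 'db': 1, 'lattice': 1}
print(submenu(index, docs, ["python"]))  # {'db': 1, 'fca': 1}
```

Because each submenu is recomputed from the documents in the current folder, keywords that never occur there are not offered, which is one plausible reading of "context-sensitive"; a disk-backed store would replace the dict, but the intersection logic would stay the same.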