
Towards Large Scale Summarization

by Janara Maria Christensen




Institution: University of Washington
Department:
Degree: PhD
Year: 2015
Keywords: Computer science
Record ID: 2060246
Full text PDF: http://hdl.handle.net/1773/27448


Abstract

As the Internet grows and information becomes increasingly available, it is more and more difficult to understand what is most important without becoming overwhelmed by details. We need systems that can organize this information and present it in a coherent fashion. These systems should also be flexible, enabling the user to tailor the results to his or her own needs. Current solutions such as summarization are static and lack coherent organization. Even structured solutions such as timelines are inflexible. These problems become increasingly important as the amount of information grows. I propose a new approach to scaling up summarization called hierarchical summarization, which emphasizes organization and flexibility. In a hierarchical summary, the top level gives the most general overview of the information, and each subsequent level gives more detail. Hierarchical summarization allows the user to understand the most important information at a high level and then explore what is most interesting to him or her without being overwhelmed. In this work, I formalize the characteristics necessary for good hierarchical summaries and provide algorithms to generate them. I perform user studies that demonstrate the value of hierarchical summaries over competing methods on datasets much larger than those used for traditional summarization.