
Mining and Analyzing Subjective Experiences in User-Generated Content

by Lu Chen




Institution: Wright State University
Year: 2016
Keywords: Computer Science; Information Science; Information Technology; subjective experience, subjective information, sentiment analysis, opinion mining, context-dependency, user-generated content, social media
Posted: 02/05/2017
Record ID: 2112568
Full text PDF: http://rave.ohiolink.edu/etdc/view?acc_num=wright1472164969


Abstract

Web 2.0 and social media enable people to create, share, and discover information instantly, anywhere, anytime. Much of this information is subjective: it describes people's subjective experiences, ranging from feelings about what is happening in our daily lives to opinions on a wide variety of topics. Subjective information can help individuals, businesses, and government agencies make decisions in areas such as product purchases, marketing strategy, and policy making. However, much of this useful subjective information is buried in the ever-growing volume of user-generated data on social media platforms, and with current technologies it remains difficult to extract high-quality subjective information and make full use of it.

Current subjectivity and sentiment analysis research has largely focused on classifying text polarity: whether the opinion expressed about a specific topic in a given text is positive, negative, or neutral. This narrow definition ignores other types of subjective information, such as emotion, intent, and preference, which may prevent these types from being exploited to their full potential. This dissertation extends the definition and introduces a unified framework for mining and analyzing diverse types of subjective information. We have identified four components of a subjective experience: an individual who holds it, a target that elicits it (e.g., a movie or an event), a set of expressions that describe it (e.g., 'excellent', 'exciting'), and a classification or assessment that characterizes it (e.g., positive vs. negative). Accordingly, this dissertation contributes novel, general techniques for identifying and extracting these components.

We first explore the task of extracting sentiment expressions from social media posts.
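To make the four-component model of a subjective experience concrete, the following is a minimal Python sketch of a record holding one experience. The class name `SubjectiveExperience`, its field names, and the `is_complete` helper are hypothetical illustrations, not the dissertation's own data model. The assessment is stored as a plain string so that it can carry a polarity label such as "positive" or, by assumption, an emotion or intent label.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubjectiveExperience:
    """Hypothetical record for the four components of a subjective experience."""

    holder: str  # the individual who holds the experience
    target: str  # what elicits it, e.g. a movie or an event
    # expressions that describe it, e.g. "excellent", "exciting"
    expressions: List[str] = field(default_factory=list)
    # classification or assessment that characterizes it, e.g. "positive";
    # a free-form string is an assumption so other label sets also fit
    assessment: Optional[str] = None

    def is_complete(self) -> bool:
        """Return True only when all four components have been filled in."""
        return bool(
            self.holder
            and self.target
            and self.expressions
            and self.assessment is not None
        )


exp = SubjectiveExperience(
    holder="@moviefan",
    target="the movie",
    expressions=["excellent", "exciting"],
    assessment="positive",
)
print(exp.is_complete())  # True
```

Keeping each component in its own field mirrors the idea that holder, target, expressions, and assessment can each be identified by a separate extraction step.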
We propose an optimization-based approach that extracts a diverse set of sentiment-bearing expressions, including formal and slang words and phrases, for a given target from an unlabeled corpus. Instead of associating an overall sentiment with a given text, this method assesses the finer-grained, target-dependent polarity of each sentiment expression. Unlike pattern-based approaches, which often fail to capture the diversity of sentiment expressions because of the informal language and writing style of social media posts, the proposed approach can identify sentiment phrases of different lengths as well as slang expressions, including abbreviations and spelling variations. Unlike supervised approaches, which require annotated data when applied to a new domain, the proposed approach is unsupervised and thus highly portable to new domains.

We then look into the task of finding opinion targets in product reviews, where product features (product attributes and components) are usually the targets of opinions. We propose a clustering approach that identifies product features and groups them into aspect categories. Unlike many existing approaches that first extract features…

Advisors/Committee Members: Sheth, Amit (Advisor).