Learning Topical Social Media Sensors for Twitter

by Zahra Iman

Institution: Oregon State University
Year: 2016
Keywords: Social Media Sensors; Social media
Posted: 02/05/2017
Record ID: 2067052
Full text PDF: http://hdl.handle.net/1957/59188


Social media sources such as Twitter represent a massively distributed social sensor over diverse topics ranging from social and political events to entertainment and sports news. However, due to the overwhelming volume of content, it can be difficult to identify novel and significant content within a broad topic in a timely fashion. To this end, this thesis proposes a scalable and practical method to automatically construct social sensors for generic topics. The concept of using social media as a sensor for detecting events and news has been proposed in the literature. However, we argue that most of these works either do not focus on targeted content detection or use very basic methods to collect the topical data for further analysis. This reveals a gap in the use of social media as a sensor for high-quality topical content detection, which we aim to address via machine learning. In this thesis, given minimal supervised training content from a user, we learn to identify topical tweets from millions of features capturing content, user, and social interactions on Twitter. Using a corpus of over 800 million English tweets collected from the Twitter streaming API during 2013 and 2014, and learning sensors for 10 diverse topics, we empirically show that our learned social sensor automatically generalizes to unseen future content with high ranking and precision scores. Furthermore, we provide an extensive analysis of features and feature types across different topics that reveals, for example, that (1) largely independent of topic, simple terms are the most informative feature type, followed by location features, and that (2) the number of unique hashtags and tweets by a user correlates more strongly with the user's informativeness than their follower or friend count. In summary, this work provides a novel, effective, and efficient way to learn topical social sensors that requires minimal user curation effort and offers strong generalization performance for identifying future topical content.
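As an illustration only, the general idea of learning a topical classifier over sparse tweet features and then ranking unseen tweets might be sketched as follows. This is not the thesis's implementation: the choice of logistic regression trained by stochastic gradient descent, the feature naming scheme, the tweet dictionary fields (`text`, `user`, `location`), the function names, and the toy data are all assumptions made for the example.

```python
import math
import re
from collections import defaultdict


def extract_features(tweet):
    """Map a tweet dict to a set of binary feature names (hypothetical scheme)."""
    feats = set()
    for token in re.findall(r"[#@]?\w+", tweet["text"].lower()):
        if token.startswith("#"):
            feats.add("hashtag=" + token[1:])
        elif token.startswith("@"):
            feats.add("mention=" + token[1:])
        else:
            feats.add("term=" + token)
    feats.add("user=" + tweet["user"].lower())
    if tweet.get("location"):
        feats.add("location=" + tweet["location"].lower())
    return feats


def score(weights, feats):
    """Logistic score in (0, 1): higher means more likely topical."""
    z = sum(weights[f] for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=50, lr=0.5):
    """Fit logistic regression weights with plain SGD (an assumed learner)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for tweet, label in examples:
            feats = extract_features(tweet)
            error = label - score(weights, feats)
            for f in feats:
                weights[f] += lr * error
    return weights


# Toy labeled tweets for a made-up "flood" topic (1 = topical, 0 = not).
train_data = [
    ({"text": "Flood warning issued for the river #flood",
      "user": "weather_desk", "location": "Portland"}, 1),
    ({"text": "Streets under water after heavy rain #flood",
      "user": "local_news", "location": "Salem"}, 1),
    ({"text": "Great coffee this morning",
      "user": "foodie", "location": "Portland"}, 0),
    ({"text": "Watching the game tonight #sports",
      "user": "fan42", "location": "Austin"}, 0),
]

# Unseen tweets, standing in for future content.
unseen = [
    {"text": "New cafe opens downtown", "user": "foodie", "location": "Austin"},
    {"text": "River levels rising, more rain expected #flood",
     "user": "citizen7", "location": "Salem"},
]

weights = train(train_data)
ranked = sorted(unseen,
                key=lambda t: score(weights, extract_features(t)),
                reverse=True)
for t in ranked:
    print(round(score(weights, extract_features(t)), 3), t["text"])
```

In this toy setting the flood-related tweet is ranked first because it shares term, hashtag, and location features with the positive training examples. A sparse linear model is used here only because it keeps a very large feature space cheap to score.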
Advisors/Committee Members: Sanner, Scott P. (advisor), Termehchy, Arash (committee member).