|Keywords:||Engineering and Technology; Teknik och teknologier; Master Programme in Engineering Physics; Civilingenjörsprogrammet i teknisk fysik|
|Full text PDF:||http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-251645|
The importance of finding extreme events or unexpected patterns has increased over the last two decades, mainly due to rapid advancements in technology. These events or patterns are referred to as anomalies. This thesis focuses on detecting anomalies in the form of sudden peaks in time series generated from online text analysis in Gavagai’s live environment. To our knowledge, only a limited number of sequential peak detection models are applicable in this domain. We introduce a novel technique using the Local Outlier Factor model, as well as a model built on simple linear regression with a Bayesian error function, both operating in real time. We also study a model based on linear Poisson regression. Gavagai requires that the models be easy to set up for different targets, which in turn requires them to be non-parametric. The Local Outlier Factor model and the simple linear regression model show promising results compared with Gavagai’s current working model. All models were tested on three datasets representing three different sentiment targets: positivity, negativity and frequency. Not only do our models detect the anomalies more successfully, they also do so with fixed parameters that are independent of the target. This means that our models achieve a lower error rate despite being non-parametric, whereas Gavagai’s current model requires tuning for each target of interest to operate with sufficient accuracy.
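To make the idea of sequential peak detection with simple linear regression concrete, the following is a minimal sketch and not the model developed in the thesis. It fits a least-squares line to a sliding window of recent values, forecasts the next value, and flags a peak when the observation exceeds the forecast by more than `k` times the spread of the window residuals. The thesis's Bayesian error function is not specified in this abstract, so a plain residual threshold is substituted here. The names `fit_line` and `detect_peaks`, the parameters `window`, `k` and `min_scale`, and the sample data are all illustrative assumptions.

```python
from statistics import mean, pstdev


def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (intercept, slope). Requires at least two points.
    """
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def detect_peaks(series, window=8, k=4.0, min_scale=1.0):
    """Return indices whose value exceeds the one-step-ahead forecast
    by more than k times the residual spread of the preceding window.

    This threshold rule is an assumption made for illustration; it
    stands in for the error function used in the thesis.
    """
    peaks = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        intercept, slope = fit_line(history)
        residuals = [y - (intercept + slope * x) for x, y in enumerate(history)]
        # Floor the spread so a nearly flat window does not flag tiny changes.
        scale = max(pstdev(residuals), min_scale)
        forecast = intercept + slope * window
        if series[t] - forecast > k * scale:
            peaks.append(t)
    return peaks


# Made-up hourly mention counts with one sudden peak at index 10.
counts = [10, 11, 10, 12, 11, 10, 12, 11, 10, 11, 40, 11, 10, 12, 11]
print(detect_peaks(counts))  # prints [10]
```

Each point is judged using only earlier observations, so the detector can run as new values arrive. One limitation of this sketch is that a detected peak stays in the following windows and inflates their forecast and residual spread; a more careful version might exclude or down-weight flagged points.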