|Institution:||Delft University of Technology|
|Keywords:||mean shift; data stream clustering; data mining|
|Full text PDF:||http://resolver.tudelft.nl/uuid:7fdb578a-a3e3-430c-b257-c85bfc45d3d9|
Mean Shift is a well-known clustering algorithm with attractive properties, such as the ability to find non-convex and local clusters even in high-dimensional spaces while remaining relatively insensitive to outliers. However, due to its poor computational performance, its real-world applications are limited. In this thesis, we propose a novel acceleration strategy for the traditional Mean Shift algorithm, along with a two-layer strategy, resulting in a considerable performance increase while maintaining high cluster quality. We also show how to find clusters in a streaming environment with bounded memory, in which queries can be answered at interactive rates, and for which no Mean Shift-based algorithm currently exists. Our online structure can be updated at minimal cost and as infrequently as possible, and we show how to detect the time at which this update needs to be performed. Our technique is validated extensively in both static and streaming environments.
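For orientation, the traditional Mean Shift procedure that the abstract takes as its baseline can be sketched as follows. This is a minimal textbook version, not the accelerated, two-layer, or streaming method proposed in the thesis. The Gaussian kernel, the function names `mean_shift_point` and `mean_shift`, and the `merge_radius` parameter are assumptions made for illustration.

```python
import math


def mean_shift_point(x, data, bandwidth, max_iter=100, tol=1e-6):
    """Shift x toward a nearby density mode using a Gaussian kernel (assumed)."""
    for _ in range(max_iter):
        # Weight every data point by its kernel value relative to x.
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * bandwidth ** 2))
            for p in data
        ]
        total = sum(weights)
        # The new position is the kernel-weighted mean of all data points.
        new_x = tuple(
            sum(w * p[d] for w, p in zip(weights, data)) / total
            for d in range(len(x))
        )
        shift = math.dist(x, new_x)
        x = new_x
        if shift < tol:
            break
    return x


def mean_shift(data, bandwidth, merge_radius=None):
    """Cluster data by grouping points whose shifted positions coincide."""
    if merge_radius is None:
        merge_radius = bandwidth / 2  # hypothetical default for merging modes
    modes, labels = [], []
    for p in data:
        m = mean_shift_point(tuple(p), data, bandwidth)
        for i, existing in enumerate(modes):
            if math.dist(m, existing) < merge_radius:
                labels.append(i)
                break
        else:
            # No existing mode is close enough, so this is a new cluster.
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels
```

Calling `mean_shift(points, bandwidth)` returns the list of modes found and one cluster label per input point. Every iteration for every point scans the whole data set, so the cost of this naive version grows quadratically with the number of points, which illustrates the poor computational performance the abstract refers to.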