
Unsupervised video segmentation and its application to activity recognition

by Hsien Ting Cheng




Institution: University of Illinois at Urbana-Champaign
Department: 1200
Degree: PhD
Year: 2015
Keywords: segmentation
Record ID: 2057929
Full text PDF: http://hdl.handle.net/2142/72891


Abstract

We addressed two fundamental problems of computer vision, segmentation and recognition, in the space-time domain. Knowing that generic image segmentation introduces unstable regions due to illumination changes, compression artifacts, and similar effects, we utilized temporal information to achieve consistent 3D video segmentation. By exploiting non-local structure in both the spatial and temporal dimensions, the instability of the segmented regions was alleviated. A segmentation tree was built within every frame, and label consistency was enforced within each subtree (i.e., a spatial clique). By roughly tracking 2D regions across frames, temporal cliques were built in which label consistency was enforced as well. The resulting high-order (beyond pairwise) Conditional Random Field (CRF) was designed and solved efficiently. Experimental results demonstrated high-quality segmentation both quantitatively and qualitatively.

Taking the segmented 3D regions, called tubes, as input, we developed an activity recognition framework that not only determined which activity was present in a video but also located where it happened. A robust tube feature was extracted from photometric and shape-dynamics information. An activity was described by a Parts Activity Model (PAM) with a root template and four part templates under the root. Because only some parts of a video determine its activity label, we used Multiple Instance Learning (MIL) to formulate the problem. The latent variables included the tube index and the part locations under the root template. Experiments were conducted on three well-known datasets, and state-of-the-art results were achieved.
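The two models described above can be summarized schematically. The symbols, weights, and the particular form of the higher-order potential below are illustrative assumptions on our part, not the notation of the thesis:

```latex
% Segmentation: a higher-order CRF over region labels x_i, with
% spatial cliques S (subtrees of each per-frame segmentation tree)
% and temporal cliques T (regions tracked across frames).
E(\mathbf{x}) \;=\; \sum_{i} \psi_i(x_i)
  \;+\; \sum_{c \in \mathcal{S}} \psi_c(\mathbf{x}_c)
  \;+\; \sum_{c \in \mathcal{T}} \psi_c(\mathbf{x}_c)

% One common choice for a label-consistency clique potential is a
% robust P^n-style term that grows with the number of regions
% N_c(x_c) disagreeing with the clique's dominant label, saturating
% at a truncation parameter Q (an assumed form, for illustration):
\psi_c(\mathbf{x}_c) \;=\; \lambda \,\min\!\bigl(1,\; N_c(\mathbf{x}_c)/Q\bigr)

% Recognition: an MIL-style video score that maximizes over the
% latent tube index t and part placements z = (z_1, ..., z_4)
% relative to the root template.
f(V) \;=\; \max_{t,\,\mathbf{z}} \Bigl( w_{\mathrm{root}}^{\top}\phi(t)
  \;+\; \sum_{p=1}^{4} w_{p}^{\top}\phi(t, z_p) \Bigr)
```

Under this reading, minimizing the energy yields labels that agree within subtrees and along tracks, and the max over latent variables lets a single well-scoring tube (with its part placements) determine the video's activity label, which is the essence of the multiple-instance formulation.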