High dimensional inference: structured sparse models and non-linear measurement channels

by Delaram Motamedvaziri

Institution: Boston University
Year: 2014
Record ID: 2025068
Full text PDF: http://hdl.handle.net/2144/11155


High-dimensional inference is motivated by many real-life problems in areas such as medical diagnosis, security, and marketing. In statistical inference problems, n data samples are collected, each containing p attributes. High-dimensional inference deals with problems in which the number of parameters, p, is larger than the sample size, n. To hope for any consistent result within the high-dimensional framework, the data are assumed to lie on a low-dimensional manifold. This implies that only k ≪ p parameters are required to characterize the p feature variables. One way to impose such a low-dimensional structure is a regularization-based approach. In this approach, the statistical inference problem is mapped to an optimization problem in which a regularizer term penalizes the deviation of the model from a specific structure. The choice of appropriate penalizing functions is often challenging. We explore three major problems that arise in the context of this approach. First, we probe the reconstruction problem under sparse Poisson models. We are motivated by applications in explosive identification and online marketing, where the observations are counts of a recurring event. We study the amplitude effect, which distinguishes our problem from a conventional linear regression least-squares problem. Second, motivated by applications in decentralized sensor networks and distributed multi-task learning, we study the effect of decentralization on high-dimensional inference. Finally, we provide a general framework to study the impact of multiple structured models on the performance of regularization-based reconstruction methods. For each of the aforementioned scenarios, we propose an equivalent optimization problem and specify the conditions under which the optimization problem can be solved. Moreover, we mathematically analyze the performance of such recovery methods in terms of reconstruction error, prediction error, probability of successful recovery, and sample complexity.
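
As a generic illustration of the regularization-based approach described in the abstract, the sketch below solves an l1-penalized least-squares problem (the lasso) by iterative soft-thresholding in the p > n regime. It is not the thesis's own method: the choice of the l1 penalty, the solver, and all names (soft_threshold, lasso_ista, objective), problem sizes, and the value of lam are assumptions made purely for illustration. The step size uses the squared Frobenius norm of A as a conservative upper bound on the Lipschitz constant of the gradient.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(A, y, x, lam):
    """Value of (1/2) * ||A x - y||^2 + lam * ||x||_1."""
    n, p = len(A), len(A[0])
    r = [sum(A[i][j] * x[j] for j in range(p)) - y[i] for i in range(n)]
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(xj) for xj in x)


def lasso_ista(A, y, lam, step, iters=500):
    """Minimize the lasso objective by iterative soft-thresholding (ISTA).

    `step` should be at most 1 / ||A||_2^2 for the objective to decrease
    monotonically.
    """
    n, p = len(A), len(A[0])
    x = [0.0] * p
    for _ in range(iters):
        # Residual r = A x - y.
        r = [sum(A[i][j] * x[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the smooth part: g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by the l1 proximal step.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(p)]
    return x


if __name__ == "__main__":
    # Hypothetical toy problem: n = 20 samples, p = 40 parameters, k = 3 nonzeros.
    random.seed(0)
    n, p, k = 20, 40, 3
    A = [[random.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(p)] for _ in range(n)]
    x_true = [1.0 if j < k else 0.0 for j in range(p)]
    y = [sum(A[i][j] * x_true[j] for j in range(p)) for i in range(n)]

    # ||A||_F^2 >= ||A||_2^2, so this step size is safe, if conservative.
    step = 1.0 / sum(a * a for row in A for a in row)
    x_hat = lasso_ista(A, y, lam=0.05, step=step, iters=2000)

    support = [j for j, v in enumerate(x_hat) if abs(v) > 1e-3]
    print("estimated support:", support)
    print("objective:", objective(A, y, x_hat, 0.05))
```

The penalty weight lam controls the trade-off between data fit and sparsity: once lam is at least the largest absolute entry of A^T y, the all-zero vector is the minimizer, while smaller values admit more nonzero coefficients. Other structures (group sparsity, low rank) would swap in a different regularizer and its proximal operator.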