
Newton-based Optimization Methods for Noise-Contrastive Estimation

by Beenish Qaiser




Institution: University of Helsinki
Department:
Year: 2015
Keywords: Algorithms and Machine Learning
Record ID: 1133099
Full text PDF: http://hdl.handle.net/10138/153589


Abstract

The main focus of this thesis is the use of Newton-based optimization methods for the optimization of an objective function used in the estimation of unnormalized statistical models. For these models, the probability density function (pdf) is known only up to a multiplicative normalizing factor. A properly normalized pdf is essential in maximum likelihood estimation (MLE) for density estimation. An unnormalized model can be converted into a normalized one by dividing it by its integral (or sum), known as the partition function. The partition function can be computed analytically or approximated using numerical integration methods. Here, we assume that the partition function is not available in closed form, which makes MLE unsuitable for density estimation. We instead use a method known as noise-contrastive estimation (NCE) for density estimation of unnormalized models. NCE does not rely on numerical integration to approximate the partition function; it estimates the normalizing constant along with the other unknown quantities of the model. The estimation process is based on the optimization of a well-defined objective function, also known as the cross-entropy error function. The optimization is unconstrained, so powerful methods designed for unconstrained optimization can be used. Currently, a first-order optimization method, the non-linear conjugate gradient (CG) method, is used; however, it has been shown to converge slowly on large datasets. The computation time of the algorithm can be reduced by using only a fraction of the input samples (data and noise), a technique known as sample average approximation (SAA). However, the accuracy of the estimates is compromised when random subsets of the input samples are used to improve the computational performance of the non-linear CG method: there is a trade-off between the statistical accuracy of the estimates and the computational performance of the algorithm. We propose to use second-order Newton-based optimization methods, namely the line search Newton-CG and the trust region Newton-CG methods. These methods produce better search directions than the non-linear CG method because they employ both the gradient and the Hessian of the objective function. However, the Newton method requires the Hessian to be positive definite in order to make progress, so we use the Gauss-Newton approximation to the Hessian to avoid directions of negative curvature in case they occur. Furthermore, every iteration of the Newton method is computationally intensive, as it requires computing the Hessian and, in effect, its inverse. We therefore integrate the Newton-CG methods with the SAA framework to provide an efficient solution: the gradient is computed using the whole set of input samples, whereas the Hessian is computed using random subsets. As a result, we are able to reduce the computation times of the Newton-CG algorithms without losing the statistical accuracy of the estimates. It is shown that the…
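
To make the setting concrete, the following is a minimal sketch (not code from the thesis) of the NCE cross-entropy objective for a toy one-dimensional Gaussian model whose log-normalizer c is estimated alongside the other parameters. The names log_model, log_noise, nce_objective, and theta are illustrative assumptions, and the noise distribution is taken to be a standard normal whose pdf is known.

import numpy as np

def log_sigmoid(z):
    # Numerically stable log of the logistic sigmoid: log(1 / (1 + exp(-z))).
    return -np.logaddexp(0.0, -z)

def log_model(u, theta):
    # Unnormalized Gaussian log-density; c plays the role of the negative
    # log partition function and is estimated along with mu and log_sigma.
    mu, log_sigma, c = theta
    return -0.5 * ((u - mu) / np.exp(log_sigma)) ** 2 + c

def log_noise(u):
    # Known log-pdf of the noise distribution (standard normal here).
    return -0.5 * u ** 2 - 0.5 * np.log(2.0 * np.pi)

def nce_objective(theta, x, y):
    # NCE cross-entropy objective: discriminate data x from noise y via the
    # log-ratio G(u) = log p_model(u; theta) - log p_noise(u), with the
    # noise-to-data ratio nu = len(y) / len(x).
    nu = len(y) / len(x)
    G_x = log_model(x, theta) - log_noise(x)
    G_y = log_model(y, theta) - log_noise(y)
    # h(u) = sigmoid(G(u) - log nu); the objective averages log h over the
    # data and log(1 - h) over the noise, and is maximized.
    return (np.sum(log_sigmoid(G_x - np.log(nu)))
            + np.sum(log_sigmoid(-(G_y - np.log(nu))))) / len(x)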
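
As a similarly hedged illustration of the proposed scheme, the sketch below drives the same objective with SciPy's trust-region Newton-CG method ('trust-ncg') under an SAA-style split: the gradient is evaluated on the full data and noise samples, while the Hessian is evaluated on a fixed random subset. Finite differences stand in for the thesis's analytic gradient and Gauss-Newton Hessian, so this conveys the structure of the approach rather than its exact implementation.

from scipy.optimize import approx_fprime, minimize

rng = np.random.default_rng(0)
x = rng.normal(2.0, 0.5, size=5000)              # "data" from an unknown Gaussian
y = rng.standard_normal(size=5000)               # noise from the known noise pdf
sub_x = rng.choice(x, size=500, replace=False)   # SAA subset for the Hessian
sub_y = rng.choice(y, size=500, replace=False)

def neg_obj(theta):
    # Minimize the negative objective (NCE maximizes the objective).
    return -nce_objective(theta, x, y)

def grad(theta):
    # Gradient on the full samples; the thesis uses the analytic gradient.
    return approx_fprime(theta, neg_obj, 1e-6)

def hess(theta, eps=1e-5):
    # Numerical Hessian on the random subset only. The thesis instead uses a
    # Gauss-Newton approximation, which avoids directions of negative
    # curvature; trust-ncg truncates its inner CG iteration on negative
    # curvature, so this crude stand-in still runs.
    g = lambda t: approx_fprime(t, lambda s: -nce_objective(s, sub_x, sub_y), 1e-6)
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        H[:, i] = (g(theta + e) - g(theta - e)) / (2.0 * eps)
    return 0.5 * (H + H.T)                       # symmetrize the estimate

res = minimize(neg_obj, x0=np.zeros(3), jac=grad, hess=hess, method='trust-ncg')
print(res.x)  # estimates of (mu, log_sigma, c); c approximates -log Z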