|Institution:||University of Washington|
|Keywords:||audio; classification; deep learning; enhancement; nonstationary signals; statistical models; Electrical engineering; Computer science|
|Full text PDF:||http://hdl.handle.net/1773/40878|
Improving the modeling and processing of nonstationary signals remains an important yet challenging problem. In the past, the most effective approach for processing these signals has been statistical modeling. Statistical models can effectively encode domain knowledge and lead to principled algorithms for the fundamental tasks of enhancement, detection, and classification. However, the performance of statistical models can be limited because they inherently make assumptions about the distribution of the data. Deep neural networks, in contrast, have recently outperformed state-of-the-art statistical models of nonstationary signals. Deep neural networks are completely data-driven, and learn to set their parameters by training on large datasets that are assumed to match the distribution of the data. This dissertation follows two approaches for improving modeling and processing of nonstationary signals. The first approach examines conventional model assumptions and suggests changes that improve the processing of nonstationary signals. Specifically, noncircular distributions of the complex-valued short-time Fourier transform are shown to improve detection of realistic nonstationary signals. Then the parameterization of a recently proposed recurrent neural network for processing nonstationary signals is reexamined. By using an optimization method that preserves the capacity of the recurrence matrix, superior performance is achieved on a battery of benchmarks that test the ability of recurrent neural networks to process nonstationary signals. The second approach uses the recently proposed framework of deep unfolding, which provides a principled means of transforming statistical model inference algorithms into deep networks. This dissertation expands the deep unfolding framework specifically for nonstationary signals.
Using this framework, a model-based explanation is provided for state-of-the-art recurrent neural architectures, including gated recurrent unit and unitary recurrent neural networks. Additionally, deep unfolding results in deep network architectures that arise in principled ways from statistical model assumptions. This statistical model foundation provides initializations for the unfolded networks, which lead to better generalization, faster training, and competitive or superior performance on a variety of tasks, including single- and multichannel acoustic source separation and classification of acoustic signals.
Advisors/Committee Members: Atlas, Les (advisor), Pitton, James (advisor).