Effects of clipping distortion on an Automatic Speaker Recognition system

by Jose Luis Ramirez

Institution: University of Colorado at Denver
Year: 2016
Keywords: Computer science
Posted: 02/05/2017
Record ID: 2109038
Full text PDF: http://pqdtopen.proquest.com/#viewpdf?dispub=10112619


Clipping distortion is a common problem in audio recording: a signal arrives at a higher amplitude than the recording system can handle, so part of the acoustic event is not captured. Several government agencies use Automatic Speaker Recognition (ASR) systems to identify the speaker in an acquired recording. The identification is automatic and unbiased: a questioned recording is run through the ASR system and compared against a pre-existing database of voice samples from known speakers. A match is indicated by a high likelihood ratio between the questioned recording and a recording in the known database. Clipping distortion can enter a questioned recording in several ways: the speaker may have been speaking too loudly into the recording device, a gain setting may have been too high, or post-processing may have pushed the signal past its limits. In each case the amplitude of the audio signal exceeds the maximum sample value of the recording system, and the quantized signal has its peaks truncated at that maximum value instead of following the actual amplitude of the input signal. In theory, clipping distortion will lower the likelihood ratios obtained when comparing two recordings of the same speaker. This thesis tests that hypothesis. Currently, no research provides guidelines on the limitations of using clipped recordings. This thesis investigates the degree to which clipped material affects the performance of a Forensic Automatic Speaker Recognition system.
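The truncation of peaks described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names (hard_clip, clipped_fraction, sine), the normalized limit of 1.0, and the 100 Hz test tone sampled at 8000 Hz are all assumptions chosen for illustration, and real recordings would be clipped at the limits of their own sample format.

```python
import math


def hard_clip(samples, limit=1.0):
    """Truncate each sample to the range [-limit, limit] (hypothetical helper)."""
    return [max(-limit, min(limit, s)) for s in samples]


def clipped_fraction(samples, limit=1.0):
    """Return the fraction of samples at or beyond the limit (hypothetical helper)."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if abs(s) >= limit) / len(samples)


def sine(amplitude, freq_hz=100.0, rate_hz=8000, n=800):
    """Generate n samples of a sine tone; all parameters are illustrative."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
        for i in range(n)
    ]


# An input whose peaks (1.5) exceed the assumed system maximum (1.0).
signal = sine(amplitude=1.5)

# What the recording system would store: peaks flattened at the limit.
recorded = hard_clip(signal, limit=1.0)

print("input peak:   ", max(signal))
print("recorded peak:", max(recorded))
print("share of samples clipped:", round(clipped_fraction(recorded), 3))
```

The share of clipped samples is one simple way to quantify how severely a recording is clipped. Whether the thesis measures the degree of clipping this way is not stated in the abstract.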