Full Forest Treebanking

by Woodley Packard




Institution: University of Washington
Department:
Year: 2015
Keywords: annotation; grammar; HPSG; treebank; Linguistics
Record ID: 2062676
Full text PDF: http://hdl.handle.net/1773/33194


Abstract

In this thesis, I present a new method of producing treebanks using constraint-based grammars. Rather than requiring an explicitly enumerated set of candidate analyses per utterance, my method works from an implicit representation. This allows the annotator to efficiently select the correct analysis from trillions of possibilities, without requiring either the annotator or the computer to store or iterate over all of them. I explain the advantages and disadvantages of this method, and present the details of and motivation for the algorithms that make it possible. Relative to comparable prior art (i.e., top-N treebanking), my solution enables higher-coverage treebanks without a significant reduction in annotation speed, and it reduces storage and computational resource consumption.
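
To illustrate the general idea of working from an implicit representation, the following is a minimal sketch in Python. It is not the algorithm from the thesis. It assumes a toy packed forest in which each node stores a list of alternative ways of building it, and it shows that the number of analyses can be counted, and narrowed by an annotator decision, without listing the analyses one by one. The names `build_toy_forest`, `count_trees`, and `keep_only` are hypothetical and chosen only for this example.

```python
# Toy packed forest (an assumption for illustration, not the thesis's data
# structure): a dict mapping a node name to a list of alternatives, where
# each alternative is (label, tuple_of_child_node_names).

def build_toy_forest(n_segments, n_readings):
    """One root that combines n_segments children, each with n_readings leaf readings."""
    children = tuple(f"seg{i}" for i in range(n_segments))
    forest = {"root": [("combine", children)]}
    for name in children:
        forest[name] = [(f"reading{r}", ()) for r in range(n_readings)]
    return forest

def count_trees(forest, node="root", memo=None):
    """Count the full trees rooted at node without enumerating them."""
    if memo is None:
        memo = {}
    if node in memo:
        return memo[node]
    total = 0
    for _label, children in forest[node]:
        ways = 1
        for child in children:
            # Independent children multiply; alternatives at a node add.
            ways *= count_trees(forest, child, memo)
        total += ways
    memo[node] = total
    return total

def keep_only(forest, node, label):
    """Model one annotator decision: keep only the alternatives at node with this label."""
    pruned = dict(forest)
    pruned[node] = [alt for alt in forest[node] if alt[0] == label]
    return pruned

forest = build_toy_forest(n_segments=40, n_readings=2)
print(count_trees(forest))   # 1099511627776, i.e. 2**40 analyses from 81 stored nodes

forest = keep_only(forest, "seg0", "reading1")
print(count_trees(forest))   # 549755813888, one decision halves the space
```

In this toy setting, forty such decisions narrow more than a trillion analyses down to one, while the stored structure never grows beyond 81 nodes. The forests produced by a constraint-based grammar are far less regular than this one, so the sketch only conveys why an implicit representation avoids enumeration.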