The textual dimension "Involved-Informational": A corpus-based study

by Marc Reymann

Institution: Universität Regensburg
Department: Sprach- und Literatur- und Kulturwissenschaften
The study "The Textual Dimension Involved-Informational" presents algorithms and their application to corpora of the English language. It builds on Douglas Biber's groundbreaking "Variation across speech and writing" (1988), which describes a way to establish a general typology of English texts and is well suited to a thorough implementation in computer-aided linguistics. In that study, Biber derives a so-called multi-dimensional (MD) approach to typology based on the frequency of specific grammatical phenomena. The present study focuses on the dimension "Involved-Informational". The first chapter describes the development of a fully automatic, modular computer system written in the programming language Perl that can process any given 'raw' text and produce CSV (comma-separated values) files recording the occurrences of the 30 features listed by Biber (1989: 8). The second chapter describes the application of this system to corpora of English: the commonly used LOB/FLOB and BROWN/FROWN corpus pairs as representatives of written English, and the less commonly analyzed spoken-English corpora SEC and COLT.

The thesis "The Textual Dimension Involved-Informational" deals with algorithms and their application to corpora of the English language. It ports the algorithms set out in PL/I in Douglas Biber's "Variation across speech and writing" (1988) to Perl and concentrates on the dimension "Involved-Informational". The first part presents a system for the automatic analysis of arbitrary English texts. The second part applies this system to several English corpora (LOB/FLOB, BROWN/FROWN, SEC, and COLT) and gives a brief overview of notable synchronic and diachronic findings.