
Continuous Authentication using Stylometry

by Marcelo Luiz Brocardo




Institution: University of Victoria
Department:
Year: 2015
Keywords: Continuous authentication; Stylometry; Deep Belief Network; short message verification
Record ID: 2059502
Full text PDF: http://hdl.handle.net/1828/6098


Abstract

Static authentication, where user identity is checked once at login, can be circumvented no matter how strong the authentication mechanism is. Through attacks such as man-in-the-middle and its variants, an authenticated session can be hijacked after the initial login has completed. In the last decade, continuous authentication (CA) using biometrics has emerged as a possible remedy against session hijacking. CA consists of repeatedly testing the authenticity of the user throughout the authenticated session as data becomes available. Because of its repetitive nature, CA is expected to be carried out unobtrusively, which means that the authentication information must be collectible without any active involvement of the user and without any special-purpose hardware devices (e.g. biometric readers). Stylometric analysis, which consists of checking whether or not a target document was written by a specific individual, could potentially be used for CA. Although stylometric techniques can achieve high accuracy for long documents, identifying the author of a short document remains challenging, particularly with large author populations. In this dissertation, we propose a new framework for continuous authentication using authorship verification based on writing style. Authorship can be verified using stylometric techniques that analyze the linguistic style and writing characteristics of the authors. Unlike traditional authorship verification, which focuses on long texts, we tackle short messages. A shorter authentication delay (i.e. a smaller data sample) is essential to reduce the window size of the re-authentication period in CA. We validate our method using different block sizes, including 140, 280, and 500 characters, and investigate shallow and deep learning architectures for machine learning classification.
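To make the block sizes concrete, the following is a minimal sketch of cutting a user's text into fixed-size character blocks, each of which could then be verified on its own. The abstract does not say how blocks are formed, so the function name `split_into_blocks`, the non-overlapping layout, and the choice to drop a short trailing fragment are all assumptions made for illustration.

```python
def split_into_blocks(text: str, block_size: int) -> list[str]:
    """Cut text into consecutive, non-overlapping blocks of block_size characters.

    Assumption for illustration: a trailing fragment shorter than
    block_size is dropped rather than padded or merged.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Length of the prefix that divides evenly into full blocks.
    usable = len(text) - len(text) % block_size
    return [text[i:i + block_size] for i in range(0, usable, block_size)]


if __name__ == "__main__":
    sample = "x" * 1000  # stand-in for a stream of typed text
    for size in (140, 280, 500):
        print(size, len(split_into_blocks(sample, size)))
```

Smaller blocks give more frequent re-authentication decisions from the same amount of text, at the cost of less stylistic evidence per decision.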
Experimental evaluation of the proposed authorship verification approach yields an Equal Error Rate (EER) of 8.21% on the Enron email dataset with 76 authors and an EER of 10.08% on a Twitter dataset with 100 authors. Evaluation of the approach using relatively smaller forgery samples with 10 authors yields an EER of 5.48%.
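For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine users rejected). The sketch below shows one simple way to estimate it from two lists of verification scores. It is not the evaluation procedure of the dissertation: the function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the sample scores are assumptions made for illustration.

```python
def equal_error_rate(genuine: list[float], impostor: list[float]) -> float:
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    Assumed convention: a sample is accepted when its score >= threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap = None
    eer = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Hypothetical scores, not data from the dissertation.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.3, 0.6]
    print(f"EER = {equal_error_rate(genuine_scores, impostor_scores):.2%}")
```

With finitely many scores the two error rates rarely cross exactly, which is why this sketch averages them at the closest threshold; other interpolation schemes are equally reasonable.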