Extending the explanatory power of factor pricing models using topic modeling

by Nils Everling




Institution: KTH
Department:
Year: 2017
Keywords: topic modeling; NLP; NMF; nonnegative matrix factorization; earnings call; transcript; risk; APT; factor model; GICS; Global Industry Classification Standard; MSCI; industry; portfolio management; stock market; equities; Computer Sciences
Posted: 02/01/2018
Record ID: 2195799
Full text PDF: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210253


Abstract

Factor models attribute stock returns to a linear combination of factors. A model with high explanatory power (R²) can be used to estimate the systematic risk of an investment. One of the most important factors is the industry in which the issuing company operates. In commercial risk models, this factor is often determined by a manually constructed stock classification scheme such as GICS, published by a financial institution. We present the Natural Language Industry Scheme (NLIS), an automatic, multivalued classification scheme based on topic modeling. The topic modeling is performed on transcripts of company earnings calls and identifies a number of topics analogous to industries. We perform it by applying non-negative matrix factorization (NMF) to a term-document matrix of the transcripts. When used to explain returns of the MSCI USA index, NLIS consistently outperforms GICS, often by several hundred basis points (2-3 percent). We attribute this to NLIS's ability to assign a stock to multiple industries. We also suggest that the proportions of a stock's industry assignments could correspond to expected future revenue sources rather than current ones. This property could explain some of NLIS's success, since it relates closely to theoretical stock pricing.
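
The idea of a multivalued industry assignment can be illustrated with a minimal sketch. It is not the thesis's implementation: the vocabulary, the three toy "transcripts", the choice of two topics, the Lee-Seung multiplicative update rule, and all function names are assumptions made for illustration. The matrix is stored with one row per transcript (the transpose of a term-document matrix), and plain Python is used so the example has no dependencies; a real pipeline would more likely use a library implementation such as scikit-learn's NMF on TF-IDF weights.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, k, iters=2000, seed=0, eps=1e-9):
    """Approximate nonnegative v (docs x terms) as w @ h using
    Lee-Seung multiplicative updates for the Frobenius objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def topic_exposures(w, h):
    """Rescale so each topic (row of h) sums to 1, then normalize each
    document's topic weights into proportions that sum to 1."""
    scales = [sum(row) for row in h]
    scaled = [[wij * s for wij, s in zip(row, scales)] for row in w]
    return [[x / sum(row) for x in row] for row in scaled]

# Hypothetical vocabulary and term counts, one row per company transcript.
vocab = ["oil", "drilling", "barrel", "software", "cloud", "subscription"]
companies = ["energy_co", "software_co", "mixed_co"]
counts = [
    [10, 8, 6, 0, 0, 0],   # talks only about energy
    [0, 0, 0, 6, 4, 10],   # talks only about software
    [5, 4, 3, 3, 2, 5],    # talks about both
]

w, h = nmf(counts, k=2)
exposures = topic_exposures(w, h)
for name, row in zip(companies, exposures):
    print(name, [round(x, 2) for x in row])
```

In this toy setting the two single-industry companies should load almost entirely on different topics, while the mixed company receives a substantial share of both. A single-valued scheme would have to place the mixed company in one industry only. The resulting proportions could then serve as industry exposures in a cross-sectional return regression, which is the step where the abstract compares R² against GICS.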