Keywords: Computer Science; Health Sciences, General
Full text PDF: http://nrs.harvard.edu/urn-3:HUL.InstRepos:14398542
Although the prevalence and awareness of Autism Spectrum Disorder (ASD) have steadily increased, a true understanding of the disorder remains hard to reach because the diagnosis is behavior-based and its manifestations are heterogeneous. Parents and caregivers often informally discuss the symptoms and behaviors they observe in their children with autism on online medical forums, in contrast to the more traditional and structured text of electronic medical records collected by doctors. We modify an anchor-word-driven topic model algorithm originally proposed by Arora et al. (2012a) to elicit and compare the medical concept topics, or “themes,” from both modes of data: a novel data set of posts from autism-specific online medical forums, and electronic medical records. We present methods to extract relevant medical concepts from colloquially written forum posts using selected sections of the consumer health vocabulary and other filtering techniques. To account for the sparsity of concept data, we propose and evaluate a more robust approach to selecting anchor words that takes into account variance and inclusivity. This combined approach to concept extraction and anchor word selection seeds the discussion of how unstructured text can influence and expand understanding of the enigmatic disorder, autism, and how these methods can be applied to similar sources of text to solve other problems.
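To make the idea of anchor word selection concrete, here is a minimal sketch of a greedy, farthest-point style selection over a word co-occurrence matrix, as commonly used in anchor-word topic models. It is a simplified illustration and not the modified, variance-aware procedure described in the abstract. The function names `row_normalize` and `select_anchors`, the `min_count` filter, and the toy matrix are all assumptions introduced for this example.

```python
import numpy as np


def row_normalize(cooc):
    """Convert co-occurrence counts into per-word conditional distributions."""
    cooc = np.asarray(cooc, dtype=float)
    totals = cooc.sum(axis=1, keepdims=True)
    # Rows with no co-occurrences stay all-zero instead of dividing by zero.
    return np.divide(cooc, totals, out=np.zeros_like(cooc), where=totals > 0)


def select_anchors(cooc, num_topics, min_count=0.0):
    """Greedily pick rows that are farthest from the span of earlier picks.

    min_count is a hypothetical frequency filter: words whose total
    co-occurrence count falls below it are never considered as anchors.
    """
    cooc = np.asarray(cooc, dtype=float)
    q_bar = row_normalize(cooc)
    candidates = np.flatnonzero(cooc.sum(axis=1) >= min_count)
    residual = q_bar[candidates].copy()

    anchors = []
    for _ in range(num_topics):
        norms = np.linalg.norm(residual, axis=1)
        best = int(np.argmax(norms))
        if norms[best] < 1e-12:
            break  # every remaining row is already explained by the anchors
        anchors.append(int(candidates[best]))
        # Gram-Schmidt step: remove the chosen direction from all rows.
        direction = residual[best] / norms[best]
        residual -= np.outer(residual @ direction, direction)
    return anchors


# Toy example: words 0 and 3 are "pure"; words 1 and 2 are mixtures of them.
toy = np.array([
    [9.00, 1.00, 0.0, 0.0],
    [4.50, 0.50, 1.0, 4.0],
    [6.75, 0.75, 0.5, 2.0],
    [0.00, 0.00, 2.0, 8.0],
])
anchors = select_anchors(toy, num_topics=2)
print(anchors)
```

In this sketch, rows that are mixtures of other rows shrink toward zero after each projection, so the words that remain far from the span are the ones that behave like anchors. With sparse concept counts, rare words can look artificially extreme, which is one reason a plain greedy rule may need a more robust selection criterion.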