|Institution:||Technische Universität Dresden|
|Full text PDF:||http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-159884|
The amount of biomedical literature has been increasing rapidly during the last decade. Text mining techniques can harness this large-scale data, shed light onto complex drug mechanisms, and extract relation information that can support computational polypharmacology. In this work, we introduce CASSANDRA, a fully corpus-based and unsupervised algorithm which uses the MEDLINE indexed titles and abstracts to infer drug gene associations and assist drug repositioning. CASSANDRA measures the Pointwise Mutual Information (PMI) between biomedical terms derived from Gene Ontology (GO) and Medical Subject Headings (MeSH). Based on the PMI scores, drug and gene profiles are generated and candidate drug gene associations are inferred when computing the relatedness of their profiles. Results show that an Area Under the Curve (AUC) of up to 0.88 can be achieved. The algorithm can successfully identify direct drug gene associations with high precision and prioritize them over indirect drug gene associations. Validation shows that the statistically derived profiles from literature perform as good as (and at times better than) the manually curated profiles. In addition, we examine CASSANDRA’s potential towards drug repositioning. For all FDA-approved drugs repositioned over the last 5 years, we generate profiles from publications before 2009 and show that the new indications rank high in these profiles. In summary, co-occurrence based profiles derived from the biomedical literature can accurately predict drug gene associations and provide insights onto potential repositioning cases.