|Institution:||California State University – Northridge|
|Keywords:||Hadoop; Dissertations, Academic – CSUN – Computer Science.|
|Full text PDF:||http://hdl.handle.net/10211.3/171795|
Apache Spark is an engine for large-scale data processing, best described as a more flexible version of the venerable MapReduce framework that can process data in memory to speed up computation. BLAST (Basic Local Alignment Search Tool), used extensively in bioinformatics studies, is a tool for determining the similarity of DNA and protein sequences. To illustrate the use of Apache Spark in porting bioinformatics tools to cluster computing, I demonstrate the creation of a distributed version of BLAST that requires minimal coding and only simple cluster deployment. I measure performance, result accuracy, and cost efficiency to determine the viability of this approach for BLAST and other bioinformatics programs.
Advisors/Committee Members: Wang, Taehyung (advisor), Kaplan, Adam B (committee member).