SparkBLAST: An Implementation of NCBI BLAST in the Apache Spark Framework

by Christopher Chambers




Institution: California State University – Northridge
Department:
Year: 2016
Keywords: Hadoop; Dissertations, Academic – CSUN – Computer Science.
Posted: 02/05/2017
Record ID: 2134326
Full text PDF: http://hdl.handle.net/10211.3/171795


Abstract

Apache Spark is an engine for large-scale data processing, best described as a more flexible version of the venerable MapReduce framework that can process data in memory to speed up computation. BLAST (Basic Local Alignment Search Tool) determines the similarity of DNA and protein sequences and is used extensively in bioinformatics studies. To illustrate the use of Apache Spark in porting bioinformatics tools to cluster computing, I demonstrate the creation of a distributed version of BLAST with minimal coding and simple cluster deployment. I measure performance, result accuracy, and cost efficiency to determine the viability of this approach for BLAST and other bioinformatics programs.

Advisors/Committee Members: Wang, Taehyung (advisor), Kaplan, Adam B (committee member).
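
The abstract does not say how the thesis distributes BLAST, so what follows is only a rough sketch of the general scatter/gather pattern such a port might rely on, not the author's implementation. It uses plain Python with a thread pool in place of a Spark cluster, and a toy exact-substring match in place of BLAST. All function names (`parse_fasta`, `partition`, `toy_search`, `distributed_search`) are hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) records."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def partition(records: List[Record], n: int) -> List[List[Record]]:
    """Split records into at most n contiguous chunks, preserving order."""
    if not records:
        return []
    n = max(1, min(n, len(records)))
    size = -(-len(records) // n)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def toy_search(part: List[Record], subject: str) -> List[Tuple[str, int]]:
    """Stand-in for BLAST (assumption): report the offset of an exact
    substring match of each query in the subject, or -1 if absent."""
    return [(header, subject.find(seq)) for header, seq in part]


def distributed_search(fasta_text: str, subject: str,
                       n_partitions: int = 2) -> List[Tuple[str, int]]:
    """Scatter queries across workers, search each chunk, gather results."""
    parts = partition(parse_fasta(fasta_text), n_partitions)
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        # pool.map yields results in the same order as the input chunks
        results = pool.map(lambda p: toy_search(p, subject), parts)
        return [hit for part_hits in results for hit in part_hits]
```

On a real Spark cluster, the per-chunk step would more plausibly be expressed with an RDD operation such as `mapPartitions` or `pipe`, which hands each partition to a function or an external process. Whether the thesis takes that route is an assumption here.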