|Keywords:||Bayesian model, next generation sequencing, large amounts of data, Java, machine learning|
|Full text PDF:||http://dspace.library.uu.nl:8080/handle/1874/298579|
We address the problem of accurately and efficiently identifying de-novo mutations in the human germline. More precisely, how can we detect de-novo point mutations on the sex chromosome in a robust yet sensible manner? What challenges arise from the quality of the data available for this chromosome? What is the pattern of de-novo events on this chromosome compared to the rest of our genome? The difficulty of devising a discovery method for such events comes from their rarity relative to the error rates of the underlying DNA sequencing technology. We discuss the relevance of this research in light of our increasing understanding of evolution and of our genetic code’s structure and function, as well as its practical application in finding genetic disease risk factors. We present the field’s most widely used analysis methods and technologies, and describe each step that influences the design or performance of the model we implement. We present a straightforward yet efficient general model of de-novo mutation discovery and then show how the model needs to be adapted to correctly capture the particularities of this chromosome. Furthermore, we illustrate what information our model can explain and where we still need to apply domain knowledge to correct the output. Finally, we show how the model is integrated into the complex, modular analysis pipeline used in the community.