Collaborative Research: Innovation: Pioneering New Approaches to Explore Pangenomic Space at Scale.
About
We propose to develop new software tools for pangenomic analysis. These tools use a graph-based representation of a pangenome and exploit it to efficiently find both shared and unique regions of interest within that pangenome. The proposed work builds on our initial work on finding frequented regions (FRs) in de Bruijn graphs. In this “Innovation” proposal, we will extend that work with the goal of developing useful algorithms and software tools that facilitate new pangenomics research. The proposed work will refine algorithms and develop software to address important problems in each of the identified areas. The research team brings complementary expertise spanning molecular biology, algorithms, machine learning, and genomics research. Pangenomic biology will be advanced through automatic identification of candidate regions of interest in a pangenome. Methods will be developed to discover regions that are conserved across evolutionary space, regions that are novel, and regions that have diverged due to positive selection. Machine learning techniques will be used to search for FRs underlying important genomic regions. Lastly, this work will complement ongoing work on the model plant Medicago truncatula, contributing to research on its symbiotic relationships. The current trajectory of next-generation sequencing improvements, including falling costs and increased read lengths and throughput, ensures that sequencing multiple genomes per species will be routine within the next decade. This proposal initiates work on a next generation of bioinformatics software that can exploit the increased information content available from multiple accessions and use the data intelligently for unbiased, species-wide analyses.
FUNDING AGENCY: National Science Foundation (NSF), DBI - ADVANCES IN BIOINFORMATICS
ESTIMATED END DATE: June 30, 2021
Members