Software package helps make sense of complex tree data
Most plant features arise from complex interactions of genes, proteins and metabolites. Identifying and analyzing the genetic basis of such complex traits is very challenging, especially when the sequenced genomes are fragmented. In his thesis, Bastian Schiffthaler has improved the genome assembly of European aspen and developed bioinformatic tools that help to analyze complex traits in plants.
To sequence a genome, the DNA is normally cut into small pieces and the sequence of each piece is read. Bioinformatic software then reassembles the complete sequence by iteratively joining pieces that share overlapping regions, a process that ideally yields full-length chromosomes. Trees often have very complex genomes with long repetitive stretches, which make many overlaps ambiguous; as a result, most available tree genome assemblies are not very contiguous. Bastian Schiffthaler worked on improving the contiguity of such genomes, focusing on European aspen.
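The overlap-and-merge idea can be illustrated with a toy greedy assembler. This is a minimal sketch, not how production assemblers work: real tools build overlap or de Bruijn graphs and must cope with sequencing errors and repeats. The function names `overlap` and `greedy_assemble` and the minimum overlap length of 3 are illustrative assumptions.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`, or 0 if shorter than min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair of pieces with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep input order
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No sufficient overlaps remain: the pieces left over are separate contigs.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

When no remaining pair overlaps by at least `min_len` bases, the function returns several separate contigs rather than one sequence, which mirrors the fragmented assemblies described above.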
The genome sequence of European aspen was already quite good compared with, for example, that of Norway spruce. However, it was still fragmented, which made it difficult to carry out analyses that depend on a highly contiguous assembly. Examples include detecting DNA signatures associated with traits through genome-wide association studies, and studying evolutionary history by looking at large-scale genomic rearrangements.
"Our strategy combined modern long-read sequencing, polished with highly accurate short-read data, with an optical map and a genetic map to further link the initially assembled scaffolds into fully assembled chromosomes. With close to 20,000 genetic markers, the genetic map is one of the most comprehensive ones created for any organism to date. This was an overwhelming mass of information that most of the commonly used free software programs were not able to handle," says Bastian Schiffthaler.
Ordering markers on a genetic map is a classic application of the traveling salesman problem. Because the number of possible orders grows factorially with the number of markers, finding the optimal order of only sixty markers by checking every possibility would require evaluating more orders than there are atoms in the observable universe. All software therefore relies on approximations, but even these were too slow for a dataset of this size.
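Both the scale argument and the kind of shortcut that mapping software has to take fit in a few lines of Python. The sketch assumes that an order and its reverse describe the same map, giving n!/2 candidate orders, and uses the common estimate of about 10^80 atoms. The nearest-neighbour heuristic is the simplest approximation to the traveling salesman problem and is shown only for illustration; it is an assumption here, not the method used by BatchMap or other mapping packages, and the marker positions are hypothetical.

```python
import math

# Common order-of-magnitude estimate for the number of atoms in the observable universe.
ATOMS_IN_UNIVERSE = 10 ** 80


def number_of_orders(n: int) -> int:
    """Distinct linear orders of n markers; a map and its reverse count as the same order."""
    return 1 if n < 2 else math.factorial(n) // 2


def nearest_neighbour_order(dist: list[list[float]], start: int = 0) -> list[int]:
    """Greedy TSP heuristic: from the current marker, always step to the closest unvisited one."""
    order = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        current = order[-1]
        nxt = min(unvisited, key=lambda m: dist[current][m])
        order.append(nxt)
        unvisited.remove(nxt)
    return order


print(number_of_orders(60) > ATOMS_IN_UNIVERSE)  # True

# Hypothetical marker positions; their pairwise differences stand in for genetic distances.
positions = [0.0, 0.5, 0.1, 0.3]
dist = [[abs(p - q) for q in positions] for p in positions]
print(nearest_neighbour_order(dist))  # [0, 2, 3, 1]
```

A greedy heuristic like this runs quickly but can lock in a poor order early, so practical tools typically add refinement steps on top of such a starting order.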
To overcome this problem, Bastian Schiffthaler developed "BatchMap," a software package that speeds up the computations required to find the order of genetic markers with the highest likelihood given their inheritance patterns. The software divides the calculations into small batches that are easy to compute and can run in parallel. This drastically reduced the calculation time and allowed Bastian Schiffthaler to produce a dense map of genetic signatures on the European aspen chromosomes. BatchMap has since been adopted by other genome projects, such as those assembling the Norway spruce and octoploid strawberry genomes.
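The batching idea can be sketched on one of the steps that feeds marker ordering: estimating a recombination fraction for every pair of markers. This is a minimal illustration under stated assumptions, not BatchMap's code. Genotypes are simplified to 0/1 codes as in a backcross-style cross (outcrossing species such as aspen need more involved estimators), and the function names, batch size and worker count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import combinations

# genotypes[marker][individual], coded 0/1 as in a simple backcross-style cross (an assumption).
Genotypes = list[list[int]]
Pair = tuple[int, int]


def recombination_fraction(a: list[int], b: list[int]) -> float:
    """Simple two-point estimate: share of individuals whose alleles differ at the two markers."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def score_batch(genotypes: Genotypes, pairs: list[Pair]) -> dict[Pair, float]:
    """One small, independent unit of work: all marker pairs in a single batch."""
    return {(i, j): recombination_fraction(genotypes[i], genotypes[j]) for i, j in pairs}


def batched_pairwise_fractions(genotypes: Genotypes, batch_size: int = 1000,
                               workers: int = 4) -> dict[Pair, float]:
    pairs = list(combinations(range(len(genotypes)), 2))
    batches = [pairs[k:k + batch_size] for k in range(0, len(pairs), batch_size)]
    results: dict[Pair, float] = {}
    # Batches do not depend on each other, so they can be evaluated concurrently and merged.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial_result in pool.map(partial(score_batch, genotypes), batches):
            results.update(partial_result)
    return results


genotypes = [[0, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 0]]
print(batched_pairwise_fractions(genotypes, batch_size=1))
# {(0, 1): 0.25, (0, 2): 0.5, (1, 2): 0.25}
```

`ThreadPoolExecutor` keeps the sketch simple to run; because of CPython's global interpreter lock, CPU-bound batches would in practice go to a `ProcessPoolExecutor` (a `functools.partial` of a top-level function can be pickled for that) or to separate compute nodes.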
"We wanted to evaluate our improved assembly in the context of genome-wide association studies to look for genes that are involved in salicinoid metabolism. These metabolites are found only in Populus and Salix species and help to protect the plant against herbivores," explains Bastian Schiffthaler. "Compared with previous attempts using the more fragmented assembly, we could see that our new genome version greatly improved the analysis of this complex trait, and we were able to gain new insights into the evolution of the different Populus species."
Identifying the genes that control complex traits is very challenging. Bastian Schiffthaler studied leaf shape variation in European aspen, a complex trait that is inherited from the parents but still highly diverse between individuals. The results show that leaf shape is controlled by a complex network of many different genes, each of which often exerts only a minor influence on the final leaf shape.
Bastian Schiffthaler believes that understanding how traits like leaf shape work requires an integrative approach, in which traits are analyzed at all the stages that contribute to their emergence. He therefore developed "Seidr," a toolkit for studying the interactions among genes that are actively expressed, that is, being made into protein, within an organism. He hopes that integrating Seidr with other layers of data will enable scientists to better predict complex traits in trees in the future.
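To make the idea of a gene interaction network concrete, the simplest inference method links genes whose expression levels rise and fall together across samples. The sketch uses plain Pearson correlation with a hypothetical cutoff of 0.9 and made-up expression values; it is an illustration under those assumptions, not Seidr's implementation. Dedicated toolkits offer more sophisticated inference methods, and correlation alone cannot distinguish direct from indirect interactions.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two expression profiles; 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_edges(expr: dict[str, list[float]],
                       threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Link every gene pair whose expression profiles are strongly (anti-)correlated."""
    genes = sorted(expr)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expr[g1], expr[g2])
            if abs(r) >= threshold:
                edges.append((g1, g2, r))
    return edges


# Hypothetical expression levels of four genes across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 1.0],
}
for g1, g2, r in coexpression_edges(expr):
    print(g1, g2, round(r, 2))
```

In this toy data geneA, geneB and geneC end up linked to each other, while geneD, whose profile tracks none of them, stays unconnected.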