Software package helps make sense of complex tree data

Most plant features arise from complex interactions of genes, proteins and metabolites. The identification and analysis of these genetic traits is very challenging, especially when the sequenced genomes are fragmented. In his thesis, Bastian Schiffthaler has improved the genome information from European aspen and developed bioinformatic tools that help to analyze complex genetic traits in plants.

To sequence a genome, the DNA is normally cut into small pieces and the sequence of each piece is read. Bioinformatic software then assembles the whole sequence using overlapping regions of these small pieces, in an iterative process that ideally yields full-length chromosomes. Trees often have very complex genomes, which makes this assembly difficult, so most available tree genome assemblies are not very contiguous. Bastian Schiffthaler worked on improving the contiguity of such genomes, focusing on European aspen.
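The overlap-based assembly described above can be illustrated with a toy sketch. It assumes error-free reads and no repeated regions, and it is not the method used in the thesis or in any real assembler; the function names `overlap` and `greedy_assemble` and the example reads are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: the assembly stays fragmented
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads cut from the made-up sequence "ATGCGTACGTTAGC".
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))
```

When no remaining pair overlaps, the loop stops with several separate contigs, which is the kind of fragmented assembly the article refers to.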

The genome sequence of European aspen was already quite good compared with, for example, that of Norway spruce. However, it was still fragmented, which made it difficult to carry out analyses that depend on a highly contiguous assembly. Examples are the detection of DNA signatures that relate to traits via genome-wide association, or the study of evolutionary history by looking at large-scale genomic rearrangements.

"Our strategy included modern long-read sequencing, polished with highly accurate short-read data and combined with an optical and a genetic map to further link the initially assembled scaffolds into fully assembled chromosomes. At close to 20,000 markers, the genetic map is one of the most comprehensive ones created for any organism to date. This was an overwhelming mass of information that most of the commonly used free software programs were not able to handle," says Bastian Schiffthaler.

Ordering markers on a genetic map is a classic application of the traveling salesman problem. Deriving the perfect order for only sixty markers by checking every possibility would take more calculations than there are atoms in the universe. All software therefore relies on approximations, but even those were too slow for a dataset of this size.
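The scale of that claim can be checked with a few lines of arithmetic. The sketch below assumes a brute-force search over every possible linear order, counts an order and its reverse as the same map, and uses 10^80 as a rough, commonly quoted order of magnitude for the number of atoms in the observable universe. The function name `count_marker_orders` is made up for illustration.

```python
import math


def count_marker_orders(n_markers: int) -> int:
    """Number of distinct linear orders of n markers.

    An order and its mirror image describe the same map, hence the division by 2.
    """
    if n_markers < 2:
        return 1
    return math.factorial(n_markers) // 2


ATOMS_ESTIMATE = 10 ** 80  # rough order of magnitude, an assumption

orders_60 = count_marker_orders(60)
print(f"orders for 60 markers: {orders_60:.2e}")
print("more than the atom estimate:", orders_60 > ATOMS_ESTIMATE)
```

Because the count grows factorially, each additional marker multiplies the search space, which is why exhaustive search is out of reach for maps with thousands of markers.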

Bastian Schiffthaler successfully defended his thesis on the 12th of June at Umeå University. Credit: Alena Aliashkevich

To overcome this problem, Bastian Schiffthaler developed "BatchMap," a software package that speeds up the computations required to find the order of genetic markers with the highest likelihood given their inheritance patterns. The software divides calculations into small batches, which are easy to compute and can run in parallel. This drastically decreased the calculation time, and Bastian Schiffthaler could produce a dense map of genetic signatures on the European aspen chromosomes. Since its creation, BatchMap has been adopted by other genome projects, such as those assembling the Norway spruce and octoploid strawberry genomes.
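The general idea of splitting a long marker sequence into small batches that are evaluated in parallel can be sketched as follows. This is not BatchMap's code or interface. The helper names, the one-marker overlap between neighbouring batches, and the toy score (summed distance between adjacent markers, standing in for a real likelihood calculation) are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(markers, batch_size: int, overlap: int = 1):
    """Cut an ordered marker list into batches that share `overlap` markers."""
    if batch_size <= overlap:
        raise ValueError("batch_size must be larger than overlap")
    step = batch_size - overlap
    batches = []
    start = 0
    while start < len(markers) - overlap:
        batches.append(markers[start:start + batch_size])
        start += step
    return batches


def batch_score(batch):
    """Toy stand-in for a per-batch calculation: summed adjacent distances."""
    return sum(abs(b - a) for a, b in zip(batch, batch[1:]))


def score_in_parallel(markers, batch_size: int = 4, workers: int = 4):
    """Score every batch independently, then combine the partial results."""
    batches = split_into_batches(markers, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(batch_score, batches))


# Hypothetical marker positions in one candidate order.
candidate_order = [0, 7, 5, 12, 20, 21, 30, 34, 41, 50]
print(score_in_parallel(candidate_order))
```

Because neighbouring batches share one marker, every adjacent pair is scored exactly once, so the combined result matches a single pass over the full list. A thread pool keeps the sketch simple; for CPU-heavy work in Python, separate processes would be the more realistic choice.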

"We wanted to evaluate our improved assembly in the context of genome-wide association studies to look for genes that are involved in the salicinoid metabolism. These metabolites are found only in Populus and Salix species and help to protect the plant against herbivores," explains Bastian Schiffthaler. "When compared to previous attempts using the more fragmented assembly, we could see that our new version improved the analysis of this complex trait a lot and we were able to gain new insights into the evolution of the different Populus species."

Identifying genes that control complex traits is very challenging. Bastian Schiffthaler studied leaf shape variation in European aspen, a complex trait that is inherited from the parents but still highly diverse between individuals. The results show that leaf shape is controlled by a complex network of many different genes, each of which often exerts only a minor influence on the final leaf shape.

Bastian Schiffthaler believes that better understanding the workings of traits like leaf shape requires an integrative approach, in which traits are analyzed at all stages that contribute to their emergence. He therefore developed "Seidr," a toolkit to study the interactions of genes that are actively being made into protein within an organism. He hopes that integrating "Seidr" with other layers of data will enable scientists to better predict complex traits in trees in the future.


More information: Embracing the data flood: Integrating diverse data to improve phenotype association discovery in forest trees, umu.diva-portal.org/smash/record.jsf?pid=diva2%3A1429905&dswid=-5277
Provided by Umea University
Citation: Software package helps make sense of complex tree data (2020, June 26) retrieved 13 July 2020 from https://phys.org/news/2020-06-software-package-complex-tree.html