Advancing the application of genomic sequences through 'Kmasker plants'

Applications and methods of the bioinformatics tool "Kmasker plants" for the analysis of sequence data. Credit: Chris Ulpinnis / IPB Halle & Pixabay

The development of next-generation sequencing (NGS) has enabled researchers to investigate genomes that might previously have been considered too complex or too expensive. Nevertheless, the analysis of complex plant genomes, which often contain an enormous number of repetitive sequences, is still a challenge. Bioinformatics researchers from the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Martin Luther University Halle-Wittenberg (MLU) and the Leibniz Institute of Plant Biochemistry (IPB) have therefore published "Kmasker plants," a program that identifies repetitive sequences and thus facilitates the analysis of plant genomes.

In bioinformatics, the term k-mer is used to describe a sequence of a certain length "k." By defining and counting such sequences, researchers can quantify repetitive sequences in the genome they are studying and assign them to corresponding positions. As early as 2014, researchers at IPK in Gatersleben used this approach to develop the in-silico (computer-based) tool "Kmasker." It was used to detect repeats during the characterisation of the barley genome (Schmutzer et al., 2014).
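
The k-mer counting idea described above can be illustrated with a minimal Python sketch. This is not the Kmasker implementation: the function names, the toy sequence, the choice of k = 3 and the masking threshold are all assumptions made for this example. It counts every overlapping k-mer, assigns each count back to the position where the k-mer starts, and masks bases covered by frequent k-mers.

```python
from collections import Counter

def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def positional_counts(seq: str, k: int, counts: Counter) -> list:
    """For each start position, look up how often its k-mer occurs."""
    return [counts[seq[i:i + k]] for i in range(len(seq) - k + 1)]

def mask_repeats(seq: str, k: int, counts: Counter, threshold: int) -> str:
    """Replace every base covered by a k-mer seen >= threshold times with 'N'."""
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= threshold:
            masked[i:i + k] = "N" * k
    return "".join(masked)

# Toy sequence (hypothetical): a short tandem repeat followed by unique bases.
toy = "ACGACGACGTTTGCA"
counts = count_kmers(toy, 3)
profile = positional_counts(toy, 3, counts)
masked = mask_repeats(toy, 3, counts, threshold=2)

print(profile)  # per-position k-mer frequency
print(masked)   # repeat-masked sequence
```

In practice, k-mer counts for a plant genome would be built from genome-wide sequencing data with a dedicated k-mer counter and a much larger k; the dictionary-based approach here is only meant to show the principle.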

The use of NGS is becoming more and more important, but the error-free assembly of complex genomes from NGS results is still a challenge. For this reason, the researchers recently decided to revive and expand this initial proof-of-concept project. Under the leadership of Dr. Thomas Schmutzer, formerly of the research group "Bioinformatics and Information Technology" at IPK and now affiliated with the MLU, scientists from the MLU, the IPK, Wageningen University & Research and the IPB Halle worked in close cooperation on the redesign and development of "Kmasker." This collaboration was largely supported by the two service centres "GCBN" and "CiBi" of the German Network for Bioinformatics Infrastructure "de.NBI."

"Kmasker plants" allows rapid, reference-free screening of nucleotide sequences using k-mers derived from the whole genome. Extending the previous version, the bioinformatics tool now also enables comparative studies between different cultivars or closely related species, and supports the identification of sequences suitable as fluorescence in situ hybridisation (FISH) probes or CRISPR/Cas9-specific guide RNAs. Furthermore, "Kmasker plants" has been published with a web service that contains pre-computed indices for selected economically important crop plants, such as barley and wheat. Dr. Schmutzer emphasises that "this tool will enable plant researchers all over the world to test plant genomes and thus, for example, identify repeat free parts of their sequence of interest." Moreover, he believes that the enhanced features will make it possible to detect candidate sequence regions that have multiplied in the genome of one species but are missing, or occur in lower copy numbers, in other species. Such copy-number differences are a common effect that contributes to phenotypic variation of agronomic importance in various crops. A prominent example is the Vrn-H2 gene, which is present in a single copy in winter barley but missing in spring barley lines.
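
To make the comparative idea more concrete, here is a hedged Python sketch of how k-mer counts from two samples could be compared along a query sequence. It is an illustration only and does not reproduce how "Kmasker plants" computes its comparisons: the function names, the toy reads, the pseudocount and the use of a log2 ratio are assumptions, and real data would additionally need normalisation for sequencing depth.

```python
import math
from collections import Counter

def kmer_index(reads: list, k: int) -> Counter:
    """Build a k-mer count table from a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def log2_fold_changes(query: str, k: int, index_a: Counter,
                      index_b: Counter, pseudocount: int = 1) -> list:
    """Per-position log2 ratio of k-mer counts in sample A versus sample B.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when a k-mer is absent from one sample.
    """
    ratios = []
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        a = index_a[kmer] + pseudocount
        b = index_b[kmer] + pseudocount
        ratios.append(math.log2(a / b))
    return ratios

# Hypothetical toy reads for two cultivars.
reads_a = ["ACGTACGT", "ACGT"]   # "ACGT" occurs three times
reads_b = ["ACGT"]               # "ACGT" occurs once
index_a = kmer_index(reads_a, 4)
index_b = kmer_index(reads_b, 4)

fold = log2_fold_changes("ACGTT", 4, index_a, index_b)
print(fold)
```

Positions with a strongly positive value would point to sequence that is more abundant in sample A, while values near zero indicate similar copy numbers in both samples.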

The "Kmasker plants" web service is now available as part of the IPK Crop Analysis Tool Suite (CATS) and therefore as a service of the de.NBI Service Platform. Alternatively, the "Kmasker plants" source code can be accessed directly and installed via GitHub.


More information: Sebastian Beier et al, Kmasker plants – a tool for assessing complex sequence space in plant species, The Plant Journal (2019). DOI: 10.1111/tpj.14645


Journal information: The Plant Journal

Provided by Leibniz Institute of Plant Genetics and Crop Plant Research
Citation: Advancing the application of genomic sequences through 'Kmasker plants' (2020, January 21) retrieved 28 November 2021 from