Complex grammar of the genomic language

November 9, 2015, Karolinska Institutet
Researchers Arttu Jolma and Jussi Taipale in the lab at the Department of Biosciences and Nutrition, Karolinska Institutet in Sweden. Credit: Ulf Sirborn

A new study from Sweden's Karolinska Institutet shows that the 'grammar' of the human genetic code is more complex than that of even the most intricately constructed spoken languages in the world. The findings, published in the journal Nature, explain why the human genome is so difficult to decipher—and contribute to the further understanding of how genetic differences affect the risk of developing diseases on an individual level.

"The genome contains all the information needed to build and maintain an organism, but it also holds the details of an individual's risk of developing common diseases such as diabetes, heart disease and cancer," says the study's lead author Arttu Jolma, a doctoral student at the Department of Biosciences and Nutrition. "If we can improve our ability to read and understand the human genome, we will also be able to make better use of the rapidly accumulating genomic information on a large number of diseases for medical benefit."

The sequencing of the human genome in 2000 revealed the order of the roughly 3 billion letters (A, C, G and T) that make up the human genome. However, knowing the order of the letters is not enough to translate genomic discoveries into medical benefits; one also needs to understand what the sequences of letters mean. In other words, it is necessary to identify the 'words' and the 'grammar' of the language of the genome.

The cells in our body have almost identical genomes, but differ from each other because different genes are active (expressed) in different types of cells. Each gene has a regulatory region that contains the instructions controlling when and where the gene is expressed. This gene regulatory code is read by proteins called transcription factors that bind to specific 'DNA words' and either increase or decrease the expression of the associated gene.

Under the supervision of Professor Jussi Taipale, researchers at Karolinska Institutet have previously identified most of the DNA words recognised by individual transcription factors. Much as in a natural human language, however, DNA words can be joined to form compound words that are read by multiple transcription factors, and the mechanism by which such compound words are read had not previously been examined. In their new study in Nature, the Taipale team therefore examines the binding preferences of pairs of transcription factors and systematically maps the compound DNA words they bind to.

Their analysis reveals that the grammar of the genetic code is much more complex than that of even the most complex human languages. Rather than simply being joined by deleting the space between them, the individual words that make up a compound DNA word are altered, giving rise to a large number of completely new words.

"Our study identified many such words, increasing our understanding of how genes are regulated both in normal development and in cancer," says Arttu Jolma. "The results pave the way for cracking the genetic code that controls the expression of genes."

More information: Arttu Jolma et al. DNA-dependent formation of transcription factor pairs alters their binding specificity, Nature (2015). DOI: 10.1038/nature15518
