The DNA of every organism holds the blueprints for building all the proteins it needs for its metabolic processes. While researchers already know what the blueprints look like for most proteins, they do not know what many of these proteins actually do in the body.
An interdisciplinary team of experimental and computational scientists from the Luxembourg Centre for Systems Biomedicine (LCSB) of the University of Luxembourg has now systematically quantified and characterized the extent of this knowledge gap. In an unprecedented effort, the team also set out to predict more specifically how many of the proteins of unknown function are enzymes, that is, proteins specialized in enabling the thousands of chemical reactions occurring at all times in living cells. "We have found that about 30 percent of the 'unknown' proteins found for example in yeast and in the human body are enzymes for which we are ignorant of what role they play in the cells or in the organism as a whole," says Dr. Carole Linster at LCSB. The team published its results in the scientific journal Nucleic Acids Research.
Many diseases, in particular inherited metabolic diseases, are associated with a genetic defect that results in the misfolding or even complete lack of certain enzymes. Researchers therefore hope to gain a better insight into the onset and triggers of these diseases through the analysis of gene sequences, the genetic blueprints for these enzymes. Thanks to modern sequencing techniques, it is already possible to decipher an entire genome – the whole complement of an organism's DNA – quickly and affordably.
However, scientists are aware of a serious gap in their understanding. "So far, we have deciphered thousands of genomes from many different species and we know what proteins they are translated into," says Linster, who led the study. "But we saw from our analyses that, when it comes to our understanding of them, there are still a huge number of blind spots on the protein map. Even in organisms that have been investigated intensively for years, for about one third of the proteins produced, we are uncertain as to what function they serve in the organism."
The biochemist compares this knowledge gap to the situation of an archaeologist who has found an ancient script: "Even if the researcher can decipher the individual letters, it does not automatically mean he can understand the message of what has been written. For that, he first has to find out what the individual words mean." The situation is very similar for researchers who are investigating the causes of rare genetic diseases.
"If we want to find out how specific genetic defects affect an organism, it is not enough to know which letters have been changed in the gene sequences of the mutated proteins. We need to know what functions these proteins perform in the organism in order to understand how their deficiency can lead to disease." Accordingly, Dr. Linster and colleagues next want to study the role of a number of these poorly understood proteins in greater detail, and thus contribute to progressively closing this remaining gap in our knowledge.
Kenneth W. Ellens et al. Confronting the catalytic dark matter encoded by sequenced genomes, Nucleic Acids Research (2017). DOI: 10.1093/nar/gkx937