New insight into how proteins find their DNA binding sites in the genome

April 3, 2015 by Katharine Gammon, University of Southern California
In 2014, Remo Rohs and his team summarized the progress toward cracking the transcription factor code in the Cell Press journal Trends in Biochemical Sciences. The journal cover, designed by USC Dornsife doctoral student Lin Yang, highlights the many variables that influence transcription factor-DNA binding.

Remo Rohs is looking for some deep connections. He is integrating genomics and structural biology to uncover some significant insights into how proteins recognize DNA.

While genomics deciphers DNA by studying the sequences of base pairs that encode genetic information, structural biology explores the impact of the actual 3-D structure of DNA. Rohs, however, aims to unite the two fields into something new and hopefully more useful.

"Structural biology and genomics are big fields, but there is little interaction between these two worlds," said Rohs, assistant professor of biological sciences, chemistry and physics at USC Dornsife. "Genomics thinks of entire genomes in terms of sequence, and structural biology thinks of 3-D structures at high resolution but limited size."

In a March 9 paper in the Proceedings of the National Academy of Sciences, a team led by Rohs, which included researchers from Duke University and Columbia University, used a large data set of proteins to show that combining information on DNA shape and sequence resulted in a better understanding of protein-DNA recognition.

"Transcription factors are proteins that bind DNA to regulate genes, so knowing how and where they bind is of central importance in biology. This paper describes how modeling DNA shape can improve our understanding of transcription factor binding, with broad implications for many areas of research," said Steven Henikoff, a member of the National Academy of Sciences who edited Rohs' paper for PNAS.

Rohs' group used machine learning to train models that predict how well and where the transcription factors will bind to the genome. When thinking about how machine learning works, Rohs said, look no further than a search engine. "When Google tries to understand your consumer behavior by looking at the websites you visit, that is a feature," said Rohs, who holds a joint appointment in computer science at the USC Viterbi School of Engineering. "In the same way, we can use the features of the DNA, its sequence and shape, to predict whether a binding site is occupied by a protein or not."
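To make the idea of "features" concrete, here is a minimal sketch of a classifier that combines sequence and shape features to predict binding. It is not the method used in the paper: the sequences are made-up toy examples, and `toy_shape_profile` is a hypothetical placeholder (the A/T fraction in a sliding window) standing in for real structural parameters.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Sequence features: four 0/1 indicators per position."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]

def toy_shape_profile(seq, window=3):
    """Placeholder 'shape' feature: A/T fraction in a sliding window.

    This is NOT a real structural parameter; it only stands in for
    per-position shape values in this illustration.
    """
    half = window // 2
    profile = []
    for i in range(len(seq)):
        chunk = seq[max(0, i - half): i + half + 1]
        profile.append(sum(base in "AT" for base in chunk) / len(chunk))
    return profile

def features(seq):
    """Concatenate sequence and (placeholder) shape features."""
    return one_hot(seq) + toy_shape_profile(seq)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression by plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, seq):
    """Predicted probability that a site is occupied."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, features(seq))))

# Made-up toy data: labels are invented for illustration only.
bound = ["AATTAA", "TATTAA", "AATTTA"]
unbound = ["GGCCGG", "CGCGCC", "GCGGCC"]
X = [features(s) for s in bound + unbound]
y = [1.0] * len(bound) + [0.0] * len(unbound)
w, b = train_logistic(X, y)
```

Calling `predict(w, b, "AATTAA")` returns the model's estimated probability that the site is occupied. In a real analysis the shape features would come from structural predictions or measurements, and the model would be trained on experimental binding data.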

Tianyin Zhou, the study's lead author and a former graduate student in Rohs' lab who earned a Ph.D. in computational biology and bioinformatics from USC Dornsife in 2014, said the work has dual implications.

"First, once we incorporate the DNA shape, we can get very good predictive models," he said. "And with this information, we can tell how gene expression is regulated. Second, when you know a mechanism, you can design or engineer a sequence to make it bind to the protein you want," said Zhou, who is now working as a software engineer at Google.

In another paper, published April 2 in the journal Cell, Rohs collaborated with Richard Mann, an experimental biologist at Columbia University, to tease apart the contributions of DNA shape and sequence. They took proteins known to require DNA shape for binding and mutated the amino acids that recognize shape but not sequence.

The researchers looked at a group of proteins known as Hox, which are critical for early embryonic development. Rohs and colleagues found that introducing shape-recognizing amino acids from one transcription factor to another swapped binding specificities between Hox proteins.

Lin Yang, a doctoral student in computational biology and bioinformatics at USC Dornsife, said that understanding the fundamentals of binding specificities is a vital scientific goal. "When these proteins don't work properly—when they bind arbitrarily, or bind to incorrect sites—it might cause disease." Rohs credits Yang for the acceptance of the paper in Cell because Yang successfully used machine learning to identify the DNA shape features that are important for recognition.

A third paper, published March 11 in Genome Research in collaboration with Eran Segal from the Weizmann Institute of Science, found that regions outside the core binding site are important for binding.

In the future, Rohs hopes to continue his work on gene regulation in a more complex way. "When we talk about protein binding to DNA, we assume DNA is accessible, but in the cell it is folded up and covered by other proteins," he said. "So the next step is to integrate information about cooperative binding and the accessibility of binding sites, going from the in vitro to the more complex in vivo situation. This also includes epigenetic mechanisms such as DNA methylation, which is another interest of my team."


More information: "Quantitative modeling of transcription factor binding specificities using DNA shape." PNAS 2015; published ahead of print March 9, 2015, DOI: 10.1073/pnas.1422023112

"Unraveling determinants of transcription factor binding outside the core binding site." Genome Res. gr.185033.114; published in advance March 11, 2015, DOI: 10.1101/gr.185033.114

"Deconvolving the Recognition of DNA Shape from Sequence." DOI:
