New tool helps researchers identify DNA patterns of cancer, genetic disorders

May 19, 2009
This symbolic scatter plot reveals the structure of a portion of the gene responsible for Huntington's disease. (Note the section of repeated 3-mers). Credit: David Cox, North Carolina State University

A new tool will help researchers identify the minute changes in DNA patterns that lead to cancer, Huntington's disease and a host of other genetic disorders. The tool was developed at North Carolina State University and translates DNA sequences into graphic images, which allows researchers to distinguish genetic patterns more quickly and efficiently than was historically possible using computers.

David Cox, a Ph.D. student in computer science at NC State, devised the "symbolic scatter plot" tool to provide a visual representation of a DNA sequence. Cox explains, "The human visual system is more adept at identifying patterns, and differentiating between patterns, than existing computer programs such as those that try to identify repetitions of DNA sequences." In other words, the naked eye sees patterns better than computers can.

Identifying patterns in a sequence of DNA is important because it can help researchers identify the minute genetic variations between subjects who suffer from a disease, such as cancer, and subjects who do not. "Improved identification of relevant DNA sequences will hopefully expedite the development of successful treatment for a range of diseases," Cox says, "by allowing researchers to focus on the components of DNA that are related to the disease and improving our understanding of the genetic mechanisms of these diseases. For example, what turns specific genes on and off?"

This symbolic scatter plot reveals patterns in the human Y chromosome. Credit: David Cox, North Carolina State University

So, how does the symbolic scatter plot create a visual representation of DNA? DNA is composed of a series of nucleotides. There are only four types of nucleotides, represented by the letters A, T, G and C. Each three-letter string of these nucleotides, such as AAA or ATG, is called a 3-mer. Cox explains, "There are only 64 possible 3-mers, thus each 3-mer maps to a number from zero to 63. The symbolic scatter plots take a very long string of letters representing a DNA sequence and split it into a bunch of 3-mers. It then plots a point for each 3-mer, zero through 63, with that number serving as the y-coordinate." The x-coordinate is the position at which the 3-mer appears in the genetic sequence.
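The mapping Cox describes can be sketched in a few lines of Python. This is an illustration only, not Cox's tool: the function names `kmer_index` and `scatter_points` are hypothetical, the alphabetical base order A, C, G, T is an assumption (the article does not say how the numbers are assigned), and the sequence is assumed to be split into non-overlapping 3-mers.

```python
# Assumed base order; the article does not state which numbering Cox uses.
BASES = "ACGT"


def kmer_index(kmer):
    """Map a 3-letter string over A, C, G, T to an integer from 0 to 63."""
    index = 0
    for base in kmer:
        # Treat the 3-mer as a three-digit number in base 4.
        index = index * 4 + BASES.index(base)
    return index


def scatter_points(sequence):
    """Return (x, y) pairs: x is the 3-mer's position, y is its number."""
    sequence = sequence.upper()
    # Assumption: non-overlapping 3-mers; a trailing partial 3-mer is dropped.
    usable = len(sequence) - len(sequence) % 3
    kmers = [sequence[i:i + 3] for i in range(0, usable, 3)]
    return [(x, kmer_index(kmer)) for x, kmer in enumerate(kmers)]


points = scatter_points("CAGCAGCAGATG")
print(points)  # [(0, 18), (1, 18), (2, 18), (3, 14)]
```

In this sketch a repeated 3-mer produces a horizontal run of points at the same height, which is the kind of pattern a reader would look for in the plot.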

"If this seems really simple," Cox says, "that's because it really is simple. Even so, the resulting scatter plots reveal interesting patterns in the original DNA. I can also string these scatter plots together to produce animations for the purpose of comparing ."

Cox chose to focus on 3-mers because they correlate to codons, which are the genetic codes the body uses to specify the insertion of a specific amino acid during the creation of proteins. In other words, they oversee the creation of proteins - which are themselves the basic building blocks of the human body. "There are 64 3-mers, but only 20 amino acids," Cox says, "so each amino acid corresponds to multiple 3-mers." Cox designed the symbolic scatter plot so that those 3-mers that correspond to the same amino acid are adjacent to one another.

"This way," Cox says, "it is easier to determine when a difference in 3-mers is significant - from one amino acid to another - rather than a difference in 3-mers that still results in the production of the same amino acid. A change in a single amino acid can be the difference between a relatively harmless disease and a fatal one," Cox says.
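One way to picture this grouping is to sort the 64 3-mers by the amino acid they encode. The sketch below uses the standard genetic code written with DNA letters; the sort order, the names `CODON_TABLE`, `Y_COORD` and `is_significant`, and the treatment of stop codons (marked `*`) as their own group are assumptions for illustration, since the article does not give the exact ordering Cox uses.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G layout.
# "*" marks the three stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}

# Assumed ordering: sort by amino acid first, so 3-mers that encode the
# same amino acid receive consecutive y-coordinates.
ORDERED = sorted(CODON_TABLE, key=lambda codon: (CODON_TABLE[codon], codon))
Y_COORD = {codon: y for y, codon in enumerate(ORDERED)}


def is_significant(codon_a, codon_b):
    """True when the two 3-mers encode different amino acids (or a stop)."""
    return CODON_TABLE[codon_a] != CODON_TABLE[codon_b]


print(is_significant("CAA", "CAG"))  # False: both encode glutamine (Q)
print(is_significant("CAG", "CCG"))  # True: glutamine (Q) versus proline (P)
```

With an ordering like this, two points at neighboring heights are more likely to represent the same amino acid, while a large vertical jump signals a change from one amino acid to another.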

More information: Cox will present the research this July at BIOCOMP '09 - The 2009 International Conference on Bioinformatics and Computational Biology in Las Vegas.

Source: North Carolina State University
