A genetic variant implicated in several cancers by genome-wide association studies (GWAS) has been found to drive increased expression of a known oncogene in the prostate.
The study, published July 13th in Genome Research, showcases a new protocol for studying the activity of cancer-risk variants suggested by GWAS. The results also underscore the dramatic consequences of small genetic changes even in the vast stretches of DNA, known as "gene deserts," that do not code for proteins.
"This paper shows a way to follow up on GWAS leads that pointed to these barren areas of the genome," said senior author Marcelo Nobrega, MD, PhD, assistant professor of human genetics and member of the University of Chicago Comprehensive Cancer Center at the University of Chicago Medical Center.
Since the completion of the Human Genome Project in 2000, GWAS projects seeking genetic risk factors for disease have become a popular scientific tool. A population of people with a particular disease is compared to controls without the disease, uncovering genetic variants correlated with increased disease risk.
But the hope that such genetic variants would provide easy targets for novel therapies was initially deflated by an unexpected result. Most of the variants associated with disease risk, called single nucleotide polymorphisms or SNPs, were found not in the sequences encoding proteins but in the other 98 percent of the genome, where the biological role is less clear.
Attention has since turned to short regulatory sequences that lie as yet undiscovered in the "gene deserts." With the power to control the expression of faraway genes, including when and where they are expressed, these regulators could exert dramatic effects.
"There are all kinds of functional DNA sequences that have important biological roles that are not protein-encoding sequences," Nobrega said. "There's every reason to believe that mutations in these non-coding sequences may lead to disease or increase the risk of disease."
But finding those non-coding sequences, much less determining their function, is a challenge. Regulatory sequences are typically around 500 base pairs in length, Nobrega said, dispersed in regions that are millions of base pairs long.
With colleagues Nora Wasserman and Ivy Aneas, Nobrega devised a method of quickly testing gene deserts to find biologically relevant sequences. A gene desert upstream of a known oncogene called MYC was chosen because of several GWAS results implicating the region in different cancers, including prostate. "This is one of the strongest genetic signals to prostate cancer that has been identified so far," Nobrega said.
Three artificial chromosomes were created that reproduced partial, overlapping segments of that gene desert, plus a gene called lacZ that produces a blue color in the cell when expressed. After introducing the chromosomes into a strain of mouse, the researchers measured where blue dye appeared - reflecting organs where MYC expression was under the control of regulatory sequences present in each of the artificial chromosomes. Because two of the three chromosomes promoted expression in the prostate, the team was able to narrow down the relevant sequence to a 5,000 base pair segment.
That segment included a SNP called rs6983267, which had previously been associated through GWAS with increased risk of prostate and colorectal cancer. A second phase of experiments tested the "risk allele" version of that SNP against the non-risk allele, which differs by only a single nucleotide.
Such a small difference produced dramatically different expression patterns, the study found. Transgenic mice carrying the risk allele exhibited robust blue staining in their prostate, while mice given the non-risk allele showed little to no detectable gene expression in the organ.
"Perhaps what this is telling us is that by inheriting the risk allele here, you may drive the overexpression of MYC," Nobrega said. "It's not going to cause prostate cancer, but it could increase the risk for prostate cancer."
The differences in prostate expression between the risk allele and the non-risk allele are apparent as early as embryonic stages, suggesting that the predisposition toward prostate cancer is set long before the disease actually appears.
"The mechanistic link between MYC expression levels and prostate may be much earlier than the cancer itself," Nobrega said. "It could potentially prime the system for cancer and then, depending on either secondary mutations or environmental injuries, it might or might not develop."
A next step would be to determine why the risk allele is capable of enhancing prostate expression with only a single nucleotide change. Nobrega suggested that an intermediary protein that binds to the enhancer sequence may be a promising target for preventive therapy in those carrying the risk allele.
But most importantly, the results confirm that the small genetic variants turned up by GWAS analyses are not artifacts, but highly relevant biological differences.
"This is a convincing demonstration in vivo that these noncoding SNPs that have been associated with complex diseases do lead to phenotypic differences," Nobrega said. "It strongly suggests that this is a way to follow up on these associations for all kinds of disease; not only for cancer but for diabetes, obesity, and other conditions."
More information: The manuscript will be published online ahead of print on July 13, 2010. Its full citation is as follows: Wasserman NF, Aneas I, Nobrega MA. An 8q24 gene desert variant associated with prostate cancer risk confers differential in vivo activity to a MYC enhancer. Genome Res doi:10.1101/gr.105361.110