Why it took 20 years to 'finish' the human genome—and why there's still more to do
The release of the draft human genome sequence in 2001 was a seismic moment for genetics, and paved the way for advances in our understanding of the genomic basis of human biology and disease.
But sections were left unsequenced, and some sequence information was incorrect. Now, two decades later, we have a much more complete version, published as a preprint (which is yet to undergo peer review) by an international consortium of researchers.
Technological limitations meant the original draft human genome sequence covered just the "euchromatic" portion of the genome—the 92% of our genome where most genes are found, and which is most active in making gene products such as RNA and proteins.
The newly updated sequence fills in most of the remaining gaps, bringing the total to 3.055 billion base pairs ("letters") of our DNA code. This data has been made publicly available, in the hope other researchers will use it to further their research.
Why did it take 20 years?
Much of the newly sequenced material is the "heterochromatic" part of the genome, which is more "tightly packed" than the euchromatic genome and contains many highly repetitive sequences that are very challenging to read accurately.
These regions were once thought not to contain any important genetic information, but they are now known to contain genes involved in fundamentally important processes such as the formation of organs during embryonic development. Among the 200 million newly sequenced base pairs are an estimated 115 genes predicted to be involved in producing proteins.
Two key factors made the completion of the human genome possible:
1. Choosing a very special cell type
The newly published genome sequence was created using human cells derived from a very rare type of tissue called a complete hydatidiform mole, which occurs when a fertilised egg loses all the genetic material contributed to it by the mother.
Most cells contain two copies of each chromosome, one from each parent, and each parent's chromosome contributes a different DNA sequence. A cell from a complete hydatidiform mole has two copies of the father's chromosomes only, and the genetic sequence of each pair of chromosomes is identical. This makes the full genome sequence much easier to piece together.
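To see why identical chromosome pairs simplify the job, consider a toy comparison. The short sequences and the function name below are invented for illustration and are not real genomic data. In an ordinary cell the two copies differ at some positions, so an assembler has to work out which variant belongs to which copy; in a mole-derived cell there are no such positions to untangle.

```python
def heterozygous_sites(copy_a, copy_b):
    """Return the positions at which two equal-length chromosome copies differ."""
    if len(copy_a) != len(copy_b):
        raise ValueError("copies must be the same length")
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]

# Hypothetical toy sequences (not real data).
PATERNAL = "ACGTACGTAC"
MATERNAL = "ACGAACGTTC"

# Ordinary cell: one paternal copy and one maternal copy.
typical_cell_sites = heterozygous_sites(PATERNAL, MATERNAL)

# Complete hydatidiform mole: two identical paternal copies.
mole_cell_sites = heterozygous_sites(PATERNAL, PATERNAL)

print(typical_cell_sites)  # positions where the two copies disagree
print(mole_cell_sites)     # empty: nothing to disentangle
```

Real chromosomes also differ by insertions, deletions and rearrangements, which this position-by-position sketch ignores.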
2. Advances in sequencing technology
After decades of glacial progress, the Human Genome Project achieved its 2001 breakthrough by pioneering a method called "shotgun sequencing," which involved breaking the genome into very small fragments of about 200 base pairs, cloning them inside bacteria, deciphering their sequences, and then piecing them back together like a giant jigsaw.
This was the main reason the original draft covered only the euchromatic regions of the genome—only these regions could be reliably sequenced using this method.
The latest sequence was deduced using two complementary new DNA-sequencing technologies. One was developed by PacBio, and allows longer DNA fragments to be sequenced with very high accuracy. The second, developed by Oxford Nanopore, produces ultra-long stretches of continuous DNA sequence. These new technologies allow the jigsaw pieces to be thousands or even millions of base pairs long, making the genome easier to assemble.
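The effect of jigsaw-piece size on repetitive DNA can be shown with a toy example. This is a minimal sketch, not the consortium's assembly method: the 50-letter "genome", the repeat unit and the function name are all made up for illustration. It counts how many reads of a given length also occur somewhere else in the sequence, since such reads cannot be placed unambiguously.

```python
from collections import Counter

def ambiguous_read_fraction(genome, read_len):
    """Fraction of reads of length read_len that occur at more than one position."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    ambiguous = sum(1 for read in reads if counts[read] > 1)
    return ambiguous / len(reads)

# Hypothetical toy genome: unique flanking sequence around a tandem repeat.
REPEAT_UNIT = "ACGTT"
TOY_GENOME = "GATTACAGGC" + REPEAT_UNIT * 6 + "TTCAGGCTAA"

# Short reads (5 letters) fall entirely inside the 30-letter repeat.
short_fraction = ambiguous_read_fraction(TOY_GENOME, 5)

# Long reads (35 letters) always reach into unique flanking sequence.
long_fraction = ambiguous_read_fraction(TOY_GENOME, 35)

print(f"short reads ambiguous: {short_fraction:.0%}")
print(f"long reads ambiguous:  {long_fraction:.0%}")
```

In this toy case, more than half of the short reads match several places in the sequence, while every long read is unique because it is longer than the repeat itself. Real sequencing reads also contain errors, which the sketch leaves out.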
The new information has the potential to advance our understanding of human biology, including how chromosomes function and maintain their structure. It is also going to improve our understanding of genetic conditions, such as Down syndrome, that have an underlying chromosomal abnormality.
Is the genome now completely sequenced?
Well, no. An obvious omission is the Y chromosome, because the complete hydatidiform mole cells used to compile this sequence contained two identical copies of the X chromosome. However, work to sequence the Y chromosome is underway, and the researchers anticipate their method can sequence it accurately despite its highly repetitive sequences.
Even though sequencing the (almost) complete genome of a human cell is an extremely impressive landmark, it is just one of several crucial steps towards fully understanding humans' genetic diversity.
The next job will be to study the genomes of diverse populations (the complete hydatidiform mole cells were European). Once the new technology has matured sufficiently to be used routinely to sequence many different human genomes, from different populations, it will be better positioned to make a more significant impact on our understanding of human history, biology and health.
Both care and technological development are needed to ensure this research is conducted with a full understanding of the diversity of the human genome, so that health disparities are not exacerbated by limiting discoveries to specific populations.