The largest systematic assessment of the process of genome assembly is published today in GigaScience, the open access journal of BGI and BioMed Central. The second Assemblathon competition saw 21 teams submit 43 entries based on unassembled sequence data from three genomes (a bird, a fish, and a snake), sequenced using three different technologies. BGI participated in the competition with their SOAPdenovo team, and also provided sequencing data for the bird genome. The paper outlines ten key metrics, drawn from over 100 different measures for each assembly, each focusing on a different aspect of an assembly's quality.
The research came to publication via an unusual peer review process. The Assemblathon 2 paper is available on a preprint server, and the named reviewers have blogged and commented on their reviews of it. Since the data was in the public domain and the authors enjoyed the discussion, GigaScience's editors encouraged open discussion of the peer review of this article.
With a new species genome announced almost daily, genomics is getting faster and cheaper all the time. Piecing together genomes from raw sequencing data to produce high quality finished genome sequences, without the aid of a previously assembled reference, is still technically challenging and requires a huge amount of computational power and resources, yet it is performed by more and more labs around the world. With new sequencing tools appearing every month, and nearly limitless ways of carrying out this complex process, it is not clear which method of piecing a genome together is best. The Assemblathon is a series of periodic collaborative efforts that aims to address this question and so help improve how genomics is carried out.
The logistics of carrying out such a large competition were challenging, with large volumes of test and entry data hosted by supercomputing centers and mirrored in the cloud, and automated scripts used to calculate and present the many results. Reviewing the paper was equally challenging and novel; everyone embraced GigaScience's open and transparent review process, with authors and reviewers tweeting and posting comments online and in blogs while the review was under way. The results of this real-time, open peer review are available to view on the Assemblathon website, with the signed reviewer reports and history also archived and viewable alongside the article. To boost reproducibility, the supporting data and 27 GB of entries are hosted in the GigaScience GigaDB database and in the NCBI SRA database.
More information: GigaScience 2013, 2:10. DOI: 10.1186/2047-217X-2-10