The emergence of technologies for analyzing gene expression at the genomic scale has required parallel efforts to develop software that makes sense of the data. Such 'browser' tools provide scientists with a visual atlas of the thousands of genes that are switched on and off in a given experiment. However, these tools become increasingly unwieldy as studies grow larger. A team of researchers led by Alistair Forrest and Jessica Severin from the RIKEN Center for Life Science Technologies has now developed software that can efficiently handle far greater volumes of data.
The FANTOM5 project has the ambitious goal of mapping the circuits that regulate gene activity throughout the human genome based on comparative analysis of numerous datasets from hundreds of cell types. "With existing genome browsers, you basically needed to create a separate 'track', or horizontal visual representation, for each experiment," explains Forrest. "We were looking at thousands of experiments and needed some way of visualizing the data without ending up with a web page that stretched for tens of meters!"
Forrest, Severin and other FANTOM5 colleagues worked to produce ZENBU, a system that pools large numbers of individual datasets into a single track. As incorporating each dataset individually would slow analysis to a crawl, the researchers employed computational methods that simultaneously access information from numerous experiments.
"Only a few elements of data from this pooled track are in memory at any moment in time, which allows us to process and analyze the data in an interactive manner in near-real-time," says Severin. These datasets are all 'linked' to the single genome track in a way that enables users to easily zoom in on and analyze activity at specific genetic loci of interest across every single experiment.
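The streaming idea Severin describes can be illustrated with a minimal sketch. This is not ZENBU's actual code: the function names, the record layout (position, signal) and the toy experiments are all hypothetical, and the sketch simply assumes each dataset is already sorted by genomic position. Under that assumption, many datasets can be merged lazily into one pooled track while holding only about one record per experiment in memory.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def read_experiment(name, records):
    """Lazily yield (position, experiment, signal) from one position-sorted dataset.

    Hypothetical stand-in for reading a real file; nothing is loaded up front.
    """
    for position, signal in records:
        yield position, name, signal


def pooled_track(experiments):
    """Merge position-sorted streams and yield one pooled entry per position."""
    streams = [read_experiment(name, recs) for name, recs in experiments.items()]
    # heapq.merge consumes the streams lazily, keeping one pending item per stream.
    merged = heapq.merge(*streams, key=itemgetter(0))
    for position, group in groupby(merged, key=itemgetter(0)):
        per_experiment = {name: signal for _, name, signal in group}
        yield position, sum(per_experiment.values()), per_experiment


# Toy data (assumption): three experiments with signal at a few positions.
experiments = {
    "exp_a": [(100, 5), (250, 2)],
    "exp_b": [(100, 3), (300, 7)],
    "exp_c": [(250, 4)],
}
track = list(pooled_track(experiments))
for position, total, detail in track:
    print(position, total, detail)
```

Each pooled entry keeps the per-experiment values alongside the combined total, so a viewer built this way could still break down the signal at a given locus by experiment.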
FANTOM5 scientists use techniques ranging from RNA transcript sequencing to determining how transcription factors bind to specific chromosomal sequences. ZENBU is preconfigured to interpret these various experimental formats immediately, without any additional tinkering. As such, although specifically designed for FANTOM5, ZENBU stands to benefit the broader genome research community by bringing unprecedented speed and ease of use to the interpretation process.
Forrest and Severin envision a general ecosystem for collective data sharing and analysis. "Every site will have access to all the data in the ZENBU federation," says Severin. "Each location can focus on the curation of their data, but every other ZENBU site can also access it remotely as if it was loaded locally."
Severin, J., Lizio, M., Harshbarger, J., Kawaji, H., Daub, C. O., Hayashizaki, Y., The FANTOM Consortium, Bertin, N. & Forrest, A. R. R. "Interactive visualization and analysis of large-scale sequencing datasets using ZENBU." Nature Biotechnology 32, 217–219 (2014). DOI: 10.1038/nbt.2840