Israeli gene-crunching firm aims to cut data down to size

August 31, 2017

The human genome has been mapped, but the genomes of most humans have not – at least not yet. When individual genomes are mapped, the world will have a problem: there is simply not enough space in the world's computer systems to store that data.

"The mere size of the genomic data," said Rafael Feitelberg, CEO of Petah Tikvah-based Geneformics, is "one of the main inhibitors for genomics to be really ubiquitous in the world." A sequenced genome might be 200 to 300 gigabytes of raw data, while an analyzed genome could take up a full terabyte of disk space. "If you want to create gene banks, the mere size of the data is going to be very, very prohibitive."

Geneformics "is all about providing the tools and infrastructure to make genomics data accessible through compression," Feitelberg said.

The point of mapping the human genome is not just to know how genes interact in general, but to be able to apply that mapping to individuals. With a mapped genome at one's disposal, for example, the era of personalized medicine would flourish. Doctors would be able to develop customized medicines for patients, ensuring that the drug will be targeted specifically to deal with the problem without any side effects.

Geneformics now counts among its customers two of the biggest gene-sequencing organizations in the world, Massachusetts-based sequencing company WuXi NextCode and the Garvan Institute of Medical Research in Sydney.

One of the issues in data compression is what happens when the data gets decompressed to become functional again. "Data compression should be something which is really hidden from them in a lossless and transparent way," Feitelberg said. "What that means from a compression and solution perspective is that we are capable of decompressing the data at high speeds and actually streaming it back to all of these applications in a lossless form. It is equivalent, bit for bit, to the original, uncompressed file."
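
To make "bit for bit" concrete, here is a minimal sketch of a lossless round trip. It uses Python's general-purpose zlib module purely as a stand-in, since Geneformics' own method is not public, and the sample string of bases is invented for illustration rather than real sequencing data.

```python
import hashlib
import zlib

# Toy stand-in for sequencing data (an assumption, not real genomic data):
# a short run of bases repeated many times.
original = b"ACGTTGCAAGGCTTAC" * 10_000

# Compress with zlib (a generic compressor, NOT Geneformics' technology),
# then decompress again.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

# "Lossless" means the restored bytes are identical to the input.
assert restored == original
assert hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()

ratio = len(compressed) / len(original)
print(f"{len(original)} bytes -> {len(compressed)} bytes ({ratio:.1%} of original)")
```

Comparing checksums of the original and restored files is a common way to confirm that nothing was lost in the round trip.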

Geneformics grew out of the Weizmann Institute of Science in Rehovot, Israel, based on the data compression work of Weizmann computational biologist Eran Segal, who cofounded the company in 2014 with career technologist and current Geneformics chief technology officer Arik Keshet.

Funding has come from investors including Geneformics chairman Dov Moran, who created DiskOnKey, widely cited as the first USB flash drive. Moran and two private equity firms have put about $2.85 million into Geneformics, according to Crunchbase. The company recently released Geneformics D, its first purely cloud-based offering.

How it works, said CTO Keshet, is a trade secret. "This being a young industry, there are really no compression standards as of now. You don't have your equivalent of JPEG or MPEG" in genomics, he said.

"Eventually, when this space matures, we expect the standards to be formed. At that point, we will have the technology and the [intellectual property] and the market presence to influence those."

What the company will say, said Feitelberg, is that data savings can be significant. "With compression, we will reduce the footprint by up to 90 percent. In addition, by having an intelligent tiering at the granular level of the , then we can even increase those savings more," he said.
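
As a rough back-of-the-envelope sketch, the figures quoted in this article can be combined as follows. It assumes the best-case 90 percent reduction and takes a terabyte as 1,000 gigabytes. The gene bank size of 100,000 genomes is a hypothetical number chosen only for illustration.

```python
def compressed_size_gb(raw_gb: float, reduction_pct: int) -> float:
    """Size remaining after shrinking raw_gb by reduction_pct percent."""
    return raw_gb * (100 - reduction_pct) / 100


ANALYZED_GENOME_GB = 1000   # "a full terabyte", taking 1 TB = 1,000 GB
MAX_REDUCTION_PCT = 90      # "up to 90 percent" (best case)
GENOMES_IN_BANK = 100_000   # hypothetical gene bank size, not from the article

per_genome_gb = compressed_size_gb(ANALYZED_GENOME_GB, MAX_REDUCTION_PCT)
bank_raw_pb = ANALYZED_GENOME_GB * GENOMES_IN_BANK / 1_000_000
bank_compressed_pb = per_genome_gb * GENOMES_IN_BANK / 1_000_000

print(f"Per analyzed genome: {ANALYZED_GENOME_GB} GB -> {per_genome_gb:.0f} GB")
print(f"Gene bank: {bank_raw_pb:.0f} PB -> {bank_compressed_pb:.0f} PB")
```

Under those assumptions, an analyzed genome would shrink from a terabyte to about 100 gigabytes, and the hypothetical bank from 100 petabytes to 10.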

With its deal with the Garvan Institute, which has one of the largest genomic datasets in the world, the firm is on its way to international success. "It was a very fruitful partnership in being able to build an infrastructure for them so that as much as they'll grow, they'll always grow in a compressed and efficient way," said Feitelberg. "Our view is that researchers and bioinformaticians shouldn't ever change the analysis that they're doing because of data ," he said.
