The human genome has been mapped, but the genomes of most humans have not – at least not yet. When individual genomes are mapped, the world will have a problem: there is simply not enough space in the world's computer systems to store that data.
"The mere size of the genomic data," said Rafael Feitelberg, CEO of Petah Tikvah-based Geneformics, is "one of the main inhibitors for genomics to be really ubiquitous in the world." A sequenced human genome might be 200 to 300 gigabytes of raw data, while an analyzed genome could take up a full terabyte of disk space. "If you want to create gene banks, the mere size of the data is going to be very, very prohibitive."
Geneformics "is all about providing the tools and infrastructure to make genomics data accessible through compression," Feitelberg said.
The point of mapping the human genome is not just to know how genes interact in general, but to be able to apply that mapping to individuals. With individual genomes mapped, for example, personalized medicine could flourish. Doctors would be able to develop customized medicines for patients, targeting each drug specifically at the problem and reducing side effects.
Geneformics now counts among its customers two of the biggest gene-sequencing organizations in the world: Massachusetts-based WuXi NextCode and the Garvan Institute of Medical Research in Sydney.
One of the issues in data compression is what happens when the data is decompressed so that it can be used again. "Data compression should be something which is really hidden from [users] in a lossless and transparent way," Feitelberg said. "What that means from a compression and solution perspective is that we are capable of decompressing the data at high speeds and actually streaming it back to all of these applications in a lossless form. It is equivalent, bit for bit, to the original, uncompressed file."
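What "lossless" and "bit for bit" mean in practice can be sketched with a general-purpose compressor. The example below uses Python's standard zlib module purely as a stand-in; Geneformics's own method is a trade secret and is not shown here. The input data and the 1,024-byte chunk size are assumptions made for illustration.

```python
import hashlib
import zlib

# Hypothetical stand-in for sequencing output: a repetitive DNA-like byte string.
original = b"ACGTTGCAAGCTTAGC" * 4096

# Compress with zlib at its highest compression level (9).
compressed = zlib.compress(original, 9)

# Decompress in small chunks to mimic streaming data back to an application.
decompressor = zlib.decompressobj()
restored = bytearray()
for start in range(0, len(compressed), 1024):
    restored += decompressor.decompress(compressed[start:start + 1024])
restored += decompressor.flush()


def digest(data):
    """SHA-256 fingerprint used to compare two byte sequences."""
    return hashlib.sha256(data).hexdigest()


# Lossless means the restored bytes are identical to the original bytes.
is_lossless = bytes(restored) == original and digest(restored) == digest(original)
print(len(original), len(compressed), is_lossless)
```

The check passes only if every byte survives the round trip, which is the property Feitelberg describes. The compression ratio on this toy input says nothing about ratios on real genomic files.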
Geneformics grew out of the Weizmann Institute of Science in Rehovot, Israel, based on the data compression work of Weizmann computational biologist Eran Segal, who cofounded the company in 2014 with career technologist and current Geneformics chief technology officer Arik Keshet.
Funding has come from investors including Geneformics chairman Dov Moran, who created DiskOnKey, widely cited as the first USB flash drive. Moran and two private equity firms have put about $2.85 million into Geneformics, according to Crunchbase. The company recently released Geneformics D, its first purely cloud-based offering.
How it works, said CTO Keshet, is a trade secret. "This being a young industry, there are really no compression standards as of now. You don't have your equivalent of JPEG or MPEG" in genomics, he said.
"Eventually, when this space matures, we expect the standards to be formed. At that point, we will have the technology and the [intellectual property] and the market presence to influence those."
What the company will say, said Feitelberg, is that data savings can be significant. "With compression, we will reduce the footprint by up to 90 percent. In addition, by having an intelligent tiering at the granular level of the genomic data, then we can even increase those savings more," he said.
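To put the quoted figures in perspective, here is a rough back-of-the-envelope calculation. It takes the one-terabyte analyzed genome and the "up to 90 percent" reduction from the article as a best case; the gene bank of 100,000 genomes is a hypothetical size chosen only for illustration, and decimal units (1 TB = 1,000 GB) are assumed.

```python
def compressed_size_gb(size_gb, reduction_percent):
    """Footprint left after removing reduction_percent of the data."""
    if not 0 <= reduction_percent <= 100:
        raise ValueError("reduction_percent must be between 0 and 100")
    return size_gb * (100 - reduction_percent) / 100


ANALYZED_GENOME_GB = 1000      # "a full terabyte", in decimal units
BEST_CASE_REDUCTION = 90       # upper bound of "up to 90 percent"
COHORT_SIZE = 100_000          # hypothetical gene bank size, not from the article
GB_PER_PB = 1_000_000

per_genome_gb = compressed_size_gb(ANALYZED_GENOME_GB, BEST_CASE_REDUCTION)
bank_before_pb = ANALYZED_GENOME_GB * COHORT_SIZE / GB_PER_PB
bank_after_pb = per_genome_gb * COHORT_SIZE / GB_PER_PB

print(f"{per_genome_gb:.0f} GB per genome; "
      f"{bank_before_pb:.0f} PB -> {bank_after_pb:.0f} PB for the bank")
```

Under these assumptions a one-terabyte genome shrinks to about 100 gigabytes, and the hypothetical bank drops from 100 petabytes to 10. Because the claim is "up to" 90 percent, real savings could be smaller.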
Its deal with the Garvan Institute, which has one of the largest genomic datasets in the world, puts the firm on its way to international success. "It was a very fruitful partnership in being able to build an infrastructure for them so that as much as they'll grow, they'll always grow in a compressed and efficient way," said Feitelberg. "Our view is that researchers and bioinformaticians shouldn't ever change the analysis that they're doing because of data compression," he said.