February 14, 2022

Compressing gene libraries to expand accessibility, research opportunities

by Gabrielle Stewart, Pennsylvania State University

In image compression, a large file that could be cumbersome to store or share loses a small amount of visual information. This "lossiness" largely preserves the image while vastly reducing its file size—and serves as the inspiration for a new research direction in genomics, according to Justin Pritchard, assistant professor of biomedical engineering.

Pritchard and a Penn State-led team of interdisciplinary researchers developed a methodology for "compressing" extensive genetic data libraries to more manageable sizes. They published their findings in Nature Communications on Feb. 2.

"This idea of compression dramatically reduces the scale of the experiments, opening up possibilities for new experiments," said Pritchard, who also holds the Dorothy Foehr Huck and J. Lloyd Huck Early Career Entrepreneurial Professorship. "This can unlock biological mysteries, such as why different genes and drugs work differently together, and it allows us to unravel very complicated biology using simpler experiments."

The researchers drew on genome-scale CRISPR experiments containing data on thousands of gene effects tested in different human cell types. The effect of turning a gene off can vary between cell types, so a large number of cells is often needed to understand the interplay between genes and phenotypes.

To predict the larger genome-scale effects from the smaller "compressed" CRISPR library, the team used a custom algorithm rooted in a common machine learning technique known as random forests. This method feeds data provided by the researchers into a series of randomly generated decision trees that collectively produce predictions about the relationship between gene inactivation and cell growth. The model was trained on most of the data, with one subset held out, and then initially validated by testing how well it predicted the held-out subset. Its accuracy extended to datasets that were generated in different labs using different experimental conditions and CRISPR libraries.

This performance was possible using only a small percentage, about 1%, of the original library's information. Finally, the Penn State group used synthetic biology techniques to physically build these "lossy compression libraries" and validated the predictions in new experiments.
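The train-then-hold-out procedure described above can be illustrated with a toy example. The sketch below is not the team's algorithm: it builds a minimal random forest from depth-one regression trees in plain Python and runs it on synthetic data, where each row stands in for a cell line, each feature for the measured effect of one gene in a small subset, and the target for the effect of a gene outside that subset. All function names, subset labels and parameters are hypothetical, and a real analysis would more likely use deeper trees from an established library.

```python
import random
from statistics import mean

def fit_stump(X, y, features):
    """Fit a depth-one regression tree using only the candidate features."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean

def fit_forest(X, y, n_trees=50, n_features=2, seed=0):
    """Train each tree on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap rows
        feats = rng.sample(range(len(X[0])), n_features)  # random features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_stump(s, row) for s in forest)

def make_toy_data(n_subsets=4, rows_per_subset=15, n_genes=3, seed=1):
    """Synthetic data: the target depends on gene 0 only, plus small noise."""
    rng = random.Random(seed)
    data = []
    for subset in range(n_subsets):
        for _ in range(rows_per_subset):
            x = [rng.random() for _ in range(n_genes)]
            target = (1.0 if x[0] > 0.5 else -1.0) + rng.gauss(0, 0.1)
            data.append((subset, x, target))
    return data

def leave_one_subset_out(data):
    """Train on all subsets but one, then score on the held-out subset."""
    results = []
    for held_out in sorted({s for s, _, _ in data}):
        train = [(x, t) for s, x, t in data if s != held_out]
        test = [(x, t) for s, x, t in data if s == held_out]
        X_train = [x for x, _ in train]
        y_train = [t for _, t in train]
        forest = fit_forest(X_train, y_train)
        baseline = mean(y_train)  # naive predictor: the training mean
        forest_mse = mean((predict_forest(forest, x) - t) ** 2 for x, t in test)
        baseline_mse = mean((baseline - t) ** 2 for _, t in test)
        results.append((held_out, forest_mse, baseline_mse))
    return results

results = leave_one_subset_out(make_toy_data())
for held_out, forest_mse, baseline_mse in results:
    print(f"held-out subset {held_out}: "
          f"forest MSE {forest_mse:.3f}, baseline MSE {baseline_mse:.3f}")
```

Comparing the forest's error on each held-out subset with that of a naive predictor (the training mean) gives a simple check that the model has learned something that carries over to data it has not seen.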

"A genome-scale experiment probes 18,000 genes," Pritchard said. "Using machine learning, we tunably compressed the scale of the experiment to as few as 200 genes. Despite the loss of some data in the compression, we found that a subset of 200 genes could provide surprisingly good information on the full 18,000 genes."

The technique also opens opportunities for other research, according to Pritchard. It showed transferability, meaning it could make accurate predictions matching information from entirely different datasets despite being trained only on the CRISPR data. The capacity to reduce the number of genes also enables more research on cells that can be difficult or impossible to collect in large numbers, such as cells within a living organism.
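A rough calculation shows why a smaller gene set matters when cells are scarce. The gene counts below come from the article, and 200 of 18,000 genes is consistent with the "about 1%" figure. The number of guide RNAs per gene and the number of cells per guide are hypothetical values chosen only to illustrate how the cell requirement scales; they are not taken from the study.

```python
N_GENES_FULL = 18_000      # genes probed in a genome-scale experiment (from the article)
N_GENES_COMPRESSED = 200   # smallest compressed subset quoted in the article

# Hypothetical design parameters, chosen only to illustrate the scaling.
GUIDES_PER_GENE = 4        # assumed number of sgRNAs per gene
CELLS_PER_GUIDE = 500      # assumed number of cells per sgRNA

def cells_needed(n_genes, guides_per_gene=GUIDES_PER_GENE,
                 cells_per_guide=CELLS_PER_GUIDE):
    """Cells required if every guide must be represented cells_per_guide times."""
    return n_genes * guides_per_gene * cells_per_guide

fraction = N_GENES_COMPRESSED / N_GENES_FULL
fold_reduction = N_GENES_FULL // N_GENES_COMPRESSED

print(f"fraction of genes kept: {fraction:.1%}")                        # 1.1%
print(f"fold reduction: {fold_reduction}x")                             # 90x
print(f"cells, full library: {cells_needed(N_GENES_FULL):,}")           # 36,000,000
print(f"cells, compressed set: {cells_needed(N_GENES_COMPRESSED):,}")   # 400,000
```

Under these assumed parameters the cell requirement falls by the same factor of 90 as the gene count, since it scales linearly with the number of genes.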

"We're excited about the future of this research," Pritchard said. "We can alter the composition of these lossy compression sets in real time, for different experimental questions and conditions in areas from cancer biology to biopharmaceuticals, using newer machine learning techniques. The method also helps us improve basic science by answering questions about how the genome works and encodes information on cell growth."

Boyang Zhao, Edward P. O'Brien, Luke Gilbert, Scott Leighow and Yiyun Rao from Penn State contributed to this work. Zhao contributed as first author and is also affiliated with Quantalarity Research Group in Houston. Gilbert is affiliated with the University of California San Francisco and the Helen Diller Family Comprehensive Cancer Center in San Francisco.

More information: Boyang Zhao et al, A pan-CRISPR analysis of mammalian cell specificity identifies ultra-compact sgRNA subsets for genome-scale experiments, Nature Communications (2022). DOI: 10.1038/s41467-022-28045-w

Journal information: Nature Communications

Provided by Pennsylvania State University

Citation: Compressing gene libraries to expand accessibility, research opportunities (2022, February 14) retrieved 16 July 2024 from https://phys.org/news/2022-02-compressing-gene-libraries-accessibility-opportunities.html

