March 27, 2023

A machine learning framework to predict and quantify synthesis difficulties for designer chromosomes

Machine learning-aided scoring of synthesis difficulties for designer chromosomes. A, Collection of the DNA sequences obtained from high-throughput synthesis. The sequences were classified as easy-to-synthesize (blue) or difficult-to-synthesize (red). B, Graphical representations of DNA sequences: repeat, GC content, information entropy and other types of features. Key features were identified from these sequence features by machine learning methods. C, The XGBoost algorithm was used to build the classification model and calculate the S-index. D, Methods used to interpret the model. The feature contributions were quantified according to the global importance scores and local SHAP explanations. E, Application of the S-index to a specific chromosome. The heatmap indicates the synthesis difficulties for the different fragments, which range from difficult (red) to easy (blue). The white sequences indicate the unanalyzed chromosome sequence. Credit: Science China Press

Artificial genome synthesis has broad prospects in fields such as medical research and the development of industrial strains. From the synthesis of the artificial life form JCVI-syn1.0 by Craig Venter's team in 2010, to the rewriting and synthesis of the prokaryotic E. coli genome, to the Sc2.0 project's artificial synthesis of the yeast genome, researchers have steadily expanded the depth and breadth of genome design and synthesis.

However, certain gene segments remain difficult to synthesize. These segments can prevent an artificial chromosome from being completed, which limits the wider application of artificial genome synthesis technology. To address this issue, the team of Professor Yingjin Yuan from Tianjin University has developed an interpretable machine learning framework that can predict and quantify the difficulty of chromosome synthesis, providing guidance for optimizing chromosome design and synthesis processes.

The research team designed an efficient feature selection method by analyzing data from a large number of known chromosome fragments, and identified six key sequence features that capture energy and structural information relevant to DNA chemical synthesis and assembly. Based on these results, the team developed an eXtreme Gradient Boosting (XGBoost) model that can effectively predict the synthesis difficulties of chromosome fragments.
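To make the idea of a sequence feature concrete, the sketch below computes three simple descriptors of a DNA fragment in plain Python. The article does not list the six key features the team selected; GC content, information entropy and repeats are named only in the figure caption as types of candidate features, so the function names and exact definitions here are illustrative assumptions rather than the authors' method.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases in the fragment that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the k-mer distribution of the fragment."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (a crude repeat measure)."""
    longest = 0
    run = 0
    previous = ""
    for base in seq.upper():
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def featurize(seq: str) -> dict[str, float]:
    """Collect the hypothetical features into one record per fragment."""
    return {
        "gc_content": gc_content(seq),
        "entropy_1mer": shannon_entropy(seq, k=1),
        "longest_homopolymer": longest_homopolymer(seq),
    }


if __name__ == "__main__":
    print(featurize("ATGCGGGGGGCATTA"))
```

A table of such records, one row per fragment with an easy or difficult label, is the kind of input a gradient-boosted tree classifier such as XGBoost would be trained on.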

The model achieved an AUC (area under the receiver operating characteristic curve) of 0.895 in cross-validation and 0.885 on an independent test set obtained in collaboration with a DNA synthesis company, indicating good predictive accuracy.
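For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative one. The following minimal sketch computes it directly from that definition. It assumes that difficult-to-synthesize fragments are labeled 1 and easy ones 0, which the article does not specify, and the scores in the example are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is O(n^2) and meant for illustration only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical labels (1 = difficult) and model scores for four fragments.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values near 0.89 mean that most difficult fragments are scored above most easy ones.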

The research team proposed a Synthesis difficulty Index (S-index), based on the SHAP algorithm, to evaluate and interpret the synthesis difficulties of chromosomes. The study found significant differences in synthesis difficulty between chromosomes, and the S-index could quantitatively explain the causes of synthesis difficulties for some gene fragments. This provides a basis for chromosome sequence design and synthesis and could improve the efficiency and success rate of designer chromosome synthesis.
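SHAP explanations are built on Shapley values, which split a model's prediction for one input into per-feature contributions. The sketch below computes exact Shapley values by brute force for a toy scoring function. The article does not give the formula for the S-index, so this is not the authors' calculation: the toy model, the feature names and the all-zero baseline are hypothetical, and practical SHAP tools use much faster algorithms for tree models than this enumeration of all feature orderings.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one input.

    Averages each feature's marginal contribution over every ordering in
    which features are switched from their baseline value to their value
    in x. Cost grows factorially, so this only suits a handful of features.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in order:
            current[name] = x[name]
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_difficulty(features):
    """Hypothetical difficulty score with an interaction between two features."""
    gc = features["gc"]
    repeat = features["repeat"]
    return 2.0 * gc + 1.0 * repeat + 4.0 * gc * repeat


if __name__ == "__main__":
    fragment = {"gc": 1.0, "repeat": 1.0}
    reference = {"gc": 0.0, "repeat": 0.0}
    contributions = shapley_values(toy_difficulty, fragment, reference)
    print(contributions)  # {'gc': 4.0, 'repeat': 3.0}
```

The contributions sum to the difference between the score for the fragment and the score for the baseline (here 7.0), which is the property that lets this kind of attribution say how much each sequence feature pushes a fragment toward being difficult.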

This achievement provides a practical tool for researchers in chromosome engineering and genome rewriting, and is expected to provide more comprehensive guidance and support for chromosome design and synthesis.

The paper is published in the journal Science China Life Sciences.

More information: Yan Zheng et al, Machine learning-aided scoring of synthesis difficulties for designer chromosomes, Science China Life Sciences (2023). DOI: 10.1007/s11427-023-2306-x

Journal information: Science China Life Sciences

Provided by Science China Press

Citation: A machine learning framework to predict and quantify synthesis difficulties for designer chromosomes (2023, March 27) retrieved 10 May 2024 from https://phys.org/news/2023-03-machine-framework-quantify-synthesis-difficulties.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
