Looking into the HERA tunnel: Berkeley Lab scientists have developed new machine learning algorithms to accelerate the analysis of data collected decades ago by HERA, the world's most powerful electron-proton collider that ran at the DESY national research center in Germany from 1992 to 2007. Credit: DESY

Protons are tiny, yet they carry a lot of heft. They inhabit the center of every atom in the universe and play a critical role in the strongest force in nature.

And yet, protons have a down-to-earth side, too.

Like most particles, protons have spin, which makes them act like tiny magnets. Flipping a proton's spin or polarity may sound like science fiction, but it is the basis of technological breakthroughs that have become essential to our daily lives, such as magnetic resonance imaging (MRI), the invaluable medical diagnostics tool.

Despite such advancements, the proton's inner workings remain a mystery.

"Basically everything around you exists because of protons—and yet we still don't understand everything about them. One huge puzzle that physicists want to solve is the proton's spin," said Ben Nachman, a physicist who leads the Machine Learning Group in the Physics Division at the Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab).

Understanding how and why protons spin could lead to technological advancements we can't even imagine today, and help us understand the strong force, the fundamental interaction responsible for most of the mass of protons and therefore of atoms.

But it's not such an easy problem to solve. For one, you can't exactly pick up a proton and place it in a petri dish: Protons are unfathomably small—their radius is a hair shy of one quadrillionth of a meter, and visible light passes right through them. What's more, you can't even observe their insides with the world's most powerful electron microscopes.

Recent work by Nachman and his team could bring us closer to solving this perplexing proton puzzle.

As a member of the H1 Collaboration—an international group that now includes 150 scientists from 50 institutes and 15 countries, and is based at the DESY national research center in Germany—Nachman has been developing new machine learning algorithms to accelerate the analysis of data collected decades ago by HERA, the world's most powerful electron-proton collider that ran at DESY from 1992 to 2007.

HERA—a ring 4 miles in circumference—worked like a giant microscope that accelerated both electrons and protons to nearly the speed of light. The particles were collided head-on, which could scatter a proton into its constituent parts: quarks and gluons.

Scientists at HERA measured the particle debris cascading from these electron-proton collisions, a process physicists call "deep inelastic scattering," using sophisticated cameras called particle detectors, one of which was the H1 detector.

Unfolding secrets of the strong force

The H1 detector stopped collecting data in 2007, the year HERA was decommissioned. Today, the H1 Collaboration is still analyzing the data and publishing results in scientific journals.

The HERA electron-proton collider accelerated both electrons and protons to nearly the speed of light. The particles were collided head-on, which could scatter a proton into its constituent parts: quarks (shown as green and purple balls in the illustration above) and gluons (illustrated as black coils). Credit: DESY

Using conventional computational techniques, it can take a year or more to measure quantities related to proton structure and the strong force, such as how many particles are produced when a proton collides with an electron.

And if a researcher wants to examine a different quantity, such as how fast particles are flying in the wake of a quark-gluon jet stream, they would have to start the long computational process all over again, and wait yet another year.

A new machine learning tool called OmniFold, which Nachman co-developed, can measure many quantities simultaneously, reducing the time needed to run an analysis from years to minutes.

OmniFold does this by using multiple neural networks at once to combine computer simulations with data. (A neural network is a machine learning tool that can process complex data in ways that would be impossible for scientists to do manually.)
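The article does not spell out the algorithm, so what follows is only a rough sketch under stated assumptions. As described in the OmniFold paper listed under "More information," each iteration has two steps: first reweight the simulation at the detector level so it matches the measured data, then carry those weights back to the particle level. OmniFold learns each reweighting with a neural network classifier; to keep this hypothetical Python example short and dependency-free, it swaps in a binned ratio of histograms instead, and in that binned limit the procedure is equivalent to iterative Bayesian unfolding. The function names (`density_ratio`, `binned_omnifold`, `unfolded_fraction`) and all toy numbers are illustrative, not taken from the H1 analysis.

```python
from collections import defaultdict


def density_ratio(num_keys, num_weights, den_keys, den_weights):
    """Binned stand-in for a classifier (an assumption, not OmniFold's
    neural network): per-bin ratio of two normalized, weighted histograms."""
    num, den = defaultdict(float), defaultdict(float)
    for k, w in zip(num_keys, num_weights):
        num[k] += w
    for k, w in zip(den_keys, den_weights):
        den[k] += w
    num_total, den_total = sum(num.values()), sum(den.values())
    return {k: (num[k] / num_total) / (den[k] / den_total)
            for k in den if den[k] > 0}


def binned_omnifold(sim_gen, sim_reco, data_reco, iterations=10):
    """Hypothetical binned sketch of the two-step OmniFold iteration.

    sim_gen / sim_reco: particle-level and detector-level bin of each
    simulated event; data_reco: detector-level bin of each measured event.
    Returns one weight per simulated event.
    """
    nu = [1.0] * len(sim_gen)  # start from the simulation's own prior
    for _ in range(iterations):
        # Step 1: reweight simulation at detector level so it matches data.
        step1 = density_ratio(data_reco, [1.0] * len(data_reco), sim_reco, nu)
        omega = [n * step1.get(r, 0.0) for n, r in zip(nu, sim_reco)]
        # Step 2: turn those weights into a function of particle-level bins.
        step2 = density_ratio(sim_gen, omega, sim_gen, nu)
        nu = [n * step2.get(g, 0.0) for n, g in zip(nu, sim_gen)]
    return nu


def unfolded_fraction(sim_gen, nu, gen_bin):
    """Weighted fraction of simulated events in one particle-level bin."""
    return sum(w for g, w in zip(sim_gen, nu) if g == gen_bin) / sum(nu)


# Made-up toy inputs: 1,000 simulated events per particle-level bin,
# each reconstructed in the correct bin 80% of the time.
sim_gen = [0] * 1000 + [1] * 1000
sim_reco = [0] * 800 + [1] * 200 + [1] * 800 + [0] * 200
# Toy "data": a 30/70 particle-level split passed through the same response.
data_reco = [0] * 380 + [1] * 620

nu = binned_omnifold(sim_gen, sim_reco, data_reco, iterations=50)
print(round(unfolded_fraction(sim_gen, nu, 0), 3))  # → 0.3
```

In the real method the per-event weights are learned from unbinned, high-dimensional inputs, so the same reweighted simulation can afterward be histogrammed for many observables at once; this binned toy only unfolds the single variable it was binned in.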

Nachman and his team first applied OmniFold to H1 experimental data in a paper published in a June issue of the journal Physical Review Letters, and presented more recent results at the 2022 Deep Inelastic Scattering (DIS) Conference.

To develop OmniFold and test its robustness against H1 data, Nachman and Vinicius Mikuni, a postdoctoral researcher in the Data and Analytics Services (DAS) group at Berkeley Lab's National Energy Research Scientific Computing Center (NERSC) and a NERSC Exascale Science Applications Program for Learning fellow, needed a supercomputer with a lot of powerful GPUs (graphics processing units), Nachman said.

Coincidentally, Perlmutter, a new supercomputer designed to support simulation, data analytics, and artificial intelligence experiments requiring multiple GPUs at a time, had just opened up in the summer of 2021 for an "early science phase," allowing scientists to test the system on real data. (The Perlmutter supercomputer is named for the Berkeley Lab cosmologist and Nobel laureate Saul Perlmutter.)

"Because the Perlmutter supercomputer allowed us to use 128 GPUs simultaneously, we were able to run all the steps of the analysis, from data processing to the derivation of the results, in less than a week instead of months. This improvement allows us to quickly optimize the neural networks we trained and to achieve a more precise result for the observables we measured," said Mikuni, who is also a member of the H1 Collaboration.

A central task in these measurements is accounting for detector distortions. The H1 detector, like a watchful guard standing sentry at the entrance of a sold-out concert arena, monitors particles as they fly through it. One source of measurement error arises when particles fly around the detector rather than through it, sort of like a ticketless concertgoer jumping over an unmonitored fence rather than entering through the ticketed security gate.
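As a rough illustration of what "detector distortions" means, the hypothetical Python sketch below models a detector that either misses a particle entirely (the fence-jumping concertgoer) or records it with some random measurement smearing. The smearing, the `toy_detector` name, and the `efficiency` and `resolution` values are assumptions chosen for illustration, not properties of the H1 detector.

```python
import random


def toy_detector(true_values, efficiency=0.9, resolution=0.1, seed=0):
    """Hypothetical detector model (not the H1 detector simulation).

    Each particle is missed with probability 1 - efficiency (it "flies
    around" the detector) and is otherwise recorded with Gaussian
    relative smearing of width `resolution`.
    """
    rng = random.Random(seed)  # seeded so the toy is reproducible
    observed = []
    for x in true_values:
        if rng.random() >= efficiency:
            observed.append(None)  # missed: no measurement at all
        else:
            observed.append(x * (1.0 + rng.gauss(0.0, resolution)))
    return observed


true_momenta = [10.0] * 10_000  # made up: 10,000 particles, same true value
measured = toy_detector(true_momenta)
seen = [m for m in measured if m is not None]
print(f"missed fraction: {1 - len(seen) / len(measured):.3f}")
print(f"mean measured value: {sum(seen) / len(seen):.2f}")
```

Comparing `measured` with `true_momenta` shows both effects: some particles vanish and the rest are spread around their true value. Unfolding methods such as OmniFold work in the opposite direction, using a detailed detector simulation to infer the particle-level distribution from what was actually recorded.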

Correcting for all distortions simultaneously had not been possible with the computational methods available at the time. "Our understanding of subatomic physics and data analysis techniques have advanced significantly since 2007, and so today, scientists can use new insights to analyze the H1 data," Nachman said.

Scientists today have a renewed interest in HERA's particle experiments, as they hope to use the data—and more precise computer simulations informed by tools like OmniFold—to aid in the analysis of results from future electron-proton experiments, such as at the Department of Energy's next-generation Electron-Ion Collider (EIC).

The EIC—to be built at Brookhaven National Laboratory in partnership with the Thomas Jefferson National Accelerator Facility—will be a powerful and versatile new machine capable of colliding high-energy beams of polarized electrons with a wide range of ions (or charged atoms) across many energies, including polarized protons and some polarized ions.

"It's exciting to think that our method could one day help scientists answer questions that still remain about the strong force," Nachman said.

"Even though this work might not lead to practical applications in the near term, understanding the building blocks of nature is why we're here—to seek the ultimate truth. These are steps to understanding at the most basic level what everything is made of. That is what drives me. If we don't do the research now, we will never know what exciting new technological advances we'll get to benefit future societies."

More information: V. Andreev et al, Measurement of Lepton-Jet Correlation in Deep-Inelastic Scattering with the H1 Detector Using Machine Learning for Unfolding, Physical Review Letters (2022). DOI: 10.1103/PhysRevLett.128.132002

OmniFold: arxiv.org/abs/1911.09107

Conference presentation: www-h1.desy.de/psfiles/confpap … /H1prelim-22-034.pdf