A celebrated AI has learned a new trick: How to do chemistry

Deep learning uses algorithms, often neural networks trained on large amounts of data, to extract information from new data. It is very different from traditional computing with its step-by-step instructions: rather than following explicit rules, it learns from data. Deep learning is also far less transparent than traditional computer programming, which leaves important questions open: What has the system learned? What does it know?

As a chemistry professor, I like to design tests with at least one difficult question that stretches the students' knowledge, to establish whether they can combine different ideas and synthesize new ideas and concepts. We have devised such a question for the poster child of AI advocates, AlphaFold, which has solved the protein-folding problem.

Protein folding

Proteins are present in all living organisms. They provide the cells with structure, catalyze reactions, transport small molecules, digest food and do much more. They are made up of long chains of amino acids, like beads on a string. But for a protein to do its job in the cell, it must twist and bend into a complex three-dimensional structure, a process called protein folding. Misfolded proteins can lead to disease.

In his chemistry Nobel acceptance speech in 1972, Christian Anfinsen postulated that it should be possible to calculate the three-dimensional structure of a protein from the sequence of its building blocks, the amino acids.

Figuring out what makes some proteins glow requires an understanding of chemistry. Credit: eLife - the journal, CC BY-SA

Within milliseconds of the exit of an amino acid chain (left) from the ribosome, it is folded into the lowest-energy 3D shape (right), which is required for the protein’s function. Credit: Marc Zimmer, CC BY-ND

Neurons expressing fluorescent proteins reveal the brain structures of two fruit fly larvae. Credit: Wen Lu and Vladimir I. Gelfand, Feinberg School of Medicine, Northwestern University

AlphaFold2 can take the amino acid sequence of fluorescent proteins (letters at the top) and predict their 3D barrel shapes (middle). This isn’t surprising. What is totally unexpected is that it can also predict which fluorescent proteins are ‘broken’ and can’t fluoresce. Credit: Marc Zimmer, CC BY-ND