It was only in 1957 that scientists gained special access to the molecular third dimension.
After 22 years of exhaustive experiments, John Kendrew of the University of Cambridge finally revealed the 3D structure of a protein. It was the twisted blueprint of myoglobin, the stringy chain of 154 amino acids that helps supply our muscles with oxygen. As revolutionary as this discovery was, Kendrew did not exactly open the floodgates of protein architecture. Fewer than a dozen more structures would be identified over the next decade.
Fast forward to today, 65 years since that Nobel Prize-winning breakthrough.
On Thursday, Google’s sister company, DeepMind, announced that it had successfully used artificial intelligence to predict the 3D structures of nearly every cataloged protein known to science. That’s more than 200 million proteins found in plants, bacteria, animals, humans—just about anything you can imagine.
“You can basically think of it as covering the entire protein universe,” Demis Hassabis, founder and CEO of DeepMind, told reporters this week.
This is thanks to AlphaFold, DeepMind’s breakthrough artificial intelligence system, which comes with an open-source database that scientists around the world can plug into their research at will and for free. Since AlphaFold’s official launch in July of last year, when it had identified only about 350,000 3D protein structures, the program has made a noticeable dent in the research landscape.
“More than 500,000 researchers and biologists have used the database to view more than 2 million structures,” Hassabis said. “And these predictive structures have helped scientists make great new discoveries.”
For example, in April, researchers at Yale University turned to the AlphaFold database to help them develop a new, highly effective malaria vaccine. And in July of last year, scientists at the University of Portsmouth used the system to engineer enzymes that fight pollution from single-use plastics.
“That put us a year ahead, if not two,” John McGeehan, director of the Portsmouth Center for Enzyme Innovation and the researcher behind the second study, told the New York Times.
These efforts are just a small sample of AlphaFold’s ultimate reach.
“In the last year alone, there have been over a thousand scientific papers on a wide range of research topics that use AlphaFold structures; I’ve never seen anything like it,” Sameer Velankar, a DeepMind collaborator and team leader at the European Molecular Biology Laboratory’s Protein Data Bank, said in a press release.
Others who have used the database, according to Hassabis, include those trying to improve our understanding of Parkinson’s disease, people hoping to protect the health of bees, and even some who want to gain valuable insight into human evolution.
“AlphaFold is already changing the way we think about the survival of molecules in the fossil record, and I see it soon becoming an essential tool for researchers working not only in evolutionary biology, but also in archeology and other paleosciences,” Beatrice Demarchi, an associate professor at the University of Turin who recently used the system in a study of an ancient egg controversy, said in a press release.
In the coming years, DeepMind also intends to work with teams from the Drugs For Neglected Diseases Initiative and the World Health Organization to find cures for little-studied but ubiquitous tropical diseases such as Chagas disease and leishmaniasis.
“It’s going to get a lot of researchers around the world thinking about what kind of experiments they could do,” Ewan Birney, a DeepMind associate and deputy director of EMBL, told reporters. “And think about what’s going on in the organisms and systems they’re studying.”
Locks and keys
So why do so many scientific advances depend on this treasure trove of 3D protein modeling? Let’s break it down.
Suppose you are trying to make a key that fits perfectly into a lock. But you have no way to view the structure of that lock. All you know is that this lock exists, some data about its materials, and maybe numerical information about how big each ridge is and where those ridges should be.
It might not be impossible to develop this key, but it would be quite difficult. A key must be precise or it won’t work. So before you begin, you would probably do your best to model a few different mock locks with whatever information you have, so that you can make your key.
In this analogy, the lock is a protein and the key is a small molecule that binds to that protein.
For scientists, whether they are doctors trying to create new drugs or botanists dissecting the anatomy of plants to make fertilizers, the interplay between certain molecules and proteins is crucial.
For example, with drugs, the specific way a molecule in the drug binds to a protein can make or break whether it works. This interaction gets complicated because even though proteins are just chains of amino acids, they are not straight or flat. Inevitably, they fold, bend, and sometimes get tangled up on themselves like headphone wires in your pocket.
In fact, a protein’s unique folds determine how it functions—and even the smallest folding errors in the human body can lead to disease.
But back to small-molecule drugs: sometimes parts of a folded protein are blocked from binding the drug. They might be folded in a particular way that makes them inaccessible, for example. Details like this are very important information for scientists trying to get their drug molecule to stick.
“I think it’s true that almost every drug that has come to market in the last few years has been designed in part based on knowledge of protein structures,” EMBL researcher Janet Thornton said at the conference.
This is why researchers typically spend an incredible amount of time and effort decoding the complex 3D structure of the protein they are working with, much as you would begin your key-making journey by assembling a mold of the lock. If you know the exact structure, it is much easier to say where and how a molecule would attach to a given protein, as well as how that attachment might affect the folding of the protein in response.
But this effort is not easy. Or cheap.
“The cost of solving a new, unique structure is in the order of $100,000,” Steve Darnell, a structural and computational biologist at the University of Wisconsin and a researcher at the bioinformatics company DNAStar, said in a statement.
That’s because the solution usually comes from highly complex laboratory experiments.
Kendrew, for example, used a technique called X-ray crystallography. Basically, the method requires you to take solid crystals of the protein of interest, place them in an X-ray beam, and watch what pattern the beam produces. This pattern essentially encodes the positions of thousands of atoms inside the crystal. Only then can you use the pattern to reveal the structure of the protein.
There is also a newer technique known as cryo-electron microscopy. This is similar to X-ray crystallography, except that the protein sample is directly shot with electrons instead of an X-ray beam. And while it’s considered much higher resolution than other techniques, it can’t exactly penetrate everything.
Moving further into the realm of technology, some have attempted to model protein folding structures digitally. But the early attempts, such as a few in the 1980s and 1990s, were not great. And as you can imagine, the laboratory methods are tedious and difficult, too.
Over the years, these hurdles have led to what is called the “protein folding problem”. Quite simply, scientists do not know how proteins fold, and they have faced considerable obstacles in overcoming this problem.
AlphaFold’s AI could be a game changer.
Solving the “folding problem”
In short, AlphaFold was trained by DeepMind engineers to predict protein structures without the need for a lab. No crystals, no electron beams, no $100,000 experiments.
To get AlphaFold to where it is today, the system was first exposed to 100,000 known protein folding structures, according to the company’s website. Then, over time, it began learning how to decode the rest.
It really is that straightforward. (Well, except for the talent that went into coding the AI.)
“I don’t know, it takes at least $20,000 and a lot of time to crystallize the protein,” Birney said. “That means the experimenters have to decide what they’re going to do – AlphaFold hasn’t had to make that decision yet.”
This thoroughness on AlphaFold’s part is quite fascinating. It means that scientists have more freedom to guess and check, follow a hunch or instinct, and cast a wide net in their research when it comes to protein structures. They won’t have to worry about costs or timelines.
“Models also come with prediction error,” said Jan Kosinski, a DeepMind collaborator and structural modeler at EMBL in Hamburg, Germany. “And usually—actually, in many cases—the error is really small. So we call it almost atomic precision.”
Furthermore, the DeepMind team says it has conducted a wide range of risk assessments to ensure that AlphaFold is safe and ethical to use. The team also suggested that artificial intelligence in general may carry biosecurity risks that we haven’t thought to evaluate before, especially as such technology continues to penetrate the medical space.
But as the future unfolds, the DeepMind crew says AlphaFold will adapt seamlessly and address such concerns on a case-by-case basis. So far, it seems to be working, with a protein model universe that harks back to the modest portrait of myoglobin.
“Until two years ago,” Birney said, “we just didn’t realize it was possible.”
Correction at 6:45 a.m. PT: Janet Thornton’s last name and title have been corrected.