AI Unfolding Life’s Building Blocks

Pathbreaking database offers 3D protein structures predicted in minutes, down to atomic accuracy – thanks to neural networks

Google’s DeepMind Technologies has developed a solution that could dramatically shorten research timelines in biotechnology. The British artificial intelligence subsidiary of Alphabet Inc. has harnessed neural network technology to build a database that provides open access to over 200 million protein structure predictions, with the aim of accelerating scientific research.

Called the AlphaFold Protein Structure Database and developed in collaboration between DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI), the latest release contains over 200 million entries and is freely available to the scientific community.

Know your blocks

But first things first. What is the significance of this database?

Proteins are the building blocks of life. They act as tiny molecular machines, billions of which run the biological processes inside every living organism. Currently, there are over 200 million known proteins, with many more found every year. Each one has a unique 3D shape that determines how it works and what it does.

Mapping out the exact 3D structure of a protein remains an expensive and often time-consuming process. To date, scientists have been able to unravel the exact structure of only a tiny fraction of known proteins. Without knowing the exact structure, they cannot fully understand or control the mechanisms of these proteins, which is necessary to know how our physiology works, how diseases progress, and how to tackle them.

The only practical solution is a method that can quickly predict the structure of millions of unknown proteins, and AlphaFold does just that. It is an AI system that can predict a protein’s 3D structure from its amino acid sequence – often within hours. With conventional methods, this would have required months, years, or even decades.

The mystery lies in the folds

A protein molecule is like a string of beads, made up of different amino acids arranged in a specific sequence. These sequences are assembled according to an organism’s genetic instructions embedded in its DNA. Interactions between the 20 different types of amino acids cause the protein chain to spontaneously fold or twist into intricate patterns – like loops, curls, or pleats.
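The string-of-beads picture can be made concrete with a minimal Python sketch. It treats a protein as a string over the 20 standard one-letter amino acid codes and counts how many distinct chains of a given length are possible. The function names and the short example sequence are hypothetical and chosen only for illustration; this is not AlphaFold code and says nothing about how a chain folds.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_valid_sequence(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only the 20 standard codes."""
    return len(sequence) > 0 and all(
        residue in AMINO_ACIDS for residue in sequence.upper()
    )


def count_possible_sequences(length: int) -> int:
    """Number of distinct chains of this length: 20 choices per position."""
    return len(AMINO_ACIDS) ** length


def composition(sequence: str) -> dict[str, int]:
    """Count how often each amino acid occurs in the sequence."""
    return dict(Counter(sequence.upper()))


# Hypothetical 10-residue peptide, made up for this example.
example = "MKTAYIAKQR"
print(is_valid_sequence(example))
print(composition(example))
print(count_possible_sequences(len(example)))
```

Even a chain of only 10 residues can be written in 20^10 (about ten trillion) different ways, and real proteins are typically hundreds of residues long. That gives a sense of why working out a shape from the sequence alone is so hard.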

Understanding the 3D structure of a protein is important because it tells us exactly how other molecules will bind to that protein. That can be the key to working out, for example, how a particular group of bacteria would attach itself to the protein, and what kind of drug molecule could be deployed to counter those bacteria. This is meticulous work, and the more clearly we can map the 3D structure of a protein, the more effectively we can address such issues.

But the spontaneous folds within a protein complicate the task. The unexpected twists, turns, and curls make the structural pattern extremely difficult to follow. This is called the “protein-folding problem”. For decades, scientists have been trying to find a method to reliably determine a protein’s structure just from its sequence of amino acids.

Image: A few 3D protein structures predicted by AlphaFold; Source: DeepMind

Neural network to the rescue

DeepMind and EMBL-EBI researchers started working on this challenge in 2016. AlphaFold relies on neural network technology trained on the sequences and structures of around 100,000 known proteins. This training material was derived from decades of manual research that used traditional methods to determine protein structures. Such conventional experimental techniques were painstakingly laborious and time-consuming, and often cost millions of dollars. AlphaFold, by contrast, can predict the shape of a protein at scale and in minutes, down to atomic accuracy.

This is a game-changing AI breakthrough that could have a far-reaching impact on both medicine and biotechnology.

A free database

The first release of the AlphaFold database was announced on 22 July 2021. It contained over 350,000 structures, including the complete human proteome (meaning all of the 20,000-plus known proteins in the human body). It also covered the proteomes of 20 additional organisms that are central to modern biological research and routinely used in laboratories – like yeast, fruit fly, and mouse.

AlphaFold’s latest release, on 28 July 2022, expands this database to over 200 million structures, covering nearly all catalogued proteins known to science. This dramatically expands our knowledge of protein structures. What’s more, it more than doubles the number of high-accuracy human protein structures available to scientists around the world.

The AlphaFold database is freely available to the scientific community. If a particular protein is not available in the database, scientists can request a customised structure analysis. The developers will continue updating the database with structures for newly discovered protein sequences, and improving features and functionality based on user feedback.

The database can be accessed at:

Know more about AlphaFold at:

Jumper, J et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021).

Varadi, M et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research (2021).

