Predicting Molecular Structures with AlphaFold 2 and 3

Published on February 4, 2026

Introduction

In October 2024, the Nobel Prize in Chemistry recognized the work behind AlphaFold 2, a deep learning model developed by Google DeepMind that accurately predicts the 3D structure of a protein solely from its amino acid sequence. Calling this work revolutionary feels like an understatement, given how critical proteins are to biological processes and how closely a protein’s structure is tied to its function. Historically, researchers could spend an entire PhD, or years of their career, solving the structure of a single protein with experimental methods such as X-ray crystallography or cryo-electron microscopy. This bottleneck slowed many areas of biological research, including the search for good drug candidates for conditions that are currently untreatable.

Key Takeaways

  • AlphaFold 2 is not meant to rival wet lab experimental techniques, but rather to provide testable hypotheses that guide researchers working on protein structures. AlphaFold 2 was trained with data from the Protein Data Bank.

  • AlphaFold 3 has additional capabilities beyond AlphaFold 2, including the ability to predict structures of complexes that contain all the molecular types present in the Protein Data Bank, except water molecules. AlphaFold 3 can handle:

    • Complexes of proteins with DNA, RNA, small molecule ligands, and ions
    • Structures with post-translational modifications of proteins (including glycosylation)
  • Both AlphaFold 2 and 3 take a Multiple Sequence Alignment (MSA) as input, but AlphaFold 3 builds MSAs for RNA chains in addition to protein chains.

  • AlphaFold 2 is free to use under the Apache 2.0 license, whereas AlphaFold 3 is limited to non-commercial use.

Prerequisites

Because this article covers molecular structure prediction, we recommend familiarity with biomolecules and related biological terminology (e.g., proteins, RNA, and ligands). You will also need to request the AlphaFold 3 model parameters to use the AlphaFold 3 model non-commercially. Access may be granted within 2-3 business days.

To follow the deployment instructions, you should also be familiar with cloud infrastructure (DigitalOcean GPU Droplets), command-line operations (SSH, git), and containerization using Docker.

Understanding the Inputs

Multiple Sequence Alignment

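AlphaFold 2 uses Multiple Sequence Alignment (MSA) to capture the evolutionary relationships between proteins. An MSA lines up the target sequence against related sequences so that equivalent positions share a column. MSA is effective because of co-evolution: amino acids that are in contact with one another tend to mutate together, since a mutation at one position is often compensated by a mutation at its partner. Columns that vary together across the alignment therefore hint at residues that sit close together in the folded structure.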
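To make the co-evolution idea concrete, here is a minimal sketch that scores how strongly two alignment columns vary together using mutual information. The toy alignment and the function name `column_mutual_information` are illustrative assumptions. AlphaFold does not compute this statistic directly; it learns such relationships from the MSA with a neural network.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an aligned MSA."""
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)

    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (hypothetical sequences): columns 1 and 3 change together,
# while column 2 never changes.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
]

coupled = column_mutual_information(toy_msa, 1, 3)    # columns that co-vary
conserved = column_mutual_information(toy_msa, 1, 2)  # one column is constant
```

In this toy case the co-varying pair scores higher than the pair involving a fully conserved column, which is the kind of signal that suggests two positions may be in contact.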

Model Architecture (AF2 vs. AF3)

| Feature | AlphaFold 2 | AlphaFold 3 |
| --- | --- | --- |
| Main Processor | Evoformer: Deeply integrates MSA and pair features throughout. | Pairformer: Simplifies MSA processing; focuses on pair representations. |
| 3D Output Engine | Structure Module: Uses physical/geometric biases (frames/torsions). | Diffusion Module: Generative denoising of raw atom coordinates. |
| Symmetry Constraints | Rigidly enforces rotation/translation invariance (SE(3)). | Drops many explicit geometric constraints for greater flexibility. |
| Input Versatility | Primarily optimized for amino acid sequences (proteins). | Unified “token” system for proteins, nucleic acids, and small ligands. |

Understanding the Outputs

Confidence Metrics

AlphaFold provides confidence metrics, such as pLDDT, pTM, and PAE.

pLDDT

pLDDT (predicted Local Distance Difference Test) is AlphaFold’s estimate of the LDDT score each residue would receive if the prediction were compared with an experimental structure. It is a per-residue measure of confidence in the local structure: in other words, how sure the model is that the predicted structure agrees with an experimental one. The value is scaled from 0 to 100, where higher scores indicate higher confidence and usually a more accurate prediction.
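AlphaFold 2 writes each residue’s pLDDT into the B-factor column of the PDB files it produces, so the scores can be read back with a few lines of code. The sketch below assumes standard fixed-column PDB `ATOM` records and reads one value per residue from the alpha-carbon (CA) atom. The function name `plddt_per_residue` is hypothetical.

```python
def plddt_per_residue(pdb_text):
    """Return {(chain_id, residue_number): pLDDT} from CA atoms in PDB text.

    Relies on the fixed-column PDB layout: atom name in columns 13-16,
    chain ID in column 22, residue number in columns 23-26, and the
    B-factor (where pLDDT is stored) in columns 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


# Example usage (the file name is the top-ranked model in an AlphaFold 2 run):
# with open("ranked_0.pdb") as handle:
#     scores = plddt_per_residue(handle.read())
#     mean_plddt = sum(scores.values()) / len(scores)
```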

pTM and ipTM

The predicted template modeling (pTM) score and the interface predicted template modeling (ipTM) score are both derived from the template modeling (TM) score, which measures the accuracy of the entire structure. A pTM score over 0.5 suggests the predicted complex fold may match the true structure. ipTM assesses the predicted relative positioning of subunits: values above 0.8 indicate high-confidence predictions, values below 0.6 suggest a likely failure, and values from 0.6 to 0.8 are uncertain. For small structures or short chains (under 20 tokens), TM scoring becomes overly strict, yielding pTM values below 0.05; in these cases, PAE or pLDDT better indicate prediction quality.
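The thresholds above translate directly into a small helper. This is a minimal sketch: the function names are hypothetical, the cut-offs are the ones quoted in this section, and the handling of values exactly at 0.6 or 0.8 (treated as uncertain) is an assumption.

```python
def interpret_iptm(iptm):
    """Map an ipTM score (0 to 1) to the confidence bands described above."""
    if not 0.0 <= iptm <= 1.0:
        raise ValueError("ipTM must be between 0 and 1")
    if iptm > 0.8:
        return "high confidence"
    if iptm < 0.6:
        return "likely failed"
    return "uncertain"


def ptm_is_informative(num_tokens):
    """pTM becomes overly strict for very small structures (under 20 tokens)."""
    return num_tokens >= 20


label = interpret_iptm(0.72)          # falls in the 0.6 to 0.8 band
use_ptm = ptm_is_informative(15)      # short chain: prefer PAE or pLDDT
```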

PAE

PAE (predicted aligned error) is a measure of how confident AlphaFold is in the relative position and orientation of two residues (tokens) in the predicted structure. It is reported in ångströms for each pair of residues. Higher values correspond to a higher predicted error and therefore a lower confidence.
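Because PAE is defined for every pair of residues, it is often summarized per pair of chains to judge how well two subunits are placed relative to each other. The sketch below assumes the PAE values are available as a square nested list alongside a per-token list of chain IDs. How you load them depends on the output files of the version you run, and the function name and toy numbers are hypothetical.

```python
def mean_interchain_pae(pae, chain_ids, chain_a, chain_b):
    """Average PAE over all token pairs spanning chain_a and chain_b.

    PAE is not symmetric, so both (i, j) and (j, i) entries are included.
    """
    values = [
        pae[i][j]
        for i, chain_i in enumerate(chain_ids)
        for j, chain_j in enumerate(chain_ids)
        if chain_i != chain_j and {chain_i, chain_j} == {chain_a, chain_b}
    ]
    if not values:
        raise ValueError("no token pairs found between the requested chains")
    return sum(values) / len(values)


# Toy 3-token example: two tokens in chain A, one in chain B.
toy_pae = [
    [0.5, 1.0, 12.0],
    [1.0, 0.5, 10.0],
    [14.0, 8.0, 0.5],
]
toy_chains = ["A", "A", "B"]

interface_pae = mean_interchain_pae(toy_pae, toy_chains, "A", "B")
```

A low average across the interface suggests the model is confident about how the two chains sit relative to each other, while a high average suggests their relative placement is unreliable even if each chain looks well folded on its own.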

Should I use AlphaFold 2 or 3?

AlphaFold 3 surpasses AlphaFold 2 in accuracy and can predict multi-molecule complexes, but licensing differences mean AlphaFold 2 remains essential for many users. AlphaFold 2 is freely available for academic and commercial use under the Apache 2.0 license, while AlphaFold 3 is restricted to non-commercial use: you cannot use it for commercial research, to train competing ML models, or to produce outputs for commercial purposes. In addition, AlphaFold 3’s confidence scores for polymers are heavily influenced by non-polymer context such as ions or ligands. For polymer-only studies such as protein-protein interactions, you may need to add contextual molecules to get reliable scores, whereas AlphaFold 2 avoids this complexity at the cost of slightly lower accuracy. Given these factors, Google DeepMind continues to support AlphaFold 2 as a valuable tool for research and development.

AlphaFold 2 on DigitalOcean

AlphaFold 2 requires downloading roughly 2.5 TB of genetic databases (UniRef90, MGnify, BFD, etc.). To house them, attach a Block Storage Volume to your GPU Droplet: 2.5 TB for AF2 or 1 TB for AF3.
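Before starting a multi-terabyte download, it can help to confirm that the mounted volume actually has the space. The following is a minimal sketch using only the Python standard library. The sizes are the volume sizes quoted in this article, and the mount path in the usage comment is a placeholder you would replace with your own.

```python
import shutil

# Volume sizes quoted in this article, in terabytes (10**12 bytes).
REQUIRED_TB = {"af2": 2.5, "af3": 1.0}


def has_enough_space(path, model):
    """Return True if the filesystem holding `path` has enough free space."""
    free_tb = shutil.disk_usage(path).free / 1e12
    return free_tb >= REQUIRED_TB[model]


# Example usage (placeholder mount point):
# if not has_enough_space("/path/to/your/storage", "af2"):
#     print("Volume is too small for the AlphaFold 2 databases")
```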

Step 1: Environment Setup (AF2)

Choose the “AI/ML Ready” Ubuntu image to ensure NVIDIA drivers and Docker are pre-installed.

Connect to your GPU Droplet via SSH:

ssh root@your_droplet_ip

Make sure the system is up to date by refreshing the package index and upgrading the installed packages:

sudo apt update && sudo apt upgrade -y

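Step 2: Clone the Repository and Download the Data

Clone the AlphaFold 2 repository, which contains the download script and Docker files used in the following steps:

git clone https://github.com/google-deepmind/alphafold.git
cd alphafold

Download the genetic databases and model parameters to your Block Storage Volume. The script relies on aria2c being installed. This step may take a long time, so the command below runs in the background and writes its output to log files.

scripts/download_all_data.sh /path/to/your/storage > download.log 2> download_error.log &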

Step 3: Build the Docker image and install dependencies

docker build -f docker/Dockerfile -t alphafold .
pip3 install -r docker/requirements.txt

Step 4: Run the Model

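# Directory passed to download_all_data.sh above
DOWNLOAD_DIR=/path/to/your/storage

# Templates released after --max_template_date are ignored
python3 docker/run_docker.py \
  --fasta_paths=your_protein.fasta \
  --max_template_date=2022-01-01 \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir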
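The `--fasta_paths` flag expects a FASTA file: a header line starting with `>` followed by the amino acid sequence. Here is a minimal sketch for producing one. The function name `to_fasta` and the example sequence are hypothetical, and restricting input to the 20 standard amino acid letters is a simplifying assumption for illustration.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(name, sequence, width=60):
    """Format a protein sequence as FASTA text, wrapping lines at `width`."""
    sequence = sequence.strip().upper()
    invalid = set(sequence) - STANDARD_AMINO_ACIDS
    if invalid:
        raise ValueError(f"unexpected residue letters: {sorted(invalid)}")
    wrapped = "\n".join(
        sequence[start:start + width] for start in range(0, len(sequence), width)
    )
    return f">{name}\n{wrapped}\n"


# Placeholder sequence for illustration only, not a real protein.
fasta_text = to_fasta("toy_protein", "mktayiakqr", width=4)

# To save it under the name used in the command above:
# with open("your_protein.fasta", "w") as handle:
#     handle.write(fasta_text)
```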

AlphaFold 3 on DigitalOcean

We’re also going to show you how to run AlphaFold 3 using our GPU Droplets. Note that this model is limited to non-commercial use. Fill out this form to receive access to the model parameters. Permission is typically granted within 2-3 business days.

Step 1: Environment Setup (AF3)

Choose the “AI/ML Ready” Ubuntu image to ensure NVIDIA drivers and Docker are pre-installed.

Connect to your GPU Droplet via SSH:

ssh root@your_droplet_ip

Step 2: Clone the Repository

Clone the AlphaFold 3 repository (install git first if it is not already available on your Droplet):

git clone https://github.com/google-deepmind/alphafold3.git
cd alphafold3

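Step 3: Build the Docker Image and Run the AlphaFold 3 Model

Build the Docker image, then run the model. Replace <MODEL_PARAMETERS_DIR> and <DB_DIR> with the directories that hold your model parameters and genetic databases, and place your input file at $HOME/af_input/fold_input.json.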

docker build -t alphafold3 -f docker/Dockerfile .

docker run -it \
    --volume $HOME/af_input:/root/af_input \
    --volume $HOME/af_output:/root/af_output \
    --volume <MODEL_PARAMETERS_DIR>:/root/models \
    --volume <DB_DIR>:/root/public_databases \
    --gpus all \
    alphafold3 \
    python run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --output_dir=/root/af_output
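The run command reads its job description from `fold_input.json`. Below is a minimal sketch that builds such a file for one protein chain and one optional ligand given as a SMILES string. The field names reflect the AlphaFold 3 input format and should be checked against the input documentation shipped with the repository version you cloned. The helper name `build_fold_input`, the job name, and the protein sequence are placeholders.

```python
import json


def build_fold_input(name, protein_sequence, ligand_smiles=None, seeds=(1,)):
    """Build an AlphaFold 3 style input dict: one protein chain, optional ligand."""
    sequences = [{"protein": {"id": "A", "sequence": protein_sequence}}]
    if ligand_smiles is not None:
        sequences.append({"ligand": {"id": "B", "smiles": ligand_smiles}})
    return {
        "name": name,
        "modelSeeds": list(seeds),
        "sequences": sequences,
        "dialect": "alphafold3",
        "version": 1,
    }


fold_input = build_fold_input(
    name="toy_complex",
    protein_sequence="MKTAYIAKQR",  # placeholder, not a real protein
    ligand_smiles="CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin
)
fold_input_json = json.dumps(fold_input, indent=2)

# To save it where the docker command above expects it:
# from pathlib import Path
# Path.home().joinpath("af_input", "fold_input.json").write_text(fold_input_json)
```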

FAQ

What are the primary differences between AlphaFold 2 and AlphaFold 3?

While AlphaFold 2 revolutionized protein structure prediction, AlphaFold 3 expands the horizon significantly. The key differences include:

  • Molecular Scope: AF2 focuses almost entirely on proteins. AF3 can predict complexes involving DNA, RNA, ligands, and ions.
  • Architecture: AF2 uses the “Evoformer” module, whereas AF3 utilizes a simpler “Pairformer” and a diffusion-based head.
  • Licensing: This is the big one. AF2 is under the Apache 2.0 license (commercial use allowed), while AF3 is currently restricted to non-commercial use.

Why do I need 2.5 TB of storage for AlphaFold 2?

The model itself doesn’t take much storage; the genetic databases do. AlphaFold 2 relies on Multiple Sequence Alignment (MSA) to understand how proteins have evolved. To do this, it needs to search through massive libraries like UniRef90, MGnify, and the Big Fantastic Database (BFD).

Note: If you are tight on storage, AlphaFold 3 requires a smaller database footprint (~1 TB) compared to the full AlphaFold 2 stack.

How do I interpret the pLDDT confidence scores?

The pLDDT (predicted Local Distance Difference Test) is a per-residue measure of confidence on a scale of 0 to 100:

  • > 90: High confidence; these regions are likely highly accurate and can be used for detailed structural analysis.
  • 70 - 90: Good confidence; the backbone is likely correct.
  • 50 - 70: Low confidence; use caution.
  • < 50: Very low confidence; these regions are often “intrinsically disordered,” meaning they don’t have a fixed 3D structure in isolation.
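The bands above can be applied programmatically when triaging many predictions. This is a minimal sketch: the function name `plddt_band` is hypothetical, and assigning scores that fall exactly on a boundary (90, 70, or 50) to the lower-numbered band shown in the code is an assumption, since the list does not specify it.

```python
def plddt_band(score):
    """Map a pLDDT score (0 to 100) to the confidence bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if score > 90:
        return "high"
    if score >= 70:
        return "good"
    if score >= 50:
        return "low"
    return "very low"


# Hypothetical per-residue scores for a short stretch of a prediction.
example_scores = [95.2, 88.0, 61.5, 42.3]
example_bands = [plddt_band(score) for score in example_scores]
```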

Can I run AlphaFold on a standard Droplet without a GPU?

Technically, the “folding” (inference) could run on a CPU, but it would be prohibitively slow: complex structures that a GPU handles in minutes could take days or weeks. Furthermore, the AlphaFold Docker images are optimized for NVIDIA’s CUDA toolkit. For any practical research, a DigitalOcean GPU Droplet is essentially a requirement.

What is a SMILES string, and why does AlphaFold 3 need it?

SMILES stands for Simplified Molecular-Input Line-Entry System. It is a notation system that represents chemical structures as a string of text. Since AlphaFold 3 can predict how proteins interact with small molecule drugs (ligands), you provide the drug’s structure via a SMILES string (e.g., CC(=O)OC1=CC=CC=C1C(=O)O for Aspirin).

Is the predicted structure “final”?

Not necessarily. While AlphaFold is exceptionally accurate, it is a predictive model, not an experimental observation. In drug discovery or molecular biology, AlphaFold is best used to generate testable hypotheses that guide verifiable wet lab experiments.

Conclusion

Congrats on making it through. You have (hopefully successfully) deployed AlphaFold 2 and/or 3 on a DigitalOcean GPU Droplet with DigitalOcean Volume Storage. AlphaFold makes structural biology more accessible, enabling researchers worldwide to gain insights that would otherwise have required years of expensive experimental work. However, some researchers caution that relying solely on AlphaFold without experimental validation, particularly in drug discovery, can lead to erroneous mechanistic models, which underscores the continued importance of integrating computational predictions with laboratory verification. Beyond its immediate scientific impact, AlphaFold exemplifies how artificial intelligence can tackle complex scientific challenges, offering a glimpse of how computational tools will continue to reshape our understanding of the natural world and our ability to address pressing challenges in scientific discovery.

References and Additional Resources

AlphaFold 2 paper: Highly accurate protein structure prediction with AlphaFold

AlphaFold 3 paper: Accurate structure prediction of biomolecular interactions with AlphaFold 3

Multiple Sequence Alignment: This is a great resource for building your intuition around MSA.

Predicting protein structures with ColabFold and AlphaFold2 Colab: This provides background on running AlphaFold without installing and running the full AlphaFold 2 software. Here, the most important parameters that can be changed are the number of recycles, the depth of the multiple sequence alignment (MSA), the random seeds used to initialize predictions, and whether to supply a template structure. The output of ColabFold includes PDB files, pLDDT values/plots, the MSA file, etc.

AlphaFold Server FAQ: AlphaFold Server is another alternative to running AlphaFold3.

About the author

Melani Maheswaran

Melani is a Technical Writer at DigitalOcean based in Toronto. She has experience in teaching, data quality, consulting, and writing. Melani graduated with a BSc and Master’s from Queen's University.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.