How to Use an AlphaFold Structure for Molecular Docking: Complete Workflow
AlphaFold has made predicted structures available for more than 200 million proteins, most of which have never been crystallized. This tutorial connects structure prediction to docking: how to go from an AlphaFold prediction to a docking-ready receptor, and which considerations make AlphaFold docking different from docking into a crystal structure.
Before you start: is AlphaFold the right receptor?
AlphaFold structures are the right starting point for docking when no experimental structure exists — or when an existing crystal structure is of poor quality, incomplete, or in the wrong conformation. Before committing to an AlphaFold receptor, check two things:
- Does an experimental structure exist? Search the RCSB PDB (rcsb.org) for your target. If a high-resolution crystal structure exists in an apo or relevant-conformation state, use that — experimental structures are ground truth. AlphaFold is the fallback, not the first choice.
- What conformation do you need? AlphaFold predicts a single static conformation with no ligand present; treat it as an apo-like (unbound) model. If your target is known to adopt a dramatically different conformation when bound to an inhibitor (as many kinases, GPCRs, and nuclear receptors do) and a homolog crystallized in that conformation exists in the PDB, consider homology modeling from that template instead.
If no experimental structure exists and AlphaFold is your best option, continue with this workflow.
Step 1 — Download from the AlphaFold Database
Go to alphafold.ebi.ac.uk. Search for your protein by UniProt ID (e.g. P04637 for human TP53) or gene name. On the protein page, download two files — both are essential:
- PDB file: the 3D coordinates. The pLDDT score for each residue is stored in the B-factor column.
- JSON file: the predicted aligned error file, which contains the complete PAE matrix. You need this to evaluate inter-domain confidence and to generate the PAE plot. Per-residue pLDDT is read from the PDB B-factor column rather than from this file.
If your protein isn’t in the database — for example a novel sequence or mutant — run a prediction with ColabFold first, then return to this workflow with the output PDB and JSON files.
# Download the model PDB and the PAE JSON for UniProt ID P04637
wget https://alphafold.ebi.ac.uk/files/AF-P04637-F1-model_v4.pdb
wget https://alphafold.ebi.ac.uk/files/AF-P04637-F1-predicted_aligned_error_v4.json
The filename convention is AF-{UniProtID}-F{fragment}-model_v{version}.pdb. Fragment number is 1 for most proteins; longer proteins above ~2700 residues are split into overlapping fragments numbered F1, F2, etc.
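The naming convention above can be turned into a small helper when you need files for several targets. This is a minimal sketch that only builds the URLs following the pattern shown in the wget commands; the function name `afdb_urls` is illustrative, and the default version of 4 simply mirrors the example and may not be the newest release.

```python
AFDB_FILES = "https://alphafold.ebi.ac.uk/files"


def afdb_urls(uniprot_id, fragment=1, version=4):
    """Build the model and PAE download URLs for one AlphaFold Database entry."""
    # AF-{UniProtID}-F{fragment} is the stem shared by both files.
    stem = f"AF-{uniprot_id}-F{fragment}"
    return {
        "model": f"{AFDB_FILES}/{stem}-model_v{version}.pdb",
        "pae": f"{AFDB_FILES}/{stem}-predicted_aligned_error_v{version}.json",
    }


if __name__ == "__main__":
    for kind, url in afdb_urls("P04637").items():
        print(kind, url)
```

The returned URLs can be passed to wget or to `urllib.request.urlretrieve`.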
Step 2 — Assess pLDDT and PAE confidence
This step determines whether it’s worth docking at all. Load the PDB in PyMOL and color by pLDDT to immediately see where confidence is high and where it isn’t:
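# Load and color by pLDDT (stored in B-factor column)
# Colors run from red (pLDDT 50 or lower) to blue (pLDDT 100), so high confidence is blue, as in the AlphaFold Database viewer
load AF-P04637-F1-model_v4.pdb
spectrum b, red orange yellow cyan blue, minimum=50, maximum=100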
show cartoon
set cartoon_fancy_helices, 1
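The PyMOL view covers pLDDT only. For multi-domain targets, the PAE half of this step can be checked directly from the JSON file. The sketch below assumes the file is a one-element list wrapping a dict with a `predicted_aligned_error` key (a square matrix in Å), and that ColabFold-style score files expose the matrix under `pae`; both layouts are assumptions to verify against your own file. The domain boundaries are values you supply.

```python
import json


def load_pae(path):
    """Read a PAE JSON file and return the N x N matrix as nested lists."""
    with open(path) as handle:
        data = json.load(handle)
    # Assumed AlphaFold Database layout: a one-element list wrapping a dict.
    record = data[0] if isinstance(data, list) else data
    # Assumed key names: "predicted_aligned_error" (AFDB) or "pae" (ColabFold scores).
    return record.get("predicted_aligned_error") or record["pae"]


def mean_interdomain_pae(pae, domain_a, domain_b):
    """Mean PAE between two residue ranges (1-based, inclusive).

    PAE is not symmetric, so both directions (a->b and b->a) are averaged.
    """
    values = []
    for i in range(domain_a[0] - 1, domain_a[1]):
        for j in range(domain_b[0] - 1, domain_b[1]):
            values.append(pae[i][j])
            values.append(pae[j][i])
    return sum(values) / len(values)
```

A low inter-domain value (a few Å) suggests the relative placement of the two domains can be trusted; a high value means a pocket formed at the domain interface should be treated with suspicion even if each domain has high pLDDT on its own.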
Now focus specifically on the predicted binding site — the region where you intend to dock. Ask: what is the pLDDT of the residues lining the binding pocket?
If the binding site residues have pLDDT > 70: the pocket geometry is reliable enough to proceed. Reporting the mean pLDDT of the binding site residues gives readers a simple confidence metric.
If the binding site has pLDDT 50–70: proceed with caution. Consider running a short MD simulation to relax the structure first, or running an ensemble docking protocol with multiple conformations.
If key binding site residues have pLDDT below 50: stop. Docking into a low-confidence binding site is not a reliable prediction — the pocket geometry is uncertain and results will be misleading.
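The three thresholds above can be applied without opening PyMOL by reading the B-factor column directly. This is a minimal sketch assuming a standard fixed-column PDB file and a single chain A; the function names and the decision rule (any residue below 50 means stop, all residues above 70 means proceed, anything else means caution) are one reading of the guidance, not a fixed standard.

```python
def residue_plddt(pdb_lines, chain="A"):
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    plddt = {}
    for line in pdb_lines:
        is_ca = line.startswith("ATOM") and line[12:16].strip() == "CA"
        if is_ca and line[21] == chain:
            # Columns 23-26 hold the residue number, columns 61-66 the B-factor.
            plddt[int(line[22:26])] = float(line[60:66])
    return plddt


def classify_site(plddt, site_residues):
    """Return (mean pLDDT, verdict) for a list of binding-site residue numbers."""
    values = [plddt[r] for r in site_residues]
    mean = sum(values) / len(values)
    if min(values) < 50:
        verdict = "stop"
    elif all(v > 70 for v in values):
        verdict = "proceed"
    else:
        verdict = "caution"
    return mean, verdict


if __name__ == "__main__":
    with open("AF-P04637-F1-model_v4.pdb") as handle:
        scores = residue_plddt(handle)
    print(classify_site(scores, [175, 248, 249, 273, 282]))
```

The residue list in the usage block is the same example selection used for the active site later in this tutorial; substitute the residues lining your own pocket.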
Step 3 — Identify and evaluate the binding site
With a crystal structure, the binding site is typically obvious — the co-crystallized ligand sits in it, and you center your docking grid on that ligand. With an AlphaFold structure, you must identify the binding site independently.
Option A: known binding site from literature or mutagenesis
If published mutagenesis data, biochemical studies, or structural studies on homologs identify which residues are important for binding, use those to define your docking grid box center. In PyMOL, select those residues and calculate their center of mass:
# Select known active site residues
select active_site, resi 175+248+249+273+282
# Calculate center of mass for grid box centering
centerofmass active_site
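If you prefer to script the grid box outside PyMOL, the same residues can be used to derive both a center and a box size. Note that this sketch uses the midpoint of the bounding box around the selected residues rather than a true center of mass, and the 8 Å padding is an assumed starting value to adjust for your ligand size. It assumes a fixed-column PDB file and chain A.

```python
def grid_box(pdb_lines, site_residues, chain="A", padding=8.0):
    """Return (center, size) of a box enclosing the listed residues, in Å.

    center is the midpoint of the bounding box (not a mass-weighted center);
    size is the bounding-box extent plus `padding` on every side.
    """
    coords = [
        (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        for line in pdb_lines
        if line.startswith("ATOM")
        and line[21] == chain
        and int(line[22:26]) in site_residues
    ]
    if not coords:
        raise ValueError("no atoms matched the requested residues")
    lows = [min(c[i] for c in coords) for i in range(3)]
    highs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple(round((lo + hi) / 2, 3) for lo, hi in zip(lows, highs))
    size = tuple(round(hi - lo + 2 * padding, 3) for lo, hi in zip(lows, highs))
    return center, size


if __name__ == "__main__":
    with open("protein_mature.pdb") as handle:
        print(grid_box(handle, {175, 248, 249, 273, 282}))
```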
Option B: computational binding site prediction
If the binding site is unknown, use a pocket detection tool. fpocket (free, open source) is a widely used choice. It detects putative binding pockets using Voronoi tessellation and alpha spheres (spheres that touch four atoms and contain none), then clusters those spheres into cavities and scores them:
# Install fpocket
conda install -c conda-forge fpocket -y
# Run pocket prediction
fpocket -f AF-P04637-F1-model_v4.pdb
# Results in AF-P04637-F1-model_v4_out/ directory
# Pockets are numbered by fpocket score; the druggability score is listed separately in the _info.txt file. Examine the top 3.
Examine the top-ranked pockets in PyMOL. Load the pocket PDB files output by fpocket and check whether they coincide with conserved residues, known functional sites, or high-pLDDT regions. A high-druggability pocket score in a low-pLDDT region is not useful — always cross-reference fpocket hits with the confidence map.
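The cross-referencing described above can be scripted. The sketch below parses the text of the fpocket `_info.txt` summary and keeps only pockets whose lining residues are confidently predicted. The assumed file layout (a `Pocket N :` header followed by `Key : value` lines) is based on typical fpocket output and should be checked against your version; the per-pocket mean pLDDT values are something you compute yourself, for example from the pocket residues listed by fpocket.

```python
def parse_fpocket_info(text):
    """Parse fpocket summary text into a list of dicts, one per pocket."""
    pockets = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key.startswith("Pocket") and not value:
            # Header line such as "Pocket 1 :" starts a new record.
            pockets.append({"id": int(key.split()[1])})
        elif pockets and value:
            try:
                pockets[-1][key] = float(value)
            except ValueError:
                pockets[-1][key] = value
    return pockets


def shortlist(pockets, pocket_plddt, min_plddt=70.0):
    """Pocket ids with mean pLDDT >= min_plddt, best druggability first."""
    kept = [p for p in pockets if pocket_plddt.get(p["id"], 0.0) >= min_plddt]
    kept.sort(key=lambda p: p.get("Druggability Score", 0.0), reverse=True)
    return [p["id"] for p in kept]
```

Pockets missing from `pocket_plddt` are dropped rather than assumed confident, which errs on the side of not docking into an unassessed site.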
Step 4 — Prepare the structure for docking
AlphaFold structures require the same preparation as crystal structures — plus additional steps specific to predicted models. Run through this sequence in PyMOL before any docking tool processing:
1. Remove the signal peptide and propeptide (if present)
AlphaFold models the full UniProt sequence, which may include signal peptides, propeptides, or transit peptides that are cleaved in the mature protein. Check the UniProt entry for your protein — the “Chain” annotation under PTM/Processing lists the exact residue range of the mature protein. Remove the non-mature regions before docking.
remove resi 1-24
save protein_mature.pdb
2. Remove low-confidence terminal regions
N- and C-terminal disordered tails with pLDDT below 50 contribute nothing to docking and can interfere with grid box definition. Remove them:
# Identify and remove very low-confidence terminal residues
select low_term, (resi 1-20 or resi 390-400) and b < 50
remove low_term
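The residue ranges in the selection above are specific to one protein. A more general approach is to find where the confident core begins and ends and generate the PyMOL command from that. This is a sketch under the assumption that you already have a residue-to-pLDDT mapping (for example from the B-factor column); it trims only the tails and leaves internal low-confidence loops untouched. The function names are illustrative.

```python
def confident_core(plddt, cutoff=50.0):
    """Return (first, last) residue numbers with pLDDT >= cutoff."""
    keep = [r for r in sorted(plddt) if plddt[r] >= cutoff]
    if not keep:
        raise ValueError("no residue reaches the pLDDT cutoff")
    return keep[0], keep[-1]


def _span(start, end):
    """Format a residue range as a PyMOL 'resi' selection."""
    return f"resi {start}" if start == end else f"resi {start}-{end}"


def trim_command(plddt, cutoff=50.0):
    """Build a PyMOL 'remove' command for the low-confidence terminal tails."""
    residues = sorted(plddt)
    first, last = confident_core(plddt, cutoff)
    parts = []
    if first > residues[0]:
        parts.append(_span(residues[0], first - 1))
    if last < residues[-1]:
        parts.append(_span(last + 1, residues[-1]))
    # An empty string means there is nothing to trim.
    return "remove " + " or ".join(parts) if parts else ""
```

Paste the returned string into the PyMOL command line, then save the trimmed structure as usual.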
3. Add hydrogens and assign protonation states
AlphaFold PDB files do not include hydrogen atoms. Add them and assign protonation states at pH 7.4 before converting to PDBQT format, using the H++ web server (biophysics.cs.vt.edu/H++) or PDB2PQR with PROPKA. Pay particular attention to histidines in and around the binding site, because their protonation state significantly affects binding pocket electrostatics.
4. Energy minimization
AlphaFold structures sometimes contain minor geometric imperfections (slightly non-ideal bond lengths, angles, or rotamers) that cause problems during PDBQT conversion or docking. A brief energy minimization before conversion resolves these, either with GROMACS or in PyMOL. Note that optimize is not a built-in PyMOL command; it is provided by the third-party Optimize plugin, which depends on Open Babel:
# Quick geometry cleanup in PyMOL (requires the Optimize plugin)
load protein_mature.pdb
optimize # basic energy minimization via the plugin
save protein_minimized.pdb
5. Generate the PDBQT receptor
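Convert to PDBQT format using AutoDockTools or Open Babel, following the same preparation steps as for any protein receptor. Because hydrogens were already added in step 3, use -A checkhydrogens, which adds hydrogens only if none are present, so the protonation states you assigned are kept:
python prepare_receptor4.py \
-r protein_minimized.pdb \
-o receptor.pdbqt \
-A checkhydrogens \
-U nphs_lps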
Step 5 — Handle low-confidence regions
AlphaFold models of most proteins contain at least some low-confidence regions. How you handle them depends on their proximity to the binding site:
Low-confidence regions far from the binding site
If pLDDT-low regions (below 70) are distant from your intended binding site — terminal regions, surface loops on the opposite face — you have three options, all acceptable: leave them as-is (simplest, doesn't affect docking if truly distant), remove them (cleaner structure, eliminates any steric artifacts), or restrain them during subsequent MD simulation. For simple docking studies, leaving distal low-confidence regions in place is fine.
Low-confidence loops adjacent to the binding site
This is the most consequential case. A flexible loop that borders the binding pocket — even one that isn't directly lining it — affects the pocket shape and volume. If AlphaFold modeled it in an arbitrary conformation (as it often does for low-pLDDT loops), that conformation may occlude or artificially open the pocket.
The best approaches, in order of rigor:
- Ensemble docking: Run a short MD simulation (10–50 ns) of the AlphaFold structure, sample multiple frames where the loop adopts different conformations, and dock into each. Report results from the most populated conformation consistent with the literature.
- Loop refinement: Use a loop modeling tool (Rosetta Remodel, MODELLER loopmodel) to generate an ensemble of loop conformations and dock into all of them.
- Acknowledge the limitation: For simpler studies, perform docking in the AlphaFold conformation, report the pLDDT of loop residues in the methods section, and discuss the limitation explicitly. This is acceptable for hypothesis-generating work.
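Whichever ensemble source you use (MD frames or modeled loops), it helps to summarize how much the docking score depends on the receptor conformation. The sketch below assumes you have already collected one best score per conformation, in Vina-style kcal/mol where more negative is better; the frame names and numbers in the usage block are made-up placeholders, not results.

```python
def summarize_ensemble(scores):
    """Summarize one ligand's docking scores across receptor conformations.

    scores: dict mapping conformation name -> best score (kcal/mol,
    more negative is better).
    """
    best_conf = min(scores, key=scores.get)
    values = list(scores.values())
    return {
        "best_conformation": best_conf,
        "best_score": scores[best_conf],
        "mean_score": sum(values) / len(values),
        # A large spread means the result is sensitive to loop conformation.
        "spread": max(values) - min(values),
    }


if __name__ == "__main__":
    # Hypothetical scores for three MD frames.
    example = {"frame_010": -7.2, "frame_020": -8.4, "frame_030": -6.6}
    print(summarize_ensemble(example))
```

A large spread is itself worth reporting, since it indicates that the pose depends on a loop conformation AlphaFold could not pin down.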
Step 6 — Run and validate the docking
Run docking with AutoDock Vina, GNINA, or your preferred engine using the prepared receptor. The docking protocol itself is identical to standard docking — but validation is more demanding when using a predicted receptor.
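For AutoDock Vina, the box defined in Step 3 and the receptor from Step 4 come together in a configuration file. The sketch below writes one using standard Vina option names; `ligand.pdbqt`, the box values, and the file name `config.txt` are placeholders, and the exhaustiveness and num_modes defaults shown are Vina's usual defaults rather than tuned recommendations.

```python
def vina_config(receptor, ligand, center, size, exhaustiveness=8, num_modes=9):
    """Return the text of an AutoDock Vina configuration file."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value:.1f}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    lines.append(f"num_modes = {num_modes}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder box; replace with the center and size from your binding site.
    text = vina_config("receptor.pdbqt", "ligand.pdbqt",
                       center=(1.0, 2.0, 3.0), size=(20.0, 20.0, 20.0))
    with open("config.txt", "w") as handle:
        handle.write(text)
```

The docking run itself would then be `vina --config config.txt --out docked.pdbqt`.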
Unlike crystal structure docking, you cannot perform standard self-docking validation (redocking the co-crystallized ligand) because no co-crystallized ligand exists. Use these alternatives instead:
- Enrichment validation with known actives (sometimes loosely called cross-docking): If known actives exist for your target (from ChEMBL, literature IC50 data), dock them and verify they rank above known inactives. This tests whether the predicted binding site can distinguish binders from non-binders.
- Pharmacophore consistency: Check whether the top-ranked docking pose makes interactions with residues known to be important from mutagenesis or biochemical data. A pose that forms H-bonds with catalytic residues or known pharmacophore anchors is more credible than one that doesn't.
- MD validation: Run a 50–100 ns MD simulation of the protein-ligand complex. A binding mode that is stable in MD and maintains key interactions throughout is significantly more credible than a docking score alone.
- Comparison to homolog structures: If the binding site is conserved in a homolog with a crystal structure, dock your compound into the homolog structure and compare binding modes. Consistent binding geometry between the AlphaFold prediction and the experimental homolog structure supports the reliability of both.
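The first check in this list, whether known actives rank above known inactives, can be quantified as a ROC AUC. This is a minimal sketch that computes it by direct pairwise comparison; it assumes Vina-style scores where more negative is better, and the score lists in the usage block are invented for illustration.

```python
def roc_auc(active_scores, inactive_scores):
    """ROC AUC: probability that a random active outscores a random inactive.

    Scores are docking energies, so lower (more negative) is better.
    Ties count as half a win.
    """
    wins = 0.0
    for active in active_scores:
        for inactive in inactive_scores:
            if active < inactive:
                wins += 1.0
            elif active == inactive:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


if __name__ == "__main__":
    # Hypothetical scores in kcal/mol.
    actives = [-9.1, -8.4, -7.9]
    inactives = [-8.0, -6.5, -6.1]
    print(f"ROC AUC = {roc_auc(actives, inactives):.2f}")
```

An AUC near 0.5 means the receptor model cannot separate binders from non-binders better than chance, which is a reason to revisit the binding site definition before interpreting any individual pose.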
AlphaFold vs crystal structure: key differences for docking
| Property | AlphaFold structure | Crystal structure |
|---|---|---|
| Conformation | Single ligand-free prediction, apo-like; may differ from active/bound form | Can capture ligand-bound, active, or specific states |
| Binding site geometry | Uncertain for flexible loops; confident for well-folded cores | Experimentally determined — ground truth at given resolution |
| Self-docking validation | Not possible — no co-crystallized ligand | Standard validation — redock native ligand, check RMSD <2 Å |
| Confidence information | pLDDT + PAE — quantified per residue and per pair | B-factors indicate mobility but not prediction confidence |
| Availability | 200M+ proteins, instant download | Limited to crystallizable proteins — ~200,000 in PDB |
| Water molecules | Not modeled — no structural waters in binding site | Structural waters often present, may mediate binding |
| Crystal contacts / packing artifacts | None — free from crystal packing distortions | Crystal contacts can distort loops and surface residues |
What to report in your methods section
When publishing docking results using an AlphaFold receptor, reviewers expect explicit documentation of the confidence assessment. A complete methods statement should include:
- AlphaFold Database version and model version used (e.g. v4)
- Mean pLDDT of binding site residues (e.g. "mean pLDDT of active site residues: 87.3")
- PAE assessment for multi-domain proteins
- Which regions were removed or handled specially (e.g. signal peptide, low-confidence loops)
- What validation was used in place of self-docking (enrichment with known actives, pharmacophore consistency, MD)
- Statement that no experimental structure was available and the limitation this imposes
AlphaFold docking in one paragraph
Using an AlphaFold structure for docking is more viable than ever — but requires additional confidence checks that crystal structure docking doesn't demand. Check pLDDT for the binding site residues before proceeding: above 70 is workable, below 50 is not. Check PAE for multi-domain targets. Remove signal peptides and disordered terminal regions. Minimize before PDBQT conversion. Because self-docking validation is impossible, validate with known actives, pharmacophore consistency, or MD simulation of the docked complex. Report the binding site pLDDT and your validation approach in the methods section — reviewers will ask if you don't.