How to Use an AlphaFold Structure for Molecular Dynamics Simulation
AlphaFold structures can be simulated directly with GROMACS — but they need additional preparation steps that crystal structures don’t require. This tutorial covers everything from pLDDT assessment and handling disordered regions to running the simulation and validating results that reviewers will trust.
Why AlphaFold structures need special MD preparation
Crystal structures are experimentally determined — the atomic positions are measured, not predicted. They come with known limitations (resolution, B-factors, crystal contacts) but are fundamentally grounded in physical reality. AlphaFold structures are computational predictions with a different set of characteristics that require adjusted preparation and validation workflows.
- Atom positions measured at known resolution
- Contains crystallographic waters — some may be structural
- May have alternate conformations, crystal contacts
- Missing residues from disordered regions
- Standard energy minimization sufficient
- Self-docking validation available to confirm prep
- Atom positions predicted — confidence varies per residue
- No crystal waters — solvent must be added entirely
- No alternate conformations — one model only
- Disordered regions are modeled (unreliably)
- Extended minimization + graduated restraints recommended
- No experimental validation baseline — extra checks required
The core practical consequence: disordered regions predicted by AlphaFold — those with low pLDDT — are present in the model but in arbitrary conformations. When you start MD, these regions will rapidly deviate from their predicted positions, which is expected and correct behavior — they’re genuinely disordered. The danger is treating their predicted coordinates as meaningful starting geometry, or having them crash the simulation through steric clashes.
Step 1 — Assess pLDDT before doing anything else
Load the AlphaFold PDB in PyMOL and immediately color by pLDDT (stored in the B-factor column). This gives you a spatial map of where to be careful before any other preparation step.
# Load and color by confidence
load AF-P04637-F1-model_v4.pdb
spectrum b, blue_cyan_yellow_orange_red, minimum=50, maximum=100
show cartoon
Use this confidence map to make decisions about every subsequent preparation step. Here’s how pLDDT translates to MD-specific decisions:
Step 2 — Handle disordered regions and signal peptides
Remove signal peptides and propeptides
AlphaFold models the full UniProt sequence, which includes signal peptides, propeptides, and transit sequences that are cleaved in the mature protein. These should not be in your MD system — they’re not present in the biologically relevant form. Check the UniProt entry for your protein under PTM/Processing → Chain to identify the mature protein boundaries.
# Example: remove signal peptide residues 1-24
remove resi 1-24
save protein_mature.pdb
Decision: keep or remove low-pLDDT terminal regions
N- and C-terminal disordered tails (pLDDT below 50) should generally be removed before MD simulation if they are not the focus of your study. They add computational cost, increase the likelihood of simulation instability, and their dynamics are not meaningful because their starting positions are arbitrary.
To identify and remove them systematically in PyMOL:
# Find the first and last residues with pLDDT above 50
iterate (name CA and b > 50), print(resi)
# Remove terminal low-confidence residues (adjust range to your protein)
remove resi 1-12 or resi 488-500
save protein_cleaned.pdb
When to keep low-pLDDT regions
If your research question is specifically about a disordered region — a flexible linker, an intrinsically disordered regulatory domain, or a loop that you believe is functionally important — keep it in the simulation. But document clearly in your methods that the starting conformation of that region has pLDDT below 50 and should be considered arbitrary rather than a biologically meaningful starting point. The MD simulation itself will sample conformations for that region, which is more informative than the AlphaFold model.
Step 3 — Generate the topology with pdb2gmx
Run pdb2gmx as you would for any protein, with two additional considerations specific to AlphaFold structures:
gmx pdb2gmx \
-f protein_cleaned.pdb \
-o protein.gro \
-p topol.top \
-water tip3p \
-ff amber99sb-ildn \
-ignh \
-his
The -his flag is especially important for AlphaFold structures. Crystal structures often have pH-derived protonation state information from the crystallographic methods; AlphaFold models do not. GROMACS must assign protonation states to every histidine, and the automatic assignment may be wrong for active-site residues. The -his flag prompts you to assign each histidine manually — check your literature for known protonation states at pH 7.4, or run PropKa/H++ on the cleaned PDB first to get predictions.
pdb2gmx, check for unresolved atom types before proceeding: grep " ? " topol.top should return nothing. AlphaFold sometimes includes non-standard residues or modified amino acids from the UniProt annotation that GROMACS force fields don’t recognize. Fix these before solvation — they’ll cause silent errors later if left in place.
Step 4 — Extended energy minimization
AlphaFold structures may have more geometric imperfections than crystal structures — particularly in predicted loop regions and around the junction between well-folded and disordered segments. Use a more aggressive minimization protocol with a tighter convergence criterion:
integrator = steep
emtol = 500 ; stricter than standard 1000 kJ/mol/nm
emstep = 0.01
nsteps = 100000 ; more steps than standard 50,000
cutoff-scheme = Verlet
coulombtype = PME
rcoulomb = 1.2
rvdw = 1.2
pbc = xyz
Run minimization after solvation and ion addition (following the standard GROMACS preparation workflow):
gmx grompp -f minim.mdp -c system.gro -p topol.top -o em.tpr
gmx mdrun -v -deffnm em
If minimization does not converge even at 100,000 steps, a low-pLDDT region almost certainly has a severe geometric clash. Identify the problematic residue from the Fmax atom number reported in the log, then consider trimming or restraining that region more aggressively before re-running.
Step 5 — Restrained equilibration protocol
Standard MD equilibration uses position restraints on protein heavy atoms during NVT and NPT phases, then releases them for production. For AlphaFold structures, a graduated restraint release protocol is strongly recommended — especially for proteins with significant low-confidence regions.
The rationale: releasing all restraints simultaneously can cause low-pLDDT regions to undergo violent, unphysical motions in the early production phase, potentially destabilizing the well-folded core. Graduated release gives each part of the structure time to equilibrate in the presence of the force field before full freedom is granted.
A practical four-stage protocol:
- NVT equilibration — strong restraints on all backbone and side chain heavy atoms (1000 kJ/mol/nm² force constant). 200 ps.
- NPT equilibration phase 1 — strong backbone restraints (1000), weaker side chain restraints (500). 200 ps.
- NPT equilibration phase 2 — moderate backbone restraints (200), no side chain restraints. 200 ps. This stage is particularly important — it allows disordered regions to relax their side chains while backbone geometry is still guided.
- NPT equilibration phase 3 — weak backbone restraints (50) only. 200 ps. By this point most geometric tensions should be resolved.
- Production MD — no restraints. Standard run.
posre.itp with a default force constant of 1000 kJ/mol/nm². To use different force constants, create separate ITP files for each equilibration stage (e.g., posre_200.itp, posre_50.itp) with the appropriate force constant values, and reference them with different -DPOSRES defines in each MDP file.
Step 6 — Production MD and analysis
The production MD MDP file for an AlphaFold structure is identical to a crystal structure run — same force field, same timestep, same ensemble settings (Nosé-Hoover thermostat, Parrinello-Rahman barostat). Aim for at least 100 ns for a standard analysis; 200–500 ns gives more reliable flexibility profiles and convergence.
During analysis, track high-pLDDT and low-pLDDT regions separately:
# Create index group for high-confidence residues
# First identify which residues have pLDDT > 70 from the JSON
gmx make_ndx -f md.tpr -o highconf.ndx
# In the prompt: select residues known to be high-confidence
# Then calculate RMSD of only those residues
gmx rms -s md.tpr -f md_centered.xtc -n highconf.ndx -o rmsd_core.xvg -tu ns
Separating core RMSD from loop RMSD gives you a cleaner equilibration picture — the well-folded core should plateau within a few nanoseconds, while disordered loops may fluctuate throughout, which is expected and correct.
Validation — what’s different for predicted structures
Standard MD validation checks (RMSD plateau, energy conservation, density stabilization) all apply. AlphaFold-specific validation adds several additional steps:
-
RMSD plateaus for the high-pLDDT core during production. The overall protein backbone is stable — not still drifting at the end of the run.
-
Temperature, pressure, and density all stable throughout production. Energy conservation verified with a short NVE test run.
-
Compare simulated RMSF to predicted pLDDT. Low-pLDDT residues should show high RMSF (high flexibility) in the simulation. High RMSF in high-pLDDT regions is unexpected and warrants investigation — it may indicate a force field incompatibility or a region that AlphaFold predicted incorrectly.
-
Check secondary structure persistence. Helices and sheets in high-pLDDT regions should remain intact throughout the simulation. Rapid unfolding of a predicted secondary structure element suggests the AlphaFold conformation was not energy-stable in the force field — a known issue for some regions.
-
Compare to experimental data where available. If any experimental information exists — SAXS envelope, HDX-MS flexibility data, mutagenesis, NMR chemical shifts — compare to your simulation. Consistency with independent experimental data is the strongest validation you can provide for an AlphaFold-based MD study.
-
Run at least two independent replicas. This is important for any MD study, but especially for AlphaFold-based simulations where the starting conformation carries more uncertainty. If two replicas initialized with different velocities show consistent behavior of the high-pLDDT regions, the results are robust.
What to report in your methods section
AlphaFold MD in one paragraph
Using an AlphaFold structure for molecular dynamics simulation is straightforward if you treat the pLDDT map as a preparation guide. Before anything else, assess confidence — remove signal peptides, trim extremely low-pLDDT terminal regions, and flag any disordered loops near your region of interest. Use a stricter energy minimization criterion to resolve geometry issues in predicted regions, and use graduated restraint release during equilibration rather than releasing all restraints simultaneously. In analysis, validate that high-pLDDT regions behave like well-folded proteins while low-pLDDT regions show the expected flexibility. Report your confidence assessment and any special handling in the methods section — reviewers increasingly know to look for it.