How to Use an AlphaFold Structure for Molecular Dynamics Simulation

How to Use an AlphaFold Structure for Molecular Dynamics Simulation

AlphaFold structures can be simulated directly with GROMACS — but they need additional preparation steps that crystal structures don’t require. This tutorial covers everything from pLDDT assessment and handling disordered regions to running the simulation and validating results that reviewers will trust.

Why AlphaFold structures need special MD preparation

Crystal structures are experimentally determined — the atomic positions are measured, not predicted. They come with known limitations (resolution, B-factors, crystal contacts) but are fundamentally grounded in physical reality. AlphaFold structures are computational predictions with a different set of characteristics that require adjusted preparation and validation workflows.

Crystal structure for MD
Experimentally determined
  • Atom positions measured at known resolution
  • Contains crystallographic waters — some may be structural
  • May have alternate conformations, crystal contacts
  • Missing residues from disordered regions
  • Standard energy minimization sufficient
  • Self-docking validation available to confirm prep
AlphaFold structure for MD
Computationally predicted
  • Atom positions predicted — confidence varies per residue
  • No crystal waters — solvent must be added entirely
  • No alternate conformations — one model only
  • Disordered regions are modeled (unreliably)
  • Extended minimization + graduated restraints recommended
  • No experimental validation baseline — extra checks required

The core practical consequence: disordered regions predicted by AlphaFold — those with low pLDDT — are present in the model but in arbitrary conformations. When you start MD, these regions will rapidly deviate from their predicted positions, which is expected and correct behavior — they’re genuinely disordered. The danger is treating their predicted coordinates as meaningful starting geometry, or having them crash the simulation through steric clashes.

Step 1 — Assess pLDDT before doing anything else

1
Step 1 — Critical
Map confidence onto the structure before touching the PDB

Load the AlphaFold PDB in PyMOL and immediately color by pLDDT (stored in the B-factor column). This gives you a spatial map of where to be careful before any other preparation step.

PyMOL command line
# Load and color by confidence
load AF-P04637-F1-model_v4.pdb
spectrum b, blue_cyan_yellow_orange_red, minimum=50, maximum=100
show cartoon

Use this confidence map to make decisions about every subsequent preparation step. Here’s how pLDDT translates to MD-specific decisions:

pLDDT
What it means for MD
What to do
90–100
Well-folded, reliable backbone and side chains. Will simulate normally.
Prepare and simulate normally — no special handling needed.
70–90
Good confidence. Minor geometric imperfections possible. Backbone reliable.
Standard preparation. Extended minimization recommended before equilibration.
50–70
Flexible or uncertain region. Will deviate significantly from predicted position in MD.
Flag this region. Apply position restraints during equilibration. Interpret dynamics carefully.
< 50
Likely disordered. Predicted position is arbitrary — will move dramatically in early MD.
Consider removing if terminal or non-essential. If retained, use strong restraints during equilibration; document in methods.
Low pLDDT regions can crash your simulation
AlphaFold models low-confidence disordered regions in arbitrary extended or partially folded conformations. When MD starts, these regions can have severe steric clashes with each other or with the folded core — generating forces large enough to cause the simulation to explode immediately. Extended energy minimization (described in Step 4) resolves most of these, but if minimization fails to converge, a low-pLDDT region is often the culprit.

Step 2 — Handle disordered regions and signal peptides

2
Step 2
Clean the structure before topology generation

Remove signal peptides and propeptides

AlphaFold models the full UniProt sequence, which includes signal peptides, propeptides, and transit sequences that are cleaved in the mature protein. These should not be in your MD system — they’re not present in the biologically relevant form. Check the UniProt entry for your protein under PTM/Processing → Chain to identify the mature protein boundaries.

PyMOL
# Example: remove signal peptide residues 1-24
remove resi 1-24
save protein_mature.pdb

Decision: keep or remove low-pLDDT terminal regions

N- and C-terminal disordered tails (pLDDT below 50) should generally be removed before MD simulation if they are not the focus of your study. They add computational cost, increase the likelihood of simulation instability, and their dynamics are not meaningful because their starting positions are arbitrary.

To identify and remove them systematically in PyMOL:

PyMOL
# Find the first and last residues with pLDDT above 50
iterate (name CA and b > 50), print(resi)

# Remove terminal low-confidence residues (adjust range to your protein)
remove resi 1-12 or resi 488-500
save protein_cleaned.pdb

When to keep low-pLDDT regions

If your research question is specifically about a disordered region — a flexible linker, an intrinsically disordered regulatory domain, or a loop that you believe is functionally important — keep it in the simulation. But document clearly in your methods that the starting conformation of that region has pLDDT below 50 and should be considered arbitrary rather than a biologically meaningful starting point. The MD simulation itself will sample conformations for that region, which is more informative than the AlphaFold model.

Step 3 — Generate the topology with pdb2gmx

3
Step 3
Topology generation — with careful protonation assignment

Run pdb2gmx as you would for any protein, with two additional considerations specific to AlphaFold structures:

Terminal
gmx pdb2gmx \
  -f protein_cleaned.pdb \
  -o protein.gro \
  -p topol.top \
  -water tip3p \
  -ff amber99sb-ildn \
  -ignh \
  -his

The -his flag is especially important for AlphaFold structures. Crystal structures often have pH-derived protonation state information from the crystallographic methods; AlphaFold models do not. GROMACS must assign protonation states to every histidine, and the automatic assignment may be wrong for active-site residues. The -his flag prompts you to assign each histidine manually — check your literature for known protonation states at pH 7.4, or run PropKa/H++ on the cleaned PDB first to get predictions.

Verify unknown atom types immediately
After pdb2gmx, check for unresolved atom types before proceeding: grep " ? " topol.top should return nothing. AlphaFold sometimes includes non-standard residues or modified amino acids from the UniProt annotation that GROMACS force fields don’t recognize. Fix these before solvation — they’ll cause silent errors later if left in place.

Step 4 — Extended energy minimization

4
Step 4 — AlphaFold-specific
More aggressive minimization than for crystal structures

AlphaFold structures may have more geometric imperfections than crystal structures — particularly in predicted loop regions and around the junction between well-folded and disordered segments. Use a more aggressive minimization protocol with a tighter convergence criterion:

minim.mdp — AlphaFold-specific settings
integrator  = steep
emtol       = 500       ; stricter than standard 1000 kJ/mol/nm
emstep      = 0.01
nsteps      = 100000    ; more steps than standard 50,000

cutoff-scheme = Verlet
coulombtype   = PME
rcoulomb      = 1.2
rvdw          = 1.2
pbc           = xyz

Run minimization after solvation and ion addition (following the standard GROMACS preparation workflow):

Terminal
gmx grompp -f minim.mdp -c system.gro -p topol.top -o em.tpr
gmx mdrun -v -deffnm em

If minimization does not converge even at 100,000 steps, a low-pLDDT region almost certainly has a severe geometric clash. Identify the problematic residue from the Fmax atom number reported in the log, then consider trimming or restraining that region more aggressively before re-running.

Step 5 — Restrained equilibration protocol

5
Step 5 — AlphaFold-specific
Graduated restraint release during equilibration

Standard MD equilibration uses position restraints on protein heavy atoms during NVT and NPT phases, then releases them for production. For AlphaFold structures, a graduated restraint release protocol is strongly recommended — especially for proteins with significant low-confidence regions.

The rationale: releasing all restraints simultaneously can cause low-pLDDT regions to undergo violent, unphysical motions in the early production phase, potentially destabilizing the well-folded core. Graduated release gives each part of the structure time to equilibrate in the presence of the force field before full freedom is granted.

A practical four-stage protocol:

  • NVT equilibration — strong restraints on all backbone and side chain heavy atoms (1000 kJ/mol/nm² force constant). 200 ps.
  • NPT equilibration phase 1 — strong backbone restraints (1000), weaker side chain restraints (500). 200 ps.
  • NPT equilibration phase 2 — moderate backbone restraints (200), no side chain restraints. 200 ps. This stage is particularly important — it allows disordered regions to relax their side chains while backbone geometry is still guided.
  • NPT equilibration phase 3 — weak backbone restraints (50) only. 200 ps. By this point most geometric tensions should be resolved.
  • Production MD — no restraints. Standard run.
Using custom restraint force constants in GROMACS
GROMACS position restraints are defined in posre.itp with a default force constant of 1000 kJ/mol/nm². To use different force constants, create separate ITP files for each equilibration stage (e.g., posre_200.itp, posre_50.itp) with the appropriate force constant values, and reference them with different -DPOSRES defines in each MDP file.

Step 6 — Production MD and analysis

6
Step 6
Production run — same protocol, more interpretation needed

The production MD MDP file for an AlphaFold structure is identical to a crystal structure run — same force field, same timestep, same ensemble settings (Nosé-Hoover thermostat, Parrinello-Rahman barostat). Aim for at least 100 ns for a standard analysis; 200–500 ns gives more reliable flexibility profiles and convergence.

During analysis, track high-pLDDT and low-pLDDT regions separately:

GROMACS — calculate RMSD of folded core only (pLDDT > 70)
# Create index group for high-confidence residues
# First identify which residues have pLDDT > 70 from the JSON
gmx make_ndx -f md.tpr -o highconf.ndx
# In the prompt: select residues known to be high-confidence
# Then calculate RMSD of only those residues
gmx rms -s md.tpr -f md_centered.xtc -n highconf.ndx -o rmsd_core.xvg -tu ns

Separating core RMSD from loop RMSD gives you a cleaner equilibration picture — the well-folded core should plateau within a few nanoseconds, while disordered loops may fluctuate throughout, which is expected and correct.

Validation — what’s different for predicted structures

Standard MD validation checks (RMSD plateau, energy conservation, density stabilization) all apply. AlphaFold-specific validation adds several additional steps:

  • Standard
    RMSD plateaus for the high-pLDDT core during production. The overall protein backbone is stable — not still drifting at the end of the run.
  • Standard
    Temperature, pressure, and density all stable throughout production. Energy conservation verified with a short NVE test run.
  • AF-specific
    Compare simulated RMSF to predicted pLDDT. Low-pLDDT residues should show high RMSF (high flexibility) in the simulation. High RMSF in high-pLDDT regions is unexpected and warrants investigation — it may indicate a force field incompatibility or a region that AlphaFold predicted incorrectly.
  • AF-specific
    Check secondary structure persistence. Helices and sheets in high-pLDDT regions should remain intact throughout the simulation. Rapid unfolding of a predicted secondary structure element suggests the AlphaFold conformation was not energy-stable in the force field — a known issue for some regions.
  • AF-specific
    Compare to experimental data where available. If any experimental information exists — SAXS envelope, HDX-MS flexibility data, mutagenesis, NMR chemical shifts — compare to your simulation. Consistency with independent experimental data is the strongest validation you can provide for an AlphaFold-based MD study.
  • AF-specific
    Run at least two independent replicas. This is important for any MD study, but especially for AlphaFold-based simulations where the starting conformation carries more uncertainty. If two replicas initialized with different velocities show consistent behavior of the high-pLDDT regions, the results are robust.

What to report in your methods section

Example methods statement — annotated
“The structure of [protein] was obtained from the AlphaFold Protein Structure Database (AF-P04637-F1-model_v4.pdb, alphafold.ebi.ac.uk). Confidence was assessed using per-residue pLDDT scores; residues 1–18 and 492–500 (pLDDT < 50) were removed prior to simulation as they represent predicted disordered terminal regions. The remaining structure had a mean pLDDT of 83.4 for the folded core (residues 19–491). MD simulations were performed using GROMACS 2024.3 with the AMBER ff19SB force field and TIP3P water. Protonation states were assigned at pH 7.4 using PropKa. Energy minimization used a convergence criterion of 500 kJ/mol/nm (stricter than standard to account for geometric imperfections in the predicted structure). Equilibration used a graduated restraint release protocol (four stages: 1000, 500, 200, and 50 kJ/mol/nm² force constants over 200 ps each) before 200 ns production MD. Results were validated by confirming RMSD convergence of the high-pLDDT core (>70), secondary structure stability, and RMSF-pLDDT anti-correlation.”

AlphaFold MD in one paragraph

Using an AlphaFold structure for molecular dynamics simulation is straightforward if you treat the pLDDT map as a preparation guide. Before anything else, assess confidence — remove signal peptides, trim extremely low-pLDDT terminal regions, and flag any disordered loops near your region of interest. Use a stricter energy minimization criterion to resolve geometry issues in predicted regions, and use graduated restraint release during equilibration rather than releasing all restraints simultaneously. In analysis, validate that high-pLDDT regions behave like well-folded proteins while low-pLDDT regions show the expected flexibility. Report your confidence assessment and any special handling in the methods section — reviewers increasingly know to look for it.

Last updated on

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *