Protein-Ligand Molecular Dynamics Simulation in GROMACS: Parameterize, Run and Analyze
Protein-only MD is the foundation. Protein-ligand MD is where it gets useful for drug discovery — validating docking results, characterizing binding stability, and calculating binding free energies. The extra complexity is almost entirely in Step 1: getting force field parameters for the small molecule. This tutorial covers the whole pipeline.
What’s different about protein-ligand MD
The GROMACS workflow you learned in the main tutorial covers protein-only simulations well. Protein-ligand MD adds one major complication: the ligand needs its own force field parameters.
Standard protein force fields (AMBER, CHARMM, GROMOS) define parameters for the 20 canonical amino acids and common modifications. They say nothing about an arbitrary drug molecule. Before you can simulate a protein-ligand complex, you need bond lengths, angles, torsional parameters, and partial charges for every atom in your ligand — and these must be compatible with your protein force field.
This tutorial uses the AMBER/GAFF2 combination: AMBER ff19SB for the protein, GAFF2 for the ligand. The parameterization tool is ACPYPE (Antechamber Python Parser Interface), which runs Antechamber and GAFF2 internally and outputs GROMACS-compatible files.
Step 1 — Parameterize the ligand with ACPYPE
Install ACPYPE via conda if you don’t have it:
conda activate md
conda install -c conda-forge acpype ambertools -y
You need a 3D ligand structure as input. Download ibuprofen from PubChem (CID 3672) as a 3D SDF file, then convert to MOL2 or PDB format using Open Babel:
# Download ibuprofen 3D structure from PubChem
wget "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/CID/3672/SDF?record_type=3d" \
-O ibuprofen.sdf
# Convert to PDB with correct protonation at pH 7.4
obabel ibuprofen.sdf -O ibuprofen.pdb -p 7.4 --gen3d
Now run ACPYPE to generate GROMACS-compatible GAFF2 parameters. The -n flag sets the net charge — check this carefully for your ligand (ibuprofen is neutral at most simulation conditions):
acpype \
-i ibuprofen.pdb \
-b IBP \
-n 0 \
-a gaff2 \
-c bcc
# -b IBP = residue name (3 chars) — avoid MOL, LIG — use something unique
# -n 0 = net charge (0 = neutral)
# -a gaff2 = use GAFF2 force field
# -c bcc = AM1-BCC charge method (standard for drug-like molecules)
ACPYPE creates a directory called IBP.acpype/. The key files you need:
├── IBP_GMX.itp ← ligand topology (force field parameters)
├── IBP_GMX.gro ← ligand coordinates in GROMACS format
└── posre_IBP.itp ← position restraints for equilibration
grep "qtot" IBP.acpype/IBP_GMX.itp | tail -5 to check the cumulative charge sum in the generated ITP.
Step 2 — Prepare the protein
Protein preparation for a complex follows the same steps as protein-only MD, with one important difference: you keep the docked ligand coordinates from your docking run rather than removing it and re-docking it.
Start from your best docked pose (from AutoDock Vina or GNINA). Extract the protein and ligand into separate PDB files:
# Load docked complex
load 5KIR_receptor.pdbqt, receptor
load docked_pose1.pdbqt, ligand
# Save protein only (no waters, no ligand)
save protein_only.pdb, receptor
# Save ligand only
save ligand_pose.pdb, ligand
Generate the protein topology with AMBER ff19SB:
gmx pdb2gmx \
-f protein_only.pdb \
-o protein.gro \
-p topol.top \
-water tip3p \
-ff amber99sb-ildn \
-ignh
Step 3 — Combine protein and ligand into one system
This is the most manual step in the workflow. You need to merge two coordinate files into one GRO file, and edit the topology to include the ligand force field.
Merge the coordinate files
# Get atom counts from each file
head -2 protein.gro # line 2 = atom count
head -2 IBP.acpype/IBP_GMX.gro
# Combine: take protein atoms + ligand atoms, update total count + box
# Python script to merge GRO files reliably:
python3 - << 'EOF'
with open('protein.gro') as f:
plines = f.readlines()
with open('IBP.acpype/IBP_GMX.gro') as f:
llines = f.readlines()
# Atom lines: protein[2:-1], ligand[2:-1]
p_atoms = plines[2:-1]
l_atoms = llines[2:-1]
total = len(p_atoms) + len(l_atoms)
box_line = plines[-1] # use protein box line (will resize in editconf)
with open('complex.gro', 'w') as f:
f.write('Complex protein-ligand system\n')
f.write(f' {total}\n')
f.writelines(p_atoms)
f.writelines(l_atoms)
f.write(box_line)
print(f'complex.gro written: {total} atoms')
EOF
Edit the topology to include the ligand
Open topol.top in a text editor. You need to make three additions:
Addition 1: After the last #include line at the top of the file, add a line to include the ligand force field:
; Ligand force field parameters
#include "IBP.acpype/IBP_GMX.itp"
#include "IBP.acpype/posre_IBP.itp"
Addition 2: At the end of the [ molecules ] section, add the ligand molecule. The name must exactly match the residue name in your ACPYPE output:
[ molecules ]
; Compound #mols
Protein_chain_A 1
IBP 1 ← add this line
[ molecules ] section of topol.top must exactly match the order atoms appear in the GRO file. If protein atoms come first in complex.gro, Protein must come first in [ molecules ]. If the order doesn’t match, GROMACS will exit with a topology error during grompp. This is the most common source of errors in this step.
Step 4 — Solvate, add ions, and minimize
From here, the workflow is essentially identical to protein-only MD. Define the box, solvate, add ions, and minimize:
# Define simulation box — same as protein-only
gmx editconf \
-f complex.gro \
-o box.gro \
-c -d 1.0 -bt dodecahedron
# Solvate
gmx solvate \
-cp box.gro \
-cs tip3p.gro \
-o solvated.gro \
-p topol.top
# Add ions
gmx grompp -f ions.mdp -c solvated.gro -p topol.top -o ions.tpr -maxwarn 1
echo "SOL" | gmx genion \
-s ions.tpr \
-o system.gro \
-p topol.top \
-pname NA -nname CL \
-neutral -conc 0.15
# Energy minimization
gmx grompp -f minim.mdp -c system.gro -p topol.top -o em.tpr
gmx mdrun -v -deffnm em
If you see an error during grompp like Inconsistent box type or Fatal error: number of coordinates in topology does not match, the most likely cause is a mismatch between the atom count in complex.gro and the molecule list in topol.top. Recheck that the merged GRO has the correct total atom count on line 2.
Step 5 — Equilibrate and run production MD
The equilibration MDP files are nearly identical to protein-only MD, with one change: include position restraints on both the protein AND the ligand during NVT and NPT equilibration. Add the ligand restraint include to your MDP file:
; Restrain both protein and ligand during equilibration
define = -DPOSRES -DPOSRES_IBP
The -DPOSRES_IBP flag activates the ligand position restraints from posre_IBP.itp. Without this, the ligand may drift out of the binding site during the early violent phase of heating before the system has equilibrated around it.
# NVT — temperature equilibration
gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr
gmx mdrun -v -deffnm nvt
# NPT — pressure equilibration
gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
gmx mdrun -v -deffnm npt
# Production MD — remove all restraints
# Use same md.mdp as protein-only tutorial (no define = -DPOSRES)
gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md.tpr
gmx mdrun -v -deffnm md
Step 6 — Analyze binding stability
After centering the trajectory (gmx trjconv -pbc mol -center, selecting Protein_IBP as center and System as output), run these three key analyses:
Ligand RMSD
# Fit to protein backbone, measure ligand displacement
gmx rms \
-s md.tpr \
-f md_centered.xtc \
-o rmsd_ligand.xvg \
-tu ns
# Fit group: Backbone
# RMSD group: IBP (or your ligand group name)
Protein-ligand hydrogen bonds
gmx hbond \
-s md.tpr \
-f md_centered.xtc \
-num hbond_pl.xvg \
-hbm hbmap.xpm
# Group 1: Protein
# Group 2: IBP
# Convert XPM to readable format
gmx xpm2ps -f hbmap.xpm -o hbmap.eps -by 1
Key residue contact distances
# Create index group for key residues (e.g. ARG120, TYR355 for COX-2)
gmx make_ndx -f md.tpr -o contacts.ndx
# In the prompt: create groups for specific residues
# "r 120" selects residue 120, then name the group
# Measure minimum distance between ligand and each key residue
gmx mindist \
-s md.tpr \
-f md_centered.xtc \
-n contacts.ndx \
-od mindist_arg120.xvg \
-tu ns
Common problems and fixes
These errors appear in the vast majority of protein-ligand MD setups:
- Fatal error: atomtype not found in parameters — The ligand ITP uses atom types not in your force field. Check that you included the GAFF2 atom type definitions from ACPYPE. Add
#include "IBP.acpype/IBP_GMX.itp"before the protein include, not after. - Fatal error: number of coordinates in topology (N) does not match coordinates in coordinate file (M) — The atom count in complex.gro (line 2) doesn’t match the sum of atoms across all molecules in
[ molecules ]. Recount — protein atoms + ligand atoms must equal the GRO header number. - Ligand RMSD increases monotonically and exceeds 0.5 nm — The ligand has left the binding site. Check: (1) were ligand position restraints active during both NVT and NPT equilibration? (2) Is the ligand net charge correct? A charged ligand in a neutral site will be repelled. (3) Was the docked pose actually inside the binding site — visualize the starting complex in PyMOL before running.
- ACPYPE fails: cannot determine atom types — The input geometry has problems. Make sure you used a proper 3D structure (not a 2D SMILES-derived flat structure). Run
obabel -i sdf ibuprofen.sdf -o pdb -O ibuprofen.pdb --gen3dto ensure proper 3D geometry is generated.
- ACPYPE ran successfully — IBP_GMX.itp and IBP_GMX.gro generated
- Ligand net charge verified — matches expected value from structure
- complex.gro atom count on line 2 matches protein + ligand atoms
- IBP_GMX.itp included in topol.top before protein topology
- IBP appears in [ molecules ] section of topol.top
- Both -DPOSRES and -DPOSRES_IBP active during NVT and NPT
- Ligand RMSD plateaus below 0.3 nm in production — ligand stayed bound
The key difference in one sentence
Protein-ligand MD is protein-only MD with one extra step at the beginning — parameterizing the ligand with ACPYPE — and one extra care item throughout: ensuring the force field for the ligand is compatible with the force field for the protein. Get those two things right and the rest of the pipeline is identical. The result is a trajectory that tells you whether the docked pose is stable, which specific contacts maintain binding, and — when combined with MM-GBSA — how tightly the ligand is predicted to bind.