How to Prepare a Protein for Molecular Dynamics Simulation: Complete GROMACS Guide
System preparation is where most MD simulations succeed or fail — and where most beginners spend the most time stuck. This tutorial walks through every step from a raw PDB file to a simulation-ready system, with exact GROMACS commands and explanations of what each step actually does.
What you’ll need before starting
This tutorial uses GROMACS 2024 and the CHARMM36 force field. Before starting, confirm:
- GROMACS is installed and
gmx --versionreturns a version number - You have about 500 MB of free disk space
- You’re comfortable running commands in a terminal
Set up your working directory
Keep all files in one organized folder from the start. A messy working directory is one of the most consistent sources of “file not found” errors in GROMACS.
mkdir -p ~/md/lysozyme && cd ~/md/lysozyme
Every command in this tutorial runs from inside this directory.
Step 1 — Download and inspect the PDB structure
Download the structure directly from the RCSB:
wget https://files.rcsb.org/download/1AKI.pdb
Before doing anything else, inspect the file:
# Read the REMARK section for key metadata
head -80 1AKI.pdb
# Check what non-protein molecules are present
grep "^HETATM" 1AKI.pdb | awk '{print $4}' | sort -u
For 1AKI you’ll see water molecules (HOH) listed as HETATM records. GROMACS’s pdb2gmx handles water separately, so we remove them from the PDB before processing — the solvation step in Step 4 will add explicit water back properly.
# Remove crystal water molecules
grep -v "HOH" 1AKI.pdb > 1AKI_clean.pdb
Step 2 — Generate the topology with pdb2gmx
The pdb2gmx command is GROMACS’s protein preparation tool. It reads the PDB, assigns a force field, adds missing hydrogens, assigns partial charges, and generates three critical files needed for simulation.
gmx pdb2gmx \
-f 1AKI_clean.pdb \
-o protein.gro \
-water spce \
-ignh
After running this command, GROMACS will prompt you to choose a force field. A numbered list appears — type the number corresponding to CHARMM36 and press Enter. For this tutorial, CHARMM36m or the plain CHARMM36 option both work well.
-f = input PDB file. -o = output coordinate file in GRO format (GROMACS’s native structure format). -water spce = use the SPC/E water model, which is compatible with CHARMM36. -ignh = ignore hydrogens in the input PDB — GROMACS will add its own correctly based on the force field.
This command generates three output files:
- protein.gro— Protein coordinates in GROMACS format, with hydrogens added
- topol.top— Topology file: all force field parameters, bonds, angles, charges
- posre.itp— Position restraints: used to hold the protein still during equilibration
Handling protonation states
By default, pdb2gmx assigns histidine protonation states automatically. For most surface histidines this is fine. For histidines in or near active sites, use the -his flag to assign each one manually:
gmx pdb2gmx -f 1AKI_clean.pdb -o protein.gro -water spce -ignh -his
# GROMACS will prompt you for each HIS: 0=HIE (ε), 1=HID (δ), 2=HIP (both)
For publication-quality work on any enzyme or binding protein, use PropKa or H++ at pH 7.4 first to predict protonation states, then assign manually with -his.
Step 3 — Define the simulation box
The simulation box must be large enough that the protein never interacts with its own periodic image. The standard minimum distance from any protein atom to the box edge is 1.0 nm (10 Å). A truncated octahedral box is more computationally efficient than a cubic box for globular proteins — use it unless you have a reason not to.
gmx editconf \
-f protein.gro \
-o box.gro \
-c \
-d 1.0 \
-bt dodecahedron
The flags: -c centers the protein in the box. -d 1.0 sets the minimum distance between the protein and the box edge to 1.0 nm. -bt dodecahedron specifies a rhombic dodecahedron box shape — more efficient than cubic for globular proteins, containing about 30% fewer water molecules for the same minimum distance.
GROMACS prints the box dimensions to the terminal. Note them — they confirm the box is large enough. For 1AKI you should see dimensions of roughly 6–7 nm.
Step 4 — Solvate the system
The gmx solvate command fills the simulation box with water molecules, placing them everywhere that isn’t occupied by the protein. The water model must match what you specified in pdb2gmx — we used SPC/E, so we use the spc216.gro water configuration file that ships with GROMACS.
gmx solvate \
-cp box.gro \
-cs spc216.gro \
-o solvated.gro \
-p topol.top
The -p topol.top flag tells GROMACS to automatically update the topology file to include the water molecules. After running, check the last few lines printed to the terminal — you’ll see a count of water molecules added, typically several thousand for a small protein like lysozyme.
Processing topology
Output configuration contains 129514 atoms
Example output — ~7,000 water molecules added, system now contains ~130k atoms total.
Step 5 — Add ions
Real biological systems contain salt. More importantly, the system must have zero net charge — a charged simulation box causes artifacts in the long-range electrostatics calculation. The gmx genion command replaces randomly chosen water molecules with Na⁺ and Cl⁻ ions to neutralize the system and reach physiological ionic strength.
Adding ions requires a pre-processed run input file. First generate this with grompp using a minimal ion-addition MDP file:
cat > ions.mdp << 'EOF'
; Minimal settings for ion addition — not a real simulation
integrator = steep
nsteps = 0
EOF
gmx grompp \
-f ions.mdp \
-c solvated.gro \
-p topol.top \
-o ions.tpr \
-maxwarn 1
gmx genion \
-s ions.tpr \
-o system.gro \
-p topol.top \
-pname NA \
-nname CL \
-neutral \
-conc 0.15
When prompted to select a group to replace with ions, choose SOL (the solvent group). GROMACS will replace randomly selected water molecules with ions, printing a summary of how many Na⁺ and Cl⁻ were added. The -neutral flag ensures the system charge is exactly zero; -conc 0.15 adds additional ions to reach 0.15 mol/L — standard physiological salt concentration.
grep "qtot" topol.top | tail -1. The total charge shown should be 0.000. Any non-zero value means something went wrong with the neutralization step.
Step 6 — Energy minimization
Crystal structures frequently contain small steric clashes — atoms slightly too close to each other due to crystallographic averaging or limited resolution. Running dynamics on a system with clashes immediately generates enormous forces that crash the simulation. Energy minimization moves atoms to relieve these conflicts before any dynamics start.
First create the minimization parameters file:
cat > minim.mdp << 'EOF'
; Energy minimization settings
integrator = steep ; Steepest descent algorithm
emtol = 1000.0 ; Stop when max force < 1000 kJ/mol/nm
emstep = 0.01 ; Step size
nsteps = 50000 ; Maximum steps
; Non-bonded settings
nstlist = 1
cutoff-scheme = Verlet
ns_type = grid
coulombtype = PME
rcoulomb = 1.2
rvdw = 1.2
pbc = xyz
EOF
Pre-process into a binary run input file, then run minimization:
# Pre-process
gmx grompp \
-f minim.mdp \
-c system.gro \
-p topol.top \
-o em.tpr
# Run minimization (uses all available CPU cores by default)
gmx mdrun \
-v \
-deffnm em
The -v flag prints progress to the terminal so you can watch the potential energy drop. A successful minimization converges — the maximum force drops below the emtol threshold of 1000 kJ/mol/nm. This typically takes 100–500 steps and a few seconds to minutes depending on system size.
Step= 50, Dmax= 4.5e-03 nm, Epot= -5.66273e+05 Fmax= 3.91628e+03, atom= 7421
Step= 200, Dmax= 2.1e-03 nm, Epot= -5.81044e+05 Fmax= 8.74392e+02, atom= 5102
Steepest Descents converged to Fmax < 1000 in 231 steps
Potential Energy = -5.8204e+05
Maximum force = 9.8721e+02 on atom 5102
Norm of force = 3.8104e+01
Successful minimization — Fmax drops below 1000 kJ/mol/nm and GROMACS reports "converged".
If minimization does not converge after 50,000 steps, or if the potential energy becomes Inf or NaN, there is a structural problem in your input — usually severe steric clashes from a missing or incorrect preparation step. Check that you removed crystal waters, that the box is large enough, and that genion ran correctly.
Verify the minimized structure
Extract and plot the potential energy over the minimization to confirm it decreased smoothly:
# Extract potential energy
echo "Potential" | gmx energy -f em.edr -o potential.xvg
# Check final value is large negative number (more negative = better)
tail -3 potential.xvg
A properly minimized system has a large negative potential energy — for lysozyme in water, expect roughly −5.8 × 10⁵ kJ/mol. The exact value matters less than confirming it's negative and stable.
Verify the system is ready
After energy minimization completes successfully, you have a fully prepared simulation system. Here is a summary of the files produced and what each is used for:
- em.gro— Minimized coordinates — use as input for NVT equilibration
- topol.top— Complete topology — required for all subsequent GROMACS steps
- posre.itp— Position restraints — applied during equilibration to hold protein
- em.edr— Energy trajectory — use gmx energy to extract and check
- Crystal waters removed from PDB before processing
pdb2gmxcompleted — topol.top, protein.gro, posre.itp generated- Simulation box defined with ≥ 1.0 nm minimum distance to edge
- System solvated — thousands of water molecules added
- Ions added — system charge is zero, concentration ~0.15 M
- Energy minimization converged — Fmax below 1000 kJ/mol/nm
- Potential energy is a large negative value and decreasing monotonically
What you've built
A protein system ready for molecular dynamics simulation: a crystal structure cleaned of artifacts, assigned a force field with correct protonation, placed in a periodic water box with physiological salt, and relaxed to remove steric conflicts. The six steps in this tutorial are the same regardless of which protein or force field you use — swap 1AKI for your own PDB ID and the workflow is identical. The next tutorial runs NVT equilibration, NPT equilibration, and production MD on this system.