How to Prepare a Protein for Molecular Dynamics Simulation: Complete GROMACS Guide

How to Prepare a Protein for Molecular Dynamics Simulation: Complete GROMACS Guide

System preparation is where most MD simulations succeed or fail — and where most beginners spend the most time stuck. This tutorial walks through every step from a raw PDB file to a simulation-ready system, with exact GROMACS commands and explanations of what each step actually does.

Step 1
Download PDB
Step 2
Generate topology
Step 3
Define box
Step 4
Solvate
Step 5
Add ions
Step 6
Minimize

What you’ll need before starting

This tutorial uses GROMACS 2024 and the CHARMM36 force field. Before starting, confirm:

  • GROMACS is installed and gmx --version returns a version number
  • You have about 500 MB of free disk space
  • You’re comfortable running commands in a terminal
What we’re preparing in this tutorial
We’ll prepare 1AKI — lysozyme, a small, well-characterized enzyme that is the standard teaching example in the GROMACS community. It’s ideal for learning: small enough to run quickly, well-studied enough to validate against known results, and straightforward to prepare without special handling. You can substitute any protein by replacing 1AKI with your PDB ID throughout.

Set up your working directory

Keep all files in one organized folder from the start. A messy working directory is one of the most consistent sources of “file not found” errors in GROMACS.

Terminal
mkdir -p ~/md/lysozyme && cd ~/md/lysozyme

Every command in this tutorial runs from inside this directory.

Step 1 — Download and inspect the PDB structure

1
Step 1
Download and clean the PDB file

Download the structure directly from the RCSB:

Terminal
wget https://files.rcsb.org/download/1AKI.pdb

Before doing anything else, inspect the file:

Terminal
# Read the REMARK section for key metadata
head -80 1AKI.pdb

# Check what non-protein molecules are present
grep "^HETATM" 1AKI.pdb | awk '{print $4}' | sort -u

For 1AKI you’ll see water molecules (HOH) listed as HETATM records. GROMACS’s pdb2gmx handles water separately, so we remove them from the PDB before processing — the solvation step in Step 4 will add explicit water back properly.

Terminal
# Remove crystal water molecules
grep -v "HOH" 1AKI.pdb > 1AKI_clean.pdb
Check for missing residues before proceeding
In the REMARK 465 section of any PDB file you’ll find a list of residues not resolved in the crystal structure. Missing residues in or near the active site can cause structural artifacts — if critical residues are missing, you need to model them with a tool like MODELLER or Swiss-Model before proceeding. For 1AKI, the structure is complete and no modeling is needed.

Step 2 — Generate the topology with pdb2gmx

2
Step 2
Generate topology, position restraints, and post-processed PDB

The pdb2gmx command is GROMACS’s protein preparation tool. It reads the PDB, assigns a force field, adds missing hydrogens, assigns partial charges, and generates three critical files needed for simulation.

Terminal
gmx pdb2gmx \
  -f 1AKI_clean.pdb \
  -o protein.gro \
  -water spce \
  -ignh

After running this command, GROMACS will prompt you to choose a force field. A numbered list appears — type the number corresponding to CHARMM36 and press Enter. For this tutorial, CHARMM36m or the plain CHARMM36 option both work well.

What the flags mean
-f = input PDB file. -o = output coordinate file in GRO format (GROMACS’s native structure format). -water spce = use the SPC/E water model, which is compatible with CHARMM36. -ignh = ignore hydrogens in the input PDB — GROMACS will add its own correctly based on the force field.

This command generates three output files:

Files generated by pdb2gmx
  • protein.gro— Protein coordinates in GROMACS format, with hydrogens added
  • topol.top— Topology file: all force field parameters, bonds, angles, charges
  • posre.itp— Position restraints: used to hold the protein still during equilibration

Handling protonation states

By default, pdb2gmx assigns histidine protonation states automatically. For most surface histidines this is fine. For histidines in or near active sites, use the -his flag to assign each one manually:

Terminal
gmx pdb2gmx -f 1AKI_clean.pdb -o protein.gro -water spce -ignh -his
# GROMACS will prompt you for each HIS: 0=HIE (ε), 1=HID (δ), 2=HIP (both)

For publication-quality work on any enzyme or binding protein, use PropKa or H++ at pH 7.4 first to predict protonation states, then assign manually with -his.

Step 3 — Define the simulation box

3
Step 3
Place the protein in a periodic simulation box

The simulation box must be large enough that the protein never interacts with its own periodic image. The standard minimum distance from any protein atom to the box edge is 1.0 nm (10 Å). A truncated octahedral box is more computationally efficient than a cubic box for globular proteins — use it unless you have a reason not to.

Terminal
gmx editconf \
  -f protein.gro \
  -o box.gro \
  -c \
  -d 1.0 \
  -bt dodecahedron

The flags: -c centers the protein in the box. -d 1.0 sets the minimum distance between the protein and the box edge to 1.0 nm. -bt dodecahedron specifies a rhombic dodecahedron box shape — more efficient than cubic for globular proteins, containing about 30% fewer water molecules for the same minimum distance.

GROMACS prints the box dimensions to the terminal. Note them — they confirm the box is large enough. For 1AKI you should see dimensions of roughly 6–7 nm.

Don’t use a box that’s too small
The 1.0 nm minimum distance corresponds to the typical non-bonded cutoff in most GROMACS force field setups. If the box is smaller, atoms will interact with their own periodic images — an artifact that invalidates the simulation. For larger proteins or for runs involving significant conformational change, increase to 1.2 nm to be safe.

Step 4 — Solvate the system

4
Step 4
Fill the box with explicit water molecules

The gmx solvate command fills the simulation box with water molecules, placing them everywhere that isn’t occupied by the protein. The water model must match what you specified in pdb2gmx — we used SPC/E, so we use the spc216.gro water configuration file that ships with GROMACS.

Terminal
gmx solvate \
  -cp box.gro \
  -cs spc216.gro \
  -o solvated.gro \
  -p topol.top

The -p topol.top flag tells GROMACS to automatically update the topology file to include the water molecules. After running, check the last few lines printed to the terminal — you’ll see a count of water molecules added, typically several thousand for a small protein like lysozyme.

Terminal — solvate output
Number of solvent molecules: 7316
Processing topology
Output configuration contains 129514 atoms

Example output — ~7,000 water molecules added, system now contains ~130k atoms total.

Step 5 — Add ions

5
Step 5
Neutralize the system and add physiological salt

Real biological systems contain salt. More importantly, the system must have zero net charge — a charged simulation box causes artifacts in the long-range electrostatics calculation. The gmx genion command replaces randomly chosen water molecules with Na⁺ and Cl⁻ ions to neutralize the system and reach physiological ionic strength.

Adding ions requires a pre-processed run input file. First generate this with grompp using a minimal ion-addition MDP file:

Terminal — create ions.mdp
cat > ions.mdp << 'EOF'
; Minimal settings for ion addition — not a real simulation
integrator  = steep
nsteps      = 0
EOF
Terminal — generate tpr for genion
gmx grompp \
  -f ions.mdp \
  -c solvated.gro \
  -p topol.top \
  -o ions.tpr \
  -maxwarn 1
Terminal — add ions
gmx genion \
  -s ions.tpr \
  -o system.gro \
  -p topol.top \
  -pname NA \
  -nname CL \
  -neutral \
  -conc 0.15

When prompted to select a group to replace with ions, choose SOL (the solvent group). GROMACS will replace randomly selected water molecules with ions, printing a summary of how many Na⁺ and Cl⁻ were added. The -neutral flag ensures the system charge is exactly zero; -conc 0.15 adds additional ions to reach 0.15 mol/L — standard physiological salt concentration.

Check your system charge after genion
After genion, verify the net charge is zero by checking the topology: grep "qtot" topol.top | tail -1. The total charge shown should be 0.000. Any non-zero value means something went wrong with the neutralization step.

Step 6 — Energy minimization

6
Step 6
Energy minimize to relieve steric clashes

Crystal structures frequently contain small steric clashes — atoms slightly too close to each other due to crystallographic averaging or limited resolution. Running dynamics on a system with clashes immediately generates enormous forces that crash the simulation. Energy minimization moves atoms to relieve these conflicts before any dynamics start.

First create the minimization parameters file:

Terminal — create minim.mdp
cat > minim.mdp << 'EOF'
; Energy minimization settings
integrator      = steep     ; Steepest descent algorithm
emtol           = 1000.0    ; Stop when max force < 1000 kJ/mol/nm
emstep          = 0.01      ; Step size
nsteps          = 50000     ; Maximum steps

; Non-bonded settings
nstlist         = 1
cutoff-scheme   = Verlet
ns_type         = grid
coulombtype     = PME
rcoulomb        = 1.2
rvdw            = 1.2
pbc             = xyz
EOF

Pre-process into a binary run input file, then run minimization:

Terminal
# Pre-process
gmx grompp \
  -f minim.mdp \
  -c system.gro \
  -p topol.top \
  -o em.tpr

# Run minimization (uses all available CPU cores by default)
gmx mdrun \
  -v \
  -deffnm em

The -v flag prints progress to the terminal so you can watch the potential energy drop. A successful minimization converges — the maximum force drops below the emtol threshold of 1000 kJ/mol/nm. This typically takes 100–500 steps and a few seconds to minutes depending on system size.

Terminal — energy minimization output
Step= 1, Dmax= 1.0e-02 nm, Epot= -5.06736e+05 Fmax= 2.47318e+04, atom= 9938
Step= 50, Dmax= 4.5e-03 nm, Epot= -5.66273e+05 Fmax= 3.91628e+03, atom= 7421
Step= 200, Dmax= 2.1e-03 nm, Epot= -5.81044e+05 Fmax= 8.74392e+02, atom= 5102

Steepest Descents converged to Fmax < 1000 in 231 steps
Potential Energy = -5.8204e+05
Maximum force = 9.8721e+02 on atom 5102
Norm of force = 3.8104e+01

Successful minimization — Fmax drops below 1000 kJ/mol/nm and GROMACS reports "converged".

If minimization does not converge after 50,000 steps, or if the potential energy becomes Inf or NaN, there is a structural problem in your input — usually severe steric clashes from a missing or incorrect preparation step. Check that you removed crystal waters, that the box is large enough, and that genion ran correctly.

Verify the minimized structure

Extract and plot the potential energy over the minimization to confirm it decreased smoothly:

Terminal
# Extract potential energy
echo "Potential" | gmx energy -f em.edr -o potential.xvg

# Check final value is large negative number (more negative = better)
tail -3 potential.xvg

A properly minimized system has a large negative potential energy — for lysozyme in water, expect roughly −5.8 × 10⁵ kJ/mol. The exact value matters less than confirming it's negative and stable.

Verify the system is ready

After energy minimization completes successfully, you have a fully prepared simulation system. Here is a summary of the files produced and what each is used for:

Final output files — all needed for equilibration
  • em.gro— Minimized coordinates — use as input for NVT equilibration
  • topol.top— Complete topology — required for all subsequent GROMACS steps
  • posre.itp— Position restraints — applied during equilibration to hold protein
  • em.edr— Energy trajectory — use gmx energy to extract and check
  • Crystal waters removed from PDB before processing
  • pdb2gmx completed — topol.top, protein.gro, posre.itp generated
  • Simulation box defined with ≥ 1.0 nm minimum distance to edge
  • System solvated — thousands of water molecules added
  • Ions added — system charge is zero, concentration ~0.15 M
  • Energy minimization converged — Fmax below 1000 kJ/mol/nm
  • Potential energy is a large negative value and decreasing monotonically

What you've built

A protein system ready for molecular dynamics simulation: a crystal structure cleaned of artifacts, assigned a force field with correct protonation, placed in a periodic water box with physiological salt, and relaxed to remove steric conflicts. The six steps in this tutorial are the same regardless of which protein or force field you use — swap 1AKI for your own PDB ID and the workflow is identical. The next tutorial runs NVT equilibration, NPT equilibration, and production MD on this system.

Last updated on

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *