The Complete Guide to Molecular Dynamics Simulation: Tools, Workflow & Best Practices

The Complete Guide to Molecular Dynamics Simulation: Tools, Workflow & Best Practices

Molecular dynamics simulation is one of the most powerful tools in computational structural biology — but it’s also one of the most technically demanding to learn. This guide covers everything: what MD actually is, how to choose software, the complete workflow from system preparation to analysis, and the best practices that separate publishable results from wasted compute time.

What molecular dynamics simulation is

Molecular dynamics (MD) simulation is a computational method that models the physical movement of atoms and molecules over time. Starting from a 3D structure — typically a crystal structure or an AlphaFold model — it uses the laws of classical mechanics to calculate how every atom in the system moves, one tiny timestep at a time.

The output is a trajectory: a time-ordered series of structural snapshots showing how the system evolves. From this trajectory you can extract an enormous range of information — how flexible a protein is, whether a ligand stays bound, how a mutation changes protein dynamics, what conformational states a protein visits, and much more.

The key word in that description is time. This is what fundamentally distinguishes MD from molecular docking. Docking gives you a static prediction of how a ligand binds at a single moment. MD gives you a dynamic picture of how the entire system moves, breathes, and evolves. One is a photograph; the other is a film.

How MD relates to molecular docking
If you’ve come from the docking tutorials on this site, here’s the relationship in one sentence: docking predicts where a ligand binds; MD validates whether it stays bound and reveals how the protein-ligand complex actually behaves. Most serious drug discovery workflows use both in sequence — docking to screen, MD to validate.

How it works: force fields, timesteps and thermostats

At its core, an MD simulation does one thing repeatedly: calculate the force on every atom, use that force to update every atom’s position and velocity, advance time by one tiny step, and repeat. For a system of 100,000 atoms simulated for 100 nanoseconds at a 2-femtosecond timestep, this loop runs 50 million times. Modern MD software and GPU hardware make this tractable — but it’s worth understanding what’s happening inside each iteration.

Force fields

A force field is the mathematical model that describes how atoms interact. It defines the potential energy of the system as a function of atomic positions, covering bond stretching, angle bending, torsional rotation, and non-bonded interactions (electrostatics and van der Waals). The force on each atom is the negative gradient of this potential energy — in plain terms, atoms are pushed toward lower-energy configurations.

The major protein force fields — AMBER, CHARMM, GROMOS, and OPLS — differ in how they parameterize these terms, which training data they used, and which types of molecules they handle best. Choosing the right force field for your system is one of the most consequential decisions in setting up an MD simulation, and one of the most frequently glossed over in tutorials.

Timestep

The timestep is how much simulated time advances with each iteration of the loop. A typical value is 2 femtoseconds (2 × 10⁻¹⁵ seconds). This has to be small enough that the simulation remains numerically stable — if the timestep is too large, atoms can overshoot their equilibrium positions, forces become unrealistically large, and the simulation either crashes or produces nonsensical results. Hydrogen bonds are the limiting factor: because hydrogen atoms are light and move fast, they set the upper bound on timestep size for standard simulations.

Thermostats and barostats

Real biological systems exist at constant temperature (~310 K for physiological conditions) and constant pressure (~1 bar). Straight Newtonian dynamics doesn’t preserve either of these — the temperature and pressure fluctuate as atoms exchange kinetic energy. Thermostats (like the V-rescale or Nosé-Hoover algorithm) and barostats (like the Parrinello-Rahman algorithm) are mathematical coupling schemes that keep the system in the right thermodynamic ensemble. Choosing the wrong coupling scheme — or using one that’s inappropriate for equilibration vs. production — is a common source of artifacts in MD trajectories.

NVT vs NPT — the ensembles you’ll use
NVT (constant Number of particles, Volume, and Temperature) — used for initial equilibration of the temperature. Volume is fixed, no pressure coupling. NPT (constant Number of particles, Pressure, and Temperature) — used for equilibration of density and for production runs. Both temperature and pressure are controlled. Nearly all production MD simulations run in the NPT ensemble.

What MD can tell you that docking cannot

Molecular docking treats the protein as rigid and gives you one snapshot. MD gives you the full dynamic picture. Here are the four most important things MD uniquely reveals:

Binding stability
Does the docked ligand actually stay in the binding site over time, or does it drift out? MD answers this where docking cannot. A ligand with a great docking score that escapes in the first 10 ns of MD is not a real binder.
Protein flexibility
Which regions of the protein are flexible? Which loops gate the binding site? MD reveals the conformational heterogeneity that crystallography captures only as a single averaged state.
Conformational transitions
Does the protein adopt different conformations — open vs. closed, active vs. inactive? Short MD simulations often capture local transitions; longer ones can reveal global conformational changes relevant to function and drug design.
Binding free energy
Methods like MM-GBSA and free energy perturbation (FEP) use MD trajectories to calculate binding affinities far more accurately than docking scores. These are the gold standard for ranking closely related compounds in a drug discovery campaign.

Timescales: what you can and cannot simulate

One of the most important things to understand about MD before you run your first simulation is what timescales are computationally accessible — and what biologically relevant processes fall outside them.

Simulated timescale vs. accessible phenomena
1–10 ps
Bond vibrations, water reorientation
1–100 ns
Side-chain rotations, loop flexibility, ligand binding stability
100 ns–1 μs
Domain motions, helix folding, small protein folding
1–100 μs
Protein folding, large conformational changes, ligand binding events
> 1 ms
Enzyme catalysis, large-scale assembly — generally inaccessible to standard MD

A typical academic MD simulation runs for 100–500 nanoseconds on a GPU cluster, taking hours to days of wall-clock time. This timescale covers local flexibility, ligand binding stability, and small conformational changes — which is sufficient for the vast majority of structural biology and drug discovery applications at the PhD level.

The timescale trap
The most common conceptual mistake in MD is over-interpreting a short simulation. A 100 ns simulation that shows a ligand remaining bound does not prove the ligand is a strong binder — 100 ns may simply not be long enough for unbinding to occur. Always frame MD results in terms of the timescale simulated, and be explicit about what you cannot conclude from that timescale.

Which software should you use?

Several programs dominate academic and industrial MD simulation. The choice depends on your system, your hardware, your budget, and what force fields you need. Here is an honest comparison of the main options:

SoftwareCostBest hardwareStrengthsBest for
GROMACS Free CPU + GPU Fastest free option, excellent documentation, huge user community, multiple force fields Most academic protein/ligand MD — the default choice
NAMD Free CPU + GPU CHARMM force field native support, good for membrane systems, integrates with VMD Membrane proteins, CHARMM workflows
AMBER Free (AmberTools) / Paid (AMBER) GPU strongly recommended Best GPU performance, native AMBER force fields, excellent for nucleic acids RNA/DNA simulations, high-performance GPU clusters
OpenMM Free GPU (CUDA/OpenCL) Python API, highly customizable, good for non-standard systems Custom force fields, ML-enhanced MD, Python workflows
LAMMPS Free CPU + GPU Extremely flexible, handles materials science and non-biomolecular systems Coarse-grained MD, materials science, non-protein systems
Desmond (Schrödinger) Commercial GPU Excellent GUI, integrated with Schrödinger suite, FEP+ support Industry drug discovery, Schrödinger workflow users

For most grad students doing protein or protein-ligand MD simulation at a university: start with GROMACS. It is free, fast, comprehensively documented, and has the largest community of any MD software — which means nearly every problem you encounter has been answered somewhere on the GROMACS mailing list or Stack Exchange.

The complete MD workflow

A rigorous MD simulation follows a defined sequence of steps. Skipping or rushing any of them is the most common cause of unreliable results. Here is the complete workflow, with what each step accomplishes and why it cannot be omitted.

1
Obtain and inspect the starting structure

Download your protein structure from the RCSB PDB. Before doing anything else, read the REMARK section: check the resolution, note any missing residues or chain breaks, identify co-crystallized ligands and cofactors. Missing residues in loop regions that border the binding site are particularly important — they need to be modeled before simulation or the binding site geometry will be wrong.

For structures above 3.0 Å resolution, consider whether the positional uncertainty is acceptable for your research question. For AlphaFold models, assess the pLDDT confidence scores — regions with pLDDT below 70 are structurally uncertain and should be interpreted cautiously.

2
Prepare the protein topology

Choose your force field and generate the protein topology — the file that defines every bond, angle, dihedral, and non-bonded parameter for every atom. In GROMACS this is done with the pdb2gmx command, which also adds missing hydrogens and lets you specify protonation states for titratable residues.

Protonation state assignment is the step most beginners skip and most experienced researchers treat carefully. Histidine residues in particular need to be assigned manually based on their local environment — use a tool like H++ or PropKa at pH 7.4, then review active site residues by hand.

3
Parameterize the ligand (if present)

If you’re simulating a protein-ligand complex, the ligand needs its own force field parameters. This is not automatic — standard protein force fields don’t include parameters for arbitrary small molecules. The most common approach for AMBER-compatible parameters is ACPYPE (using GAFF force field); for CHARMM-compatible parameters use CGenFF. This step is technically demanding and frequently where beginners get stuck. A separate tutorial on this site covers it in detail.

4
Build the simulation box and solvate

Place the protein in a periodic simulation box and fill it with explicit water molecules. The box must be large enough that the protein never interacts with its own periodic image — a minimum of 10 Å from any protein atom to the box edge is the standard. Truncated octahedral boxes are more computationally efficient than cubic boxes for globular proteins and are preferred for production simulations.

The most common water model for biomolecular simulation is TIP3P (fast, compatible with most force fields) or TIP4P-Ew (more accurate but slower). Your choice of water model should match the force field you’re using — mismatched combinations are a source of subtle errors.

5
Add ions and neutralize the system

Real biological systems contain ions. Add Na⁺ and Cl⁻ ions to achieve physiological ionic strength (~0.15 M) and to neutralize the net charge of the system. A charged simulation box causes artifacts in the electrostatic calculations — always neutralize before proceeding. GROMACS handles this with the genion tool, which replaces randomly selected water molecules with ions.

6
Energy minimization

Before any dynamics can run, the system needs to be at a local energy minimum. Crystal structures often have small clashes between atoms — steric conflicts that would generate enormous forces and immediately crash an MD simulation. Energy minimization moves atoms to relieve these clashes using a steepest-descent or conjugate gradient algorithm. Run until the maximum force drops below 1000 kJ/mol/nm. If minimization fails to converge, the input structure has serious structural problems that need diagnosis before proceeding.

7
NVT equilibration

Heat the system gradually to your target temperature (typically 300–310 K) under constant volume conditions. This phase typically runs for 100–500 ps with position restraints on the protein heavy atoms — this allows the water and ions to equilibrate around the protein without the protein itself moving dramatically. Monitor temperature throughout to confirm it stabilizes at the target value.

8
NPT equilibration

Switch to constant pressure conditions and allow the system density to equilibrate. This phase also typically runs for 100–500 ps with position restraints, gradually released. Monitor both temperature and pressure, and watch the system density — it should stabilize at approximately 1.0 g/cm³ for aqueous systems. Only proceed to production once both temperature and pressure have plateaued.

9
Production MD run

Remove position restraints and run the actual simulation. The length depends on your research question: 100 ns is a reasonable starting point for stability studies; 500 ns–1 μs for conformational sampling. Save coordinates at regular intervals — typically every 10–100 ps — to build the trajectory you’ll analyze. Write checkpoints frequently in case the simulation needs to be restarted.

10
Trajectory analysis

Center and wrap the trajectory to correct for periodic boundary conditions, then calculate your metrics of interest. The analysis section below covers the most important quantities and what they tell you.

Essential analysis metrics

A trajectory file is raw data — gigabytes of atomic coordinates over time. Making sense of it requires calculating summary metrics that reduce this complexity to interpretable quantities. Here are the metrics every MD practitioner needs to know:

  • RMSD
    Root Mean Square Deviation of atomic positions from a reference structure (usually the starting structure). Monitors overall structural drift over time. A rising RMSD that plateaus indicates the protein has reached a stable conformation. An RMSD that never stabilizes indicates either insufficient equilibration or a genuine conformational transition.
  • RMSF
    Root Mean Square Fluctuation — the average displacement of each residue from its mean position over the trajectory. Measures per-residue flexibility. High RMSF regions are flexible loops; low RMSF regions are rigid secondary structure elements. Directly comparable to B-factors from crystallography.
  • Rg
    Radius of gyration — the mass-weighted RMS distance of all atoms from the center of mass. A measure of overall protein compactness. Decreasing Rg indicates compaction; increasing Rg indicates unfolding or domain separation. Stable proteins show a near-constant Rg throughout production.
  • Hydrogen bonds
    Number and occupancy of hydrogen bonds — both intramolecular (within the protein or ligand) and intermolecular (between protein and ligand). Ligand hydrogen bond occupancy is one of the most useful metrics for characterizing binding stability. An H-bond with >80% occupancy over the trajectory is a stable, meaningful interaction.
  • Ligand RMSD
    RMSD of the ligand relative to its starting (docked) position over the trajectory. The most direct measure of binding stability. A ligand RMSD that stays below 2.0 Å is stable in its binding pose. One that drifts above 4–5 Å has effectively left the binding site.
  • MM-GBSA / MM-PBSA
    Post-processing methods that use frames from the MD trajectory to estimate binding free energy. Significantly more accurate than docking scores. Calculated over an ensemble of structures rather than a single pose, capturing conformational flexibility. The standard upgrade from docking scores for hit validation.

Best practices before you publish

Confirm the system equilibrated properly

Before analyzing production trajectory data, verify that equilibration was successful. The RMSD of the protein backbone should plateau during the first portion of the production run. Temperature and pressure should fluctuate narrowly around their target values. System density should be stable. If any of these show drift or instability throughout the production run, the simulation was not equilibrated and the results cannot be trusted.

Run at least two independent replicas

A single MD trajectory is a single sample from a stochastic process. Key conclusions — especially quantitative ones like binding free energies or conformational populations — should be reproduced across at least two independent simulations starting from different initial velocities. If your replicas disagree substantially, either run longer simulations or report the disagreement explicitly.

Check for periodic boundary condition artifacts

The protein must never come within interaction distance of its own periodic image. Verify this by checking that the minimum image distance stays above your electrostatic cutoff (typically 10–12 Å) throughout the simulation. If the protein is large relative to the box, this can fail during the simulation as the protein moves and rotates.

Report your simulation parameters completely

A reproducible methods section includes: force field name and version, water model, box type and minimum distance to edge, ion concentration, thermostat and barostat algorithms, timestep, simulation length, frequency of coordinate and energy saving, software version, and hardware used. Reviewers at journals like JCTC, JCIM, and JACS increasingly check methods sections carefully against field standards.

How long is long enough?
There is no universal answer, but a practical rule: run until your metrics of interest have converged. If RMSD is still rising at the end of your simulation, it hasn’t converged — the system is still evolving and your snapshot-based analysis is unreliable. RMSF values should be similar between the first and second halves of your production run. If they’re not, run longer.

Next steps

This guide gave you the conceptual framework for molecular dynamics simulation — what it is, how it works, what it can tell you, and what the full workflow looks like. The tutorials linked below walk through each part of that workflow in detail, with exact commands for GROMACS and step-by-step guidance for the steps that trip up most beginners.

If you’re coming from the docking tutorials on this site, the natural starting point is the GROMACS installation guide, followed by the system preparation tutorial. If you’ve done MD before but want to level up your analysis or learn to run protein-ligand simulations, jump directly to the analysis or protein-ligand tutorials.

The one-paragraph summary

Molecular dynamics simulation uses Newton’s laws and a force field to model how every atom in a biological system moves over time. It tells you things docking fundamentally cannot: whether a ligand stays bound, how flexible the protein is, what conformational states it visits, and — with methods like MM-GBSA — accurate binding free energies. The standard academic workflow runs in GROMACS, takes a protein from a PDB file through energy minimization, NVT and NPT equilibration, and production MD, then analyzes the trajectory with RMSD, RMSF, hydrogen bond occupancy, and free energy calculations. Getting the preparation right — force field choice, protonation states, box size, equilibration — is what separates publishable results from wasted compute time.

Last updated on

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *