The Complete Guide to Molecular Dynamics Simulation: Tools, Workflow & Best Practices
Molecular dynamics simulation is one of the most powerful tools in computational structural biology — but it’s also one of the most technically demanding to learn. This guide covers everything: what MD actually is, how to choose software, the complete workflow from system preparation to analysis, and the best practices that separate publishable results from wasted compute time.
What molecular dynamics simulation is
Molecular dynamics (MD) simulation is a computational method that models the physical movement of atoms and molecules over time. Starting from a 3D structure — typically a crystal structure or an AlphaFold model — it uses the laws of classical mechanics to calculate how every atom in the system moves, one tiny timestep at a time.
The output is a trajectory: a time-ordered series of structural snapshots showing how the system evolves. From this trajectory you can extract an enormous range of information — how flexible a protein is, whether a ligand stays bound, how a mutation changes protein dynamics, what conformational states a protein visits, and much more.
The key word in that description is time. This is what fundamentally distinguishes MD from molecular docking. Docking gives you a static prediction of how a ligand binds at a single moment. MD gives you a dynamic picture of how the entire system moves, breathes, and evolves. One is a photograph; the other is a film.
How it works: force fields, timesteps and thermostats
At its core, an MD simulation does one thing repeatedly: calculate the force on every atom, use that force to update every atom’s position and velocity, advance time by one tiny step, and repeat. For a system of 100,000 atoms simulated for 100 nanoseconds at a 2-femtosecond timestep, this loop runs 50 million times. Modern MD software and GPU hardware make this tractable — but it’s worth understanding what’s happening inside each iteration.
Force fields
A force field is the mathematical model that describes how atoms interact. It defines the potential energy of the system as a function of atomic positions, covering bond stretching, angle bending, torsional rotation, and non-bonded interactions (electrostatics and van der Waals). The force on each atom is the negative gradient of this potential energy — in plain terms, atoms are pushed toward lower-energy configurations.
The major protein force fields — AMBER, CHARMM, GROMOS, and OPLS — differ in how they parameterize these terms, which training data they used, and which types of molecules they handle best. Choosing the right force field for your system is one of the most consequential decisions in setting up an MD simulation, and one of the most frequently glossed over in tutorials.
Timestep
The timestep is how much simulated time advances with each iteration of the loop. A typical value is 2 femtoseconds (2 × 10⁻¹⁵ seconds). This has to be small enough that the simulation remains numerically stable — if the timestep is too large, atoms can overshoot their equilibrium positions, forces become unrealistically large, and the simulation either crashes or produces nonsensical results. Hydrogen bonds are the limiting factor: because hydrogen atoms are light and move fast, they set the upper bound on timestep size for standard simulations.
Thermostats and barostats
Real biological systems exist at constant temperature (~310 K for physiological conditions) and constant pressure (~1 bar). Straight Newtonian dynamics doesn’t preserve either of these — the temperature and pressure fluctuate as atoms exchange kinetic energy. Thermostats (like the V-rescale or Nosé-Hoover algorithm) and barostats (like the Parrinello-Rahman algorithm) are mathematical coupling schemes that keep the system in the right thermodynamic ensemble. Choosing the wrong coupling scheme — or using one that’s inappropriate for equilibration vs. production — is a common source of artifacts in MD trajectories.
What MD can tell you that docking cannot
Molecular docking treats the protein as rigid and gives you one snapshot. MD gives you the full dynamic picture. Here are the four most important things MD uniquely reveals:
Timescales: what you can and cannot simulate
One of the most important things to understand about MD before you run your first simulation is what timescales are computationally accessible — and what biologically relevant processes fall outside them.
A typical academic MD simulation runs for 100–500 nanoseconds on a GPU cluster, taking hours to days of wall-clock time. This timescale covers local flexibility, ligand binding stability, and small conformational changes — which is sufficient for the vast majority of structural biology and drug discovery applications at the PhD level.
Which software should you use?
Several programs dominate academic and industrial MD simulation. The choice depends on your system, your hardware, your budget, and what force fields you need. Here is an honest comparison of the main options:
| Software | Cost | Best hardware | Strengths | Best for |
|---|---|---|---|---|
| GROMACS | Free | CPU + GPU | Fastest free option, excellent documentation, huge user community, multiple force fields | Most academic protein/ligand MD — the default choice |
| NAMD | Free | CPU + GPU | CHARMM force field native support, good for membrane systems, integrates with VMD | Membrane proteins, CHARMM workflows |
| AMBER | Free (AmberTools) / Paid (AMBER) | GPU strongly recommended | Best GPU performance, native AMBER force fields, excellent for nucleic acids | RNA/DNA simulations, high-performance GPU clusters |
| OpenMM | Free | GPU (CUDA/OpenCL) | Python API, highly customizable, good for non-standard systems | Custom force fields, ML-enhanced MD, Python workflows |
| LAMMPS | Free | CPU + GPU | Extremely flexible, handles materials science and non-biomolecular systems | Coarse-grained MD, materials science, non-protein systems |
| Desmond (Schrödinger) | Commercial | GPU | Excellent GUI, integrated with Schrödinger suite, FEP+ support | Industry drug discovery, Schrödinger workflow users |
For most grad students doing protein or protein-ligand MD simulation at a university: start with GROMACS. It is free, fast, comprehensively documented, and has the largest community of any MD software — which means nearly every problem you encounter has been answered somewhere on the GROMACS mailing list or Stack Exchange.
The complete MD workflow
A rigorous MD simulation follows a defined sequence of steps. Skipping or rushing any of them is the most common cause of unreliable results. Here is the complete workflow, with what each step accomplishes and why it cannot be omitted.
Download your protein structure from the RCSB PDB. Before doing anything else, read the REMARK section: check the resolution, note any missing residues or chain breaks, identify co-crystallized ligands and cofactors. Missing residues in loop regions that border the binding site are particularly important — they need to be modeled before simulation or the binding site geometry will be wrong.
For structures above 3.0 Å resolution, consider whether the positional uncertainty is acceptable for your research question. For AlphaFold models, assess the pLDDT confidence scores — regions with pLDDT below 70 are structurally uncertain and should be interpreted cautiously.
Choose your force field and generate the protein topology — the file that defines every bond, angle, dihedral, and non-bonded parameter for every atom. In GROMACS this is done with the pdb2gmx command, which also adds missing hydrogens and lets you specify protonation states for titratable residues.
Protonation state assignment is the step most beginners skip and most experienced researchers treat carefully. Histidine residues in particular need to be assigned manually based on their local environment — use a tool like H++ or PropKa at pH 7.4, then review active site residues by hand.
If you’re simulating a protein-ligand complex, the ligand needs its own force field parameters. This is not automatic — standard protein force fields don’t include parameters for arbitrary small molecules. The most common approach for AMBER-compatible parameters is ACPYPE (using GAFF force field); for CHARMM-compatible parameters use CGenFF. This step is technically demanding and frequently where beginners get stuck. A separate tutorial on this site covers it in detail.
Place the protein in a periodic simulation box and fill it with explicit water molecules. The box must be large enough that the protein never interacts with its own periodic image — a minimum of 10 Å from any protein atom to the box edge is the standard. Truncated octahedral boxes are more computationally efficient than cubic boxes for globular proteins and are preferred for production simulations.
The most common water model for biomolecular simulation is TIP3P (fast, compatible with most force fields) or TIP4P-Ew (more accurate but slower). Your choice of water model should match the force field you’re using — mismatched combinations are a source of subtle errors.
Real biological systems contain ions. Add Na⁺ and Cl⁻ ions to achieve physiological ionic strength (~0.15 M) and to neutralize the net charge of the system. A charged simulation box causes artifacts in the electrostatic calculations — always neutralize before proceeding. GROMACS handles this with the genion tool, which replaces randomly selected water molecules with ions.
Before any dynamics can run, the system needs to be at a local energy minimum. Crystal structures often have small clashes between atoms — steric conflicts that would generate enormous forces and immediately crash an MD simulation. Energy minimization moves atoms to relieve these clashes using a steepest-descent or conjugate gradient algorithm. Run until the maximum force drops below 1000 kJ/mol/nm. If minimization fails to converge, the input structure has serious structural problems that need diagnosis before proceeding.
Heat the system gradually to your target temperature (typically 300–310 K) under constant volume conditions. This phase typically runs for 100–500 ps with position restraints on the protein heavy atoms — this allows the water and ions to equilibrate around the protein without the protein itself moving dramatically. Monitor temperature throughout to confirm it stabilizes at the target value.
Switch to constant pressure conditions and allow the system density to equilibrate. This phase also typically runs for 100–500 ps with position restraints, gradually released. Monitor both temperature and pressure, and watch the system density — it should stabilize at approximately 1.0 g/cm³ for aqueous systems. Only proceed to production once both temperature and pressure have plateaued.
Remove position restraints and run the actual simulation. The length depends on your research question: 100 ns is a reasonable starting point for stability studies; 500 ns–1 μs for conformational sampling. Save coordinates at regular intervals — typically every 10–100 ps — to build the trajectory you’ll analyze. Write checkpoints frequently in case the simulation needs to be restarted.
Center and wrap the trajectory to correct for periodic boundary conditions, then calculate your metrics of interest. The analysis section below covers the most important quantities and what they tell you.
Essential analysis metrics
A trajectory file is raw data — gigabytes of atomic coordinates over time. Making sense of it requires calculating summary metrics that reduce this complexity to interpretable quantities. Here are the metrics every MD practitioner needs to know:
-
RMSDRoot Mean Square Deviation of atomic positions from a reference structure (usually the starting structure). Monitors overall structural drift over time. A rising RMSD that plateaus indicates the protein has reached a stable conformation. An RMSD that never stabilizes indicates either insufficient equilibration or a genuine conformational transition.
-
RMSFRoot Mean Square Fluctuation — the average displacement of each residue from its mean position over the trajectory. Measures per-residue flexibility. High RMSF regions are flexible loops; low RMSF regions are rigid secondary structure elements. Directly comparable to B-factors from crystallography.
-
RgRadius of gyration — the mass-weighted RMS distance of all atoms from the center of mass. A measure of overall protein compactness. Decreasing Rg indicates compaction; increasing Rg indicates unfolding or domain separation. Stable proteins show a near-constant Rg throughout production.
-
Hydrogen bondsNumber and occupancy of hydrogen bonds — both intramolecular (within the protein or ligand) and intermolecular (between protein and ligand). Ligand hydrogen bond occupancy is one of the most useful metrics for characterizing binding stability. An H-bond with >80% occupancy over the trajectory is a stable, meaningful interaction.
-
Ligand RMSDRMSD of the ligand relative to its starting (docked) position over the trajectory. The most direct measure of binding stability. A ligand RMSD that stays below 2.0 Å is stable in its binding pose. One that drifts above 4–5 Å has effectively left the binding site.
-
MM-GBSA / MM-PBSAPost-processing methods that use frames from the MD trajectory to estimate binding free energy. Significantly more accurate than docking scores. Calculated over an ensemble of structures rather than a single pose, capturing conformational flexibility. The standard upgrade from docking scores for hit validation.
Best practices before you publish
Confirm the system equilibrated properly
Before analyzing production trajectory data, verify that equilibration was successful. The RMSD of the protein backbone should plateau during the first portion of the production run. Temperature and pressure should fluctuate narrowly around their target values. System density should be stable. If any of these show drift or instability throughout the production run, the simulation was not equilibrated and the results cannot be trusted.
Run at least two independent replicas
A single MD trajectory is a single sample from a stochastic process. Key conclusions — especially quantitative ones like binding free energies or conformational populations — should be reproduced across at least two independent simulations starting from different initial velocities. If your replicas disagree substantially, either run longer simulations or report the disagreement explicitly.
Check for periodic boundary condition artifacts
The protein must never come within interaction distance of its own periodic image. Verify this by checking that the minimum image distance stays above your electrostatic cutoff (typically 10–12 Å) throughout the simulation. If the protein is large relative to the box, this can fail during the simulation as the protein moves and rotates.
Report your simulation parameters completely
A reproducible methods section includes: force field name and version, water model, box type and minimum distance to edge, ion concentration, thermostat and barostat algorithms, timestep, simulation length, frequency of coordinate and energy saving, software version, and hardware used. Reviewers at journals like JCTC, JCIM, and JACS increasingly check methods sections carefully against field standards.
Next steps
This guide gave you the conceptual framework for molecular dynamics simulation — what it is, how it works, what it can tell you, and what the full workflow looks like. The tutorials linked below walk through each part of that workflow in detail, with exact commands for GROMACS and step-by-step guidance for the steps that trip up most beginners.
If you’re coming from the docking tutorials on this site, the natural starting point is the GROMACS installation guide, followed by the system preparation tutorial. If you’ve done MD before but want to level up your analysis or learn to run protein-ligand simulations, jump directly to the analysis or protein-ligand tutorials.
The one-paragraph summary
Molecular dynamics simulation uses Newton’s laws and a force field to model how every atom in a biological system moves over time. It tells you things docking fundamentally cannot: whether a ligand stays bound, how flexible the protein is, what conformational states it visits, and — with methods like MM-GBSA — accurate binding free energies. The standard academic workflow runs in GROMACS, takes a protein from a PDB file through energy minimization, NVT and NPT equilibration, and production MD, then analyzes the trajectory with RMSD, RMSF, hydrogen bond occupancy, and free energy calculations. Getting the preparation right — force field choice, protonation states, box size, equilibration — is what separates publishable results from wasted compute time.