How Does Molecular Dynamics Simulation Work? Force Fields, Timesteps & Thermostats Explained
Understanding how MD simulation works under the hood isn’t just academic — it determines which settings you choose, what can go wrong, and how to interpret your results. This article explains every core component clearly, without assuming a physics background.
The core simulation loop
At its heart, an MD simulation is a loop — a calculation that runs millions of times. Each pass through the loop represents one timestep of physical time. Here is what happens in every iteration:
- Calculate the force on every atom from every neighboring atom using the force field
- Use that force to compute the new velocity and position of each atom using Newton’s second law
- Save the new coordinates to the trajectory file at the specified output frequency
- Advance the simulation clock by one timestep and repeat
For a protein system of 100,000 atoms simulated for 100 nanoseconds at a 2-femtosecond timestep, this loop runs 50 million times. The entire trajectory — every saved snapshot — is the raw data you’ll analyze. Everything else in MD simulation setup is in service of making this loop run correctly, stably, and in conditions that represent real biology.
Force fields: the rulebook for atomic interactions
A force field is the set of mathematical equations and parameters that describes how atoms in the system interact with each other. Given the positions of all atoms, it computes the potential energy of the system — and the force on each atom is the negative gradient of that energy with respect to position. Atoms move to lower their potential energy, just as a ball rolls downhill.
Force field energy terms fall into two categories: bonded and non-bonded.
The major biomolecular force fields — AMBER, CHARMM, GROMOS, and OPLS-AA — all use this same general framework but differ in their parameterization: the specific values for equilibrium bond lengths, force constants, partial charges, and van der Waals radii. These parameters are derived from a combination of quantum mechanical calculations and fitting to experimental data. No force field is perfect, and the choice of force field is one of the most consequential decisions in setting up a simulation.
Newton’s second law: turning forces into motion
Once forces are calculated, MD uses Newton’s second law of motion to determine how each atom moves. The relationship is straightforward:
From the acceleration, the integrator algorithm (usually the leap-frog or Verlet scheme) calculates the new velocity and then the new position for each atom. This is purely classical mechanics — MD does not account for quantum effects like zero-point energy, tunneling, or electronic structure. For the overwhelming majority of structural biology questions this is an entirely adequate approximation, but it’s a fundamental limitation to keep in mind.
The timestep: why 2 femtoseconds?
The timestep is the amount of simulated time that advances with each iteration. It must be small enough that the numerical integration remains stable — if it’s too large, atoms can overshoot their equilibrium positions, forces become enormous, and the simulation either crashes with an error or, worse, produces physically nonsensical trajectories without crashing.
The practical upper limit on the timestep is set by the fastest atomic motions in the system. In biomolecular simulations, the fastest motion is the vibration of bonds involving hydrogen — specifically X-H bonds (where X is carbon, nitrogen, or oxygen), which vibrate with a period of roughly 10 femtoseconds. The Nyquist sampling criterion requires the timestep to be substantially smaller than the period of the fastest motion being simulated.
The standard solution is to constrain X-H bonds — fix their lengths using algorithms like LINCS or SHAKE, effectively removing those fast vibrations from the dynamics. With X-H bonds constrained, the fastest remaining motions are slower heavy-atom vibrations, and a 2 femtosecond timestep is stable. Without constraints, 1 femtosecond or less is needed.
Fatal error: Stepsize too small or velocities become nan, the most common cause is a timestep that’s too large for the forces present — usually because of a steric clash in the starting structure that wasn’t resolved during energy minimization. The fix is almost always to go back and run energy minimization more thoroughly before starting dynamics.
The simulation box
MD simulations don’t run in infinite space — the system is placed in a finite simulation box, typically containing the protein, water molecules, and ions. Box choice involves two decisions: shape and size.
Shape: The most common box shapes for protein simulations are the cubic box and the truncated octahedron. The truncated octahedron is more efficient — for a given minimum protein-to-edge distance, it contains about 71% of the water molecules a cubic box would require, reducing computational cost without affecting the physics. For membrane simulations, rectangular (orthorhombic) boxes are standard to accommodate the flat bilayer geometry.
Size: The box must be large enough that the protein never comes within interaction distance of its own periodic image — a minimum of 10 Å from any protein atom to the box edge is the conventional standard, corresponding to the typical non-bonded cutoff distance. Violating this creates artificial self-interactions that corrupt the simulation.
Periodic boundary conditions
Placing a protein in a finite water box creates an artificial problem: water molecules at the box edge would experience a vacuum interface — nothing like the bulk solvent environment of a real cell. Periodic boundary conditions (PBC) solve this by making the simulation box tile infinitely in all directions. When an atom exits the box through one face, it re-enters from the opposite face. The protein effectively exists in an infinite, periodic medium.
PBC introduces its own artifact: the minimum image convention. When calculating non-bonded interactions, each atom interacts only with the nearest image of every other atom — either in the real box or one of the periodic copies. This is why the box must be large enough that no atom interacts with the periodic image of itself or its covalently bonded neighbors.
gmx trjconv for this — it’s one of the first post-processing steps and skipping it produces visual artifacts and incorrect RMSD calculations.
Thermostats and barostats
Uncontrolled Newtonian dynamics conserves total energy, not temperature — the temperature of the system fluctuates as kinetic and potential energy exchange. Real biological experiments happen at constant temperature and pressure, so MD needs coupling algorithms to maintain these conditions.
A thermostat controls temperature by rescaling atomic velocities or coupling the system to a heat bath. A barostat controls pressure by periodically adjusting the simulation box volume. The choice of algorithm affects both accuracy and which thermodynamic ensemble the simulation represents.
| Algorithm | Type | Recommended for | Notes |
|---|---|---|---|
| V-rescale | Thermostat | Equilibration Production | Stochastic velocity rescaling — correct canonical ensemble, good for general use |
| Nosé-Hoover | Thermostat | Production | Rigorous NVT ensemble — preferred for production runs where thermodynamic accuracy matters |
| Berendsen | Thermostat + Barostat | Equilibration only | Fast equilibration but incorrect ensemble — never use for production or free energy calculations |
| Parrinello-Rahman | Barostat | Production | Correct NPT ensemble — the standard barostat for production MD in GROMACS |
| Monte Carlo | Barostat | Production | Used in OpenMM; correct NPT ensemble; efficient for systems with slow pressure fluctuations |
The Berendsen thermostat/barostat deserves special mention because it’s one of the most common sources of subtle errors in published MD studies. It equilibrates systems quickly and stably — which is why it’s appropriate for equilibration — but it artificially suppresses fluctuations and does not produce a correct thermodynamic ensemble. Using it for production runs gives the wrong heat capacity, compressibility, and free energy — errors that are invisible unless you’re specifically checking for them.
Ensembles: NVT vs NPT
A thermodynamic ensemble defines which quantities are held constant during the simulation. The two ensembles you’ll use in every MD workflow are:
The standard MD workflow runs NVT first — to equilibrate temperature — then switches to NPT — to equilibrate density before production. This two-stage approach ensures the system reaches a stable thermodynamic state before any data collection begins. Running production MD in NVT (fixed volume) without prior NPT equilibration means your system density may not match real physiological conditions — a subtle but meaningful error.
The mechanism in one paragraph
MD simulation works by repeatedly calculating forces on every atom using a force field, applying Newton’s second law to update positions and velocities, and advancing the clock by a tiny timestep. The protein is placed in a periodic water box so there are no artificial edges. A thermostat and barostat keep temperature and pressure at physiological values. The choice of force field, timestep, box size, coupling algorithms, and ensemble all affect the quality of the simulation — and understanding what each does is what allows you to set them correctly and interpret what goes wrong when they don’t.