How to Analyze Molecular Dynamics Results: RMSD, RMSF, Radius of Gyration and Hydrogen Bonds
A finished MD trajectory is raw data — gigabytes of atomic coordinates that mean nothing until you calculate something from them. This tutorial covers the four most important analysis metrics, how to compute each with GROMACS, how to visualize them with Python, and how to tell a good result from a bad one.
md.tpr (the run input file), md.xtc (the trajectory), and md.edr (the energy file). All commands run from your simulation working directory.
Before you start: center your trajectory
Before any analysis, the trajectory must be post-processed to remove periodic boundary condition artifacts. Proteins often appear to “jump” across the box or become fragmented in the raw trajectory — this is a visualization artifact, not a physical problem, but it breaks distance-based analysis if uncorrected.
gmx trjconv \
-s md.tpr \
-f md.xtc \
-o md_centered.xtc \
-center \
-pbc mol \
-ur compact
# When prompted:
# Group for centering: 1 (Protein)
# Group for output: 0 (System)
Use md_centered.xtc for all subsequent analysis. Every metric calculated on the unwrapped raw trajectory will be incorrect for any frame where the protein crosses a box boundary.
RMSD — Root Mean Square Deviation
What it measures: How much the protein structure has deviated from a reference structure — typically the starting coordinates — at each point in time. RMSD is calculated by superimposing each trajectory frame onto the reference and measuring the average displacement of selected atoms. A rising RMSD means the structure is drifting; a plateau means it has reached a stable conformation.
Why it matters: RMSD is the first analysis every MD study reports. It’s the primary check that a simulation has equilibrated — that the protein has settled into a stable state and is no longer drifting systematically. An RMSD that never plateaus means either the simulation wasn’t long enough or the protein is undergoing a genuine conformational transition worth investigating.
Calculate with GROMACS
gmx rms \
-s md.tpr \
-f md_centered.xtc \
-o rmsd_backbone.xvg \
-tu ns
# Select Backbone for both groups (fit and RMSD calculation)
# -tu ns converts x-axis to nanoseconds
For a protein-ligand system, calculate RMSD separately for the protein backbone and for the ligand:
# Ligand RMSD — first fit to protein backbone, then measure ligand
gmx rms \
-s md.tpr \
-f md_centered.xtc \
-o rmsd_ligand.xvg \
-tu ns
# Group 1 (fit): Backbone
# Group 2 (RMSD): Ligand (or MOL, or your ligand group name)
How to interpret it
The absolute value matters less than whether RMSD has converged. An RMSD of 0.35 nm that plateaued by 50 ns and stayed flat for 200 ns is a well-behaved simulation. An RMSD of 0.15 nm that’s still rising at the end of the run is not — the simulation hasn’t equilibrated regardless of the low absolute value.
RMSF — Root Mean Square Fluctuation
What it measures: The average displacement of each residue from its mean position over the entire trajectory. Unlike RMSD which measures drift over time from a fixed reference, RMSF measures how much each residue fluctuates around its own average position. High RMSF = flexible; low RMSF = rigid.
Why it matters: RMSF gives you the protein’s flexibility fingerprint — which regions are structurally rigid (helices, buried strands) and which are mobile (surface loops, termini, disordered regions). This is directly comparable to crystallographic B-factors: high RMSF residues correspond to high B-factor residues in the crystal structure. The comparison is a useful validation — if RMSF-flexible regions match B-factor-flexible regions, your simulation is capturing real protein dynamics.
Calculate with GROMACS
gmx rmsf \
-s md.tpr \
-f md_centered.xtc \
-o rmsf_backbone.xvg \
-res
# Select Backbone
# -res averages over residues, giving one value per residue number
The output XVG file has two columns: residue number and RMSF in nm. A value of 0.05 nm means that residue fluctuates by an average of 0.5 Å from its mean position — typical for a rigid secondary structure element.
Radius of Gyration
What it measures: The mass-weighted root mean square distance of all atoms from the protein’s center of mass — essentially a measure of how “spread out” the protein is in space. A compact, well-folded protein has a low, stable Rg. An unfolding or expanding protein shows increasing Rg over time.
Why it matters: Rg is a quick check on the overall structural integrity of the protein during simulation. A stable Rg confirms the protein maintained its folded state throughout the run. Decreasing Rg can indicate artificial compaction (sometimes seen with certain force field or water model combinations); increasing Rg can indicate partial unfolding or domain separation.
Calculate with GROMACS
gmx gyrate \
-s md.tpr \
-f md_centered.xtc \
-o gyrate.xvg
# Select Protein
# Output: time, Rg, Rgx, Rgy, Rgz (Rg along each axis)
The output contains the overall Rg plus its components along each Cartesian axis. For a globular protein the three axis components should be similar; a large difference between them indicates an elongated or asymmetric shape, which is expected for some proteins but unexpected drift for others.
Hydrogen Bonds
What it measures: Hydrogen bond analysis in MD can answer two distinct questions: how many H-bonds does the protein maintain over time (overall count), and which specific hydrogen bonds between a ligand and protein are stable (occupancy analysis). The first tells you about overall structural integrity; the second is essential for characterizing protein-ligand interactions.
Total intramolecular H-bond count over time
gmx hbond \
-s md.tpr \
-f md_centered.xtc \
-num hbnum.xvg
# Select Protein for both donor and acceptor groups
# Output: number of H-bonds at each timepoint
Protein-ligand H-bond occupancy
For protein-ligand systems, the more useful analysis is which specific H-bonds form between the ligand and protein residues, and what fraction of the simulation time they are present (occupancy):
# Calculate H-bonds between protein and ligand
gmx hbond \
-s md.tpr \
-f md_centered.xtc \
-num hbnum_pl.xvg \
-hbm hbmap.xpm \
-hbn hbindex.ndx
# Select Protein for donor/acceptor group 1
# Select Ligand for donor/acceptor group 2
# Convert H-bond map to table format
gmx hbond \
-s md.tpr \
-f md_centered.xtc \
-shell 0.35 \
-ac hbac.xvg
The occupancy of each H-bond is the fraction of trajectory frames in which it is present. Here’s how to interpret occupancy values:
| Occupancy | Stability | Interpretation |
|---|---|---|
| > 80% | Very stable | Core binding interaction — report as a key contact in your paper |
| 40–80% | Moderate | Intermittent H-bond — contributes to binding but not a rigid anchor |
| < 40% | Transient | Weak or transient interaction — mention with caveat or omit from figure |
Visualizing everything with Python and matplotlib
GROMACS outputs XVG files — plain-text files with a header and two or more data columns. Visualizing them with Python gives you publication-quality plots with full control over style, and makes it trivial to combine multiple metrics into a single figure.
Install the required packages if you haven’t already:
conda activate md
conda install -c conda-forge matplotlib numpy -y
Save the following as plot_analysis.py in your working directory. It reads all four XVG files and produces a clean four-panel publication figure:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
# ── Helper: parse GROMACS XVG files ──────────────────────────
def read_xvg(path):
"""Read a GROMACS XVG file, skipping comment/label lines."""
data = []
with open(path) as f:
for line in f:
if line.startswith(('#', '@')):
continue
vals = line.split()
if vals:
data.append([float(v) for v in vals])
return np.array(data)
# ── Load data ─────────────────────────────────────────────────
rmsd = read_xvg('rmsd_backbone.xvg')
rmsf = read_xvg('rmsf_backbone.xvg')
rg = read_xvg('gyrate.xvg')
hbond = read_xvg('hbnum.xvg')
# ── Style ─────────────────────────────────────────────────────
plt.rcParams.update({
'font.family': 'sans-serif',
'font.size': 10,
'axes.spines.top': False,
'axes.spines.right': False,
'axes.linewidth': 0.8,
})
TEAL = '#1d9e75'
BLUE = '#185fa5'
PURPLE = '#534ab7'
AMBER = '#ba7517'
# ── Figure: 2×2 panel ────────────────────────────────────────
fig = plt.figure(figsize=(10, 7))
gs = gridspec.GridSpec(2, 2, hspace=0.45, wspace=0.35)
# Panel A — RMSD
ax1 = fig.add_subplot(gs[0, 0])
ax1.plot(rmsd[:, 0], rmsd[:, 1] * 10, color=TEAL, lw=1.2)
ax1.set_xlabel('Time (ns)')
ax1.set_ylabel('RMSD (Å)')
ax1.set_title('A Backbone RMSD', loc='left', fontweight='bold')
ax1.axhline(y=2.0, color='gray', ls='--', lw=0.8, alpha=0.5)
# Panel B — RMSF
ax2 = fig.add_subplot(gs[0, 1])
ax2.bar(rmsf[:, 0], rmsf[:, 1] * 10, color=BLUE, alpha=0.7, width=0.8)
ax2.set_xlabel('Residue number')
ax2.set_ylabel('RMSF (Å)')
ax2.set_title('B Per-residue RMSF', loc='left', fontweight='bold')
# Panel C — Radius of gyration
ax3 = fig.add_subplot(gs[1, 0])
ax3.plot(rg[:, 0], rg[:, 1] * 10, color=PURPLE, lw=1.2)
ax3.set_xlabel('Time (ns)')
ax3.set_ylabel('Rg (Å)')
ax3.set_title('C Radius of gyration', loc='left', fontweight='bold')
# Panel D — Hydrogen bonds
ax4 = fig.add_subplot(gs[1, 1])
ax4.plot(hbond[:, 0], hbond[:, 1], color=AMBER, lw=0.8, alpha=0.6)
# Running average for clarity
window = 50
hb_smooth = np.convolve(hbond[:, 1],
np.ones(window)/window, mode='valid')
ax4.plot(hbond[window-1:, 0], hb_smooth,
color=AMBER, lw=2.0, label='50-frame average')
ax4.set_xlabel('Time (ns)')
ax4.set_ylabel('H-bond count')
ax4.set_title('D Intramolecular H-bonds', loc='left', fontweight='bold')
ax4.legend(fontsize=8)
fig.suptitle('MD simulation analysis — Lysozyme (1AKI, 1 ns)',
fontsize=11, y=1.01)
plt.savefig('md_analysis.png', dpi=300, bbox_inches='tight')
plt.savefig('md_analysis.pdf', bbox_inches='tight') # for publication
print('Saved md_analysis.png and md_analysis.pdf')
python plot_analysis.py
This produces a clean two-by-two panel figure: RMSD top-left, RMSF as a bar chart top-right, Rg bottom-left, H-bond count with running average bottom-right. Both PNG (for quick viewing) and PDF (for publication, vector format) are saved. The values are multiplied by 10 to convert from nm to Å — conventional for reporting in structural biology papers.
-tu ns flag on analysis tools converts time to nanoseconds in the output. When plotting, always verify the axis units — multiply nm × 10 to get Å, or divide ps by 1000 to get ns if you forgot the flag. Reporting RMSD in nm is also valid and increasingly common; just be explicit in your figure caption.
Analysis checklist before publishing
Before reporting any MD analysis in a paper or presentation, work through this checklist:
- Trajectory processed:
gmx trjconvrun with-center -pbc molbefore all analysis - RMSD converged: backbone RMSD has plateaued — the system equilibrated before analysis window
- Equilibration excluded: RMSD and all other metrics calculated only over the production phase, not including the equilibration run
- RMSF compared to B-factors: flexible regions in simulation match flexible regions in crystal structure
- Rg stable: protein maintained its overall fold throughout the simulation
- H-bond occupancy reported: key protein-ligand contacts quantified as % occupancy, not just presence/absence
- Replicas considered: key conclusions supported by at least two independent simulations with different initial velocities
- Methods complete: all analysis tool versions and parameters reported in methods section
Analysis in one paragraph
RMSD tells you whether the simulation equilibrated and the protein stayed structurally stable. RMSF tells you which regions are flexible and validates that your simulation reproduces experimentally observed dynamics. Radius of gyration tells you whether the overall fold was maintained. Hydrogen bond occupancy tells you which specific interactions between ligand and protein are stable enough to matter. Run all four on every simulation, visualize them together, and check them before trusting any other result from the trajectory.