How to Analyze Molecular Dynamics Results: RMSD, RMSF, Radius of Gyration and Hydrogen Bonds

How to Analyze Molecular Dynamics Results: RMSD, RMSF, Radius of Gyration and Hydrogen Bonds

A finished MD trajectory is raw data — gigabytes of atomic coordinates that mean nothing until you calculate something from them. This tutorial covers the four most important analysis metrics, how to compute each with GROMACS, how to visualize them with Python, and how to tell a good result from a bad one.

This tutorial assumes
You have a completed production trajectory from the GROMACS tutorial — specifically md.tpr (the run input file), md.xtc (the trajectory), and md.edr (the energy file). All commands run from your simulation working directory.

Before you start: center your trajectory

Before any analysis, the trajectory must be post-processed to remove periodic boundary condition artifacts. Proteins often appear to “jump” across the box or become fragmented in the raw trajectory — this is a visualization artifact, not a physical problem, but it breaks distance-based analysis if uncorrected.

Terminal
gmx trjconv \
  -s md.tpr \
  -f md.xtc \
  -o md_centered.xtc \
  -center \
  -pbc mol \
  -ur compact

# When prompted:
# Group for centering: 1 (Protein)
# Group for output:   0 (System)

Use md_centered.xtc for all subsequent analysis. Every metric calculated on the unwrapped raw trajectory will be incorrect for any frame where the protein crosses a box boundary.

RMSD — Root Mean Square Deviation

RMSD
Metric 1
Root Mean Square Deviation — structural stability over time

What it measures: How much the protein structure has deviated from a reference structure — typically the starting coordinates — at each point in time. RMSD is calculated by superimposing each trajectory frame onto the reference and measuring the average displacement of selected atoms. A rising RMSD means the structure is drifting; a plateau means it has reached a stable conformation.

Why it matters: RMSD is the first analysis every MD study reports. It’s the primary check that a simulation has equilibrated — that the protein has settled into a stable state and is no longer drifting systematically. An RMSD that never plateaus means either the simulation wasn’t long enough or the protein is undergoing a genuine conformational transition worth investigating.

Calculate with GROMACS

Terminal
gmx rms \
  -s md.tpr \
  -f md_centered.xtc \
  -o rmsd_backbone.xvg \
  -tu ns

# Select Backbone for both groups (fit and RMSD calculation)
# -tu ns converts x-axis to nanoseconds

For a protein-ligand system, calculate RMSD separately for the protein backbone and for the ligand:

Terminal
# Ligand RMSD — first fit to protein backbone, then measure ligand
gmx rms \
  -s md.tpr \
  -f md_centered.xtc \
  -o rmsd_ligand.xvg \
  -tu ns
# Group 1 (fit): Backbone
# Group 2 (RMSD): Ligand (or MOL, or your ligand group name)

How to interpret it

< 0.2 nm
Stable
Protein stays close to starting structure. Expected for folded, rigid proteins.
0.2–0.4 nm
Moderate drift
Acceptable for flexible proteins or those with disordered regions. Check if plateau reached.
> 0.4 nm
Investigate
Large drift — may indicate conformational change, unfolding, or insufficient equilibration.

The absolute value matters less than whether RMSD has converged. An RMSD of 0.35 nm that plateaued by 50 ns and stayed flat for 200 ns is a well-behaved simulation. An RMSD of 0.15 nm that’s still rising at the end of the run is not — the simulation hasn’t equilibrated regardless of the low absolute value.

RMSF — Root Mean Square Fluctuation

RMSF
Metric 2
Root Mean Square Fluctuation — per-residue flexibility

What it measures: The average displacement of each residue from its mean position over the entire trajectory. Unlike RMSD which measures drift over time from a fixed reference, RMSF measures how much each residue fluctuates around its own average position. High RMSF = flexible; low RMSF = rigid.

Why it matters: RMSF gives you the protein’s flexibility fingerprint — which regions are structurally rigid (helices, buried strands) and which are mobile (surface loops, termini, disordered regions). This is directly comparable to crystallographic B-factors: high RMSF residues correspond to high B-factor residues in the crystal structure. The comparison is a useful validation — if RMSF-flexible regions match B-factor-flexible regions, your simulation is capturing real protein dynamics.

Calculate with GROMACS

Terminal
gmx rmsf \
  -s md.tpr \
  -f md_centered.xtc \
  -o rmsf_backbone.xvg \
  -res

# Select Backbone
# -res averages over residues, giving one value per residue number

The output XVG file has two columns: residue number and RMSF in nm. A value of 0.05 nm means that residue fluctuates by an average of 0.5 Å from its mean position — typical for a rigid secondary structure element.

Comparing RMSF to B-factors
To compare your RMSF directly to crystallographic B-factors, convert RMSF to the equivalent B-factor using the relation: B = (8π²/3) × RMSF². Both quantities reflect atomic displacement, so a strong correspondence between simulation RMSF and experimental B-factors is a validation signal that your simulation is producing physically realistic dynamics.

Radius of Gyration

Rg
Metric 3
Radius of Gyration — overall protein compactness

What it measures: The mass-weighted root mean square distance of all atoms from the protein’s center of mass — essentially a measure of how “spread out” the protein is in space. A compact, well-folded protein has a low, stable Rg. An unfolding or expanding protein shows increasing Rg over time.

Why it matters: Rg is a quick check on the overall structural integrity of the protein during simulation. A stable Rg confirms the protein maintained its folded state throughout the run. Decreasing Rg can indicate artificial compaction (sometimes seen with certain force field or water model combinations); increasing Rg can indicate partial unfolding or domain separation.

Calculate with GROMACS

Terminal
gmx gyrate \
  -s md.tpr \
  -f md_centered.xtc \
  -o gyrate.xvg

# Select Protein
# Output: time, Rg, Rgx, Rgy, Rgz (Rg along each axis)

The output contains the overall Rg plus its components along each Cartesian axis. For a globular protein the three axis components should be similar; a large difference between them indicates an elongated or asymmetric shape, which is expected for some proteins but unexpected drift for others.

Flat line
Stable fold
Rg constant throughout — protein maintained its folded state. Ideal result.
Slow increase
Investigate
Protein may be gradually unfolding or a flexible domain is sampling extended states.
Sharp increase
Problem
Rapid unfolding — check force field compatibility, system preparation, and equilibration.

Hydrogen Bonds

H-bonds
Metric 4
Hydrogen Bonds — interaction stability and occupancy

What it measures: Hydrogen bond analysis in MD can answer two distinct questions: how many H-bonds does the protein maintain over time (overall count), and which specific hydrogen bonds between a ligand and protein are stable (occupancy analysis). The first tells you about overall structural integrity; the second is essential for characterizing protein-ligand interactions.

Total intramolecular H-bond count over time

Terminal
gmx hbond \
  -s md.tpr \
  -f md_centered.xtc \
  -num hbnum.xvg

# Select Protein for both donor and acceptor groups
# Output: number of H-bonds at each timepoint

Protein-ligand H-bond occupancy

For protein-ligand systems, the more useful analysis is which specific H-bonds form between the ligand and protein residues, and what fraction of the simulation time they are present (occupancy):

Terminal
# Calculate H-bonds between protein and ligand
gmx hbond \
  -s md.tpr \
  -f md_centered.xtc \
  -num hbnum_pl.xvg \
  -hbm hbmap.xpm \
  -hbn hbindex.ndx

# Select Protein for donor/acceptor group 1
# Select Ligand for donor/acceptor group 2

# Convert H-bond map to table format
gmx hbond \
  -s md.tpr \
  -f md_centered.xtc \
  -shell 0.35 \
  -ac hbac.xvg

The occupancy of each H-bond is the fraction of trajectory frames in which it is present. Here’s how to interpret occupancy values:

OccupancyStabilityInterpretation
> 80%
Very stable
Core binding interaction — report as a key contact in your paper
40–80%
Moderate
Intermittent H-bond — contributes to binding but not a rigid anchor
< 40%
Transient
Weak or transient interaction — mention with caveat or omit from figure

Visualizing everything with Python and matplotlib

GROMACS outputs XVG files — plain-text files with a header and two or more data columns. Visualizing them with Python gives you publication-quality plots with full control over style, and makes it trivial to combine multiple metrics into a single figure.

Install the required packages if you haven’t already:

Terminal
conda activate md
conda install -c conda-forge matplotlib numpy -y

Save the following as plot_analysis.py in your working directory. It reads all four XVG files and produces a clean four-panel publication figure:

plot_analysis.py
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

# ── Helper: parse GROMACS XVG files ──────────────────────────
def read_xvg(path):
    """Read a GROMACS XVG file, skipping comment/label lines."""
    data = []
    with open(path) as f:
        for line in f:
            if line.startswith(('#', '@')):
                continue
            vals = line.split()
            if vals:
                data.append([float(v) for v in vals])
    return np.array(data)

# ── Load data ─────────────────────────────────────────────────
rmsd  = read_xvg('rmsd_backbone.xvg')
rmsf  = read_xvg('rmsf_backbone.xvg')
rg    = read_xvg('gyrate.xvg')
hbond = read_xvg('hbnum.xvg')

# ── Style ─────────────────────────────────────────────────────
plt.rcParams.update({
    'font.family': 'sans-serif',
    'font.size': 10,
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.linewidth': 0.8,
})

TEAL   = '#1d9e75'
BLUE   = '#185fa5'
PURPLE = '#534ab7'
AMBER  = '#ba7517'

# ── Figure: 2×2 panel ────────────────────────────────────────
fig = plt.figure(figsize=(10, 7))
gs  = gridspec.GridSpec(2, 2, hspace=0.45, wspace=0.35)

# Panel A — RMSD
ax1 = fig.add_subplot(gs[0, 0])
ax1.plot(rmsd[:, 0], rmsd[:, 1] * 10, color=TEAL, lw=1.2)
ax1.set_xlabel('Time (ns)')
ax1.set_ylabel('RMSD (Å)')
ax1.set_title('A  Backbone RMSD', loc='left', fontweight='bold')
ax1.axhline(y=2.0, color='gray', ls='--', lw=0.8, alpha=0.5)

# Panel B — RMSF
ax2 = fig.add_subplot(gs[0, 1])
ax2.bar(rmsf[:, 0], rmsf[:, 1] * 10, color=BLUE, alpha=0.7, width=0.8)
ax2.set_xlabel('Residue number')
ax2.set_ylabel('RMSF (Å)')
ax2.set_title('B  Per-residue RMSF', loc='left', fontweight='bold')

# Panel C — Radius of gyration
ax3 = fig.add_subplot(gs[1, 0])
ax3.plot(rg[:, 0], rg[:, 1] * 10, color=PURPLE, lw=1.2)
ax3.set_xlabel('Time (ns)')
ax3.set_ylabel('Rg (Å)')
ax3.set_title('C  Radius of gyration', loc='left', fontweight='bold')

# Panel D — Hydrogen bonds
ax4 = fig.add_subplot(gs[1, 1])
ax4.plot(hbond[:, 0], hbond[:, 1], color=AMBER, lw=0.8, alpha=0.6)
# Running average for clarity
window = 50
hb_smooth = np.convolve(hbond[:, 1],
                        np.ones(window)/window, mode='valid')
ax4.plot(hbond[window-1:, 0], hb_smooth,
         color=AMBER, lw=2.0, label='50-frame average')
ax4.set_xlabel('Time (ns)')
ax4.set_ylabel('H-bond count')
ax4.set_title('D  Intramolecular H-bonds', loc='left', fontweight='bold')
ax4.legend(fontsize=8)

fig.suptitle('MD simulation analysis — Lysozyme (1AKI, 1 ns)',
             fontsize=11, y=1.01)

plt.savefig('md_analysis.png', dpi=300, bbox_inches='tight')
plt.savefig('md_analysis.pdf', bbox_inches='tight')  # for publication
print('Saved md_analysis.png and md_analysis.pdf')
Terminal
python plot_analysis.py

This produces a clean two-by-two panel figure: RMSD top-left, RMSF as a bar chart top-right, Rg bottom-left, H-bond count with running average bottom-right. Both PNG (for quick viewing) and PDF (for publication, vector format) are saved. The values are multiplied by 10 to convert from nm to Å — conventional for reporting in structural biology papers.

XVG units — always check
GROMACS outputs distances in nm and time in ps by default. The -tu ns flag on analysis tools converts time to nanoseconds in the output. When plotting, always verify the axis units — multiply nm × 10 to get Å, or divide ps by 1000 to get ns if you forgot the flag. Reporting RMSD in nm is also valid and increasingly common; just be explicit in your figure caption.

Analysis checklist before publishing

Before reporting any MD analysis in a paper or presentation, work through this checklist:

  • Trajectory processed: gmx trjconv run with -center -pbc mol before all analysis
  • RMSD converged: backbone RMSD has plateaued — the system equilibrated before analysis window
  • Equilibration excluded: RMSD and all other metrics calculated only over the production phase, not including the equilibration run
  • RMSF compared to B-factors: flexible regions in simulation match flexible regions in crystal structure
  • Rg stable: protein maintained its overall fold throughout the simulation
  • H-bond occupancy reported: key protein-ligand contacts quantified as % occupancy, not just presence/absence
  • Replicas considered: key conclusions supported by at least two independent simulations with different initial velocities
  • Methods complete: all analysis tool versions and parameters reported in methods section

Analysis in one paragraph

RMSD tells you whether the simulation equilibrated and the protein stayed structurally stable. RMSF tells you which regions are flexible and validates that your simulation reproduces experimentally observed dynamics. Radius of gyration tells you whether the overall fold was maintained. Hydrogen bond occupancy tells you which specific interactions between ligand and protein are stable enough to matter. Run all four on every simulation, visualize them together, and check them before trusting any other result from the trajectory.

Last updated on

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *