How to Analyze Hydrogen Bonds in MD Simulations with MDAnalysis
H-bond analysis answers the questions that RMSD can’t: which specific interactions stabilize the protein fold? Which residues form persistent contacts with the ligand? MDAnalysis’s HydrogenBondAnalysis class finds every H-bond in every frame automatically — this tutorial shows you how to run it, extract occupancy, and identify the bonds that matter.
H-bond geometry — what counts as a hydrogen bond
MDAnalysis uses a geometric criterion to detect H-bonds: a bond D–H···A is counted when the donor-acceptor distance and the D–H···A angle both fall within defined thresholds. These thresholds match the standard used in structural biology literature.
u.add_TopologyAttr("hydrogens") if missing, or use GROMACS topologies which always include H positions.HydrogenBondAnalysis requires explicit hydrogen atom positions — it traces the D–H···A geometry directly. GROMACS topologies (.gro, .tpr) always include hydrogens. PDB files from crystal structures often don’t. If your topology lacks H atoms, either use a GROMACS-prepared system or add hydrogens with GROMACS (gmx pdb2gmx) before loading into MDAnalysis.
Setting up HydrogenBondAnalysis
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA
u = mda.Universe("topology.gro", "trajectory.xtc")
# Basic setup — all intra-protein H-bonds
hbonds = HBA(universe=u)
# The class auto-detects donors and acceptors from the topology
# But you can also specify them explicitly:
# Custom donor selection (atoms with H attached)
hbonds = HBA(
universe=u,
donors_sel="protein and (name N or name O or name S)",
acceptors_sel="protein and (name O or name N or name S)",
d_a_cutoff=3.0, # donor-acceptor distance cutoff (Å)
d_h_a_angle_cutoff=150, # D-H-A angle cutoff (degrees)
)
# Protein-ligand H-bonds only
hbonds_pl = HBA(
universe=u,
between=[["protein"], ["resname LIG"]],
d_a_cutoff=3.5, # slightly looser for ligand contacts
d_h_a_angle_cutoff=120,
)
between parameter restricts the analysis to H-bonds that cross between two groups — only bonds where one end is in group 1 and the other end is in group 2. Without it, all possible H-bonds within the entire selection are detected, which includes intra-protein backbone H-bonds, water-mediated contacts, and everything else. For protein-ligand or chain A–chain B interface analysis, always use between.
Running the analysis
# Run on all frames
hbonds.run()
# Run on a subset of frames (skip equilibration)
hbonds.run(start=50, stop=None, step=1)
# Run on every other frame (faster for exploratory analysis)
hbonds.run(step=2)
# Run with verbose progress output
hbonds.run(verbose=True)
# After running — check how many H-bond events were found
print(f"Total H-bond observations: {len(hbonds.results.hbonds)}")
Understanding the results table
After running, all detected H-bonds across all frames are stored in hbonds.results.hbonds — a NumPy array where each row is one H-bond observation in one frame.
| frame | donor_idx | hydrogen_idx | acceptor_idx | d_a_dist | d_h_a_angle |
|---|---|---|---|---|---|
| 0 | 142 | 143 | 891 | 2.84 | 162.3 |
| 0 | 891 | 892 | 1204 | 2.71 | 171.8 |
| 1 | 142 | 143 | 891 | 2.91 | 158.1 |
| 2 | 315 | 316 | 678 | 3.02 | 154.7 |
| … | … | … | … | … | … |
import numpy as np
results = hbonds.results.hbonds
# Column indices
# 0: frame, 1: donor atom index, 2: hydrogen atom index
# 3: acceptor atom index, 4: D-A distance, 5: D-H-A angle
# Convert atom indices to residue info
donor_resids = [u.atoms[int(i)].resid for i in results[:, 1]]
donor_resnames = [u.atoms[int(i)].resname for i in results[:, 1]]
acceptor_resids = [u.atoms[int(i)].resid for i in results[:, 3]]
# Mean D-A distance across all H-bond events
print(f"Mean D-A distance: {results[:, 4].mean():.2f} Å")
print(f"Mean D-H-A angle: {results[:, 5].mean():.1f}°")
# H-bonds per frame
n_frames = len(u.trajectory)
hbonds_per_frame = np.array([
(results[:, 0] == frame).sum()
for frame in range(n_frames)
])
print(f"Mean H-bonds per frame: {hbonds_per_frame.mean():.1f}")
Counting occupancy
Occupancy is the fraction of frames in which a specific H-bond exists — the most useful summary statistic from H-bond analysis. MDAnalysis provides a built-in method:
# Count how often each unique D-A pair appears across all frames
hbonds.count_by_ids()
# Or compute occupancy manually with full control
import pandas as pd
results = hbonds.results.hbonds
n_frames = len(u.trajectory)
# Build a DataFrame with residue-level identity
rows = []
for frame, d_idx, h_idx, a_idx, dist, angle in results:
d_atom = u.atoms[int(d_idx)]
a_atom = u.atoms[int(a_idx)]
rows.append({
"frame": int(frame),
"donor_resid": d_atom.resid,
"donor_resname": d_atom.resname,
"donor_name": d_atom.name,
"acc_resid": a_atom.resid,
"acc_resname": a_atom.resname,
"acc_name": a_atom.name,
"distance": round(dist, 2),
"angle": round(angle, 1),
})
df = pd.DataFrame(rows)
# Calculate occupancy for each unique D-A pair
occupancy = (
df.groupby(["donor_resid", "donor_resname", "donor_name",
"acc_resid", "acc_resname", "acc_name"])
.agg(
count=("frame", "count"),
mean_dist=("distance", "mean"),
)
.reset_index()
)
occupancy["occupancy"] = (occupancy["count"] / n_frames * 100).round(1)
occupancy = occupancy.sort_values("occupancy", ascending=False)
print(occupancy.head(10).to_string(index=False))
Identifying persistent H-bonds
# Filter to persistent H-bonds (occupancy > 60%)
persistent = occupancy[occupancy["occupancy"] >= 60].copy()
print(f"Persistent H-bonds (≥60% occupancy): {len(persistent)}")
print(persistent[["donor_resname", "donor_resid", "acc_resname",
"acc_resid", "occupancy", "mean_dist"]].to_string(index=False))
# Export all occupancy data
occupancy.to_csv("hbond_occupancy.csv", index=False)
persistent.to_csv("hbond_persistent.csv", index=False)
Protein-ligand H-bonds
# Set up protein-ligand specific analysis
hbonds_pl = HBA(
universe=u,
between=[["protein"], ["resname LIG"]],
d_a_cutoff=3.5,
d_h_a_angle_cutoff=120,
)
hbonds_pl.run(start=50) # skip equilibration
results_pl = hbonds_pl.results.hbonds
if len(results_pl) == 0:
print("No protein-ligand H-bonds detected")
else:
rows_pl = []
for frame, d_idx, h_idx, a_idx, dist, angle in results_pl:
d_atom = u.atoms[int(d_idx)]
a_atom = u.atoms[int(a_idx)]
rows_pl.append({
"donor": f"{d_atom.resname}{d_atom.resid}-{d_atom.name}",
"acceptor": f"{a_atom.resname}{a_atom.resid}-{a_atom.name}",
"frame": int(frame),
"distance": round(dist, 2),
})
df_pl = pd.DataFrame(rows_pl)
n_eq_frames = len(u.trajectory) - 50
occ_pl = (
df_pl.groupby(["donor", "acceptor"])
.agg(count=("frame", "count"), mean_dist=("distance", "mean"))
.reset_index()
)
occ_pl["occupancy_pct"] = (occ_pl["count"] / n_eq_frames * 100).round(1)
occ_pl = occ_pl.sort_values("occupancy_pct", ascending=False)
print("Protein-ligand H-bond occupancy:")
print(occ_pl.to_string(index=False))
Plotting results
import matplotlib.pyplot as plt
import numpy as np
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
# Panel A: H-bonds per frame over time
ax1.plot(hbonds_per_frame, color="#0d6e54", linewidth=0.8, alpha=0.7)
ax1.axhline(hbonds_per_frame.mean(), color="#0d6e54", linestyle="--",
label=f"Mean: {hbonds_per_frame.mean():.1f}")
ax1.set_xlabel("Frame"); ax1.set_ylabel("Number of H-bonds")
ax1.set_title("(A) Total H-bonds per frame", fontweight="bold", loc="left")
ax1.legend(fontsize=9)
# Panel B: Occupancy bar chart for top 15 H-bonds
top15 = occupancy.head(15)
colors = ["#0d6e54" if occ >= 60 else ("#ba7517" if occ >= 20 else "#928e85")
for occ in top15["occupancy"]]
labels = [f"{row.donor_resname}{row.donor_resid}→{row.acc_resname}{row.acc_resid}"
for _, row in top15.iterrows()]
ax2.barh(range(len(top15)), top15["occupancy"], color=colors)
ax2.set_yticks(range(len(top15))); ax2.set_yticklabels(labels, fontsize=9)
ax2.axvline(60, color="#0d6e54", linestyle="--", linewidth=0.8, label="60% threshold")
ax2.axvline(20, color="#ba7517", linestyle="--", linewidth=0.8, label="20% threshold")
ax2.set_xlabel("Occupancy (%)")
ax2.set_title("(B) Top H-bond occupancy", fontweight="bold", loc="left")
ax2.legend(fontsize=9); ax2.invert_yaxis()
plt.tight_layout()
plt.savefig("hbond_analysis.png", dpi=300, bbox_inches="tight")
plt.close()
print("Saved hbond_analysis.png")
H-bond analysis in one paragraph
Import HydrogenBondAnalysis from MDAnalysis.analysis.hydrogenbonds.hbond_analysis, create it with your Universe and optional between groups for cross-group analysis, and call .run(start=N) to skip equilibration. Results are in hbonds.results.hbonds — a row per H-bond per frame with columns for frame index, donor/hydrogen/acceptor atom indices, D-A distance, and D-H-A angle. Convert atom indices to residue names using u.atoms[int(idx)].resname, then group by donor-acceptor pair and divide by frame count to get occupancy. H-bonds with occupancy above 60% are persistent and worth reporting; above 20% are moderate. Use between to restrict to protein-ligand contacts — these are the interactions that appear in binding mode tables in drug discovery papers.