How to Analyze Hydrogen Bonds in MD Simulations with MDAnalysis

How to Analyze Hydrogen Bonds in MD Simulations with MDAnalysis

H-bond analysis answers the questions that RMSD can’t: which specific interactions stabilize the protein fold? Which residues form persistent contacts with the ligand? MDAnalysis’s HydrogenBondAnalysis class finds every H-bond in every frame automatically — this tutorial shows you how to run it, extract occupancy, and identify the bonds that matter.

H-bond geometry — what counts as a hydrogen bond

MDAnalysis uses a geometric criterion to detect H-bonds: a bond D–H···A is counted when the donor-acceptor distance and the D–H···A angle both fall within defined thresholds. These thresholds match the standard used in structural biology literature.

Default geometric criteria
D-A distance
≤ 3.0 Å between donor heavy atom and acceptor heavy atom (d_a_cutoff=3.0)
D–H···A angle
≥ 150° — the bond must be near-linear (d_h_a_angle_cutoff=150)
Hydrogens
Required in the topology. Add with u.add_TopologyAttr("hydrogens") if missing, or use GROMACS topologies which always include H positions.
Hydrogens must be in your topology
HydrogenBondAnalysis requires explicit hydrogen atom positions — it traces the D–H···A geometry directly. GROMACS topologies (.gro, .tpr) always include hydrogens. PDB files from crystal structures often don’t. If your topology lacks H atoms, either use a GROMACS-prepared system or add hydrogens with GROMACS (gmx pdb2gmx) before loading into MDAnalysis.

Setting up HydrogenBondAnalysis

Python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA

u = mda.Universe("topology.gro", "trajectory.xtc")

# Basic setup — all intra-protein H-bonds
hbonds = HBA(universe=u)

# The class auto-detects donors and acceptors from the topology
# But you can also specify them explicitly:

# Custom donor selection (atoms with H attached)
hbonds = HBA(
    universe=u,
    donors_sel="protein and (name N or name O or name S)",
    acceptors_sel="protein and (name O or name N or name S)",
    d_a_cutoff=3.0,          # donor-acceptor distance cutoff (Å)
    d_h_a_angle_cutoff=150,  # D-H-A angle cutoff (degrees)
)

# Protein-ligand H-bonds only
hbonds_pl = HBA(
    universe=u,
    between=[["protein"], ["resname LIG"]],
    d_a_cutoff=3.5,          # slightly looser for ligand contacts
    d_h_a_angle_cutoff=120,
)
Use the between parameter for cross-group analysis
The between parameter restricts the analysis to H-bonds that cross between two groups — only bonds where one end is in group 1 and the other end is in group 2. Without it, all possible H-bonds within the entire selection are detected, which includes intra-protein backbone H-bonds, water-mediated contacts, and everything else. For protein-ligand or chain A–chain B interface analysis, always use between.

Running the analysis

Python
# Run on all frames
hbonds.run()

# Run on a subset of frames (skip equilibration)
hbonds.run(start=50, stop=None, step=1)

# Run on every other frame (faster for exploratory analysis)
hbonds.run(step=2)

# Run with verbose progress output
hbonds.run(verbose=True)

# After running — check how many H-bond events were found
print(f"Total H-bond observations: {len(hbonds.results.hbonds)}")

Understanding the results table

After running, all detected H-bonds across all frames are stored in hbonds.results.hbonds — a NumPy array where each row is one H-bond observation in one frame.

hbonds.results.hbonds — one row per H-bond per frame
framedonor_idxhydrogen_idxacceptor_idxd_a_distd_h_a_angle
01421438912.84162.3
089189212042.71171.8
11421438912.91158.1
23153166783.02154.7
Python — working with the results array
import numpy as np

results = hbonds.results.hbonds

# Column indices
# 0: frame, 1: donor atom index, 2: hydrogen atom index
# 3: acceptor atom index, 4: D-A distance, 5: D-H-A angle

# Convert atom indices to residue info
donor_resids    = [u.atoms[int(i)].resid    for i in results[:, 1]]
donor_resnames  = [u.atoms[int(i)].resname  for i in results[:, 1]]
acceptor_resids = [u.atoms[int(i)].resid    for i in results[:, 3]]

# Mean D-A distance across all H-bond events
print(f"Mean D-A distance: {results[:, 4].mean():.2f} Å")
print(f"Mean D-H-A angle:  {results[:, 5].mean():.1f}°")

# H-bonds per frame
n_frames = len(u.trajectory)
hbonds_per_frame = np.array([
    (results[:, 0] == frame).sum()
    for frame in range(n_frames)
])
print(f"Mean H-bonds per frame: {hbonds_per_frame.mean():.1f}")

Counting occupancy

Occupancy is the fraction of frames in which a specific H-bond exists — the most useful summary statistic from H-bond analysis. MDAnalysis provides a built-in method:

Python
# Count how often each unique D-A pair appears across all frames
hbonds.count_by_ids()

# Or compute occupancy manually with full control
import pandas as pd

results = hbonds.results.hbonds
n_frames = len(u.trajectory)

# Build a DataFrame with residue-level identity
rows = []
for frame, d_idx, h_idx, a_idx, dist, angle in results:
    d_atom = u.atoms[int(d_idx)]
    a_atom = u.atoms[int(a_idx)]
    rows.append({
        "frame":        int(frame),
        "donor_resid":  d_atom.resid,
        "donor_resname": d_atom.resname,
        "donor_name":   d_atom.name,
        "acc_resid":    a_atom.resid,
        "acc_resname":  a_atom.resname,
        "acc_name":    a_atom.name,
        "distance":    round(dist, 2),
        "angle":       round(angle, 1),
    })

df = pd.DataFrame(rows)

# Calculate occupancy for each unique D-A pair
occupancy = (
    df.groupby(["donor_resid", "donor_resname", "donor_name",
                "acc_resid", "acc_resname", "acc_name"])
    .agg(
        count=("frame", "count"),
        mean_dist=("distance", "mean"),
    )
    .reset_index()
)
occupancy["occupancy"] = (occupancy["count"] / n_frames * 100).round(1)
occupancy = occupancy.sort_values("occupancy", ascending=False)
print(occupancy.head(10).to_string(index=False))

Identifying persistent H-bonds

< 20%
Transient
Forms and breaks freely. Normal for loops and surface residues. Not typically meaningful for stability.
20–60%
Moderate
Present more often than absent. May be important; worth inspecting the geometry and structural context.
> 60%
Persistent
Stable, significant interaction. Report these in your paper. Present in most frames — likely structure-stabilizing.
Python
# Filter to persistent H-bonds (occupancy > 60%)
persistent = occupancy[occupancy["occupancy"] >= 60].copy()
print(f"Persistent H-bonds (≥60% occupancy): {len(persistent)}")
print(persistent[["donor_resname", "donor_resid", "acc_resname",
                   "acc_resid", "occupancy", "mean_dist"]].to_string(index=False))

# Export all occupancy data
occupancy.to_csv("hbond_occupancy.csv", index=False)
persistent.to_csv("hbond_persistent.csv", index=False)

Protein-ligand H-bonds

Python
# Set up protein-ligand specific analysis
hbonds_pl = HBA(
    universe=u,
    between=[["protein"], ["resname LIG"]],
    d_a_cutoff=3.5,
    d_h_a_angle_cutoff=120,
)
hbonds_pl.run(start=50)   # skip equilibration

results_pl = hbonds_pl.results.hbonds
if len(results_pl) == 0:
    print("No protein-ligand H-bonds detected")
else:
    rows_pl = []
    for frame, d_idx, h_idx, a_idx, dist, angle in results_pl:
        d_atom = u.atoms[int(d_idx)]
        a_atom = u.atoms[int(a_idx)]
        rows_pl.append({
            "donor":    f"{d_atom.resname}{d_atom.resid}-{d_atom.name}",
            "acceptor": f"{a_atom.resname}{a_atom.resid}-{a_atom.name}",
            "frame":    int(frame),
            "distance": round(dist, 2),
        })

    df_pl = pd.DataFrame(rows_pl)
    n_eq_frames = len(u.trajectory) - 50

    occ_pl = (
        df_pl.groupby(["donor", "acceptor"])
        .agg(count=("frame", "count"), mean_dist=("distance", "mean"))
        .reset_index()
    )
    occ_pl["occupancy_pct"] = (occ_pl["count"] / n_eq_frames * 100).round(1)
    occ_pl = occ_pl.sort_values("occupancy_pct", ascending=False)
    print("Protein-ligand H-bond occupancy:")
    print(occ_pl.to_string(index=False))

Plotting results

Python
import matplotlib.pyplot as plt
import numpy as np

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Panel A: H-bonds per frame over time
ax1.plot(hbonds_per_frame, color="#0d6e54", linewidth=0.8, alpha=0.7)
ax1.axhline(hbonds_per_frame.mean(), color="#0d6e54", linestyle="--",
            label=f"Mean: {hbonds_per_frame.mean():.1f}")
ax1.set_xlabel("Frame"); ax1.set_ylabel("Number of H-bonds")
ax1.set_title("(A) Total H-bonds per frame", fontweight="bold", loc="left")
ax1.legend(fontsize=9)

# Panel B: Occupancy bar chart for top 15 H-bonds
top15 = occupancy.head(15)
colors = ["#0d6e54" if occ >= 60 else ("#ba7517" if occ >= 20 else "#928e85")
          for occ in top15["occupancy"]]
labels = [f"{row.donor_resname}{row.donor_resid}→{row.acc_resname}{row.acc_resid}"
          for _, row in top15.iterrows()]

ax2.barh(range(len(top15)), top15["occupancy"], color=colors)
ax2.set_yticks(range(len(top15))); ax2.set_yticklabels(labels, fontsize=9)
ax2.axvline(60, color="#0d6e54", linestyle="--", linewidth=0.8, label="60% threshold")
ax2.axvline(20, color="#ba7517", linestyle="--", linewidth=0.8, label="20% threshold")
ax2.set_xlabel("Occupancy (%)")
ax2.set_title("(B) Top H-bond occupancy", fontweight="bold", loc="left")
ax2.legend(fontsize=9); ax2.invert_yaxis()

plt.tight_layout()
plt.savefig("hbond_analysis.png", dpi=300, bbox_inches="tight")
plt.close()
print("Saved hbond_analysis.png")
Reporting H-bonds in the methods section
Standard language: “Hydrogen bonds were identified using MDAnalysis HydrogenBondAnalysis with a donor-acceptor distance cutoff of 3.0 Å and a D–H···A angle cutoff of 150°. Occupancy was calculated as the fraction of frames (excluding the first 50 ns equilibration period) in which each H-bond was present. H-bonds with occupancy ≥ 60% were classified as persistent.” Include the total simulation time analyzed and the number of frames in the table caption.

H-bond analysis in one paragraph

Import HydrogenBondAnalysis from MDAnalysis.analysis.hydrogenbonds.hbond_analysis, create it with your Universe and optional between groups for cross-group analysis, and call .run(start=N) to skip equilibration. Results are in hbonds.results.hbonds — a row per H-bond per frame with columns for frame index, donor/hydrogen/acceptor atom indices, D-A distance, and D-H-A angle. Convert atom indices to residue names using u.atoms[int(idx)].resname, then group by donor-acceptor pair and divide by frame count to get occupancy. H-bonds with occupancy above 60% are persistent and worth reporting; above 20% are moderate. Use between to restrict to protein-ligand contacts — these are the interactions that appear in binding mode tables in drug discovery papers.

Last updated on

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *