MDAnalysis Tutorial for Beginners: Loading and Working with MD Trajectories in Python

MDAnalysis Tutorial for Beginners: Loading and Working with MD Trajectories in Python

MDAnalysis turns a molecular dynamics trajectory — which is just a large binary file — into a Python object you can slice, select, iterate, and compute with. This tutorial covers the essentials: loading your simulation, making atom selections, moving through frames, and extracting the data you need for analysis.

Installing MDAnalysis

MDAnalysis is best installed via conda-forge, which handles its compiled dependencies automatically. If you’re using the structbio environment from the setup tutorial, run:

Terminal — with structbio environment active
conda install -c conda-forge mdanalysis -y

# Verify
python -c "import MDAnalysis as mda; print(mda.__version__)"
2.7.0
Also install MDAnalysisTests for practice data
MDAnalysis ships with a separate test data package containing small example trajectories you can use to run code before you have your own simulation files. Install it with: pip install MDAnalysisTests. The examples in this tutorial use these test files where noted, so you can run every code block without needing your own GROMACS output.

The Universe — MDAnalysis’s central object

Everything in MDAnalysis flows through a single object: the Universe. It holds two things simultaneously — the topology (all static information about atoms: names, residue IDs, chain identifiers, masses, charges) and the trajectory (all time-varying coordinate frames). Creating a Universe is the first step in any MDAnalysis script.

The MDAnalysis Universe
u.atoms
→ AtomGroup (all atoms)
Access all atoms, their names, residues, masses, charges, and current positions.
u.residues
→ ResidueGroup
Access all residues, residue names, numbers, and segment identifiers.
u.trajectory
→ Trajectory reader
Iterate over frames, seek to specific frames, get time and frame index.
u.select_atoms()
→ AtomGroup (subset)
Select a subset of atoms using the selection language — the most-used method in MDAnalysis.

Loading topology and trajectory files

MDAnalysis needs a topology file (which defines atom names, residue connectivity, and chain structure) and optionally a trajectory file (which contains coordinates at each time step). Loading them together creates the Universe.

Python
import MDAnalysis as mda

# GROMACS: topology (.gro or .tpr) + trajectory (.xtc or .trr)
u = mda.Universe("topology.gro", "trajectory.xtc")

# AMBER: topology (.prmtop) + trajectory (.nc or .dcd)
u = mda.Universe("topology.prmtop", "trajectory.nc")

# NAMD/CHARMM: PSF topology + DCD trajectory
u = mda.Universe("topology.psf", "trajectory.dcd")

# Single PDB file (topology + one frame, no trajectory)
u = mda.Universe("protein.pdb")

# Multiple trajectory files as a list (concatenated automatically)
u = mda.Universe("topology.gro", ["run1.xtc", "run2.xtc", "run3.xtc"])

# Using the MDAnalysisTests example data
from MDAnalysis.tests.datafiles import PSF, DCD
u = mda.Universe(PSF, DCD)    # adenylate kinase example
FormatSoftwareExtension
.gro / .tprGROMACS topologyUse as first argument
.xtc / .trrGROMACS trajectoryUse as second argument
.prmtopAMBER topologyUse as first argument
.nc / .ncdfAMBER trajectoryUse as second argument
.psfCHARMM/NAMD topologyUse as first argument
.dcdCHARMM/NAMD trajectoryUse as second argument
.pdbAny (single frame)Topology and coordinates in one file

Inspecting your simulation

Python
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD
u = mda.Universe(PSF, DCD)

# System composition
print(f"Total atoms:    {u.atoms.n_atoms}")
print(f"Total residues: {u.residues.n_residues}")
print(f"Total segments: {u.segments.n_segments}")

# Trajectory information
print(f"Frames:           {len(u.trajectory)}")
print(f"Timestep (ps):    {u.trajectory.dt:.2f}")
print(f"Total time (ns):  {u.trajectory.totaltime / 1000:.2f}")

# Current frame info
ts = u.trajectory.ts
print(f"Current frame: {ts.frame} at {ts.time:.2f} ps")

# Total atoms:    3341
# Total residues: 214
# Frames:         98
# Total time (ns): 0.97

Atom selections

MDAnalysis’s selection language is the most important thing to learn after loading a Universe. Every analysis starts with a selection that narrows the atoms you care about. The syntax resembles VMD and is more flexible than BioPython’s hierarchy approach — you can combine criteria with boolean logic, spatial searches, and named keyword shortcuts.

Python
# ── Built-in keyword selections ────────────────────────────
protein   = u.select_atoms("protein")
backbone  = u.select_atoms("backbone")
water     = u.select_atoms("resname SOL")
ligand    = u.select_atoms("resname LIG")
not_water = u.select_atoms("not resname SOL")

# ── By residue ─────────────────────────────────────────────
glu_all   = u.select_atoms("resname GLU")
res_100   = u.select_atoms("resid 100")
loop      = u.select_atoms("resid 100:150")

# ── By atom name ───────────────────────────────────────────
ca_atoms  = u.select_atoms("backbone and name CA")
heavy     = u.select_atoms("protein and not type H")

# ── By segment / chain ─────────────────────────────────────
chain_a   = u.select_atoms("segid A")

# ── Distance-based ─────────────────────────────────────────
pocket    = u.select_atoms("protein and around 5 resname LIG")
near_res  = u.select_atoms("protein and around 8 resid 100")

# ── Combined boolean ───────────────────────────────────────
charged   = u.select_atoms("protein and (resname ASP GLU LYS ARG)")
site      = u.select_atoms("resname LIG or (protein and resid 85 100 173)")

# Inspect what you selected
print(f"Protein atoms: {protein.n_atoms}")
print(f"Backbone CA atoms: {ca_atoms.n_atoms}")
Selection stringWhat it selects
Molecular type
proteinAll protein residues (amino acids)
backboneN, Cα, C, O atoms of protein backbone
nucleicNucleic acid residues
waterWater molecules (TIP3P, SPC etc.)
Residue and atom identity
resname GLUAll glutamate residues
resid 100:200Residues 100 through 200
name CAAlpha carbon atoms only
type CAll carbon atoms (by element type)
Boolean and spatial
not waterEverything except water
protein and backboneIntersection: protein backbone atoms
around 5 resname LIGAll atoms within 5 Å of the ligand
sphlayer 4 8 resid 100Atoms 4–8 Å from residue 100 (shell)
Selections update every frame — mostly a feature, sometimes a trap
Distance-based selections like "protein and around 5 resname LIG" are re-evaluated at the current frame when you call select_atoms(). If you call this inside a trajectory loop, you get different atoms in each frame as the protein and ligand move — which is usually what you want for contact analysis. If you want a fixed selection that doesn’t change (e.g., the active site residues you identified in the starting frame), define it once before the loop and reuse the AtomGroup object.

Accessing positions and coordinates

Python
protein = u.select_atoms("protein")
ca      = u.select_atoms("backbone and name CA")

# Positions at the current frame — shape (n_atoms, 3)
pos = protein.positions
print(f"Position array shape: {pos.shape}")   # (n_protein_atoms, 3)

# Center of mass of the protein
com = protein.center_of_mass()
print(f"Center of mass: {com}")    # [x, y, z] in Å

# Center of geometry (unweighted, faster than COM)
cog = protein.center_of_geometry()

# Radius of gyration
rg = protein.radius_of_gyration()
print(f"Radius of gyration: {rg:.2f} Å")

# Individual atom properties
print(ca.names[:5])       # atom names: ['CA' 'CA' 'CA' ...]
print(ca.resnames[:5])    # residue names: ['MET' 'ALA' 'GLY' ...]
print(ca.resids[:5])      # residue numbers: [1 2 3 4 5]
print(ca.masses[:5])      # atomic masses in Da

Iterating over trajectory frames

Iterating over a trajectory is the core pattern for all time-series analysis. Each step through u.trajectory advances to the next frame and updates all atom positions in-place — the same AtomGroup object returns different coordinates on each iteration.

Python
protein = u.select_atoms("protein")

# Iterate over every frame
for ts in u.trajectory:
    print(f"Frame {ts.frame:4d}  time {ts.time:8.1f} ps  "
          f"protein COM: {protein.center_of_mass()}")

# Jump to a specific frame
u.trajectory[50]           # seek to frame 50
print(f"At frame: {u.trajectory.ts.frame}")

# Iterate over every 5th frame (stride)
for ts in u.trajectory[::5]:
    pass   # process every 5th frame

# Iterate over a specific range of frames
for ts in u.trajectory[10:50]:
    pass   # frames 10 through 49
ts is a Timestep object — not just a frame index
The loop variable ts is a Timestep object. Key attributes: ts.frame (integer frame index, starting from 0), ts.time (simulation time in ps), ts.dt (timestep in ps), and ts.dimensions (the simulation box dimensions as [a, b, c, α, β, γ]). You don’t need to use ts directly to access positions — just call atomgroup.positions inside the loop and MDAnalysis returns the positions at the current frame.

Basic per-frame analysis

The most common analysis pattern: collect a value per frame into a list, then convert to a NumPy array for plotting or further calculation.

Python — radius of gyration over time
import numpy as np
import matplotlib.pyplot as plt

protein = u.select_atoms("protein")

times, rg_values = [], []
for ts in u.trajectory:
    times.append(ts.time)
    rg_values.append(protein.radius_of_gyration())

times    = np.array(times)    / 1000   # ps → ns
rg_values = np.array(rg_values)

print(f"Mean Rg: {rg_values.mean():.2f} ± {rg_values.std():.2f} Å")

plt.figure(figsize=(9, 4))
plt.plot(times, rg_values, color="#0d6e54", linewidth=1.2)
plt.xlabel("Time (ns)"); plt.ylabel("Radius of gyration (Å)")
plt.title("Protein compactness over simulation")
plt.tight_layout()
plt.savefig("rg_over_time.png", dpi=300)
plt.close()
Python — ligand distance from binding site over time
ligand   = u.select_atoms("resname LIG")
key_res  = u.select_atoms("resid 100 and name CA")

distances = []
for ts in u.trajectory:
    lig_com    = ligand.center_of_mass()
    res_pos    = key_res.positions[0]
    dist       = np.linalg.norm(lig_com - res_pos)
    distances.append(dist)

distances = np.array(distances)
print(f"Mean ligand distance from res 100: {distances.mean():.2f} Å")
print(f"Max distance: {distances.max():.2f} Å")

Writing trajectory subsets

MDAnalysis can write a subset of atoms or frames to a new trajectory file — useful for stripping solvent before visualization, extracting key frames, or converting between formats:

Python
# Write protein-only trajectory (strip water and ions)
protein = u.select_atoms("protein")

with mda.Writer("protein_only.xtc", protein.n_atoms) as w:
    for ts in u.trajectory:
        w.write(protein)

# Write only every 10th frame (reduce file size)
with mda.Writer("thinned.xtc", protein.n_atoms) as w:
    for ts in u.trajectory[::10]:
        w.write(protein)

# Write a single frame as PDB (for PyMOL visualization)
u.trajectory[50]       # seek to frame 50
protein.write("frame_50.pdb")

print("Trajectory subsets written")

MDAnalysis in one paragraph

Create a Universe with mda.Universe(topology, trajectory) — the topology file defines the atom names and connectivity, the trajectory file holds the coordinates at each frame. Select atoms with u.select_atoms("selection string") using keyword shortcuts like "protein", "backbone", and "resname LIG", or combine them with boolean operators and spatial searches. Iterate over u.trajectory in a for loop — positions update automatically each iteration, so atomgroup.positions always returns coordinates for the current frame. Collect per-frame values into a list, convert to NumPy, and plot with matplotlib. Write subsets with mda.Writer to strip solvent, reduce frames, or convert formats. Everything else in the MDAnalysis ecosystem — RMSD, RMSF, H-bond analysis — follows these same building blocks.

Last updated on

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *