MDAnalysis Tutorial for Beginners: Loading and Working with MD Trajectories in Python
MDAnalysis turns a molecular dynamics trajectory — which is just a large binary file — into a Python object you can slice, select, iterate, and compute with. This tutorial covers the essentials: loading your simulation, making atom selections, moving through frames, and extracting the data you need for analysis.
Installing MDAnalysis
MDAnalysis is best installed via conda-forge, which handles its compiled dependencies automatically. If you’re using the structbio environment from the setup tutorial, run:
conda install -c conda-forge mdanalysis -y
# Verify
python -c "import MDAnalysis as mda; print(mda.__version__)"
2.7.0
pip install MDAnalysisTests. The examples in this tutorial use these test files where noted, so you can run every code block without needing your own GROMACS output.
The Universe — MDAnalysis’s central object
Everything in MDAnalysis flows through a single object: the Universe. It holds two things simultaneously — the topology (all static information about atoms: names, residue IDs, chain identifiers, masses, charges) and the trajectory (all time-varying coordinate frames). Creating a Universe is the first step in any MDAnalysis script.
Loading topology and trajectory files
MDAnalysis needs a topology file (which defines atom names, residue connectivity, and chain structure) and optionally a trajectory file (which contains coordinates at each time step). Loading them together creates the Universe.
import MDAnalysis as mda
# GROMACS: topology (.gro or .tpr) + trajectory (.xtc or .trr)
u = mda.Universe("topology.gro", "trajectory.xtc")
# AMBER: topology (.prmtop) + trajectory (.nc or .dcd)
u = mda.Universe("topology.prmtop", "trajectory.nc")
# NAMD/CHARMM: PSF topology + DCD trajectory
u = mda.Universe("topology.psf", "trajectory.dcd")
# Single PDB file (topology + one frame, no trajectory)
u = mda.Universe("protein.pdb")
# Multiple trajectory files as a list (concatenated automatically)
u = mda.Universe("topology.gro", ["run1.xtc", "run2.xtc", "run3.xtc"])
# Using the MDAnalysisTests example data
from MDAnalysis.tests.datafiles import PSF, DCD
u = mda.Universe(PSF, DCD) # adenylate kinase example
| Format | Software | Extension |
|---|---|---|
| .gro / .tpr | GROMACS topology | Use as first argument |
| .xtc / .trr | GROMACS trajectory | Use as second argument |
| .prmtop | AMBER topology | Use as first argument |
| .nc / .ncdf | AMBER trajectory | Use as second argument |
| .psf | CHARMM/NAMD topology | Use as first argument |
| .dcd | CHARMM/NAMD trajectory | Use as second argument |
| .pdb | Any (single frame) | Topology and coordinates in one file |
Inspecting your simulation
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD
u = mda.Universe(PSF, DCD)
# System composition
print(f"Total atoms: {u.atoms.n_atoms}")
print(f"Total residues: {u.residues.n_residues}")
print(f"Total segments: {u.segments.n_segments}")
# Trajectory information
print(f"Frames: {len(u.trajectory)}")
print(f"Timestep (ps): {u.trajectory.dt:.2f}")
print(f"Total time (ns): {u.trajectory.totaltime / 1000:.2f}")
# Current frame info
ts = u.trajectory.ts
print(f"Current frame: {ts.frame} at {ts.time:.2f} ps")
# Total atoms: 3341
# Total residues: 214
# Frames: 98
# Total time (ns): 0.97
Atom selections
MDAnalysis’s selection language is the most important thing to learn after loading a Universe. Every analysis starts with a selection that narrows the atoms you care about. The syntax resembles VMD and is more flexible than BioPython’s hierarchy approach — you can combine criteria with boolean logic, spatial searches, and named keyword shortcuts.
# ── Built-in keyword selections ────────────────────────────
protein = u.select_atoms("protein")
backbone = u.select_atoms("backbone")
water = u.select_atoms("resname SOL")
ligand = u.select_atoms("resname LIG")
not_water = u.select_atoms("not resname SOL")
# ── By residue ─────────────────────────────────────────────
glu_all = u.select_atoms("resname GLU")
res_100 = u.select_atoms("resid 100")
loop = u.select_atoms("resid 100:150")
# ── By atom name ───────────────────────────────────────────
ca_atoms = u.select_atoms("backbone and name CA")
heavy = u.select_atoms("protein and not type H")
# ── By segment / chain ─────────────────────────────────────
chain_a = u.select_atoms("segid A")
# ── Distance-based ─────────────────────────────────────────
pocket = u.select_atoms("protein and around 5 resname LIG")
near_res = u.select_atoms("protein and around 8 resid 100")
# ── Combined boolean ───────────────────────────────────────
charged = u.select_atoms("protein and (resname ASP GLU LYS ARG)")
site = u.select_atoms("resname LIG or (protein and resid 85 100 173)")
# Inspect what you selected
print(f"Protein atoms: {protein.n_atoms}")
print(f"Backbone CA atoms: {ca_atoms.n_atoms}")
| Selection string | What it selects |
|---|---|
| Molecular type | |
| protein | All protein residues (amino acids) |
| backbone | N, Cα, C, O atoms of protein backbone |
| nucleic | Nucleic acid residues |
| water | Water molecules (TIP3P, SPC etc.) |
| Residue and atom identity | |
| resname GLU | All glutamate residues |
| resid 100:200 | Residues 100 through 200 |
| name CA | Alpha carbon atoms only |
| type C | All carbon atoms (by element type) |
| Boolean and spatial | |
| not water | Everything except water |
| protein and backbone | Intersection: protein backbone atoms |
| around 5 resname LIG | All atoms within 5 Å of the ligand |
| sphlayer 4 8 resid 100 | Atoms 4–8 Å from residue 100 (shell) |
"protein and around 5 resname LIG" are re-evaluated at the current frame when you call select_atoms(). If you call this inside a trajectory loop, you get different atoms in each frame as the protein and ligand move — which is usually what you want for contact analysis. If you want a fixed selection that doesn’t change (e.g., the active site residues you identified in the starting frame), define it once before the loop and reuse the AtomGroup object.
Accessing positions and coordinates
protein = u.select_atoms("protein")
ca = u.select_atoms("backbone and name CA")
# Positions at the current frame — shape (n_atoms, 3)
pos = protein.positions
print(f"Position array shape: {pos.shape}") # (n_protein_atoms, 3)
# Center of mass of the protein
com = protein.center_of_mass()
print(f"Center of mass: {com}") # [x, y, z] in Å
# Center of geometry (unweighted, faster than COM)
cog = protein.center_of_geometry()
# Radius of gyration
rg = protein.radius_of_gyration()
print(f"Radius of gyration: {rg:.2f} Å")
# Individual atom properties
print(ca.names[:5]) # atom names: ['CA' 'CA' 'CA' ...]
print(ca.resnames[:5]) # residue names: ['MET' 'ALA' 'GLY' ...]
print(ca.resids[:5]) # residue numbers: [1 2 3 4 5]
print(ca.masses[:5]) # atomic masses in Da
Iterating over trajectory frames
Iterating over a trajectory is the core pattern for all time-series analysis. Each step through u.trajectory advances to the next frame and updates all atom positions in-place — the same AtomGroup object returns different coordinates on each iteration.
protein = u.select_atoms("protein")
# Iterate over every frame
for ts in u.trajectory:
print(f"Frame {ts.frame:4d} time {ts.time:8.1f} ps "
f"protein COM: {protein.center_of_mass()}")
# Jump to a specific frame
u.trajectory[50] # seek to frame 50
print(f"At frame: {u.trajectory.ts.frame}")
# Iterate over every 5th frame (stride)
for ts in u.trajectory[::5]:
pass # process every 5th frame
# Iterate over a specific range of frames
for ts in u.trajectory[10:50]:
pass # frames 10 through 49
ts is a Timestep object. Key attributes: ts.frame (integer frame index, starting from 0), ts.time (simulation time in ps), ts.dt (timestep in ps), and ts.dimensions (the simulation box dimensions as [a, b, c, α, β, γ]). You don’t need to use ts directly to access positions — just call atomgroup.positions inside the loop and MDAnalysis returns the positions at the current frame.
Basic per-frame analysis
The most common analysis pattern: collect a value per frame into a list, then convert to a NumPy array for plotting or further calculation.
import numpy as np
import matplotlib.pyplot as plt
protein = u.select_atoms("protein")
times, rg_values = [], []
for ts in u.trajectory:
times.append(ts.time)
rg_values.append(protein.radius_of_gyration())
times = np.array(times) / 1000 # ps → ns
rg_values = np.array(rg_values)
print(f"Mean Rg: {rg_values.mean():.2f} ± {rg_values.std():.2f} Å")
plt.figure(figsize=(9, 4))
plt.plot(times, rg_values, color="#0d6e54", linewidth=1.2)
plt.xlabel("Time (ns)"); plt.ylabel("Radius of gyration (Å)")
plt.title("Protein compactness over simulation")
plt.tight_layout()
plt.savefig("rg_over_time.png", dpi=300)
plt.close()
ligand = u.select_atoms("resname LIG")
key_res = u.select_atoms("resid 100 and name CA")
distances = []
for ts in u.trajectory:
lig_com = ligand.center_of_mass()
res_pos = key_res.positions[0]
dist = np.linalg.norm(lig_com - res_pos)
distances.append(dist)
distances = np.array(distances)
print(f"Mean ligand distance from res 100: {distances.mean():.2f} Å")
print(f"Max distance: {distances.max():.2f} Å")
Writing trajectory subsets
MDAnalysis can write a subset of atoms or frames to a new trajectory file — useful for stripping solvent before visualization, extracting key frames, or converting between formats:
# Write protein-only trajectory (strip water and ions)
protein = u.select_atoms("protein")
with mda.Writer("protein_only.xtc", protein.n_atoms) as w:
for ts in u.trajectory:
w.write(protein)
# Write only every 10th frame (reduce file size)
with mda.Writer("thinned.xtc", protein.n_atoms) as w:
for ts in u.trajectory[::10]:
w.write(protein)
# Write a single frame as PDB (for PyMOL visualization)
u.trajectory[50] # seek to frame 50
protein.write("frame_50.pdb")
print("Trajectory subsets written")
MDAnalysis in one paragraph
Create a Universe with mda.Universe(topology, trajectory) — the topology file defines the atom names and connectivity, the trajectory file holds the coordinates at each frame. Select atoms with u.select_atoms("selection string") using keyword shortcuts like "protein", "backbone", and "resname LIG", or combine them with boolean operators and spatial searches. Iterate over u.trajectory in a for loop — positions update automatically each iteration, so atomgroup.positions always returns coordinates for the current frame. Collect per-frame values into a list, convert to NumPy, and plot with matplotlib. Write subsets with mda.Writer to strip solvent, reduce frames, or convert formats. Everything else in the MDAnalysis ecosystem — RMSD, RMSF, H-bond analysis — follows these same building blocks.