MDAnalysis Tutorial for Beginners: Loading and Working with MD Trajectories in Python

12 min read Beginner — basic Python assumed

MDAnalysis turns a molecular dynamics trajectory — which is just a large binary file — into a Python object you can slice, select, iterate, and compute with. This tutorial covers the essentials: loading your simulation, making atom selections, moving through frames, and extracting the data you need for analysis.

Installing MDAnalysis

MDAnalysis is best installed via conda-forge, which handles its compiled dependencies automatically. If you’re using the structbio environment from the setup tutorial, run:

Terminal — with structbio environment active

conda install -c conda-forge mdanalysis -y

# Verify
python -c "import MDAnalysis as mda; print(mda.__version__)"
2.7.0

Also install MDAnalysisTests for practice data

MDAnalysis ships with a separate test data package containing small example trajectories you can use to run code before you have your own simulation files. Install it with: pip install MDAnalysisTests. The examples in this tutorial use these test files where noted, so you can run every code block without needing your own GROMACS output.

The Universe — MDAnalysis’s central object

Everything in MDAnalysis flows through a single object: the Universe. It holds two things simultaneously — the topology (all static information about atoms: names, residue IDs, chain identifiers, masses, charges) and the trajectory (all time-varying coordinate frames). Creating a Universe is the first step in any MDAnalysis script.

The MDAnalysis Universe

u.atoms

→ AtomGroup (all atoms)

Access all atoms, their names, residues, masses, charges, and current positions.

u.residues

→ ResidueGroup

Access all residues, residue names, numbers, and segment identifiers.

u.trajectory

→ Trajectory reader

Iterate over frames, seek to specific frames, get time and frame index.

u.select_atoms()

→ AtomGroup (subset)

Select a subset of atoms using the selection language — the most-used method in MDAnalysis.

Loading topology and trajectory files

MDAnalysis needs a topology file (which defines atom names, residue connectivity, and chain structure) and optionally a trajectory file (which contains coordinates at each time step). Loading them together creates the Universe.

Python

import MDAnalysis as mda

# GROMACS: topology (.gro or .tpr) + trajectory (.xtc or .trr)
u = mda.Universe("topology.gro", "trajectory.xtc")

# AMBER: topology (.prmtop) + trajectory (.nc or .dcd)
u = mda.Universe("topology.prmtop", "trajectory.nc")

# NAMD/CHARMM: PSF topology + DCD trajectory
u = mda.Universe("topology.psf", "trajectory.dcd")

# Single PDB file (topology + one frame, no trajectory)
u = mda.Universe("protein.pdb")

# Multiple trajectory files as a list (concatenated automatically)
u = mda.Universe("topology.gro", ["run1.xtc", "run2.xtc", "run3.xtc"])

# Using the MDAnalysisTests example data
from MDAnalysis.tests.datafiles import PSF, DCD
u = mda.Universe(PSF, DCD)    # adenylate kinase example

Format	Software	Extension
.gro / .tpr	GROMACS topology	Use as first argument
.xtc / .trr	GROMACS trajectory	Use as second argument
.prmtop	AMBER topology	Use as first argument
.nc / .ncdf	AMBER trajectory	Use as second argument
.psf	CHARMM/NAMD topology	Use as first argument
.dcd	CHARMM/NAMD trajectory	Use as second argument
.pdb	Any (single frame)	Topology and coordinates in one file

Inspecting your simulation

Python

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD
u = mda.Universe(PSF, DCD)

# System composition
print(f"Total atoms:    {u.atoms.n_atoms}")
print(f"Total residues: {u.residues.n_residues}")
print(f"Total segments: {u.segments.n_segments}")

# Trajectory information
print(f"Frames:           {len(u.trajectory)}")
print(f"Timestep (ps):    {u.trajectory.dt:.2f}")
print(f"Total time (ns):  {u.trajectory.totaltime / 1000:.2f}")

# Current frame info
ts = u.trajectory.ts
print(f"Current frame: {ts.frame} at {ts.time:.2f} ps")

# Total atoms:    3341
# Total residues: 214
# Frames:         98
# Total time (ns): 0.97

Atom selections

MDAnalysis’s selection language is the most important thing to learn after loading a Universe. Every analysis starts with a selection that narrows the atoms you care about. The syntax resembles VMD and is more flexible than BioPython’s hierarchy approach — you can combine criteria with boolean logic, spatial searches, and named keyword shortcuts.

Python

# ── Built-in keyword selections ────────────────────────────
protein   = u.select_atoms("protein")
backbone  = u.select_atoms("backbone")
water     = u.select_atoms("resname SOL")
ligand    = u.select_atoms("resname LIG")
not_water = u.select_atoms("not resname SOL")

# ── By residue ─────────────────────────────────────────────
glu_all   = u.select_atoms("resname GLU")
res_100   = u.select_atoms("resid 100")
loop      = u.select_atoms("resid 100:150")

# ── By atom name ───────────────────────────────────────────
ca_atoms  = u.select_atoms("backbone and name CA")
heavy     = u.select_atoms("protein and not type H")

# ── By segment / chain ─────────────────────────────────────
chain_a   = u.select_atoms("segid A")

# ── Distance-based ─────────────────────────────────────────
pocket    = u.select_atoms("protein and around 5 resname LIG")
near_res  = u.select_atoms("protein and around 8 resid 100")

# ── Combined boolean ───────────────────────────────────────
charged   = u.select_atoms("protein and (resname ASP GLU LYS ARG)")
site      = u.select_atoms("resname LIG or (protein and resid 85 100 173)")

# Inspect what you selected
print(f"Protein atoms: {protein.n_atoms}")
print(f"Backbone CA atoms: {ca_atoms.n_atoms}")

Selection string	What it selects
Molecular type
protein	All protein residues (amino acids)
backbone	N, Cα, C, O atoms of protein backbone
nucleic	Nucleic acid residues
water	Water molecules (TIP3P, SPC etc.)
Residue and atom identity
resname GLU	All glutamate residues
resid 100:200	Residues 100 through 200
name CA	Alpha carbon atoms only
type C	All carbon atoms (by element type)
Boolean and spatial
not water	Everything except water
protein and backbone	Intersection: protein backbone atoms
around 5 resname LIG	All atoms within 5 Å of the ligand
sphlayer 4 8 resid 100	Atoms 4–8 Å from residue 100 (shell)

Selections update every frame — mostly a feature, sometimes a trap

Distance-based selections like "protein and around 5 resname LIG" are re-evaluated at the current frame when you call select_atoms(). If you call this inside a trajectory loop, you get different atoms in each frame as the protein and ligand move — which is usually what you want for contact analysis. If you want a fixed selection that doesn’t change (e.g., the active site residues you identified in the starting frame), define it once before the loop and reuse the AtomGroup object.

Accessing positions and coordinates

Python

protein = u.select_atoms("protein")
ca      = u.select_atoms("backbone and name CA")

# Positions at the current frame — shape (n_atoms, 3)
pos = protein.positions
print(f"Position array shape: {pos.shape}")   # (n_protein_atoms, 3)

# Center of mass of the protein
com = protein.center_of_mass()
print(f"Center of mass: {com}")    # [x, y, z] in Å

# Center of geometry (unweighted, faster than COM)
cog = protein.center_of_geometry()

# Radius of gyration
rg = protein.radius_of_gyration()
print(f"Radius of gyration: {rg:.2f} Å")

# Individual atom properties
print(ca.names[:5])       # atom names: ['CA' 'CA' 'CA' ...]
print(ca.resnames[:5])    # residue names: ['MET' 'ALA' 'GLY' ...]
print(ca.resids[:5])      # residue numbers: [1 2 3 4 5]
print(ca.masses[:5])      # atomic masses in Da

Iterating over trajectory frames

Iterating over a trajectory is the core pattern for all time-series analysis. Each step through u.trajectory advances to the next frame and updates all atom positions in-place — the same AtomGroup object returns different coordinates on each iteration.

Python

protein = u.select_atoms("protein")

# Iterate over every frame
for ts in u.trajectory:
    print(f"Frame {ts.frame:4d}  time {ts.time:8.1f} ps  "
          f"protein COM: {protein.center_of_mass()}")

# Jump to a specific frame
u.trajectory[50]           # seek to frame 50
print(f"At frame: {u.trajectory.ts.frame}")

# Iterate over every 5th frame (stride)
for ts in u.trajectory[::5]:
    pass   # process every 5th frame

# Iterate over a specific range of frames
for ts in u.trajectory[10:50]:
    pass   # frames 10 through 49

ts is a Timestep object — not just a frame index

The loop variable ts is a Timestep object. Key attributes: ts.frame (integer frame index, starting from 0), ts.time (simulation time in ps), ts.dt (timestep in ps), and ts.dimensions (the simulation box dimensions as [a, b, c, α, β, γ]). You don’t need to use ts directly to access positions — just call atomgroup.positions inside the loop and MDAnalysis returns the positions at the current frame.

Basic per-frame analysis

The most common analysis pattern: collect a value per frame into a list, then convert to a NumPy array for plotting or further calculation.

Python — radius of gyration over time

import numpy as np
import matplotlib.pyplot as plt

protein = u.select_atoms("protein")

times, rg_values = [], []
for ts in u.trajectory:
    times.append(ts.time)
    rg_values.append(protein.radius_of_gyration())

times    = np.array(times)    / 1000   # ps → ns
rg_values = np.array(rg_values)

print(f"Mean Rg: {rg_values.mean():.2f} ± {rg_values.std():.2f} Å")

plt.figure(figsize=(9, 4))
plt.plot(times, rg_values, color="#0d6e54", linewidth=1.2)
plt.xlabel("Time (ns)"); plt.ylabel("Radius of gyration (Å)")
plt.title("Protein compactness over simulation")
plt.tight_layout()
plt.savefig("rg_over_time.png", dpi=300)
plt.close()

Python — ligand distance from binding site over time

ligand   = u.select_atoms("resname LIG")
key_res  = u.select_atoms("resid 100 and name CA")

distances = []
for ts in u.trajectory:
    lig_com    = ligand.center_of_mass()
    res_pos    = key_res.positions[0]
    dist       = np.linalg.norm(lig_com - res_pos)
    distances.append(dist)

distances = np.array(distances)
print(f"Mean ligand distance from res 100: {distances.mean():.2f} Å")
print(f"Max distance: {distances.max():.2f} Å")

Writing trajectory subsets

MDAnalysis can write a subset of atoms or frames to a new trajectory file — useful for stripping solvent before visualization, extracting key frames, or converting between formats:

Python

# Write protein-only trajectory (strip water and ions)
protein = u.select_atoms("protein")

with mda.Writer("protein_only.xtc", protein.n_atoms) as w:
    for ts in u.trajectory:
        w.write(protein)

# Write only every 10th frame (reduce file size)
with mda.Writer("thinned.xtc", protein.n_atoms) as w:
    for ts in u.trajectory[::10]:
        w.write(protein)

# Write a single frame as PDB (for PyMOL visualization)
u.trajectory[50]       # seek to frame 50
protein.write("frame_50.pdb")

print("Trajectory subsets written")

MDAnalysis in one paragraph

Create a Universe with mda.Universe(topology, trajectory) — the topology file defines the atom names and connectivity, the trajectory file holds the coordinates at each frame. Select atoms with u.select_atoms("selection string") using keyword shortcuts like "protein", "backbone", and "resname LIG", or combine them with boolean operators and spatial searches. Iterate over u.trajectory in a for loop — positions update automatically each iteration, so atomgroup.positions always returns coordinates for the current frame. Collect per-frame values into a list, convert to NumPy, and plot with matplotlib. Write subsets with mda.Writer to strip solvent, reduce frames, or convert formats. Everything else in the MDAnalysis ecosystem — RMSD, RMSF, H-bond analysis — follows these same building blocks.

MDAnalysis Tutorial for Beginners: Loading and Working with MD Trajectories in Python

Installing MDAnalysis

The Universe — MDAnalysis’s central object

Loading topology and trajectory files

Inspecting your simulation

Atom selections

Accessing positions and coordinates

Iterating over trajectory frames

Basic per-frame analysis

Writing trajectory subsets

MDAnalysis in one paragraph

Python for Structural Biology: The Complete Guide to BioPython and MDAnalysis

How to Calculate Protein Properties with BioPython: Molecular Weight, GRAVY and Secondary Structure

How to Automate GROMACS Analysis with Python: RMSD, RMSF and Energy Plots

BioPython Tutorial for Beginners: Parsing PDB Files and Working with Protein Structures

How to Download and Parse AlphaFold Structures with Python and BioPython

How to Calculate RMSD and Align Protein Structures with Python and BioPython

Leave a Reply Cancel reply

Installing MDAnalysis

The Universe — MDAnalysis’s central object

Loading topology and trajectory files

Inspecting your simulation

Atom selections

Accessing positions and coordinates

Iterating over trajectory frames

Basic per-frame analysis

Writing trajectory subsets

MDAnalysis in one paragraph

Similar Posts

Leave a Reply Cancel reply