Python for Structural Biology Cheat Sheet: BioPython and MDAnalysis Essential Commands

Quick reference — all levels

Every essential command for BioPython and MDAnalysis — organized by task, not alphabetically. Bookmark this and keep it open while you code.

Environment setup

Command	What it does
conda create -n structbio python=3.10 -y	Create dedicated conda environment
conda activate structbio	Activate before every session
conda install -c conda-forge biopython mdanalysis numpy pandas matplotlib -y	Install all essential packages at once
conda install -c conda-forge pymol-open-source -y	Add PyMOL to the same environment
conda env export > structbio.yml	Save environment spec for reproducibility
conda env create -f structbio.yml	Recreate environment from saved spec

BioPython — loading structures

Command	What it does
Parsers
PDBParser(QUIET=True)	Parser for .pdb and .ent files
MMCIFParser(QUIET=True)	Parser for .cif and .mmcif files — use for AlphaFold and newer PDB downloads
parser.get_structure(“name”, “file.pdb”)	Load a structure — returns a Structure object
Downloading
PDBList().retrieve_pdb_file(“4XYZ”, pdir=”.”)	Download PDB file from RCSB
requests.get(f”https://alphafold.ebi.ac.uk/files/AF-{uid}-F1-model_v4.pdb”)	Download AlphaFold structure via API — no auth required

BioPython — navigation and access

Command	What it does
SMCRA hierarchy — Structure → Model → Chain → Residue → Atom
structure[0]	First model (almost always the only one for X-ray / AF)
model[“A”]	Chain A by identifier string
chain[100]	Residue by PDB author number — not zero-indexed
residue[“CA”]	Alpha carbon atom by name
structure[0][“A”][185][“OG”]	Full path to a specific atom in one expression
Iteration
model.get_chains()	Iterate all chains
chain.get_residues()	Iterate all residues including HETATM — filter with res.id[0]==” “
residue.get_atoms()	Iterate all atoms in a residue
structure.get_atoms()	Iterate every atom in the entire structure
Residue identity
residue.id	Tuple (het_flag, seq_num, icode) — ” ” = standard AA, “W” = water
residue.id[0] == ” “	True for standard amino acids — use to filter HETATM
residue.get_resname()	Three-letter residue name e.g. “GLU”
residue.id[1]	Residue sequence number from PDB
Atom properties
atom.get_coord()	NumPy array [x, y, z] in ångströms
atom.get_bfactor()	B-factor — holds pLDDT in AlphaFold structures
atom.get_name()	Atom name string e.g. “CA”, “OG”, “NZ”
atom.element	Element symbol e.g. “C”, “N”, “O”
atom1 – atom2	Distance in ångströms between two atoms

BioPython — calculations

Command	What it does
Neighbor search — use instead of nested loops
NeighborSearch(atom_list)	Build k-d tree from list of Atom objects
ns.search(coords, radius, level=”R”)	Find residues within radius of a point — “A”=atoms, “R”=residues
Structure superposition
Superimposer()	Create superimposer object
sup.set_atoms(ref_atoms, mob_atoms)	Calculate optimal superposition and RMSD — does not move anything
sup.rms	RMSD value in ångströms after set_atoms()
sup.apply(mobile.get_atoms())	Apply transformation to move mobile structure in-place
Protein analysis (sequence-based)
ProteinAnalysis(sequence_str)	Create analysis object from one-letter sequence string
analysis.molecular_weight()	Molecular weight in Daltons
analysis.isoelectric_point()	Theoretical isoelectric point (pH)
analysis.gravy()	GRAVY hydrophobicity score — negative = soluble
analysis.instability_index()	Predicted stability — <40 stable, ≥40 unstable
analysis.secondary_structure_fraction()	(helix, turn, sheet) fractions — Chou-Fasman, sequence-based

BioPython — file I/O

Command	What it does
PDBIO().set_structure(structure); io.save(“out.pdb”)	Write whole structure to PDB file
io.save(“chain_A.pdb”, ChainASelect())	Write subset using a Select subclass — override accept_residue()
PPBuilder().build_peptides(structure)	Extract polypeptide sequences from structure — returns list of Polypeptide objects
str(pp.get_sequence())	Get sequence string from a Polypeptide object

MDAnalysis — loading

Command	What it does
mda.Universe(“topology.gro”, “traj.xtc”)	Load GROMACS topology + trajectory
mda.Universe(“topology.prmtop”, “traj.nc”)	Load AMBER topology + trajectory
mda.Universe(“topology.psf”, “traj.dcd”)	Load NAMD/CHARMM topology + trajectory
mda.Universe(“protein.pdb”)	Single structure with one frame — no trajectory
mda.Universe(“top.gro”, [“r1.xtc”, “r2.xtc”])	Concatenate multiple trajectory files
Trajectory info
u.atoms.n_atoms	Total atom count
len(u.trajectory)	Number of frames
u.trajectory.dt	Timestep in picoseconds
u.trajectory.totaltime	Total simulation time in picoseconds
u.trajectory[N]	Seek to frame N

MDAnalysis — selections

Selection string	What it selects
Molecular type keywords
“protein”	All protein residues
“backbone”	Protein backbone: N, CA, C, O
“water”	Water molecules (TIP3P, SPC etc.)
“not protein and not resname SOL NA CL”	Ligands and non-standard residues only
Residue and atom identity
“resname GLU”	All glutamate residues
“resid 100:200”	Residues 100 through 200
“resid 85 100 173”	Specific residues by ID (space-separated list)
“name CA”	Alpha carbon atoms only
“name CA CB”	Multiple atom names
“not type H”	Heavy atoms only (no hydrogens)
Spatial and boolean
“protein and around 5 resname LIG”	Protein atoms within 5 Å of ligand — updates each frame
“(resname ASP GLU) or (resname LYS ARG)”	Charged residues — parentheses group OR clauses
“protein and not water”	Boolean NOT — everything protein that isn’t water
AtomGroup properties
ag.positions	NumPy array (n_atoms, 3) — coordinates at current frame
ag.n_atoms	Number of atoms — check this is > 0 after any selection
ag.resids	Array of residue numbers for each atom
ag.resnames	Array of residue names for each atom
ag.center_of_mass()	Center of mass [x, y, z]
ag.radius_of_gyration()	Radius of gyration in ångströms

MDAnalysis — trajectory analysis

Command	What it does
Alignment (run before RMSD/RMSF)
align.AlignTraj(u, u, select=”backbone”, in_memory=True).run()	Align trajectory on backbone — required before RMSD/RMSF
align.AlignTraj(u, u, select=”backbone”, filename=”aligned.xtc”).run()	Align and write to disk — use for large systems
RMSD and RMSF
rms.RMSD(u, select=”backbone”).run()	RMSD over trajectory — results in .results.rmsd[:, 2] (Å)
rms.RMSD(u, select=”backbone”).run(start=50)	Skip equilibration frames — start from frame 50
rms.RMSF(ca_atoms).run()	Per-residue RMSF — results in .results.rmsf (Å)
H-bond analysis
HBA(universe=u).run()	All intra-system H-bonds (requires H atoms in topology)
HBA(universe=u, between=[[“protein”], [“resname LIG”]]).run()	Protein-ligand H-bonds only
hba.results.hbonds	Array of (frame, donor_idx, H_idx, acceptor_idx, dist, angle)
Distance calculation
distance_array(ag1.positions, ag2.positions)	Fast pairwise distance matrix — returns (n1, n2) NumPy array
np.linalg.norm(atom1.positions[0] – atom2.positions[0])	Distance between two specific atoms at current frame
Trajectory iteration
for ts in u.trajectory:	Iterate every frame — positions update automatically
for ts in u.trajectory[::5]:	Every 5th frame (stride) — faster for large trajectories
for ts in u.trajectory[50:]:	From frame 50 onward — skip equilibration
ts.frame, ts.time, ts.dimensions	Current frame index, time (ps), box dimensions [a,b,c,α,β,γ]

MDAnalysis — writing output

Command	What it does
protein.write(“protein_only.pdb”)	Write current frame of selection to PDB
with mda.Writer(“out.xtc”, n_atoms) as w: w.write(ag)	Write trajectory frames to XTC file
for ts in u.trajectory[::10]: w.write(protein)	Write every 10th frame — reduces file size
u.trajectory[50]; protein.write(“frame50.pdb”)	Write a specific frame as PDB for PyMOL

Cross-library workflows

AlphaFold pLDDT extraction (BioPython)

from Bio.PDB import PDBParser parser = PDBParser(QUIET=True) struct = parser.get_structure(“af”, “AF-P04637-F1-model_v4.pdb”) plddt = {res.id[1]: res[“CA”].get_bfactor() for res in struct[0][“A”].get_residues() if res.id[0] == ” “ and “CA” in res} # plddt = {resid: score, …} — filter: {k: v for k, v in plddt.items() if v >= 70}

Standard MD analysis setup (MDAnalysis)

import MDAnalysis as mda from MDAnalysis.analysis import align, rms u = mda.Universe(“topology.gro”, “md_vis.xtc”) align.AlignTraj(u, u, select=“backbone”, in_memory=True).run() # align first rmsd_data = rms.RMSD(u, select=“backbone”).run().results.rmsd[:, 2] rmsf_data = rms.RMSF(u.select_atoms(“backbone and name CA”)).run().results.rmsf

Interface residues — BioPython NeighborSearch

from Bio.PDB import NeighborSearch ns_b = NeighborSearch(list(model[“B”].get_atoms())) iface_a = {res_a for res_a in model[“A”].get_residues() if res_a.id[0] == ” “ for atom in res_a.get_atoms() if ns_b.search(atom.get_coord(), 5.0, level=“R”)}

Protein-ligand contact occupancy — MDAnalysis

from MDAnalysis.lib.distances import distance_array from collections import Counter lig = u.select_atoms(“resname LIG”) protein = u.select_atoms(“protein”) counts = Counter() for ts in u.trajectory: mask = (distance_array(lig.positions, protein.positions) < 5.0).any(axis=0) counts.update(protein.atoms[mask].resids) # occupancy_pct = {r: c/len(u.trajectory)*100 for r, c in counts.items()}

RMSD comparison — BioPython Superimposer

from Bio.PDB import Superimposer sup = Superimposer() # Build equal-length lists of paired Cα atoms ca_ref = [chain_ref[r][“CA”] for r in common_resids] ca_mob = [chain_mob[r][“CA”] for r in common_resids] sup.set_atoms(ca_ref, ca_mob) print(f“RMSD: {sup.rms:.3f} Å over {len(ca_ref)} Cα atoms”) sup.apply(mobile.get_atoms()) # move mobile to aligned position

Three checks before any analysis

(1) After every BioPython parse: print(len(list(structure.get_atoms()))) — confirm atoms loaded. (2) After every MDAnalysis selection: assert sel.n_atoms > 0, "Empty selection" — catch silent mismatches. (3) After alignment: check RMSD is physically reasonable (backbone RMSD < 5 Å for a stable protein) before running RMSF or H-bond analysis.

Python for Structural Biology Cheat Sheet: BioPython and MDAnalysis Essential Commands

How to Parse AutoDock Vina Docking Results with Python: From PDBQT to Ranked Hit List

MDAnalysis Tutorial for Beginners: Loading and Working with MD Trajectories in Python

How to Find and Analyze Protein-Protein Interfaces with BioPython

Python for Structural Biology: The Complete Guide to BioPython and MDAnalysis

How to Analyze Protein-Ligand Interactions over an MD Trajectory with MDAnalysis

How to Calculate Protein Properties with BioPython: Molecular Weight, GRAVY and Secondary Structure

Leave a Reply Cancel reply

Similar Posts

Leave a Reply Cancel reply