How to Set Up a Python Environment for Structural Biology (2026)
Getting your Python environment right is the unglamorous prerequisite to everything else. Do it once properly, with conda, a dedicated environment, and the right packages, and you avoid most dependency conflicts and “module not found” errors mid-analysis.
Why conda, not pip
Python packages can be installed with either pip (Python’s built-in package manager) or conda (Anaconda’s environment manager). For structural biology, conda is strongly preferred for two reasons.
First, structural biology packages like MDAnalysis have compiled C and C++ extensions. When pip cannot find a prebuilt wheel for your platform and Python version, it falls back to building from source, which requires a working compiler toolchain on your machine. That toolchain is not always present, and a failed build usually produces cryptic error messages. Conda installs pre-compiled binary packages that work immediately on any supported platform.
Second, conda manages full environment isolation. A dedicated structbio environment has its own Python interpreter and package set, completely separate from any other Python on your machine. This means BioPython’s dependencies can’t conflict with your web scraping project’s dependencies, and you can always recreate the environment from scratch if something breaks.
Installing Miniconda
macOS: download the .pkg installer and follow the prompts.
Windows: download the .exe installer. During setup, leave “Add to PATH” unchecked and use Anaconda Prompt instead.
Linux: download the .sh installer and run it in a terminal. It works on any distribution, including Ubuntu, CentOS, and Rocky Linux.
Installers for all three platforms: docs.conda.io/en/latest/miniconda.html
# Download and run the Miniconda installer (Linux example)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Follow prompts — accept license, confirm install location
# When asked "Do you wish to initialize Miniconda3?" → yes
# Restart terminal, then verify
conda --version
# Expected output (your version number may be newer):
# conda 24.x.x
Note for Mac users: macOS has two Miniconda installers, one for Intel (x86_64) and one for Apple Silicon (arm64, M-series chips). Download the arm64 version for any Mac with an M-series chip. The Intel version on Apple Silicon runs under Rosetta 2 emulation, which costs performance and causes occasional package incompatibilities with conda-forge builds.
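To check which build you ended up with, you can ask the installed Python itself. The sketch below uses only the standard library; the helper name `installer_hint` and its messages are illustrative and are not part of conda.

```python
import platform


def installer_hint(system: str, machine: str) -> str:
    """Interpret the platform strings reported by the running interpreter."""
    if system != "Darwin":
        return "not macOS; the Intel/arm64 installer choice does not apply"
    if machine == "arm64":
        return "native Apple Silicon build"
    # An x86_64 interpreter is correct on an Intel Mac, but on an
    # M-series Mac it means the Intel installer was used.
    return "x86_64 build; on an M-series Mac this is the Intel installer"


if __name__ == "__main__":
    print(platform.system(), platform.machine())
    print(installer_hint(platform.system(), platform.machine()))
```

If this reports an x86_64 build on an M-series Mac, reinstalling with the arm64 installer is the simplest fix.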
Creating the structbio environment
Once Miniconda is installed, create a dedicated environment for all structural biology work. Using a separate environment means you can always return to a clean working state by recreating it, and you’ll never break your base conda installation by installing conflicting packages.
# Create the environment with Python 3.10
conda create -n structbio python=3.10 -y
# Activate it — do this at the start of every work session
conda activate structbio
# Your prompt changes to show the active environment:
# (structbio) $
# Deactivate when done
conda deactivate
Always run conda activate structbio before running any Python code. If you see (base) in your prompt instead of (structbio), you’re working in the base environment and none of the structural biology packages are available. Add the activation command to your shell config (~/.zshrc on macOS or ~/.bashrc on Linux) if you want it to activate automatically.
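The prompt is not visible from inside a script or notebook, so a small guard at the top of an analysis can catch a wrong environment early. This is a minimal sketch: the function names are hypothetical, and it assumes the environment was created by name with `conda create -n`, so that the last component of the interpreter's prefix is the environment name.

```python
import os
import sys
from pathlib import Path


def interpreter_env_name(prefix: str = sys.prefix) -> str:
    """Name of the environment that owns the running interpreter."""
    # For environments created with `conda create -n NAME`, the prefix
    # ends in .../envs/NAME, so the last path component is the name.
    return Path(prefix).name


def shell_env_name(environ=os.environ):
    """Name of the environment activated in the shell, if any."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    name = interpreter_env_name()
    if name != "structbio":
        print(f"Warning: interpreter belongs to '{name}', not 'structbio'")
    else:
        print("structbio interpreter is active")
```

Checking `sys.prefix` rather than only the shell variable matters in Jupyter, where the kernel's interpreter can differ from the shell that launched it.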
Installing the essential packages
With the environment active, install all structural biology packages in a single conda command. Installing everything at once lets conda resolve the full dependency graph correctly — doing it piecemeal can lead to conflicts.
# Install all essential packages in one command
conda install -c conda-forge \
biopython \
mdanalysis \
numpy \
pandas \
matplotlib \
scipy \
jupyter \
-y
# This takes 3–10 minutes depending on connection speed
# conda-forge has the most up-to-date builds of all these packages
| Package | What it does | Used for |
|---|---|---|
| biopython | Structure and sequence analysis | Parsing PDB files, protein properties, RMSD, AlphaFold structures |
| mdanalysis | MD trajectory analysis | Loading trajectories, RMSD/RMSF, H-bonds, per-frame analysis |
| numpy | Numerical arrays and math | Coordinate arrays, distance matrices, mathematical operations |
| pandas | Tabular data and DataFrames | Docking result tables, RMSF summaries, data filtering and export |
| matplotlib | Plotting and visualization | RMSD plots, RMSF bar charts, score distributions |
| scipy | Scientific algorithms | Clustering, statistics, signal processing on trajectory data |
| jupyter | Interactive notebooks | Exploratory analysis, inline plots, sharing workflows |
Setting up Jupyter notebooks
Jupyter notebooks let you run Python code in interactive cells and see plots inline — ideal for exploratory analysis where you want to try different selections or visualization parameters without rerunning the entire script. For automated pipelines that run on servers or process large datasets, plain .py scripts are more appropriate.
# Launch Jupyter — opens in your default browser
jupyter notebook
# Or use JupyterLab (more modern interface)
conda install -c conda-forge jupyterlab -y
jupyter lab
# To run on a remote server (HPC cluster) without a browser:
jupyter notebook --no-browser --port=8888
# Then SSH tunnel from your local machine:
# ssh -L 8888:localhost:8888 username@cluster.university.edu
# Then open http://localhost:8888 in your local browser
When Jupyter opens, create a new notebook with the Python 3 kernel. The kernel should automatically use your structbio environment if you launched Jupyter from within it. If you see “kernel not found” or imports fail, confirm that you activated the environment before running jupyter notebook.
On an HPC cluster, first request an interactive session on a compute node (srun --pty bash on SLURM systems), activate your conda environment there, start Jupyter with --no-browser, then set up the SSH tunnel described above. Your university’s HPC documentation should have cluster-specific instructions for port forwarding.
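On a shared machine, port 8888 may already be taken. Jupyter then moves to another port, and a tunnel set up for 8888 points at the wrong service. The following sketch, using only the standard library, checks for a free local port before you launch. The helper names are illustrative.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepts the connection
        return sock.connect_ex((host, port)) != 0


def first_free_port(start: int = 8888, attempts: int = 20) -> int:
    """Scan upward from `start` and return the first port with no listener."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    port = first_free_port()
    print(f"jupyter notebook --no-browser --port={port}")
    print(f"ssh -L {port}:localhost:{port} username@cluster.university.edu")
```

Run it on the machine where Jupyter will start, and use the same port number in both the Jupyter command and the tunnel.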
Verifying the installation
Run these checks in a new Python session or Jupyter notebook cell to confirm everything is installed and working before starting your first analysis:
import Bio
import MDAnalysis as mda
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
print(f"BioPython: {Bio.__version__}")
print(f"MDAnalysis: {mda.__version__}")
print(f"NumPy: {np.__version__}")
print(f"pandas: {pd.__version__}")
# Quick functional test: parse a structure from the PDB
from Bio.PDB import PDBList, PDBParser
pdbl = PDBList()
# The keyword is file_format; the call returns the path of the downloaded
# file, which is ./pdb1ubq.ent when pdir="."
pdb_path = pdbl.retrieve_pdb_file("1ubq", file_format="pdb", pdir=".")
parser = PDBParser(QUIET=True)
structure = parser.get_structure("ubq", pdb_path)
chain = structure[0]["A"]
# Count only standard residues; waters and other hetero residues
# have a non-blank first field in residue.id
residues = [res for res in chain if res.id[0] == " "]
print(f"Ubiquitin: {len(residues)} residues loaded")
# Expected output:
# Ubiquitin: 76 residues loaded
| Check | Expected result | If it fails |
|---|---|---|
| import Bio | No error | Run conda install -c conda-forge biopython again |
| import MDAnalysis | No error | Run conda install -c conda-forge mdanalysis again |
| Structure loads, 76 residues | 76 residues loaded | Check internet connection; PDBList downloads from RCSB |
| jupyter notebook opens | Browser opens at localhost:8888 | Try conda install -c conda-forge jupyter again |
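The import checks in the table can also be run in one pass without stopping at the first failure. This is a sketch using the standard library's `importlib`. The `REQUIRED` mapping and the `check_packages` name are illustrative, and Jupyter is left out because it is checked by launching it from the command line.

```python
from importlib import metadata
from importlib.util import find_spec

# Import name -> distribution name for the libraries installed earlier
REQUIRED = {
    "Bio": "biopython",
    "MDAnalysis": "mdanalysis",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
}


def check_packages(required=REQUIRED) -> dict:
    """Map each import name to its version string, or None if missing."""
    report = {}
    for module, dist in required.items():
        # find_spec locates a top-level module without importing it
        if find_spec(module) is None:
            report[module] = None
            continue
        try:
            report[module] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[module] = "installed (version unknown)"
    return report


if __name__ == "__main__":
    for module, version in check_packages().items():
        status = version if version is not None else "MISSING"
        print(f"{module:<12} {status}")
```

Any line marked MISSING corresponds to a package to reinstall with the conda command from the table.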
Optional: adding PyMOL to the same environment
If you also do PyMOL visualization work, installing it in the same structbio environment means your analysis scripts can call PyMOL commands directly — loading a structure in BioPython, filtering it, then passing it to PyMOL for figure generation in one pipeline:
# Add open-source PyMOL to the same environment
conda install -c conda-forge pymol-open-source -y
# Verify
python -c "from pymol import cmd; print('PyMOL available in structbio')"
With BioPython, MDAnalysis, and PyMOL in a single environment, you can write scripts that span the full structural biology workflow: download an AlphaFold structure with BioPython, analyze a simulation trajectory with MDAnalysis, and generate a publication figure with PyMOL, all in one Python file.
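As a rough illustration of such a pipeline, the sketch below filters a structure with BioPython and hands the result to PyMOL for rendering. The function names, file names, and image settings are assumptions made for this example. The imports are placed inside the functions so the file still loads when PyMOL is not installed.

```python
def keep_chain_a_protein(in_path: str, out_path: str) -> str:
    """Write a copy of the structure containing only chain A protein residues."""
    from Bio.PDB import PDBIO, PDBParser, Select

    class ChainAProtein(Select):
        def accept_chain(self, chain):
            return chain.id == "A"

        def accept_residue(self, residue):
            # A blank first id field marks a standard (non-hetero) residue
            return residue.id[0] == " "

    structure = PDBParser(QUIET=True).get_structure("model", in_path)
    writer = PDBIO()
    writer.set_structure(structure)
    writer.save(out_path, ChainAProtein())
    return out_path


def render_cartoon(pdb_path: str, png_path: str) -> None:
    """Render a ray-traced cartoon image of the structure with PyMOL."""
    from pymol import cmd

    cmd.load(pdb_path, "model")
    cmd.hide("everything", "model")
    cmd.show("cartoon", "model")
    cmd.orient("model")
    cmd.png(png_path, width=1200, height=900, dpi=300, ray=1)
    cmd.delete("model")


if __name__ == "__main__":
    # Assumes pdb1ubq.ent was downloaded by the verification step
    filtered = keep_chain_a_protein("pdb1ubq.ent", "ubq_chainA.pdb")
    render_cartoon(filtered, "ubq_cartoon.png")
```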
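Once the environment works, export it with: conda env export > structbio_environment.yml. Recreate it from the file later with: conda env create -f structbio_environment.yml. Add this YAML file to your project’s git repository; it’s the equivalent of a requirements.txt but with full dependency pinning. The export includes platform-specific build strings, so add --no-builds if the file needs to work on a different operating system.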
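If you prefer to refresh the YAML file from a script, for example at the end of an analysis run, the export can be wrapped in a few lines of Python. This is a sketch only: the function names are hypothetical, and it assumes `conda` is on the PATH of the process running the script.

```python
import shutil
import subprocess
from pathlib import Path


def export_command(env_name: str) -> list[str]:
    """Build the conda command that prints an environment's full spec."""
    return ["conda", "env", "export", "-n", env_name]


def export_environment(env_name: str, out_file: str) -> bool:
    """Write the environment spec to out_file; return False if conda is absent."""
    if shutil.which("conda") is None:
        return False
    result = subprocess.run(
        export_command(env_name),
        capture_output=True,
        text=True,
        check=True,  # raise if conda reports an error, e.g. unknown environment
    )
    Path(out_file).write_text(result.stdout)
    return True


if __name__ == "__main__":
    if export_environment("structbio", "structbio_environment.yml"):
        print("Wrote structbio_environment.yml")
    else:
        print("conda not found on PATH")
```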
Environment setup in one paragraph
Install Miniconda (not Anaconda), create a dedicated structbio environment with Python 3.10, and install BioPython, MDAnalysis, NumPy, pandas, matplotlib, SciPy, and Jupyter in one conda command using the conda-forge channel. Activate the environment with conda activate structbio at the start of every session. Verify the install by importing all packages and loading a test structure. Export the environment spec with conda env export and commit it to your project repository. Do this once and every tutorial in this series works from the same clean, reproducible foundation.