How to Set Up a Python Environment for Structural Biology (2026)

Getting your Python environment right is the unglamorous prerequisite to everything else. Do it once properly — with conda, a dedicated environment, and all the right packages — and you’ll rarely hit a dependency conflict or a “module not found” error mid-analysis.

Why conda, not pip

Python packages can be installed with either pip (Python’s standard package installer) or conda (a cross-platform package and environment manager). For structural biology, conda is strongly preferred for two reasons.

First, structural biology packages like MDAnalysis have compiled C and C++ extensions that pip may build from source when no pre-built wheel matches your platform — requiring a working compiler toolchain on your machine, which is not always present and frequently causes cryptic error messages. Conda installs pre-compiled binary packages that work immediately on any supported platform.

Second, conda manages full environment isolation. A dedicated structbio environment has its own Python interpreter and package set, completely separate from any other Python on your machine. This means BioPython’s dependencies can’t conflict with your web scraping project’s dependencies, and you can always recreate the environment from scratch if something breaks.
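As a sketch of what that isolation buys you in practice (assuming conda is already installed), environments can be listed, deleted, and rebuilt freely without touching the rest of your system:

```shell
# See every conda environment on this machine
conda env list

# If an environment ever breaks, delete it and rebuild from scratch:
#   conda env remove -n structbio
#   conda create -n structbio python=3.10 -y
```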

Miniconda vs Anaconda
Anaconda is the full distribution — 3 GB, pre-loads hundreds of packages. Miniconda is the minimal installer — around 100 MB, gives you conda and Python, nothing else. Use Miniconda. You’ll install exactly what you need for structural biology without gigabytes of unrelated packages slowing down your environment solves.

Installing Miniconda

macOS
Download the macOS installer (Intel or Apple Silicon) from the Miniconda page (docs.conda.io/en/latest/miniconda.html). Run the .pkg installer and follow the prompts.
Windows
Download the Windows .exe installer from the same page. During setup, leave “Add to PATH” unchecked — use Anaconda Prompt instead.
Linux
Download the .sh installer and run it in a terminal. Works on any distribution — Ubuntu, CentOS, Rocky Linux.
Linux / macOS — terminal
# Download and run the Miniconda installer (Linux example)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# Follow prompts — accept license, confirm install location
# When asked "Do you wish to initialize Miniconda3?" → yes

# Restart terminal, then verify
conda --version
# Expected output:
conda 24.x.x
Apple Silicon Macs — download the right installer
There are two macOS Miniconda installers: one for Intel (x86_64) and one for Apple Silicon (arm64 / M1/M2/M3). Download the arm64 version for any Mac with an M-series chip. Installing the Intel version on Apple Silicon causes subtle performance issues and occasional package incompatibilities with conda-forge builds.
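One quick way to check which installer you need, and afterwards to confirm the Python you installed is a native build:

```shell
# Check this Mac's CPU architecture
uname -m
# arm64  → Apple Silicon: use the arm64 installer
# x86_64 → Intel: use the x86_64 installer

# After installing, confirm the Python build matches the hardware
python3 -c "import platform; print(platform.machine())"
```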

Creating the structbio environment

Once Miniconda is installed, create a dedicated environment for all structural biology work. Using a separate environment means you can always return to a clean working state by recreating it, and you’ll never break your base conda installation by installing conflicting packages.

All platforms
# Create the environment with Python 3.10
conda create -n structbio python=3.10 -y

# Activate it — do this at the start of every work session
conda activate structbio

# Your prompt changes to show the active environment:
(structbio) $

# Deactivate when done
conda deactivate
Always activate before working
Every time you open a new terminal session, run conda activate structbio before running any Python code. If you see (base) in your prompt instead of (structbio), you’re working in the base environment and none of the structural biology packages are available. Add the activation command to your shell config (~/.zshrc on macOS or ~/.bashrc on Linux) if you want it to activate automatically.
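A quick sketch of both checks mentioned above (the auto-activation line is optional; use ~/.bashrc instead of ~/.zshrc on Linux):

```shell
# Show all environments — the active one is marked with an asterisk
conda info --envs

# Optional: auto-activate structbio in every new shell (zsh example)
echo 'conda activate structbio' >> ~/.zshrc
```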

Installing the essential packages

With the environment active, install all structural biology packages in a single conda command. Installing everything at once lets conda resolve the full dependency graph correctly — doing it piecemeal can lead to conflicts.

All platforms — with structbio active
# Install all essential packages in one command
conda install -c conda-forge \
    biopython \
    mdanalysis \
    numpy \
    pandas \
    matplotlib \
    scipy \
    jupyter \
    -y

# This takes 3–10 minutes depending on connection speed
# conda-forge has the most up-to-date builds of all these packages
Package | What it does | Used for
biopython | Structure and sequence analysis | Parsing PDB files, protein properties, RMSD, AlphaFold structures
mdanalysis | MD trajectory analysis | Loading trajectories, RMSD/RMSF, H-bonds, per-frame analysis
numpy | Numerical arrays and math | Coordinate arrays, distance matrices, mathematical operations
pandas | Tabular data and DataFrames | Docking result tables, RMSF summaries, data filtering and export
matplotlib | Plotting and visualization | RMSD plots, RMSF bar charts, score distributions
scipy | Scientific algorithms | Clustering, statistics, signal processing on trajectory data
jupyter | Interactive notebooks | Exploratory analysis, inline plots, sharing workflows

Setting up Jupyter notebooks

Jupyter notebooks let you run Python code in interactive cells and see plots inline — ideal for exploratory analysis where you want to try different selections or visualization parameters without rerunning the entire script. For automated pipelines that run on servers or process large datasets, plain .py scripts are more appropriate.

Terminal — with structbio active
# Launch Jupyter — opens in your default browser
jupyter notebook

# Or use JupyterLab (more modern interface)
conda install -c conda-forge jupyterlab -y
jupyter lab

# To run on a remote server (HPC cluster) without a browser:
jupyter notebook --no-browser --port=8888
# Then SSH tunnel from your local machine:
# ssh -L 8888:localhost:8888 username@cluster.university.edu
# Then open http://localhost:8888 in your local browser

When Jupyter opens, create a new notebook with the Python 3 kernel. The kernel should automatically use your structbio environment if you launched Jupyter from within it. If you see “kernel not found” or imports fail, confirm that you activated the environment before running jupyter notebook.
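One quick way to confirm the kernel is using the right interpreter is to print its path from a notebook cell — it should point inside the structbio environment:

```python
import sys

# The interpreter path should contain "envs/structbio" if the
# kernel is running from the dedicated environment, e.g.
# /home/you/miniconda3/envs/structbio/bin/python
print(sys.executable)
print(sys.prefix)
```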

Working on an HPC cluster?
Most university HPC clusters don’t allow running Jupyter directly in a browser from a login node. The standard workflow is to request an interactive compute node (srun --pty bash on SLURM systems), activate your conda environment there, start Jupyter with --no-browser, then set up the SSH tunnel described above. Your university’s HPC documentation should have cluster-specific instructions for port forwarding.
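That workflow looks roughly like this on a SLURM cluster — partition names, time limits, and hostnames below are placeholders, so check your own cluster’s documentation:

```shell
# 1. On the login node: request an interactive compute node
#    (time limit is an example)
srun --pty --time=02:00:00 bash

# 2. On the compute node: activate the environment and start Jupyter
conda activate structbio
jupyter notebook --no-browser --port=8888

# 3. On your LOCAL machine: tunnel through the login node to the
#    compute node (replace node123 with the compute node's hostname)
ssh -L 8888:node123:8888 username@cluster.university.edu
```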

Verifying the installation

Run these checks in a new Python session or Jupyter notebook cell to confirm everything is installed and working before starting your first analysis:

Python / Jupyter cell
import Bio
import MDAnalysis as mda
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

print(f"BioPython:    {Bio.__version__}")
print(f"MDAnalysis:   {mda.__version__}")
print(f"NumPy:        {np.__version__}")
print(f"pandas:       {pd.__version__}")

# Quick functional test — parse a structure from the PDB
from Bio.PDB import PDBList, PDBParser

pdbl = PDBList()
# With pdir=".", the file is saved as ./pdb1ubq.ent
pdbl.retrieve_pdb_file("1ubq", file_type="pdb", pdir=".")

parser = PDBParser(QUIET=True)
structure = parser.get_structure("ubq", "./pdb1ubq.ent")
chain = structure[0]["A"]
# Count only standard amino-acid residues — the chain also contains waters
n_res = sum(1 for res in chain if res.id[0] == " ")
print(f"Ubiquitin: {n_res} residues loaded")
# Expected output:
Ubiquitin: 76 residues loaded
Check | Expected result | If it fails
import Bio | No error | Run conda install -c conda-forge biopython again
import MDAnalysis | No error | Run conda install -c conda-forge mdanalysis again
Structure test | Ubiquitin: 76 residues loaded | Check internet connection; PDBList downloads from RCSB
jupyter notebook | Browser opens at localhost:8888 | Run conda install -c conda-forge jupyter again

Optional: adding PyMOL to the same environment

If you also do PyMOL visualization work, installing it in the same structbio environment means your analysis scripts can call PyMOL commands directly — loading a structure in BioPython, filtering it, then passing it to PyMOL for figure generation in one pipeline:

Terminal — with structbio active
# Add open-source PyMOL to the same environment
conda install -c conda-forge pymol-open-source -y

# Verify
python -c "from pymol import cmd; print('PyMOL available in structbio')"

With all three — BioPython, MDAnalysis, and PyMOL — in a single environment, you can write scripts that span the full structural biology workflow: download an AlphaFold structure with BioPython, analyze the trajectory with MDAnalysis, and generate a publication figure with PyMOL, all in one Python file.

Save your environment spec for reproducibility
Once your environment is set up and working, export its full specification so you (or a collaborator) can recreate it exactly: conda env export > structbio_environment.yml. Recreate from the file later with: conda env create -f structbio_environment.yml. Add this YAML file to your project’s git repository — it’s the equivalent of a requirements.txt but with full dependency pinning.
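The export commands from the tip above, plus one refinement worth knowing: conda’s --from-history flag exports only the packages you explicitly requested, which tends to be more portable across operating systems than a full pin:

```shell
# Exact pin of every package and build (precise, but platform-specific)
conda env export > structbio_environment.yml

# Only the packages you explicitly asked for — more portable across OSes
conda env export --from-history > structbio_portable.yml

# Recreate on another machine
conda env create -f structbio_environment.yml
```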

Environment setup in one paragraph

Install Miniconda (not Anaconda), create a dedicated structbio environment with Python 3.10, and install BioPython, MDAnalysis, NumPy, pandas, matplotlib, SciPy, and Jupyter in one conda command using the conda-forge channel. Activate the environment with conda activate structbio at the start of every session. Verify the install by importing all packages and loading a test structure. Export the environment spec with conda env export and commit it to your project repository. Do this once and every tutorial in this pillar works from the same clean, reproducible foundation.
