How to Download and Parse AlphaFold Structures with Python and BioPython
Every AlphaFold prediction is one API call away. Python makes it trivial to download hundreds of structures, extract their per-residue confidence scores, and filter out unreliable regions — automating what would otherwise be hours of manual work through the AlphaFold Database web interface.
Understanding pLDDT and where it lives in the PDB file
AlphaFold assigns every residue a per-residue confidence score called pLDDT (predicted Local Distance Difference Test), ranging from 0 to 100. Higher is more confident. The critical fact for parsing: pLDDT values are stored in the B-factor column of the AlphaFold PDB file — the same column that experimental crystal structures use for thermal displacement parameters. BioPython reads this column with atom.get_bfactor().
| pLDDT range | Confidence level | What it means for your analysis |
|---|---|---|
| > 90 | Very high | Backbone and side chains reliable. Safe to use directly for docking or MD input. |
| 70 – 90 | Confident | Backbone reliable. Side chains may be less accurate. Suitable for most analyses. |
| 50 – 70 | Low | May represent flexible or disordered regions. Treat with caution. |
| < 50 | Very low | Likely intrinsically disordered. Do not use for structural analysis or docking. |
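The claim that pLDDT "lives in the B-factor column" is easy to verify by hand: in the fixed-width PDB format, the temperature factor occupies columns 61–66 of each ATOM record. A minimal sketch (the ATOM line below is a fabricated example, not taken from a real AlphaFold file):

```python
# A fabricated ATOM record in standard PDB fixed-width layout.
# Columns 61-66 (0-indexed slice [60:66]) hold the B-factor field,
# which AlphaFold repurposes for per-residue pLDDT.
atom_line = (
    "ATOM      2  CA  MET A   1      -8.900   4.358 -10.000  1.00 91.33           C"
)

plddt = float(atom_line[60:66])
print(plddt)  # 91.33
```

BioPython does this slicing for you, which is why atom.get_bfactor() returns the pLDDT directly.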
In PyMOL you can colour a structure by pLDDT directly from the B-factor column, using a palette that follows the AlphaFold convention (orange for low confidence through blue for high): spectrum b, orange_yellow_cyan_blue, minimum=50, maximum=100. Once you’ve filtered to high-confidence residues in Python and saved a new PDB, you can load that filtered structure directly into PyMOL for visualization.
Downloading a single AlphaFold structure
The AlphaFold Database provides a simple REST API. Given a UniProt accession ID, the PDB file is available at a predictable URL — no authentication or API key required.
import requests
import os
def download_alphafold(uniprot_id, outdir=".", version=4):
    """
    Download an AlphaFold structure PDB from the AFDB API.
    Returns the local filepath on success, None on failure.
    """
    url = (
        f"https://alphafold.ebi.ac.uk/files/"
        f"AF-{uniprot_id}-F1-model_v{version}.pdb"
    )
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        print(f"Not found: {uniprot_id} (HTTP {response.status_code})")
        return None
    os.makedirs(outdir, exist_ok=True)
    filepath = os.path.join(outdir, f"AF-{uniprot_id}.pdb")
    with open(filepath, "w") as f:
        f.write(response.text)
    print(f"Downloaded: {filepath}")
    return filepath
# Download p53 (human TP53, UniProt P04637)
filepath = download_alphafold("P04637", outdir="./af_structures")
# Downloaded: ./af_structures/AF-P04637.pdb
UniProt accessions are the lookup keys: P04637 for human p53, or P00533 for EGFR. The AFDB covers most reviewed human proteins and a large fraction of the proteomes of major model organisms. If your protein isn’t in the AFDB, the API returns a 404 — check response.status_code before trying to parse the result.
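Before hitting the API, you can sanity-check that a string at least looks like a UniProt accession. The pattern below follows the accession format documented by UniProt (6- or 10-character accessions); treat it as a pre-flight check, not a guarantee that the entry exists in the AFDB:

```python
import re

# Regex for UniProt accession numbers, per the format documented on uniprot.org.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

def looks_like_uniprot_id(s):
    """True if s matches the UniProt accession format."""
    return bool(UNIPROT_RE.match(s))

print(looks_like_uniprot_id("P04637"))      # True  — 6-character accession
print(looks_like_uniprot_id("A0A024R1R8")) # True  — 10-character accession
print(looks_like_uniprot_id("p53"))        # False — gene names are not accessions
```

Catching malformed IDs locally avoids burning an API round-trip on an inevitable 404.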
Batch downloading multiple structures
import time
# List of UniProt IDs to download
targets = [
    "P04637",  # TP53 — human p53
    "P00533",  # EGFR — epidermal growth factor receptor
    "P35222",  # CTNNB1 — beta-catenin
    "Q9Y6Q9",  # TP53BP2 — ASPP2
]
downloaded = {}
for uid in targets:
    path = download_alphafold(uid, outdir="./af_structures")
    if path:
        downloaded[uid] = path
    time.sleep(0.5)  # be polite to the API — 500 ms between requests
print(f"\nSuccessfully downloaded {len(downloaded)}/{len(targets)} structures")
The time.sleep(0.5) between requests is courteous and reduces the risk of your IP being rate-limited or temporarily blocked. For very large downloads (thousands of structures), consider the AFDB bulk download service at alphafold.ebi.ac.uk/download, which provides pre-packaged tar archives by proteome, rather than calling the per-structure API endpoint repeatedly.
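The sleep handles politeness; transient failures (timeouts, 5xx responses) are a separate concern. A small retry wrapper with exponential backoff is one way to handle them. This is a sketch, with the delay schedule as an illustrative choice, demonstrated on a stand-in function rather than a live request:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(); on exception, wait base_delay * 2**i and retry.
    Re-raises the last exception if every attempt fails."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)

# Demonstration with a stand-in that fails twice, then succeeds.
# In real use fn would be e.g. lambda: download_alphafold("P04637").
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated timeout")
    return "ok"

print(with_retries(flaky, attempts=4, base_delay=0.01))  # ok
```

Wrapping each download this way lets a long batch run survive the occasional dropped connection without babysitting.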
Parsing the structure and extracting pLDDT
Once downloaded, parse the PDB normally with BioPython’s PDBParser and read pLDDT from the B-factor column of each alpha carbon atom:
from Bio.PDB import PDBParser
import numpy as np
parser = PDBParser(QUIET=True)
structure = parser.get_structure("p53", "./af_structures/AF-P04637.pdb")
model = structure[0]
chain = model["A"] # AlphaFold structures always have a single chain A
# Extract per-residue pLDDT from B-factor column
plddt_data = []
for residue in chain.get_residues():
    if residue.id[0] != " ":  # skip HETATM records
        continue
    try:
        ca = residue["CA"]
        plddt_data.append({
            "resi": residue.id[1],
            "resn": residue.get_resname(),
            "plddt": ca.get_bfactor(),
        })
    except KeyError:
        pass  # residue missing CA — rare but possible
# Summary statistics
plddts = np.array([d["plddt"] for d in plddt_data])
print(f"Residues: {len(plddts)}")
print(f"Mean pLDDT: {plddts.mean():.1f}")
print(f"Median pLDDT: {np.median(plddts):.1f}")
print(f"Below 70 (low): {(plddts < 70).sum()} residues")
print(f"Above 90 (VH): {(plddts > 90).sum()} residues")
# Residues: 393
# Mean pLDDT: 71.4
# Median pLDDT: 76.2
# Below 70 (low): 128 residues
# Above 90 (VH): 89 residues
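The four confidence bands from the table above can also be counted in a single vectorized pass with np.digitize. A small sketch on a synthetic pLDDT array (not real p53 scores):

```python
import numpy as np

# Synthetic pLDDT values for illustration only
plddts = np.array([95.2, 88.1, 72.4, 65.0, 48.3, 91.7, 30.2])

# np.digitize returns the band index for each value:
# 0 → <50, 1 → 50-70, 2 → 70-90, 3 → ≥90
bins = np.digitize(plddts, [50, 70, 90])
labels = ["very_low", "low", "confident", "very_high"]
counts = {labels[i]: int((bins == i).sum()) for i in range(4)}
print(counts)  # {'very_low': 2, 'low': 1, 'confident': 2, 'very_high': 2}
```

For whole-proteome summaries this avoids a per-residue Python loop entirely.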
Filtering residues by confidence threshold
For docking and MD simulations, common practice is to remove residues with pLDDT below 70 before using the structure. Disordered residues (pLDDT < 50) can distort docking box calculations, and low-confidence loops can introduce unphysical contacts that destabilize MD simulations:
import pandas as pd
# Load pLDDT data into a DataFrame for easy filtering
df = pd.DataFrame(plddt_data)
# Inspect the confidence distribution
print(df["plddt"].describe().round(1))
# Classify each residue
def classify_plddt(score):
    if score >= 90: return "very_high"
    if score >= 70: return "confident"
    if score >= 50: return "low"
    return "very_low"
df["confidence"] = df["plddt"].apply(classify_plddt)
print(df["confidence"].value_counts())
# Get residue numbers with pLDDT above threshold
threshold = 70
high_conf_resis = df[df["plddt"] >= threshold]["resi"].tolist()
print(f"\nResidues above pLDDT {threshold}: {len(high_conf_resis)}")
# Check binding site residues specifically
binding_site = [175, 248, 249, 273, 282] # p53 DNA-binding domain key residues
site_df = df[df["resi"].isin(binding_site)][["resi", "resn", "plddt", "confidence"]]
print("\nBinding site pLDDT:")
print(site_df.to_string(index=False))
print(f"Mean binding site pLDDT: {site_df['plddt'].mean():.1f}")
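Beyond spot-checking individual residues, it often helps to see where the low-confidence stretches are, since contiguous runs frequently correspond to disordered linkers. A sketch that finds runs of consecutive residues below a threshold, shown on synthetic data rather than the real p53 DataFrame:

```python
def low_confidence_segments(resis, plddts, threshold=70.0):
    """Return (start_resi, end_resi) for each contiguous run of
    residues with pLDDT below threshold. Assumes resis is sorted."""
    segments, start, prev = [], None, None
    for resi, score in zip(resis, plddts):
        if score < threshold:
            if start is None or resi != prev + 1:
                if start is not None:
                    segments.append((start, prev))
                start = resi
            prev = resi
        else:
            if start is not None:
                segments.append((start, prev))
                start = None
    if start is not None:
        segments.append((start, prev))
    return segments

# Synthetic example: residues 1-8 with two low-confidence runs
resis = [1, 2, 3, 4, 5, 6, 7, 8]
plddts = [40.0, 45.0, 85.0, 92.0, 60.0, 55.0, 88.0, 91.0]
print(low_confidence_segments(resis, plddts))  # [(1, 2), (5, 6)]
```

With the real data you would pass df["resi"].tolist() and df["plddt"].tolist(); long segments near the termini are a classic signature of disordered tails.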
Saving the filtered structure to a new PDB file
BioPython’s PDBIO with a custom Select subclass lets you write only the residues that pass the confidence filter — producing a clean, high-confidence PDB ready for docking or MD input:
from Bio.PDB import PDBIO, Select
class HighConfidenceSelect(Select):
    """Keep only residues with pLDDT at or above the threshold."""
    def __init__(self, min_plddt=70.0):
        self.min_plddt = min_plddt
    def accept_residue(self, residue):
        if residue.id[0] != " ":  # always keep HETATM
            return True
        try:
            plddt = residue["CA"].get_bfactor()
            return plddt >= self.min_plddt
        except KeyError:
            return False
# Save a structure with only pLDDT ≥ 70 residues
io = PDBIO()
io.set_structure(structure)
io.save("AF-P04637_confident.pdb", HighConfidenceSelect(min_plddt=70))
# Save a very high confidence only version (≥ 90) for the core domain
io.save("AF-P04637_very_high.pdb", HighConfidenceSelect(min_plddt=90))
print("Saved filtered structures")
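A quick sanity check on the filtered file is to confirm that no ATOM record slipped through with a B-factor below the cutoff. The sketch below scans PDB text directly; the two-record string is fabricated so the example is self-contained, but in practice you would pass the contents of a filtered file such as AF-P04637_confident.pdb:

```python
def min_bfactor(pdb_text):
    """Smallest B-factor (columns 61-66) across all ATOM records."""
    values = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM")
    ]
    return min(values) if values else None

# Two fabricated ATOM records for illustration
sample = (
    "ATOM      1  N   MET A   1      10.000  10.000  10.000  1.00 84.50           N\n"
    "ATOM      2  CA  MET A   1      11.000  10.500  10.200  1.00 91.33           C\n"
)
print(min_bfactor(sample))  # 84.5
```

If min_bfactor on the filtered output ever comes back below your threshold, the Select logic is letting something through.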
Complete pipeline — download to filtered PDB
Combining all steps into a single reusable function that takes a UniProt ID and returns a filtered PDB file, plus a DataFrame of pLDDT values for reporting:
import requests, os, time
import numpy as np
import pandas as pd
from Bio.PDB import PDBParser, PDBIO, Select
class HighConfidenceSelect(Select):
    def __init__(self, min_plddt=70.0):
        self.min_plddt = min_plddt
    def accept_residue(self, residue):
        if residue.id[0] != " ": return True
        try: return residue["CA"].get_bfactor() >= self.min_plddt
        except KeyError: return False

def process_alphafold(uniprot_id, outdir=".", min_plddt=70.0):
    """
    Download, parse, filter, and save an AlphaFold structure.
    Returns (filtered_pdb_path, plddt_dataframe).
    """
    # 1. Download
    url = f"https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_v4.pdb"
    r = requests.get(url, timeout=30)
    if r.status_code != 200:
        raise ValueError(f"Could not download {uniprot_id}: HTTP {r.status_code}")
    os.makedirs(outdir, exist_ok=True)
    raw_path = os.path.join(outdir, f"AF-{uniprot_id}_raw.pdb")
    with open(raw_path, "w") as f: f.write(r.text)
    # 2. Parse and extract pLDDT
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure(uniprot_id, raw_path)
    rows = []
    for res in structure[0]["A"].get_residues():
        if res.id[0] != " ": continue
        try: rows.append({"resi": res.id[1], "resn": res.get_resname(),
                          "plddt": res["CA"].get_bfactor()})
        except KeyError: pass
    df = pd.DataFrame(rows)
    # 3. Save filtered structure
    filtered_path = os.path.join(outdir, f"AF-{uniprot_id}_plddt{min_plddt:.0f}.pdb")
    io = PDBIO()
    io.set_structure(structure)
    io.save(filtered_path, HighConfidenceSelect(min_plddt))
    n_kept = (df["plddt"] >= min_plddt).sum()
    print(f"{uniprot_id}: {len(df)} total residues, {n_kept} kept (pLDDT≥{min_plddt}), "
          f"mean={df['plddt'].mean():.1f}")
    return filtered_path, df
# Run the full pipeline for several proteins
proteins = ["P04637", "P00533", "P35222"]
all_results = {}
for uid in proteins:
    path, df = process_alphafold(uid, outdir="./af_filtered", min_plddt=70)
    all_results[uid] = {"path": path, "df": df}
    time.sleep(0.5)
# Export a summary CSV across all proteins
summary = pd.DataFrame([{
    "uniprot_id": uid,
    "total_residues": len(v["df"]),
    "mean_plddt": v["df"]["plddt"].mean().round(1),
    "pct_above_70": (v["df"]["plddt"] >= 70).mean().round(3) * 100,
    "filtered_pdb": v["path"],
} for uid, v in all_results.items()])
summary.to_csv("alphafold_summary.csv", index=False)
print(summary.to_string(index=False))
AlphaFold parsing in one paragraph
Download AlphaFold structures from https://alphafold.ebi.ac.uk/files/AF-{UniProtID}-F1-model_v4.pdb with Python’s requests library — no API key needed. Parse with PDBParser(QUIET=True) exactly like any other PDB file. Extract per-residue pLDDT from the B-factor column using residue["CA"].get_bfactor(). Filter to confident residues (pLDDT ≥ 70) using a custom Select subclass in PDBIO, and save the filtered structure as a new PDB. For batch work, wrap the whole pipeline in a function, iterate over UniProt IDs with a 500 ms sleep between requests, and export a summary DataFrame. The filtered PDB files are then ready for docking with AutoDock Vina or MD preparation with GROMACS.