How to Find and Analyze Protein-Protein Interfaces with BioPython
Protein-protein interfaces determine binding specificity, drug targetability, and the functional logic of multi-subunit complexes. BioPython’s NeighborSearch class finds interface residues in milliseconds — and from there you can extract contact pairs, compute buried surface area, and produce a complete interface report programmatically.
The contact-distance approach to interface finding
The standard way to define a protein-protein interface computationally is by distance: any residue in chain A with at least one atom within a defined cutoff distance of any atom in chain B is an interface residue. The choice of cutoff affects what you capture.
Five ångströms is the most widely used cutoff in published interface analyses and is the default we’ll use here. Use 4 Å when you specifically want direct atomic contacts (useful for identifying H-bond partners before calculating distances), and 8 Å when studying allosteric networks or predicting mutation effects that propagate through the interface.
NeighborSearch — the right tool for the job
NeighborSearch builds a k-d tree from a list of atoms and then finds all atoms within a specified radius of a query point in O(log n) time. For interface analysis with thousands of atoms, this is orders of magnitude faster than a nested loop.
from Bio.PDB import PDBParser, NeighborSearch
parser = PDBParser(QUIET=True)
structure = parser.get_structure("complex", "complex.pdb")
model = structure[0]
# Get all atoms from chain B to search against
chain_b_atoms = list(model["B"].get_atoms())
# Build the k-d tree from chain B atoms
ns = NeighborSearch(chain_b_atoms)
# Query: which chain B atoms are within 5 Å of this specific atom?
query_atom = model["A"][100]["CA"]
nearby = ns.search(query_atom.get_coord(), 5.0, level="A")
print(f"Chain B atoms within 5Å of A-100-CA: {len(nearby)}")
The level argument controls what search() returns. Use "A" for individual atoms, "R" for complete residues, or "C" for chains. For interface analysis, "R" is usually the most useful — it returns complete residue objects rather than individual atoms, so you don’t need to deduplicate.
| level= | Returns | Best for |
|---|---|---|
| “A” | Individual Atom objects | Finding specific contact atoms, H-bond donors/acceptors |
| “R” | Complete Residue objects | Interface residue lists, contact maps |
| “C” | Chain objects | Checking which chains are in contact |
| “M” | Model objects | Rarely useful for interface analysis |
| “S” | Structure objects | Only returns the whole structure if any atom is in range |
Finding interface residues
def get_interface_residues(structure, chain_a_id, chain_b_id, cutoff=5.0):
"""
Find residues at the interface between two chains.
Returns (interface_A, interface_B) — sets of Residue objects.
"""
model = structure[0]
chain_a = model[chain_a_id]
chain_b = model[chain_b_id]
# Build NeighborSearch from chain B atoms
ns_b = NeighborSearch(list(chain_b.get_atoms()))
# Build NeighborSearch from chain A atoms (for finding chain B contacts)
ns_a = NeighborSearch(list(chain_a.get_atoms()))
interface_a = set()
interface_b = set()
# For each residue in chain A, check if any of its atoms are within cutoff of chain B
for res_a in chain_a.get_residues():
if res_a.id[0] != " ": continue # skip HETATM
for atom in res_a.get_atoms():
nearby_b = ns_b.search(atom.get_coord(), cutoff, level="R")
if nearby_b:
interface_a.add(res_a)
interface_b.update(nearby_b)
break # one atom contact is enough to flag the residue
return interface_a, interface_b
# Run the analysis
iface_a, iface_b = get_interface_residues(structure, "A", "B", cutoff=5.0)
print(f"Interface residues — Chain A: {len(iface_a)}")
print(f"Interface residues — Chain B: {len(iface_b)}")
# Print residue identities
print("\nChain A interface:")
for res in sorted(iface_a, key=lambda r: r.id[1]):
print(f" {res.get_resname()} {res.id[1]}")
# SER 18
# TYR 36
# GLU 72 ...
Extracting specific contact pairs
For a detailed interface report — especially when you need to identify which specific residues in chain A contact which specific residues in chain B — extract the contacts as pairs rather than as flat sets:
def get_contact_pairs(structure, chain_a_id, chain_b_id, cutoff=5.0):
"""Return a list of (resA, resB, min_distance) contact pairs."""
model = structure[0]
chain_a = model[chain_a_id]
chain_b = model[chain_b_id]
ns_b = NeighborSearch(list(chain_b.get_atoms()))
pairs = {} # key: (resi_A, resi_B) → min distance seen
for res_a in chain_a.get_residues():
if res_a.id[0] != " ": continue
for atom_a in res_a.get_atoms():
nearby_atoms = ns_b.search(atom_a.get_coord(), cutoff, level="A")
for atom_b in nearby_atoms:
res_b = atom_b.get_parent()
if res_b.id[0] != " ": continue
dist = atom_a - atom_b
key = (res_a.id[1], res_b.id[1])
if key not in pairs or dist < pairs[key]["min_dist"]:
pairs[key] = {
"resn_a": res_a.get_resname(),
"resi_a": res_a.id[1],
"resn_b": res_b.get_resname(),
"resi_b": res_b.id[1],
"min_dist": round(dist, 2),
}
return sorted(pairs.values(), key=lambda x: x["min_dist"])
contact_pairs = get_contact_pairs(structure, "A", "B", cutoff=5.0)
print(f"Total contact pairs: {len(contact_pairs)}")
print("Closest 5 contacts:")
for p in contact_pairs[:5]:
print(f" {p['resn_a']}{p['resi_a']} — {p['resn_b']}{p['resi_b']}: {p['min_dist']} Å")
# GLU72 — LYS156: 2.81 Å
# TYR36 — ASN201: 3.14 Å
Estimating interface area
A rough estimate of buried surface area — the gold-standard metric for interface size — can be approximated by counting interface atoms, since each atom contributes roughly 10–25 ų of buried area. For a rigorous buried surface area calculation, use DSSP or FreeSASA (a Python binding available via pip install freesasa). The atom-count approach below is suitable for comparative analysis when precise values aren’t required:
def estimate_interface_area(interface_residues):
"""
Rough BSA estimate: ~25 Ų per heavy atom at interface.
Suitable for relative comparisons; use FreeSASA for publication values.
"""
total_atoms = sum(
sum(1 for atom in res.get_atoms() if atom.element != "H")
for res in interface_residues
)
return total_atoms * 25 # ų
area_a = estimate_interface_area(iface_a)
area_b = estimate_interface_area(iface_b)
print(f"Estimated interface area (chain A): ~{area_a} Ų")
print(f"Estimated interface area (chain B): ~{area_b} Ų")
print(f"Total interface area estimate: ~{area_a + area_b} Ų")
# Typical values for reference:
# Small interface (transient): ~500 – 1500 Ų
# Medium interface: ~1500 – 3000 Ų
# Large, stable interface: > 3000 Ų
pip install freesasa) and calculate the actual solvent-accessible surface area of each chain alone vs in the complex. BSA = (SASA_A + SASA_B − SASA_complex) / 2. This gives the standard two-chain BSA value that is expected in protein interface papers. The atom-count estimate above is useful for quick comparisons but should not be reported as BSA in a publication.
Classifying contacts by type
# Classify contacts by interaction type based on residue chemistry
CHARGED_POS = {"ARG", "LYS", "HIS"}
CHARGED_NEG = {"ASP", "GLU"}
POLAR = {"SER", "THR", "ASN", "GLN", "TYR", "CYS"}
HYDROPHOBIC = {"ALA", "VAL", "ILE", "LEU", "MET", "PHE", "TRP", "PRO"}
def classify_contact(resn_a, resn_b):
if (resn_a in CHARGED_POS and resn_b in CHARGED_NEG) or \
(resn_a in CHARGED_NEG and resn_b in CHARGED_POS):
return "salt_bridge"
if resn_a in HYDROPHOBIC and resn_b in HYDROPHOBIC:
return "hydrophobic"
if resn_a in POLAR or resn_b in POLAR:
return "polar"
return "other"
for p in contact_pairs:
p["contact_type"] = classify_contact(p["resn_a"], p["resn_b"])
Tabulating and plotting results
import pandas as pd
import matplotlib.pyplot as plt
# Build a DataFrame from contact pairs
df = pd.DataFrame(contact_pairs)
print(df.head(10).to_string(index=False))
df.to_csv("interface_contacts.csv", index=False)
# Contact type breakdown
type_counts = df["contact_type"].value_counts()
print("\nContact type breakdown:")
print(type_counts)
# Plot: distance distribution
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))
ax1.hist(df["min_dist"], bins=20, color="#4a7fc1", edgecolor="white")
ax1.set_xlabel("Minimum distance (Å)")
ax1.set_ylabel("Number of contact pairs")
ax1.set_title("Interface contact distance distribution")
ax1.axvline(3.5, color="#0d6e54", linestyle="--", label="H-bond threshold")
ax1.legend(fontsize=9)
colors = {"salt_bridge": "#E24B4A", "hydrophobic": "#ba7517",
"polar": "#4a7fc1", "other": "#928e85"}
ax2.bar(type_counts.index, type_counts.values,
color=[colors.get(t, "#928e85") for t in type_counts.index])
ax2.set_xlabel("Contact type")
ax2.set_ylabel("Count")
ax2.set_title("Contact types at interface")
plt.tight_layout()
plt.savefig("interface_analysis.png", dpi=300, bbox_inches="tight")
plt.close()
print("Saved interface_analysis.png")
Complete interface analysis function
from Bio.PDB import PDBParser, NeighborSearch
import pandas as pd
def analyze_interface(pdb_path, chain_a="A", chain_b="B", cutoff=5.0):
"""Full interface analysis: returns summary dict and contact DataFrame."""
structure = PDBParser(QUIET=True).get_structure("s", pdb_path)
model = structure[0]
ns_b = NeighborSearch(list(model[chain_b].get_atoms()))
contacts, iface_a, iface_b = [], set(), set()
for res_a in model[chain_a].get_residues():
if res_a.id[0] != " ": continue
min_dist_this_res = cutoff + 1
contact_res_b = None
for atom_a in res_a.get_atoms():
for atom_b in ns_b.search(atom_a.get_coord(), cutoff, "A"):
d = atom_a - atom_b
res_b = atom_b.get_parent()
if res_b.id[0] != " ": continue
iface_a.add(res_a); iface_b.add(res_b)
if d < min_dist_this_res:
min_dist_this_res = d; contact_res_b = res_b
if contact_res_b:
contacts.append({
"resn_a": res_a.get_resname(), "resi_a": res_a.id[1],
"resn_b": contact_res_b.get_resname(),
"resi_b": contact_res_b.id[1],
"min_dist": round(min_dist_this_res, 2),
})
df = pd.DataFrame(contacts).sort_values("min_dist").reset_index(drop=True)
summary = {
"interface_residues_A": len(iface_a),
"interface_residues_B": len(iface_b),
"total_contacts": len(df),
"mean_contact_dist": round(df["min_dist"].mean(), 2),
"close_contacts_lt4": (df["min_dist"] < 4.0).sum(),
}
return summary, df
summary, contacts_df = analyze_interface("complex.pdb")
for k, v in summary.items():
print(f"{k}: {v}")
contacts_df.to_csv("interface_contacts.csv", index=False)
iface_b.add(res_b) line inside the loop. If you only search in one direction — atoms of A against all of B — you may miss chain B residues at the edge of the interface that have close contacts with chain A but whose atoms aren’t the closest ones. The complete function above handles this correctly because it searches A against B and records both sides.
Interface analysis in one paragraph
Build a NeighborSearch from chain B atoms, then iterate over chain A residues — for each atom in each chain A residue, search for chain B atoms within your cutoff (5 Å is standard). Any chain A residue with at least one atom hit is an interface residue; collect the chain B residue on the other side of each contact to build the full picture. Extract contact pairs as (resA, resB, min_distance) tuples, classify them by residue chemistry, load into a pandas DataFrame, and export to CSV for the paper. For publication-quality buried surface area values use FreeSASA — the atom-count approximation is suitable only for comparative screening.