How to Validate Molecular Docking Results: Redocking, Cross-Docking, and Benchmarking
Running a docking calculation is easy. Knowing whether your results are trustworthy is hard — and it’s where most beginners stop short. This tutorial covers the full validation toolkit: why it matters, how to run self-docking validation step by step, how to interpret RMSD, and what reviewers expect to see before you publish.
Why validation is not optional
Molecular docking runs to completion and returns scores regardless of whether the inputs were correct, the grid box was positioned well, or the receptor was properly prepared. A badly set-up docking calculation fails silently — you get plausible-looking numbers from a meaningless search. Without validation, you have no way to distinguish the output of a working protocol from the output of a broken one.
This isn’t a theoretical concern. Retracted docking papers — and papers that survive review despite being based on invalid protocols — consistently share one feature: the authors did not perform self-docking validation before reporting results. The validation step is the standard of evidence that your docking protocol is doing what you think it is.
Reviewers at major journals in structural biology and medicinal chemistry now routinely request validation data. Providing it proactively — with the RMSD reported in your methods section — signals that you understand the limitations of the method and have controlled for the most common failure modes.
The three tiers of docking validation
Validation exists on a spectrum. Here is how to think about the three levels, and when each is appropriate:
- Tier 1 — self-docking (redocking): re-dock the co-crystallized ligand into its own receptor and check that the top pose reproduces the crystal pose within 2.0 Å RMSD
- Tier 2 — cross-docking / enrichment: dock known actives and known inactives and confirm the scores rank actives above inactives (AUC, EF1%)
- Tier 3 — community benchmarks: evaluate the protocol on established datasets such as CASF-2016 or DUD-E
For most academic docking studies — where you’re docking your own compounds against a specific target — Tier 1 is the minimum and Tier 2 is strongly recommended if known actives and inactives are available in ChEMBL or the literature. Tier 3 is primarily for papers that describe a new docking method or protocol rather than a drug discovery application.
Self-docking validation: step by step
You need two things: the prepared receptor PDBQT (from your standard preparation workflow) and the native ligand — the molecule that was co-crystallized with the protein in the PDB structure — in its crystal pose. The crystal pose is your ground truth.
Extract the native ligand from the original PDB file in PyMOL:
# Load the original structure
load 5KIR.pdb
# Select and save the co-crystallized ligand
# Replace IMN with your ligand's residue name
select native_ligand, resn IMN
save validation/native_ligand_crystal.pdb, native_ligand
This PDB file is your reference — the experimentally determined position of the ligand. Do not modify or energy-minimize it. You want the raw crystal coordinates.
Convert the crystal ligand to PDBQT using Open Babel, applying the same preparation parameters you use for all your ligands:
obabel validation/native_ligand_crystal.pdb \
-O validation/native_ligand.pdbqt \
--partialcharge gasteiger \
-p 7.4
Critically, do not add the --gen3d flag. You want Open Babel to assign charges to the existing crystal geometry, not regenerate a new 3D conformation. The --gen3d flag would overwrite the experimental coordinates with a computationally generated conformer, defeating the purpose of the validation.
Run AutoDock Vina (or GNINA) using your standard config file with the native ligand as input. Use the same grid box, exhaustiveness, and all other settings you plan to use for your actual campaign — the point is to validate that specific protocol, not a differently configured one.
vina \
--receptor receptor/5KIR_receptor.pdbqt \
--ligand validation/native_ligand.pdbqt \
--center_x 15.234 --center_y -8.441 --center_z 22.178 \
--size_x 20 --size_y 20 --size_z 20 \
--exhaustiveness 16 \
--num_modes 9 \
--out validation/native_ligand_docked.pdbqt \
--log validation/validation.log
Now load both the crystal pose and the top docked pose in PyMOL side by side to compare visually before running the RMSD calculation:
# Load both poses
load validation/native_ligand_crystal.pdb, crystal_pose
load validation/native_ligand_docked.pdbqt, docked_poses
# Show both as sticks, colored differently
show sticks, crystal_pose
show sticks, docked_poses
color yellow, crystal_pose
color cyan, docked_poses
# Load receptor for context
load receptor/5KIR_receptor.pdbqt, receptor
show cartoon, receptor
zoom crystal_pose
If the cyan (docked) pose overlaps well with the yellow (crystal) pose, your validation is likely passing. If they’re in completely different positions, stop here — do not proceed to dock your compounds until the preparation issue is diagnosed and fixed.
The visual check tells you roughly whether the poses agree. The RMSD calculation gives you the number to report. Run it in PyMOL between the crystal pose and the top-ranked docked pose:
# RMSD between the crystal pose and the top-ranked docked pose (state 1)
# rms_cur does not superpose — it measures coordinates as-is, which is what you want
rms_cur docked_poses, crystal_pose, mobile_state=1, target_state=1
For example, output reporting an RMSD of 1.42 Å over 21 matched atoms would indicate successful self-docking validation — comfortably under the 2.0 Å threshold.
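The number being reported has a simple definition: the square root of the mean squared distance between matched atom pairs, with no superposition. A minimal pure-Python sketch with illustrative coordinates (not real ligand data):

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two matched coordinate lists, measured as-is (no superposition)."""
    assert len(coords_a) == len(coords_b), "atom lists must be matched 1:1"
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

# Illustrative coordinates: each docked atom displaced by 1 Å along x
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(f"RMSD: {rmsd(crystal, docked):.2f} Å")  # → RMSD: 1.00 Å
```

Real tools add symmetry correction on top of this (equivalent atoms in, say, a phenyl ring can be matched either way), which is why the RDKit route below is preferred for automation.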
Alternatively, calculate RMSD from the command line using Open Babel or an RDKit script if you need it for automated validation pipelines:
python3 - <<EOF
# Symmetry-aware heavy-atom RMSD between crystal and docked poses
# Convert the top docked pose from PDBQT to PDB first, e.g.:
#   obabel validation/native_ligand_docked.pdbqt -O validation/native_ligand_docked_mode1.pdb -f 1 -l 1
from rdkit import Chem
from rdkit.Chem import rdMolAlign

# removeHs=True: heavy-atom RMSD is the convention, and hydrogen counts
# often differ between the crystal file and the prepared ligand
crystal = Chem.MolFromPDBFile('validation/native_ligand_crystal.pdb', removeHs=True)
docked = Chem.MolFromPDBFile('validation/native_ligand_docked_mode1.pdb', removeHs=True)

# CalcRMS is symmetry-corrected and does not superpose the structures
rmsd = rdMolAlign.CalcRMS(docked, crystal)
print(f'RMSD: {rmsd:.3f} Å')
EOF
Interpreting RMSD results
The 2.0 Å threshold is the field-standard cutoff for successful self-docking validation. Here is how to interpret the full range of outcomes:
- Below 2.0 Å: pass — the protocol reproduces the crystal pose; report the value and proceed
- 2.0–3.0 Å: marginal — the binding site was found but the pose geometry is off; inspect visually and re-check preparation before proceeding
- Above 3.0 Å: fail — the protocol is not reproducing the known pose, and results from it should not be trusted
If validation fails, work through the usual suspects: (1) Is the grid box centered on the binding site? Check the native ligand's center with centerofmass in PyMOL. (2) Were alternate conformations removed from the receptor? (3) Are protonation states correct for active site residues? (4) Was the co-crystallized ligand fully removed from the receptor before docking? (5) Did the ligand preparation preserve the correct ionization state? Fixing any one of these often resolves a failing RMSD.
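Checklist item (4) — a leftover copy of the ligand in the receptor file — is easy to screen for programmatically. A minimal sketch; the sample lines and the IMN residue name are illustrative, and in practice you would pass the lines of your actual receptor PDBQT:

```python
def leftover_ligand_atoms(lines, resname):
    """Yield ATOM/HETATM lines whose residue name matches the co-crystallized ligand."""
    for line in lines:
        # Residue name occupies columns 18-20 of PDB-style records
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == resname:
            yield line.rstrip()

# Tiny inline stand-in for a receptor file;
# in practice: leftover_ligand_atoms(open("receptor/5KIR_receptor.pdbqt"), "IMN")
sample = [
    "ATOM      1  N   ALA A   1      11.104  13.207   2.100  1.00  0.00     0.175 N",
    "HETATM  900  C1  IMN A 400      15.120  -8.300  22.050  1.00  0.00     0.050 C",
]
hits = list(leftover_ligand_atoms(sample, "IMN"))
print("clean receptor" if not hits else f"WARNING: {len(hits)} leftover ligand atoms")
```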
Cross-docking validation
Self-docking tells you that your protocol can reproduce a known pose. Cross-docking asks a harder and more relevant question: can your protocol correctly rank known active compounds above known inactives? (Strictly speaking, "cross-docking" also refers to docking a ligand into a receptor taken from a different crystal structure of the same protein; the enrichment workflow described here is the form reviewers most often ask for.)
The workflow: collect a set of known actives (compounds with published IC50 or Kd data against your target) and known inactives (compounds tested and confirmed to not bind) from ChEMBL or the literature. Dock all of them against your receptor using your validated protocol. Then calculate enrichment metrics — specifically the area under the ROC curve (AUC) and enrichment factor at 1% (EF1%) — to quantify how well your docking scores separate actives from inactives.
- An AUC of 0.5 means your scores are no better than random — the protocol cannot distinguish actives from inactives
- An AUC of 0.7–0.8 is considered good for docking
- An AUC above 0.8 is excellent and publication-worthy as a standalone result
- EF1% measures how many actives are in your top 1% of compounds — a value of 10 means your protocol finds actives at 10× the rate of random selection
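Both metrics can be computed directly from a list of docking scores and activity labels with no external dependencies. A minimal sketch — the scores and labels are illustrative; AUC is computed as the fraction of active/inactive pairs in which the active scores better, which is equivalent to the ROC area (ties in scores are not handled here):

```python
def enrichment_metrics(scores, labels, top_frac=0.01):
    """ROC AUC and enrichment factor from docking scores.

    scores: Vina-style scores (more negative = better); labels: 1 = active, 0 = inactive.
    """
    n, n_act = len(scores), sum(labels)
    n_inact = n - n_act
    # Activity labels in best-first docking order
    ranked = [labels[i] for i in sorted(range(n), key=lambda i: scores[i])]
    # AUC = fraction of (active, inactive) pairs where the active scores better
    wins, actives_seen = 0, 0
    for is_active in ranked:
        if is_active:
            actives_seen += 1
        else:
            wins += actives_seen
    auc = wins / (n_act * n_inact)
    # EF at top_frac: hit rate in the top slice relative to the overall hit rate
    k = max(1, int(n * top_frac))
    ef = (sum(ranked[:k]) / k) / (n_act / n)
    return auc, ef

# Toy example: 3 actives among 10 compounds, actives mostly scoring better
scores = [-9.1, -8.7, -8.5, -7.9, -7.4, -7.0, -6.8, -6.5, -6.1, -5.9]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
auc, ef = enrichment_metrics(scores, labels, top_frac=0.1)
print(f"AUC = {auc:.2f}, EF10% = {ef:.1f}")  # → AUC = 0.95, EF10% = 3.3
```

For real campaigns with thousands of compounds, scikit-learn's roc_auc_score gives the same AUC with tie handling, but the logic above is all there is to it.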
Benchmarking against known datasets
If you are describing a new docking protocol, a new scoring function, or comparing docking programs rather than reporting drug discovery results, you need to validate on established community benchmarks. Two datasets dominate the field:
| Dataset | Size | What it tests | Used for |
|---|---|---|---|
| CASF-2016 | 285 complexes | Pose prediction, scoring, ranking, docking power | The standard benchmark for comparing docking programs. Report success rate at 2 Å RMSD threshold. |
| DUD-E | 102 targets, ~22k actives | Virtual screening enrichment (AUC, EF) | Evaluating virtual screening protocols. Strong community recognition — reviewers know this dataset. |
| PDBbind | ~20,000 complexes | Binding affinity prediction | Testing whether scores correlate with experimental affinities. Used for scoring function development. |
| MUV | 17 targets | Virtual screening — designed to be harder than DUD-E | Stricter virtual screening benchmark with more realistic decoys. |
For most drug discovery docking papers, CASF-2016 self-docking results plus DUD-E enrichment for your specific target class is the combination reviewers find most convincing.
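The pose-prediction "success rate at 2 Å" reported on such benchmarks is simply the fraction of complexes whose top-ranked pose lands within the threshold. A trivial sketch with illustrative RMSD values:

```python
def docking_success_rate(rmsds, threshold=2.0):
    """Fraction of complexes whose top-ranked pose falls within the RMSD threshold."""
    return sum(r <= threshold for r in rmsds) / len(rmsds)

# Illustrative top-pose RMSDs (Å) across a small benchmark set
rmsds = [0.8, 1.4, 2.6, 1.1, 3.9, 0.6, 1.9, 2.2]
print(f"Success rate at 2 Å: {docking_success_rate(rmsds):.1%}")  # → 62.5%
```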
What to include in your methods section
A complete, reproducible docking methods section should contain everything another researcher needs to independently reproduce your protocol. Reviewers increasingly check for the following:
- Software name and exact version
- PDB ID of the receptor structure and all preparation steps (waters, alternate conformations, protonation)
- Ligand preparation details: charge model, protonation pH, conformer generation settings
- Grid box center coordinates and dimensions
- Search parameters: exhaustiveness and number of output modes
- Self-docking validation RMSD for the co-crystallized ligand
- Enrichment metrics (AUC, EF1%) if known actives and inactives were docked
Validation in three sentences
Self-docking validation is the minimum standard for any published docking study: dock the co-crystallized ligand back into its own receptor, measure RMSD against the crystal pose, and require RMSD < 2.0 Å before trusting any results from that protocol. If it fails, diagnose the preparation — not the algorithm. If it passes, report the RMSD in your methods section and proceed with confidence.
Validation is not a formality. It is the evidence that your computational results mean something.