How to Validate Molecular Docking Results: Redocking, Cross-Docking, and Benchmarking


Running a docking calculation is easy. Knowing whether your results are trustworthy is hard — and it’s where most beginners stop short. This tutorial covers the full validation toolkit: why it matters, how to run self-docking validation step by step, how to interpret RMSD, and what reviewers expect to see before you publish.

Why validation is not optional

Molecular docking runs to completion and returns scores regardless of whether the inputs were correct, the grid box was positioned well, or the receptor was properly prepared. A badly set-up docking calculation fails silently — you get plausible-looking numbers from a meaningless search. Without validation, you have no way to distinguish results from a working protocol from results from a broken one.

This isn’t a theoretical concern. Retracted docking papers — and papers that survive review despite being based on invalid protocols — consistently share one feature: the authors did not perform self-docking validation before reporting results. The validation step is the standard of evidence that your docking protocol is doing what you think it is.

Reviewers at major journals in structural biology and medicinal chemistry now routinely request validation data. Providing it proactively — with the RMSD reported in your methods section — signals that you understand the limitations of the method and have controlled for the most common failure modes.

The three tiers of docking validation

Validation exists on a spectrum. Here is how to think about the three levels, and when each is appropriate:

Tier 1 — Required: Self-docking. Redock the co-crystallized ligand into the same structure it was solved in. This is the minimum acceptable validation for any published docking study.

Tier 2 — Recommended: Cross-docking. Dock known actives and inactives against your receptor and assess enrichment. More rigorous: it tests ranking ability, not just pose reproduction.

Tier 3 — Gold standard: Benchmarking. Test your protocol on established datasets (DUD-E, CASF-2016) with known outcomes. Required for methods papers; highly credible for drug discovery papers.

For most academic docking studies — where you’re docking your own compounds against a specific target — Tier 1 is the minimum and Tier 2 is strongly recommended if known actives and inactives are available in ChEMBL or the literature. Tier 3 is primarily for papers that describe a new docking method or protocol rather than a drug discovery application.

Self-docking validation: step by step

Step 1: Extract the co-crystallized ligand

You need two things: the prepared receptor PDBQT (from your standard preparation workflow) and the native ligand — the molecule that was co-crystallized with the protein in the PDB structure — in its crystal pose. The crystal pose is your ground truth.

Extract the native ligand from the original PDB file in PyMOL:

PyMOL command line
# Load the original structure
load 5KIR.pdb

# Select and save the co-crystallized ligand
# Replace IMN with your ligand's residue name
select native_ligand, resn IMN
save validation/native_ligand_crystal.pdb, native_ligand

This PDB file is your reference — the experimentally determined position of the ligand. Do not modify or energy-minimize it. You want the raw crystal coordinates.
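Before converting anything, it is worth a quick programmatic sanity check on the extracted file. The helper below is an optional sketch, not part of the standard workflow: it counts heavy atoms in a PDB-format file (the real-file path matches the PyMOL step above; substitute your own ligand and expected atom count).

```python
def count_heavy_atoms(pdb_lines):
    """Count non-hydrogen ATOM/HETATM records in PDB-format lines."""
    heavy = 0
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip()  # element symbol, PDB columns 77-78
            if element.upper() != "H":
                heavy += 1
    return heavy

# Two illustrative records, padded so the element symbol lands in columns 77-78
carbon = "HETATM    1  C1  IMN A 701      15.100  -8.000  22.300  1.00 20.00".ljust(76) + " C"
hydrogen = "HETATM    2  H1  IMN A 701      15.800  -8.500  22.900  1.00 20.00".ljust(76) + " H"
print(count_heavy_atoms([carbon, hydrogen]))  # prints 1

# On the real file:
# n = count_heavy_atoms(open("validation/native_ligand_crystal.pdb"))
# Indomethacin (IMN) has 25 heavy atoms; a count of 0 means the PyMOL
# selection grabbed nothing, and hundreds means it grabbed far more than
# the ligand.
```

A mismatched count at this stage is much cheaper to catch than a failed RMSD four steps later.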

Step 2: Prepare the native ligand for docking

Convert the crystal ligand to PDBQT using Open Babel, applying the same preparation parameters you use for all your ligands:

Terminal
obabel validation/native_ligand_crystal.pdb \
  -O validation/native_ligand.pdbqt \
  --partialcharge gasteiger \
  -p 7.4
Do not use --gen3d for the native ligand
When preparing the native ligand for self-docking, omit the --gen3d flag. You want Open Babel to assign charges to the existing crystal geometry, not regenerate a new 3D conformation. The --gen3d flag would overwrite the experimental coordinates with a computationally generated conformer, defeating the purpose of the validation.
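One way to confirm the crystal geometry survived the conversion is to diff the coordinates before and after. The sketch below assumes atom order is preserved by Open Babel (usually true for heavy atoms, but verify for your files) and uses the file paths from the commands above; PDB and PDBQT share the same coordinate columns.

```python
def coords(lines):
    """Extract (x, y, z) from ATOM/HETATM records; PDB and PDBQT share these columns."""
    return [
        (float(l[30:38]), float(l[38:46]), float(l[46:54]))
        for l in lines
        if l.startswith(("ATOM", "HETATM"))
    ]

def max_shift(before, after):
    """Largest per-atom displacement between two equally ordered coordinate lists."""
    return max(
        ((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2) ** 0.5
        for (x1, y1, z1), (x2, y2, z2) in zip(before, after)
    )

# Illustrative record with coordinates in the standard columns
rec = "HETATM    1  C1  IMN A 701      15.100  -8.000  22.300"
print(max_shift(coords([rec]), coords([rec])))  # 0.0 (geometry unchanged)

# On the real files (anything well above ~0.001 Å means the geometry was rebuilt):
# shift = max_shift(coords(open("validation/native_ligand_crystal.pdb")),
#                   coords(open("validation/native_ligand.pdbqt")))
```

If the shift is large, the preparation step regenerated the conformation and the validation would be meaningless.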
Step 3: Dock and collect the top pose

Run AutoDock Vina (or GNINA) using your standard config file with the native ligand as input. Use the same grid box, exhaustiveness, and all other settings you plan to use for your actual campaign — the point is to validate that specific protocol, not a differently configured one.

Terminal
# Vina 1.2+ writes the search log to stdout (the old --log flag was removed),
# so capture it with tee instead
vina \
  --receptor receptor/5KIR_receptor.pdbqt \
  --ligand validation/native_ligand.pdbqt \
  --center_x 15.234 --center_y -8.441 --center_z 22.178 \
  --size_x 20 --size_y 20 --size_z 20 \
  --exhaustiveness 16 \
  --num_modes 9 \
  --out validation/native_ligand_docked.pdbqt \
  | tee validation/validation.log

Now load both the crystal pose and the top docked pose in PyMOL side by side to compare visually before running the RMSD calculation:

PyMOL command line
# Load both poses
load validation/native_ligand_crystal.pdb, crystal_pose
load validation/native_ligand_docked.pdbqt, docked_poses

# Show both as sticks, colored differently
show sticks, crystal_pose
show sticks, docked_poses
color yellow, crystal_pose
# state 1 (the top-ranked pose) is what PyMOL displays by default
color cyan, docked_poses

# Load receptor for context
load receptor/5KIR_receptor.pdbqt, receptor
show cartoon, receptor
zoom crystal_pose

If the cyan (docked) pose overlaps well with the yellow (crystal) pose, your validation is likely passing. If they’re in completely different positions, stop here — do not proceed to dock your compounds until the preparation issue is diagnosed and fixed.

Step 4: Calculate the RMSD

The visual check tells you roughly whether the poses agree. The RMSD calculation gives you the number to report. Run it in PyMOL between the crystal pose and the top-ranked docked pose:

PyMOL command line
# RMSD between the crystal pose and mode 1 (state 1) of the docked output
# rms_cur does not superpose — it measures as-is, which is what you want
rms_cur crystal_pose, docked_poses, mobile_state=1, target_state=1
PyMOL output
Executive: RMS = 1.42 (21 to 21 atoms)

An RMSD of 1.42 Å for 21 matched atoms indicates successful self-docking validation.

Alternatively, calculate RMSD from the command line using Open Babel or an RDKit script if you need it for automated validation pipelines:

Terminal — RDKit RMSD calculation
python3 - <<'EOF'
# Symmetry-aware RMSD between the crystal and docked poses (both as PDB files).
# Convert the top docked mode from PDBQT to PDB first, e.g.:
#   obabel validation/native_ligand_docked.pdbqt -f 1 -l 1 \
#     -O validation/native_ligand_docked_mode1.pdb
from rdkit import Chem
from rdkit.Chem import rdMolAlign

crystal = Chem.MolFromPDBFile('validation/native_ligand_crystal.pdb', removeHs=False)
docked = Chem.MolFromPDBFile('validation/native_ligand_docked_mode1.pdb', removeHs=False)

# CalcRMS accounts for molecular symmetry but does not realign the poses
rmsd = rdMolAlign.CalcRMS(crystal, docked)
print(f'RMSD: {rmsd:.3f} Å')
EOF

Interpreting RMSD results

The 2.0 Å threshold is the field-standard cutoff for successful self-docking validation. Here is how to interpret the full range of outcomes:

RMSD < 1.0 Å — Excellent reproduction. Your protocol is well-validated. The binding mode is reproduced with near-crystallographic accuracy. Report this confidently.

RMSD 1.0–2.0 Å — Acceptable; validation passes. The overall binding mode is captured. Minor deviations in flexible substituents are normal and expected. This is the most common range for successful validation and is fully acceptable for publication.

RMSD 2.0–3.0 Å — Borderline; investigate before proceeding. The core binding orientation may be correct, but significant deviations exist. Check whether the overall pose topology is right even if individual atoms are displaced. Try increasing exhaustiveness to 32 and re-running before concluding the protocol has failed.

RMSD > 3.0 Å — Validation failed; do not proceed. The docked pose is substantially different from the crystal structure. Diagnose before docking any compounds: check the grid box center, protonation states, alternate conformations, and whether the co-crystallized ligand was fully removed from the receptor.
When RMSD fails: a diagnostic checklist
Work through these in order: (1) Is the grid box centered correctly on the binding site? Recheck with centerofmass in PyMOL. (2) Were alternate conformations removed from the receptor? (3) Are protonation states correct for active site residues? (4) Was the co-crystallized ligand fully removed from the receptor before docking? (5) Did the ligand preparation preserve the correct ionization state? Fixing any one of these often resolves a failing RMSD.
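For item (1) on the checklist, the grid center can also be checked outside PyMOL: compute the ligand centroid directly from the crystal PDB and compare it with the --center_x/y/z values in your config. A small sketch (real-file path as in the steps above):

```python
def centroid(pdb_lines):
    """Mean (x, y, z) over all ATOM/HETATM records in PDB-format lines."""
    pts = [
        (float(l[30:38]), float(l[38:46]), float(l[46:54]))
        for l in pdb_lines
        if l.startswith(("ATOM", "HETATM"))
    ]
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))

# Two illustrative atoms at x = 10 and x = 20 give centroid x = 15
a = "HETATM    1  C1  IMN A 701      10.000   0.000   0.000"
b = "HETATM    2  C2  IMN A 701      20.000   0.000   0.000"
print(centroid([a, b]))  # (15.0, 0.0, 0.0)

# On the real ligand, the result should sit within a couple of Å of the grid
# center passed to Vina (15.234, -8.441, 22.178 in the run above):
# print(centroid(open("validation/native_ligand_crystal.pdb")))
```

A centroid far from the configured grid center means the box was never over the binding site, and no amount of exhaustiveness will fix that.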

Cross-docking validation

Self-docking tells you that your protocol can reproduce a known pose. Cross-docking asks a harder and more relevant question: can your protocol correctly rank known active compounds above known inactives?

The workflow: collect a set of known actives (compounds with published IC50 or Kd data against your target) and known inactives (compounds tested and confirmed to not bind) from ChEMBL or the literature. Dock all of them against your receptor using your validated protocol. Then calculate enrichment metrics — specifically the area under the ROC curve (AUC) and enrichment factor at 1% (EF1%) — to quantify how well your docking scores separate actives from inactives.

  • An AUC of 0.5 means your scores are no better than random — the protocol cannot distinguish actives from inactives
  • An AUC of 0.7–0.8 is considered good for docking
  • An AUC above 0.8 is excellent and publication-worthy as a standalone result
  • EF1% measures how many actives are in your top 1% of compounds — a value of 10 means your protocol finds actives at 10× the rate of random selection
Where to find known actives and inactives for your target
ChEMBL (ebi.ac.uk/chembl) is the primary source — search by target name and filter for compounds with IC50 data. DUD-E (dude.docking.org) provides pre-curated active/inactive sets for 102 targets specifically designed for docking benchmarking. If your target is in DUD-E, using it is the most credible validation you can do.
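Both enrichment metrics can be computed directly from a table of docking scores with no extra libraries. The sketch below uses made-up score lists; Vina scores are in kcal/mol and more negative is better, which is why "better" means "lower" here. (A library such as scikit-learn can also compute the AUC via roc_auc_score, but negate the Vina scores first, since it assumes higher = better.)

```python
def roc_auc(actives, inactives):
    """AUC via the Mann-Whitney statistic: the probability that a random
    active scores better (lower, for Vina) than a random inactive."""
    wins = 0.0
    for a in actives:
        for d in inactives:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(inactives))

def enrichment_factor(actives, inactives, fraction=0.01):
    """EF: hit rate in the top fraction of the ranked list vs the overall hit rate."""
    ranked = sorted([(s, 1) for s in actives] + [(s, 0) for s in inactives])
    n_top = max(1, round(fraction * len(ranked)))
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (len(actives) / len(ranked))

# Hypothetical Vina scores (kcal/mol)
actives = [-10.2, -9.8, -9.5]
inactives = [-8.0, -7.1, -6.5, -5.9]
print(roc_auc(actives, inactives))                  # 1.0 (perfect separation)
print(enrichment_factor(actives, inactives, 0.25))  # top 25% of 7 = 2 compounds, both active
```

In a real cross-docking run you would parse these score lists out of your Vina logs and use fraction=0.01 for EF1%; a seven-compound toy set obviously cannot support a meaningful 1% cut.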

Benchmarking against known datasets

If you are describing a new docking protocol, a new scoring function, or comparing docking programs rather than reporting drug discovery results, you need to validate on established community benchmarks. Two datasets dominate the field:

Dataset | Size | What it tests | Used for
CASF-2016 | 285 complexes | Pose prediction, scoring, ranking, docking power | The standard benchmark for comparing docking programs. Report success rate at the 2 Å RMSD threshold.
DUD-E | 102 targets, ~22k actives | Virtual screening enrichment (AUC, EF) | Evaluating virtual screening protocols. Strong community recognition — reviewers know this dataset.
PDBbind | ~20,000 complexes | Binding affinity prediction | Testing whether scores correlate with experimental affinities. Used for scoring function development.
MUV | 17 targets | Virtual screening, designed to be harder than DUD-E | Stricter virtual screening benchmark with more realistic decoys.

For most drug discovery docking papers, CASF-2016 self-docking results plus DUD-E enrichment for your specific target class is the combination reviewers find most convincing.
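The CASF "docking power" figure mentioned above is simply the fraction of complexes whose top-ranked pose falls under the 2 Å cutoff, which is trivial to compute once you have a per-complex RMSD list (the values below are hypothetical):

```python
def success_rate(rmsds, threshold=2.0):
    """Fraction of top-ranked poses within threshold Å of the crystal pose."""
    return sum(1 for r in rmsds if r < threshold) / len(rmsds)

top_pose_rmsds = [1.42, 0.85, 3.10, 1.95, 2.40]  # hypothetical, one per complex
print(f"Success rate @ 2.0 Å: {success_rate(top_pose_rmsds):.0%}")  # 3 of 5 pass
```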

What to include in your methods section

A complete, reproducible docking methods section should contain everything another researcher needs to independently reproduce your protocol. Reviewers increasingly use this checklist:

Methods section checklist for publication
Software name and version (e.g. AutoDock Vina 1.2.5)
PDB accession code and resolution of the receptor structure used
Receptor preparation steps: what was removed (waters, ligands, alternates), how hydrogens were added, which charge model was used
Protonation state handling — tool used (H++, PropKa, manual) and target pH
Ligand preparation: source of 3D conformers, charge model, protonation method
Grid box center coordinates and dimensions (Å)
Exhaustiveness setting and number of poses generated
Self-docking validation: native ligand used, RMSD of top-ranked pose vs crystal structure
Visualization software used for pose analysis (PyMOL version, UCSF ChimeraX, etc.)
If virtual screening: score cutoff used for hit selection, number of compounds screened and hits identified
What a complete validation statement looks like in a methods section
“Docking was performed using AutoDock Vina 1.2.5. The receptor was prepared by removing water molecules and the co-crystallized inhibitor, adding polar hydrogens with AutoDockTools, and assigning Gasteiger partial charges. Protonation states were assigned at pH 7.4 using H++. The docking grid box (20 × 20 × 20 Å) was centered on the centroid of the co-crystallized ligand (PDB: 5KIR). Self-docking validation was performed by redocking the native ligand (IMN) into the prepared receptor; the top-ranked pose reproduced the crystal structure with an RMSD of 1.42 Å, confirming protocol validity. Exhaustiveness was set to 16 for all docking runs.”

Validation in three sentences

Self-docking validation is the minimum standard for any published docking study: dock the co-crystallized ligand back into its own receptor, measure RMSD against the crystal pose, and require RMSD < 2.0 Å before trusting any results from that protocol. If it fails, diagnose the preparation — not the algorithm. If it passes, report the RMSD in your methods section and proceed with confidence.

Validation is not a formality. It is the evidence that your computational results mean something.
