10 Common Molecular Docking Mistakes (And How to Avoid Every One)
Most bad docking results don’t come from the algorithm — they come from avoidable preparation and interpretation errors that produce wrong answers without any error message. Here are the ten mistakes that appear most often in troubleshooting forums, paper review comments, and lab group postmortems, with exact fixes for each.
Mistake 1: Wrong grid box
The grid box tells Vina where to search for binding poses. Get it wrong and you’re searching the wrong region of the protein, or searching such a large region that the algorithm spreads its sampling across irrelevant surface area and misses the real binding site entirely. This is the most common source of suspiciously bad scores with no error message.
A box that’s too small will clip the ligand, preventing it from sampling its full conformational space. A box that’s too large wastes exhaustiveness budget on empty space. A miscentered box will dock the ligand to the wrong site altogether — and if the score looks reasonable, you may not even notice.
Fix: Center the grid box on the co-crystallized ligand using PyMOL’s centerofmass command on the native ligand selection. Size the box so it extends 8–10 Å beyond the largest dimension of your ligand. After docking, verify in PyMOL that the top pose sits inside the pocket before reporting any results.
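A minimal sketch of the box calculation, assuming you have already extracted the native ligand’s heavy-atom coordinates (for example, from its HETATM records). It uses the unweighted geometric centroid rather than PyMOL’s mass-weighted centerofmass; for typical organic ligands the two are close. The function names and the 10 Å default padding are illustrative choices. Each edge is sized as twice the farthest per-axis atom offset from the centroid plus the padding, which matches “largest dimension + padding” for a roughly symmetric ligand and cannot clip an asymmetric one. The output uses Vina’s center_* and size_* config-file keys.

```python
from typing import Iterable, Tuple

Coord = Tuple[float, float, float]

def grid_box(coords: Iterable[Coord], padding: float = 10.0):
    """Return (center, size) for a cubic Vina box around a ligand.

    center: unweighted geometric centroid of the atoms.
    size:   twice the largest per-axis offset from the centroid, plus padding (Å).
    """
    pts = list(coords)
    if not pts:
        raise ValueError("no ligand atoms supplied")
    n = len(pts)
    center = tuple(sum(p[i] for p in pts) / n for i in range(3))
    # Farthest excursion of any atom from the centroid along any axis.
    reach = max(abs(p[i] - center[i]) for p in pts for i in range(3))
    edge = 2.0 * reach + padding
    return center, (edge, edge, edge)

def vina_box_config(center, size) -> str:
    """Format the box using AutoDock Vina's config-file keys."""
    lines = [f"center_{axis} = {value:.3f}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {value:.1f}" for axis, value in zip("xyz", size)]
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical ligand coordinates (Å), for illustration only.
    ligand = [(10.2, 4.1, -3.0), (11.5, 5.0, -2.4), (12.9, 4.6, -1.1), (13.4, 3.2, -0.5)]
    center, size = grid_box(ligand)
    print(vina_box_config(center, size))
```

The printed lines can be pasted into a Vina config file; the visual check in PyMOL after docking is still required.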
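Mistake 2: Skipping protein prep
A raw PDB file downloaded from the RCSB cannot go directly into AutoDock Vina. It usually lacks hydrogens (which X-ray crystallography generally cannot resolve), contains waters and co-crystallized ligands that will block your binding site, and is not in PDBQT format. PDBQT carries the AutoDock atom types that Vina’s scoring function relies on, and Vina needs the polar hydrogens to recognize hydrogen-bond donors. (Vina itself ignores the partial-charge column; AutoDock4’s scoring function uses it.) Docking against an unprepared receptor produces scores that are numerically plausible but physically meaningless.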
This happens more than you’d expect — often when someone is quickly testing a hypothesis and plans to “do it properly later.” The later rarely comes, and the results get presented.
Fix: Remove all waters (remove resn HOH in PyMOL), remove co-crystallized ligands and crystallization artifacts, add polar hydrogens, assign Gasteiger charges, and generate a valid PDBQT. Check for ? atom types in the output; any present means preparation is incomplete.
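The removal step can be scripted on the raw PDB text. This sketch keeps ATOM records, drops waters and every HETATM residue except names you explicitly whitelist, and relies on the fixed-column PDB layout (residue name in columns 18–20). It covers only the cleanup: adding hydrogens, assigning charges, and writing the PDBQT still need a preparation tool such as prepare_receptor4.py or Open Babel. The function name and the keep_het parameter are illustrative.

```python
WATER_NAMES = {"HOH", "WAT", "DOD"}

def clean_receptor(pdb_text: str, keep_het=frozenset()) -> str:
    """Strip waters and unwanted HETATM residues from PDB-format text.

    Keeps ATOM, TER and END records; keeps HETATM records only if their
    residue name (columns 18-20) is listed in keep_het and is not a water.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "TER", "END"):
            kept.append(line)
        elif record == "HETATM":
            resname = line[17:20].strip()
            if resname in keep_het and resname not in WATER_NAMES:
                kept.append(line)
        # Everything else (REMARK, CONECT, ANISOU, ...) is dropped.
    return "\n".join(kept) + "\n"
```

For example, clean_receptor(raw_text, keep_het={"ZN"}) would retain a catalytic zinc while removing waters, glycerol, and the co-crystallized ligand.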
Mistake 3: Wrong protonation states
AutoDockTools assigns protonation states using simple default rules that are correct for most surface residues but frequently wrong for active site residues, especially histidines, which can be neutral with the proton on either ring nitrogen (HID or HIE) or positively charged with both protonated (HIP), depending on the local electrostatic environment and pH. A misprotonated catalytic histidine can swap a key residue’s hydrogen-bond donor and acceptor roles and systematically misrank every compound you dock.
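Aspartate and glutamate residues in buried, hydrophobic environments are also frequently misprotonated by default tools, which leave them charged even where the environment favors the neutral form.
Fix: Predict pKa values with H++ or PROPKA at pH 7.4 and check every active-site histidine manually. This is one of the highest-impact improvements you can make to a docking protocol, and one of the most consistently skipped.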
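Before running PROPKA or H++, it helps to know which ionizable residues actually line the pocket. This sketch assumes you already have protein atoms as (resname, chain, resseq, x, y, z) tuples and the native ligand’s coordinates, and lists ionizable residues with any atom within a cutoff of the ligand. The 6 Å cutoff, the residue set, and the function name are illustrative assumptions. It does not assign protonation states; it produces the shortlist whose predicted pKa values and hydrogen-bonding partners you should check by hand.

```python
import math

# Illustrative set of residues whose protonation commonly deserves a second look.
IONIZABLE = {"HIS", "ASP", "GLU", "LYS", "CYS", "TYR"}

def ionizable_near_ligand(protein_atoms, ligand_coords, cutoff=6.0):
    """List ionizable residues with any atom within `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (resname, chain, resseq, x, y, z) tuples.
    ligand_coords: iterable of (x, y, z) tuples for the native ligand.
    Returns sorted, de-duplicated (resname, chain, resseq) tuples.
    """
    ligand = list(ligand_coords)
    hits = set()
    for resname, chain, resseq, x, y, z in protein_atoms:
        if resname not in IONIZABLE:
            continue
        if any(math.dist((x, y, z), atom) <= cutoff for atom in ligand):
            hits.add((resname, chain, resseq))
    return sorted(hits, key=lambda res: (res[1], res[2]))
```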
Mistake 4: Alternate conformations
PDB files sometimes contain alternate conformations for residues with significant positional disorder, indicated by an “A” or “B” in column 17 of the ATOM records. When both are present, AutoDockTools can produce duplicate atoms or assign incorrect atom types, resulting in a PDBQT file that looks valid but produces distorted scoring. Active site residues with alternates are particularly damaging because they directly affect where and how ligands bind.
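Fix: In PyMOL, run remove not (alt ''+A), then alter all, alt='', then sort. Or include deleteAltB among the -U cleanup options of prepare_receptor4.py (for example, -U nphs_lps_waters_deleteAltB). Keep a single conformation, conventionally A, unless you have a specific reason to use the alternate; A is not always the higher-occupancy conformer, so check the occupancy column first.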
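The same cleanup as the PyMOL commands can be done on the PDB text directly. This sketch, with an illustrative function name, drops ATOM, HETATM, and ANISOU records whose alternate-location indicator (column 17) is neither blank nor the conformer you keep, then blanks that column so downstream tools see a single conformation. Check the occupancy column (columns 55–60) before choosing which letter to keep.

```python
def keep_single_altloc(pdb_text: str, keep: str = "A") -> str:
    """Keep one alternate conformation and blank the altLoc column.

    Records that carry an altLoc indicator (ATOM, HETATM, ANISOU) are dropped
    when column 17 is neither blank nor `keep`; surviving records get a blank
    column 17 so downstream tools see a single conformation.
    """
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM", "ANISOU")) and len(line) > 16:
            alt = line[16]  # column 17 in 1-based PDB numbering
            if alt not in (" ", keep):
                continue  # e.g. the B conformer
            line = line[:16] + " " + line[17:]
        out.append(line)
    return "\n".join(out) + "\n"
```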
Mistake 5: Wrong heteroatoms removed
Two opposite errors are common here. The first is removing everything labeled HETATM, including catalytic metal ions that are biologically essential: a zinc ion in a metalloprotease or a heme iron in a cytochrome P450 is not a crystallization artifact. Removing it collapses the binding site and produces nonsensical poses. The second error is not removing enough: leaving behind glycerol (GOL), sulfate (SO4), polyethylene glycol (PEG), or other crystallization additives and cryoprotectants that occupy the binding site and block the ligand from docking correctly.
Fix: Look up each HETATM residue before removing it, and keep biologically essential metals and cofactors.
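A quick audit makes the keep-or-remove decision explicit before anything is deleted. This sketch collects HETATM residue names and sorts them into review buckets. The name lists and the function name are small, illustrative assumptions, not a curated database: ions such as CA or MG can be structural in one entry and a buffer component in another, so anything in the look_up bucket (and arguably everything else) should be checked against the entry’s PDB page and primary citation before removal.

```python
# Illustrative, deliberately incomplete lists; extend them for your target.
WATERS = {"HOH", "WAT", "DOD"}
LIKELY_ADDITIVES = {"GOL", "EDO", "PEG", "PG4", "SO4", "ACT", "DMS", "MPD"}
LIKELY_BIOLOGICAL = {"ZN", "FE", "MN", "CU", "NI", "CO", "MG", "CA",
                     "HEM", "FAD", "NAD", "SAM"}

def audit_hetatms(pdb_text: str) -> dict:
    """Sort the HETATM residue names in PDB text into review buckets."""
    names = sorted({
        line[17:20].strip()  # residue name, columns 18-20
        for line in pdb_text.splitlines()
        if line.startswith("HETATM")
    })
    report = {"water": [], "likely_additive": [], "likely_biological": [], "look_up": []}
    for name in names:
        if name in WATERS:
            report["water"].append(name)
        elif name in LIKELY_ADDITIVES:
            report["likely_additive"].append(name)
        elif name in LIKELY_BIOLOGICAL:
            report["likely_biological"].append(name)
        else:
            report["look_up"].append(name)
    return report
```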
Mistake 6: Bad ligand preparation
A ligand PDBQT generated from a 2D structure without proper 3D conformer generation, or with the wrong protonation state, will dock poorly regardless of how well the receptor is prepared; incorrect partial charges matter as well if you score with AutoDock4. Common problems include flat ring systems that should be nonplanar, incorrect ionization states (docking a carboxylic acid as neutral at pH 7.4 instead of deprotonated), and missing or incorrect rotatable bond definitions that prevent the ligand from exploring its true conformational flexibility.
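Fix: Prepare the ligand with Open Babel using the --gen3d and -p 7.4 flags, which build 3D coordinates and add hydrogens appropriate for physiological pH. Assign Gasteiger charges explicitly with --partialcharge gasteiger. Open Babel’s pH handling is rule-based, so still check each ionizable group. For publication-quality work, verify the 3D geometry visually in PyMOL: rings should be nonplanar where expected, and bond lengths and angles should look chemically reasonable.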
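A small wrapper keeps the Open Babel options identical across every ligand in a campaign. This sketch only assembles the flags named in this section (--gen3d, -p 7.4, --partialcharge gasteiger) and runs obabel if it is on your PATH. The function names and file names are hypothetical, and Open Babel infers the input format from the file extension (.smi here).

```python
import shutil
import subprocess

def obabel_ligand_command(input_file: str, output_pdbqt: str, ph: float = 7.4) -> list:
    """Assemble an obabel call: 3D coordinates, pH-dependent protonation, Gasteiger charges."""
    return [
        "obabel", input_file, "-O", output_pdbqt,
        "--gen3d",                       # build 3D coordinates from 2D/SMILES input
        "-p", str(ph),                   # add hydrogens appropriate for this pH
        "--partialcharge", "gasteiger",  # assign Gasteiger partial charges
    ]

def prepare_ligand(input_file: str, output_pdbqt: str, ph: float = 7.4) -> str:
    """Run obabel (must be installed and on PATH); raise if it fails."""
    if shutil.which("obabel") is None:
        raise FileNotFoundError("obabel was not found on PATH")
    subprocess.run(obabel_ligand_command(input_file, output_pdbqt, ph), check=True)
    return output_pdbqt
```

A call such as prepare_ligand("ligand.smi", "ligand.pdbqt") produces the file; the geometry and ionization checks in PyMOL still come afterward.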
Mistake 7: Exhaustiveness too low
The default exhaustiveness = 8 is fine for initial exploration but is widely considered insufficient for publication-quality results, especially for flexible ligands with many rotatable bonds or for targets with complex binding sites. Low exhaustiveness means the search algorithm may not have adequately sampled conformational space: the top-ranked pose may not be the true energy minimum, just the best pose found in a limited search. Results from under-sampled searches are poorly reproducible; run the same job twice and you’ll get meaningfully different scores.
Fix: Set exhaustiveness = 16 at minimum, or 32 for flexible ligands (more than 8 rotatable bonds) or difficult targets. Test reproducibility by running the same docking three times with different random seeds; if top scores vary by more than 0.5 kcal/mol between runs, increase exhaustiveness until they converge. Report your exhaustiveness setting in the methods section.
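The exhaustiveness rule and the reproducibility check are simple enough to encode. This sketch reads the TORSDOF record that ligand PDBQT files usually end with (a close proxy for the rotatable-bond count), picks 16 or 32 following the rule of thumb in this section, builds one Vina command per seed, and tests whether the best scores agree within 0.5 kcal/mol. Function names, output file names, and seed values are illustrative, and it assumes the config file holds the box definition but not exhaustiveness. You supply the best score from each run.

```python
def torsdof(ligand_pdbqt_text: str) -> int:
    """Read the TORSDOF record (torsional degrees of freedom) from a ligand PDBQT."""
    for line in ligand_pdbqt_text.splitlines():
        if line.startswith("TORSDOF"):
            return int(line.split()[1])
    raise ValueError("no TORSDOF record found")

def pick_exhaustiveness(n_rotatable: int, difficult_target: bool = False) -> int:
    """16 as a floor; 32 for more than 8 rotatable bonds or a difficult target."""
    return 32 if n_rotatable > 8 or difficult_target else 16

def vina_commands(receptor, ligand, config, exhaustiveness, seeds=(1, 2, 3)):
    """One Vina command line per random seed, each writing its own output file."""
    return [
        ["vina", "--receptor", receptor, "--ligand", ligand, "--config", config,
         "--exhaustiveness", str(exhaustiveness), "--seed", str(seed),
         "--out", f"docked_seed{seed}.pdbqt"]
        for seed in seeds
    ]

def converged(best_scores, tolerance=0.5):
    """True if the best score from each repeated run lies within `tolerance` kcal/mol."""
    return max(best_scores) - min(best_scores) <= tolerance
```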
Mistake 8: Only checking mode 1
Vina’s mode 1 is the best-scoring (most negative) pose, not necessarily the most biologically meaningful one. The scoring function is an empirical approximation with no explicit electrostatic term: it can favor a geometrically wrong pose that collects extra steric and hydrophobic contact terms over a pose with correct hydrogen-bonding geometry that scores slightly lower. For flexible ligands, the correct binding mode sometimes appears at mode 2 or 3, with mode 1 representing a false energy minimum that looks good numerically but makes no chemical sense.
Fix: Inspect at least the top 3 poses and use SAR data to select the biologically plausible one.
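To make looking past mode 1 routine, this sketch reads the REMARK VINA RESULT lines from a Vina output PDBQT (score, then the RMSD lower and upper bounds relative to mode 1) and flags alternatives that score within a chosen window of the best pose yet sit in a clearly different place. The 1.0 kcal/mol window, the 2.0 Å distinctness cutoff, and the function names are illustrative assumptions; the flagged poses are candidates for visual inspection against your SAR data, not answers.

```python
def vina_modes(out_pdbqt_text: str) -> list:
    """Return (mode, score, rmsd_lb, rmsd_ub) tuples in file order.

    Vina writes one 'REMARK VINA RESULT:' line per MODEL: the score in kcal/mol,
    then the RMSD lower and upper bounds (Å) relative to mode 1.
    """
    modes = []
    for line in out_pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            values = line.split(":", 1)[1].split()
            score, rmsd_lb, rmsd_ub = (float(v) for v in values[:3])
            modes.append((len(modes) + 1, score, rmsd_lb, rmsd_ub))
    return modes

def distinct_alternatives(modes, window=1.0, min_rmsd=2.0):
    """Modes after the first that score within `window` kcal/mol of mode 1
    (Vina lists the best pose first) and whose RMSD lower bound from mode 1
    is at least `min_rmsd` Å, i.e. a genuinely different placement."""
    if not modes:
        return []
    best = modes[0][1]
    return [m for m in modes[1:] if m[1] - best <= window and m[2] >= min_rmsd]
```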
Mistake 9: Over-trusting scores
The most seductive mistake in docking. A compound scores −11.4 kcal/mol and suddenly it’s being described as a “potent inhibitor” in a draft paper. Vina scores are estimates of binding free energy with a correlation of roughly r = 0.5–0.6 against experimental affinities across diverse compound sets. A compound scoring −11 may experimentally bind at micromolar affinity. A compound scoring −7 may be your most potent hit. The score is useful for ranking within a campaign; it is not a substitute for experimental measurement.
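A worked conversion shows why a score should not be read as an affinity. If a Vina score were taken literally as ΔG, the implied dissociation constant would follow from ΔG = RT ln K_d, so K_d = exp(ΔG/RT). This sketch assumes 298.15 K and a 1 M standard state; the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def implied_kd(score_kcal: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (M) implied by treating a score as ΔG.

    ΔG = RT ln(Kd)  =>  Kd = exp(ΔG / RT); a more negative ΔG gives a smaller Kd.
    """
    return math.exp(score_kcal / (R_KCAL * temperature_k))

def kcal_per_tenfold(temperature_k: float = 298.15) -> float:
    """Free-energy change corresponding to a tenfold change in Kd: RT ln 10."""
    return R_KCAL * temperature_k * math.log(10)
```

Read literally, −11.4 kcal/mol implies a K_d of about 4.4 × 10⁻⁹ M (roughly 4 nM) and −7 kcal/mol about 7.4 × 10⁻⁶ M (roughly 7 µM), a difference of roughly 1,700-fold, since every 1.36 kcal/mol corresponds to a tenfold change at this temperature. A correlation of r = 0.5–0.6 does not support trusting a gap like that for any individual compound.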
Mistake 10: No self-docking validation
This is the mistake that reviewers catch most reliably, and the one that most consistently indicates a protocol that cannot be trusted. Self-docking validation (redocking the co-crystallized ligand back into the prepared receptor and comparing the result to the crystal structure) is the standard check that your preparation workflow is producing meaningful results. If your protocol cannot reproduce a known experimental pose to within 2.0 Å RMSD, there is no basis for trusting what it tells you about novel compounds.
Journals in computational chemistry and structural biology increasingly require this validation to be reported. Skipping it doesn’t just weaken your paper — it means you may be acting on results from a broken protocol without knowing it.
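The redocking check itself is a short calculation once the docked pose and the crystal ligand are in the same receptor frame. This sketch computes an in-place heavy-atom RMSD with no superposition (aligning the two ligands first would hide a placement error) and assumes both coordinate lists contain the same atoms in the same order. It ignores molecular symmetry, so symmetric groups such as carboxylates or flipped phenyl rings can inflate the value; Open Babel’s obrms accounts for symmetry if you need that. The function names are illustrative, and the 2.0 Å threshold follows the criterion in this section.

```python
import math

def redock_rmsd(pose, crystal):
    """In-place RMSD (Å) between matched heavy-atom coordinate lists.

    No superposition is applied: the docked pose and the crystal ligand share
    the receptor's frame, so any translation or rotation is part of the error.
    """
    if not pose or len(pose) != len(crystal):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose, crystal))
    return math.sqrt(squared / len(pose))

def passes_redocking(rmsd, threshold=2.0):
    """Apply the self-docking acceptance criterion (RMSD below threshold Å)."""
    return rmsd < threshold
```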
Quick reference: all 10 mistakes at a glance
| Mistake | Severity | Core fix |
|---|---|---|
| Wrong grid box | Critical | Center on co-crystallized ligand centroid; verify pose in PyMOL |
| Skipping protein prep | Critical | Full pipeline: remove waters/ligands, add H, assign charges, generate PDBQT |
| Wrong protonation states | Critical | Use H++ or PROPKA at pH 7.4; check active site His manually |
| Alternate conformations | High | Remove alt B with the deleteAltB cleanup option or PyMOL commands |
| Wrong heteroatoms removed | High | Look up each HETATM before removing; keep biological metals/cofactors |
| Bad ligand preparation | High | Open Babel with --gen3d -p 7.4 --partialcharge gasteiger |
| Exhaustiveness too low | High | Use 16 minimum; 32 for flexible ligands; test reproducibility |
| Only checking mode 1 | Medium | Inspect top 3 poses; use SAR data to select biologically best pose |
| Over-trusting scores | Medium | Scores rank compounds; they do not predict experimental activity |
| No self-docking validation | Critical | Redock co-crystallized ligand; require RMSD < 2.0 Å before proceeding |
The one-paragraph version
Docking is only as trustworthy as the preparation that precedes it. A correctly run search algorithm on a badly prepared receptor produces wrong answers that look right — and that’s a more dangerous failure mode than an obvious error. Build a preparation checklist, run self-docking validation on every new receptor, inspect your poses visually before trusting any score, and report your validation RMSD. Do those four things and you’ll avoid most of what trips up beginners and gets papers sent back from reviewers.