How to Interpret Molecular Docking Results: Scores, Poses & What They Mean

You’ve run AutoDock Vina and you’re staring at a table of scores. What does −8.3 kcal/mol actually mean? Is that good? How do you know if the pose is biologically meaningful? This guide explains exactly how to read docking results — and the interpretation mistakes that most beginners make.

Understanding binding affinity scores (kcal/mol)

AutoDock Vina outputs binding affinity as a number in kcal/mol — kilocalories per mole. This is an estimate of the binding free energy: how much free energy is released when the ligand binds to the protein. The more negative the number, the more favorable the predicted binding.

Thermodynamically, the relationship is straightforward: a more negative ΔG means the bound state is more stable relative to the unbound state. In practice, Vina’s scores are a computed approximation of this quantity, not a direct measurement — which is why you should always treat them as estimates, not facts.
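To see what the numbers would mean if they were exact, you can convert a score into a dissociation constant via ΔG = RT·ln(Kd). This is a back-of-the-envelope sketch, not part of Vina — it treats the score as a true free energy, which, as noted above, it is not:

```python
import math

R = 0.001987  # gas constant, kcal/(mol·K)
T = 298.15    # room temperature, K

def kd_from_dg(dg_kcal_mol: float) -> float:
    """Convert a binding free energy (kcal/mol) into an equilibrium
    dissociation constant Kd (molar), using dG = RT * ln(Kd)."""
    return math.exp(dg_kcal_mol / (R * T))

# Taking the -8.3 kcal/mol score literally gives a sub-micromolar Kd
# (roughly 8e-7 M, i.e. ~0.8 uM) -- an optimistic upper bound at best.
print(f"{kd_from_dg(-8.3):.2e} M")
```

One useful intuition falls out of the math: at 298 K, every ~1.4 kcal/mol corresponds to roughly a tenfold change in Kd, which is why a 2-unit score difference matters far more than a 0.2-unit one.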

Binding affinity score reference — AutoDock Vina (kcal/mol)

  −11 to −14   Very strong — rare, worth prioritizing
  −9 to −11    Strong — solid hit in most contexts
  −7 to −9     Moderate — evaluate pose carefully
  Above −7     Weak — unlikely useful hit

These ranges are useful guidelines, not hard cutoffs. What counts as a “good” score depends on the target. Enzymes with deep, enclosed binding pockets tend to produce stronger scores than proteins with shallow or open binding sites. A score of −8 kcal/mol is impressive for a GPCR and unremarkable for a protease.
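If you want to triage a large screen, the guideline bands above can be encoded as a trivial helper. A sketch only — as the text stresses, the right cutoffs shift with the target, so treat these thresholds as placeholders:

```python
def classify_vina_score(score: float) -> str:
    """Map a Vina score (kcal/mol) onto the rough guideline bands
    above. Target-dependent -- adjust the thresholds for your system."""
    if score <= -11.0:
        return "very strong"
    if score <= -9.0:
        return "strong"
    if score <= -7.0:
        return "moderate"
    return "weak"

print(classify_vina_score(-8.3))  # moderate
```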

Scores are not comparable across different targets
If compound A scores −10 kcal/mol against protein X and compound B scores −10 kcal/mol against protein Y, this tells you nothing about which compound binds more tightly in absolute terms. Scores are only meaningful for ranking compounds against the same target. Never use them to compare affinities across different proteins.

Scores vs. experimental IC50 — how well do they correlate?

The honest answer is: moderately. Across diverse compound datasets, the correlation between Vina scores and experimental binding affinities (Kd, Ki, or IC50) is roughly r = 0.5–0.6. That’s real signal — docking genuinely enriches hit rates — but it’s far from predictive. A compound scoring −10 kcal/mol may experimentally bind worse than one scoring −8, especially if the −10 compound has a problematic pose or unusually favorable electrostatic terms that the scoring function overweights.

Use scores for ranking and filtering. Do not use them to predict experimental IC50 values.
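In code, "ranking and filtering" against a single target is as simple as a sort and a cutoff. The compound IDs and scores below are made up for illustration:

```python
# Hypothetical screening results for ONE target: (compound ID, Vina score).
hits = [("cmpd_007", -9.4), ("cmpd_013", -6.8),
        ("cmpd_021", -10.1), ("cmpd_002", -8.2)]

# Rank by score (most negative first), then keep only candidates past
# an illustrative -8 kcal/mol cutoff for visual pose inspection.
ranked = sorted(hits, key=lambda pair: pair[1])
shortlist = [cid for cid, score in ranked if score <= -8.0]
print(shortlist)  # ['cmpd_021', 'cmpd_007', 'cmpd_002']
```

The output is an ordering, not a set of affinity predictions — every compound on the shortlist still needs its pose inspected.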

What the RMSD columns mean

After the affinity column, Vina prints two more columns: rmsd l.b. and rmsd u.b. These are lower bound and upper bound estimates of how different each pose is from the top-ranked pose (mode 1).

RMSD stands for Root Mean Square Deviation — a measure of atomic displacement between two structures. An RMSD of 1.0 Å means the atoms in one pose are, on average, 1 Å away from their positions in the reference pose.
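The definition is simple enough to compute directly. A minimal sketch with toy coordinates — note there is no superposition step, because Vina's output poses already share the receptor's coordinate frame:

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation (in the same units as the input,
    here angstroms) between two equal-length lists of (x, y, z) atom
    coordinates, assumed to be in the same reference frame."""
    assert len(coords_a) == len(coords_b), "atom counts must match"
    sq_sum = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                 for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq_sum / len(coords_a))

# Two toy 3-atom "poses": every atom shifted by 1 A along x -> RMSD = 1.0 A
pose1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
pose2 = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(pose1, pose2))  # 1.0
```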

  < 2 Å    Similar pose — this pose and mode 1 share essentially the same binding geometry. The algorithm is converging on the same solution.
  2–4 Å    Different orientation — meaningfully different from mode 1 and worth inspecting visually. Could be an alternate binding mode or a non-specific pose.
  > 4 Å    Distinct pose — a substantially different pose. Usually indicates the ligand is sampling a different region of the binding site — or has escaped outside it entirely.

A practical rule: if modes 1, 2, and 3 all have low RMSD relative to each other (l.b. < 2 Å), the algorithm has converged — it found a genuine energy minimum and sampled it repeatedly. This is a good sign. If all 9 modes have wildly different RMSDs and scores spread across 3+ kcal/mol, the search hasn’t converged — increase exhaustiveness and re-run.
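That practical rule is easy to automate. The sketch below parses result rows in the shape Vina prints (mode, affinity, rmsd l.b., rmsd u.b.) and applies the two checks; the cutoffs and the sample rows are illustrative, not Vina defaults:

```python
def is_converged(table_lines, rmsd_cutoff=2.0, max_spread=3.0):
    """Heuristic convergence check: modes 1-3 should share geometry
    (rmsd l.b. vs. mode 1 below rmsd_cutoff) AND the reported scores
    should span less than max_spread kcal/mol."""
    rows = []
    for line in table_lines:
        parts = line.split()
        if len(parts) == 4 and parts[0].isdigit():
            # (affinity, rmsd l.b.) -- Vina sorts best score first
            rows.append((float(parts[1]), float(parts[2])))
    if len(rows) < 3:
        return False
    top3_same_pose = all(lb < rmsd_cutoff for _, lb in rows[:3])
    score_spread = rows[-1][0] - rows[0][0]
    return top3_same_pose and score_spread < max_spread

sample = [
    "   1       -9.1      0.000      0.000",
    "   2       -9.0      1.1        1.8",
    "   3       -8.8      1.6        2.4",
    "   4       -7.5      5.2        8.0",
]
print(is_converged(sample))  # True
```

A False result is the cue from the text above: increase exhaustiveness and re-run before interpreting anything.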

RMSD in validation vs. RMSD in the output table
These are two different uses of RMSD. The output table RMSDs compare docked poses to each other. Validation RMSD compares your top docked pose to the experimentally determined crystal pose — the number used to judge whether your docking protocol is working. The validation threshold is 2.0 Å. The output table RMSDs have no pass/fail threshold.

Analyzing poses in PyMOL

A score is a number. A pose is a 3D structure. You need both to make a judgment about whether a docking result is meaningful. Here is a systematic workflow for analyzing poses visually in PyMOL after a Vina run.

  • Step 1: Load receptor and poses
    Load both files and set up the display. The docked output PDBQT contains all modes as separate MODEL entries — PyMOL loads them as states of a single object.
PyMOL command line
load receptor/5KIR_receptor.pdbqt, receptor
load output/docked.pdbqt, poses
hide everything
show cartoon, receptor
show sticks, poses
zoom poses
  • Step 2: Check whether the ligand is inside the binding pocket
    This is the first and most important check. Use the state slider at the bottom of the PyMOL window to cycle through poses. For each top pose, confirm that the ligand sits inside the cavity — not floating above the surface, not partially buried in the protein backbone, not at a symmetry-related site on the protein exterior.
  • Step 3: Identify contacts with key binding site residues
    Show the residues within 5 Å of the ligand as sticks, and use the distance tool to measure interactions. Hydrogen bonds should be 2.5–3.5 Å between donor and acceptor heavy atoms. Hydrophobic contacts should be 3.5–5.0 Å. Compare the contacts you see to published mutagenesis or structural data for your target.
PyMOL command line
# Show binding site residues
select binding_site, receptor and (byres poses expand 5)
show sticks, binding_site

# Measure a specific potential hydrogen bond
distance hbond1, poses and name O, receptor and resi 120 and name NH2

# Find all contacts within 4 Å
select contacts, receptor and (byres poses around 4)
  • Step 4: Check for steric clashes
    If the ligand overlaps with protein atoms (visible as interpenetrating sticks), the preparation went wrong somewhere. This should not happen with a properly prepared receptor — if you see it, check for unknown atom types in your PDBQT file and re-prepare.
  • Step 5: Compare the top 3 poses
    Don't just look at mode 1. Use the state controls to examine modes 1, 2, and 3. If they all show the same binding geometry with minor variations, you have convergent, trustworthy results. If mode 2 shows the ligand flipped or displaced, decide which pose is more chemically reasonable based on the interactions it makes.
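A programmatic companion to Step 2: before opening PyMOL, you can verify that every ligand atom of a pose lies inside the grid box you gave Vina. The box center and size below are hypothetical, not taken from any real 5KIR setup:

```python
def ligand_inside_box(ligand_coords, center, size):
    """True if every ligand atom (x, y, z) lies within the docking grid
    box, given as Vina-style center and size in angstroms. Atoms outside
    the box you defined deserve a closer look in PyMOL."""
    for (x, y, z) in ligand_coords:
        for value, c, s in ((x, center[0], size[0]),
                            (y, center[1], size[1]),
                            (z, center[2], size[2])):
            if abs(value - c) > s / 2.0:
                return False
    return True

# Hypothetical 20 A cubic box around an active site
center, size = (23.0, 0.5, 27.0), (20.0, 20.0, 20.0)
print(ligand_inside_box([(25.0, 2.0, 30.0)], center, size))  # True
print(ligand_inside_box([(25.0, 2.0, 45.0)], center, size))  # False
```

Passing this check does not mean the pose is good — it only rules out the grossest failure mode (the ligand escaping the search volume) before visual inspection.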

What a good docking result looks like

There is no single number that defines a good result. A good docking result is a combination of a reasonable score, a chemically sensible pose, and consistency between poses. Here’s what to look for and what raises red flags.

Good sign: Ligand fully inside the pocket. The entire ligand sits within the binding cavity with no atoms extending into solvent.
Red flag: Ligand on the protein surface. The ligand docked to an exterior surface region — the grid box was too large or incorrectly centered.

Good sign: H-bonds to known key residues. The pose makes hydrogen bonds to residues known from mutagenesis or other structural data to be important for binding.
Red flag: No contacts with key residues. Good score but the ligand makes no meaningful contacts with the pharmacophore — likely a false positive driven by scoring function artifacts.

Good sign: Modes 1–3 are convergent. Top poses are similar to each other (low inter-mode RMSD) — the algorithm found a stable energy minimum and sampled it consistently.
Red flag: All poses look completely different. Non-convergent sampling suggests the search space is too large or exhaustiveness is too low. Increase exhaustiveness to 16–32 and re-run.

Good sign: Score consistent with target class. The score is in the expected range for your target type — not suspiciously better or worse than known binders from the literature.
Red flag: Implausibly strong score. Scores below −13 kcal/mol for a drug-like molecule are rare and often indicate a preparation artifact, a too-small grid box concentrating the search, or a known problem compound (e.g. PAINS).

Common interpretation mistakes

  • Reporting only the top score without inspecting the pose
    The most common mistake in docking papers. A score of −10 kcal/mol means nothing if the ligand is half outside the binding pocket or making no chemically meaningful contacts. The score is a hypothesis; the pose is evidence.
    Always show the binding pose in your figures. Always describe which residues the ligand contacts and compare to known pharmacophore data.
  • Comparing scores across different targets
    A score of −9 kcal/mol against kinase A does not mean the same thing as −9 kcal/mol against kinase B. Binding site size, polarity, and geometry all affect absolute scores. Rankings within one target are meaningful; cross-target comparisons of raw scores are not.
    When comparing across targets, use normalized metrics or report enrichment factors from benchmark sets rather than raw kcal/mol values.
  • Treating mode 1 as definitively correct
    Vina ranks poses by predicted energy, not by biological correctness. The top-ranked pose is the most energetically favorable according to a flawed scoring function. The actual binding mode may be mode 2 or 3, especially for flexible ligands or targets with known induced-fit behavior.
    Inspect the top 3 poses. If literature SAR data exists, use it to select the most chemically reasonable pose rather than defaulting to mode 1.
  • Skipping self-docking validation before reporting results
    If you have not confirmed that your protocol can reproduce a known crystal structure pose (RMSD < 2.0 Å), you have no basis for trusting your results on novel ligands. This is not optional for publication-quality work.
    Always run self-docking validation on the co-crystallized ligand before reporting any docking results. Report the validation RMSD in your methods section.
  • Confusing docking score with biological activity
    Docking scores predict binding affinity, not biological activity. A compound can bind tightly and be inactive (wrong binding mode, wrong mechanism, poor cell permeability). A compound can have a mediocre docking score and be an excellent drug candidate for reasons docking cannot assess.
    Frame docking results as hypotheses about binding affinity. Biological activity requires experimental validation. Never claim activity based on docking alone.
How to report docking results in a paper
The methods section should include: the software version and scoring function used, the source and resolution of the receptor structure, how the receptor was prepared (including protonation state handling), the grid box center and dimensions, the exhaustiveness setting, how many poses were generated, and the self-docking validation RMSD. Reviewers increasingly expect all of this.

The right way to report docking results

Good docking analysis tells a story: here is the score, here is the pose, here are the specific contacts the ligand makes, here is why those contacts are biologically meaningful, and here is how this result was validated. A score alone is never enough.

The best docking figures show the ligand inside the binding pocket as sticks, with the key interacting residues labeled, hydrogen bonds drawn as dashed lines, and a caption that includes both the binding affinity score and the validation RMSD. That level of detail signals to reviewers that you understand what docking can and cannot tell you.

The three-question test for any docking result

Before trusting a docking result, ask three questions. First: is the pose inside the binding pocket and making chemically reasonable contacts with known key residues? Second: are the top poses convergent — do modes 1, 2, and 3 show consistent binding geometry? Third: did your protocol pass self-docking validation on a known co-crystallized ligand? If the answer to any of these is no, the result needs more work before it can be reported or acted on.
