Homology Modeling Tutorial: Swiss-Model and MODELLER for Beginners

Homology modeling has been the workhorse of structural biology for three decades — and despite AlphaFold’s emergence, it remains the method of choice in several important situations. This tutorial explains what it is, when to use it, and how to run it using two of the most widely used tools: Swiss-Model and MODELLER.

What homology modeling is

Homology modeling — also called comparative modeling — predicts the 3D structure of a protein (the target) by using the known experimental structure of a related protein (the template) as a guide. The underlying assumption is evolutionary: proteins that share significant sequence identity almost certainly share a similar fold. If two proteins are 50% identical in sequence, their backbone structures typically superimpose within 1–2 Å.

The method works in three stages. First, find one or more structurally characterized proteins with detectable similarity to your target. Second, align the target sequence to the template sequence, mapping each residue of the target to a position in the template structure. Third, use that alignment to build a 3D model: copy the template backbone, rebuild divergent regions (loops) using sampling algorithms, and optimize side chains and overall geometry by energy minimization.

The quality of the result depends almost entirely on two things: the quality and resolution of the template, and the sequence identity between target and template. Both are checkable before you run a single calculation.
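Identity and coverage can both be computed directly from a pairwise alignment before any modeling is done. The sketch below assumes two gapped sequences of equal length with "-" as the gap character; the function names are illustrative. Tools differ in which denominator they use for identity, and this version counts only columns where both sequences have a residue.

```python
def percent_identity(target_aln: str, template_aln: str) -> float:
    """Percent identity over columns where both sequences have a residue."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for t, s in zip(target_aln.upper(), template_aln.upper()):
        if t == "-" or s == "-":
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if t == s:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def percent_coverage(target_aln: str, template_aln: str) -> float:
    """Percent of target residues that are aligned to a template residue."""
    target_len = sum(1 for t in target_aln if t != "-")
    covered = sum(
        1 for t, s in zip(target_aln, template_aln) if t != "-" and s != "-"
    )
    return 100.0 * covered / target_len if target_len else 0.0
```

Resolution is not in the alignment; it has to be read from the template's PDB entry.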

When to use homology modeling vs AlphaFold

Use AlphaFold (via ColabFold or AFDB) when:
  • No structural template exists in the PDB
  • Sequence identity to known structures is below 30%
  • You need a quick, no-template prediction
  • The protein is in the AlphaFold Database already
  • You want confidence scores (pLDDT, PAE)
  • Large-scale proteome-wide prediction

Use homology modeling (Swiss-Model or MODELLER) when:
  • You need a specific conformation (active, ligand-bound)
  • A high-identity template (>50%) exists in a relevant state
  • You want explicit control over template selection
  • Modeling a mutant using the wild-type structure
  • Comparing with previous homology modeling literature
  • MHC/antibody modeling with specialized templates

The most practically important case where homology modeling beats AlphaFold is conformation control. AlphaFold returns its own best estimate of the structure, and you have no direct way to tell it which functional state to produce. If you want a model in the active conformation, or in the conformation observed when a specific ligand is bound, and a close homolog crystallized in that conformation exists in the PDB, homology modeling with that template gives you a model in exactly that state. With AlphaFold, a specific state is not something you can reliably request.

How sequence identity affects model quality

Sequence identity between target and template is the single most predictive factor of homology model quality. Before choosing a template, check the identity — it determines whether the model will be worth building.

> 50%: Excellent. Backbone accuracy approaches that of a medium-resolution crystal structure, with backbone RMSD to the true structure typically within 1–2 Å. Side chains reliable in the core; some variation at surface positions. Suitable for docking, MD, and structure-based drug design.

30–50%: Good. Backbone generally correct; some loop regions may be poorly modeled. Core side chains reliable; surface side chains less so. Useful for most structural analyses with appropriate caveats.

20–30%: Moderate (the "twilight zone"). Template and target may share a fold, but the alignment is uncertain. Model errors can be large, especially in loops. Validate carefully with multiple templates and quality checks. AlphaFold likely performs better in this range.

< 20%: Unreliable. Below this threshold, sequence similarity is no longer a reliable indicator of structural similarity. Do not build homology models at this identity level; use AlphaFold or abandon template-based approaches.
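The scale above can be written down as a small lookup, which is handy when screening many candidate templates. This is only a sketch: the function name and tier labels are made up for illustration, and the handling of the exact boundary values (50, 30, 20) is an assumption, since the ranges in the text share their endpoints.

```python
def identity_tier(identity_pct: float) -> str:
    """Map target-template sequence identity (%) to the tiers described above."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct > 50.0:
        return "excellent"
    if identity_pct >= 30.0:   # assumption: exactly 50% and 30% count as "good"
        return "good"
    if identity_pct >= 20.0:   # assumption: exactly 20% counts as "twilight zone"
        return "twilight zone"
    return "unreliable"
```

For example, `identity_tier(58.0)` returns `"excellent"` and `identity_tier(24.0)` returns `"twilight zone"`.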

Running Swiss-Model: step by step

Swiss-Model (swissmodel.expasy.org) is one of the most widely used homology modeling servers. It is fully automated: you provide a sequence, and it finds templates, aligns, builds, and evaluates a model in a single submission. No installation, no command line, no coding required.

Step 1: Submit your sequence

Go to swissmodel.expasy.org. Click Start Modelling. Paste your protein sequence in FASTA format into the target sequence field. Optionally paste in a project name — something descriptive so you can find the result later. Click Build Model.

Swiss-Model will automatically search the PDB for structural templates, select the best one based on sequence identity and coverage, build the alignment, and generate a model. For a typical protein (200–400 residues), this takes 2–10 minutes. You’ll receive an email when it’s done if you register, or you can keep the browser tab open.
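Before pasting a sequence into the form, it can save a failed submission to check that the FASTA text holds exactly one record and only standard residue codes. The sketch below is a local sanity check, not part of Swiss-Model; the function name is illustrative, and restricting input to the 20 standard amino acids is a conservative assumption rather than a documented server requirement.

```python
from typing import Tuple

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_single_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence) from FASTA text holding exactly one record."""
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("more than one FASTA record found")
            if chunks:
                raise ValueError("sequence data appears before the header")
            header = line[1:].strip()
        else:
            chunks.append(line.upper())
    sequence = "".join(chunks)
    if not sequence:
        raise ValueError("no sequence found")
    bad = sorted(set(sequence) - STANDARD_AA)
    if bad:
        raise ValueError("non-standard residue codes: " + "".join(bad))
    return header or "", sequence
```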

Step 2: Review the template selection

The results page shows the template(s) selected, their PDB IDs, sequence identity, and coverage. This is the first thing to check: is the template appropriate for your question?

  • Sequence identity — ideally above 40%; check the scale above
  • Coverage — what fraction of your sequence is covered by the template? Low coverage means large regions will be modeled without a template (loop modeling), which is much less accurate
  • Template resolution — prefer templates at 2.5 Å or better; lower resolution templates have larger coordinate uncertainties
  • Template conformation — is the template in the state you need (apo, holo, active, inactive)? Click the PDB ID to check in the RCSB
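The numeric parts of this checklist can be screened automatically when you are comparing several candidate templates. The sketch below is illustrative: the `TemplateHit` record and function name are hypothetical, the 40% identity and 2.5 Å resolution cutoffs come from the list above, and the 80% coverage cutoff is an assumption that you should adjust. The conformation check still needs a human looking at the PDB entry.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemplateHit:
    pdb_id: str
    identity_pct: float    # sequence identity to the target
    coverage_pct: float    # fraction of the target covered by the template
    resolution_A: float    # experimental resolution in angstroms


def template_warnings(
    hit: TemplateHit,
    min_identity: float = 40.0,
    min_coverage: float = 80.0,    # assumed cutoff, not from the tutorial
    max_resolution: float = 2.5,
) -> List[str]:
    """Return human-readable warnings; an empty list means no numeric red flags."""
    warnings = []
    if hit.identity_pct < min_identity:
        warnings.append(f"{hit.pdb_id}: identity {hit.identity_pct:.0f}% is below {min_identity:.0f}%")
    if hit.coverage_pct < min_coverage:
        warnings.append(f"{hit.pdb_id}: coverage {hit.coverage_pct:.0f}% is below {min_coverage:.0f}%")
    if hit.resolution_A > max_resolution:
        warnings.append(f"{hit.pdb_id}: resolution {hit.resolution_A:.1f} A is worse than {max_resolution:.1f} A")
    return warnings
```

A hit such as `TemplateHit("4XYZ", 58.0, 94.0, 2.1)` (placeholder PDB code) produces no warnings, while a 35% identity, 60% coverage, 3.0 Å template produces three.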

If the auto-selected template is not appropriate, use the Template Selection tab to search for and manually specify a better template. Being able to override the automatic choice is one of Swiss-Model's key advantages over fully automated pipelines.

Step 3: Evaluate the model quality

Swiss-Model automatically evaluates the model with two quality checks: the QMEAN score and a Ramachandran plot. Both appear on the results page alongside a 3D viewer.

The QMEAN Z-score compares your model to experimentally determined structures of similar size. A score near 0 means the model quality is consistent with real crystal structures of that size. Scores below −4 indicate poor model quality and should prompt investigation — wrong template, poor alignment, or a difficult loop region.

The Ramachandran plot shows backbone dihedral angle distributions. In a good model, over 95% of residues should fall in the favored regions (dark blue area). Residues in outlier regions (red) have strained geometry and may indicate modeling errors at those positions.
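For readers who want to see what a backbone dihedral actually is, the sketch below computes a signed torsion angle from four atom positions using only the standard library. Phi uses the atoms C(i-1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1). The function name is illustrative, and reading coordinates out of a PDB file is left out on purpose.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 must be distinct points")
    axis = tuple(c / norm for c in b1)
    # Components of the two outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))
```

A Ramachandran plot is simply the (phi, psi) pair for every residue, compared against the regions populated by high-quality experimental structures.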

Download the model PDB file from the Download section for use in docking, MD simulation, or further analysis.

Running MODELLER: step by step

MODELLER is a command-line program developed at UCSF that gives you full control over every step of the homology modeling process. It’s more complex than Swiss-Model but offers capabilities Swiss-Model doesn’t: multi-template modeling, loop refinement protocols, and model optimization you can tune for your specific application. It’s free for academic use.

Install MODELLER via conda
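The simplest installation is via conda, using the salilab channel: conda install -c salilab modeller. You'll need to register for a free academic license key at salilab.org/modeller and supply it at install time (for example through the KEY_MODELLER environment variable) or add it to MODELLER's config.py afterwards. The conda package handles the rest.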

The core MODELLER workflow requires three things: your target sequence in FASTA format, the template PDB file, and an alignment file in PIR format mapping target residues to template residues. Swiss-Model (or HHpred for more sensitive alignments) can generate this alignment for you.
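If you already have the two gapped sequences from another tool, the PIR file can be written by hand or with a few lines of Python. The sketch below is one way to do it under simple assumptions: a single template chain, the whole chain used (via the FIRST and LAST keywords), and optional header fields left empty. The function names are illustrative, and 4XYZ and target_protein are the placeholder names used in this tutorial.

```python
def _pir_record(code, header_fields, aligned_seq, width=75):
    """One PIR entry: '>P1;code', a 10-field header line, sequence ending in '*'."""
    body = aligned_seq + "*"
    wrapped = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([">P1;" + code, ":".join(header_fields)] + wrapped)


def pir_alignment(template_code, template_chain, template_aln,
                  target_code, target_aln):
    """Build PIR alignment text for one template and one target."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    # Fields: type, code, first residue, first chain, last residue, last chain,
    # then protein name, source, resolution, R-factor (left empty here)
    template_fields = ["structureX", template_code, "FIRST", template_chain,
                       "LAST", template_chain, "", "", "", ""]
    target_fields = ["sequence", target_code] + [""] * 8
    records = [
        _pir_record(template_code, template_fields, template_aln),
        _pir_record(target_code, target_fields, target_aln),
    ]
    return "\n".join(records) + "\n"
```

Writing `pir_alignment("4XYZ", "A", template_aln, "target_protein", target_aln)` to alignment.ali gives a file whose entry codes match the `knowns` and `sequence` arguments a MODELLER script would use.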

MODELLER core script: build a model with the MODELLER Python API

Save this script as model.py in your working directory alongside the template PDB and alignment file:

model.py
# Basic MODELLER homology modeling script
# Replace TARGET, TEMPLATE, and ALIGNMENT values with yours
from modeller import *
from modeller.automodel import *

env = Environ()
env.io.atom_files_directory = ['.', '../templates']

# AutoModel: standard homology modeling
a = AutoModel(
    env,
    alnfile  = 'alignment.ali',   # PIR alignment file
    knowns   = '4XYZ',            # template PDB code (4-letter)
    sequence = 'target_protein',  # target sequence name in alignment
    assess_methods = (assess.DOPE,
                      assess.GA341)
)

# Generate 5 models — more models = better sampling
a.starting_model = 1
a.ending_model   = 5

a.make()

# Select best model by DOPE score (most negative = best)
ok_models = [m for m in a.outputs if m['failure'] is None]
if not ok_models:
    raise SystemExit("All models failed; check the alignment and template files")
key = 'DOPE score'
ok_models.sort(key=lambda model: model[key])
best = ok_models[0]
print(f"Best model: {best['name']} (DOPE: {best[key]:.3f})")

Terminal
python model.py 2>&1 | tee modeller.log

MODELLER generates the requested number of models (5 in the script above), calculates DOPE scores for each, and outputs them to the terminal. The best model is the one with the lowest (most negative) DOPE score.

The alignment is the most critical input
MODELLER builds exactly what the alignment tells it to. A misaligned residue produces a modeling error at that position — and MODELLER won’t warn you. Always inspect your alignment visually before running MODELLER, particularly around insertions and deletions. Use HHpred (toolkit.tuebingen.mpg.de/tools/hhpred) for sensitive alignments when sequence identity is below 40%; it’s significantly more accurate than standard BLAST for structurally similar but sequentially divergent proteins.
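A quick way to know where to look during visual inspection is to list the insertions and deletions in the alignment. The sketch below assumes two gapped sequences of equal length with "-" as the gap character; the function names are illustrative, and positions are 1-based alignment columns, not residue numbers.

```python
def gap_runs(aligned_seq):
    """Return (start, end) 1-based column ranges of consecutive gap characters."""
    runs = []
    start = None
    for col, ch in enumerate(aligned_seq, start=1):
        if ch == "-":
            if start is None:
                start = col
        elif start is not None:
            runs.append((start, col - 1))
            start = None
    if start is not None:
        runs.append((start, len(aligned_seq)))
    return runs


def indel_report(target_aln, template_aln):
    """Locate regions that MODELLER must build without template coordinates
    (gaps in the template) and regions where template residues are skipped
    (gaps in the target)."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    return {
        "insertions_in_target": gap_runs(template_aln),
        "deletions_in_target": gap_runs(target_aln),
    }
```

Long insertions are the regions most likely to need loop refinement, and indels placed inside a helix or strand of the template are a common sign that the alignment should be shifted.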

Evaluating model quality: DOPE, QMEAN and MolProbity

No homology model should be used for research without a quality assessment. Three complementary metrics cover different aspects of model quality:

DOPE score (Discrete Optimized Protein Energy)
A statistical energy function that scores how "protein-like" the modeled structure is. It can also be reported as a per-residue profile, so you can identify which specific regions are poorly modeled. Used by MODELLER to rank multiple models.
Interpretation: more negative = better. There is no absolute threshold; compare models of the same sequence against each other. A DOPE profile plot identifies problem loops as local peaks.

QMEAN Z-score (Qualitative Model Energy ANalysis)
Compares the model against a reference set of experimentally determined structures of similar size. Provides a global quality estimate normalized to what is expected for a real structure of that size.
Interpretation: near 0 = good. Below −2: investigate. Below −4: significant quality problems. Available from the Swiss-Model server; ProSA reports a separate Z-score of its own.

After DOPE and QMEAN, run the model through MolProbity (molprobity.biochem.duke.edu) for stereochemical validation. MolProbity checks Ramachandran statistics, rotamer quality, bond lengths and angles, and steric clashes. A publication-quality homology model should have:

  • Ramachandran favored > 95%, outliers < 0.5%
  • Rotamer outliers < 1%
  • MolProbity clashscore below 20 (ideally below 10)
  • MolProbity score below 2.0
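These targets are easy to encode as a checklist once you have copied the numbers from the MolProbity summary. The sketch below is illustrative only: the function name and argument names are made up, the values must be entered by hand, and the cutoffs are the ones listed above.

```python
def molprobity_issues(rama_favored_pct, rama_outliers_pct,
                      rotamer_outliers_pct, clashscore, molprobity_score):
    """Return the list of targets from the checklist above that the model misses."""
    issues = []
    if rama_favored_pct <= 95.0:
        issues.append("Ramachandran favored is not above 95%")
    if rama_outliers_pct >= 0.5:
        issues.append("Ramachandran outliers are not below 0.5%")
    if rotamer_outliers_pct >= 1.0:
        issues.append("rotamer outliers are not below 1%")
    if clashscore >= 20.0:
        issues.append("clashscore is not below 20")
    elif clashscore >= 10.0:
        issues.append("clashscore is acceptable but not below the ideal of 10")
    if molprobity_score >= 2.0:
        issues.append("MolProbity score is not below 2.0")
    return issues
```

An empty list means every target is met; anything else tells you which part of the model to revisit before using it.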
Also try ProSA for Z-score visualization
ProSA (prosa.services.came.sbg.ac.at) is a free web server that calculates its own knowledge-based Z-score (distinct from QMEAN) and plots energy per residue, making it easy to visually identify problematic loop regions that warrant refinement or closer scrutiny before use.
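The same idea works on any per-residue energy profile you have exported as a list of numbers, whether from ProSA or from a DOPE profile. The sketch below smooths the profile with a moving average and reports stretches that stay above a cutoff. Everything here is an assumption for illustration: the function names, the window size, the minimum stretch length, and especially the cutoff, which you would choose yourself (for instance relative to the template's own profile), since no absolute threshold applies.

```python
def smooth(values, window=5):
    """Centered moving average; the window shrinks at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def high_energy_regions(profile, cutoff, window=5, min_len=3):
    """Return (start, end) 1-based residue ranges whose smoothed energy exceeds cutoff."""
    smoothed = smooth(profile, window)
    regions = []
    start = None
    for pos, energy in enumerate(smoothed, start=1):
        if energy > cutoff:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_len:
                regions.append((start, pos - 1))
            start = None
    if start is not None and len(smoothed) - start + 1 >= min_len:
        regions.append((start, len(smoothed)))
    return regions
```

Regions flagged this way are candidates for loop refinement or for a second look at the alignment in that part of the sequence.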

What to report in a methods section

A complete homology modeling methods section should include: the target sequence source (UniProt accession), template selection method (automated Swiss-Model vs manual), the selected template PDB ID and chain, sequence identity and coverage to the target, how many models were generated, which model was selected and by what criterion (DOPE score, QMEAN), and the stereochemical validation results from MolProbity.

Example methods statement
“A homology model of [protein] was generated using MODELLER 10.4 with the crystal structure of [homolog] (PDB: 4XYZ, chain A, 2.1 Å resolution) as template, selected based on 58% sequence identity and 94% coverage of the target sequence. Five models were generated and ranked by DOPE score; the best-scoring model was selected for further analysis. Model quality was assessed using QMEAN (Z-score = −0.8) and MolProbity (clashscore 7.2, Ramachandran favored 97.3%, outliers 0.2%).”

Homology modeling in one paragraph

Homology modeling builds a 3D protein model by copying the backbone from a structurally characterized relative and optimizing the sequence-specific differences. It works best when sequence identity to the template exceeds 30%, and it remains the preferred method when you need a model in a specific conformation — active, ligand-bound, or mutant — that AlphaFold cannot access from sequence alone. Swiss-Model is the fastest entry point: paste your sequence, get a model and quality scores in minutes. MODELLER gives you full control for publication-quality work. Always validate with DOPE, QMEAN, and MolProbity before using any model for docking or MD simulation.
