How to Use AlphaFold2 with ColabFold: Predict a Protein Structure for Free (2026 Guide)
ColabFold brings AlphaFold2-quality structure prediction to any researcher with a Google account — no installation, no HPC cluster, no cost. This tutorial walks through every step from opening the notebook to downloading and interpreting your results.
What ColabFold is and how it relates to AlphaFold2
AlphaFold2 is the deep learning model developed by DeepMind that transformed protein structure prediction. Its model weights are publicly available, but running it yourself takes substantial setup: installing dependencies, downloading multi-terabyte sequence databases, and securing access to a GPU. For most researchers, that’s a real barrier.
ColabFold, developed by the Steinegger lab at Seoul National University, solves this by packaging AlphaFold2 into a Google Colab notebook that runs entirely in your browser. It replaces AlphaFold2’s MSA generation step — which normally requires downloading massive databases — with a fast remote search against MMseqs2 servers. The result is a prediction pipeline that’s nearly as accurate as the original, takes 15–30 minutes per protein, and requires nothing more than a Google account.
Before you start
You need three things before beginning:
- A Google account: Colab runs in your Google account’s compute allocation. Free tier works for most proteins; longer sequences or multiple models may benefit from Colab Pro.
- Your protein sequence in FASTA format: the amino acid sequence starting with a >header line, followed by the sequence. Get this from UniProt, NCBI, or your own sequencing data.
- A stable internet connection: the notebook connects to external MSA servers and the Colab runtime. Dropped connections mid-run require restarting from the beginning.
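Before pasting, it can help to confirm that the sequence really is clean FASTA. The following is a minimal sketch in plain Python, not part of ColabFold; the helper names `parse_fasta` and `invalid_characters` and the example record are hypothetical, and the check assumes you only want the 20 standard amino acids (so characters such as X or U would be flagged).

```python
# Minimal FASTA sanity check (illustrative helpers, not a ColabFold API).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def parse_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for the first record in a FASTA string."""
    header, chunks = "", []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if chunks:  # a second record begins: keep only the first
                break
            header = line[1:].strip()
        else:
            chunks.append(line)
    return header, "".join(chunks).upper()


def invalid_characters(sequence: str) -> list[tuple[int, str]]:
    """List (1-based position, character) for anything that is not a standard residue."""
    return [(i, ch) for i, ch in enumerate(sequence, start=1) if ch not in VALID_RESIDUES]


# Hypothetical example record, split over two lines as FASTA files often are.
example = """>example_peptide hypothetical record
MKTAYIAKQR
QISFVKSHFS
"""
header, seq = parse_fasta(example)
print(header, len(seq), invalid_characters(seq))
```

An empty list from `invalid_characters` means the sequence contains only standard residues; anything else tells you which position to fix before submitting.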
Step 1 — Open the ColabFold notebook
Go to colab.research.google.com. In the search bar, type “ColabFold AlphaFold2” — you’ll find the official notebook published by the Steinegger lab. Alternatively, go directly to the ColabFold GitHub page at github.com/sokrypton/ColabFold and click the “Open in Colab” badge next to the AlphaFold2 notebook.
Once the notebook opens, sign into your Google account if prompted. You’ll see a notebook with several grey code cells — these are the steps that run in sequence. You don’t need to understand the code. You only interact with the form fields at the top of the first cell.
Before running anything, connect to a runtime with a GPU: go to Runtime → Change runtime type → T4 GPU, then click Save. Without a GPU, predictions run on CPU and take many hours. With a GPU, they complete in 15–30 minutes.
Step 2 — Enter your sequence
In the first code cell, you’ll see a form field labelled query_sequence. Paste your sequence here. You can paste it as plain amino acid sequence (just the letters) or in FASTA format with a header line — ColabFold handles both.
Below the sequence field is a jobname field. Give your job a descriptive name — something like P53_human_TP53 or EGFR_kinase_domain. This name is used to label your output files, so it’s worth being specific. Avoid spaces and special characters; use underscores.
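If you generate job names from protein descriptions, a small helper can enforce the "underscores only" advice. This is a sketch under the assumption that letters, digits, and underscores are the only characters you want to keep; `safe_jobname` is a hypothetical name, not something the notebook provides.

```python
import re


def safe_jobname(name: str) -> str:
    """Collapse every run of characters other than letters, digits, or underscores into one underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip())
    # Trim leading/trailing underscores; fall back to a generic name if nothing is left.
    return cleaned.strip("_") or "job"


print(safe_jobname("EGFR kinase domain (human)"))  # prints EGFR_kinase_domain_human
```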
Step 3 — Choose your settings
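Below the sequence input you’ll find several dropdown menus controlling how the prediction runs. Most can be left at their defaults, but MSA mode is worth understanding. The default, mmseqs2_uniref_env, searches UniRef plus environmental sequence databases and generally gives the deepest alignments; mmseqs2_uniref searches UniRef only; single_sequence skips the MSA entirely (fast, but usually much less accurate); and custom lets you upload your own alignment.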
Other settings worth knowing
- num_relax — how many of the 5 predicted models to run through AMBER energy relaxation. Set to 1 for the top model. Setting to 5 gives you relaxed versions of all models but takes longer.
- num_models — ColabFold generates 5 models by default and ranks them by predicted confidence. Keep this at 5 for publication work; reduce to 1 or 2 for quick exploratory predictions.
- use_ptm — leave checked. This enables the pTM (predicted TM-score) confidence metric used for ranking models.
- use_dropout — leave unchecked for standard predictions. Only check this if you want to generate diverse ensemble structures by introducing stochastic variation.
Step 4 — Run the prediction
Go to Runtime → Run all (or press Ctrl+F9 / Cmd+F9). The notebook runs its cells sequentially, and you’ll see output appear below each cell as it completes.
The run proceeds through these phases — you’ll see progress messages in the cell output:
- Installation (1–3 min) — ColabFold installs its dependencies. Only needed on first run or after runtime reset.
- MSA generation (2–5 min) — your sequence is sent to the MMseqs2 servers and the MSA is built. You’ll see a count of sequences found.
- Structure prediction (10–25 min) — AlphaFold2 runs through its 5 models. Progress bars appear for each.
- Relaxation (2–5 min) — the top model(s) are refined with AMBER energy minimization.
- Output generation — results are compiled and displayed inline in the notebook.
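Adding up the phase estimates above gives a rough wall-clock budget. The sketch below only does that arithmetic, using the ranges from the list; output generation has no stated duration, so it is left out.

```python
# Phase durations in minutes, copied from the list above as (low, high).
phases = {
    "installation": (1, 3),
    "msa_generation": (2, 5),
    "structure_prediction": (10, 25),
    "relaxation": (2, 5),
}

low = sum(lo for lo, _ in phases.values())
high = sum(hi for _, hi in phases.values())
print(f"Estimated wall-clock time: {low}-{high} minutes")  # 15-38 minutes
```

The upper bound assumes every phase runs at its slowest, so a typical run should land well below it.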
When the run completes, the notebook displays a 3D visualization of the top-ranked model directly in the browser, colored by pLDDT confidence. Blue regions are high confidence; yellow and orange are moderate; red is low. This inline view is useful for a quick sanity check but is not publication quality — download the files for proper visualization in PyMOL or VMD.
Step 5 — Download your results
At the end of the notebook, a download cell compresses all outputs into a ZIP file and provides a download link. Click it to save everything to your computer. The ZIP contains:
- *_relaxed_rank_1.pdb: The top-ranked, AMBER-relaxed structure. This is the file to use for docking, MD simulation, or structural analysis. The number in “rank_1” refers to AlphaFold’s confidence ranking, not quality order.
- *_unrelaxed_rank_*.pdb: All 5 predicted models before AMBER relaxation. Useful for comparing model agreement: if all 5 look similar, the prediction is robust.
- *_scores_rank_1.json: Per-residue pLDDT scores and the PAE matrix in JSON format. Essential for assessing confidence, so download this alongside the PDB file.
- *_coverage.png: MSA coverage plot showing how many sequences were found for each position. Poor coverage at a region predicts lower accuracy there.
- *_pae.png: Predicted Aligned Error heatmap, essential for multi-domain proteins. Dark blue squares indicate confident relative positioning between regions.
- *_plddt.png: Per-residue pLDDT plot, a quick visual overview of which parts of your protein are confidently predicted.
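To confirm a download is complete, you can sort the ZIP contents into the categories listed above. This is a minimal sketch using only the standard library. The glob patterns are adapted from the file list, with `rank_*` wildcards as an assumption, because real filenames may carry zero-padded ranks or extra suffixes. The function names are illustrative.

```python
import fnmatch
import zipfile

# Glob patterns adapted from the file list above (assumed, adjust to your actual filenames).
PATTERNS = {
    "relaxed_model": "*_relaxed_rank_*.pdb",
    "unrelaxed_model": "*_unrelaxed_rank_*.pdb",
    "scores": "*_scores_rank_*.json",
    "coverage_plot": "*_coverage.png",
    "pae_plot": "*_pae.png",
    "plddt_plot": "*_plddt.png",
}


def inventory(names):
    """Group filenames by output category; unmatched files are ignored."""
    found = {kind: [] for kind in PATTERNS}
    for name in names:
        for kind, pattern in PATTERNS.items():
            if fnmatch.fnmatch(name, pattern):
                found[kind].append(name)
    return found


def inventory_zip(path):
    """Run the same grouping on the members of a ZIP archive."""
    with zipfile.ZipFile(path) as archive:
        return inventory(archive.namelist())


# Hypothetical filenames, used here instead of a real archive.
demo = inventory(["job_relaxed_rank_1.pdb", "job_unrelaxed_rank_2.pdb", "job_pae.png"])
print({kind: files for kind, files in demo.items() if files})
```

With a real download you would call `inventory_zip("your_results.zip")` and check that no category you need comes back empty.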
Step 6 — Understand and visualize your outputs
With your files downloaded, the most important next step is assessing confidence before doing anything else with the structure.
Reading the pLDDT score
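Open the _plddt.png file first. This plot shows confidence per residue: the x-axis is residue number and the y-axis is the pLDDT score (0–100). Here’s how to read it: above 90 is very high confidence, 70–90 is confident (the backbone is generally reliable), 50–70 is low confidence, and below 50 is very low confidence, often a sign of disorder.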
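The same reading can be done numerically from the scores JSON. The sketch below bins per-residue scores using the conventional AlphaFold thresholds of 90, 70, and 50. It assumes the JSON stores the per-residue values under a `plddt` key; check your own file before relying on that. The function names and example scores are hypothetical.

```python
import json
from collections import Counter

BANDS = ("very high", "confident", "low", "very low")


def plddt_band(score: float) -> str:
    """Map one pLDDT value to its conventional confidence band."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def summarize_plddt(scores: list[float]) -> dict[str, int]:
    """Count residues in each band, always reporting all four bands."""
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts.get(band, 0) for band in BANDS}


def load_plddt(json_path: str) -> list[float]:
    # Assumption: per-residue scores live under the "plddt" key of the scores file.
    with open(json_path) as handle:
        return json.load(handle)["plddt"]


# Hypothetical scores for a six-residue stretch.
print(summarize_plddt([96.2, 91.0, 84.5, 71.3, 62.8, 44.1]))
```

If a large share of residues falls in the two lowest bands, treat that region with caution before docking or simulation.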
Visualizing in PyMOL
Load the top-ranked relaxed PDB into PyMOL and color by the pLDDT values stored in the B-factor column:
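# Load the AlphaFold prediction (replace with the exact filename from your ZIP)
load jobname_relaxed_rank_001.pdb
# Color by pLDDT (stored in the B-factor column)
# spectrum maps the first color to the minimum value, so list colors from low to high:
# red = low confidence, blue = high confidence
spectrum b, red_orange_yellow_cyan_blue, minimum=50, maximum=100
# Show as cartoon
show cartoon
hide lines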
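Outside PyMOL, the same B-factor column can be read directly to list low-confidence residues. This is a sketch that relies on the fixed-column PDB ATOM layout (atom name in columns 13–16, B-factor in columns 61–66). The function names, the 70 cutoff default, and the hand-made sample records are illustrative assumptions.

```python
def ca_plddt(pdb_text: str) -> list[tuple[str, int, str, float]]:
    """Return (chain, residue number, residue name, pLDDT) for each C-alpha atom."""
    rows = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: slices below are 0-based equivalents of the column ranges.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            rows.append((line[21], int(line[22:26]), line[17:20].strip(), float(line[60:66])))
    return rows


def low_confidence(rows, cutoff=70.0):
    """Keep residues whose pLDDT is below the cutoff."""
    return [row for row in rows if row[3] < cutoff]


# Hand-made ATOM records with hypothetical coordinates, built to the fixed column layout.
TEMPLATE = "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}"
sample = "\n".join([
    TEMPLATE.format(1, " N", "MET", "A", 1, 10.000, 12.500, 1.900, 1.00, 91.25),
    TEMPLATE.format(2, " CA", "MET", "A", 1, 11.104, 13.207, 2.100, 1.00, 91.25),
    TEMPLATE.format(10, " CA", "GLY", "A", 2, 12.560, 14.100, 5.400, 1.00, 48.70),
])

for chain, number, name, score in low_confidence(ca_plddt(sample)):
    print(f"{name}{number} (chain {chain}): pLDDT {score:.1f}")
```

For a real prediction you would pass the contents of the downloaded PDB file, for example `ca_plddt(open(path).read())`.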
Reading the PAE plot
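Open the _pae.png file. This is a square matrix: each cell (i, j) shows the expected error, in ångströms, in the position of one residue when the prediction is aligned on the other. Low values (dark blue) mean AlphaFold is confident in the relative position of residues i and j. Light blue or white means it is uncertain.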
For a single-domain protein, you want to see a uniformly dark blue square — high confidence everywhere. For a multi-domain protein, look for two or more dark blue squares along the diagonal (confident within each domain) with a lighter off-diagonal region (uncertain relative domain orientation). That pattern tells you the individual domains are well-predicted but their arrangement in space is uncertain.
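The diagonal-versus-off-diagonal pattern described above can be checked numerically by averaging the PAE within and between domains. The sketch below uses plain Python lists and a toy 4×4 matrix. The domain boundaries are something you supply yourself, and the function names are hypothetical. With real data, the matrix would come from the scores JSON, which is assumed here to store it as a list of rows.

```python
def mean_pae_block(pae, rows, cols):
    """Average PAE over the block pae[i][j] for i in rows and j in cols."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def domain_pae_summary(pae, domains):
    """Mean PAE for every ordered pair of domains.

    domains maps a name to a (start, end) pair of 0-based, end-exclusive residue indices.
    """
    summary = {}
    for a, span_a in domains.items():
        for b, span_b in domains.items():
            summary[(a, b)] = mean_pae_block(pae, range(*span_a), range(*span_b))
    return summary


# Toy matrix: two 2-residue domains, confident internally, uncertain relative to each other.
toy_pae = [
    [2.0, 2.0, 20.0, 20.0],
    [2.0, 2.0, 20.0, 20.0],
    [20.0, 20.0, 2.0, 2.0],
    [20.0, 20.0, 2.0, 2.0],
]
for pair, value in domain_pae_summary(toy_pae, {"d1": (0, 2), "d2": (2, 4)}).items():
    print(pair, value)
```

Low within-domain means combined with high between-domain means correspond to the "well-predicted domains, uncertain arrangement" case.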
Common problems and fixes
- Runtime disconnected mid-run — reconnect and run all cells again from the beginning. The MSA step will re-run from scratch. Consider switching to Colab Pro for longer proteins.
- Sequence too long — out of memory error — switch to a Colab Pro runtime with an A100 GPU, or trim your sequence to the domain of interest rather than the full protein.
- MSA search returns very few sequences — this is expected for orphan proteins. The prediction will still run but confidence will be lower. Check the coverage PNG — if most positions have fewer than 10 sequences, consider ESMFold as an alternative.
- All 5 models look completely different — low model agreement indicates the protein is either disordered, the MSA has poor coverage, or the protein has multiple stable conformations. Don’t pick one arbitrarily — investigate the pLDDT and PAE before proceeding.
- Download link doesn’t appear — scroll to the very last cell and run it manually by clicking the play button on that cell. The download cell sometimes doesn’t auto-execute.
Prediction checklist
- Checked the AlphaFold Database first — model wasn’t already available
- GPU runtime selected in Colab (T4 or better)
- Sequence pasted without spaces, numbers, or non-amino-acid characters
- MSA mode set to mmseqs2_uniref_env (default)
- Run completed — all 5 models generated
- ZIP downloaded — contains PDB files and JSON confidence scores
- pLDDT plot checked — binding site and key regions have pLDDT > 70
- PAE plot checked — especially important for multi-domain proteins
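The "fewer than 10 sequences" rule of thumb from the troubleshooting list can also be checked directly if you have the alignment as A3M text. The sketch below makes several assumptions: lowercase letters are insertions relative to the query, `-` marks a gap, lines starting with `#` are metadata, and the first sequence is the query. The function names and toy alignment are hypothetical.

```python
def a3m_coverage(a3m_text: str) -> list[int]:
    """Count, for each query position, how many aligned sequences have a residue there."""
    sequences, current = [], []
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and metadata lines
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
                current = []
        else:
            # Lowercase letters are insertions relative to the query; drop them.
            current.append("".join(ch for ch in line if not ch.islower()))
    if current:
        sequences.append("".join(current))
    if not sequences:
        return []
    length = len(sequences[0])  # assumed: the first record is the query
    coverage = [0] * length
    for seq in sequences:
        for i, ch in enumerate(seq[:length]):
            if ch != "-":
                coverage[i] += 1
    return coverage


def fraction_shallow(coverage, min_depth=10):
    """Fraction of positions covered by fewer than min_depth sequences."""
    if not coverage:
        return 0.0
    return sum(1 for c in coverage if c < min_depth) / len(coverage)


# Toy alignment: a query plus two hits, one with a gap and one with an insertion.
toy_a3m = ">query\nMKTA\n>hit1\nM-TA\n>hit2\nMKaT-\n"
depth = a3m_coverage(toy_a3m)
print(depth, fraction_shallow(depth, min_depth=3))
```

If most positions fall below the depth threshold, that supports trying a single-sequence method such as ESMFold, as suggested above.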
ColabFold in one paragraph
ColabFold makes AlphaFold2-quality structure prediction accessible to any researcher with a Google account. Open the notebook, paste your sequence, run all cells, and download the results — the entire active process takes about five minutes. The remaining 15–25 minutes is compute time you spend doing something else. The outputs that matter most are the top-ranked relaxed PDB file and the JSON confidence scores — always check pLDDT and PAE before using any predicted structure for docking, MD simulation, or publication.