OrthoGather: a local platform for orthology-based proteome comparison and Gene Ontology enrichment
OrthoGather: compare proteomes with OrthoFinder and discover function with GOATOOLS, all in a local web application.
Download UniProt proteomes, run OrthoFinder, perform Gene Ontology enrichment, and export publication-ready figures and tables.
Requires Python 3.11 and OrthoFinder 2.5.5 (runs natively on Apple Silicon, Intel macOS, Linux, and WSL; no Rosetta needed).
OrthoGather is a local web application that integrates orthology inference with functional interpretation for comparative proteome and proteomics analyses.
It enables users to:
- Download reference proteomes from UniProt and infer orthogroups using OrthoFinder.
- Explore shared and species-specific orthogroups through interactive UpSet plots.
- Perform Gene Ontology (GO) enrichment with GOATOOLS, leveraging orthogroup relationships to propagate functional information across species.
- Generate publication-ready figures and Excel tables for downstream analysis.
All analyses run locally, favouring privacy, reproducibility, and rapid iteration, and are particularly useful when working with poorly annotated or non-model organisms.
Pick species from the 1,013,422-proteome UniProt catalogue with live search, then run OrthoFinder locally.
A substantial fraction of proteins across organisms remain under-annotated or inconsistently annotated, which complicates functional interpretation and cross-species comparisons. This is particularly limiting in proteomics experiments involving non-model species or clinical isolates.
OrthoGather addresses this gap by exploiting orthogroup relationships to transfer functional information from well-annotated proteins to those with limited annotation. By combining orthology-based comparison with Gene Ontology enrichment in a single workflow, the platform moves beyond identifying shared proteins toward inferring shared and species-specific biological functions.
Starting from any UniProt-associated proteome set, orthology provides evolutionary context, while Gene Ontology enrichment provides a functional readout, both integrated into a single, local interface.
If you use OrthoGather in your research, please cite:
Manuscript in preparation.
This section will be updated with the bioRxiv preprint and the final journal reference once available.
For non-technical users: download the installer for your computer and double-click it. It sets up everything (package manager, the app, Python 3.11, OrthoFinder) and adds an OrthoGather launcher to your Desktop:
- macOS (Apple Silicon & Intel): Install OrthoGather.command
- Windows 10/11: Install OrthoGather.bat (enables WSL automatically, because OrthoFinder has no native Windows build; needs admin rights and one restart, then continues by itself)
Get them from the Releases page.
See installers/ for details. The manual conda route below still works
for advanced users.
Before installing OrthoGather, please ensure that you have:
- Git (a plain clone is enough; no Git LFS required)
- Conda or Micromamba
- A Unix-based environment (macOS, Linux, or WSL)
The proteome catalogue ships compressed inside the repo (static/Proteomes_json/proteomes_list.json.gz, ~19 MB) and is unpacked automatically on the first launch. The app also checks GitHub for a newer catalogue and offers a one-click update.
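The first-launch unpacking step can be pictured with a short sketch. This is not OrthoGather's actual code: the function name `ensure_catalogue` and the temporary-file strategy are assumptions made for illustration; only the `.json.gz` path comes from the text above.

```python
import gzip
import shutil
from pathlib import Path


def ensure_catalogue(gz_path: Path) -> Path:
    """Return the unpacked catalogue path, decompressing it only if missing.

    Hypothetical helper: the name and behaviour are illustrative assumptions.
    """
    # proteomes_list.json.gz -> proteomes_list.json
    json_path = gz_path.with_suffix("")
    if not json_path.exists():
        # Unpack to a temporary file first so an interrupted run never
        # leaves a truncated catalogue behind.
        tmp_path = json_path.with_name(json_path.name + ".tmp")
        with gzip.open(gz_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        tmp_path.replace(json_path)
    return json_path
```

Calling `ensure_catalogue(Path("static/Proteomes_json/proteomes_list.json.gz"))` a second time would simply return the existing JSON path without decompressing again.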
To install OrthoGather, first clone the repository and move into the project folder:
git clone https://github.com/CarlosVivasR/OrthoGather.git
cd OrthoGather
The canonical setup is a single Conda environment defined in environment.yml.
It installs Python 3.11, OrthoFinder 2.5.5, and every Python dependency, and runs
natively on Apple Silicon, Intel macOS, Linux, and WSL (no Rosetta).
conda env create -f environment.yml
conda activate orthogather
python app.py
That's it. Every time you want to use OrthoGather, just run conda activate orthogather and then python app.py.
Using Micromamba instead of Conda? Replace conda with micromamba in the commands above.
If you prefer a guided installer that also checks prerequisites, run the script for your platform:
./installers/install_orthogather_mac.sh # macOS (Apple Silicon or Intel)
./installers/install_orthogather_wsl.sh # Linux / WSL
Both scripts create the same orthogather environment from environment.yml and verify that OrthoFinder is detected.
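If you want to repeat that detection check yourself from inside the activated environment, a minimal sketch looks like this. It assumes the executable is exposed on PATH as `orthofinder`; the helper name `find_orthofinder` is hypothetical and is not taken from the installer scripts.

```python
import shutil
from typing import Optional


def find_orthofinder(executable: str = "orthofinder") -> Optional[str]:
    """Return the path of the executable found on PATH, or None if absent."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_orthofinder()
    if path is None:
        print("OrthoFinder not found; is the orthogather environment active?")
    else:
        print(f"OrthoFinder detected at {path}")
```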
You can start an analysis in three ways:
- Select organisms from the UniProt catalogue, download proteomes, and run OrthoFinder locally with live logs. This creates a clean, self-contained workspace for your study.
- Open a ready-to-use example that lets you explore the full workflow immediately (ideal for demos or teaching).
- Upload a .zip with previously generated OrthoFinder results from another system to reuse completed analyses without recomputation.
Regardless of the entry point, OrthoGather focuses downstream steps on the standard Orthogroups output, keeping only what is needed for analysis and export.
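For the upload route, locating the Orthogroups table inside an archive could be sketched as below. OrthoFinder writes its orthogroup table as Orthogroups/Orthogroups.tsv inside a results folder; how OrthoGather itself inspects uploads is not described here, so the function `find_orthogroups_member` and its "shallowest match wins" rule are assumptions.

```python
import zipfile
from pathlib import PurePosixPath
from typing import Optional


def find_orthogroups_member(zip_path: str, filename: str = "Orthogroups.tsv") -> Optional[str]:
    """Return the archive member whose basename equals `filename`, or None.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(zip_path) as archive:
        candidates = [
            name for name in archive.namelist()
            if PurePosixPath(name).name == filename
        ]
    # Prefer the shallowest match in case the archive nests several results.
    return min(candidates, key=lambda n: len(PurePosixPath(n).parts), default=None)
```

An archive without any such member yields `None`, which a caller could turn into a clear "not an OrthoFinder results archive" message.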
OrthoGather invokes OrthoFinder with -og (skip gene/species trees; we only need orthogroups) and auto-detects your CPU count for the -t (sequence-search) and -a (analysis) thread counts. On a 10-core Mac you'll see -t 10 -a 2; on a 4-core laptop, -t 4 -a 1. Override if you want to leave headroom for other work:
ORTHOGATHER_OF_THREADS=6 ORTHOGATHER_OF_ANALYSIS_THREADS=1 python app.py
The convention -a ≈ -t / 4 follows OrthoFinder's own recommendation: the analysis phase is memory-bound and oversubscription hurts more than it helps.
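The thread arithmetic above can be sketched in a few lines. This is an illustration, not the code in app.py: the function names are hypothetical, and two details are assumptions, namely that -a is the integer floor of -t / 4 with a minimum of 1 (which reproduces both examples in the text) and that the default -a is derived from the possibly overridden -t. The -f flag is OrthoFinder's input-directory option.

```python
import os
from typing import List, Mapping, Optional, Tuple


def orthofinder_threads(
    cpu_count: Optional[int] = None,
    environ: Mapping[str, str] = os.environ,
) -> Tuple[int, int]:
    """Return (search_threads, analysis_threads) for -t and -a."""
    cores = cpu_count or os.cpu_count() or 1
    # Environment variables, when set, take precedence over auto-detection.
    search = int(environ.get("ORTHOGATHER_OF_THREADS", cores))
    # Assumed rule: a is roughly t / 4, floored, never below 1.
    default_analysis = max(1, search // 4)
    analysis = int(environ.get("ORTHOGATHER_OF_ANALYSIS_THREADS", default_analysis))
    return search, analysis


def build_command(fasta_dir: str, search: int, analysis: int) -> List[str]:
    """Assemble an OrthoFinder command line that stops after orthogroups."""
    return ["orthofinder", "-f", fasta_dir, "-og", "-t", str(search), "-a", str(analysis)]
```

With no overrides, `orthofinder_threads(10, {})` gives `(10, 2)` and `orthofinder_threads(4, {})` gives `(4, 1)`, matching the two machines described above.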
Once orthogroups are available (generated or uploaded), you can take either route β or both β in any order.
This module helps you examine the presence and distribution of orthogroups across a user-defined subset of species and, optionally, narrow the scope to proteins of interest via UniProt IDs.
Features:
- Subset by species: pick two or more species to create a focused comparison set (useful for clades, model vs. non-model contrasts, or custom panels).
- Two UpSet plots (via UpSetPlot):
  - Species combinations: number of orthogroups unique to or shared across species combinations (presence/absence patterns).
  - Protein contribution: how many proteins each combination contributes, clarifying the magnitude behind intersections.
- Optional protein-level filter: restrict orthogroups to those containing specific UniProt IDs (e.g., differentially expressed proteins, pathway members, or candidate families).
Exports: publication-ready PNG figures and Excel/CSV tables summarizing orthogroup membership and intersections.
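The two quantities behind the UpSet plots can be computed directly from an Orthogroups.tsv table (tab-separated, one column per species, gene IDs separated by commas). The sketch below uses only the standard library and stops before plotting; the function name `intersection_counts` and the `selected` parameter are hypothetical, and this is not necessarily how OrthoGather computes them.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Optional, Tuple


def intersection_counts(
    tsv_text: str,
    selected: Optional[Iterable[str]] = None,
) -> Tuple[Counter, Counter]:
    """Count orthogroups and proteins per species combination.

    Keys are frozensets of species names (the presence/absence pattern).
    """
    wanted = set(selected) if selected is not None else None
    orthogroup_counts: Counter = Counter()
    protein_counts: Counter = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        members = {}
        for species, cell in row.items():
            if species is None or species == "Orthogroup":
                continue
            if wanted is not None and species not in wanted:
                continue
            members[species] = [g.strip() for g in (cell or "").split(",") if g.strip()]
        present = frozenset(s for s, genes in members.items() if genes)
        if not present:
            continue  # orthogroup absent from every selected species
        orthogroup_counts[present] += 1
        protein_counts[present] += sum(len(genes) for genes in members.values())
    return orthogroup_counts, protein_counts
```

The first counter corresponds to the "species combinations" plot and the second to the "protein contribution" plot; either could then be handed to a plotting library.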
This module turns orthogroup-level findings into functional hypotheses.
Workflow:
- GOA download (per species) and an annotation coverage panel (4-in-1) to gauge how well proteins are annotated before enrichment.
- Define sets:
  - Foreground: paste UniProt IDs for the set to be tested.
  - Background: paste UniProt IDs or use "all species with GOA" from your selection.
- Include complete orthogroups (optional): expand IDs to all members of their orthogroups to capture functionally related proteins.
- Run enrichment with GOATOOLS, then review significant terms and download detailed results.
Outputs: the enrichment figure and structured tables for downstream exploration.
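The optional "include complete orthogroups" step amounts to a set expansion, sketched below under stated assumptions: orthogroups are held as a mapping from orthogroup ID to member UniProt IDs, and IDs that belong to no orthogroup are kept unchanged. The function name `expand_to_orthogroups` is hypothetical, and the accessions in the usage note are placeholders.

```python
from typing import Dict, Iterable, List, Set


def expand_to_orthogroups(ids: Iterable[str], orthogroups: Dict[str, List[str]]) -> Set[str]:
    """Expand each ID to every member of the orthogroup that contains it."""
    # Reverse index: protein ID -> orthogroup ID.
    protein_to_group = {
        protein: group
        for group, members in orthogroups.items()
        for protein in members
    }
    expanded: Set[str] = set()
    for protein in ids:
        group = protein_to_group.get(protein)
        if group is None:
            expanded.add(protein)  # unassigned ID: keep as pasted
        else:
            expanded.update(orthogroups[group])
    return expanded
```

For example, with `{"OG1": ["A1", "B1"], "OG2": ["A2"]}` a foreground of `["A1", "X9"]` expands to `{"A1", "B1", "X9"}`, so an ortholog from another species joins the tested set.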
Every user-visible error in OrthoGather has a stable code, a clear message,
and an actionable hint. The catalogue lives in
orthogather/utils/error_catalog.py (~60 entries today). Backend routes call
respond_error("ERR_CODE", where=..., detail=...) and the frontend renders
the response as a uniform toast via static/js/og-errors.js.
Adding a new error: open error_catalog.py, add a new ErrorSpec with
a code starting with ERR_, then reference it from your route. The pytest
suite at tests/test_error_catalog.py enforces that every code referenced
from app.py exists in the catalogue.
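The kind of check that test performs can be sketched as follows. The real test file is not reproduced here; this version scans source text for literal `respond_error("ERR_...")` calls with a regular expression, which is an assumption about the approach, and the helper names are hypothetical.

```python
import re
from typing import Iterable, Set

# Matches respond_error("ERR_SOMETHING" ...) with single or double quotes.
RESPOND_ERROR_CALL = re.compile(r"""respond_error\(\s*["'](ERR_[A-Z0-9_]+)["']""")


def referenced_codes(source: str) -> Set[str]:
    """Collect every error code passed literally to respond_error()."""
    return set(RESPOND_ERROR_CALL.findall(source))


def missing_codes(source: str, catalogue_codes: Iterable[str]) -> Set[str]:
    """Return codes used in `source` that the catalogue does not define."""
    return referenced_codes(source) - set(catalogue_codes)
```

A pytest wrapper would read app.py, pass the catalogue's codes, and assert that `missing_codes(...)` is empty.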
Categories: input, state, data, network, external, not-found,
system. Severities: error, warning, info. The frontend toast styles
itself accordingly (red / amber / blue border, matching icon).
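To make the shape of a catalogue entry concrete, here is a hedged sketch of what an ErrorSpec-like record could look like. The field names, the validation, and the `to_payload` method are assumptions; only the ERR_ prefix, the category and severity values, and the colour mapping come from the text above. Check error_catalog.py for the real definition.

```python
from dataclasses import dataclass
from typing import Dict

CATEGORIES = {"input", "state", "data", "network", "external", "not-found", "system"}
TOAST_COLOURS = {"error": "red", "warning": "amber", "info": "blue"}


@dataclass(frozen=True)
class ErrorSpec:
    """Illustrative stand-in for a catalogue entry (fields are assumed)."""

    code: str
    message: str
    hint: str
    category: str
    severity: str = "error"

    def __post_init__(self) -> None:
        if not self.code.startswith("ERR_"):
            raise ValueError(f"code must start with ERR_: {self.code!r}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.severity not in TOAST_COLOURS:
            raise ValueError(f"unknown severity: {self.severity!r}")

    @property
    def toast_colour(self) -> str:
        """Border colour the frontend toast would use for this severity."""
        return TOAST_COLOURS[self.severity]

    def to_payload(self, where: str = "", detail: str = "") -> Dict[str, str]:
        """Build a JSON-serialisable body for the frontend toast."""
        return {
            "code": self.code,
            "message": self.message,
            "hint": self.hint,
            "category": self.category,
            "severity": self.severity,
            "where": where,
            "detail": detail,
        }
```

Validating in `__post_init__` means a mistyped category or severity fails when the catalogue is imported rather than when a user triggers the error.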
Global error handlers (@app.errorhandler(404), @app.errorhandler(500),
@app.errorhandler(Exception)) catch anything that escapes and render
either JSON (for Accept: application/json / /api/* paths) or the
branded templates/error.html page (for HTML requests).
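The JSON-versus-HTML decision made by those handlers can be expressed as a small pure function. This sketch is framework-free and is an assumption about the rule, not a copy of the handler code; the name `wants_json` is hypothetical.

```python
def wants_json(path: str, accept_header: str = "") -> bool:
    """Decide whether an error should be rendered as JSON rather than HTML."""
    # API routes always receive JSON.
    if path.startswith("/api/"):
        return True
    # Otherwise look for application/json among the accepted media types,
    # ignoring parameters such as ";q=0.9".
    media_types = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    return "application/json" in media_types
```

A browser navigating to a missing page sends an Accept header led by text/html, so it would get the branded error page, while a fetch call that asks for application/json would get the structured payload.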
OrthoGather ships with a pytest suite that locks in the species-matching contract (see tests/test_species_matching.py). To run it:
conda activate orthogather
pip install -r requirements-dev.txt # installs pytest, selenium, webdriver-manager
pytest tests/ -v
The same dev requirements file also installs the tools used by
tools/capture_tutorial_screenshots.py to regenerate the tutorial figures
(headless Chrome via Selenium).
OrthoGather is designed to grow. Near-term additions include:
- GO DAG visualisation
- Richer summary plots
- Faster foreground/background iteration
- Lightweight batch workflows
All while keeping the same local, reproducible, and privacy-preserving design.
In short: formulate testable functional hypotheses from orthogroup presence/absence, exploit well-annotated orthologs to illuminate under-annotated proteins, and obtain immediate, visual answers to "who shares what?", with publication-ready outputs and no cloud dependency.
- OrthoFinder: phylogenetic orthology inference platform. See papers linked in their README. OrthoFinder GitHub
- GOATOOLS: Python library for Gene Ontology analyses. GOATOOLS GitHub
- UpSetPlot: visualization of set intersections. UpSetPlot Docs
- UniProt: comprehensive resource for protein sequence and annotation. UniProt
