
CarlosVivasR/OrthoGather


🧬 OrthoGather: a local platform for orthology-based proteome comparison and Gene Ontology enrichment


OrthoGather: compare proteomes with OrthoFinder and discover function with GOATOOLS, all in a local web application.
Download UniProt proteomes, run OrthoFinder, perform Gene Ontology enrichment, and export publication-ready figures and tables.
Requires Python 3.11 and OrthoFinder 2.5.5 (runs natively on Apple Silicon, Intel macOS, Linux, and WSL; no Rosetta needed).

OrthoGather: local, open, reproducible


🧩 Overview

OrthoGather is a local web application that integrates orthology inference with functional interpretation for comparative proteome and proteomics analyses.

It enables users to:

  • Download reference proteomes from UniProt and infer orthogroups using OrthoFinder.
  • Explore shared and species-specific orthogroups through interactive UpSet plots.
  • Perform Gene Ontology (GO) enrichment with GOATOOLS, leveraging orthogroup relationships to propagate functional information across species.
  • Generate publication-ready figures and Excel tables for downstream analysis.

All analyses run locally, favouring privacy, reproducibility, and rapid iteration, and are particularly useful when working with poorly annotated or non-model organisms.

Pick species from the 1,013,422-proteome UniProt catalogue with live search, then run OrthoFinder locally.


💡 Why it helps

A substantial fraction of proteins across organisms remain under-annotated or inconsistently annotated, which complicates functional interpretation and cross-species comparisons. This is particularly limiting in proteomics experiments involving non-model species or clinical isolates.

OrthoGather addresses this gap by exploiting orthogroup relationships to transfer functional information from well-annotated proteins to those with limited annotation. By combining orthology-based comparison with Gene Ontology enrichment in a single workflow, the platform moves beyond identifying shared proteins toward inferring shared and species-specific biological functions.

Starting from any UniProt-associated proteome set, orthology provides evolutionary context and Gene Ontology enrichment provides a functional readout, both integrated into a single, local interface.


📖 Citation

If you use OrthoGather in your research, please cite:

Manuscript in preparation.

This section will be updated with the bioRxiv preprint and the final journal reference once available.


🔽 Download and Installation

⚑ Easiest: one-click installer (no terminal)

For non-technical users: download the installer for your computer and double-click it. It sets up everything (package manager, the app, Python 3.11, OrthoFinder) and adds an OrthoGather launcher to your Desktop:

  • macOS (Apple Silicon & Intel): Install OrthoGather.command
  • Windows 10/11: Install OrthoGather.bat (enables WSL automatically, since OrthoFinder has no native Windows build; needs admin rights and one restart, then continues by itself)

Get them from the Releases page. See installers/ for details. The manual conda route below still works for advanced users.

Prerequisites

Before installing OrthoGather, please ensure that you have:

  • Git (a plain clone is enough; no Git LFS required)
  • Conda or Micromamba
  • A Unix-based environment (macOS, Linux, or WSL)

The proteome catalogue ships compressed inside the repo (static/Proteomes_json/proteomes_list.json.gz, ~19 MB) and is unpacked automatically on the first launch. The app also checks GitHub for a newer catalogue and offers a one-click update.
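The first-launch unpacking step can be pictured roughly as follows. This is a minimal sketch, not OrthoGather's code: the function name ensure_catalogue and the timestamp comparison used to notice a refreshed archive are assumptions.

```python
import gzip
import shutil
from pathlib import Path


def ensure_catalogue(gz_path: Path) -> Path:
    """Unpack <name>.json.gz to <name>.json unless an up-to-date copy already exists."""
    gz_path = Path(gz_path)
    json_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    if json_path.exists() and json_path.stat().st_mtime >= gz_path.stat().st_mtime:
        return json_path  # unpacked on an earlier launch and the archive has not changed since
    tmp_path = json_path.with_name(json_path.name + ".tmp")
    with gzip.open(gz_path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks instead of reading everything into memory
    tmp_path.replace(json_path)  # rename into place so an interrupted launch leaves no half-written JSON
    return json_path
```

Called as ensure_catalogue(Path("static/Proteomes_json/proteomes_list.json.gz")), it would return the path of the unpacked proteomes_list.json. Writing to a temporary file and renaming keeps a crash mid-decompression from leaving a truncated catalogue behind.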

Clone the repository

To install OrthoGather, first clone the repository and move into the project folder:

git clone https://github.com/CarlosVivasR/OrthoGather.git
cd OrthoGather

✅ Recommended: one-line install with Conda (all platforms)

The canonical setup is a single Conda environment defined in environment.yml. It installs Python 3.11, OrthoFinder 2.5.5, and every Python dependency, and it runs natively on Apple Silicon, Intel macOS, Linux, and WSL (no Rosetta).

conda env create -f environment.yml
conda activate orthogather
python app.py

That's it. Every time you want to use OrthoGather, run conda activate orthogather followed by python app.py.

Using Micromamba instead of Conda? Replace conda with micromamba in the commands above.

🧩 Optional: guided install scripts

If you prefer a guided installer that also checks prerequisites, run the script for your platform:

./installers/install_orthogather_mac.sh   # macOS (Apple Silicon or Intel)
./installers/install_orthogather_wsl.sh   # Linux / WSL

Both scripts create the same orthogather environment from environment.yml and verify that OrthoFinder is detected.
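In Python terms, the kind of check the installers perform might look like this. It is a minimal sketch rather than the scripts' actual logic (those are shell scripts): the function name check_prerequisites is hypothetical, and it assumes the environment exposes the command as orthofinder on PATH, as the Bioconda package does.

```python
import shutil
import sys
from typing import Optional


def check_prerequisites(path: Optional[str] = None) -> list[str]:
    """Return human-readable problems; an empty list means the environment looks ready."""
    problems = []
    if sys.version_info[:2] != (3, 11):
        problems.append(f"Python 3.11 expected, found {sys.version.split()[0]}")
    # shutil.which searches `path` when given, otherwise the current PATH
    if shutil.which("orthofinder", path=path) is None:
        problems.append("OrthoFinder not found; run 'conda activate orthogather' first")
    return problems
```

Printing each entry of check_prerequisites() after activation gives a quick way to confirm that the orthogather environment is the one in use.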

⚠️ Prerequisite: Conda or Micromamba must already be installed (e.g. via Miniforge). For a step-by-step walkthrough and troubleshooting, see installation_guide.pdf.


🧬 Input flows

You can start an analysis in three ways:

New Analysis

Select organisms from a UniProt catalogue, download proteomes, and run OrthoFinder locally with live logs.
Creates a clean, self-contained workspace for your study.
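For orientation, a proteome can be fetched from UniProt's public REST API by its proteome identifier. The sketch below only builds the download URL; whether OrthoGather uses this exact endpoint and these parameters is an assumption, and the helper name proteome_fasta_url is hypothetical.

```python
from urllib.parse import urlencode

UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def proteome_fasta_url(proteome_id: str, compressed: bool = True) -> str:
    """Build a UniProt REST URL that streams every entry of one proteome as FASTA."""
    params = {"query": f"proteome:{proteome_id}", "format": "fasta"}
    if compressed:
        params["compressed"] = "true"  # ask for a gzip-compressed response body
    return f"{UNIPROT_STREAM}?{urlencode(params)}"
```

For the human reference proteome (UP000005640), urllib.request.urlretrieve(proteome_fasta_url("UP000005640"), "UP000005640.fasta.gz") would save the compressed FASTA that OrthoFinder then reads after decompression.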

Preselected Dataset

A ready-to-use example that lets you explore the full workflow immediately (ideal for demos or teaching).

External Data Upload

Upload a .zip with previously generated OrthoFinder results from another system to reuse completed analyses without recomputation.

Regardless of the entry point, OrthoGather focuses downstream steps on the standard Orthogroups output, keeping only what is needed for analysis and export.
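For the upload route, "keeping only what is needed" could be done by extracting just the files under an Orthogroups folder of the archive. This is a minimal sketch under assumptions: the helper name extract_orthogroups is hypothetical, and it relies on the OrthoFinder 2.5 layout (Results_*/Orthogroups/Orthogroups.tsv) rather than on OrthoGather's actual selection rules.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_orthogroups(zip_path, dest) -> list[str]:
    """Extract only files inside an 'Orthogroups' folder of an OrthoFinder results archive."""
    dest = Path(dest).resolve()
    kept = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts  # zip entry names always use '/'
            if info.is_dir() or "Orthogroups" not in parts[:-1]:
                continue  # skip gene trees, alignments, WorkingDirectory, and so on
            target = (dest / Path(*parts)).resolve()
            if dest not in target.parents:  # refuse entries such as '../../x' ("zip slip")
                raise ValueError(f"unsafe path in archive: {info.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            kept.append(info.filename)
    return kept
```

Filtering before extraction also avoids unpacking the large tree and alignment folders that the downstream steps never read.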

OrthoFinder thread tuning

OrthoGather invokes OrthoFinder with -og (skip gene and species trees; only orthogroups are needed) and auto-detects your CPU count for the -t (sequence-search) and -a (analysis) thread counts. On a 10-core Mac you'll see -t 10 -a 2; on a 4-core laptop, -t 4 -a 1. Override if you want to leave headroom for other work:

ORTHOGATHER_OF_THREADS=6 ORTHOGATHER_OF_ANALYSIS_THREADS=1 python app.py

The convention -a ≈ -t / 4 follows OrthoFinder's own recommendation: the analysis phase is memory-bound and oversubscription hurts more than it helps.
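The thread defaults and overrides described above can be sketched as follows. The function names are hypothetical, and the "divide by four, round down, never below 1" rule is inferred from the two examples given (10 cores gives -a 2, 4 cores gives -a 1); OrthoGather may compute it differently. The -f, -og, -t and -a flags are standard OrthoFinder 2.5 options.

```python
import os
from typing import Mapping, Optional


def orthofinder_threads(cpu_count: Optional[int] = None,
                        env: Mapping[str, str] = os.environ) -> tuple[int, int]:
    """Return (-t, -a): all cores for the search, about a quarter (at least 1) for analysis."""
    default_t = cpu_count or os.cpu_count() or 1
    t = int(env.get("ORTHOGATHER_OF_THREADS", default_t))
    a = int(env.get("ORTHOGATHER_OF_ANALYSIS_THREADS", max(1, t // 4)))
    return t, a


def orthofinder_command(fasta_dir: str, cpu_count: Optional[int] = None,
                        env: Mapping[str, str] = os.environ) -> list[str]:
    """Build an argv list that stops after orthogroup inference (-og)."""
    t, a = orthofinder_threads(cpu_count, env)
    return ["orthofinder", "-f", fasta_dir, "-og", "-t", str(t), "-a", str(a)]
```

The list could be passed straight to subprocess.run(orthofinder_command("proteomes"), check=True). Keeping the argv as a list avoids shell quoting issues with folder names that contain spaces.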


🔬 Analysis routes

Once orthogroups are available (generated or uploaded), you can take either route, or both, in any order.

1️⃣ Comparative Orthogroup Analysis

This module helps you examine the presence and distribution of orthogroups across a user-defined subset of species and, optionally, narrow the scope to proteins of interest via UniProt IDs.

Features:

  • Subset by species: pick two or more species to create a focused comparison set (useful for clades, model–non-model contrasts, or custom panels).
  • Two UpSet plots (via UpSetPlot):
    • Species combinations: number of orthogroups unique to or shared across species combinations (presence/absence patterns).
    • Protein contribution: how many proteins each combination contributes, clarifying the magnitude behind intersections.
  • Optional protein-level filter β€” restrict orthogroups to those containing specific UniProt IDs (e.g., differentially expressed proteins, pathway members, or candidate families).
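The numbers behind both UpSet plots, including the optional protein filter, can be tallied directly from OrthoFinder's Orthogroups.tsv. This is a minimal sketch, not OrthoGather's implementation: it assumes each species column lists that species' gene IDs separated by ", " and that IDs keep UniProt's db|accession|entry-name header form; the helper names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, Optional, TextIO


def accession(gene_id: str) -> str:
    """'sp|P69905|HBA_HUMAN' -> 'P69905'; IDs without that shape are returned unchanged."""
    parts = gene_id.split("|")
    return parts[1] if len(parts) >= 3 else gene_id


def combination_counts(tsv: TextIO, species: Iterable[str],
                       proteins: Optional[set[str]] = None) -> tuple[Counter, Counter]:
    """Tally orthogroups and proteins per presence/absence pattern over a species subset."""
    subset = list(species)
    og_counts: Counter = Counter()
    protein_counts: Counter = Counter()
    for row in csv.DictReader(tsv, delimiter="\t"):
        # each species cell holds that species' genes in the orthogroup, joined by ", "
        members = {sp: [g for g in (row.get(sp) or "").split(", ") if g] for sp in subset}
        present = frozenset(sp for sp, genes in members.items() if genes)
        if not present:
            continue  # orthogroup has no member in the chosen subset
        if proteins is not None:
            ids = {accession(g) for genes in members.values() for g in genes}
            if not ids & proteins:
                continue  # optional filter: keep groups containing a protein of interest
        og_counts[present] += 1
        protein_counts[present] += sum(len(genes) for genes in members.values())
    return og_counts, protein_counts
```

Each frozenset key is one intersection in the plot: og_counts feeds the species-combinations view and protein_counts the protein-contribution view. The keys could, for instance, be turned into plotting input with UpSetPlot's from_memberships.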

Exports: publication-ready PNG figures and Excel/CSV tables summarizing orthogroup membership and intersections.

2️⃣ Gene Ontology Enrichment Analysis

This module turns orthogroup-level findings into functional hypotheses.

Workflow:

  • GOA download (per species) and an annotation coverage panel (4-in-1) to gauge how well proteins are annotated before enrichment.
  • Define sets:
    • Foreground: paste UniProt IDs for the set to be tested.
    • Background: paste UniProt IDs or use "all species with GOA" from your selection.
    • Include complete orthogroups (optional): expand IDs to all members of their orthogroups to capture functionally related proteins.
  • Run enrichment with GOATOOLS, then review significant terms and download detailed results.
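The orthogroup expansion and the coverage check in the workflow above can be sketched with plain sets. The helper names are hypothetical, and both functions assume the orthogroups and GO annotations have already been loaded into dictionaries.

```python
from typing import Iterable


def expand_to_orthogroups(ids: Iterable[str], orthogroups: dict[str, set[str]]) -> set[str]:
    """Return `ids` plus every member of any orthogroup that contains one of them."""
    og_of = {p: og for og, members in orthogroups.items() for p in members}  # protein -> orthogroup
    expanded = set(ids)
    for p in list(expanded):  # iterate over a snapshot because `expanded` grows below
        og = og_of.get(p)
        if og is not None:
            expanded |= orthogroups[og]
    return expanded


def annotation_coverage(ids: Iterable[str], go_by_protein: dict[str, set[str]]) -> float:
    """Fraction of `ids` with at least one GO term, a quick pre-enrichment sanity check."""
    ids = set(ids)
    return sum(1 for p in ids if go_by_protein.get(p)) / len(ids) if ids else 0.0
```

In GOATOOLS terms, the expanded foreground would be the study set and the background the population given to GOEnrichmentStudy; intersecting the expanded set with the background first keeps the study set a subset of the population.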

Outputs: the enrichment figure and structured tables for downstream exploration.


⚠️ Error system

Every user-visible error in OrthoGather has a stable code, a clear message, and an actionable hint. The catalogue lives in orthogather/utils/error_catalog.py (~60 entries today). Backend routes call respond_error("ERR_CODE", where=..., detail=...) and the frontend renders the response as a uniform toast via static/js/og-errors.js.

Adding a new error: open error_catalog.py, add a new ErrorSpec with a code starting with ERR_, then reference it from your route. The pytest suite at tests/test_error_catalog.py enforces that every code referenced from app.py exists in the catalogue.

Categories: input, state, data, network, external, not-found, system. Severities: error, warning, info. The frontend toast styles itself accordingly (red / amber / blue border, matching icon).
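A catalogue of this shape, plus the "every referenced code exists" check, might look roughly like the sketch below. Only ErrorSpec, respond_error, the ERR_ prefix, and the category and severity names come from this README; the field names, the http_status field, the example code ERR_NO_SPECIES, and missing_codes are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional

CATEGORIES = {"input", "state", "data", "network", "external", "not-found", "system"}
SEVERITIES = {"error", "warning", "info"}


@dataclass(frozen=True)
class ErrorSpec:
    code: str
    message: str
    hint: str
    category: str
    severity: str = "error"
    http_status: int = 400  # hypothetical field

    def __post_init__(self) -> None:
        if not self.code.startswith("ERR_"):
            raise ValueError(f"{self.code}: error codes must start with ERR_")
        if self.category not in CATEGORIES or self.severity not in SEVERITIES:
            raise ValueError(f"{self.code}: unknown category or severity")


# Hypothetical example entry; the real catalogue holds about 60 of these.
CATALOG = {spec.code: spec for spec in [
    ErrorSpec("ERR_NO_SPECIES", "No species selected.", "Pick at least two species.", "input"),
]}


def respond_error(code: str, where: str = "", detail: Optional[str] = None) -> tuple[dict[str, Any], int]:
    """Build a (JSON body, HTTP status) pair; Flask serialises a returned dict as JSON."""
    spec = CATALOG[code]
    body = {"code": spec.code, "message": spec.message, "hint": spec.hint,
            "category": spec.category, "severity": spec.severity,
            "where": where, "detail": detail}
    return body, spec.http_status


CODE_REF = re.compile(r"""respond_error\(\s*["'](ERR_[A-Z0-9_]+)["']""")


def missing_codes(source: str) -> set[str]:
    """Codes passed to respond_error() in `source` that the catalogue does not define."""
    return set(CODE_REF.findall(source)) - set(CATALOG)
```

A test in the spirit of tests/test_error_catalog.py could then read app.py as text and assert that missing_codes(...) returns an empty set. Validating category and severity in __post_init__ makes a typo fail at import time instead of producing an unstyled toast.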

Global error handlers (@app.errorhandler(404), @app.errorhandler(500), @app.errorhandler(Exception)) catch anything that escapes and render either JSON (for requests that send Accept: application/json or target /api/* paths) or the branded templates/error.html page (for ordinary HTML requests).
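The JSON-or-HTML decision those handlers make can be isolated in a small, framework-free helper. This is a sketch under assumptions: wants_json is a hypothetical name, and the Accept parsing is deliberately simple (it ignores q-values and only checks whether application/json is listed).

```python
def wants_json(path: str, accept_header: str) -> bool:
    """Decide whether an error should be rendered as JSON rather than templates/error.html."""
    if path.startswith("/api/"):
        return True
    # "application/json; charset=utf-8, text/html" -> ["application/json", "text/html"]
    media_types = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    return "application/json" in media_types
```

Inside a Flask handler registered with @app.errorhandler(Exception), this could be called as wants_json(request.path, request.headers.get("Accept", "")), returning a respond_error-style body when true and render_template("error.html", ...) otherwise.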


🧪 Running the test suite

OrthoGather ships with a pytest suite that locks in the species-matching contract (see tests/test_species_matching.py). To run it:

conda activate orthogather
pip install -r requirements-dev.txt   # installs pytest, selenium, webdriver-manager
pytest tests/ -v

The same dev requirements file also installs the tools used by tools/capture_tutorial_screenshots.py to regenerate the tutorial figures (headless Chrome via Selenium).


🚀 Looking ahead

OrthoGather is designed to grow. Near-term additions include:

  • GO DAG visualisation
  • Richer summary plots
  • Faster foreground/background iteration
  • Lightweight batch workflows

All while keeping the same local, reproducible, and privacy-preserving design.

In short: formulate testable functional hypotheses from orthogroup presence/absence, exploit well-annotated orthologs to illuminate under-annotated proteins, and obtain immediate, visual answers to "who shares what?", with publication-ready outputs and no cloud dependency.


📚 References & attributions

  • OrthoFinder: phylogenetic orthology inference platform. See papers linked in their README. OrthoFinder GitHub
  • GOATOOLS: Python library for Gene Ontology analyses. GOATOOLS GitHub
  • UpSetPlot: visualization of set intersections. UpSetPlot Docs
  • UniProt: comprehensive resource for protein sequence and annotation. UniProt

About

Local web platform for orthology-based proteome/proteomics comparison and Gene Ontology enrichment. Run OrthoFinder, explore orthogroups, and perform GO enrichment locally with publication-ready outputs.
