sib-swiss/swisslipidsreact

SwissLipidsReact

Expands Rhea reaction patterns into complete lipid reactions, resolving structures and assigning Web-RInChIs.

Description

This code combines the Rhea database of biochemical reactions with the SwissLipids database of lipid structures to enumerate the space of hypothetically possible biochemical reactions with fully defined lipid structures.

The subset of Rhea reactions that defines lipid reaction mechanisms is represented in the Rhea database using the ChEBI identifiers of the reacting lipid classes.

SwissLipids links each lipid class (a hypothetical entity that represents the many natural lipids sharing a particular substructure) to all of the hypothetically possible lipid structures defined at the isomeric subspecies level. This level corresponds to a 2.5D structure definition: the atom composition, the bond orders, and the stereochemical tags of the atoms are specified precisely for every molecule.

This code transforms each Rhea reaction that is defined in terms of lipid classes into a set of reactions in which every reactant and product has a defined 2.5D structure. It then checks the correspondence between reactants and products to ensure that the resulting reactions are atomically balanced and biochemically feasible.
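The expansion and balance check described above can be illustrated with a minimal Python sketch. This is not the package's implementation, which works on 2.5D structures. The sketch assumes plain molecular formulas without charges or brackets, and the helper names `parse_formula`, `is_balanced` and `enumerate_balanced` are hypothetical. It expands a class-level pattern (triacylglycerol + water gives diacylglycerol + fatty acid) into every combination of class members and keeps only the atom-balanced ones.

```python
import itertools
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a plain formula such as 'C16H32O2' (no charges or brackets)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_balanced(reactants, products):
    """Return True if both sides of the reaction contain the same atoms."""
    left = sum((parse_formula(f) for f in reactants), Counter())
    right = sum((parse_formula(f) for f in products), Counter())
    return left == right


def enumerate_balanced(reactant_classes, product_classes):
    """Each class is a list of member formulas; try every combination of members."""
    balanced = []
    for lhs in itertools.product(*reactant_classes):
        for rhs in itertools.product(*product_classes):
            if is_balanced(lhs, rhs):
                balanced.append((lhs, rhs))
    return balanced


# Illustrative class members (formulas of neutral molecules).
triacylglycerols = ["C51H98O6", "C57H110O6"]   # tripalmitin, tristearin
water = ["H2O"]
diacylglycerols = ["C35H68O5", "C39H76O5"]     # dipalmitin, distearin
fatty_acids = ["C16H32O2", "C18H36O2"]         # palmitic acid, stearic acid

reactions = enumerate_balanced(
    [triacylglycerols, water], [diacylglycerols, fatty_acids]
)
for lhs, rhs in reactions:
    print(" + ".join(lhs), "->", " + ".join(rhs))
```

Of the eight candidate combinations, only the two with matching acyl chains are atom-balanced. Atom balance alone does not establish biochemical feasibility (isomers balance equally well), which is why a structure-level check is still needed.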

Data

Before the first run, download lipids.tsv (~700MB) from SwissLipids and copy it to src/swisslipidsreact/package_data.
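A quick way to confirm that the file is in place before installing is sketched below. The function name `lipids_file_status` is hypothetical, and the sketch assumes it is run from the root of the repository checkout.

```python
from pathlib import Path


def lipids_file_status(repo_root="."):
    """Return the expected path of lipids.tsv and its size in bytes (None if missing)."""
    target = Path(repo_root) / "src" / "swisslipidsreact" / "package_data" / "lipids.tsv"
    if not target.is_file():
        return target, None
    return target, target.stat().st_size


if __name__ == "__main__":
    path, size = lipids_file_status()
    if size is None:
        print(f"Missing: {path}")
    else:
        print(f"Found {path} ({size / 1e6:.0f} MB)")
```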

Installation

pip install .

pyrheadb dependency

This package depends on pyrheadb.

To avoid downloading and preprocessing the full Rhea reaction data on every new execution, set the RHEADB_LOC environment variable as described in the pyrheadb instructions.
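If the tool is launched from a Python script, the variable can be set for the child process only. The sketch below assumes that RHEADB_LOC holds the path of a directory where pyrheadb keeps its data (check the pyrheadb instructions for the exact semantics); the function name `env_with_rhea_cache` and the example directory are hypothetical.

```python
import os
from pathlib import Path


def env_with_rhea_cache(cache_dir):
    """Return a copy of the environment with RHEADB_LOC pointing at cache_dir.

    Assumption: RHEADB_LOC is a directory path; the directory is created if needed.
    """
    cache = Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)
    env["RHEADB_LOC"] = str(cache)
    return env


# Example use (hypothetical directory):
#   import subprocess
#   subprocess.run(["swisslipidsreact", "run"], env=env_with_rhea_cache("~/rheadb"), check=True)
```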

Run

# Enumerate reactions.
swisslipidsreact run

# Build RDF from enumeration results for integration into the RDF knowledge graph.
swisslipidsreact build-rdf

Options

Explanation of fatty acid (FA) options:

| Options | Meaning | Runtime | Usage |
|---|---|---|---|
| --filter-fa c16 --test | Use only SwissLipids compounds whose FAs are all palmitate | minutes | Testing with reduced dataset |
| --filter-fa c16 | Use only SwissLipids compounds with at most one FA that is not palmitate | hours | Integration in RDF knowledge graph |
| --filter-fa none | Use all SwissLipids compounds | | Not recommended (too slow), but can be used in combination with the --rhea-id option |

Reaction enumeration

"--output-dir",
help = "Output directory (default: current working directory)"

"--filter-fa",
help = "Filter the fatty acids: c16 (default), curated, none (use only in combination with --rhea-id option)"

"--filter-rhea",
help = "Filter Rhea by having a direct SLM parent class of an isomeric subspecies on at least one or both sides of the reaction: two-sides (default), one-side"

"--rhea-id",
help = "Enumerate reactions only for the given Rhea ID"

"--rhea-version",
help = "Use the given Rhea release version (default: latest release)"

"--test",
help = "Use only SwissLipids compounds whose FAs are all palmitate (default: False)"

RDF build

"--input",
help = "Input TSV file (default: <output-dir>/enumerated_reactions.tsv)"

"--output-dir",
help = "Output directory (default: current working directory)"

"--output-format",
help = "RDF serialization format (default: nt)"
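Before building RDF, it can be useful to sanity-check the enumeration output that serves as the default input. The column layout of enumerated_reactions.tsv is not documented here, so this minimal sketch only reports whatever header it finds and the number of data rows. The function name `summarize_tsv` is hypothetical.

```python
import csv
from pathlib import Path


def summarize_tsv(path):
    """Return (header, row_count) for a tab-separated file with one header line."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


# Example use, with an output directory taken from the usage examples:
#   header, rows = summarize_tsv("results-test-c16/enumerated_reactions.tsv")
#   print(header, rows)
```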

Usage examples

To learn more about the options, check swisslipidsreact --help.

  • Enumerate with SwissLipids compounds whose FAs are all palmitate (test set):

    swisslipidsreact run --filter-fa c16 --output-dir results-test-c16 --test
  • Enumerate with SwissLipids compounds with at most one FA that is not palmitate (production set):

    swisslipidsreact run --filter-fa c16 --output-dir results-prod-c16
  • Enumerate with all SwissLipids compounds for one Rhea ID:

    swisslipidsreact run --filter-fa none --rhea-id 78071 --output-dir results-rhea-78071
  • Build RDF for test set:

    swisslipidsreact build-rdf --output-dir results-test-c16
  • Build RDF for production set:

    swisslipidsreact build-rdf --output-dir results-prod-c16
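The enumerate-then-build sequence shown in these examples can also be scripted. The sketch below is one possible wrapper, not part of the package: the function names `build_commands` and `run_pipeline` are hypothetical, and only the command-line options listed above are used.

```python
import subprocess


def build_commands(output_dir, filter_fa="c16", test=False, rhea_id=None):
    """Return the 'run' and 'build-rdf' command lines for one output directory."""
    run_cmd = [
        "swisslipidsreact", "run",
        "--filter-fa", filter_fa,
        "--output-dir", output_dir,
    ]
    if rhea_id is not None:
        run_cmd += ["--rhea-id", str(rhea_id)]
    if test:
        run_cmd.append("--test")
    rdf_cmd = ["swisslipidsreact", "build-rdf", "--output-dir", output_dir]
    return [run_cmd, rdf_cmd]


def run_pipeline(output_dir, **options):
    """Run enumeration, then the RDF build; stop at the first failing command."""
    for command in build_commands(output_dir, **options):
        subprocess.run(command, check=True)


# Example use (test set):
#   run_pipeline("results-test-c16", filter_fa="c16", test=True)
```

Keeping command construction separate from execution makes it easy to print or log the exact commands before a run that may take hours.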

Debugging

Use the environment variable SLR_DEBUG to get more detailed debug information, e.g.:

SLR_DEBUG=1 swisslipidsreact run --filter-fa curated --output-dir results-test-curated --test

  • SLR_DEBUG=1 prints debug messages.
  • SLR_DEBUG=2 serializes various dataframes into DEBUG_...tsv files (this will take disk space, use only in test mode).

Profiling

pip install pyinstrument
pyinstrument --from-path swisslipidsreact build-rdf --input ... --output-dir ...
