Region Embeddings by bramstuyven · Pull Request #46 · aertslab/TF-MINDI

bramstuyven · 2026-04-28T09:46:51Z

Aggregating downstream Mindi seqlets per region to create region embeddings for visualisation of downstream enhancer modelling.

Options

mean (un)weighted pca
mean (un)weighted vae
(un)weighted motif family count vectors

Changes

Made function tfmindi.tl.reduce_seqlet_space public
- reduction: 'pca' or 'vae' (default 'pca')
Added tfmindi.tl.embed_regions with options
- aggregate: 'count' or 'mean'
  - default 'mean'
- reduction: 'pca' or 'vae'
  - used with aggregate = 'mean'
  - default 'pca'
- annotation_column:
  - used with aggregate = 'count'
  - default 'cluster_dbd'
- latent:
  - used with aggregate = 'mean'
  - defaults to 10 when reduction = 'vae', 50 when reduction = 'pca'
- weighted: weigh each seqlet embedding before aggregating
- tsne: reduce further to 2D using TSNE
Made function tfmindi.pl.tsne_region_embedding

Explanation

Aggregations

Each called seqlet is compared to a database of reference motifs using tomtom similarity scoring. This yields one vector per seqlet, stored as the similarity matrix in the anndata object. To compare the original regions by motif composition, several ways of aggregating these seqlets into a region-level representation of motif content are implemented here. Mean aggregation takes the (un)weighted mean of the pca- or vae-reduced similarity vectors of a region's seqlets. Count aggregation builds, for a given annotation column in adata.obs, a vector of per-annotation seqlet counts (unweighted) or of summed seqlet weights (weighted).
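The two aggregations described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code in this PR: the function names, the region labels and the motif family names are hypothetical, and the seqlet vectors are assumed to be already reduced (e.g. by pca or vae).

```python
import numpy as np


def mean_region_embedding(seqlet_vectors, region_ids, weights=None):
    """(Weighted) mean of the reduced seqlet vectors of each region."""
    seqlet_vectors = np.asarray(seqlet_vectors, dtype=float)
    region_ids = np.asarray(region_ids)
    if weights is None:
        # Unweighted case: every seqlet contributes equally.
        weights = np.ones(len(region_ids))
    weights = np.asarray(weights, dtype=float)
    regions = np.unique(region_ids)  # sorted unique region labels
    out = np.zeros((len(regions), seqlet_vectors.shape[1]))
    for i, region in enumerate(regions):
        mask = region_ids == region
        out[i] = np.average(seqlet_vectors[mask], axis=0, weights=weights[mask])
    return regions, out


def count_region_embedding(annotations, region_ids, weights=None):
    """Per region: seqlet count per annotation, or summed weights if given."""
    annotations = np.asarray(annotations)
    region_ids = np.asarray(region_ids)
    if weights is None:
        # Unweighted case: each seqlet adds 1 to its annotation's entry.
        weights = np.ones(len(region_ids))
    weights = np.asarray(weights, dtype=float)
    regions, region_idx = np.unique(region_ids, return_inverse=True)
    labels, label_idx = np.unique(annotations, return_inverse=True)
    out = np.zeros((len(regions), len(labels)))
    # Accumulate weights into the (region, annotation) cells.
    np.add.at(out, (region_idx, label_idx), weights)
    return regions, labels, out


# Toy data (hypothetical): three seqlets in two regions.
vectors = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
region_ids = ["chr1:100-600", "chr1:100-600", "chr2:50-550"]
families = ["bHLH", "Homeodomain", "bHLH"]

regions, mean_emb = mean_region_embedding(vectors, region_ids)
_, labels, count_emb = count_region_embedding(families, region_ids)
```

Passing per-seqlet weights to either function gives the weighted variant; with weights that sum to 1 within each region, the weighted mean reduces to a plain weighted sum.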

Weights

In both mean and count aggregation, seqlet-specific weights are calculated by applying a softmax to the attribution scores of the seqlets within each region.
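As a rough illustration of the per-region softmax, here is a plain-Python sketch. It assumes each seqlet has a single scalar attribution score, which is an assumption on my part; the function name and toy values are hypothetical and not taken from the PR.

```python
import math
from collections import defaultdict


def softmax_weights_per_region(scores, region_ids):
    """Softmax seqlet scores within each region; weights sum to 1 per region."""
    by_region = defaultdict(list)
    for idx, region in enumerate(region_ids):
        by_region[region].append(idx)

    weights = [0.0] * len(scores)
    for indices in by_region.values():
        # Subtract the regional maximum for numerical stability.
        top = max(scores[i] for i in indices)
        exps = [math.exp(scores[i] - top) for i in indices]
        total = sum(exps)
        for i, e in zip(indices, exps):
            weights[i] = e / total
    return weights


# Toy data (hypothetical): two seqlets in one region, one in another.
scores = [2.0, 2.0, 0.5]
region_ids = ["chr1:100-600", "chr1:100-600", "chr2:50-550"]
weights = softmax_weights_per_region(scores, region_ids)
```

Because the softmax is taken per region, a region with a single seqlet always gives that seqlet weight 1, regardless of its score.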

Usage

Default (PCA, 50 latents)

import tfmindi as tm
adata = tm.load_h5ad('mindi_adata.h5ad')
tm.tl.embed_regions(adata)
tm.pl.tsne_region_embedding(adata, color_by='topic')

VAE with 16 latents

tm.tl.reduce_seqlet_space(seqlet_adata, reduction='vae', vae_kwargs={'latent': 16})
tm.tl.embed_regions(adata, reduction='vae', latents=16, weighted=True)
tm.pl.tsne_region_embedding(adata, embedding='vae', embedding_specific=16, weighted=True, color_by='topic')

dabaffy and others added 8 commits April 17, 2026 11:17

Add VAE-based dimensionality reduction to cluster_seqlets

5954a15

Implemented mean region embedding

9d9ef3d

ignore notebook checkpoints

1bd7b79

backwards compatible + embedding tsne

2958023

refining tsne code

e2bfbb9

TSNE cleaned

3b99af8

Implemented count embeddings + more tsne cleanup

071e844

save weighted embeddings seperatly

68ac520

bramstuyven closed this Apr 28, 2026

bramstuyven deleted the region_embeddings branch April 28, 2026 11:19

bramstuyven wants to merge 8 commits into main from region_embeddings
