Region Embeddings #46

Closed
bramstuyven wants to merge 8 commits into main from region_embeddings

@bramstuyven

Collaborator

Aggregating downstream Mindi seqlets per region to create region embeddings for visualisation of downstream enhancer modelling.

Options

  • mean (un)weighted pca
  • mean (un)weighted vae
  • (un)weighted motif family count vectors

Changes

  • Made function tfmindi.tl.reduce_seqlet_space public
    • reduction: 'pca' or 'vae' (default 'pca')
  • Added tfmindi.tl.embed_regions with options
    • aggregate: 'count' or 'mean'
      • default 'mean'
    • reduction: 'pca' or 'vae'
      • used with aggregate = 'mean'
      • default 'pca'
    • annotation_column:
      • used with aggregate = 'count'
      • default 'cluster_dbd'
    • latent:
      • used with aggregate = 'mean'
      • defaults to 10 when reduction = 'vae', 50 when reduction = 'pca'
    • weighted: weigh each seqlet embedding before aggregating
    • tsne: reduce further to 2D using TSNE
  • Made function tfmindi.pl.tsne_region_embedding

Explanation

Aggregations

Each called seqlet is compared to a database of reference motifs using Tomtom similarity scoring. This yields one vector per seqlet, stored as the similarity matrix in the AnnData object. To compare the original regions by motif composition, this PR implements several ways of aggregating those seqlet vectors into a per-region representation of motif content. Mean aggregation takes the (un)weighted mean of the PCA- or VAE-reduced similarity vectors of a region's seqlets. Count aggregation builds, for a given annotation column in adata.obs, a vector of per-annotation seqlet counts (unweighted) or of summed seqlet weights (weighted).
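To make the two aggregation modes concrete, here is a minimal, dependency-free sketch. It is not the tfmindi implementation: the function names `mean_aggregate` and `count_aggregate`, the flat-list inputs, and the normalisation by the summed weights are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_aggregate(region_ids, embeddings, weights=None):
    """Return {region: (weighted) mean of the embeddings of its seqlets}.

    Hypothetical helper: one entry per seqlet in each input list.
    """
    if weights is None:
        weights = [1.0] * len(region_ids)  # unweighted: every seqlet counts equally
    sums = {}
    totals = defaultdict(float)
    for region, vec, w in zip(region_ids, embeddings, weights):
        acc = sums.setdefault(region, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += w * x
        totals[region] += w
    # Dividing by the summed weights is an assumption; it is a no-op when
    # the weights of a region already sum to 1.
    return {r: [x / totals[r] for x in acc] for r, acc in sums.items()}


def count_aggregate(region_ids, annotations, weights=None):
    """Return (categories, {region: per-category counts or summed weights})."""
    if weights is None:
        weights = [1.0] * len(region_ids)  # unweighted: plain counts
    categories = sorted(set(annotations))
    index = {c: i for i, c in enumerate(categories)}
    out = {}
    for region, ann, w in zip(region_ids, annotations, weights):
        vec = out.setdefault(region, [0.0] * len(categories))
        vec[index[ann]] += w
    return categories, out


# Toy data: three seqlets, two regions, 2-D reduced embeddings.
region_ids = ["r1", "r1", "r2"]
embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
annotations = ["bZIP", "ETS", "bZIP"]

region_means = mean_aggregate(region_ids, embeddings)
categories, region_counts = count_aggregate(region_ids, annotations)
```

Both helpers take an optional `weights` list, so the weighted variants reuse the same code path with one weight per seqlet.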

Weights

In both mean and count aggregation, seqlet-specific weights are computed by applying a softmax to the attribution scores of the seqlets within each region, so the weights of a region's seqlets sum to 1.
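The per-region softmax can be sketched as follows. This is an illustration under assumptions, not the code from this PR: the helper name `softmax_weights_per_region` is hypothetical, and it assumes one scalar attribution score per seqlet.

```python
import math
from collections import defaultdict


def softmax_weights_per_region(region_ids, attributions):
    """Return one weight per seqlet; weights within a region sum to 1.

    Hypothetical helper: `attributions` holds one scalar score per seqlet.
    """
    # Group seqlet positions by the region they belong to.
    groups = defaultdict(list)
    for i, region in enumerate(region_ids):
        groups[region].append(i)

    weights = [0.0] * len(region_ids)
    for idx in groups.values():
        # Subtract the regional maximum before exponentiating for numerical stability.
        peak = max(attributions[i] for i in idx)
        exps = [math.exp(attributions[i] - peak) for i in idx]
        total = sum(exps)
        for i, e in zip(idx, exps):
            weights[i] = e / total
    return weights


# Two seqlets with equal scores in r1, a single seqlet in r2.
weights = softmax_weights_per_region(["r1", "r1", "r2"], [1.0, 1.0, 3.0])
```

A region with a single seqlet always gets weight 1 for that seqlet, whatever its attribution score.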

Usage

Default (PCA, 50 latents)

adata = tm.load_h5ad('mindi_adata.h5ad')
tm.tl.embed_regions(adata)
tm.pl.tsne_region_embedding(adata, color_by='topic')

VAE with 16 latents

tm.tl.reduce_seqlet_space(seqlet_adata, reduction='vae', vae_kwargs={'latent': 16})
tm.tl.embed_regions(adata, reduction='vae', latents=16, weighted=True)
tm.pl.tsne_region_embedding(adata, embedding='vae', embedding_specific=16, weighted=True, color_by='topic')

@bramstuyven bramstuyven deleted the region_embeddings branch April 28, 2026 11:19