suggestion: handle long gene set names in plots #358

Description

@40YTGan

Concern

I noticed that gene set names can extend very far in gseapy.dotplot(), even overriding the user-defined figsize in the output:

import gseapy as gp
import matplotlib.pyplot as plt

ss = gp.prerank(
    rnk=genedata, # a pre-rank pd.dataframe
    gene_sets=reactome,
    outdir=None, sample_norm_method='rank')
ax = gp.dotplot(
    ss.res2d, column="FDR q-val", x="ES", title=f'reactome_cNMF{i}', top_term=20,
    cmap=plt.cm.viridis, size=5, figsize=(8,16), cutoff=0.05, ofname=outdir / f"11_reactome_cNMF{i}.png")

This outputs a figure much wider than expected, stretched to contain the names:

Image

I tried wrapping the tick labels externally; however:

import textwrap

ss = gp.prerank(rnk=genedata,
                gene_sets=reactome,
                outdir=None, sample_norm_method='rank')
ax = gp.dotplot(
    ss.res2d, column="FDR q-val", x="ES", title=f'reactome_cNMF{i}', top_term=20,
    cmap=plt.cm.viridis, size=5, figsize=(10,16), cutoff=0.05)

# try to wrap the y tick labels after the plot has been drawn
original_labels = [tick.get_text() for tick in ax.get_yticklabels()]
wrapped_labels = [textwrap.fill(label, width=20) for label in original_labels]
ax.set_yticks(range(len(wrapped_labels)))
ax.set_yticklabels(wrapped_labels)
plt.savefig(outdir / f"11_reactome_cNMF{i}_alt.png")
plt.close()

This outputs a figure with truncated gene set names:

Image

Suggestions:

Since saving the figure internally is currently the most reliable path, it would be useful to have an option to wrap the gene set names internally (optional, so that users can turn it off if the wrapped labels clutter the plot). This would give behavior similar to the default of enrichplot's dotplot() in Bioconductor.
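
To illustrate the idea, here is a minimal sketch of what such an optional wrap could do, written against plain matplotlib rather than GSEApy internals. The names `wrap_label`, `wrap_yticklabels`, and `save_demo` are hypothetical helpers made up for this example, the term names and scores are invented, and the default width of 30 characters is an arbitrary assumption.

```python
import textwrap


def wrap_label(label, width=30):
    """Wrap one label at whitespace so each line is at most `width` characters.

    break_long_words=False keeps single long tokens (e.g. underscore-joined
    identifiers) intact instead of splitting them mid-word.
    """
    return textwrap.fill(label, width=width, break_long_words=False)


def wrap_yticklabels(ax, width=30):
    """Replace the y tick labels of a matplotlib Axes with wrapped versions."""
    # Draw first so that the tick label text is populated before reading it.
    ax.figure.canvas.draw()
    ticks = ax.get_yticks()
    labels = [tick.get_text() for tick in ax.get_yticklabels()]
    # Pin the tick positions, then set the wrapped text on those ticks.
    ax.set_yticks(ticks)
    ax.set_yticklabels([wrap_label(label, width) for label in labels])


def save_demo(path, width=20):
    """Draw a small dot plot with made-up terms and save it with wrapped labels."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, safe for scripts
    import matplotlib.pyplot as plt

    # Invented example data, not real enrichment results.
    names = [
        "Regulation of cholesterol biosynthesis by SREBP",
        "Cell cycle checkpoints",
        "Signaling by receptor tyrosine kinases",
    ]
    scores = [0.62, 0.48, -0.35]

    fig, ax = plt.subplots(figsize=(4, 6))
    ax.scatter(scores, names)
    wrap_yticklabels(ax, width=width)
    # bbox_inches="tight" grows the saved canvas to include the labels,
    # so the saved image can still differ from figsize.
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```

Calling `save_demo("wrapped_labels_demo.png")` writes a small example figure. An internal option could apply the same wrapping before the figure is saved, and leaving it off by default would preserve the current behavior.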

p.s. I found GSEApy a very nice addon to streamline my cNMF pipeline. Nice job @zqfang
