Skip to content

Expose the most-abundant observed mass on IsotopicEnvelope (resolved)#1078

Merged
trishorts merged 14 commits into
smith-chem-wisc:masterfrom
trishorts:mostAbundantMass
Jun 30, 2026
Merged

Expose the most-abundant observed mass on IsotopicEnvelope (resolved)#1078
trishorts merged 14 commits into
smith-chem-wisc:masterfrom
trishorts:mostAbundantMass

Conversation

@trishorts

Copy link
Copy Markdown
Contributor

Summary

Adds the mzLib-side support for selecting top-down proteoform candidates by the most abundant isotopic peak instead of the often-undetectable monoisotopic peak (which causes off-by-N precursor errors at high mass). Companion MetaMorpheus PR consumes this.

Changes

  • IsotopicEnvelope: new public MostAbundantObservedMass (proton-corrected neutral mass of the most intense peak) and AverageObservedMass (intensity-weighted centroid neutral mass), plus an EnvelopeResolution { Resolved, Unresolved } flag and SetResolution(...). Computed in every constructor. The existing (deliberately un-proton-corrected) MostAbundantObservedIsotopicMass is retained for back-compat.
  • AverageResidue: new public GetMostAbundantOffset(mono) and GetAverageOffset(mono) — the averagine-derived offset from a monoisotopic mass to the most-abundant / average mass, used downstream to place each candidate at its theoretical apex.

Tests

New TestMostAbundantMass fixture (10 tests): offset monotonicity/magnitude, proton-corrected mass, mono+offset round-trip, resolution default/equality. All existing deconvolution/scorer tests pass (493 related).

🤖 Generated with Claude Code

…lopes

Add proton-corrected MostAbundantObservedMass and AverageObservedMass to
IsotopicEnvelope, plus an EnvelopeResolution flag, and GetMostAbundantOffset/
GetAverageOffset helpers on AverageResidue. These let downstream search select
proteoform candidates by the experimentally most-detectable peak (the most
abundant isotopologue, or the centroid for unresolved high-mass species)
instead of the often-undetectable monoisotopic peak, avoiding off-by-N
precursor mass errors. The existing un-proton-corrected
MostAbundantObservedIsotopicMass is retained for compatibility.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/// experimentally most detectable peak in each charge envelope and is used as the precursor
/// mass for candidate selection when running in most-abundant precursor-mass mode.
/// </summary>
public double MostAbundantObservedMass { get; private set; }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this handled by MostAbundantObservedIsotopicPeak?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite the same — MostAbundantObservedIsotopicMass is mz × |charge| and is deliberately not proton-corrected (per its doc), i.e. a charge-scaled m/z rather than a neutral mass. MostAbundantObservedMass is the proton-corrected neutral mass (mz.ToMass(charge)), which is what's directly comparable to theoretical proteoform masses for candidate selection; the two differ by charge × proton mass. I expanded both docstrings to make the distinction explicit in c144c38. Happy to rename either if you'd prefer clearer names.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I do not see the point of the preexisting MostAbyndantObservedIsotopicMass if it is calculated incorrectly. Could we simply fix the calculation and reuse the property?

it does not appear to be used outside of this class and testing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep these as two separate properties. MostAbundantObservedIsotopicMass is deliberately m/z × |charge| (not proton-corrected) rather than incorrect — the existing deconvolution tests rely on that exact semantic, dividing it by charge to recover the selected-ion m/z (e.g. StandardDeconvolutionTest.cs:217/236/271, TestDeconvolution.cs:111/212/360) and pinning hardcoded values like 12367.44. "Fixing" it to a neutral mass would break those tests (out of scope for this PR) and conflate two distinct quantities. MostAbundantObservedMass is the proton-corrected neutral mass used for candidate selection; the docstrings (c144c38) spell out the distinction. Happy to rename either for clarity if you'd prefer.

…tAbundantOffset, clarify mass docs

Remove the GetMostAbundantOffset(mass) convenience from AverageResidue: it was just
GetDiffToMonoisotopic(GetMostIntenseMassIndex(mass)), so consumers now compose the
existing public methods directly (reviewer: "isn't this the same as diff to
monoisotopic?"). Clarify the docs distinguishing MostAbundantObservedMass (proton-
corrected neutral mass) from the existing MostAbundantObservedIsotopicMass (m/z x
|charge|, not proton-corrected) (reviewer: "isn't this handled by the existing field?").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@nbollis nbollis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe our main disconnect here is in terminology. In my head the masses are defined as follows:

Average Mass = tallest peak m/z.ToMass(charge)
Centroided Mass = Intensity weighted sum of each isotopic peak m/z.ToMass(charge)

Other findings besides my inline comments.

AverageResidue.cs:24 Centroid intensity-weighted average math appears in 3 places — The same weighted-average formula exists in GetAverageOffset (new), ComputeObservedMasses (new), and DeconvolutionScorer.ComputeAvgPpmError (existing). Consider extracting a static ComputeCentroidMass(double[] masses, double[] intensities) helper to reduce duplication.


double totalIntensity = 0;
double weightedMassSum = 0;
for (int i = 0; i < masses.Length; i++)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intensity weighted mass intentional? That seems like a silly way to arrive at the diff to most intense. Something like

    public virtual double GetDiffToMostAbundant(double testMass)
    {
        int index = GetMostIntenseMassIndex(testMass);
        return GetDiffToMonoisotopic(index) + testMass - GetAllTheoreticalMasses(index)[0];
    }

may get you there more precisely. Push back if you want the centroided here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method this pointed at (the most-abundant offset, which used the centroid math) was removed in c144c38 — consumers now compose the existing index-based public methods directly. The only intensity-weighted method left is GetAverageOffset, where the centroid is the intent: it computes the average (centroid) mass, not the most-abundant one.

/// experimentally most detectable peak in each charge envelope and is used as the precursor
/// mass for candidate selection when running in most-abundant precursor-mass mode.
/// </summary>
public double MostAbundantObservedMass { get; private set; }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I do not see the point of the preexisting MostAbyndantObservedIsotopicMass if it is calculated incorrectly. Could we simply fix the calculation and reuse the property?

it does not appear to be used outside of this class and testing

/// from <see cref="Peaks"/> and <see cref="Charge"/>. Called from every constructor after the
/// peak list and charge are set.
/// </summary>
private void ComputeObservedMasses()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MostAbundantObservedMass and AverageObservedMass should be computed properties — These are fully derivable from Peaks + Charge (both public readonly). Storing them as { get; private set; } fields creates a consistency burden (must call ComputeObservedMasses() in every constructor) and risks staleness if Peaks is mutated externally. Computed get-only properties would eliminate constructor boilerplate and match the data. MostAbundantObservedIsotopicMass (existing internal property) has the same concern.

  • Consensus across 2 specialists (review-backend, review-oop)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — converted both MostAbundantObservedMass and AverageObservedMass to computed get-only properties and deleted ComputeObservedMasses() and its three constructor calls. Since Peaks and Charge are readonly there's no staleness risk. I left the legacy MostAbundantObservedIsotopicMass as-is (discussed in the other thread).

Comment thread mzLib/MassSpectrometry/MzSpectra/IsotopicEnvelope.cs
/// </summary>
/// <remarks>CAREFUL: This is called a lot during <see cref="ClassicDeconvolutionAlgorithm"/> and care should be taken to precalculate as many values as possible when implementing a new type</remarks>
public abstract class AverageResidue : IEquatable<AverageResidue>
public abstract class AverageResidue

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking change: IEquatable removed — The interface was added in PR #1070 and removed here. Zero in-codebase consumers (verified by grep), but removing an interface from a public abstract class is technically a SemVer-breaking change. Document in release notes.

  • review-oop

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that #1070 added IEquatable<AverageResidue> — that's exactly what GitHub was flagging as a conflict. This branch was based on a pre-#1070 master, so it never had the interface to begin with (nothing here removed it). I've merged the latest master in and resolved the AverageResidue.cs conflict by keeping both the IEquatable region from #1070 and the new GetAverageOffset. The interface is preserved and the full deconvolution suite passes (661/0).

trishorts and others added 3 commits June 18, 2026 09:56
MostAbundantObservedMass and AverageObservedMass are fully derivable
from the readonly Peaks and Charge, so make them computed get-only
properties instead of fields set in every constructor. Removes
ComputeObservedMasses() and its three call sites, eliminating the
constructor boilerplate and the redundant no-op on the single-peak
file-read path. Legacy MostAbundantObservedIsotopicMass is unchanged.

Addresses review comments from PR smith-chem-wisc#1078 (nbollis).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Bound the average offset from above so a sort-order/sign regression
  that inflates GetAverageOffset is caught (was lower-bound only).
- Tighten the centroid-vs-most-abundant assertion from 0.5 Da slack to
  1e-6 now that the perfect-envelope centroid is confirmed >= apex.
- Add an AverageObservedMass == mono + GetAverageOffset round-trip,
  mirroring the existing most-abundant round-trip.
- Cover the 5-arg deconvolution constructor via a BuildPerfectPeaks
  helper shared with the 6-arg path.

Addresses test-quality findings from the PR smith-chem-wisc#1078 review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	mzLib/MassSpectrometry/Deconvolution/AverageResidue/AverageResidue.cs
@trishorts

Copy link
Copy Markdown
Contributor Author

Two follow-ups from the review summary:

Terminology (average vs centroid). I think we're using "average mass" in the standard mass-spec sense — the abundance-weighted mass over the isotope distribution (the centroid), as the counterpart to monoisotopic mass — while "most abundant mass" is the tallest peak. That's why AverageObservedMass is the intensity-weighted centroid and MostAbundantObservedMass is the tallest peak. The docstrings state the formulas explicitly. I'd lean on the standard convention for the names to avoid churning the public API and the companion MetaMorpheus PR, but I'm happy to surface the word "centroid" more prominently in the docs if that's clearer.

Helper extraction (ComputeCentroidMass). Looking at the three sites: DeconvolutionScorer.ComputeAvgPpmError isn't actually a centroid (it anchors at the apex m/z and averages |ppm error| unweighted, /peaks.Count), and ComputeObservedMasses was removed in the computed-properties change above. That leaves two genuine centroid computations on different data shapes — theoretical double[]/double[] in mass space vs. the Peaks tuple-list centroided in m/z — so a shared ComputeCentroidMass(double[], double[]) would fit only one and would force array allocation in a hot getter. I left them inline, but can revisit if you feel strongly.

@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.44%. Comparing base (e5ab254) to head (dc94f52).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1078      +/-   ##
==========================================
- Coverage   82.44%   82.44%   -0.01%     
==========================================
  Files         389      389              
  Lines       50831    50833       +2     
  Branches     6153     6154       +1     
==========================================
+ Hits        41910    41911       +1     
  Misses       7729     7729              
- Partials     1192     1193       +1     
Files with missing lines Coverage Δ
...try/Deconvolution/AverageResidue/AverageResidue.cs 81.25% <ø> (ø)
...Lib/MassSpectrometry/MzSpectra/IsotopicEnvelope.cs 95.45% <100.00%> (+0.10%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pcruzparri
pcruzparri previously approved these changes Jun 29, 2026
…o-intensity fallback

Address minor test-quality findings from the PR smith-chem-wisc#1078 review.

- AverageObservedMass_IsBetweenMonoAndAboveMostAbundant now asserts an upper
  bound (centroid <= heaviest peak's neutral mass), so the "between" contract
  the test name advertises is actually enforced rather than only lower-bounded.
- Add AverageObservedMass_ZeroTotalIntensity_FallsBackToMostIntensePeakMass
  covering the totalIntensity == 0 centroid fallback (asserts a real mass, not
  NaN from divide-by-zero).

The empty-peaks guard and GetAverageOffset's zero-total-intensity fallback are
unreachable through the public API (the constructor's MaxBy throws on an empty
value-type sequence; averagine intensities are never all-zero), so those
defensive branches are left documented rather than exercised via reflection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
trishorts and others added 3 commits June 29, 2026 12:33
…ield + -1 sentinel

Keep this PR focused on the resolved most-abundant-peak feature and move the
isotopically-unresolved (centroid/average) machinery to a separate PR.

- Expose MostAbundantObservedIsotopicMass as public (was internal) so consumers
  can read the observed most-abundant peak, and document the terminology
  (most-abundant vs monoisotopic vs average/centroid) explicitly on the members.
- Constructors with no observed isotopic envelope (the neutral-mass-from-file
  path) now set MostAbundantObservedIsotopicMass = -1 (sentinel for "no
  most-abundant peak available") instead of a misleading 0.
- Derive MostAbundantObservedMass from that single field (proton-corrected;
  differs by exactly charge*ProtonMass) so the two can never disagree, and it
  propagates the -1 sentinel. Also removes the per-access peak rescan.
- Move out to the unresolved PR: AverageObservedMass, the EnvelopeResolution
  enum, Resolution/SetResolution, AverageResidue.GetAverageOffset, and their
  tests. The resolved search uses the pre-existing averagine most-abundant
  offset (GetDiffToMonoisotopic(GetMostIntenseMassIndex)).

Deconvolution suite green (667 passed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011QVxqemPqC9BjvYtVg6aKY
@trishorts trishorts changed the title Expose most-abundant and average observed masses on IsotopicEnvelope Expose the most-abundant observed mass on IsotopicEnvelope (resolved) Jun 29, 2026
trishorts and others added 3 commits June 29, 2026 13:49
…servedNeutralMass

Per review feedback: the property is the proton-corrected neutral mass of the
most-abundant peak, so name it explicitly to distinguish it from the
m/z×|charge| MostAbundantObservedIsotopicMass. Adjust the doc comments and
tests to the new name; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011QVxqemPqC9BjvYtVg6aKY
@trishorts trishorts merged commit 4d81586 into smith-chem-wisc:master Jun 30, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants