Expose the most-abundant observed mass on IsotopicEnvelope (resolved)#1078
Conversation
…lopes Add proton-corrected MostAbundantObservedMass and AverageObservedMass to IsotopicEnvelope, plus an EnvelopeResolution flag, and GetMostAbundantOffset/ GetAverageOffset helpers on AverageResidue. These let downstream search select proteoform candidates by the experimentally most-detectable peak (the most abundant isotopologue, or the centroid for unresolved high-mass species) instead of the often-undetectable monoisotopic peak, avoiding off-by-N precursor mass errors. The existing un-proton-corrected MostAbundantObservedIsotopicMass is retained for compatibility. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| /// experimentally most detectable peak in each charge envelope and is used as the precursor | ||
| /// mass for candidate selection when running in most-abundant precursor-mass mode. | ||
| /// </summary> | ||
| public double MostAbundantObservedMass { get; private set; } |
There was a problem hiding this comment.
Isn't this handled by MostAbundantObservedIsotopicPeak?
There was a problem hiding this comment.
Not quite the same — MostAbundantObservedIsotopicMass is mz × |charge| and is deliberately not proton-corrected (per its doc), i.e. a charge-scaled m/z rather than a neutral mass. MostAbundantObservedMass is the proton-corrected neutral mass (mz.ToMass(charge)), which is what's directly comparable to theoretical proteoform masses for candidate selection; the two differ by charge × proton mass. I expanded both docstrings to make the distinction explicit in c144c38. Happy to rename either if you'd prefer clearer names.
There was a problem hiding this comment.
Then I do not see the point of the preexisting MostAbyndantObservedIsotopicMass if it is calculated incorrectly. Could we simply fix the calculation and reuse the property?
it does not appear to be used outside of this class and testing
There was a problem hiding this comment.
I'd like to keep these as two separate properties. MostAbundantObservedIsotopicMass is deliberately m/z × |charge| (not proton-corrected) rather than incorrect — the existing deconvolution tests rely on that exact semantic, dividing it by charge to recover the selected-ion m/z (e.g. StandardDeconvolutionTest.cs:217/236/271, TestDeconvolution.cs:111/212/360) and pinning hardcoded values like 12367.44. "Fixing" it to a neutral mass would break those tests (out of scope for this PR) and conflate two distinct quantities. MostAbundantObservedMass is the proton-corrected neutral mass used for candidate selection; the docstrings (c144c38) spell out the distinction. Happy to rename either for clarity if you'd prefer.
…tAbundantOffset, clarify mass docs Remove the GetMostAbundantOffset(mass) convenience from AverageResidue: it was just GetDiffToMonoisotopic(GetMostIntenseMassIndex(mass)), so consumers now compose the existing public methods directly (reviewer: "isn't this the same as diff to monoisotopic?"). Clarify the docs distinguishing MostAbundantObservedMass (proton- corrected neutral mass) from the existing MostAbundantObservedIsotopicMass (m/z x |charge|, not proton-corrected) (reviewer: "isn't this handled by the existing field?"). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
nbollis
left a comment
There was a problem hiding this comment.
I believe our main disconnect here is in terminology. In my head the masses are defined as follows:
Average Mass = tallest peak m/z.ToMass(charge)
Centroided Mass = Intensity weighted sum of each isotopic peak m/z.ToMass(charge)
Other findings besides my inline comments.
AverageResidue.cs:24 Centroid intensity-weighted average math appears in 3 places — The same weighted-average formula exists in GetAverageOffset (new), ComputeObservedMasses (new), and DeconvolutionScorer.ComputeAvgPpmError (existing). Consider extracting a static ComputeCentroidMass(double[] masses, double[] intensities) helper to reduce duplication.
|
|
||
| double totalIntensity = 0; | ||
| double weightedMassSum = 0; | ||
| for (int i = 0; i < masses.Length; i++) |
There was a problem hiding this comment.
Is the intensity weighted mass intentional? That seems like a silly way to arrive at the diff to most intense. Something like
public virtual double GetDiffToMostAbundant(double testMass)
{
int index = GetMostIntenseMassIndex(testMass);
return GetDiffToMonoisotopic(index) + testMass - GetAllTheoreticalMasses(index)[0];
}
may get you there more precisely. Push back if you want the centroided here.
There was a problem hiding this comment.
The method this pointed at (the most-abundant offset, which used the centroid math) was removed in c144c38 — consumers now compose the existing index-based public methods directly. The only intensity-weighted method left is GetAverageOffset, where the centroid is the intent: it computes the average (centroid) mass, not the most-abundant one.
| /// experimentally most detectable peak in each charge envelope and is used as the precursor | ||
| /// mass for candidate selection when running in most-abundant precursor-mass mode. | ||
| /// </summary> | ||
| public double MostAbundantObservedMass { get; private set; } |
There was a problem hiding this comment.
Then I do not see the point of the preexisting MostAbyndantObservedIsotopicMass if it is calculated incorrectly. Could we simply fix the calculation and reuse the property?
it does not appear to be used outside of this class and testing
| /// from <see cref="Peaks"/> and <see cref="Charge"/>. Called from every constructor after the | ||
| /// peak list and charge are set. | ||
| /// </summary> | ||
| private void ComputeObservedMasses() |
There was a problem hiding this comment.
MostAbundantObservedMass and AverageObservedMass should be computed properties — These are fully derivable from Peaks + Charge (both public readonly). Storing them as { get; private set; } fields creates a consistency burden (must call ComputeObservedMasses() in every constructor) and risks staleness if Peaks is mutated externally. Computed get-only properties would eliminate constructor boilerplate and match the data. MostAbundantObservedIsotopicMass (existing internal property) has the same concern.
- Consensus across 2 specialists (review-backend, review-oop)
There was a problem hiding this comment.
Done — converted both MostAbundantObservedMass and AverageObservedMass to computed get-only properties and deleted ComputeObservedMasses() and its three constructor calls. Since Peaks and Charge are readonly there's no staleness risk. I left the legacy MostAbundantObservedIsotopicMass as-is (discussed in the other thread).
| /// </summary> | ||
| /// <remarks>CAREFUL: This is called a lot during <see cref="ClassicDeconvolutionAlgorithm"/> and care should be taken to precalculate as many values as possible when implementing a new type</remarks> | ||
| public abstract class AverageResidue : IEquatable<AverageResidue> | ||
| public abstract class AverageResidue |
There was a problem hiding this comment.
Breaking change: IEquatable removed — The interface was added in PR #1070 and removed here. Zero in-codebase consumers (verified by grep), but removing an interface from a public abstract class is technically a SemVer-breaking change. Document in release notes.
- review-oop
There was a problem hiding this comment.
You're right that #1070 added IEquatable<AverageResidue> — that's exactly what GitHub was flagging as a conflict. This branch was based on a pre-#1070 master, so it never had the interface to begin with (nothing here removed it). I've merged the latest master in and resolved the AverageResidue.cs conflict by keeping both the IEquatable region from #1070 and the new GetAverageOffset. The interface is preserved and the full deconvolution suite passes (661/0).
MostAbundantObservedMass and AverageObservedMass are fully derivable from the readonly Peaks and Charge, so make them computed get-only properties instead of fields set in every constructor. Removes ComputeObservedMasses() and its three call sites, eliminating the constructor boilerplate and the redundant no-op on the single-peak file-read path. Legacy MostAbundantObservedIsotopicMass is unchanged. Addresses review comments from PR smith-chem-wisc#1078 (nbollis). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Bound the average offset from above so a sort-order/sign regression that inflates GetAverageOffset is caught (was lower-bound only). - Tighten the centroid-vs-most-abundant assertion from 0.5 Da slack to 1e-6 now that the perfect-envelope centroid is confirmed >= apex. - Add an AverageObservedMass == mono + GetAverageOffset round-trip, mirroring the existing most-abundant round-trip. - Cover the 5-arg deconvolution constructor via a BuildPerfectPeaks helper shared with the 6-arg path. Addresses test-quality findings from the PR smith-chem-wisc#1078 review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # mzLib/MassSpectrometry/Deconvolution/AverageResidue/AverageResidue.cs
|
Two follow-ups from the review summary: Terminology (average vs centroid). I think we're using "average mass" in the standard mass-spec sense — the abundance-weighted mass over the isotope distribution (the centroid), as the counterpart to monoisotopic mass — while "most abundant mass" is the tallest peak. That's why Helper extraction ( |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1078 +/- ##
==========================================
- Coverage 82.44% 82.44% -0.01%
==========================================
Files 389 389
Lines 50831 50833 +2
Branches 6153 6154 +1
==========================================
+ Hits 41910 41911 +1
Misses 7729 7729
- Partials 1192 1193 +1
🚀 New features to boost your workflow:
|
…o-intensity fallback Address minor test-quality findings from the PR smith-chem-wisc#1078 review. - AverageObservedMass_IsBetweenMonoAndAboveMostAbundant now asserts an upper bound (centroid <= heaviest peak's neutral mass), so the "between" contract the test name advertises is actually enforced rather than only lower-bounded. - Add AverageObservedMass_ZeroTotalIntensity_FallsBackToMostIntensePeakMass covering the totalIntensity == 0 centroid fallback (asserts a real mass, not NaN from divide-by-zero). The empty-peaks guard and GetAverageOffset's zero-total-intensity fallback are unreachable through the public API (the constructor's MaxBy throws on an empty value-type sequence; averagine intensities are never all-zero), so those defensive branches are left documented rather than exercised via reflection. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ield + -1 sentinel Keep this PR focused on the resolved most-abundant-peak feature and move the isotopically-unresolved (centroid/average) machinery to a separate PR. - Expose MostAbundantObservedIsotopicMass as public (was internal) so consumers can read the observed most-abundant peak, and document the terminology (most-abundant vs monoisotopic vs average/centroid) explicitly on the members. - Constructors with no observed isotopic envelope (the neutral-mass-from-file path) now set MostAbundantObservedIsotopicMass = -1 (sentinel for "no most-abundant peak available") instead of a misleading 0. - Derive MostAbundantObservedMass from that single field (proton-corrected; differs by exactly charge*ProtonMass) so the two can never disagree, and it propagates the -1 sentinel. Also removes the per-access peak rescan. - Move out to the unresolved PR: AverageObservedMass, the EnvelopeResolution enum, Resolution/SetResolution, AverageResidue.GetAverageOffset, and their tests. The resolved search uses the pre-existing averagine most-abundant offset (GetDiffToMonoisotopic(GetMostIntenseMassIndex)). Deconvolution suite green (667 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_011QVxqemPqC9BjvYtVg6aKY
…servedNeutralMass Per review feedback: the property is the proton-corrected neutral mass of the most-abundant peak, so name it explicitly to distinguish it from the m/z×|charge| MostAbundantObservedIsotopicMass. Adjust the doc comments and tests to the new name; no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_011QVxqemPqC9BjvYtVg6aKY
Summary
Adds the mzLib-side support for selecting top-down proteoform candidates by the most abundant isotopic peak instead of the often-undetectable monoisotopic peak (which causes off-by-N precursor errors at high mass). Companion MetaMorpheus PR consumes this.
Changes
IsotopicEnvelope: new publicMostAbundantObservedMass(proton-corrected neutral mass of the most intense peak) andAverageObservedMass(intensity-weighted centroid neutral mass), plus anEnvelopeResolution { Resolved, Unresolved }flag andSetResolution(...). Computed in every constructor. The existing (deliberately un-proton-corrected)MostAbundantObservedIsotopicMassis retained for back-compat.AverageResidue: new publicGetMostAbundantOffset(mono)andGetAverageOffset(mono)— the averagine-derived offset from a monoisotopic mass to the most-abundant / average mass, used downstream to place each candidate at its theoretical apex.Tests
New
TestMostAbundantMassfixture (10 tests): offset monotonicity/magnitude, proton-corrected mass, mono+offset round-trip, resolution default/equality. All existing deconvolution/scorer tests pass (493 related).🤖 Generated with Claude Code