Intenenal ions squared#1022
Open
trishorts wants to merge 45 commits into
Open
Conversation
…a with internal ions
Codecov Report❌ Patch coverage is Please upload reports for the commit f03ec80 to get more accurate results. Additional details and impacted files@@ Coverage Diff @@
## master #1022 +/- ##
==========================================
- Coverage 81.63% 80.39% -1.25%
==========================================
Files 369 302 -67
Lines 47605 42640 -4965
Branches 5649 4824 -825
==========================================
- Hits 38863 34281 -4582
+ Misses 7643 7512 -131
+ Partials 1099 847 -252
🚀 New features to boost your workflow:
|
mix primary and secondary ion spectra
# Conflicts: # mzLib/PredictionClients/Koina/AbstractClasses/FragmentIntensityModel.cs # mzLib/Test/FileReadingTests/SpectralLibraryTests/SpectralLibraryData/librarySpectrumInternalIons.msp # mzLib/Test/FileReadingTests/SpectralLibraryTests/SpectralLibraryReaderTest.cs # mzLib/Test/KoinaTests/RetentionTimePrediction/Prosit2019iRTTests.cs # mzLib/Test/MassSpectrometryTests/TestSpectralSimilarity.cs # mzLib/Test/Test.csproj # mzLib/Test/TestLibrarySpectrum.cs # mzLib/mzLib.nuspec
Upstream master replaced the FragmentIntensityModel/KoinaModelBase abstraction (constructor-fed inputs + RunInferenceAsync + virtual GenerateLibrarySpectraFrom Predictions(out)) with a Predict(inputs)/GenerateLibrarySpectraFromPredictions( alignedRetentionTimes, ...) flow. Re-port the internal-ions feature onto it so the merged branch compiles and the feature behaves as before: - InternalFragmentIntensityModel: implement the new abstract members (batching/ throttling props, ToBatchedRequests(validInputs), mapping/handling modes), pass an ISequenceConverter to base, keep its own input/output state as plain (non-override) properties, reuse the base Predictions list, and rename its internal-ion spectra builder to GenerateInternalLibrarySpectraFromPredictions (base method is no longer virtual). Provide a local IDisposable + inline MSP save (SavePredictedSpectralLibrary no longer exists on the base). Update PeptideFragmentIntensityPrediction construction for the added ValidatedFullSequence field. - PrimaryIntensityComponent: drive the wrapped Koina model via Predict + Generate LibrarySpectraFromPredictions(alignedRetentionTimes) instead of the removed RunInferenceAsync/PredictedSpectra. - CombinedLibraryModel.WithPrimaryAndInternalFragments: derive the primary model's inputs and aligned RTs from the internal model (same peptide set); add optional collisionEnergy. - Tests: construct Prosit2020IntensityHCD/Prosit2019iRT via the new API; pass collisionEnergy through the factory. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…minalAcidic
Neither flag was an input to the trained ONNX model (ComputeFeatures emits a
fixed 18-feature vector containing neither) and neither had recorded predictive
value. HasMetalOnTerminalAcidic was additionally dead: its EndsWith("on D"/"on E")
test ran against strings already suffixed with " at position N", so it was
hard-wired false. Remove both properties, their TSV header/value columns, the
feature-extractor population, and the Step11 correlation references.
Addresses 1 finding from the PR smith-chem-wisc#1022 review.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ctra - InternalFragmentIntensityModel: candidate enumeration anchored spans at the N-/C-terminus (i==0 / j==n-1), emitting terminal b/y ions mislabeled as internal bIb[...]. Constrain to i>=1 and j<=n-2 so both ends are real internal cleavages. - MixedModelResult.FromSpectra: ToDictionary(s => s.Name) threw on the first duplicate Sequence/Charge; since it runs inside each component's try/catch it silently became FromError and dropped all of that component's spectra. Collapse duplicates last-wins instead. - PrimaryIntensityComponent: correct the class doc that claimed the component disposes the model; the rewritten Predict path does not and the base model is not IDisposable. Addresses 3 findings from the PR smith-chem-wisc#1022 review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… fixture - SpectralLibraryReaderTest: the two internal-ion tests pointed at the pre-merge SpectralLibrary\SpectralLibraryData\ path; the master merge relocated the data to FileReadingTests\SpectralLibraryTests\SpectralLibraryData\, so they threw FileNotFoundException. Repoint to the new path. - MixedModelEvaluationTests: OneTimeSetUp opened a StreamWriter under a developer-local F:\ tree before the Assert.Ignore guard, erroring the whole fixture off-machine. Check Directory.Exists(BaseDir) first and ignore cleanly; flag the fixture for removal (exploratory troubleshooting only). Addresses 2 findings from the PR smith-chem-wisc#1022 review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- HasModifiedResidue now keys off ModificationsInInternalFragment (non-empty) instead of only the four counted categories, so a Common Fixed / UniProt mod no longer reports HasModifiedResidue=false while the mod string is populated. - ReadFromTsv rethrows a row's parse failure with line + file context instead of swallowing it in an empty catch (and guards genuinely blank lines). - ExtractSingleInternalFragment uses product?.NeutralMass ?? NaN, matching the null-tolerant contract of ParseInternalFragmentFromProduct rather than dereferencing a possibly-null product. Addresses 3 minor findings from the PR smith-chem-wisc#1022 review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…op dead using - GenerateInternalLibrarySpectraFromPredictions resets PredictedSpectra to a fresh list so repeated calls no longer accumulate duplicate spectra. - AllowedPrecursorCharges returns a single static readonly HashSet instead of allocating a new set on every access in the constructor validation loop. - Remove the unused System.ComponentModel using from IMixedModelComponent. Addresses 3 minor findings from the PR smith-chem-wisc#1022 review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hange SpectralLibrary.cs carries the substantive internal-ion ReadFragmentIon work; revert the unrelated stylistic noise so the diff stays focused: restore the deleted using block (System, System.Collections.Generic, System.IO, System.Linq, ThermoFisher.CommonCore.Data.Business) and the trailing end-of-file newline. Addresses 2 cosmetic findings from the PR smith-chem-wisc#1022 review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
InternalFragmentStep11AnalysisTests is an exploratory analysis harness, not a validating test: its [Test] methods emit a formatted report with no assertions and read/write a developer-local F:\ tree whose input filename has drifted, so it self-ignores even on the author's machine. Mark it TODO(remove). Addresses 2 minor findings from the PR smith-chem-wisc#1022 review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add InternalFragmentPipelineTests covering the four previously-uncovered Readers/InternalIons files, lifting each from 0%: - InternalFragmentIon -> 100% (header/value arity, computed properties, mass-accuracy filter true/false paths) - InternalFragmentTsvWriter -> 94.5% (write/read round-trip, missing-file guard) - InternalFragmentFeatureExtractor -> 91.3% (ExtractFromPsms over the committed internalIons.psmtsv via the null-tolerant MsDataFile path; empty-input) - InternalFragmentAnalysisRunner (Run end-to-end with a renamed small mzML/mgf, missing-PSM and missing-raw-folder guards, warn-and-continue on an unreadable file). Remaining gap is the Thermo .raw loader arm and an unreachable switch default. No raw spectra committed: reuses internalIons.psmtsv and existing DataFiles fixtures (sliced_ethcd.mzML, tester.mgf). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds internal fragment ion prediction and analysis to mzLib. Internal fragments arise from two simultaneous backbone cleavages, producing an ion that contains neither terminus of the peptide (annotated here as
bIb[start-end]+charge, e.g.bIb[3-7]+1spans residues 3–7). Standard prediction models (Prosit, ms2pip, …) only model the terminal b/y series; this PR contributes the internal series so a predicted spectral library can include both.The work has three independent pieces:
InternalFragmentIntensityModel— a local ONNX model that predicts internal-ion intensities.LibrarySpectrumper peptide.1.
InternalFragmentIntensityModel(local ONNX)PredictionClients/LocalModels/InternalFragmentIntensityModel.cs+ shipped modelInternalFragmentScorer_v3_AllProteases.onnx.Predicts TIC-normalized intensity (TicNI) for internal-fragment candidates of length 3–9. It is protease-agnostic (trained on LysC, Trypsin, ArgC, GluC, AspN, Chymotrypsin) and NCE-agnostic (collision energy is not an explicit input). The model scores 18 engineered features per candidate (fragment length, basic-residue counts, terminal-ion support, proline/aspartate flags, hydrophobicity, etc.), keeps the top N per peptide (default 20), and normalizes intensities to max = 1.0. Each surviving ion becomes a
ProductwithSecondaryProductType/SecondaryFragmentNumberset, which is how mzLib marks an ion as internal (MatchedFragmentIon.IsInternalFragment == true).Pass
spectralLibrarySavePath:to also write the library to an MSP file.2. MixedModels — combine primary + internal ions
PredictionClients/MixedModels/(CombinedLibraryModel,PrimaryIntensityComponent/InternalIntensityComponent,LibrarySpectrumMerger,MixedModelResult,IMixedModelComponent).CombinedLibraryModelruns each component in parallel and unions their fragment ions per peptide (keyed by sequence/charge), so the merged spectrum carries both b/y and internal ions. A failing component (e.g. a Koina outage) is captured rather than aborting the run, so you still get an internal-only library.The factory derives the primary model's inputs and aligned retention times from the internal model (both operate on the same peptide set). For full control, construct the component list directly via the general
CombinedLibraryModel(IReadOnlyList<IMixedModelComponent>, …)constructor.3. InternalIons analysis pipeline (observed ions)
Readers/InternalIons/(InternalFragmentIon,InternalFragmentFeatureExtractor,InternalFragmentAnalysisRunner,InternalFragmentTsvWriter). Independent of the Koina/prediction code.Given a search result and the raw files, it extracts the internal fragments that were actually matched, computes the same 18 features the model uses, and writes a TSV — the data behind training/validating the ONNX model.
Notes
masterand the prediction/MixedModels code was re-ported onto the rewritten KoinaFragmentIntensityModelAPI (Predict(inputs)+GenerateLibrarySpectraFromPredictions(alignedRetentionTimes, …)); behavior is preserved.LibrarySpectrumandSpectralLibrary.ReadFragmentIon(parses/writesbIb[start-end]annotations).[Explicit]and excluded from normal runs.