-
Notifications
You must be signed in to change notification settings - Fork 41
Spectral Averaging
The Spectral Averaging infrastructure provides robust algorithms for combining multiple mass spectra to improve signal-to-noise ratios while intelligently rejecting outliers. Originally developed for top-down proteomics, these methods adapt outlier rejection techniques from astronomical image processing to address mass spectrometry-specific artifacts.
- Multiple Averaging Modes: Average all scans, DDA scans, or every N scans with optional overlap
- Robust Outlier Rejection: Six different algorithms adapted from PixInsight astronomical software
- Flexible Normalization: Multiple strategies for normalizing scan intensities
- Weighted Averaging: Support for different weighting schemes
- High Performance: Multi-threaded processing for large files
- Standard Output: mzML 1.1.0 format with parameter files
// Set up parameters
var parameters = new SpectralAveragingParameters
{
SpectraFileAveragingType = SpectraFileAveragingType.AverageDdaScans,
NumberOfScansToAverage = 5,
OutlierRejectionType = OutlierRejectionType.SigmaClipping,
NormalizationType = NormalizationType.RelativeToTics,
BinSize = 0.01
};
// Load scans from file
var msDataFile = MsDataFileReader.GetDataFile(filePath);
var scans = msDataFile.GetAllScansList();
// Average the spectra
var averagedScans = SpectraFileAveraging.AverageSpectraFile(scans, parameters);
// Write output
AveragedSpectraWriter.WriteAveragedScans(averagedScans, parameters, filePath);Top-down proteomics identification requires:
- High-quality fragmentation spectra for peptide/protein matching
- Accurate neutral mass of the intact proteoform
Intact proteoform spectra are highly complex:
- Multiple overlapping proteoforms
- Many isotopic peaks across charge states
- Lower signal-to-noise ratios compared to peptides
- Artifacts specific to mass spectrometry data
- Signal Averaging: Combining multiple scans improves signal-to-noise
- Outlier Rejection: Removes artifacts before averaging
-
Benefits:
- Improved isotopic envelope refinement
- Increased feature detection
- Higher-quality features
- More proteoform identifications
classDiagram
%% Main Classes
class SpectraFileAveraging {
<<static>>
+AverageSpectraFile(scans, parameters)$ MsDataScan[]
-AverageAll(scans, parameters)$ MsDataScan[]
-AverageEverynScans(scans, parameters)$ MsDataScan[]
-AverageDdaScans(scans, parameters)$ MsDataScan[]
}
class SpectraAveraging {
<<static>>
+AverageSpectra(scans, parameters)$ MzSpectrum
+AverageSpectraMzBinning(scans, parameters)$ MzSpectrum
}
class OutlierRejection {
<<static>>
+RejectOutliers(binnedPeaks, parameters)
-SigmaClipping(binnedPeaks, parameters)
-WinsorizedSigmaClipping(binnedPeaks, parameters)
-AveragedSigmaClipping(binnedPeaks, parameters)
-MinMaxClipping(binnedPeaks)
-PercentileClipping(binnedPeaks, percentile)
-BelowThresholdRejection(binnedPeaks, threshold)
}
class SpectraNormalization {
<<static>>
+NormalizeSpectra(scans, normalizationType)$ List~MzSpectrum~
+NormalizeToTics(scans)$ List~MzSpectrum~
-GetTicNormalizationFactor(spectrum, avgTic)$
}
class SpectralWeighting {
<<static>>
+GetWeights(intensities, weightingType)$ double[]
+GetEvenWeights(count)$ double[]
+GetBasePeakWeights(intensities)$ double[]
+GetTicWeights(intensities)$ double[]
}
class AveragedSpectraWriter {
<<static>>
+WriteAveragedScans(scans, parameters, originalPath)
}
class SpectralAveragingParameters {
+SpectraFileAveragingType SpectraFileAveragingType
+OutlierRejectionType OutlierRejectionType
+NormalizationType NormalizationType
+SpectraWeightingType SpectralWeightingType
+SpectralAveragingType SpectralAveragingType
+NumberOfScansToAverage int
+ScanOverlap int
+BinSize double
+MinSigmaValue double
+MaxSigmaValue double
+Percentile double
+MaxThreadsToUsePerFile int
+SetDefaultValues()
+SetValues(...)
}
class BinnedPeak {
+MzBin double
+Intensities double[]
+Accepted bool[]
+TrimIntensities(acceptedOnly)
}
%% Enums
class SpectraFileAveragingType {
<<enumeration>>
AverageAll
AverageEverynScans
AverageEverynScansWithOverlap
AverageDdaScans
}
class OutlierRejectionType {
<<enumeration>>
NoRejection
MinMaxClipping
PercentileClipping
SigmaClipping
WinsorizedSigmaClipping
AveragedSigmaClipping
BelowThresholdRejection
}
class NormalizationType {
<<enumeration>>
NoNormalization
RelativeToTics
RelativeToMedianTic
}
class SpectraWeightingType {
<<enumeration>>
WeightEvenly
WeightByBasePeakIntensity
WeightByTotalIonCurrent
}
class SpectralAveragingType {
<<enumeration>>
MzBinning
}
%% Relationships
SpectraFileAveraging ..> SpectraAveraging : uses
SpectraFileAveraging ..> SpectralAveragingParameters : uses
SpectraAveraging ..> OutlierRejection : uses
SpectraAveraging ..> SpectraNormalization : uses
SpectraAveraging ..> SpectralWeighting : uses
SpectraAveraging ..> BinnedPeak : creates
SpectralAveragingParameters ..> SpectraFileAveragingType : uses
SpectralAveragingParameters ..> OutlierRejectionType : uses
SpectralAveragingParameters ..> NormalizationType : uses
SpectralAveragingParameters ..> SpectraWeightingType : uses
SpectralAveragingParameters ..> SpectralAveragingType : uses
Manages file-level averaging, grouping scans based on the selected mode.
public static class SpectraFileAveraging
{
public static MsDataScan[] AverageSpectraFile(
List<MsDataScan> scans,
SpectralAveragingParameters parameters);
}Handles the core averaging logic with m/z binning and outlier rejection.
public static class SpectraAveraging
{
public static MzSpectrum AverageSpectra(
this IEnumerable<MsDataScan> scans,
SpectralAveragingParameters parameters);
}Configures all aspects of the averaging process.
public class SpectralAveragingParameters
{
// Averaging mode
public SpectraFileAveragingType SpectraFileAveragingType { get; set; }
// Outlier rejection
public OutlierRejectionType OutlierRejectionType { get; set; }
// Normalization
public NormalizationType NormalizationType { get; set; }
// Weighting
public SpectraWeightingType SpectralWeightingType { get; set; }
// Parameters
public int NumberOfScansToAverage { get; set; } = 5;
public int ScanOverlap { get; set; } = 4;
public double BinSize { get; set; } = 0.01;
public double MinSigmaValue { get; set; } = 1.5;
public double MaxSigmaValue { get; set; } = 1.5;
public double Percentile { get; set; } = 0.1;
}flowchart TD
A[Load Scans] --> B[Group Scans by Mode]
B --> C[For Each Group]
C --> D[Normalize Spectra]
D --> E[Create M/Z Bins]
E --> F{Outlier Rejection}
F -->|No Rejection| G[Calculate Weighted Average]
F -->|Rejection| H[Reject Outliers in Each Bin]
H --> G
G --> I[Create Averaged Spectrum]
I --> J{More Groups?}
J -->|Yes| C
J -->|No| K[Write Output File]
style A fill:#e1f5ff
style K fill:#e1ffe1
style F fill:#fff3e1
-
Group Scans
- Based on
SpectraFileAveragingType - All, every N, or DDA-based grouping
- Based on
-
Normalize Each Spectrum
- To average TIC
- To median TIC
- Or no normalization
-
Create M/Z Bins
- User-defined bin size (default: 0.01 Th)
- Assign peaks to bins
-
Reject Outliers
- Apply selected rejection algorithm
- Per-bin outlier detection
-
Calculate Weighted Average
- Apply weighting scheme
- Compute final m/z and intensity
-
Create Output Scan
- Metadata from representative scan
- Averaged retention time and TIC
Combines all scans in the file into a single averaged spectrum.
var parameters = new SpectralAveragingParameters
{
SpectraFileAveragingType = SpectraFileAveragingType.AverageAll
};
var scans = msDataFile.GetAllScansList();
var averagedScans = SpectraFileAveraging.AverageSpectraFile(scans, parameters);
// Returns: 1 averaged scanUse Cases:
- Creating reference spectra
- Low signal-to-noise throughout acquisition
- Small datasets
Creates non-overlapping groups of N consecutive scans.
var parameters = new SpectralAveragingParameters
{
SpectraFileAveragingType = SpectraFileAveragingType.AverageEverynScans,
NumberOfScansToAverage = 5
};
var averagedScans = SpectraFileAveraging.AverageSpectraFile(scans, parameters);
// Scans: 1-5, 6-10, 11-15, etc.Use Cases:
- Uniform scan-to-scan variation
- Time-course experiments
- When temporal resolution is less critical
Creates overlapping groups for smoother transitions.
var parameters = new SpectralAveragingParameters
{
SpectraFileAveragingType = SpectraFileAveragingType.AverageEverynScansWithOverlap,
NumberOfScansToAverage = 5,
ScanOverlap = 4 // Each group shares 4 scans with next
};
var averagedScans = SpectraFileAveraging.AverageSpectraFile(scans, parameters);
// Scans: 1-5, 2-6, 3-7, 4-8, 5-9, etc.Use Cases:
- Maximum smoothing
- Chromatographic peak tracking
- When retaining more temporal information
Intelligently groups MS1 scans while preserving MS2 structure.
var parameters = new SpectralAveragingParameters
{
SpectraFileAveragingType = SpectraFileAveragingType.AverageDdaScans,
NumberOfScansToAverage = 5,
ScanOverlap = 4
};
var averagedScans = SpectraFileAveraging.AverageSpectraFile(scans, parameters);
// Preserves MS2 scans, averages MS1 with overlapHow It Works:
- MS1 scans: Averaged with overlap (moving average)
- MS2 scans: Passed through unchanged
- MS1-MS2 relationships: Maintained
Use Cases:
- Data-Dependent Acquisition (DDA) datasets
- Top-down proteomics
- When MS2 scans must remain intact
All intensity values are retained. Fastest option but most susceptible to artifacts.
parameters.OutlierRejectionType = OutlierRejectionType.NoRejection;Use When:
- Data is very clean
- Speed is critical
- For comparison purposes
These algorithms iteratively remove outliers based on statistical thresholds.
Algorithm: Remove values outside median ± n*σ iteratively until no more outliers.
parameters.OutlierRejectionType = OutlierRejectionType.SigmaClipping;
parameters.MinSigmaValue = 1.5; // Lower bound multiplier
parameters.MaxSigmaValue = 1.5; // Upper bound multiplierHow It Works:
- Calculate median and standard deviation
- Reject values outside
median - (MinSigma * σ)tomedian + (MaxSigma * σ) - Recalculate median and σ with remaining values
- Repeat until no more rejections or ≤ 2 values remain
Best For:
- Gaussian-distributed noise
- General-purpose outlier rejection
- When outliers are symmetric
Example:
// Asymmetric rejection - more aggressive on high side
parameters.OutlierRejectionType = OutlierRejectionType.SigmaClipping;
parameters.MinSigmaValue = 2.0; // Keep more low values
parameters.MaxSigmaValue = 1.0; // Reject high values aggressivelyAlgorithm: Replace outliers with boundary values before calculating statistics.
parameters.OutlierRejectionType = OutlierRejectionType.WinsorizedSigmaClipping;
parameters.MinSigmaValue = 1.5;
parameters.MaxSigmaValue = 1.5;How It Works:
- Identify outliers outside
median ± 1.5σ - Replace outliers with nearest acceptable value (winsorization)
- Calculate median and σ using winsorized values
- Apply sigma clipping with winsorized statistics
- Iterate until convergence
Advantages:
- More robust to extreme outliers
- Better median and σ estimates
- Less influenced by outlier magnitude
Best For:
- Data with extreme outliers
- Skewed distributions
- When occasional very high/low values occur
Algorithm: Uses noise model based on Poisson statistics.
parameters.OutlierRejectionType = OutlierRejectionType.AveragedSigmaClipping;
parameters.MinSigmaValue = 1.5;
parameters.MaxSigmaValue = 1.5;How It Works:
- Assumes Poisson-based shot noise
- Calculates σ based on intensity-dependent noise model
- Applies sigma clipping with intensity-scaled thresholds
Advantages:
- Accounts for intensity-dependent noise
- Better for mass spectrometry data
- More accurate at different intensity regimes
Best For:
- Mass spectrometry data (shot noise dominant)
- Wide dynamic range
- When low and high intensity peaks need different treatment
These algorithms apply rejection criteria in a single pass.
Algorithm: Remove the highest and lowest intensity values.
parameters.OutlierRejectionType = OutlierRejectionType.MinMaxClipping;How It Works:
- Removes exactly 2 values per bin (min and max)
- Simple and fast
- No parameters needed
Best For:
- Quick outlier removal
- When you know outliers are at extremes
- Small scan groups (5-10 scans)
Example:
// For 5 scans: Keep middle 3 values per m/z bin
parameters.OutlierRejectionType = OutlierRejectionType.MinMaxClipping;Algorithm: Remove values below/above specified percentiles.
parameters.OutlierRejectionType = OutlierRejectionType.PercentileClipping;
parameters.Percentile = 0.1; // Remove bottom and top 10%How It Works:
- Calculate percentile thresholds
- Remove values < lower percentile or > upper percentile
- Symmetric rejection by default
Best For:
- Known contamination rate
- Uniform outlier distribution
- Large scan groups (>10 scans)
Examples:
// Conservative: Remove only 5% on each end
parameters.Percentile = 0.05;
// Aggressive: Remove 20% on each end
parameters.Percentile = 0.20;Algorithm: Remove entire bin if insufficient values present.
parameters.OutlierRejectionType = OutlierRejectionType.BelowThresholdRejection;
// Threshold automatically set to 70% of NumberOfScansToAverageHow It Works:
- Count valid intensities in bin
- If count < threshold (default: 70% of scans), reject entire bin
- Removes sporadic peaks
Best For:
- Removing transient artifacts
- Ensuring peak consistency across scans
- Quality control
Use With:
parameters.NumberOfScansToAverage = 10;
// Bins need ≥7 values (70%) to be kept
parameters.OutlierRejectionType = OutlierRejectionType.BelowThresholdRejection;Normalizes each spectrum to the average TIC of all scans being averaged.
parameters.NormalizationType = NormalizationType.RelativeToTics;How It Works:
- Calculate average TIC across all scans
- Scale each scan's intensities:
intensity * (avgTIC / scanTIC) - Ensures equal contribution from each scan
Best For:
- General-purpose averaging
- When TIC varies between scans
- Default choice
Similar to TIC normalization but uses median instead of mean.
parameters.NormalizationType = NormalizationType.RelativeToMedianTic;Advantages:
- More robust to outlier scans
- Better when TIC distribution is skewed
Scans are averaged without intensity normalization.
parameters.NormalizationType = NormalizationType.NoNormalization;Use When:
- TIC is already normalized
- Comparing normalization effects
- Scans have similar TIC values
All scans contribute equally to the average.
parameters.SpectralWeightingType = SpectraWeightingType.WeightEvenly;Best For:
- Most cases
- After normalization
- When scans are similar quality
Scans with higher base peak intensity contribute more.
parameters.SpectralWeightingType = SpectraWeightingType.WeightByBasePeakIntensity;Scans with higher TIC contribute more.
parameters.SpectralWeightingType = SpectraWeightingType.WeightByTotalIonCurrent;var parameters = new SpectralAveragingParameters
{
// Preserve MS2, average MS1 scans
SpectraFileAveragingType = SpectraFileAveragingType.AverageDdaScans,
NumberOfScansToAverage = 5,
ScanOverlap = 4,
// Robust outlier rejection
OutlierRejectionType = OutlierRejectionType.WinsorizedSigmaClipping,
MinSigmaValue = 1.5,
MaxSigmaValue = 1.5,
// Normalize and weight evenly
NormalizationType = NormalizationType.RelativeToTics,
SpectralWeightingType = SpectraWeightingType.WeightEvenly,
// Standard binning
BinSize = 0.01,
// Use multiple threads
MaxThreadsToUsePerFile = 4
};var parameters = new SpectralAveragingParameters
{
// Moving average with high overlap
SpectraFileAveragingType = SpectraFileAveragingType.AverageEverynScansWithOverlap,
NumberOfScansToAverage = 10,
ScanOverlap = 9, // 90% overlap for smooth chromatogram
// Conservative outlier rejection
OutlierRejectionType = OutlierRejectionType.SigmaClipping,
MinSigmaValue = 2.0,
MaxSigmaValue = 2.0,
// Normalization
NormalizationType = NormalizationType.RelativeToMedianTic,
SpectralWeightingType = SpectraWeightingType.WeightEvenly,
// Fine binning for high resolution
BinSize = 0.005
};var parameters = new SpectralAveragingParameters
{
// Average all for single output
SpectraFileAveragingType = SpectraFileAveragingType.AverageAll,
// Minimal outlier rejection
OutlierRejectionType = OutlierRejectionType.MinMaxClipping,
// No normalization for speed
NormalizationType = NormalizationType.NoNormalization,
SpectralWeightingType = SpectraWeightingType.WeightEvenly,
// Standard binning
BinSize = 0.01,
// Single thread for small files
MaxThreadsToUsePerFile = 1
};Averaged spectra are written in mzML 1.1.0 format.
Filename Convention:
original_filename-averaged.mzML
Example:
Input: sample_topdown.raw
Output: sample_topdown-averaged.mzML
Each averaged scan retains metadata from the first scan in the group:
- Scan number: Sequential (1, 2, 3...)
- Retention time: Average of grouped scans
- Total ion current: Average of grouped scans
- Polarity: From representative scan
- MS level: From representative scan
- Centroid/Profile: From representative scan
using Readers;
using SpectralAveraging;
using MassSpectrometry;
// 1. Load data file
string inputPath = "data/sample_topdown.raw";
var msDataFile = MsDataFileReader.GetDataFile(inputPath);
var allScans = msDataFile.GetAllScansList();
Console.WriteLine($"Loaded {allScans.Count} scans");
// 2. Configure averaging parameters
var parameters = new SpectralAveragingParameters
{
// DDA-specific averaging
SpectraFileAveragingType = SpectraFileAveragingType.AverageDdaScans,
NumberOfScansToAverage = 5,
ScanOverlap = 4,
// Robust outlier rejection
OutlierRejectionType = OutlierRejectionType.WinsorizedSigmaClipping,
MinSigmaValue = 1.5,
MaxSigmaValue = 1.5,
// Standard normalization
NormalizationType = NormalizationType.RelativeToTics,
SpectralWeightingType = SpectraWeightingType.WeightEvenly,
// Standard binning
BinSize = 0.01,
// Multi-threading
MaxThreadsToUsePerFile = 4,
// Output format
OutputType = OutputType.MzML
};
// 3. Perform averaging
Console.WriteLine("Averaging spectra...");
var averagedScans = SpectraFileAveraging.AverageSpectraFile(allScans, parameters);
Console.WriteLine($"Created {averagedScans.Length} averaged scans");
// 4. Write output
string outputDir = "data/averaged/";
Directory.CreateDirectory(outputDir);
AveragedSpectraWriter.WriteAveragedScans(averagedScans, parameters, inputPath);
Console.WriteLine($"Output written to: {Path.GetFileNameWithoutExtension(inputPath)}-averaged.mzML");Problem: Out of memory errors with large files
Solutions:
// Reduce averaging window
parameters.NumberOfScansToAverage = 3;
// Reduce threading
parameters.MaxThreadsToUsePerFile = 1;
// Increase bin size
parameters.BinSize = 0.05;
// Process in batches
var batchSize = 1000;
for (int i = 0; i < allScans.Count; i += batchSize)
{
var batch = allScans.Skip(i).Take(batchSize).ToList();
var averagedBatch = SpectraFileAveraging.AverageSpectraFile(batch, parameters);
// Process or write batch
}SpectralAveraging (this project)
↓
MassSpectrometry (scans and spectra)
↓
Readers (file I/O)
↓
MzLibUtil (utilities)
- MathNet.Numerics: Statistical calculations
- CsvHelper: Parameter file I/O
- MetaMorpheus: Built-in averaging task for top-down searches
- Custom Workflows: Preprocessing before quantification or identification
- Data QC: Assessing data quality through averaged spectra
- PixInsight: Astronomical Image Processing Software - Source of outlier rejection algorithms
- MetaMorpheus: GitHub - Top-down proteomics implementation
- mzML 1.1.0: PSI Standard - Output file format
- MS Data File Reading
- Deconvolution Wiki - Processing averaged spectra