Logical draft of how the metric processors shall work #1560

Draft
ArneTR wants to merge 1 commit into main from metric-processors

ArneTR (Member) commented Feb 12, 2026

This PR outlines the concept of introducing two new stages to the GMT:

  • Post Metric Processors
  • Post Phase Stats Processors

They run in the pipeline as follows:
Metric Providers -> Post Metric Processors -> Phase Stats -> Post Phase Stats Processors
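
To make the stage order concrete, here is a minimal sketch of the four stages chained as plain functions. Everything in it is an assumption for illustration only: the function names, the shape of the `data` dict, the `psu_carbon_ac_mcp_machine` metric name, and the fixed factor of 0.5 are not taken from the GMT code base.

```python
from statistics import mean


def metric_providers(data):
    # Stand-in for the real providers: pretend three energy samples were measured.
    data["metrics"] = {"psu_energy_ac_mcp_machine": [10.0, 12.0, 14.0]}
    return data


def post_metric_processors(data):
    # Derive a new metric from an existing one (the factor 0.5 is arbitrary).
    energy = data["metrics"]["psu_energy_ac_mcp_machine"]
    data["metrics"]["psu_carbon_ac_mcp_machine"] = [e * 0.5 for e in energy]
    return data


def phase_stats(data):
    # Only averages and totals of the metric data are built here.
    data["phase_stats"] = {
        name: {"mean": mean(values), "total": sum(values)}
        for name, values in data["metrics"].items()
    }
    return data


def post_phase_stats_processors(data):
    # Works on aggregated values only, e.g. a per-phase score (hypothetical).
    carbon_total = data["phase_stats"]["psu_carbon_ac_mcp_machine"]["total"]
    data["phase_stats"]["example_score"] = {"total": carbon_total / 2}
    return data


# The order of this list is the pipeline order described above.
STAGES = [
    metric_providers,
    post_metric_processors,
    phase_stats,
    post_phase_stats_processors,
]


def run_pipeline():
    data = {}
    for stage in STAGES:
        data = stage(data)
    return data
```

Each stage only reads what an earlier stage wrote, which is the separation this PR is aiming for.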

The already committed config.yml.example outlines how they are defined in the config.

Simplifications and Domain Logic

This PR reduces the mental load of tracking where processing happens in the GMT and makes the processing more explicit and more modular.

  • phase_stats.py will then really only create average and aggregate values of the metric data
  • SCI will no longer be scattered all over the place but will live in a separate stage, which also makes it easier to get SCI values per phase
  • Overloaded providers like XGBoost and SDIA move into their own stage. They no longer need to be processed last in the metric provider stage, loading values that were "hopefully" generated by other providers ... which currently only works by accident :)

When we implement this approach, the following refactorings need to be done:

  • Move calculate_co2_intensity(run_id) out of phase_stats.py and into a Post Metric Processor
  • Migrate the XGBoost and SDIA providers to be Post Metric Processors
  • Move the SCI generation code out of phase_stats.py and into the Post Phase Stats Processors
  • Look at https://if.greensoftware.foundation/pipelines/ to see if we can take some good ideas from how IF thinks about pipelines. We want ours to be less complex, but we might still find some good work there
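
As a rough sketch of what a Post Metric Processor could look like after these refactorings: each processor declares the metrics it needs and the metric it produces, so missing inputs fail loudly instead of being loaded "hopefully". The class names, the `inputs`/`output` attributes, the `grid_carbon_intensity` metric, and the unit choices (joules, gCO2e/kWh) are assumptions for illustration, not the existing GMT API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

JOULES_PER_KWH = 3_600_000


class PostMetricProcessor(ABC):
    """Hypothetical base class: declares inputs and output explicitly."""

    inputs: tuple[str, ...] = ()
    output: str = ""

    @abstractmethod
    def process(self, metrics: dict[str, list[float]]) -> list[float]:
        """Return the values of the derived metric."""

    def run(self, metrics: dict[str, list[float]]) -> dict[str, list[float]]:
        # Fail explicitly if an upstream metric was never generated.
        missing = [name for name in self.inputs if name not in metrics]
        if missing:
            raise KeyError(f"{type(self).__name__} is missing inputs: {missing}")
        metrics[self.output] = self.process(metrics)
        return metrics


class CarbonIntensityProcessor(PostMetricProcessor):
    """Converts energy samples (J) into carbon (gCO2e) using grid intensity."""

    inputs = ("psu_energy_ac_mcp_machine", "grid_carbon_intensity")
    output = "psu_carbon_ac_mcp_machine"

    def process(self, metrics):
        energy_j = metrics["psu_energy_ac_mcp_machine"]
        intensity = metrics["grid_carbon_intensity"]  # assumed gCO2e/kWh
        return [e / JOULES_PER_KWH * i for e, i in zip(energy_j, intensity)]
```

The same shape would fit the migrated XGBoost and SDIA estimators, with a utilization metric as input and an estimated energy metric as output.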

ArneTR (Member, Author) commented Feb 12, 2026

@ribalba Please add anything else you think we need

ArneTR mentioned this pull request Feb 12, 2026

ribalba (Member) left a comment

Perfect. This is what we discussed. The more I think about this, the more I like how cleanly it separates concerns.

Comment thread config.yml.example
- cpu.energy.rapl.msr.component.provider.CpuEnergyRaplMsrComponentProvider:
- psu.energy.ac.mcp.machine.provider.PsuEnergyAcMcpMachineProvider
# - ...
carbon_input: grid.intensity.elephant.api.global.GridIntensityElephantApiGlobalProvider

Why are you creating a new key carbon_input? Wouldn't this just be another input?

Comment thread config.yml.example
# Another alternative is to take a cpu_utilization_* value and convert it into a psu_energy_* value, like Cloud Energy
post_metric_provider_processors:
SDIAPostMetricProviderProcessor:
# Processor outputs psu_energy_ac_xgboost_machine

Doesn't SDIA output a linear model and not XGBoost?

Comment thread config.yml.example

post_phase_stats_processors:
# https://github.com/Green-Software-Foundation/sci/blob/main/Software_Carbon_Intensity/Software_Carbon_Intensity_Specification.md
SCI:

I would still add a processor script here that then takes the SCI values as parameters, like with metric providers.
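
A minimal sketch of such a processor, assuming the SCI values arrive as parameters from the config. It uses the formula from the SCI specification, SCI = ((E * I) + M) per R. The names `SCIParams` and `sci_per_phase` are hypothetical, and the embodied emissions M are assumed to be already apportioned to a single phase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SCIParams:
    """Hypothetical parameter set, as it might be read from the config."""

    grid_intensity_g_per_kwh: float  # I: carbon intensity of the energy used
    embodied_g_per_phase: float      # M: embodied emissions attributed to one phase
    functional_units: float          # R: e.g. number of requests handled per phase


def sci_per_phase(phase_energy_kwh, params):
    """Compute SCI = (E * I + M) / R for every phase.

    phase_energy_kwh maps a phase name to its total energy E in kWh.
    """
    if params.functional_units <= 0:
        raise ValueError("functional_units (R) must be positive")
    return {
        phase: (
            energy_kwh * params.grid_intensity_g_per_kwh
            + params.embodied_g_per_phase
        )
        / params.functional_units
        for phase, energy_kwh in phase_energy_kwh.items()
    }
```

Because the function works on per-phase totals, it only needs what Phase Stats already produces, which is why it fits in the Post Phase Stats Processors stage.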

Comment thread config.yml.example
#--- END

# Processors
# These processors run after the metric_providers stage

I would make clear that they are called in the order they are defined here. This is important if a "carbon" processor relies on an "energy" processor.
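
The ordering requirement could also be checked at startup rather than only documented. Below is a minimal sketch, assuming each processor declares its inputs and its output; the processor and metric names are made up for the example. It walks the processors in config order and fails if one needs a metric that nothing before it has produced.

```python
def validate_order(processors, provider_metrics):
    """Check that every processor's inputs exist by the time it runs.

    processors: list of (name, inputs, output) tuples in config order.
    provider_metrics: metric names already produced by the metric providers.
    """
    available = set(provider_metrics)
    for name, inputs, output in processors:
        missing = set(inputs) - available
        if missing:
            raise ValueError(
                f"{name} runs before its inputs exist: {sorted(missing)}"
            )
        # The output becomes available to all processors defined later.
        available.add(output)
    return available


# Hypothetical processors: the "carbon" one depends on the "energy" one.
ENERGY = ("EnergyEstimateProcessor", ["cpu_utilization"], "psu_energy_estimated")
CARBON = ("CarbonProcessor", ["psu_energy_estimated"], "psu_carbon_estimated")
```

With this check, `validate_order([ENERGY, CARBON], ["cpu_utilization"])` passes, while the reversed list raises a `ValueError` that names the missing metric.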
