feat: implement DumpState for program-aware-fairness #1839

Open
thc1006 wants to merge 1 commit into llm-d:main from thc1006:ops/dumpstate-program-aware-fairness

thc1006 commented Jun 27, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds plugin.StateDumper to the program-aware-fairness plugin so its health can
be inspected through GET /debug/plugins/state, the same way the
inflight-load-producer already exposes its state.

Unlike the other plugins, this one keys its state by program ID. That ID comes
from a user-controlled request header (x-llm-d-inference-fairness-id) and is
high-cardinality, so dumping it would expose user-controlled, tenant-identifying
values, which docs/plugin_debug.md says to omit. DumpState therefore reports only
bounded aggregates and no IDs: the number of tracked programs, total in-flight
requests, and Jain's fairness index (which the plugin already computes). These
aggregates show whether the fairness policy is healthy without leaking identities.

Example response for this plugin:

{"totalPrograms":12,"totalInFlight":34,"fairnessIndex":0.92}
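As a rough illustration of the aggregate-only shape, the sketch below builds a response like the one above from a per-program in-flight map. It is not the PR's code: `stateDump` and `buildDump` are hypothetical names, the map layout is assumed, and the method signature of `plugin.StateDumper` is not shown here, so a free function is used instead.

```go
package main

import "encoding/json"

// stateDump mirrors the example response: bounded aggregates only.
// Hypothetical type, not taken from the PR.
type stateDump struct {
	TotalPrograms int     `json:"totalPrograms"`
	TotalInFlight int     `json:"totalInFlight"`
	FairnessIndex float64 `json:"fairnessIndex"`
}

// buildDump is a hypothetical helper. It assumes the plugin tracks in-flight
// requests in a map keyed by program ID and that the fairness index has
// already been computed elsewhere.
func buildDump(inFlightByProgram map[string]int, fairnessIndex float64) ([]byte, error) {
	dump := stateDump{
		TotalPrograms: len(inFlightByProgram),
		FairnessIndex: fairnessIndex,
	}
	// Only the values are read; the keys (program IDs) never reach the output.
	for _, n := range inFlightByProgram {
		dump.TotalInFlight += n
	}
	return json.Marshal(dump)
}
```

Because the output struct has a fixed set of numeric fields, the response size stays constant regardless of how many programs are tracked.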

A shared jainFairnessIndex helper keeps the index formula in one place so the
dump and the existing metric come from a single snapshot of the program map.
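For readers unfamiliar with the formula, Jain's fairness index over n values x is (sum x)^2 / (n * sum x^2), ranging from 1/n (one value holds everything) to 1 (all values equal). The sketch below is a minimal standalone version, not the PR's jainFairnessIndex helper. Which per-program quantity the plugin feeds into the index is not stated here, and returning 1 for empty or all-zero input is an assumption chosen to match the "trivially fair" empty case in the tests.

```go
package main

// jainIndex is a hypothetical standalone implementation of Jain's fairness
// index: (sum x)^2 / (n * sum x^2).
func jainIndex(values []float64) float64 {
	var sum, sumSq float64
	for _, v := range values {
		sum += v
		sumSq += v * v
	}
	// Assumption: with no values, or only zeros, nothing is being shared
	// unfairly, so report 1 instead of dividing by zero.
	if sumSq == 0 {
		return 1
	}
	return (sum * sum) / (float64(len(values)) * sumSq)
}
```

With four equal values the index is 1; with one program holding all the load out of four, it drops to 0.25.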

Part of #1755.

Which issue(s) this PR fixes:

Fixes #1798

Release note (write NONE if no user-facing change):

NONE

Testing:

New unit tests, all passing:

  • Aggregates (program count, in-flight, fairness index) are reported
  • User-controlled program IDs never appear in the dump
  • An empty policy returns valid JSON (trivially fair)
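The "IDs never appear" check can be approximated with a substring scan over the dumped bytes. The helper below is only a sketch of that idea, with a hypothetical name (`leakedIDs`); the PR's actual test code is not shown here and may differ.

```go
package main

import "bytes"

// leakedIDs is a hypothetical helper that returns every program ID found
// verbatim in the dump. A test would assert that the result is empty.
func leakedIDs(dump []byte, programIDs []string) []string {
	var leaked []string
	for _, id := range programIDs {
		// Skip empty IDs: an empty needle matches any input.
		if id != "" && bytes.Contains(dump, []byte(id)) {
			leaked = append(leaked, id)
		}
	}
	return leaked
}
```

A raw substring scan would miss IDs that JSON encoding escapes (for example, ones containing quotes), so test IDs are best kept to plain, distinctive strings.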

Expose aggregate fairness health through /debug/plugins/state: the number of
tracked programs, total in-flight requests, and Jain's fairness index. Program
IDs come from a user-controlled request header, so they are omitted rather than
dumped; only these bounded aggregates are reported.

Part of llm-d#1755.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
@thc1006 thc1006 requested review from a team, LukeAVanDrie and shmuelk as code owners June 27, 2026 16:21
@thc1006 thc1006 requested review from ahg-g and liu-cong June 27, 2026 16:21
@github-actions github-actions Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. labels Jun 27, 2026

thc1006 commented Jun 28, 2026

The failing e2e-router (pd) check is the "should report metrics" spec, which is currently flaky on main. The same spec fails on the base commit (1fa3803) with no changes applied, and this PR only touches the program-aware fairness plugin. Could you re-run that job when you get a chance?

Development

Successfully merging this pull request may close these issues.

[Operations] Implement DumpState for program-aware-fairness plugin
