Add filter tracing decorator to scheduling pipeline #1707

Open
chethanuk wants to merge 3 commits into llm-d:main from chethanuk:issue-1693-filter-span
@chethanuk

What

Add a single OTel filter_endpoints span over the scheduling filter stage,
carrying the input/output endpoint counts and request-correlation keys.

Implements #1693 (sub-task of #1483).

Design

Inline, single-step span, following the merged tracing convention from #1565
(the repo standard: score_prefix_cache / pick_pd_profile are traced inline,
not via a decorator) and the maintainer guidance on #1693:

  • runFilterPlugins starts one filter_endpoints span (SpanKindInternal) via
    tracing.Tracer(schedplugins.TracerScope) around the whole filter chain, with
    defer span.End() so it ends on the drain break.
  • Attributes: llm_d.epp.filter.candidate_endpoints (input count),
    llm_d.epp.filter.filtered_endpoints (final output count), plus the conditional
    shared gen_ai.request.model / gen_ai.request.id keys — matching
    score_prefix_cache. No type/name attrs (the span name identifies the
    operation).
  • The span context is threaded into each filter.Filter(...), so inner spans
    nest. Filtering behavior and the per-plugin latency metric are unchanged.

Scope: schedplugins.TracerScope = llm-d-router/pkg/epp/framework/plugins/scheduling.

Review feedback addressed

  • Per-request wrapper allocation (gemini, codeant): the TracedFilter decorator
    is removed entirely; there is no per-request wrapper.
  • otel.Tracer() vs the project helper (codeant): now uses tracing.Tracer(...),
    so spans carry the BuildRef / commit-sha instrumentation metadata.
  • Span name, granularity, and attribute shape: aligned to the maintainer's
    decision on "Add span for filter" (#1693): single-step span,
    llm_d.epp.filter.* keys, package scope.

Tests

scheduler_profile_tracing_test.go (spans read as a slice via a tracetest
recorder): single span with name/kind/parent and counts; one span for a
multi-filter chain (candidate = first input, filtered = final output);
drain-break still ends the span with filtered_endpoints == 0; gen_ai.*
omitted when request fields are empty; inner delegate span nests under
filter_endpoints.

Gates: go build ./..., go test ./pkg/epp/scheduling/... -race, go vet,
and make lint (new-only) all green.

Refs: #1693, #1483

@chethanuk chethanuk requested a review from a team as a code owner June 22, 2026 16:18
@chethanuk chethanuk requested review from ahg-g and vMaroon June 22, 2026 16:18
@github-actions github-actions Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 22, 2026
Comment thread pkg/epp/scheduling/scheduler_profile.go Outdated
filteredEndpoints := endpoints
logger.V(logutil.DEBUG).Info("Before running filter plugins", "endpoints", filteredEndpoints)

ctx, span := tracing.Tracer(schedplugins.TracerScope).Start(ctx, "filter_endpoints",

Member

I'd prefer the TracerScope as llm-d-router/pkg/epp/scheduling

Author

The span now uses TracerScope = "llm-d-router/pkg/epp/scheduling" from a local constants.go file (matching the pattern used in other packages) instead of the plugins scope. The leftover schedplugins import has also been removed.

Wrap the filter chain in runFilterPlugins in a single inline "filter_endpoints"
span via tracing.Tracer(schedplugins.TracerScope) (SpanKindInternal), recording
the input and output endpoint counts (llm_d.epp.filter.candidate_endpoints,
llm_d.epp.filter.filtered_endpoints) plus the conditional gen_ai.request.{model,id}
keys. Follows the single-step span convention from llm-d#1565 and llm-d#1693
(score_prefix_cache); filtering behavior and the per-plugin latency metric are
unchanged.

Refs: llm-d#1693, llm-d#1483
Signed-off-by: ChethanUK <chethanuk@outlook.com>
@chethanuk chethanuk force-pushed the issue-1693-filter-span branch from efd78ac to a8095b8 Compare June 22, 2026 19:52