fix(drift): derive fitColumns from stored metadata when omitted by SudipSinha · Pull Request #180 · trustyai-explainability/trustyai-service

SudipSinha · 2026-06-16T07:42:49Z

Summary

All drift metric endpoints reject requests with HTTP 400 when fitColumns is not provided. The Java service treats fitColumns as optional — when omitted, it fits on all available input columns. This mismatch causes integration tests to fail when scheduling drift metrics without explicit column lists.

Root cause

Each drift endpoint's compute and schedule functions had a hard validation:

if not request.fit_columns:
    raise HTTPException(
        status_code=HTTPStatus.BAD_REQUEST,
        detail="fitColumns is required - specify which features to test for drift",
    )

In the Java service (DriftMetricRequest.java), fitColumns is a Set<String> with no validation — fully optional. When omitted, the Java implementation fits on all available input columns from stored data.

The fix

When fitColumns is omitted, derive it from the model's stored input schema metadata:

if not request.fit_columns:
    data_source = get_data_source()
    metadata = await data_source.get_metadata(request.model_id)
    request.fit_columns = list(metadata.input_schema.items.keys())

When fitColumns is explicitly provided, the existing whitespace validation is preserved (compare_means) or the values are used as-is (kstest, jensenshannon).

Applied consistently across all 6 endpoint functions:

Endpoint	Compute	Schedule
CompareMeans	fixed	fixed
KSTest	fixed	fixed (was missing validation entirely)
JensenShannon	fixed	fixed

Also fixed a latent bug in compare_means.py where the feature iteration loop used valid_features (only defined in the else branch) instead of request.fit_columns.

Files changed

File	Change
`src/endpoints/metrics/drift/compare_means.py`	Derive fitColumns in compute + schedule, fix `valid_features` → `request.fit_columns`
`src/endpoints/metrics/drift/kolmogorov_smirnov.py`	Derive fitColumns in compute + schedule
`src/endpoints/metrics/drift/jensen_shannon.py`	Derive fitColumns in compute + schedule
`tests/endpoints/metrics/drift/factory.py`	Add `get_metadata` AsyncMock to both test factories
`tests/endpoints/metrics/drift/test_compare_means.py`	Convert error test to success test for missing fitColumns
`tests/endpoints/metrics/drift/test_kolmogorov_smirnov.py`	Convert error test to success test for missing fitColumns
`tests/endpoints/metrics/drift/test_jensen_shannon.py`	Convert error test to success test for missing fitColumns

Test plan

All 100 drift metric tests pass (3 converted from error→success tests)
ruff check clean
ruff format clean
pyrefly check clean
Rebuild container image and verify POST /metrics/drift/meanshift/request with {"modelId": "...", "referenceTag": "TRAINING"} returns 200

🤖 Generated with Claude Code

The Python service rejected drift metric requests with HTTP 400 when fitColumns was not provided. The Java service treats fitColumns as optional, fitting on all input columns when omitted. Derive fitColumns from the model's stored input schema metadata across all 6 drift endpoint functions (3 compute + 3 schedule). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Sudip Sinha <Sudip.Sinha@RedHat.com>

coderabbitai · 2026-06-16T07:42:59Z

Warning

Review limit reached

@SudipSinha, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 32 minutes and 41 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 162b2075-603c-4fa8-b819-ffa680c65693

📥 Commits

Reviewing files that changed from the base of the PR and between 1e3e6d2 and 35a8b43.

📒 Files selected for processing (7)

src/endpoints/metrics/drift/compare_means.py
src/endpoints/metrics/drift/jensen_shannon.py
src/endpoints/metrics/drift/kolmogorov_smirnov.py
tests/endpoints/metrics/drift/factory.py
tests/endpoints/metrics/drift/test_compare_means.py
tests/endpoints/metrics/drift/test_jensen_shannon.py
tests/endpoints/metrics/drift/test_kolmogorov_smirnov.py

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/fitcolumns-optional-validation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Apply the same fix as PR #180 to the streaming KS test endpoints: when fitColumns is not provided, derive it from the model's stored input schema metadata instead of rejecting with HTTP 400. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Sudip Sinha <Sudip.Sinha@RedHat.com>

github-actions · 2026-06-16T07:48:55Z

PR image build and manifest generation completed successfully!

📦 PR image: quay.io/trustyai/trustyai-service-python-ci:35a8b4315bfad78df4f1707be1bbf04da1574f13

🗂️ CI manifests

devFlags:
  manifests:
    - contextDir: config
      sourcePath: ''
      uri: https://api.github.com/repos/trustyai-explainability/trustyai-service-operator-ci/tarball/service-python-35a8b4315bfad78df4f1707be1bbf04da1574f13

Apply the same fix as PR #180 to the MMD endpoint: when fitColumns is not provided, derive it from the model's stored input schema metadata instead of rejecting with HTTP 400. The shared _validate_drift_request function is now async to support the metadata lookup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Sudip Sinha <Sudip.Sinha@RedHat.com>

Apply the same fix as PR #180 to the streaming KS test endpoints: when fitColumns is not provided, derive it from the model's stored input schema metadata instead of rejecting with HTTP 400. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Sudip Sinha <Sudip.Sinha@RedHat.com>

Apply the same fix as PR #180 to the MMD endpoint: when fitColumns is not provided, derive it from the model's stored input schema metadata instead of rejecting with HTTP 400. The shared _validate_drift_request function is now async to support the metadata lookup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Sudip Sinha <Sudip.Sinha@RedHat.com>

Apply the same fix as PR #180 to the streaming KS test endpoints: when fitColumns is not provided, derive it from the model's stored input schema metadata instead of rejecting with HTTP 400. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Sudip Sinha <Sudip.Sinha@RedHat.com>

SudipSinha self-assigned this Jun 16, 2026

SudipSinha added bug Something isn't working python Pull requests that update python code ok-to-test Proceed with CI testing tests high-priority Critical issues that should be addressed soon labels Jun 16, 2026

SudipSinha mentioned this pull request Jun 18, 2026

feat: Streaming Kolmogorov-Smirnov test for drift detection #99

Open

SudipSinha marked this pull request as ready for review June 18, 2026 14:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(drift): derive fitColumns from stored metadata when omitted#180

fix(drift): derive fitColumns from stored metadata when omitted#180
SudipSinha wants to merge 1 commit into
mainfrom
fix/fitcolumns-optional-validation

SudipSinha commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

Review limit reached

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

SudipSinha commented Jun 16, 2026

Summary

Root cause

The fix

Files changed

Test plan

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading