feat(datasource): implement get_dataframe_by_tag for drift metric reference data by SudipSinha · Pull Request #183 · trustyai-explainability/trustyai-service

SudipSinha · 2026-06-16T08:43:35Z

Summary

All drift metric endpoints call data_source.get_dataframe_by_tag(model_id, reference_tag) to retrieve reference data, but the method doesn't exist on DataSource. Every drift computation fails with AttributeError and HTTP 500.

Root cause

The drift endpoints were written to call get_dataframe_by_tag (matching the Java service's filterRowsByTagEquals pattern), but the method was never implemented on the Python DataSource class.

How it works

The method reads all stored data for a model, then filters rows by tag:

Read input data and metadata via ModelData.data()
Find the tags column in metadata (column index 3, contains list[str] per row)
Build a boolean mask: tag in row_tags for each row
Apply the mask to the input data array
Return a pandas DataFrame with input columns only

Returns input columns only because drift metrics iterate over request.fit_columns (input feature names) and access reference_df[feature_name].

Reads all rows (no batch size) because reference data tagged e.g. "TRAINING" could be anywhere in the dataset, not just the last N rows.

Handles numpy arrays from column_names() — ModelData.column_names() returns numpy arrays, not Python lists. The implementation converts to list() before calling .index("tags") to avoid AttributeError. Test mocks use np.array() to match the real return type and catch this class of bug.

Files changed

File	Change
`src/service/data/datasources/data_source.py`	Add `get_dataframe_by_tag` method
`tests/service/data/datasources/test_datasource.py`	3 tests: matching rows, non-existent tag, missing data (mocks use numpy arrays)

Test plan

All 24 datasource tests pass (3 new)
Test mocks use np.array() for column_names() return values, matching real ModelData behavior
ruff check clean
ruff format clean
pyrefly check clean
Rebuild container image and verify drift metric computation succeeds with tagged reference data

🤖 Generated with Claude Code

coderabbitai · 2026-06-16T08:43:45Z

Warning

Review limit reached

@SudipSinha, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 14 minutes and 57 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: df0f7ded-78ac-4b65-8ec4-fa6441353484

📥 Commits

Reviewing files that changed from the base of the PR and between eeca7e6 and bd48cc2.

📒 Files selected for processing (2)

src/service/data/datasources/data_source.py
tests/service/data/datasources/test_datasource.py

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/missing-get-dataframe-by-tag

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-16T08:48:21Z

PR image build and manifest generation completed successfully!

📦 PR image: quay.io/trustyai/trustyai-service-python-ci:bd48cc2ceed2620c6cc32cec7de2576227246d04

🗂️ CI manifests

devFlags:
  manifests:
    - contextDir: config
      sourcePath: ''
      uri: https://api.github.com/repos/trustyai-explainability/trustyai-service-operator-ci/tarball/service-python-bd48cc2ceed2620c6cc32cec7de2576227246d04

Amended from PR #183 — column_names() returns numpy arrays, not lists, so .index() needs list() wrapping. Test mocks updated to use np.array() to match real return types. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

RobGeada · 2026-06-18T09:50:41Z

+        """
+        try:
+            model_data = ModelData(model_id)
+            input_data, _, metadata = await model_data.data()


Is there no need to get output_data here?

…erence data All drift metric endpoints call data_source.get_dataframe_by_tag() to retrieve reference data filtered by tag (e.g. "TRAINING"), but the method did not exist on DataSource, causing AttributeError and HTTP 500 for every drift computation. Also converts metadata_names to list before calling .index() since column_names() returns numpy arrays which lack .index(). Test mocks use numpy arrays to match the real return type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Sudip Sinha <Sudip.Sinha@RedHat.com>

…e_by_tag MariaDB round-trips tags through json/gzip/LONGBLOB, producing numpy arrays instead of Python lists. The isinstance(cell, list) check returned False, causing all rows to be filtered out. Extract tags via _extract_tags helper that handles both list and np.ndarray. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Sudip Sinha <Sudip.Sinha@RedHat.com>

SudipSinha self-assigned this Jun 16, 2026

SudipSinha added bug Something isn't working python Pull requests that update python code ok-to-test Proceed with CI testing tests high-priority Critical issues that should be addressed soon labels Jun 16, 2026

SudipSinha force-pushed the fix/missing-get-dataframe-by-tag branch from a062c5d to 69ce598 Compare June 16, 2026 10:04

RobGeada requested changes Jun 18, 2026

View reviewed changes

SudipSinha marked this pull request as ready for review June 18, 2026 14:08

SudipSinha and others added 2 commits June 18, 2026 16:36

SudipSinha force-pushed the fix/missing-get-dataframe-by-tag branch from 28feb59 to bd48cc2 Compare June 18, 2026 15:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(datasource): implement get_dataframe_by_tag for drift metric reference data#183

feat(datasource): implement get_dataframe_by_tag for drift metric reference data#183
SudipSinha wants to merge 2 commits into
mainfrom
fix/missing-get-dataframe-by-tag

SudipSinha commented Jun 16, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

Review limit reached

Uh oh!

github-actions Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

RobGeada Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

SudipSinha commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

How it works

Files changed

Test plan

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Uh oh!

github-actions Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RobGeada Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

SudipSinha commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

github-actions Bot commented Jun 16, 2026 •

edited

Loading