
feat(giskard-checks): add SuiteRunTrendAnalyzer for pass_rate regression detection 🤖🤖🤖🤖 #2518

Open
nuthalapativarun wants to merge 2 commits into Giskard-AI:main from nuthalapativarun:feat/2355-suite-run-trend-analyzer

@nuthalapativarun

Contributor

Description

Adds SuiteRunTrendAnalyzer to detect pass_rate regressions across sequential suite runs.

Implements the four new types proposed in the issue:

  • SuiteRunPoint — immutable snapshot of one run's pass_rate and counts
  • SuiteTrend — OLS slope + direction (improving / stable / degrading) + is_regression flag
  • SuiteRunTrendReport — aggregate report over the analysis window
  • SuiteRunTrendAnalyzer — stateful recorder with configurable window, regression_threshold, and improvement_threshold
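For readers new to the proposal, here is a minimal sketch of what an immutable per-run snapshot could look like. The field names (index, passed, total, timestamp) are assumptions, not the PR's actual definition, and this version derives pass_rate from the counts rather than storing it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SuiteRunPoint:
    """Immutable snapshot of one suite run (hypothetical field names)."""

    index: int  # position of the run inside the analysis window
    passed: int  # number of checks that passed
    total: int  # number of checks that ran
    timestamp: datetime  # wall-clock time the run was recorded

    @property
    def pass_rate(self) -> float:
        # Treat an empty run as 0.0 so the rate is always defined.
        return self.passed / self.total if self.total else 0.0


point = SuiteRunPoint(index=0, passed=18, total=20, timestamp=datetime.now(timezone.utc))
print(point.pass_rate)  # 0.9
```

Because the dataclass is frozen, an assignment such as point.passed = 20 raises dataclasses.FrozenInstanceError, so recorded history cannot be edited after the fact.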

Uses statistics.linear_regression from the standard library (available since Python 3.10), so it adds no new dependencies.
All classes are exported from giskard.checks and giskard.checks.core.

Related Issue

Closes #2355

Type of Change

  • 🚀 New feature

Checklist

  • Read CODE_OF_CONDUCT.md
  • Read CONTRIBUTING.md
  • Written tests for all new methods/classes
  • Written NumPy-format docstrings for new methods/classes
  • Updated uv.lock (if pyproject.toml changed)

gemini-code-assist bot left a comment

Contributor

Code Review

This pull request introduces a suite run trend analysis feature to detect pass rate regressions across sequential runs using ordinary least-squares (OLS) linear regression. It adds the "SuiteRunTrendAnalyzer" class and associated data classes ("SuiteRunPoint", "SuiteTrend", "SuiteRunTrendReport"), along with comprehensive unit tests. Feedback on the changes highlights a Python 3.10 compatibility issue due to importing "UTC" directly from "datetime" (which was introduced in Python 3.11), and suggests validating that the "regression_threshold" is strictly less than the "improvement_threshold" during initialization.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.


import statistics
from dataclasses import dataclass, field
from datetime import UTC, datetime

high

datetime.UTC was introduced in Python 3.11. Since this module is intended to support Python >= 3.10 (as stated in the docstring and PR description), the line from datetime import UTC will raise an ImportError on Python 3.10.

Please import timezone from datetime instead and use timezone.utc to maintain compatibility with Python 3.10.

Suggested change
- from datetime import UTC, datetime
+ from datetime import datetime, timezone

            Optional wall-clock time for this run. When omitted,
            ``datetime.now(UTC)`` is used.
        """
        self._runs.append((result, timestamp or datetime.now(UTC)))

high

Use timezone.utc instead of UTC to maintain compatibility with Python 3.10.

Suggested change
- self._runs.append((result, timestamp or datetime.now(UTC)))
+ self._runs.append((result, timestamp or datetime.now(timezone.utc)))
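A minimal sketch of the recording step with the fix applied. RunRecorder is a hypothetical stand-in that stores a bare pass_rate float instead of the PR's SuiteResult. It uses an explicit is None check rather than or; because datetime instances are always truthy the two behave the same here, and the explicit check simply states the intent.

```python
from datetime import datetime, timezone
from typing import Optional


class RunRecorder:
    """Hypothetical stand-in for the analyzer's run-recording step."""

    def __init__(self) -> None:
        self._runs: list[tuple[float, datetime]] = []

    def record(self, pass_rate: float, timestamp: Optional[datetime] = None) -> None:
        # timezone.utc exists on every supported Python 3 release, whereas the
        # datetime.UTC alias only appeared in Python 3.11.
        if timestamp is None:
            timestamp = datetime.now(timezone.utc)
        self._runs.append((pass_rate, timestamp))


recorder = RunRecorder()
recorder.record(0.92)
print(recorder._runs[0][1].tzinfo)  # UTC
```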

Comment on lines +145 to +149
        if window < 2:
            raise ValueError("window must be >= 2 (OLS requires at least two points)")
        self.window = window
        self.regression_threshold = regression_threshold
        self.improvement_threshold = improvement_threshold

medium

It is good practice to validate that regression_threshold is strictly less than improvement_threshold. If regression_threshold >= improvement_threshold, a single slope could satisfy both the regression and the improvement conditions, making trend classification in analyze() contradictory or dependent on the order of the checks.

        if window < 2:
            raise ValueError("window must be >= 2 (OLS requires at least two points)")
        if regression_threshold >= improvement_threshold:
            raise ValueError(
                f"regression_threshold ({regression_threshold}) must be less than "
                f"improvement_threshold ({improvement_threshold})"
            )
        self.window = window
        self.regression_threshold = regression_threshold
        self.improvement_threshold = improvement_threshold
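To make the contradiction concrete: under a hypothetical classification rule that labels a slope "degrading" when it is <= regression_threshold and "improving" when it is >= improvement_threshold, inverted thresholds let one slope match both labels. matching_labels and validate_thresholds are illustrative helpers, not part of the PR.

```python
def validate_thresholds(regression_threshold: float, improvement_threshold: float) -> None:
    """Reject threshold pairs whose classification bands would overlap."""
    if regression_threshold >= improvement_threshold:
        raise ValueError(
            f"regression_threshold ({regression_threshold}) must be less than "
            f"improvement_threshold ({improvement_threshold})"
        )


def matching_labels(
    slope: float, regression_threshold: float, improvement_threshold: float
) -> list[str]:
    """Return every label a slope satisfies under an assumed inclusive rule."""
    labels = []
    if slope <= regression_threshold:
        labels.append("degrading")
    if slope >= improvement_threshold:
        labels.append("improving")
    return labels or ["stable"]


print(matching_labels(0.0, -0.01, 0.01))  # ['stable']
print(matching_labels(0.0, 0.02, -0.02))  # ['degrading', 'improving']
```

With the ordering check in __init__, the second configuration is rejected up front instead of silently producing whichever label analyze() happens to test first.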

@nuthalapativarun

Contributor Author

Addressed both issues raised in the bot review:

  1. Python 3.10 compat: Replaced from datetime import UTC (Python 3.11+) with from datetime import datetime, timezone and updated the call site to datetime.now(timezone.utc).
  2. Threshold ordering validation: Added a ValueError in __init__ when regression_threshold >= improvement_threshold, ensuring the thresholds are always ordered correctly.

nuthalapativarun force-pushed the feat/2355-suite-run-trend-analyzer branch from 179bb16 to 41c7bb0 on June 5, 2026 18:51
nuthalapativarun and others added 2 commits June 15, 2026 08:56
…ion detection

Implements SuiteRunTrendAnalyzer, SuiteRunTrendReport, SuiteTrend, and
SuiteRunPoint in giskard/checks/core/trend.py. Uses stdlib statistics.linear_regression
(OLS) over a rolling window of SuiteResult runs to classify pass_rate slope
as improving, stable, or degrading. Zero new dependencies.

Closes Giskard-AI#2355
…tion

Replace datetime.UTC (Python 3.11+) with datetime.timezone.utc for
Python 3.10 compatibility. Add validation that regression_threshold is
strictly less than improvement_threshold in SuiteRunTrendAnalyzer.__init__.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nuthalapativarun force-pushed the feat/2355-suite-run-trend-analyzer branch from 41c7bb0 to 3381a61 on June 15, 2026 15:57
@nuthalapativarun

Contributor Author

This branch has been rebased onto the latest main to resolve merge conflicts. Both main (with its new Target/Trace types) and this PR (with the SuiteRunTrendAnalyzer, SuiteRunPoint, SuiteRunTrendReport, and SuiteTrend exports) modified giskard.checks.__init__ and giskard.checks.core.__init__, so the two sets of imports/exports were merged together.

All previously addressed bot feedback remains in place. Tests for the trend analyzer (tests/core/test_trend.py, 22 tests) pass, and the full giskard-checks suite shows no new failures beyond pre-existing, unrelated flaky tests.

Ready for a human review whenever convenient. Thanks!

@nuthalapativarun

Contributor Author

Just checking in on this one: it's mergeable, and the only thing holding back a green check is the authorize workflow gate, which doesn't run automatically for fork PRs. I'd appreciate a first look when you have a moment!


Development

Successfully merging this pull request may close these issues.

feat(core): SuiteRunTrendAnalyzer — detect pass_rate regression across sequential suite runs

1 participant