feat(giskard-checks): add SuiteRunTrendAnalyzer for pass_rate regression detection #2518
Code Review
This pull request introduces a suite run trend analysis feature to detect pass rate regressions across sequential runs using ordinary least-squares (OLS) linear regression. It adds the "SuiteRunTrendAnalyzer" class and associated data classes ("SuiteRunPoint", "SuiteTrend", "SuiteRunTrendReport"), along with comprehensive unit tests. Feedback on the changes highlights a Python 3.10 compatibility issue due to importing "UTC" directly from "datetime" (which was introduced in Python 3.11), and suggests validating that the "regression_threshold" is strictly less than the "improvement_threshold" during initialization.
import statistics
from dataclasses import dataclass, field
from datetime import UTC, datetime
UTC was only added to the datetime module in Python 3.11. Since this module is intended to support Python >= 3.10 (as stated in the docstring and PR description), the statement from datetime import UTC will raise an ImportError on Python 3.10.
Please import timezone from datetime instead and use timezone.utc to maintain compatibility with Python 3.10.
Suggested change:
- from datetime import UTC, datetime
+ from datetime import datetime, timezone
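A minimal sketch of the compatible pattern the suggestion points to, using only the standard library. The helper name utc_now is hypothetical and is not part of the PR; it only illustrates how timezone.utc can stand in for the UTC alias.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime.

    timezone.utc works on Python 3.10, whereas the datetime.UTC alias
    only exists on Python 3.11 and later.
    """
    return datetime.now(timezone.utc)


stamp = utc_now()
# The result is timezone-aware with a zero offset from UTC.
print(stamp.tzinfo, stamp.utcoffset())
```

On Python 3.11 and later, datetime.UTC is an alias for the same timezone.utc object, so this form should behave identically on both versions.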
    Optional wall-clock time for this run. When omitted,
    ``datetime.now(UTC)`` is used.
"""
self._runs.append((result, timestamp or datetime.now(UTC)))

if window < 2:
    raise ValueError("window must be >= 2 (OLS requires at least two points)")
self.window = window
self.regression_threshold = regression_threshold
self.improvement_threshold = improvement_threshold
It is a good practice to validate that regression_threshold is strictly less than improvement_threshold. If regression_threshold >= improvement_threshold, it could lead to logical contradictions or unexpected behavior during trend classification in analyze().
if window < 2:
    raise ValueError("window must be >= 2 (OLS requires at least two points)")
if regression_threshold >= improvement_threshold:
    raise ValueError(
        f"regression_threshold ({regression_threshold}) must be less than "
        f"improvement_threshold ({improvement_threshold})"
    )
self.window = window
self.regression_threshold = regression_threshold
self.improvement_threshold = improvement_threshold
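To illustrate why the ordering of the two thresholds matters, here is a rough sketch of a slope classifier. The function name classify_slope, the comparison operators, and the threshold values are assumptions made for illustration; the actual logic of analyze() is not shown in this thread.

```python
def classify_slope(
    slope: float,
    regression_threshold: float,
    improvement_threshold: float,
) -> str:
    """Map a pass_rate slope to a direction label (hypothetical rules)."""
    # Same guard as the suggested change: the thresholds must not overlap.
    if regression_threshold >= improvement_threshold:
        raise ValueError(
            f"regression_threshold ({regression_threshold}) must be less than "
            f"improvement_threshold ({improvement_threshold})"
        )
    if slope <= regression_threshold:
        return "degrading"
    if slope >= improvement_threshold:
        return "improving"
    return "stable"


# With well-ordered thresholds, every slope gets exactly one label.
for s in (-0.05, 0.0, 0.05):
    print(s, classify_slope(s, -0.02, 0.02))

# With inverted thresholds and no guard, a flat slope would satisfy both
# the "degrading" and the "improving" condition at once.
flat = 0.0
print(flat <= 0.02 and flat >= -0.02)
```

Without the guard, the label for such a slope would depend only on which branch happens to be checked first, which is the kind of contradiction the review comment warns about.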
Addressed both issues raised in the bot review:
Force-pushed from 179bb16 to 41c7bb0.
…ion detection Implements SuiteRunTrendAnalyzer, SuiteRunTrendReport, SuiteTrend, and SuiteRunPoint in giskard/checks/core/trend.py. Uses stdlib statistics.linear_regression (OLS) over a rolling window of SuiteResult runs to classify pass_rate slope as improving, stable, or degrading. Zero new dependencies. Closes Giskard-AI#2355
…tion Replace datetime.UTC (Python 3.11+) with datetime.timezone.utc for Python 3.10 compatibility. Add validation that regression_threshold is strictly less than improvement_threshold in SuiteRunTrendAnalyzer.__init__. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from 41c7bb0 to 3381a61.
This branch has been rebased onto the latest All previously addressed bot feedback remains in place. Tests for the trend analyzer ( Ready for a human review whenever convenient. Thanks!
Just checking in on this one: it's mergeable and the only thing holding back a green check is the
Description
Adds SuiteRunTrendAnalyzer to detect pass_rate regressions across sequential suite runs.

Implements the four new types proposed in the issue:
- SuiteRunPoint: immutable snapshot of one run's pass_rate and counts
- SuiteTrend: OLS slope + direction (improving/stable/degrading) + is_regression flag
- SuiteRunTrendReport: aggregate report over the analysis window
- SuiteRunTrendAnalyzer: stateful recorder with configurable window, regression_threshold, and improvement_threshold

Uses statistics.linear_regression (stdlib, Python ≥ 3.10), so there are zero new dependencies.

All classes are exported from giskard.checks and giskard.checks.core.

Related Issue
Closes #2355