Extend deflake-e2e-recent-commits to scan PRs by wwwillchen/wwwillchen-bot#2647
…n-bot

Add functionality to gather flaky tests from open PRs authored by wwwillchen or wwwillchen-bot in addition to main branch CI runs. The command now parses Playwright Test Results comments on these PRs to extract flaky tests, providing more comprehensive coverage for deflaking efforts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Greptile Overview
Greptile Summary
Extended the
The parsing pattern correctly matches the output format from
Confidence Score: 5/5
Code Review
This pull request extends the deflake-e2e-recent-commits command to also find flaky tests from comments on open pull requests, in addition to CI runs on the main branch. The changes are in the command's documentation file. My review focuses on ensuring the documentation is complete and accurate. I've pointed out that the documentation for parsing flakes from the main branch was unintentionally removed and should be restored. I also suggested a small improvement to a gh command to make it more robust.
+ 2. **Gather flaky tests from recent PRs by wwwillchen and wwwillchen-bot:**

- From each `results.json`, extract flaky test names. A test is flaky if:
- - It has multiple results (retries occurred)
- - The final result status is `"passed"`
- - At least one prior result has status `"failed"`, `"timedOut"`, or `"interrupted"`
+ In addition to main branch CI runs, scan recent open PRs authored by `wwwillchen` or `wwwillchen-bot` for flaky tests reported in Playwright report comments.

- The test title format is: `<spec_file.spec.ts> > <Suite Name> > <Test Name>`
+ a. List recent open PRs by these authors:

- Parse each title to extract the spec file (everything before the first `>`).
+ ```
+ gh pr list --author wwwillchen --state open --limit 10 --json number,title
+ gh pr list --author wwwillchen-bot --state open --limit 10 --json number,title
+ ```

+ b. For each PR, find the most recent Playwright Test Results comment (posted by a bot, containing "🎭 Playwright Test Results"):

+ ```
+ gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("Playwright Test Results")))] | last'
+ ```

+ c. Parse the comment body to extract flaky tests. The comment format includes a "⚠️ Flaky Tests" section with test names in backticks:
+ - Look for lines matching the pattern: ``- `<test_title>` (passed after N retries)``
+ - Extract the test title from within the backticks
+ - The test title format is: `<spec_file.spec.ts> > <Suite Name> > <Test Name>`

+ d. Add these flaky tests to the overall collection, noting they came from PR #N for the summary
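Step 2c of the hunk above describes the parsing only in words. A minimal Python sketch of it follows, assuming the comment lines look exactly like the quoted pattern. The function name `parse_flaky_tests`, the returned dictionary keys, and the acceptance of a singular "retry" are illustrative assumptions, not part of the command file.

```python
import re

# Matches lines such as:
#   - `foo.spec.ts > Suite > test name` (passed after 2 retries)
# Accepting the singular "retry" is an assumption; the diff only quotes "retries".
FLAKY_LINE = re.compile(
    r"^\s*-\s+`(?P<title>[^`]+)`\s+\(passed after (?P<retries>\d+) retr(?:y|ies)\)\s*$"
)


def parse_flaky_tests(comment_body):
    """Return one dict per flaky-test line found in a Playwright results comment."""
    flaky = []
    for line in comment_body.splitlines():
        match = FLAKY_LINE.match(line)
        if match is None:
            continue  # headings, passing tests, and prose are ignored
        title = match.group("title")
        # The spec file is everything before the first ">" in the title.
        spec_file = title.split(">", 1)[0].strip()
        flaky.append(
            {
                "title": title,
                "spec_file": spec_file,
                "retries": int(match.group("retries")),
            }
        )
    return flaky
```

Anchoring the pattern to the whole line keeps stray backticked names elsewhere in the comment from being picked up as flakes.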
This section introduces gathering flaky tests from PRs, but it replaces the previous documentation on how to parse flaky tests from results.json on the main branch. Since the goal is to combine flakes from both sources, the documentation should describe both gathering methods. Please consider re-introducing the removed documentation for parsing main branch results and restructuring the steps accordingly, so that both sources of flaky tests are documented.
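For reference, the `results.json` rule that the diff removes (multiple results, final status `"passed"`, at least one earlier `"failed"`, `"timedOut"`, or `"interrupted"`) could be sketched as below. This assumes each test's results arrive as a list of dictionaries with a `status` key, ordered by attempt; the helper name `is_flaky` is hypothetical.

```python
# Statuses that count as a failed attempt, as listed in the removed documentation.
FAILURE_STATUSES = {"failed", "timedOut", "interrupted"}


def is_flaky(results):
    """Return True if a test's attempts match the flaky definition.

    `results` is assumed to be the ordered list of attempts for one test,
    each a dict with a "status" key.
    """
    if len(results) < 2:
        return False  # no retries occurred
    *earlier, final = results
    if final.get("status") != "passed":
        return False  # the test ultimately failed, so it is not flaky
    return any(r.get("status") in FAILURE_STATUSES for r in earlier)
```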
b. For each PR, find the most recent Playwright Test Results comment (posted by a bot, containing "🎭 Playwright Test Results"):

```
gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("Playwright Test Results")))] | last'
```
The jq filter `contains("Playwright Test Results")` is good, but could be more specific. The script `scripts/generate-playwright-summary.js` generates the comment title as `## 🎭 Playwright Test Results`. To make the filter more robust and avoid potential false positives, it would be better to include the emoji in the search string.
- gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("Playwright Test Results")))] | last'
+ gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("🎭 Playwright Test Results")))] | last'
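The same selection can be expressed outside jq. The sketch below mirrors the suggested filter in Python, assuming the comments have already been fetched as a list of dictionaries in ascending creation order, each with `user.type` and `body` fields. The function name `latest_playwright_comment` is illustrative.

```python
# Heading text quoted in the review comment, including the emoji.
MARKER = "🎭 Playwright Test Results"


def latest_playwright_comment(comments):
    """Return the last bot comment whose body contains MARKER, or None."""
    matching = [
        c
        for c in comments
        if c.get("user", {}).get("type") == "Bot" and MARKER in (c.get("body") or "")
    ]
    # Like jq's `last` on an empty array, return a null value when nothing matches.
    return matching[-1] if matching else None
```

A comment that mentions "Playwright Test Results" without the emoji, or one written by a human, is skipped, which is the false-positive case the suggestion is guarding against.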
@BugBot run
Cursor Bugbot has reviewed your changes and found 2 potential issues.
b. For each PR, find the most recent Playwright Test Results comment (posted by a bot, containing "🎭 Playwright Test Results"):

```
gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("Playwright Test Results")))] | last'
```
Missing pagination causes skipping recent PR comments
Medium Severity
The `gh api` call for fetching PR comments lacks `--paginate` or a `per_page` parameter. The GitHub Issues Comments API defaults to 30 results per page sorted ascending by creation date (oldest first). For PRs with more than 30 comments, the most recent Playwright Test Results comment won't be in the first page, causing `| last` to silently return either an outdated bot comment or `null`. The existing API calls in step 1 consistently set `per_page`; this new call is inconsistent and will miss flaky tests from active PRs.
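To illustrate the pagination concern, here is a small Python sketch that walks every page before any "last comment" selection is made. The `fetch_page` callable is a hypothetical stand-in for whatever performs the request (for example, a wrapper that adds `per_page` and `page` query parameters); it is not an existing helper in the repository.

```python
def iter_all_comments(fetch_page, per_page=100):
    """Yield every comment by requesting pages until a short page is seen.

    `fetch_page(page)` is assumed to return the list of comments for that
    1-based page number, at most `per_page` items long.
    """
    page = 1
    while True:
        batch = fetch_page(page)
        yield from batch
        if len(batch) < per_page:
            return  # a short (or empty) page means there is nothing further
        page += 1
```

With the CLI itself, passing `--paginate` to `gh api` asks it to follow the pages on your behalf. Note that a `--jq` filter may then be evaluated per page rather than over the combined result, so collecting all comments first and selecting the last match afterwards is the safer ordering.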
4. **Skip if no flaky tests found:**

If no flaky tests are found, report "No flaky tests found in recent commits" and stop.
Step 3 omits PR-sourced flakes from frequency counting
Medium Severity
Step 2d instructs the agent to add PR-sourced flakes to "the overall collection," but step 3 still says to count flakes "across all CI runs" without mentioning PRs. An agent following these instructions literally may exclude the newly-gathered PR-sourced flakes from the deduplication and frequency ranking, silently dropping the data that this entire PR is designed to collect.
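One way to make the combined counting explicit is sketched below in Python. It assumes flakes from main are available as `(title, run_id)` pairs and flakes from PRs as `(title, pr_number)` pairs, and it treats "frequency" as the number of distinct sources in which a test flaked. Both the input shapes and the name `rank_flaky_tests` are assumptions for illustration.

```python
from collections import defaultdict


def rank_flaky_tests(main_flakes, pr_flakes):
    """Merge flakes from both sources, de-duplicate, and rank by frequency.

    Returns a list of (title, count, sources) tuples, most frequent first,
    with ties broken alphabetically by title.
    """
    sources = defaultdict(set)
    for title, run_id in main_flakes:
        sources[title].add(f"main run {run_id}")
    for title, pr_number in pr_flakes:
        sources[title].add(f"PR #{pr_number}")
    ranked = sorted(sources.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(title, len(srcs), sorted(srcs)) for title, srcs in ranked]
```

Keeping the source labels alongside each count means the final summary can still say which PR a flake came from, as step 2d asks.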
…n-bot (dyad-sh#2647)

## Summary
- Extends the `deflake-e2e-recent-commits` command to also gather flaky tests from open PRs authored by `wwwillchen` and `wwwillchen-bot`
- Parses Playwright Test Results comments on these PRs to extract flaky test names
- Provides more comprehensive coverage for deflaking efforts by combining main branch CI runs with PR-reported flakes

## Test plan
- Run `/dyad:deflake-e2e-recent-commits` and verify it now scans both main branch CI runs AND open PRs by the specified authors
- Verify flaky tests from PR comments are correctly parsed and added to the deflake list

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

## Summary by cubic
Extends deflake-e2e-recent-commits to also scan open PRs by wwwillchen and wwwillchen-bot for Playwright-reported flaky tests. This broadens coverage beyond main-branch CI and improves deflaking accuracy.

- **New Features**
  - Lists recent open PRs by wwwillchen and wwwillchen-bot.
  - Parses the latest “Playwright Test Results” bot comment to extract flaky test titles.
  - Merges PR-derived flakes with main-branch results, de-duplicates, and notes PR sources in the summary.
  - Updates no-results message to include PRs (“recent commits or PRs”).

<sup>Written for commit 32766d6. Summary will update on new commits.</sup>

---

> [!NOTE]
> **Low Risk**
> Documentation-only change that broadens the data sources described for collecting flaky tests; no runtime or production code is modified.
>
> **Overview**
> Extends the `.claude` command `deflake-e2e-recent-commits` to **collect flaky Playwright tests from two sources**: recent `main` CI `html-report` artifacts *and* the latest “Playwright Test Results” bot comment on recent open PRs authored by `wwwillchen`/`wwwillchen-bot`.
>
> Updates the instructions to include the PR scanning/parsing workflow, to attribute flakes by source in the final report, and to change the no-flakes message to cover “recent commits or PRs.”
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 32766d6. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

