Extend deflake-e2e-recent-commits to scan PRs by wwwillchen/wwwillchen-bot#2647

Merged
wwwillchen merged 1 commit into main from agent--1770853598274-1770853819 on Feb 12, 2026

@wwwillchen wwwillchen commented Feb 11, 2026

Collaborator

Summary

  • Extends the deflake-e2e-recent-commits command to also gather flaky tests from open PRs authored by wwwillchen and wwwillchen-bot
  • Parses Playwright Test Results comments on these PRs to extract flaky test names
  • Provides more comprehensive coverage for deflaking efforts by combining main branch CI runs with PR-reported flakes

Test plan

  • Run /dyad:deflake-e2e-recent-commits and verify it now scans both main branch CI runs AND open PRs by the specified authors
  • Verify flaky tests from PR comments are correctly parsed and added to the deflake list

🤖 Generated with Claude Code


Summary by cubic

Extends deflake-e2e-recent-commits to also scan open PRs by wwwillchen and wwwillchen-bot for Playwright-reported flaky tests. This broadens coverage beyond main-branch CI and improves deflaking accuracy.

  • New Features
    • Lists recent open PRs by wwwillchen and wwwillchen-bot.
    • Parses the latest “Playwright Test Results” bot comment to extract flaky test titles.
    • Merges PR-derived flakes with main-branch results, de-duplicates, and notes PR sources in the summary.
    • Updates no-results message to include PRs (“recent commits or PRs”).

Written for commit 32766d6. Summary will update on new commits.


Note

Low Risk
Documentation-only change that broadens the data sources described for collecting flaky tests; no runtime or production code is modified.

Overview
Extends the .claude command deflake-e2e-recent-commits to collect flaky Playwright tests from two sources: recent main CI html-report artifacts and the latest “Playwright Test Results” bot comment on recent open PRs authored by wwwillchen/wwwillchen-bot.

Updates the instructions to include the PR scanning/parsing workflow, to attribute flakes by source in the final report, and to change the no-flakes message to cover “recent commits or PRs.”

Written by Cursor Bugbot for commit 32766d6. This will update automatically on new commits. Configure here.

…n-bot

Add functionality to gather flaky tests from open PRs authored by
wwwillchen or wwwillchen-bot in addition to main branch CI runs.
The command now parses Playwright Test Results comments on these PRs
to extract flaky tests, providing more comprehensive coverage for
deflaking efforts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@chatgpt-codex-connector


You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@gemini-code-assist

Contributor

Summary of Changes

Hello @wwwillchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the deflake-e2e-recent-commits command by broadening its scope beyond just main branch CI runs. It now actively identifies flaky end-to-end tests reported within Playwright Test Results comments on open pull requests from specific authors, thereby improving the overall effectiveness and coverage of the automated deflaking process.

Highlights

  • Expanded Flaky Test Detection Scope: The deflake-e2e-recent-commits command now includes scanning open pull requests authored by wwwillchen and wwwillchen-bot in addition to main branch CI runs.
  • PR Comment Parsing: Implemented logic to parse Playwright Test Results comments on identified PRs to extract reported flaky test names.
  • Enhanced Deflaking Coverage: The combined approach of analyzing main branch CI and PR comments provides a more comprehensive list of flaky tests for deflaking efforts.
Changelog
  • .claude/commands/dyad/deflake-e2e-recent-commits.md
    • Updated the command's introductory description to reflect the inclusion of PR scanning.
    • Added a new detailed step outlining the process for gathering flaky tests from recent PRs, including gh commands for listing PRs and fetching comments, and parsing logic.
    • Modified the message displayed when no flaky tests are found to include PRs as a potential source.
    • Adjusted the final summary report to explicitly mention flaky tests found across both main branch commits and PRs, and to report the sources.
Activity
  • No review activity or comments have been recorded yet.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

No issues found across 1 file

@greptile-apps

greptile-apps Bot commented Feb 11, 2026

Contributor

Greptile Overview

Greptile Summary

Extended the deflake-e2e-recent-commits command to gather flaky tests from both main branch CI runs and recent open PRs authored by wwwillchen and wwwillchen-bot. The new step 2 adds logic to:

  • List open PRs by these authors (limit 10 each)
  • Extract flaky test names from Playwright Test Results bot comments containing "⚠️ Flaky Tests" sections
  • Parse test titles from backtick-wrapped lines matching the pattern ``- `<test_title>` (passed after N retries)``
  • Track which tests came from which PRs for the summary report

The parsing pattern correctly matches the output format from scripts/generate-playwright-summary.js (lines 317, 391), ensuring accurate extraction. This provides more comprehensive coverage for deflaking by combining artifact-based detection with PR comment parsing.
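The extraction described above can be sketched in a few lines of shell. This is a minimal illustration, not the command's actual implementation: the helper names `extract_flaky_titles` and `spec_file_of` are hypothetical, the sample test titles are made up, and accepting a singular "retry" is a defensive assumption that the source does not confirm.

```shell
# Hypothetical helper: read a Playwright summary comment body on stdin and
# print one flaky test title per line. Only bullet lines of the form
#   - `<test_title>` (passed after N retries)
# are kept; every other line is dropped.
extract_flaky_titles() {
  sed -n -E 's/^[[:space:]]*- `(.+)` \(passed after [0-9]+ retr(y|ies)\)[[:space:]]*$/\1/p'
}

# Hypothetical helper: reduce each title on stdin to its spec file, i.e.
# everything before the first ">" separator.
spec_file_of() {
  sed -E 's/[[:space:]]*>.*$//'
}

# Made-up comment body for illustration only.
sample_body='## 🎭 Playwright Test Results

⚠️ Flaky Tests

- `chat.spec.ts > Chat > sends a message` (passed after 2 retries)
- `setup.spec.ts > Setup > shows onboarding` (passed after 3 retries)'

printf '%s\n' "$sample_body" | extract_flaky_titles
printf '%s\n' "$sample_body" | extract_flaky_titles | spec_file_of
```

The first pipeline prints the two full titles; the second prints only `chat.spec.ts` and `setup.spec.ts`, which is the grouping key the removed main-branch instructions used.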

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes only modify a command documentation file, adding clear instructions for parsing PR comments. The parsing pattern matches the actual output format from generate-playwright-summary.js, and the logic is well-documented with proper error handling notes (e.g., skipping PRs without comments).
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| .claude/commands/dyad/deflake-e2e-recent-commits.md | Extended deflake command to also scan open PRs by wwwillchen/wwwillchen-bot for flaky tests from Playwright comment reports |

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request extends the deflake-e2e-recent-commits command to also find flaky tests from comments on open pull requests, in addition to CI runs on the main branch. The changes are in the command's documentation file. My review focuses on ensuring the documentation is complete and accurate. I've pointed out that the documentation for parsing flakes from the main branch was unintentionally removed and should be restored. I also suggested a small improvement to a gh command to make it more robust.

Comment on lines +47 to 70
2. **Gather flaky tests from recent PRs by wwwillchen and wwwillchen-bot:**

From each `results.json`, extract flaky test names. A test is flaky if:
- It has multiple results (retries occurred)
- The final result status is `"passed"`
- At least one prior result has status `"failed"`, `"timedOut"`, or `"interrupted"`
In addition to main branch CI runs, scan recent open PRs authored by `wwwillchen` or `wwwillchen-bot` for flaky tests reported in Playwright report comments.

The test title format is: `<spec_file.spec.ts> > <Suite Name> > <Test Name>`
a. List recent open PRs by these authors:

Parse each title to extract the spec file (everything before the first `>`).
```
gh pr list --author wwwillchen --state open --limit 10 --json number,title
gh pr list --author wwwillchen-bot --state open --limit 10 --json number,title
```

b. For each PR, find the most recent Playwright Test Results comment (posted by a bot, containing "🎭 Playwright Test Results"):

```
gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("Playwright Test Results")))] | last'
```

c. Parse the comment body to extract flaky tests. The comment format includes a "⚠️ Flaky Tests" section with test names in backticks:
- Look for lines matching the pattern: ``- `<test_title>` (passed after N retries)``
- Extract the test title from within the backticks
- The test title format is: `<spec_file.spec.ts> > <Suite Name> > <Test Name>`

d. Add these flaky tests to the overall collection, noting they came from PR #N for the summary

Contributor

medium

This section introduces gathering flaky tests from PRs, but it replaces the previous documentation on how to parse flaky tests from results.json on the main branch. Since the goal is to combine flakes from both sources, the documentation should describe both gathering methods. Please consider re-introducing the removed documentation for parsing main branch results and restructuring the steps accordingly, so that both sources of flaky tests are documented.

b. For each PR, find the most recent Playwright Test Results comment (posted by a bot, containing "🎭 Playwright Test Results"):

```
gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("Playwright Test Results")))] | last'
```

Contributor

medium

The jq filter contains("Playwright Test Results") is good, but could be more specific. The script scripts/generate-playwright-summary.js generates the comment title as ## 🎭 Playwright Test Results. To make the filter more robust and avoid potential false positives, it would be better to include the emoji in the search string.

Suggested change
- gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("Playwright Test Results")))] | last'
+ gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("🎭 Playwright Test Results")))] | last'

@wwwillchen

Collaborator Author

@BugBot run

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is ON, but it could not run because Privacy Mode (Legacy) is turned on. To enable Bugbot Autofix, switch your privacy mode in the Cursor dashboard.

b. For each PR, find the most recent Playwright Test Results comment (posted by a bot, containing "🎭 Playwright Test Results"):

```
gh api "repos/{owner}/{repo}/issues/<pr_number>/comments" --jq '[.[] | select(.user.type == "Bot" and (.body | contains("Playwright Test Results")))] | last'
```


Missing pagination causes skipping recent PR comments

Medium Severity

The gh api call for fetching PR comments lacks --paginate or a per_page parameter. The GitHub Issues Comments API defaults to 30 results per page sorted ascending by creation date (oldest first). For PRs with more than 30 comments, the most recent Playwright Test Results comment won't be in the first page, causing | last to silently return either an outdated bot comment or null. The existing API calls in step 1 consistently set per_page; this new call is inconsistent and will miss flaky tests from active PRs.
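One way to address this is sketched below, under stated assumptions rather than as the fix that was merged. The function name `latest_playwright_comment` is hypothetical. The sketch relies on `gh api --paginate` applying the `--jq` filter to each page separately, so a per-page `| last` is no longer enough; instead each matching body is emitted as one base64 line and the final line is kept. It also assumes a `base64` that accepts `--decode` (GNU coreutils and macOS do).

```shell
# Hypothetical helper: print the body of the newest Playwright summary comment
# on the PR whose number is passed as $1, or nothing if no such comment exists.
latest_playwright_comment() {
  # Walk every page of comments (oldest first) and emit one base64-encoded
  # body per matching bot comment, so multi-line bodies stay on a single line.
  # Then keep only the last line and decode it back to the original Markdown.
  gh api --paginate "repos/{owner}/{repo}/issues/$1/comments?per_page=100" \
    --jq '.[] | select(.user.type == "Bot" and (.body | contains("🎭 Playwright Test Results"))) | .body | @base64' |
    tail -n 1 |
    base64 --decode
}
```

Called as `latest_playwright_comment <pr_number>` from inside the repository checkout, so that `gh` can fill in `{owner}` and `{repo}`. An empty result corresponds to the "skip PRs without comments" case.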


4. **Skip if no flaky tests found:**

If no flaky tests are found, report "No flaky tests found in recent commits" and stop.


Step 3 omits PR-sourced flakes from frequency counting

Medium Severity

Step 2d instructs the agent to add PR-sourced flakes to "the overall collection," but step 3 still says to count flakes "across all CI runs" without mentioning PRs. An agent following these instructions literally may exclude the newly-gathered PR-sourced flakes from the deduplication and frequency ranking, silently dropping the data that this entire PR is designed to collect.
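To make the concern concrete, here is a rough sketch of a frequency count that treats both sources uniformly. It is an illustration only: the helper name `rank_flakes`, the tab-separated `<source><TAB><test title>` record format, and the sample sources and titles are all assumptions, not part of the command file.

```shell
# Hypothetical helper: read "<source><TAB><test title>" records on stdin
# (source is e.g. a main-branch run or "PR #N") and print
# "<count><TAB><title><TAB><sources>", most frequent first.
rank_flakes() {
  awk -F '\t' '
    {
      count[$2]++
      # Append this source to the comma-separated list kept per title.
      sources[$2] = (sources[$2] == "" ? $1 : sources[$2] ", " $1)
    }
    END { for (t in count) printf "%d\t%s\t%s\n", count[t], t, sources[t] }
  ' | sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Made-up records: one flake seen on main and in a PR, one seen only in a PR.
printf '%s\t%s\n' \
  'main run 1' 'a.spec.ts > A > x' \
  'PR #12'     'a.spec.ts > A > x' \
  'PR #12'     'b.spec.ts > B > y' | rank_flakes
```

Because PR-sourced records pass through the same counter as main-branch records, a test that flakes in both places is de-duplicated into one row with a count of 2 and both sources attributed.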


@wwwillchen wwwillchen merged commit c3c6d3e into main Feb 12, 2026
13 of 14 checks passed
azizmejri1 pushed a commit to azizmejri1/dyad that referenced this pull request Feb 12, 2026
…n-bot (dyad-sh#2647)
