Deflake e2e run 28392564611 by wwwillchen · Pull Request #3711 · dyad-sh/dyad

wwwillchen · 2026-06-29T22:48:22Z

Summary

Fixes Supabase migration E2Es by manually approving destructive SQL proposals before asserting migration files.
Pins the legacy local-agent code_search E2E to enableCodeExplorer=false so the snapshot does not depend on code-explorer readiness.
Extends CI Socket Firewall warmup retries for transient sfw binary download failures.
Makes the codebase extraction unit assertion order-insensitive after npm test exposed filesystem-order variance.

Failing run: https://github.com/dyad-sh/dyad/actions/runs/28392564611

Note

Low Risk
Test-only and documentation changes; no production behavior modified.

Overview
E2E reliability fixes for Supabase migrations, local-agent code_search, Socket Firewall warmup, and a flaky unit assertion.

Supabase migration specs now call po.approveProposal() after destructive SQL prompts (DROP TABLE) because auto-approve does not apply those changes; without the explicit approval, the migration file assertions raced ahead of it.
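
As a rough illustration of why the assertion has to come after approval, here is a toy in-memory model. `FakeProposalFlow`, `submit`, and `appliedMigrations` are hypothetical names invented for this sketch; only `approveProposal` echoes the page-object method named above, and the real app's approval logic is assumed, not reproduced.

```typescript
// Toy model only: the real proposal flow lives in the app and the E2E page object.
type Proposal = { sql: string };

class FakeProposalFlow {
  readonly appliedMigrations: string[] = [];
  private pending: Proposal | null = null;

  // Assumption for this sketch: DROP TABLE is held for manual approval,
  // anything else is applied immediately (as auto-approve would).
  submit(sql: string): void {
    const isDestructive = /^\s*drop\s+table\b/i.test(sql);
    if (isDestructive) {
      this.pending = { sql };
    } else {
      this.appliedMigrations.push(sql);
    }
  }

  // Stand-in for the manual approval step the specs now perform.
  approveProposal(): void {
    if (this.pending !== null) {
      this.appliedMigrations.push(this.pending.sql);
      this.pending = null;
    }
  }
}

const flow = new FakeProposalFlow();
flow.submit("DROP TABLE users;");
const beforeApproval = flow.appliedMigrations.length; // 0: nothing written yet
flow.approveProposal();
const afterApproval = flow.appliedMigrations.length; // 1: applied after approval
console.log({ beforeApproval, afterApproval });
```

An assertion placed where `beforeApproval` is read would fail in this model, which is the ordering problem the spec change addresses.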

The local-agent code_search spec forces enableCodeExplorer: false and polls persisted settings before the fixture prompt, so the chat snapshot consistently shows the Code Search tool UI instead of varying with code-explorer readiness. The ARIA baseline was updated to match.

Socket Firewall warmup in package_manager.spec.ts uses 8 retries on CI (vs 5 locally) for transient sfw npx download failures.

extractCodebase test compares file paths with .sort() so filesystem iteration order does not fail the assertion.
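
A minimal sketch of the order-insensitive comparison, assuming the extracted result is an array of objects with a `path` field. `ExtractedFile` and `sortedPaths` are hypothetical names for illustration and may not match the repository's actual types.

```typescript
// Hypothetical shape for illustration; the real extractCodebase result may differ.
interface ExtractedFile {
  path: string;
}

// Return the paths in a stable order so the assertion does not depend on
// the order in which the filesystem happened to list the files.
function sortedPaths(files: ExtractedFile[]): string[] {
  // map() creates a new array, so sort() does not mutate the caller's input.
  return files.map((f) => f.path).sort();
}

// The same two files, listed in different filesystem orders.
const runA: ExtractedFile[] = [{ path: "src.ts" }, { path: ".gitignore" }];
const runB: ExtractedFile[] = [{ path: ".gitignore" }, { path: "src.ts" }];

console.log(sortedPaths(runA)); // [ '.gitignore', 'src.ts' ]
console.log(sortedPaths(runB)); // [ '.gitignore', 'src.ts' ]
```

Both runs produce the same array, so an equality assertion against `[".gitignore", "src.ts"]` passes regardless of iteration order.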

rules/e2e-testing.md documents the code explorer and destructive SQL approval patterns.

Reviewed by Cursor Bugbot for commit aa87587. Bugbot is set up for automated code reviews on this repo.

cursor · 2026-06-29T22:48:30Z

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_17c534bd-be72-4352-84ce-7fa4a49a73b3)

wwwillchen · 2026-06-29T22:48:30Z

@BugBot run

cursor · 2026-06-29T22:48:36Z

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_d35fcdda-f937-47fc-aadc-2a46f7b663b0)

gemini-code-assist

Code Review

This pull request updates E2E tests and documentation to handle legacy local-agent code search and destructive Supabase migrations. Specifically, it disables the code explorer in the local-agent code search spec, adds proposal approval steps in the Supabase migration tests, updates the corresponding E2E testing rules documentation, and sorts file paths in a codebase test to ensure consistency. The reviewer recommends replacing a ternary operator with an explicit if/else block in package_manager.spec.ts to improve readability when setting the maximum attempts for CI environments.

gemini-code-assist · 2026-06-29T22:49:29Z


 function warmSocketFirewallCache() {
-  const maxAttempts = 5;
+  const maxAttempts = process.env.CI ? 8 : 5;


According to the project's general rules, we should prioritize explicit if/else blocks over ternary operators for clarity when handling different execution paths (such as CI vs. local builds), even if it results in minor code duplication. Readability is favored over conciseness in these scenarios.

let maxAttempts = 5;
if (process.env.CI) {
  maxAttempts = 8;
}

References

Prioritize explicit if/else blocks over ternary operators for clarity when handling different execution paths (e.g., test vs. production builds), even if it results in minor code duplication. Readability is favored over conciseness in these scenarios.
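
For context, the suggested if/else form could sit inside a retry loop along these lines. This is a sketch under assumptions: `resolveMaxAttempts` and `retrySync` are hypothetical helpers, not the actual body of warmSocketFirewallCache, and the environment is passed in as a plain object so the example runs without Node typings. The values 5 and 8 come from the diff in this comment.

```typescript
// Hypothetical helper: choose the retry budget with an explicit if/else.
function resolveMaxAttempts(env: Record<string, string | undefined>): number {
  let maxAttempts = 5; // local default
  if (env.CI) {
    maxAttempts = 8; // larger budget on CI for transient download failures
  }
  return maxAttempts;
}

// Hypothetical retry loop: call `attempt` until it succeeds or the budget is spent.
function retrySync<T>(attempt: (n: number) => T, maxAttempts: number): T {
  let lastError: unknown;
  for (let n = 1; n <= maxAttempts; n++) {
    try {
      return attempt(n);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw new Error(`failed after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Example: a simulated download that fails twice before succeeding.
let calls = 0;
const warmupResult = retrySync(() => {
  calls += 1;
  if (calls < 3) throw new Error("transient download failure");
  return "warmed";
}, resolveMaxAttempts({ CI: "true" }));
console.log(warmupResult, calls); // warmed 3
```

Both forms pick the same value; the if/else version only changes how the choice reads.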

Copilot

Pull request overview

This PR addresses flakiness across E2E and unit tests by making assertions deterministic and ensuring tests explicitly perform the required UI actions before checking filesystem outputs.

Changes:

Make extractCodebase unit test order-insensitive by sorting extracted paths before asserting.
Stabilize E2E behavior by explicitly approving destructive Supabase SQL proposals before asserting migration file creation.
Reduce CI-only flakiness by increasing Socket Firewall warmup retry attempts on CI and pinning local-agent code_search E2E behavior via enableCodeExplorer: false plus persisted-settings polling; update the ARIA snapshot and document both patterns.
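
The persisted-settings polling mentioned in the last item can be pictured with a generic helper like the one below. This is a sketch in the spirit of `expect.poll()`, not Playwright's implementation; `pollUntil`, the `Settings` shape, and the in-memory `store` are all assumptions made for illustration.

```typescript
// Hypothetical in-memory stand-in for persisted settings; the real spec reads
// settings from the running app rather than from an object like this.
interface Settings {
  enableCodeExplorer: boolean;
}

// Poll `read` until `isReady` accepts the value or the timeout elapses.
async function pollUntil<T>(
  read: () => Promise<T>,
  isReady: (value: T) => boolean,
  timeoutMs = 1000,
  intervalMs = 10,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await read();
    if (isReady(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulate an asynchronous settings write that lands after a short delay.
const store: Settings = { enableCodeExplorer: true };
setTimeout(() => {
  store.enableCodeExplorer = false;
}, 30);

// Wait until the setting is observed as persisted before continuing the test.
const settled = pollUntil(
  async () => ({ ...store }),
  (s) => s.enableCodeExplorer === false,
);
settled.then((s) => console.log("persisted:", s.enableCodeExplorer));
```

Sending the fixture prompt only after `settled` resolves removes the dependency on how quickly the write lands.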

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

File	Description
src/tests/codebase.test.ts	Sort extracted file paths before equality assertion to avoid filesystem iteration order flakes.
rules/e2e-testing.md	Document patterns for stabilizing `code_search` E2E and handling destructive Supabase SQL proposals.
e2e-tests/supabase_migrations.spec.ts	Call `po.approveProposal()` after destructive SQL prompts before asserting migration files.
e2e-tests/snapshots/local_agent_code_search.spec.ts_local-agent---code-search-1.aria.yml	Update ARIA baseline to reflect stable `code_search` UI rendering when code explorer is disabled.
e2e-tests/package_manager.spec.ts	Increase Socket Firewall warmup retry attempts on CI to mitigate transient download failures.
e2e-tests/local_agent_code_search.spec.ts	Disable code explorer via persisted settings and poll settings before snapshot to prevent UI variance.

cubic-dev-ai

No issues found across 6 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

dyad-assistant · 2026-06-29T22:54:02Z

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

This is a well-scoped test-deflaking PR that addresses four independent flakiness sources. All fixes follow established patterns in the test suite.

Change-by-change assessment:

local_agent_code_search.spec.ts — Disables code explorer via IPC before sending the fixture prompt, preventing snapshot divergence depending on code-explorer readiness. Uses expect.poll() to wait for the setting to persist, which is the correct retry pattern for async IPC writes. The expect import from @playwright/test is required for expect.poll() and follows the convention used in other spec files.
package_manager.spec.ts — Bumps Socket Firewall warmup retries from 5 to 8, scoped to CI only (process.env.CI). Conservative and targeted.
Snapshot update — The updated aria snapshot correctly reflects the new rendering when code explorer is disabled: the code_search tool call now shows as a separate button between the two assistant message paragraphs, instead of concatenating them into one paragraph.
supabase_migrations.spec.ts — Adds po.approveProposal() after waitForChatCompletion() for the tc=execute-sql-no-description fixture (which generates DROP TABLE users;). The approveProposal() method (PageObject.ts:464-471) properly waits for the approve button to be enabled and then waits for it to disappear, so there is no race condition. This follows the same send-wait-approve pattern used in 15+ other tests across the suite.
rules/e2e-testing.md — Documents both the code-explorer readiness issue and the destructive SQL approval requirement. Good practice for preventing future regressions.
codebase.test.ts — Adds .sort() to make the file path assertion order-insensitive after filesystem-order variance was observed. The expected values ([".gitignore", "src.ts"]) are already alphabetically sorted, so this is correct and minimal.

✅ No issues found by persona-based review.

Generated by Dyadbot persona-based code review

github-actions · 2026-06-29T23:18:01Z

🎭 Playwright Test Results

❌ Some tests failed

OS	Passed	Failed	Flaky	Skipped
🍎 macOS	345	1	0	7

Summary: 345 passed, 1 failed, 7 skipped

Failed Tests

🍎 macOS

local_agent_auto.spec.ts > local-agent - auto model
- Error: expect(string).toMatchSnapshot(expected) failed

📋 Re-run Failing Tests (macOS)

Copy and paste to re-run all failing spec files locally:

npm run e2e \
  e2e-tests/local_agent_auto.spec.ts

wwwillchen added 2 commits June 29, 2026 14:33

test: deflake failing e2e run

7383d95

docs: record e2e deflake learnings

aa87587

wwwillchen requested review from a team and Copilot June 29, 2026 22:48

wwwillchen temporarily deployed to ai-bots June 29, 2026 22:48 — with GitHub Actions Inactive

wwwillchen had a problem deploying to ai-bots June 29, 2026 22:48 — with GitHub Actions Failure

wwwillchen temporarily deployed to ai-bots June 29, 2026 22:48 — with GitHub Actions Inactive

Copilot started reviewing on behalf of wwwillchen June 29, 2026 22:48

gemini-code-assist Bot reviewed Jun 29, 2026

Copilot AI reviewed Jun 29, 2026

cubic-dev-ai Bot reviewed Jun 29, 2026

wwwillchen temporarily deployed to ai-bots June 29, 2026 22:53 — with GitHub Actions Inactive

github-actions Bot added the needs-human:review-issue label (ai agent flagged an issue that requires human review) Jun 29, 2026

wwwillchen merged commit 8784216 into dyad-sh:main Jun 29, 2026
17 of 19 checks passed

wwwillchen merged 2 commits into dyad-sh:main from wwwillchen:deflake-e2e-run-28392564611

2 participants
