Deflake e2e run 28392564611 #3711

Merged
wwwillchen merged 2 commits into dyad-sh:main from wwwillchen:deflake-e2e-run-28392564611
Jun 29, 2026

Conversation

@wwwillchen wwwillchen commented Jun 29, 2026

Collaborator

Summary

  • Fixes Supabase migration E2Es by manually approving destructive SQL proposals before asserting migration files.
  • Pins the legacy local-agent code_search E2E to enableCodeExplorer=false so the snapshot does not depend on code-explorer readiness.
  • Extends CI Socket Firewall warmup retries for transient sfw binary download failures.
  • Makes the codebase extraction unit assertion order-insensitive after npm test exposed filesystem-order variance.

Failing run: https://github.com/dyad-sh/dyad/actions/runs/28392564611


Note

Low Risk
Test-only and documentation changes; no production behavior modified.

Overview
E2E reliability fixes for Supabase migrations, local-agent code_search, Socket Firewall warmup, and a flaky unit assertion.

Supabase migration specs now call po.approveProposal() after destructive SQL prompts (DROP TABLE) because auto-approve does not apply those changes; migration file assertions were racing ahead of approval.
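
The race can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `FakeMigrationApp`, `isDestructive`, and the file-naming scheme are hypothetical stand-ins, not Dyad's implementation or its page object. Only the `approveProposal` name comes from the PR.

```typescript
// Hypothetical model of the approval gate; not Dyad's real code.
// Assumption: a statement counts as destructive if it starts with DROP or TRUNCATE.
function isDestructive(sql: string): boolean {
  return /^\s*(drop|truncate)\b/i.test(sql);
}

class FakeMigrationApp {
  private pendingSql: string | null = null;
  readonly migrationFiles: string[] = [];

  // Non-destructive SQL is applied immediately; destructive SQL waits for approval.
  proposeSql(sql: string): void {
    if (isDestructive(sql)) {
      this.pendingSql = sql;
      return;
    }
    this.writeMigration(sql);
  }

  // Stand-in for the manual approval step (po.approveProposal() in the specs).
  approveProposal(): void {
    if (this.pendingSql === null) {
      throw new Error("no pending proposal to approve");
    }
    this.writeMigration(this.pendingSql);
    this.pendingSql = null;
  }

  private writeMigration(sql: string): void {
    const name = `migration_${this.migrationFiles.length}.sql`;
    this.migrationFiles.push(`${name}: ${sql}`);
  }
}

const app = new FakeMigrationApp();
app.proposeSql("DROP TABLE users;");
// Asserting on migration files at this point would fail: nothing is written yet.
const filesBeforeApproval = app.migrationFiles.length;
app.approveProposal();
const filesAfterApproval = app.migrationFiles.length;
```

In this model the file count only changes after `approveProposal()`, which is why an assertion placed before the approval step cannot pass reliably.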

The local-agent code_search spec forces enableCodeExplorer: false and polls persisted settings before the fixture prompt, so the chat snapshot consistently shows the Code Search tool UI instead of varying with code-explorer readiness. The ARIA baseline was updated to match.
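
The "poll persisted settings" step can be sketched without Playwright. The review below says the spec uses `expect.poll()`; the version here is a plain TypeScript reimplementation of the same idea so that it runs standalone. `FakeSettingsStore`, `pollUntil`, and the timing values are hypothetical and chosen only for illustration.

```typescript
// Hypothetical stand-in for the app's persisted settings; not Dyad's real store.
interface Settings {
  enableCodeExplorer: boolean;
}

class FakeSettingsStore {
  private persisted: Settings = { enableCodeExplorer: true };

  // Simulates an asynchronous write that only lands after a short delay.
  write(next: Settings, delayMs: number): void {
    setTimeout(() => {
      this.persisted = next;
    }, delayMs);
  }

  async read(): Promise<Settings> {
    return { ...this.persisted };
  }
}

// Re-reads a value until the predicate holds or the timeout expires.
async function pollUntil<T>(
  read: () => Promise<T>,
  isReady: (value: T) => boolean,
  timeoutMs = 1000,
  intervalMs = 10,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await read();
    if (isReady(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error("timed out waiting for condition");
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

const store = new FakeSettingsStore();
store.write({ enableCodeExplorer: false }, 30);
// Only continue (for example, send the fixture prompt) once the write is visible.
const settled = pollUntil(
  () => store.read(),
  (settings) => settings.enableCodeExplorer === false,
);
```

Reading the setting once, right after the write, could still observe the old value; polling removes that timing dependency.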

Socket Firewall warmup in package_manager.spec.ts uses 8 retries on CI (vs 5 locally) for transient sfw npx download failures.
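
A minimal sketch of an environment-dependent retry budget follows. The `8 : 5` split matches the diff in this PR; `retrySync`, `resolveMaxAttempts`, and the simulated `flakyDownload` are hypothetical, and the real warmup function may be structured differently.

```typescript
type Env = Record<string, string | undefined>;

// Same decision as the ternary in the diff, with the environment passed in
// so the function can be exercised without touching process.env.
function resolveMaxAttempts(env: Env): number {
  return env.CI ? 8 : 5;
}

// Hypothetical synchronous retry loop: runs the action until it succeeds
// or the attempt budget is used up.
function retrySync<T>(action: (attempt: number) => T, maxAttempts: number): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return action(attempt);
    } catch (error) {
      lastError = error;
    }
  }
  throw new Error(`failed after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Simulated download that only succeeds on the 7th attempt.
const flakyDownload = (attempt: number): string => {
  if (attempt < 7) throw new Error("transient download failure");
  return "sfw-ready";
};

// With the CI budget (8) the 7th attempt is reached.
const ciResult = retrySync(flakyDownload, resolveMaxAttempts({ CI: "true" }));

// With the local budget (5) the loop gives up first.
let localFailed = false;
try {
  retrySync(flakyDownload, resolveMaxAttempts({}));
} catch (error) {
  localFailed = true;
}
```

Note that environment variables are strings, so any non-empty value of `CI` (including `"false"`) is truthy in a check like this.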

extractCodebase test compares file paths with .sort() so filesystem iteration order does not fail the assertion.
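
The order-insensitive comparison can be sketched as follows. The two input arrays are invented to simulate different filesystem iteration orders, and `sortedCopy` is a hypothetical helper; the expected values are the ones quoted in the review below.

```typescript
// Hypothetical helper: returns a sorted copy so the input array is not mutated
// (Array.prototype.sort sorts in place).
function sortedCopy(paths: readonly string[]): string[] {
  return [...paths].sort();
}

// The same two files, as two different directory iteration orders might return them.
const fromFsA = ["src.ts", ".gitignore"];
const fromFsB = [".gitignore", "src.ts"];
const expected = [".gitignore", "src.ts"];

const sameA = JSON.stringify(sortedCopy(fromFsA)) === JSON.stringify(expected);
const sameB = JSON.stringify(sortedCopy(fromFsB)) === JSON.stringify(expected);
```

The default `sort()` compares strings by UTF-16 code unit, so `.gitignore` sorts before `src.ts` regardless of the order the filesystem produced.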

rules/e2e-testing.md documents the code explorer and destructive SQL approval patterns.

Reviewed by Cursor Bugbot for commit aa87587. Bugbot is set up for automated code reviews on this repo.

cursor Bot commented Jun 29, 2026

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_17c534bd-be72-4352-84ce-7fa4a49a73b3)

@wwwillchen

Collaborator Author

@BugBot run

cursor Bot commented Jun 29, 2026

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_d35fcdda-f937-47fc-aadc-2a46f7b663b0)

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates E2E tests and documentation to handle legacy local-agent code search and destructive Supabase migrations. Specifically, it disables the code explorer in the local-agent code search spec, adds proposal approval steps in the Supabase migration tests, updates the corresponding E2E testing rules documentation, and sorts file paths in a codebase test to ensure consistency. The reviewer recommends replacing a ternary operator with an explicit if/else block in package_manager.spec.ts to improve readability when setting the maximum attempts for CI environments.

 function warmSocketFirewallCache() {
-  const maxAttempts = 5;
+  const maxAttempts = process.env.CI ? 8 : 5;

Contributor

medium

According to the project's general rules, we should prioritize explicit if/else blocks over ternary operators for clarity when handling different execution paths (such as CI vs. local builds), even if it results in minor code duplication. Readability is favored over conciseness in these scenarios.

  let maxAttempts = 5;
  if (process.env.CI) {
    maxAttempts = 8;
  }

References
  1. Prioritize explicit if/else blocks over ternary operators for clarity when handling different execution paths (e.g., test vs. production builds), even if it results in minor code duplication. Readability is favored over conciseness in these scenarios.

Copilot AI left a comment

Contributor

Pull request overview

This PR addresses flakiness across E2E and unit tests by making assertions deterministic and ensuring tests explicitly perform required UI/actions before checking filesystem outputs.

Changes:

  • Make extractCodebase unit test order-insensitive by sorting extracted paths before asserting.
  • Stabilize E2E behavior by explicitly approving destructive Supabase SQL proposals before asserting migration file creation.
  • Reduce CI-only flakiness by increasing Socket Firewall warmup retry attempts on CI and pinning local-agent code_search E2E behavior via enableCodeExplorer: false plus persisted-settings polling; update the ARIA snapshot and document both patterns.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/tests/codebase.test.ts | Sort extracted file paths before equality assertion to avoid filesystem iteration order flakes. |
| rules/e2e-testing.md | Document patterns for stabilizing code_search E2E and handling destructive Supabase SQL proposals. |
| e2e-tests/supabase_migrations.spec.ts | Call po.approveProposal() after destructive SQL prompts before asserting migration files. |
| e2e-tests/snapshots/local_agent_code_search.spec.ts_local-agent---code-search-1.aria.yml | Update ARIA baseline to reflect stable code_search UI rendering when code explorer is disabled. |
| e2e-tests/package_manager.spec.ts | Increase Socket Firewall warmup retry attempts on CI to mitigate transient download failures. |
| e2e-tests/local_agent_code_search.spec.ts | Disable code explorer via persisted settings and poll settings before snapshot to prevent UI variance. |

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

No issues found across 6 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

@dyad-assistant

Contributor

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

This is a well-scoped test-deflaking PR that addresses four independent flakiness sources. All fixes follow established patterns in the test suite.

Change-by-change assessment:

  1. local_agent_code_search.spec.ts: Disables code explorer via IPC before sending the fixture prompt, preventing snapshot divergence depending on code-explorer readiness. Uses expect.poll() to wait for the setting to persist, which is the correct retry pattern for async IPC writes. The expect import from @playwright/test is required for expect.poll() and follows the convention used in other spec files.

  2. package_manager.spec.ts: Bumps Socket Firewall warmup retries from 5 to 8, scoped to CI only (process.env.CI). Conservative and targeted.

  3. Snapshot update: The updated aria snapshot correctly reflects the new rendering when code explorer is disabled: the code_search tool call now shows as a separate button between the two assistant message paragraphs, instead of concatenating them into one paragraph.

  4. supabase_migrations.spec.ts: Adds po.approveProposal() after waitForChatCompletion() for the tc=execute-sql-no-description fixture (which generates DROP TABLE users;). The approveProposal() method (PageObject.ts:464-471) properly waits for the approve button to be enabled and then waits for it to disappear, so there is no race condition. This follows the same send-wait-approve pattern used in 15+ other tests across the suite.

  5. rules/e2e-testing.md: Documents both the code-explorer readiness issue and the destructive SQL approval requirement. Good practice for preventing future regressions.

  6. codebase.test.ts: Adds .sort() to make the file path assertion order-insensitive after filesystem-order variance was observed. The expected values ([".gitignore", "src.ts"]) are already alphabetically sorted, so this is correct and minimal.

✅ No issues found by persona-based review.


Generated by Dyadbot persona-based code review

@github-actions github-actions Bot added the needs-human:review-issue label ("ai agent flagged an issue that requires human review") Jun 29, 2026

@github-actions

Contributor

🎭 Playwright Test Results

❌ Some tests failed

OS Passed Failed Flaky Skipped
🍎 macOS 345 1 0 7

Summary: 345 passed, 1 failed, 7 skipped

Failed Tests

🍎 macOS

  • local_agent_auto.spec.ts > local-agent - auto model
    • Error: expect(string).toMatchSnapshot(expected) failed

📋 Re-run Failing Tests (macOS)

Copy and paste to re-run all failing spec files locally:

npm run e2e \
  e2e-tests/local_agent_auto.spec.ts

@wwwillchen wwwillchen merged commit 8784216 into dyad-sh:main Jun 29, 2026
17 of 19 checks passed
2 participants