
Stabilize local agent auto model dump E2E #3712

Merged: wwwillchen merged 2 commits into dyad-sh:main from wwwillchen:deflake-e2e-run-28407790114 on Jun 29, 2026


wwwillchen (Collaborator) commented Jun 29, 2026

Summary

  • Pins code explorer off in the local-agent auto-model request dump test before sending the prompt.
  • Prevents the serialized prompt/tool snapshot from flipping between code_search and explore_code based on code explorer indexing readiness.
  • Records the E2E gotcha for future local-agent request-dump specs.
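The flip is easiest to see as a small model. The sketch below assumes, hypothetically, that the agent advertises explore_code only when code explorer is enabled *and* its index is ready, and falls back to code_search otherwise. `selectSearchTool` and the `indexReady` flag are illustrative names, not Dyad's actual API.

```typescript
// Only enableCodeExplorer comes from the PR; everything else is illustrative.
interface AgentSettings {
  enableCodeExplorer: boolean;
}

type SearchTool = "code_search" | "explore_code";

// Hypothetical selection rule: explore_code is offered only when the
// explorer is enabled AND its index has finished building.
function selectSearchTool(
  settings: AgentSettings,
  indexReady: boolean,
): SearchTool {
  return settings.enableCodeExplorer && indexReady
    ? "explore_code"
    : "code_search";
}

// Explorer enabled: the advertised tool depends on indexing timing,
// so the serialized request dump can differ from run to run.
const enabled: AgentSettings = { enableCodeExplorer: true };
const flaky = [false, true].map((ready) => selectSearchTool(enabled, ready));

// Explorer pinned off: indexing timing no longer matters.
const pinned: AgentSettings = { enableCodeExplorer: false };
const stable = [false, true].map((ready) => selectSearchTool(pinned, ready));

console.log(flaky.join(", ")); // code_search, explore_code
console.log(stable.join(", ")); // code_search, code_search
```

Under this assumed rule, pinning the setting makes the snapshot independent of indexing speed. That is why the fix targets the setting rather than trying to wait for indexing to finish.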

Verification

  • npm run build
  • npm run fmt
  • npm run lint
  • npm run lint:fix
  • npm run ts
  • PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/local_agent_auto.spec.ts
  • PLAYWRIGHT_RETRIES=0 PLAYWRIGHT_HTML_OPEN=never npm run e2e -- e2e-tests/local_agent_auto.spec.ts --repeat-each=5
  • npm test

Failing run investigated: https://github.com/dyad-sh/dyad/actions/runs/28407790114

#skip-bugbot


Note

Low Risk
Test-only and documentation changes with no production code paths affected.

Overview
Stabilizes the local-agent auto model request-dump E2E by turning enableCodeExplorer off via set-user-settings and polling the persisted settings before sending the [dump] prompt. This keeps the serialized request snapshot from flipping between code_search and explore_code depending on when code-explorer indexing finishes, which differs between CI and local runs.
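The "poll persisted settings" step is the ordering guarantee. The prompt goes out only after a read-back confirms the write actually landed. In the real spec, Playwright's `expect.poll()` does this work. The sketch below reimplements the idea without Playwright so it runs on its own. `FakeSettingsStore`, `pollUntil`, and the 50 ms persistence delay are hypothetical stand-ins for the app's settings IPC.

```typescript
// Hypothetical settings store whose writes become visible only after a
// delay, standing in for an asynchronous IPC + disk round trip.
class FakeSettingsStore {
  private persisted: Record<string, unknown> = { enableCodeExplorer: true };

  set(patch: Record<string, unknown>, delayMs: number): void {
    setTimeout(() => {
      this.persisted = { ...this.persisted, ...patch };
    }, delayMs);
  }

  read(): Record<string, unknown> {
    return { ...this.persisted };
  }
}

// Re-read `probe` until it returns `expected` or the timeout elapses,
// similar in spirit to Playwright's expect.poll(...).toBe(...).
async function pollUntil<T>(
  probe: () => T,
  expected: T,
  timeoutMs = 2000,
  intervalMs = 20,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (probe() === expected) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // One last read so a value that lands right at the deadline still counts.
  if (probe() === expected) return;
  throw new Error(`Timed out waiting for ${String(expected)}`);
}

const store = new FakeSettingsStore();
store.set({ enableCodeExplorer: false }, 50);

// An immediate read still sees the old value...
const before = store.read().enableCodeExplorer;

// ...so wait for the persisted value before "sending [dump]".
const ready = pollUntil(() => store.read().enableCodeExplorer, false).then(
  () => store.read().enableCodeExplorer,
);
```

Polling a read-back is more robust than a fixed sleep. It returns as soon as the write is visible and fails loudly on timeout instead of silently sending the prompt too early.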

Documents the same pattern in rules/e2e-testing.md for other local-agent request-dump specs that are not explicitly testing explore_code.

Reviewed by Cursor Bugbot for commit 600a84f. Bugbot is set up for automated code reviews on this repo.


cursor (bot) commented Jun 29, 2026

Bugbot couldn't run - usage limit reached

Bugbot is counted against Cursor usage for this user or team, and this run hit a usage or spend limit.

A user or team admin can review and increase usage limits in the Cursor dashboard.

(requestId: serverGenReqId_4ed329e5-799a-48a8-91c8-338cf5d375a5)

wwwillchen merged commit ef0cafb into dyad-sh:main on Jun 29, 2026
10 of 12 checks passed

Copilot AI (Contributor) left a comment

Pull request overview

This PR deflakes the "local-agent - auto model" E2E by removing a race where request-dump snapshots could alternate between code_search and explore_code depending on code-explorer indexing readiness.

Changes:

  • Pins enableCodeExplorer: false via set-user-settings in the auto-model dump E2E and waits for the persisted setting before sending [dump].
  • Documents the same stabilization pattern in rules/e2e-testing.md for other request-dump specs not explicitly testing explore_code.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Files reviewed:
  • rules/e2e-testing.md: Adds a new E2E gotcha entry describing how to keep request-dump snapshots stable by pinning enableCodeExplorer off and polling persisted settings.
  • e2e-tests/local_agent_auto.spec.ts: Disables code explorer and polls po.settings.recordSettings().enableCodeExplorer before issuing the dump prompt to prevent snapshot/tool-list flakiness.


dyad-assistant (Contributor) commented

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

Small, well-scoped test stabilization PR. Two files changed (10 additions, 0 deletions), all test/docs only, with zero production code paths affected.

What it does: Pins enableCodeExplorer: false via IPC in the local-agent auto-model request-dump E2E test and polls until the setting persists before sending the [dump] prompt. This prevents the serialized tool snapshot from flipping between code_search and explore_code depending on whether code-explorer indexing finishes before or after the prompt is sent. Documents the pattern in rules/e2e-testing.md.

Correctness: The IPC call uses the validated set-user-settings channel (enforces UserSettingsSchema.partial()), and the expect.poll() pattern correctly waits for persistence before proceeding. The setting is pinned before po.importApp(), which is the right sequencing. The pattern matches existing specs (local_agent_code_search.spec.ts, local_agent_explore_code.spec.ts) exactly.
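A `.partial()` schema on the set-user-settings channel means every field is optional, but any field that is sent must have the right type. The sketch below is a dependency-free approximation of that check. It assumes UserSettingsSchema is a zod-style object schema; the PR does not show it. The two-field `userSettingsShape` is hypothetical, and so is `parsePartialSettings`. Unknown keys are stripped here, mirroring zod's default non-strict object behavior. Whether Dyad's schema is strict is not stated.

```typescript
// Hypothetical two-field slice of a user-settings schema, mapping each
// field name to the `typeof` its value must have. Not Dyad's real schema.
const userSettingsShape = {
  enableCodeExplorer: "boolean",
  selectedModel: "string",
} as const;

type SettingsKey = keyof typeof userSettingsShape;

// Partial validation: every field is optional, but a field that IS present
// must have the expected primitive type. Unknown keys are dropped.
function parsePartialSettings(
  input: Record<string, unknown>,
): Partial<Record<SettingsKey, unknown>> {
  const out: Partial<Record<SettingsKey, unknown>> = {};
  for (const [key, value] of Object.entries(input)) {
    // Own-property check, so prototype keys like "toString" are not accepted.
    if (!Object.prototype.hasOwnProperty.call(userSettingsShape, key)) {
      continue;
    }
    const field = key as SettingsKey;
    const expected = userSettingsShape[field];
    if (typeof value !== expected) {
      throw new TypeError(`${field}: expected ${expected}, got ${typeof value}`);
    }
    out[field] = value;
  }
  return out;
}

// A well-typed partial payload, like the one the spec sends, passes through.
const ok = parsePartialSettings({ enableCodeExplorer: false });

// A mistyped value is rejected before it could reach the settings store.
let rejected = false;
try {
  parsePartialSettings({ enableCodeExplorer: "off" });
} catch {
  rejected = true;
}
```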

Security: No new IPC channels or renderer-side capabilities introduced. The test calls an existing, validated endpoint with a well-typed payload.

Code Health: Follows the established convention for similar local-agent E2E specs. The documentation bullet is clear, actionable, and placed adjacent to the related code_search coverage bullet.

UX: No user-facing changes.

✅ No issues found by persona-based review.


Generated by Dyadbot persona-based code review

github-actions (Contributor) commented

🎭 Playwright Test Results

✅ All tests passed!

Results by OS:
  • 🍎 macOS: 535 passed, 3 flaky, 170 skipped
  • 🪟 Windows: 535 passed, 3 flaky, 170 skipped

Total: 1070 tests passed (6 flaky) (340 skipped)
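The totals line is the sum of the two per-OS rows. Here is a quick cross-check using only the counts reported in this comment (the `OsResult` type is just for illustration):

```typescript
interface OsResult {
  os: string;
  passed: number;
  flaky: number;
  skipped: number;
}

// Per-OS counts as reported in the Playwright results comment.
const results: OsResult[] = [
  { os: "macOS", passed: 535, flaky: 3, skipped: 170 },
  { os: "Windows", passed: 535, flaky: 3, skipped: 170 },
];

// Sum each column across operating systems.
const totals = results.reduce(
  (acc, r) => ({
    passed: acc.passed + r.passed,
    flaky: acc.flaky + r.flaky,
    skipped: acc.skipped + r.skipped,
  }),
  { passed: 0, flaky: 0, skipped: 0 },
);

console.log(totals.passed, totals.flaky, totals.skipped); // 1070 6 340
```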

⚠️ Flaky Tests

🍎 macOS

  • chat_image_generation.spec.ts > generate image from chat - full flow (passed after 1 retry)
  • context_limit_banner.spec.ts > context limit banner shows 'running out' when near context limit (passed after 2 retries)
  • setup_flow.spec.ts > Setup Flow > node.js install flow (passed after 1 retry)

🪟 Windows

  • app_screenshot.spec.ts > captures an app screenshot after the first generated commit (passed after 1 retry)
  • chat_input.spec.ts > send button disabled during pending proposal - reject (passed after 1 retry)
  • concurrent_chat.spec.ts > concurrent chat (passed after 1 retry)

📊 View full report
