Add AI-generated E2E testing feature #3681
- Tests panel UI with LLM-generated E2E test generation
- Record-a-test: capture preview actions, AI writes the test
- Isolated test DB with Neon test branches
- Playwright bootstrap, reporting, and parallel/tree test running
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@BugBot run |
Code Review
This pull request introduces an experimental end-to-end testing feature, allowing users to record user flows in the preview iframe, generate Playwright tests via chat, and run them within a new 'Tests' panel. For database-connected Neon apps, test runs are isolated using ephemeral copy-on-write branches. Feedback focuses on improving the robustness of this system: propagating and checking the abort signal during database setup and server polling, only clearing the test branch ID in the database upon successful deletion to avoid orphaned branches, re-activating the test recorder upon iframe navigation/reload, and safely handling empty paths in test path normalization to prevent directory-write crashes.
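The abort-signal feedback above can be illustrated with a minimal sketch. It is not the PR's implementation: `waitForReady`, `sleep`, the `probe` callback, and the option names are all hypothetical, and the real `waitForServerReady` may be structured differently. The idea is only that the signal is checked before every probe and also interrupts the wait between probes.

```typescript
// Hypothetical sketch of an abort-aware readiness poll (names are assumptions).
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("Cancelled"));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error("Cancelled"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

async function waitForReady(
  probe: () => Promise<boolean>, // e.g. an HTTP check against the dev server
  opts: { signal?: AbortSignal; intervalMs: number; timeoutMs: number },
): Promise<void> {
  const deadline = Date.now() + opts.timeoutMs;
  while (true) {
    // Check cancellation before each probe so a cancelled run stops promptly.
    if (opts.signal?.aborted) throw new Error("Cancelled");
    if (await probe()) return;
    if (Date.now() >= deadline) throw new Error("Timed out waiting for server");
    // The sleep itself rejects on abort, so cancellation does not wait out the interval.
    await sleep(opts.intervalMs, opts.signal);
  }
}
```

Passing the same `AbortSignal` into the database setup step would let one cancel action stop both phases.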
🔍 Dyadbot Code Review Summary
Verdict: ✅ YES - Ready to merge
This is a well-structured, large feature PR (~5900 lines across 47 files) that adds AI-generated E2E testing with a Tests panel, Playwright integration, record-a-test flow, and Neon branch isolation for database-connected apps.
What's done well:
Issues Summary
🟢 Low Priority Notes (6 items)
Generated by Dyadbot persona-based code review |
- Thread cancellation signal through isolated test DB setup and waitForServerReady
- Only clear neonTestBranchId after a confirmed Neon branch deletion (avoid orphans)
- Re-send recorder state on iframe load so recording survives navigation/reload
- Fall back to a concrete filename for empty/all-dots normalized test paths
- Resolve appPath through realpath in getTestScreenshot (symlink containment)
- Show 'Screenshot unavailable' instead of perpetual loading state
- Only treat .spec.ts files under tests/ as E2E test cards
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@BugBot run |
🤖 Claude Code Review Summary
PR Confidence: 4/5
All review comments were addressed with code changes or a reasoned engineering decision, lint/typecheck pass, and changes are pushed; held at 4 only because the cancellation and isolation paths warrant a manual smoke-test before merge.
Unresolved Threads
No unresolved threads
Resolved Threads
Product Principle Suggestions
🤖 Generated by Claude Code |
🔍 Dyadbot Code Review Summary
Verdict: ⛔ NO - Do NOT merge
Large, well-structured feature PR that adds AI-generated E2E testing with a thoughtful UX (result taxonomy, database isolation, lazy Playwright bootstrap, record-a-test). The security posture for screenshot reading (symlink resolution + containment check), command injection prevention (array args, no shell on Unix), and path normalization is solid. Two mechanical bugs need fixing before merge.
Issues Summary
HIGH #1 — HIGH #2 — Extension mismatch loses the test card for non- 🟢 Low Priority Notes (2 items)
What's done well:
Generated by Dyadbot persona-based code review |
|
@BugBot run |
🔍 Dyadbot Code Review Summary
Verdict: ✅ YES - Ready to merge
This is a well-architected, security-conscious feature PR that adds AI-generated E2E testing with appropriate defense-in-depth across all layers. The core security decisions are sound:
The DB isolation design (Neon branches + Supabase test users with crash-safe reconciliation) is thoughtful, and the test coverage across 8 test files is solid.
Issues Summary
🟢 Low Priority Notes (4 items)
Generated by Dyadbot persona-based code review |
Defer the preview "record a test" sub-feature to a follow-up PR. Removes the in-iframe recorder client and its proxy injection, the recording atoms/types, the recorder UI in PreviewIframe and TestsPanel, and the "Tests from a recorded flow" system prompt section. The AI-generated E2E testing feature and test-runner infrastructure are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merge the neon_test_branch_id and supabase_test_user_id column additions into a single migration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@BugBot run |
🔍 Dyadbot Code Review Summary
Verdict: ✅ YES - Ready to merge
This is a well-designed, large feature PR that adds AI-generated E2E testing with good security practices (path traversal protection, CLI flag injection prevention, symlink resolution, SQL injection guards), thorough test coverage (unit tests for all critical utilities and the isolation service), and careful UX design (infra-vs-assertion taxonomy, isolated test data, crash reconciliation). The architecture follows established Dyad patterns for IPC contracts, jotai atoms, and panel components.
Issues Summary
🟢 Low Priority Notes (6 items)
Generated by Dyadbot persona-based code review |
|
@wwwillchen Quick update:
I'll try to wrap up this PR in the next couple of days.
|
@BugBot run |
🔍 Dyadbot Code Review Summary
Verdict: 🤔 NOT SURE - Potential issues
This is a substantial, well-structured feature addition (~6.5k lines across 44 files) that adds AI-generated E2E testing with Playwright, a new Tests panel, isolated test databases (Neon branches + Supabase test users), and orphan reconciliation on startup. Security posture is strong: the main/renderer boundary is preserved throughout; service-role keys stay in the main process, shell injection is blocked at multiple layers ( The code quality is high: proper
One HIGH-severity race condition and several MEDIUM issues should be addressed before shipping.
Issues Summary
🟢 Low Priority Notes (5 items)
Generated by Dyadbot persona-based code review |
- normalize_test_path: fall back to a default spec filename for bare
`tests`, non-spec, or empty paths to avoid EISDIR / non-test writes
- DyadMarkdownParser: route all .spec.{ts,tsx,js,jsx} files to the test card
- supabase_test_user: validate userId is a UUID in deleteUserBestEffort;
use Postgres format(%I, %L) for owner-row cleanup
- isolated_test_db: skip env restore/restart in teardown when the env was
never swapped; honor the abort signal in the Supabase isolation path
- spawn_streaming: document the Windows shell:true injection constraint
- tests_handlers: await the prior run's full teardown before starting a new
run to prevent an env-swap race against real data
- DyadGenerateTest: hide "View in Tests" deep-link for aborted (unwritten) tests
- TestsPanel: return cancellation cleanup from the streaming-watcher reload
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@BugBot run |
🤖 Claude Code Review Summary
PR Confidence: 4/5
All 16 outstanding review threads were addressed with code changes or resolved with documented reasoning; checks (fmt/lint/typecheck) pass locally, but the concurrency-race and DB-isolation fixes touch runtime paths best confirmed by a manual test run before merge.
Unresolved Threads
No unresolved threads
Resolved Threads
Product Principle Suggestions
The following suggestion could improve
🤖 Generated by Claude Code |
const column = String(row?.column_name ?? "");
if (!SAFE_IDENT_RE.test(table) || !SAFE_IDENT_RE.test(column)) {
  continue;
}
🟡 MEDIUM
Misleading comment on SQL injection defense layers
The cleanUpRowsOwnedBy query uses DO $$ BEGIN EXECUTE format('DELETE FROM public.%I WHERE %I = %L', '${table}', '${column}', '${userId}'); END $$;. The code is safe because the JS-interpolated values land as positional string-literal arguments to PostgreSQL format(), which properly escapes them. The SAFE_IDENT_RE regex provides a second layer. However, the comment misleadingly implies format() is the primary guard over the regex, when in reality the JS template literal interpolation happens first. A future maintainer reading this could misunderstand which layer is load-bearing.
💡 Suggestion: Rewrite the comment to accurately describe the defense: the regex validates the shape, and the values then pass through format(%I/%L) as a second layer. Both are required because the JS interpolation happens before Postgres sees the query.
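A minimal sketch of the layering described above, with comments that state which layer is load-bearing. The function name `buildOwnerCleanupSql` is hypothetical, and the patterns assigned to `SAFE_IDENT_RE` and `UUID_RE` are assumptions, since the PR excerpt does not show the actual regexes. The SQL text follows the query quoted in the comment.

```typescript
// Assumed patterns; the real SAFE_IDENT_RE in the PR is not shown in this thread.
const SAFE_IDENT_RE = /^[A-Za-z_][A-Za-z0-9_]*$/;
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: returns the cleanup statement, or null if any input is rejected.
function buildOwnerCleanupSql(
  table: string,
  column: string,
  userId: string,
): string | null {
  // Layer 1 (load-bearing): the template literal below is evaluated in JS
  // before Postgres parses anything, so a stray single quote in any value
  // would end the SQL string literal early. Shape validation must come first.
  if (
    !SAFE_IDENT_RE.test(table) ||
    !SAFE_IDENT_RE.test(column) ||
    !UUID_RE.test(userId)
  ) {
    return null;
  }
  // Layer 2 (defense in depth): format() quotes %I as an identifier and %L
  // as a literal once the already-validated values reach Postgres.
  return (
    "DO $$ BEGIN EXECUTE format('DELETE FROM public.%I WHERE %I = %L', " +
    `'${table}', '${column}', '${userId}'); END $$;`
  );
}
```

Returning `null` for rejected input mirrors the `continue` in the reviewed loop: a suspicious table or column name is skipped, never executed.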
// Kill the whole process tree: with a shell (Windows) or fast package
// managers, the spawned child forks descendants (npx/playwright/chromium)
// that a plain child.kill() would leave orphaned.
if (child.pid) {
🟡 MEDIUM
Unbounded stdout/stderr accumulation in memory
Both stdout and stderr are accumulated as in-memory strings with no size cap. A test producing excessive output (e.g., a console.log loop) could grow these unboundedly. The TestsPanel also accumulates runState.output without limit. The test handler only uses the last 1500 chars for error reporting, suggesting the full buffer is not needed.
💡 Suggestion: Cap accumulated output to a reasonable size (e.g., ring-buffer the last 100-200KB) to prevent unbounded memory growth from runaway test output.
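One way to cap the output, sketched under assumptions: `TailBuffer` is a hypothetical class, not something in the PR, and the cap is measured in UTF-16 code units (string length), not bytes. It keeps only the most recent output, which is all the error reporting described above needs.

```typescript
// Hypothetical bounded buffer that retains only the tail of appended output.
class TailBuffer {
  private text = "";
  private readonly maxChars: number;

  constructor(maxChars: number) {
    // slice(-0) would return the whole string, so a zero cap must be rejected.
    if (!Number.isInteger(maxChars) || maxChars <= 0) {
      throw new RangeError("maxChars must be a positive integer");
    }
    this.maxChars = maxChars;
  }

  append(chunk: string): void {
    // Re-slice on every append so memory never exceeds cap + one chunk.
    this.text = (this.text + chunk).slice(-this.maxChars);
  }

  // Last n characters, e.g. the final 1500 used for error reporting.
  tail(n: number): string {
    return n <= 0 ? "" : this.text.slice(-n);
  }

  toString(): string {
    return this.text;
  }
}
```

A caller could create one buffer per stream, for example `new TailBuffer(200_000)` for stdout, and feed it from the child process's `data` events. Re-slicing copies the retained string on each append, which is simple but not optimal; a chunk queue would avoid the copy at the cost of more code.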
})
.catch((err) => {
  if (!cancelled) showError(err);
})
🟡 MEDIUM
Spec list fetch does not use TanStack Query
The listAppTests IPC call is managed with manual useState/useEffect and cancellation flags instead of TanStack Query, which the repo conventionally uses for IPC-backed data fetching. This loses automatic deduplication, cache invalidation, and error retry that TanStack Query provides. The streaming run itself reasonably lives outside TanStack Query, but the spec discovery query would benefit from the established pattern.
💡 Suggestion: Refactor the listAppTests call to use useQuery (or the repo's IPC query wrapper) for consistency with other panels' data fetching patterns.
 * cascade FK are NOT removed, a documented limitation of the free-tier model.
 */
async function cleanUpRowsOwnedBy({
  projectId,
🟡 MEDIUM
Silent data leak when test user row cleanup fails
When cleanUpRowsOwnedBy fails for some tables (e.g., FK constraints on tables not matching the OWNER_COLUMNS heuristic), test data silently remains in the user's real database. Since the test user is subsequently deleted, orphaned rows referencing the deleted user's id could cause integrity issues. The user receives no indication that cleanup was incomplete.
💡 Suggestion: Surface a warning in the isolation.reason field when cleanup errors occur, so the user knows some test data may remain in their database.
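A minimal sketch of how partial cleanup failures could be collected and turned into a user-visible reason string. Everything here is hypothetical: `cleanUpWithReport`, the `CleanupTarget` and `CleanupReport` types, and the injected `deleteRows` callback stand in for whatever `cleanUpRowsOwnedBy` actually does per table, and the wording of the warning is an assumption.

```typescript
interface CleanupTarget {
  table: string;
  column: string;
}

interface CleanupReport {
  failedTables: string[];
  // Suitable for surfacing through a field such as isolation.reason; null when clean.
  reason: string | null;
}

// Hypothetical wrapper: attempts every target and records failures, so one
// failing table neither aborts the rest nor disappears silently.
async function cleanUpWithReport(
  targets: CleanupTarget[],
  deleteRows: (target: CleanupTarget) => Promise<void>,
): Promise<CleanupReport> {
  const failedTables: string[] = [];
  for (const target of targets) {
    try {
      await deleteRows(target);
    } catch {
      failedTables.push(target.table);
    }
  }
  const reason =
    failedTables.length === 0
      ? null
      : `Test data may remain in: ${failedTables.join(", ")}`;
  return { failedTables, reason };
}
```

The caller can attach `reason` to the run result before deleting the test user, so the warning is reported even though the owning user no longer exists afterward.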
🔍 Dyadbot Code Review Summary
Verdict: ✅ YES - Ready to merge
This is a well-designed, security-conscious PR that adds a significant new feature. The IPC boundary is carefully maintained: test file paths are validated against a strict regex before reaching Playwright CLI, screenshot reads resolve symlinks before containment checks, Zod schemas validate all inputs, and the isolation system dead-ends safely rather than running tests against real data. The DB schema changes include the required Drizzle migration and snapshot, and crash recovery is handled via startup reconciliation sweeps. Test coverage is thorough with unit tests for all major utilities.
No HIGH severity issues were found. The MEDIUM items below are worth considering for follow-up but do not block merge.
Issues Summary
🟢 Low Priority Notes (6 items)
Detail on MEDIUM findingsMisleading comment on SQL injection defense layers ( Unbounded stdout/stderr accumulation ( Spec list fetch does not use TanStack Query ( Silent data leak when test user row cleanup fails ( Generated by Dyadbot persona-based code review |
got it - thanks for the update.