Skip to content

Add AI-generated E2E testing feature#3681

Draft
azizmejri1 wants to merge 7 commits into
dyad-sh:mainfrom
azizmejri1:feature/ai-e2e-testing
Draft

Add AI-generated E2E testing feature#3681
azizmejri1 wants to merge 7 commits into
dyad-sh:mainfrom
azizmejri1:feature/ai-e2e-testing

Conversation

@azizmejri1

@azizmejri1 azizmejri1 commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator
  • Tests panel UI with LLM-generated E2E test generation
  • Record-a-test: capture preview actions, AI writes the test
  • Isolated test DB with Neon test branches
  • Playwright bootstrap, reporting, and parallel/tree test running

Review in cubic

- Tests panel UI with LLM-generated E2E test generation
- Record-a-test: capture preview actions, AI writes the test
- Isolated test DB with Neon test branches
- Playwright bootstrap, reporting, and parallel/tree test running

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wwwillchen

Copy link
Copy Markdown
Collaborator

@BugBot run

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an experimental end-to-end testing feature, allowing users to record user flows in the preview iframe, generate Playwright tests via chat, and run them within a new 'Tests' panel. For database-connected Neon apps, test runs are isolated using ephemeral copy-on-write branches. Feedback focuses on improving the robustness of this system: propagating and checking the abort signal during database setup and server polling, only clearing the test branch ID in the database upon successful deletion to avoid orphaned branches, re-activating the test recorder upon iframe navigation/reload, and safely handling empty paths in test path normalization to prevent directory-write crashes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/ipc/handlers/tests_handlers.ts
Comment thread src/ipc/services/isolated_test_db.ts
Comment thread src/ipc/services/isolated_test_db.ts Outdated
Comment thread src/ipc/utils/neon_test_branch.ts
Comment thread src/ipc/utils/neon_test_branch.ts
Comment thread src/components/preview_panel/PreviewIframe.tsx Outdated
Comment thread src/ipc/utils/normalize_test_path.ts

@dyad-assistant dyad-assistant Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review: 4 inline finding(s).

Comment thread src/ipc/handlers/tests_handlers.ts Outdated
Comment thread src/components/preview_panel/TestsPanel.tsx
Comment thread src/components/chat/DyadMarkdownParser.tsx
Comment thread src/components/preview_panel/TestsPanel.tsx
@dyad-assistant

Copy link
Copy Markdown
Contributor

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

This is a well-structured, large feature PR (~5900 lines across 47 files) that adds AI-generated E2E testing with a Tests panel, Playwright integration, record-a-test flow, and Neon branch isolation for database-connected apps.

What's done well:

  • Security defense-in-depth: regex-validated test file paths, normalizeTestPath forcing files under tests/, realpathSync + containment checks for screenshots, shell: false on Unix in spawnStreaming, DyadError with proper error kinds for expected failures.
  • The Neon test branch lifecycle is thoughtfully designed: crash recovery via reconcileOrphanTestBranches on startup, env snapshot/restore teardown, and the "never run against real data" dead-end when isolation fails.
  • Good test coverage for new utilities: playwright report parsing, test case extraction, tag parsing, neon branch operations, and isolated test DB orchestration.
  • The infra-vs-assertion error taxonomy (amber vs red) is a smart UX choice that prevents "your app is broken" false alarms from flaky test infrastructure.
  • Accessibility: aria-label, aria-pressed, aria-live="polite" for the counter, keyboard-focusable run buttons with focus-visible:opacity-100.

Issues Summary

Severity File Issue
🟡 MEDIUM src/ipc/handlers/tests_handlers.ts:300 getTestScreenshot containment check fails when app path involves symlinks
🟡 MEDIUM src/components/preview_panel/TestsPanel.tsx:226 FailureDetails shows 'Loading screenshot...' forever when screenshot is unavailable
🟡 MEDIUM src/components/chat/DyadMarkdownParser.tsx:472 dyad-write to DyadGenerateTest redirect catches all .spec.ts files, not just tests
🟡 MEDIUM src/components/preview_panel/TestsPanel.tsx:530 IPC data fetching uses manual state management instead of TanStack Query
🟢 Low Priority Notes (6 items)
  • aggregateFileResult duplicates aggregateTestResults - The file-level aggregation function in TestsPanel.tsx is essentially the same logic as aggregateTestResults in playwright_report.ts. Consider reusing the existing function. (src/components/preview_panel/TestsPanel.tsx)

  • getDyadGenerateTestTags is a near-identical clone of getDyadWriteTags - The new tag parser duplicates the regex-based XML parsing, attribute extraction, and code fence stripping. A generic parseDyadTags(response, tagName) helper would reduce duplication. (src/ipc/utils/dyad_tag_parser.ts)

  • Experimental badge text is not i18n-ized - The "Experimental" badge in the toolbar and Tests panel header is a hardcoded English string rather than using t(...). The locale file gets a preview.tests entry but not the badge. (src/components/preview_panel/PreviewToolbar.tsx)

  • spawn_streaming.ts accumulates unbounded stdout/stderr - Both output strings grow without limit during long Playwright runs. Unlikely to be a practical issue for typical test suites, but a cap or ring buffer would be safer for edge cases. (src/ipc/utils/spawn_streaming.ts)

  • Record button relies on color alone for recording state - The button toggles between Circle (idle) and Square (recording) icons, which is good, but the red/white color is the primary differentiator for sighted users. A pulsing animation or "REC" label would improve accessibility. (src/components/preview_panel/PreviewIframe.tsx)

  • normalizeTestPath passes through input of exactly "tests" - If rawPath is "tests" (no file), it returns "tests" verbatim. This isn't a security issue (downstream validation catches it) but could be tightened. (src/ipc/utils/normalize_test_path.ts)


Generated by Dyadbot persona-based code review

- Thread cancellation signal through isolated test DB setup and waitForServerReady
- Only clear neonTestBranchId after a confirmed Neon branch deletion (avoid orphans)
- Re-send recorder state on iframe load so recording survives navigation/reload
- Fall back to a concrete filename for empty/all-dots normalized test paths
- Resolve appPath through realpath in getTestScreenshot (symlink containment)
- Show 'Screenshot unavailable' instead of perpetual loading state
- Only treat .spec.ts files under tests/ as E2E test cards

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wwwillchen

Copy link
Copy Markdown
Collaborator

@BugBot run

@azizmejri1

Copy link
Copy Markdown
Collaborator Author

🤖 Claude Code Review Summary

PR Confidence: 4/5

All review comments were addressed with code changes or a reasoned engineering decision, lint/typecheck pass, and changes are pushed — held at 4 only because the cancellation and isolation paths warrant a manual smoke-test before merge.

Unresolved Threads

Thread Rationale Link

No unresolved threads

Resolved Threads

Issue Rationale Link
Thread cancellation didn't reach the isolated-DB setup phase (signal not passed; waitForServerReady could hang ~120s after Stop) Threaded signal from the handler into prepareIsolatedTestDatabase and into waitForServerReady, which now throws on abort View · View · View
Failed Neon branch deletion still cleared neonTestBranchId, orphaning branches permanently deleteBranchBestEffort now returns a boolean; the column is cleared only on confirmed deletion, so the startup sweep can retry View · View
Recorder silently stopped capturing after in-iframe navigation/reload (active=false re-injected) Re-send recorder state on the iframe load event, with cleanup View
Empty/all-dots normalized test path collapsed to tests/, causing EISDIR on write Fall back to tests/generated.spec.ts when the sanitized path is empty View
getTestScreenshot containment check failed under ancestor symlinks (e.g. /var/private/var), perpetual "Loading…" Resolve appPath through realpathSync before path.relative View
FailureDetails showed "Loading screenshot…" forever when a screenshot was unavailable Added a loaded flag; renders "Screenshot unavailable" once the fetch settles with no image (Principle #4: Transparent Over Magical, #6: Delightful) View
dyad-write → test-card redirect caught any .spec.ts, including unit specs like src/components/Button.spec.ts Tightened condition to require a tests/ prefix View
Suggestion to migrate loadSpecs from manual state to TanStack Query Resolved with explanation: test state lives in per-app jotai atoms mirroring previewRunStateByAppIdAtom; a partial migration would split state across two systems. Engineering-consistency call (product principles cover product/UX, not renderer data-flow) View
Product Principle Suggestions
  • General: "Add a principle (or sub-note) covering internal code-architecture consistency — e.g. 'prefer the established state-management pattern of the surrounding module over introducing a parallel system for marginal gains' — so code-health/refactor suggestions can be adjudicated against an explicit rule rather than ad-hoc engineering judgment."

🤖 Generated by Claude Code

@dyad-assistant dyad-assistant Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review: 5 inline finding(s).

Comment thread src/ipc/utils/normalize_test_path.ts Outdated
Comment thread src/components/chat/DyadMarkdownParser.tsx
Comment thread src/components/preview_panel/TestsPanel.tsx
Comment thread src/ipc/utils/normalize_test_path.ts
Comment thread src/components/preview_panel/PreviewIframe.tsx
@dyad-assistant

Copy link
Copy Markdown
Contributor

🔍 Dyadbot Code Review Summary

Verdict: ⛔ NO - Do NOT merge
Recommendation: auto-fix

Large, well-structured feature PR that adds AI-generated E2E testing with a thoughtful UX (result taxonomy, database isolation, lazy Playwright bootstrap, record-a-test). The security posture for screenshot reading (symlink resolution + containment check), command injection prevention (array args, no shell on Unix), and path normalization is solid. Two mechanical bugs need fixing before merge.

Issues Summary

Severity File Issue
🔴 HIGH src/ipc/utils/normalize_test_path.ts:22 normalizeTestPath("tests") returns bare directory, causing EISDIR crash on write
🔴 HIGH src/components/chat/DyadMarkdownParser.tsx:472 Extension mismatch: only .spec.ts gets the test card, not .spec.tsx/.js/.jsx
🟡 MEDIUM src/components/preview_panel/TestsPanel.tsx:569 TestsPanel uses manual IPC + useState instead of TanStack Query
🟡 MEDIUM src/ipc/utils/normalize_test_path.ts:10 normalizeTestPath does not validate spec file extension on write path
🟡 MEDIUM src/components/preview_panel/PreviewIframe.tsx:940 Recorder action handler does not check event.origin

HIGH #1normalizeTestPath("tests") crashes. If the LLM emits <dyad-generate-test path="tests">, the function returns the bare string "tests" (via sanitized === "tests" early return). response_processor.ts then calls fs.writeFileSync on a directory, throwing an unhandled EISDIR error. Fix: fold the bare "tests" case into the empty-path fallback: if (!sanitized || sanitized === "tests") return "tests/generated.spec.ts";.

HIGH #2 — Extension mismatch loses the test card for non-.ts specs. The <dyad-write> → test card routing in DyadMarkdownParser.tsx only checks writePath.endsWith(".spec.ts"), but listAppTests globs *.spec.{ts,tsx,js,jsx} and TEST_FILE_PATTERN matches all four. Agent-mode specs with .spec.tsx/.spec.js/.spec.jsx extensions render as a plain file card with no "View in Tests" link. Fix: const isTestSpec = /\.spec\.(ts|tsx|js|jsx)$/.test(writePath) && ....

🟢 Low Priority Notes (2 items)
  • Unused get param in write atom - setTestRunStateForAppAtom declares get but never uses it; convention is _get. (src/atoms/testRuntimeAtoms.ts)
  • Password masking sentinel is ambiguous - The recorder replaces password values with "********". The prompt explains the convention, but a more explicit sentinel like "[MASKED_PASSWORD]" would be less likely to confuse the LLM. (worker/dyad-recorder-client.js)

What's done well:

  • The infra-vs-assertion error taxonomy (amber/red) is a genuinely useful UX distinction that avoids the "Dyad is broken" false alarm.
  • Screenshot path validation resolves symlinks before the containment check — correct defense against symlink escapes.
  • spawnStreaming uses array args (no shell on Unix) and treeKill for clean abort — solid process management.
  • The Neon branch isolation flow (create → swap env → restart → run → teardown → restore) with crash-recovery reconciliation at startup is thorough.
  • Comprehensive test coverage for the new utilities (parse_test_cases, playwright_report, neon_test_branch, isolated_test_db, normalize paths, tag parser).
  • The basicAgentMode / PRO_AGENT_ONLY_TOOLS removal is clean dead-code deletion (the set was empty, the filter was a no-op).

Generated by Dyadbot persona-based code review

@wwwillchen

Copy link
Copy Markdown
Collaborator

@BugBot run

@dyad-assistant dyad-assistant Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review: 5 inline finding(s).

Comment thread src/components/chat/DyadMarkdownParser.tsx
Comment thread src/ipc/utils/supabase_test_user.ts
Comment thread src/components/preview_panel/TestsPanel.tsx
Comment thread src/ipc/services/isolated_test_db.ts
Comment thread src/ipc/utils/spawn_streaming.ts
@dyad-assistant

Copy link
Copy Markdown
Contributor

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

This is a well-architected, security-conscious feature PR that adds AI-generated E2E testing with appropriate defense-in-depth across all layers. The core security decisions are sound:

  • Path traversal: Multi-layer protection via TEST_FILE_PATTERN regex, normalizeTestPath, safeJoin, and symlink resolution in the screenshot handler.
  • SQL injection: UUID_RE validation on userId, SAFE_IDENT_RE on identifiers, hardcoded OWNER_COLUMNS list.
  • Command injection: Arguments passed as arrays to spawn with shell: false on Unix. Windows shell: true is necessary for .cmd shims and current callers only pass validated/static args.
  • IPC boundary: All handlers use createTypedHandler with Zod-validated contracts. Screenshot reading validates containment after resolving symlinks. Test file paths are validated against a strict regex before reaching the Playwright CLI.
  • Error handling: Expected errors use DyadError with appropriate DyadErrorKind. The isolated DB service never runs tests against real data on setup failure.

The DB isolation design (Neon branches + Supabase test users with crash-safe reconciliation) is thoughtful, and the test coverage across 8 test files is solid.

Issues Summary

Severity File Issue
🟡 MEDIUM src/components/chat/DyadMarkdownParser.tsx:470 isTestSpec only matches .spec.ts, missing .tsx/.js/.jsx extensions
🟡 MEDIUM src/ipc/utils/supabase_test_user.ts:376 userId not validated as UUID before URL interpolation in deleteUserBestEffort
🟡 MEDIUM src/components/preview_panel/TestsPanel.tsx:568 TestsPanel uses direct IPC calls instead of TanStack Query for data fetching
🟡 MEDIUM src/ipc/services/isolated_test_db.ts:131 Neon teardown restarts the app even if setup failed before any env modification
🟡 MEDIUM src/ipc/utils/spawn_streaming.ts:62 Exported utility uses shell: true on Windows without documenting injection constraints
🟢 Low Priority Notes (4 items)
  • Optional chain on os.cpus() - parallelWorkerCount() in tests_handlers.ts uses os.cpus()?.length ?? 2 — the optional chain is unnecessary since os.cpus() always returns an array, though it's harmless defensive coding. (src/ipc/handlers/tests_handlers.ts)
  • postMessage origin - The recorder's postMessage(message, "*") in PreviewIframe.tsx is consistent with every other postMessage call in that file — no change needed, but worth noting the existing pattern uses "*" throughout. (src/components/preview_panel/PreviewIframe.tsx)
  • Dead code removal - Removing the unused basicAgentMode option and empty PRO_AGENT_ONLY_TOOLS set in tool_definitions.ts is good cleanup, though it's unrelated to the E2E testing feature. (src/pro/main/ipc/handlers/local_agent/tool_definitions.ts)
  • Prompt duplication in snapshots - The E2E test writing guidance block appears 8 times in the snapshot file (once per agent mode variant) — this is the expected behavior of the snapshot structure, not a bug. (src/prompts/__snapshots__/local_agent_prompt.test.ts.snap)

Generated by Dyadbot persona-based code review

azizmejri1 and others added 2 commits June 29, 2026 21:30
Defer the preview "record a test" sub-feature to a follow-up PR. Removes
the in-iframe recorder client and its proxy injection, the recording
atoms/types, the recorder UI in PreviewIframe and TestsPanel, and the
"Tests from a recorded flow" system prompt section. The AI-generated E2E
testing feature and test-runner infrastructure are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merge the neon_test_branch_id and supabase_test_user_id column
additions into a single migration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wwwillchen

Copy link
Copy Markdown
Collaborator

@BugBot run

@dyad-assistant dyad-assistant Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review: 2 inline finding(s).

Comment thread src/components/chat/DyadMarkdownParser.tsx
Comment thread src/ipc/services/isolated_test_db.ts
@dyad-assistant

Copy link
Copy Markdown
Contributor

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

This is a well-designed, large feature PR that adds AI-generated E2E testing with good security practices (path traversal protection, CLI flag injection prevention, symlink resolution, SQL injection guards), thorough test coverage (unit tests for all critical utilities and the isolation service), and careful UX design (infra-vs-assertion taxonomy, isolated test data, crash reconciliation). The architecture follows established Dyad patterns for IPC contracts, jotai atoms, and panel components.

Issues Summary

Severity File Issue
🟡 MEDIUM src/components/chat/DyadMarkdownParser.tsx:472 isTestSpec check only matches .spec.ts, not .spec.tsx/.js/.jsx
🟡 MEDIUM src/ipc/services/isolated_test_db.ts:87 AbortSignal not forwarded to Supabase isolation path
🟢 Low Priority Notes (6 items)
  • parallelWorkerCount floor of 2 on 1-2 core machines - The Math.max(2, ...) floor means parallel mode always spawns at least 2 workers even on 1-core machines, defeating the "leave one core free" comment. Consider Math.max(1, ...). (src/ipc/handlers/tests_handlers.ts)

  • waitForServerReady effective timeout exceeds 120s - Each loop iteration can block 3.5s (500ms sleep + 3s fetch timeout), so the Date.now() < deadline check doesn't guarantee a 120s cap. Unlikely to matter in practice since most probes fail fast. (src/ipc/services/isolated_test_db.ts)

  • EMPTY_TEST_RUN_STATE double-cast to escape Object.freeze types - The as unknown as TestRunState cast hides a genuine type mismatch from frozen arrays. A factory function returning fresh plain objects would be safer. (src/atoms/testRuntimeAtoms.ts)

  • askAiToFix sends current run output, not snapshot from the failing run - If the user triggers "Ask AI to fix" from run N's failure while run N+1 is streaming, the output context sent to chat mixes both runs. Minor incoherence since the error text itself is stable. (src/components/preview_panel/TestsPanel.tsx)

  • Headed mode on Linux without DISPLAY - CI=true is set in the env, but --headed overrides it. On headless Linux (cloud desktops, WSL), the browser launch will fail with an opaque amber error. Consider detecting and warning. (src/ipc/handlers/tests_handlers.ts)

  • basicAgentMode removal is unrelated cleanup - The removed PRO_AGENT_ONLY_TOOLS set was already empty (new Set<string>()), so the removal has zero behavioral effect. Worth noting in the PR description since it's unrelated to the testing feature. (src/pro/main/ipc/handlers/local_agent/tool_definitions.ts)


Generated by Dyadbot persona-based code review

@azizmejri1

Copy link
Copy Markdown
Collaborator Author

@wwwillchen Quick update :

  • I removed the test record feature from this PR , i'm gonna open another PR for recording tests with a more precise and reliable version.
  • for DB isolation , i went with a test branch for neon and test account for supabase since branches are not possible with the free plan .

I'll try to wrap up this PR in the next couple of days.

@wwwillchen

Copy link
Copy Markdown
Collaborator

@BugBot run

@dyad-assistant dyad-assistant Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review: 4 inline finding(s).

Comment thread src/ipc/handlers/tests_handlers.ts
Comment thread src/ipc/utils/supabase_test_user.ts
Comment thread src/components/preview_panel/TestsPanel.tsx Outdated
Comment thread src/components/chat/DyadGenerateTest.tsx
@dyad-assistant

Copy link
Copy Markdown
Contributor

🔍 Dyadbot Code Review Summary

Verdict: 🤔 NOT SURE - Potential issues
Recommendation: auto-fix

This is a substantial, well-structured feature addition (~6.5k lines across 44 files) that adds AI-generated E2E testing with Playwright, a new Tests panel, isolated test databases (Neon branches + Supabase test users), and orphan reconciliation on startup.

Security posture is strong: the main/renderer boundary is preserved throughout — service-role keys stay in the main process, shell injection is blocked at multiple layers (spawn with array args, no shell on Unix, TEST_FILE_PATTERN regex before CLI args), path traversal is defended in depth (normalizeTestPath strips .., safeJoin for disk writes, symlink-aware containment check for screenshots), and the data isolation design dead-ends safely (never falls back to running tests against real data).

The code quality is high — proper DyadError usage with appropriate DyadErrorKinds, comprehensive test coverage (7 test files), well-documented schema changes with a matching Drizzle migration, and the UI covers all required states (empty, loading, setup, running, results, dev-server-not-running, error).

One HIGH-severity race condition and several MEDIUM issues should be addressed before shipping.

Issues Summary

Severity File Issue
🔴 HIGH src/ipc/handlers/tests_handlers.ts:378 Concurrent test runs race on env-swap can cause tests to run against real data
🟡 MEDIUM src/ipc/utils/supabase_test_user.ts:310 SQL identifier interpolation from external API result guarded only by regex
🟡 MEDIUM src/components/preview_panel/TestsPanel.tsx:599 Streaming watcher effect leaks cancellation cleanup on fast app-switch
🟡 MEDIUM src/components/chat/DyadGenerateTest.tsx:86 View in Tests button shown for aborted (unwritten) test files
🟢 Low Priority Notes (5 items)
  • Unbounded output accumulation - The output string in TestRunState grows without bound as bootstrap + test runner output is appended. For verbose installs or large test suites this can accumulate megabytes in the renderer. Consider capping at ~512 KB with .slice(-512_000). (src/atoms/testRuntimeAtoms.ts)

  • Server-ready poller is aggressive - waitForServerReady polls at 500 ms with a 3-second fetch timeout, generating up to 240 HTTP requests over 2 minutes. Consider exponential backoff or a longer interval (1-2 s), since cold dev server starts rarely complete in under 5 s. (src/ipc/services/isolated_test_db.ts)

  • Multi-line test titles not parsed - parseTestCases operates line-by-line and misses test("title", split across lines. Generated tests use single-line titles, but user-edited tests may not. The file can still be run as a whole; only per-test granularity is lost. (src/ipc/utils/parse_test_cases.ts)

  • Stale testLine produces unhelpful error - A stale line number (after the user edits a test file) causes Playwright to run zero tests, surfacing as a generic "test runner didn't produce a report" infra error rather than a specific "no test found at line N" message. (src/ipc/handlers/tests_handlers.ts)

  • TanStack Query not used for IPC data fetching - TestsPanel fetches spec lists via raw ipc.tests.listAppTests() in a useCallback instead of using TanStack Query patterns present elsewhere in the repo. This is acceptable for ephemeral data, but diverges from the established pattern. (src/components/preview_panel/TestsPanel.tsx)


Generated by Dyadbot persona-based code review

- normalize_test_path: fall back to a default spec filename for bare
  `tests`, non-spec, or empty paths to avoid EISDIR / non-test writes
- DyadMarkdownParser: route all .spec.{ts,tsx,js,jsx} files to the test card
- supabase_test_user: validate userId is a UUID in deleteUserBestEffort;
  use Postgres format(%I, %L) for owner-row cleanup
- isolated_test_db: skip env restore/restart in teardown when the env was
  never swapped; honor the abort signal in the Supabase isolation path
- spawn_streaming: document the Windows shell:true injection constraint
- tests_handlers: await the prior run's full teardown before starting a new
  run to prevent an env-swap race against real data
- DyadGenerateTest: hide "View in Tests" deep-link for aborted (unwritten) tests
- TestsPanel: return cancellation cleanup from the streaming-watcher reload

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wwwillchen

Copy link
Copy Markdown
Collaborator

@BugBot run

@azizmejri1

Copy link
Copy Markdown
Collaborator Author

🤖 Claude Code Review Summary

PR Confidence: 4/5

All 16 outstanding review threads were addressed with code changes or resolved with documented reasoning; checks (fmt/lint/typecheck) pass locally, but the concurrency-race and DB-isolation fixes touch runtime paths best confirmed by a manual test run before merge.

Unresolved Threads

No unresolved threads

Resolved Threads

Issue Rationale Link
normalizeTestPath returns a bare/dir/non-spec path → EISDIR or non-test write Falls back to tests/generated.spec.ts unless the path is a real .spec.{ts,tsx,js,jsx} file View
Test-card routing only matched .spec.ts (3 duplicate threads) Broadened isTestSpec to /\.spec\.(ts|tsx|js|jsx)$/ under tests/ View
deleteUserBestEffort interpolated userId into URL without UUID check Added UUID_RE guard, matching cleanUpRowsOwnedBy View
Owner-row cleanup relied on regex as the only SQL guard Now uses Postgres format(%I, %L) so the DB escapes identifiers/value View
Neon teardown restarted the dev server even when no env swap occurred Gated env restore + restart on an envModified flag View
Abort signal not forwarded to the Supabase isolation path Threaded signal in and check aborted before user creation View
spawnStreaming shell:true on Windows undocumented for future callers Added a SECURITY JSDoc note on validating args View
Stop-then-Run race could run tests against real data New run awaits the prior run's full prepare→run→teardown via a per-app done promise View
"View in Tests" deep-link shown for aborted (unwritten) tests Hidden when aborted View
Streaming-watcher reload leaked its cancellation cleanup Effect now returns the loadSpecs cleanup View
Panel uses manual IPC + useState instead of TanStack Query (2 threads) Resolved on engineering-consistency grounds: panel state mirrors the per-app jotai previewRunStateByAppIdAtom pattern; run state is push-driven and doesn't model cleanly as a query View
Recorder action handler missing event.origin check Obsolete — the test-recording feature was removed in b176e5bd View
Product Principle Suggestions

The following suggestion could improve rules/product-principles.md:

  • New principle / appendix on internal architecture: "Add guidance on whether established internal renderer data-flow patterns (e.g. jotai atoms vs. TanStack Query for IPC-backed fetches) are in scope for product-principle review, or are an engineering-consistency concern out of scope for these principles." The current principles cover product/UX only, which left the TanStack-vs-atoms threads without explicit guidance.

🤖 Generated by Claude Code

@dyad-assistant dyad-assistant Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude review: 4 inline finding(s).

const column = String(row?.column_name ?? "");
if (!SAFE_IDENT_RE.test(table) || !SAFE_IDENT_RE.test(column)) {
continue;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM

Misleading comment on SQL injection defense layers

The cleanUpRowsOwnedBy query uses DO $$ BEGIN EXECUTE format('DELETE FROM public.%I WHERE %I = %L', '${table}', '${column}', '${userId}'); END $$;. The code is safe because the JS-interpolated values land as positional string-literal arguments to PostgreSQL format(), which properly escapes them. The SAFE_IDENT_RE regex provides a second layer. However, the comment misleadingly implies format() is the primary guard over the regex, when in reality the JS template literal interpolation happens first. A future maintainer reading this could misunderstand which layer is load-bearing.

💡 Suggestion: Rewrite the comment to accurately describe the defense: the regex validates the shape, and the values then pass through format(%I/%L) as a second layer. Both are required because the JS interpolation happens before Postgres sees the query.

// Kill the whole process tree — with a shell (Windows) or fast package
// managers, the spawned child forks descendants (npx/playwright/chromium)
// that a plain child.kill() would leave orphaned.
if (child.pid) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM

Unbounded stdout/stderr accumulation in memory

Both stdout and stderr are accumulated as in-memory strings with no size cap. A test producing excessive output (e.g., a console.log loop) could grow these unboundedly. The TestsPanel also accumulates runState.output without limit. The test handler only uses the last 1500 chars for error reporting, suggesting the full buffer is not needed.

💡 Suggestion: Cap accumulated output to a reasonable size (e.g., ring-buffer the last 100-200KB) to prevent unbounded memory growth from runaway test output.

})
.catch((err) => {
if (!cancelled) showError(err);
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM

Spec list fetch does not use TanStack Query

The listAppTests IPC call is managed with manual useState/useEffect and cancellation flags instead of TanStack Query, which the repo conventionally uses for IPC-backed data fetching. This loses automatic deduplication, cache invalidation, and error retry that TanStack Query provides. The streaming run itself reasonably lives outside TanStack Query, but the spec discovery query would benefit from the established pattern.

💡 Suggestion: Refactor the listAppTests call to use useQuery (or the repo's IPC query wrapper) for consistency with other panels' data fetching patterns.

* cascade FK are NOT removed — a documented limitation of the free-tier model.
*/
async function cleanUpRowsOwnedBy({
projectId,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM

Silent data leak when test user row cleanup fails

When cleanUpRowsOwnedBy fails for some tables (e.g., FK constraints on tables not matching the OWNER_COLUMNS heuristic), test data silently remains in the user's real database. Since the test user is subsequently deleted, orphaned rows referencing the deleted user's id could cause integrity issues. The user receives no indication that cleanup was incomplete.

💡 Suggestion: Surface a warning in the isolation.reason field when cleanup errors occur, so the user knows some test data may remain in their database.

@dyad-assistant

Copy link
Copy Markdown
Contributor

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

This is a well-designed, security-conscious PR that adds a significant new feature. The IPC boundary is carefully maintained: test file paths are validated against a strict regex before reaching Playwright CLI, screenshot reads resolve symlinks before containment checks, Zod schemas validate all inputs, and the isolation system dead-ends safely rather than running tests against real data. The DB schema changes include the required Drizzle migration and snapshot, and crash recovery is handled via startup reconciliation sweeps. Test coverage is thorough with unit tests for all major utilities.

No HIGH severity issues were found. The MEDIUM items below are worth considering for follow-up but do not block merge.

Issues Summary

Severity File Issue
🟡 MEDIUM src/ipc/utils/supabase_test_user.ts:340 Misleading comment on SQL injection defense layers
🟡 MEDIUM src/ipc/utils/spawn_streaming.ts:83 Unbounded stdout/stderr accumulation in memory
🟡 MEDIUM src/components/preview_panel/TestsPanel.tsx:577 Spec list fetch does not use TanStack Query
🟡 MEDIUM src/ipc/utils/supabase_test_user.ts:306 Silent data leak when test user row cleanup fails
🟢 Low Priority Notes (6 items)
  • Missing newline at end of migration file - drizzle/0034_gray_cerebro.sql ends without a trailing newline. (drizzle/0034_gray_cerebro.sql)
  • parseTestCases skips * lines overzealously - A line like */ test("after comment", ... would be skipped, causing a phantom missing test in the tree UI. Best-effort is documented. (src/ipc/utils/parse_test_cases.ts)
  • ensureTestScript rewrites package.json with 2-space indentation - Projects using different formatting get a noisy git diff. (src/ipc/utils/playwright_bootstrap.ts)
  • EMPTY_TEST_RUN_STATE frozen arrays typed as mutable - The as unknown as cast hides that runningFiles is readonly string[], risking a silent runtime failure if .push() is ever called on it. (src/atoms/testRuntimeAtoms.ts)
  • Unrelated cleanup bundled in PR - Removal of basicAgentMode and PRO_AGENT_ONLY_TOOLS in tool_definitions.ts is a separate concern. (src/pro/main/ipc/handlers/local_agent/tool_definitions.ts)
  • Scaffold .gitignore gets Playwright entries for all new apps - Harmless but adds test-related patterns even when tests are never used. (scaffold/.gitignore)

Detail on MEDIUM findings

Misleading comment on SQL injection defense layers (src/ipc/utils/supabase_test_user.ts:340): The cleanUpRowsOwnedBy query uses DO $$ BEGIN EXECUTE format('DELETE FROM public.%I WHERE %I = %L', '${table}', '${column}', '${userId}'); END $$;. The code is actually safe -- the JS-interpolated values land as positional string-literal arguments to PostgreSQL's format(), which properly escapes them via %I/%L. The SAFE_IDENT_RE regex provides a second layer. However, the comment says "Let Postgres escape the identifiers/value via format(%I, %L) instead of relying on the regex check above as the only guard" -- this is misleading because the JS template literal interpolation happens before format() processes its arguments, making it look like the values bypass format() when they actually don't. A future maintainer who loosens the regex trusting format() to catch edge cases would still be safe here, but the comment should accurately describe the layered defense.

Unbounded stdout/stderr accumulation (src/ipc/utils/spawn_streaming.ts:83): Both stdout and stderr are accumulated in-memory strings with no size cap. A test that produces excessive output (e.g., a console.log loop) could grow these unboundedly. The TestsPanel also accumulates runState.output without limit. The test handler already only uses run.stderr.trim().slice(-1500) for the infra error message, suggesting the full buffer isn't needed. Consider capping to a ring-buffer (e.g., last 100-200KB).

Spec list fetch does not use TanStack Query (src/components/preview_panel/TestsPanel.tsx:577): The listAppTests IPC call is managed with manual useState/useEffect + cancellation flags instead of TanStack Query, which the repo uses elsewhere for IPC-backed data fetching. This loses automatic deduplication, cache invalidation, and error retry. The streaming run itself reasonably lives outside TanStack Query, but the discovery query would benefit from the pattern.

Silent data leak when test user row cleanup fails (src/ipc/utils/supabase_test_user.ts:306): When cleanUpRowsOwnedBy fails for some tables (e.g., FK constraints on tables not matching the OWNER_COLUMNS heuristic), test data silently remains in the user's real database. Since the test user is subsequently deleted, orphaned rows referencing the deleted user's id could cause integrity issues. The user sees no indication that cleanup was incomplete. Consider surfacing a warning in the isolation.reason field when cleanup errors occur.


Generated by Dyadbot persona-based code review

@wwwillchen

Copy link
Copy Markdown
Collaborator

@wwwillchen Quick update :

  • I removed the test record feature from this PR , i'm gonna open another PR for recording tests with a more precise and reliable version.
  • for DB isolation , i went with a test branch for neon and test account for supabase since branches are not possible with the free plan .

I'll try to wrap up this PR in the next couple of days.

got it - thanks for teh update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants