Add AI-generated E2E testing feature by azizmejri1 · Pull Request #3681 · dyad-sh/dyad

azizmejri1 · 2026-06-25T20:33:57Z

Tests panel UI with LLM-generated E2E test generation
Record-a-test: capture preview actions, AI writes the test
Isolated test DB with Neon test branches
Playwright bootstrap, reporting, and parallel/tree test running

- Tests panel UI with LLM-generated E2E test generation - Record-a-test: capture preview actions, AI writes the test - Isolated test DB with Neon test branches - Playwright bootstrap, reporting, and parallel/tree test running Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

wwwillchen · 2026-06-25T20:34:05Z

@BugBot run

gemini-code-assist

Code Review

This pull request introduces an experimental end-to-end testing feature, allowing users to record user flows in the preview iframe, generate Playwright tests via chat, and run them within a new 'Tests' panel. For database-connected Neon apps, test runs are isolated using ephemeral copy-on-write branches. Feedback focuses on improving the robustness of this system: propagating and checking the abort signal during database setup and server polling, only clearing the test branch ID in the database upon successful deletion to avoid orphaned branches, re-activating the test recorder upon iframe navigation/reload, and safely handling empty paths in test path normalization to prevent directory-write crashes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

dyad-assistant

Claude review: 4 inline finding(s).

dyad-assistant · 2026-06-25T20:43:15Z

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

This is a well-structured, large feature PR (~5900 lines across 47 files) that adds AI-generated E2E testing with a Tests panel, Playwright integration, record-a-test flow, and Neon branch isolation for database-connected apps.

What's done well:

Security defense-in-depth: regex-validated test file paths, normalizeTestPath forcing files under tests/, realpathSync + containment checks for screenshots, shell: false on Unix in spawnStreaming, DyadError with proper error kinds for expected failures.
The Neon test branch lifecycle is thoughtfully designed: crash recovery via reconcileOrphanTestBranches on startup, env snapshot/restore teardown, and the "never run against real data" dead-end when isolation fails.
Good test coverage for new utilities: playwright report parsing, test case extraction, tag parsing, neon branch operations, and isolated test DB orchestration.
The infra-vs-assertion error taxonomy (amber vs red) is a smart UX choice that prevents "your app is broken" false alarms from flaky test infrastructure.
Accessibility: aria-label, aria-pressed, aria-live="polite" for the counter, keyboard-focusable run buttons with focus-visible:opacity-100.

Issues Summary

Severity	File	Issue
🟡 MEDIUM	`src/ipc/handlers/tests_handlers.ts:300`	getTestScreenshot containment check fails when app path involves symlinks
🟡 MEDIUM	`src/components/preview_panel/TestsPanel.tsx:226`	FailureDetails shows 'Loading screenshot...' forever when screenshot is unavailable
🟡 MEDIUM	`src/components/chat/DyadMarkdownParser.tsx:472`	dyad-write to DyadGenerateTest redirect catches all .spec.ts files, not just tests
🟡 MEDIUM	`src/components/preview_panel/TestsPanel.tsx:530`	IPC data fetching uses manual state management instead of TanStack Query

🟢 Low Priority Notes (6 items)

aggregateFileResult duplicates aggregateTestResults - The file-level aggregation function in TestsPanel.tsx is essentially the same logic as aggregateTestResults in playwright_report.ts. Consider reusing the existing function. (src/components/preview_panel/TestsPanel.tsx)
getDyadGenerateTestTags is a near-identical clone of getDyadWriteTags - The new tag parser duplicates the regex-based XML parsing, attribute extraction, and code fence stripping. A generic parseDyadTags(response, tagName) helper would reduce duplication. (src/ipc/utils/dyad_tag_parser.ts)
Experimental badge text is not i18n-ized - The "Experimental" badge in the toolbar and Tests panel header is a hardcoded English string rather than using t(...). The locale file gets a preview.tests entry but not the badge. (src/components/preview_panel/PreviewToolbar.tsx)
spawn_streaming.ts accumulates unbounded stdout/stderr - Both output strings grow without limit during long Playwright runs. Unlikely to be a practical issue for typical test suites, but a cap or ring buffer would be safer for edge cases. (src/ipc/utils/spawn_streaming.ts)
Record button relies on color alone for recording state - The button toggles between Circle (idle) and Square (recording) icons, which is good, but the red/white color is the primary differentiator for sighted users. A pulsing animation or "REC" label would improve accessibility. (src/components/preview_panel/PreviewIframe.tsx)
normalizeTestPath passes through input of exactly "tests" - If rawPath is "tests" (no file), it returns "tests" verbatim. This isn't a security issue (downstream validation catches it) but could be tightened. (src/ipc/utils/normalize_test_path.ts)

Generated by Dyadbot persona-based code review

- Thread cancellation signal through isolated test DB setup and waitForServerReady - Only clear neonTestBranchId after a confirmed Neon branch deletion (avoid orphans) - Re-send recorder state on iframe load so recording survives navigation/reload - Fall back to a concrete filename for empty/all-dots normalized test paths - Resolve appPath through realpath in getTestScreenshot (symlink containment) - Show 'Screenshot unavailable' instead of perpetual loading state - Only treat .spec.ts files under tests/ as E2E test cards Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

wwwillchen · 2026-06-27T14:48:45Z

@BugBot run

azizmejri1 · 2026-06-27T14:50:21Z

🤖 Claude Code Review Summary

PR Confidence: 4/5

All review comments were addressed with code changes or a reasoned engineering decision, lint/typecheck pass, and changes are pushed — held at 4 only because the cancellation and isolation paths warrant a manual smoke-test before merge.

Unresolved Threads

Thread	Rationale	Link

No unresolved threads

Resolved Threads

Issue	Rationale	Link
Thread cancellation didn't reach the isolated-DB setup phase (signal not passed; `waitForServerReady` could hang ~120s after Stop)	Threaded `signal` from the handler into `prepareIsolatedTestDatabase` and into `waitForServerReady`, which now throws on abort	View · View · View
Failed Neon branch deletion still cleared `neonTestBranchId`, orphaning branches permanently	`deleteBranchBestEffort` now returns a boolean; the column is cleared only on confirmed deletion, so the startup sweep can retry	View · View
Recorder silently stopped capturing after in-iframe navigation/reload (`active=false` re-injected)	Re-send recorder state on the iframe `load` event, with cleanup	View
Empty/all-dots normalized test path collapsed to `tests/`, causing EISDIR on write	Fall back to `tests/generated.spec.ts` when the sanitized path is empty	View
`getTestScreenshot` containment check failed under ancestor symlinks (e.g. `/var`→`/private/var`), perpetual "Loading…"	Resolve `appPath` through `realpathSync` before `path.relative`	View
`FailureDetails` showed "Loading screenshot…" forever when a screenshot was unavailable	Added a `loaded` flag; renders "Screenshot unavailable" once the fetch settles with no image (Principle #4: Transparent Over Magical, #6: Delightful)	View
`dyad-write` → test-card redirect caught any `.spec.ts`, including unit specs like `src/components/Button.spec.ts`	Tightened condition to require a `tests/` prefix	View
Suggestion to migrate `loadSpecs` from manual state to TanStack Query	Resolved with explanation: test state lives in per-app jotai atoms mirroring `previewRunStateByAppIdAtom`; a partial migration would split state across two systems. Engineering-consistency call (product principles cover product/UX, not renderer data-flow)	View

Product Principle Suggestions

General: "Add a principle (or sub-note) covering internal code-architecture consistency — e.g. 'prefer the established state-management pattern of the surrounding module over introducing a parallel system for marginal gains' — so code-health/refactor suggestions can be adjudicated against an explicit rule rather than ad-hoc engineering judgment."

🤖 Generated by Claude Code

dyad-assistant

Claude review: 5 inline finding(s).

dyad-assistant · 2026-06-27T14:57:29Z

🔍 Dyadbot Code Review Summary

Verdict: ⛔ NO - Do NOT merge
Recommendation: auto-fix

Large, well-structured feature PR that adds AI-generated E2E testing with a thoughtful UX (result taxonomy, database isolation, lazy Playwright bootstrap, record-a-test). The security posture for screenshot reading (symlink resolution + containment check), command injection prevention (array args, no shell on Unix), and path normalization is solid. Two mechanical bugs need fixing before merge.

Issues Summary

Severity	File	Issue
🔴 HIGH	`src/ipc/utils/normalize_test_path.ts:22`	normalizeTestPath("tests") returns bare directory, causing EISDIR crash on write
🔴 HIGH	`src/components/chat/DyadMarkdownParser.tsx:472`	Extension mismatch: only .spec.ts gets the test card, not .spec.tsx/.js/.jsx
🟡 MEDIUM	`src/components/preview_panel/TestsPanel.tsx:569`	TestsPanel uses manual IPC + useState instead of TanStack Query
🟡 MEDIUM	`src/ipc/utils/normalize_test_path.ts:10`	normalizeTestPath does not validate spec file extension on write path
🟡 MEDIUM	`src/components/preview_panel/PreviewIframe.tsx:940`	Recorder action handler does not check event.origin

HIGH #1 — normalizeTestPath("tests") crashes. If the LLM emits <dyad-generate-test path="tests">, the function returns the bare string "tests" (via sanitized === "tests" early return). response_processor.ts then calls fs.writeFileSync on a directory, throwing an unhandled EISDIR error. Fix: fold the bare "tests" case into the empty-path fallback: if (!sanitized || sanitized === "tests") return "tests/generated.spec.ts";.

HIGH #2 — Extension mismatch loses the test card for non-.ts specs. The <dyad-write> → test card routing in DyadMarkdownParser.tsx only checks writePath.endsWith(".spec.ts"), but listAppTests globs *.spec.{ts,tsx,js,jsx} and TEST_FILE_PATTERN matches all four. Agent-mode specs with .spec.tsx/.spec.js/.spec.jsx extensions render as a plain file card with no "View in Tests" link. Fix: const isTestSpec = /\.spec\.(ts|tsx|js|jsx)$/.test(writePath) && ....

🟢 Low Priority Notes (2 items)

Unused get param in write atom - setTestRunStateForAppAtom declares get but never uses it; convention is _get. (src/atoms/testRuntimeAtoms.ts)
Password masking sentinel is ambiguous - The recorder replaces password values with "********". The prompt explains the convention, but a more explicit sentinel like "[MASKED_PASSWORD]" would be less likely to confuse the LLM. (worker/dyad-recorder-client.js)

What's done well:

The infra-vs-assertion error taxonomy (amber/red) is a genuinely useful UX distinction that avoids the "Dyad is broken" false alarm.
Screenshot path validation resolves symlinks before the containment check — correct defense against symlink escapes.
spawnStreaming uses array args (no shell on Unix) and treeKill for clean abort — solid process management.
The Neon branch isolation flow (create → swap env → restart → run → teardown → restore) with crash-recovery reconciliation at startup is thorough.
Comprehensive test coverage for the new utilities (parse_test_cases, playwright_report, neon_test_branch, isolated_test_db, normalize paths, tag parser).
The basicAgentMode / PRO_AGENT_ONLY_TOOLS removal is clean dead-code deletion (the set was empty, the filter was a no-op).

Generated by Dyadbot persona-based code review

wwwillchen · 2026-06-28T20:28:26Z

@BugBot run

dyad-assistant

Claude review: 5 inline finding(s).

dyad-assistant · 2026-06-28T20:39:50Z

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

This is a well-architected, security-conscious feature PR that adds AI-generated E2E testing with appropriate defense-in-depth across all layers. The core security decisions are sound:

Path traversal: Multi-layer protection via TEST_FILE_PATTERN regex, normalizeTestPath, safeJoin, and symlink resolution in the screenshot handler.
SQL injection: UUID_RE validation on userId, SAFE_IDENT_RE on identifiers, hardcoded OWNER_COLUMNS list.
Command injection: Arguments passed as arrays to spawn with shell: false on Unix. Windows shell: true is necessary for .cmd shims and current callers only pass validated/static args.
IPC boundary: All handlers use createTypedHandler with Zod-validated contracts. Screenshot reading validates containment after resolving symlinks. Test file paths are validated against a strict regex before reaching the Playwright CLI.
Error handling: Expected errors use DyadError with appropriate DyadErrorKind. The isolated DB service never runs tests against real data on setup failure.

The DB isolation design (Neon branches + Supabase test users with crash-safe reconciliation) is thoughtful, and the test coverage across 8 test files is solid.

Issues Summary

Severity	File	Issue
🟡 MEDIUM	`src/components/chat/DyadMarkdownParser.tsx:470`	isTestSpec only matches .spec.ts, missing .tsx/.js/.jsx extensions
🟡 MEDIUM	`src/ipc/utils/supabase_test_user.ts:376`	userId not validated as UUID before URL interpolation in deleteUserBestEffort
🟡 MEDIUM	`src/components/preview_panel/TestsPanel.tsx:568`	TestsPanel uses direct IPC calls instead of TanStack Query for data fetching
🟡 MEDIUM	`src/ipc/services/isolated_test_db.ts:131`	Neon teardown restarts the app even if setup failed before any env modification
🟡 MEDIUM	`src/ipc/utils/spawn_streaming.ts:62`	Exported utility uses shell: true on Windows without documenting injection constraints

🟢 Low Priority Notes (4 items)

Optional chain on os.cpus() - parallelWorkerCount() in tests_handlers.ts uses os.cpus()?.length ?? 2 — the optional chain is unnecessary since os.cpus() always returns an array, though it's harmless defensive coding. (src/ipc/handlers/tests_handlers.ts)
postMessage origin - The recorder's postMessage(message, "*") in PreviewIframe.tsx is consistent with every other postMessage call in that file — no change needed, but worth noting the existing pattern uses "*" throughout. (src/components/preview_panel/PreviewIframe.tsx)
Dead code removal - Removing the unused basicAgentMode option and empty PRO_AGENT_ONLY_TOOLS set in tool_definitions.ts is good cleanup, though it's unrelated to the E2E testing feature. (src/pro/main/ipc/handlers/local_agent/tool_definitions.ts)
Prompt duplication in snapshots - The E2E test writing guidance block appears 8 times in the snapshot file (once per agent mode variant) — this is the expected behavior of the snapshot structure, not a bug. (src/prompts/__snapshots__/local_agent_prompt.test.ts.snap)

Generated by Dyadbot persona-based code review

Defer the preview "record a test" sub-feature to a follow-up PR. Removes the in-iframe recorder client and its proxy injection, the recording atoms/types, the recorder UI in PreviewIframe and TestsPanel, and the "Tests from a recorded flow" system prompt section. The AI-generated E2E testing feature and test-runner infrastructure are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge the neon_test_branch_id and supabase_test_user_id column additions into a single migration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

wwwillchen · 2026-06-29T20:42:26Z

@BugBot run

dyad-assistant

Claude review: 2 inline finding(s).

dyad-assistant · 2026-06-29T20:57:21Z

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

This is a well-designed, large feature PR that adds AI-generated E2E testing with good security practices (path traversal protection, CLI flag injection prevention, symlink resolution, SQL injection guards), thorough test coverage (unit tests for all critical utilities and the isolation service), and careful UX design (infra-vs-assertion taxonomy, isolated test data, crash reconciliation). The architecture follows established Dyad patterns for IPC contracts, jotai atoms, and panel components.

Issues Summary

Severity	File	Issue
🟡 MEDIUM	`src/components/chat/DyadMarkdownParser.tsx:472`	isTestSpec check only matches .spec.ts, not .spec.tsx/.js/.jsx
🟡 MEDIUM	`src/ipc/services/isolated_test_db.ts:87`	AbortSignal not forwarded to Supabase isolation path

🟢 Low Priority Notes (6 items)

parallelWorkerCount floor of 2 on 1-2 core machines - The Math.max(2, ...) floor means parallel mode always spawns at least 2 workers even on 1-core machines, defeating the "leave one core free" comment. Consider Math.max(1, ...). (src/ipc/handlers/tests_handlers.ts)
waitForServerReady effective timeout exceeds 120s - Each loop iteration can block 3.5s (500ms sleep + 3s fetch timeout), so the Date.now() < deadline check doesn't guarantee a 120s cap. Unlikely to matter in practice since most probes fail fast. (src/ipc/services/isolated_test_db.ts)
EMPTY_TEST_RUN_STATE double-cast to escape Object.freeze types - The as unknown as TestRunState cast hides a genuine type mismatch from frozen arrays. A factory function returning fresh plain objects would be safer. (src/atoms/testRuntimeAtoms.ts)
askAiToFix sends current run output, not snapshot from the failing run - If the user triggers "Ask AI to fix" from run N's failure while run N+1 is streaming, the output context sent to chat mixes both runs. Minor incoherence since the error text itself is stable. (src/components/preview_panel/TestsPanel.tsx)
Headed mode on Linux without DISPLAY - CI=true is set in the env, but --headed overrides it. On headless Linux (cloud desktops, WSL), the browser launch will fail with an opaque amber error. Consider detecting and warning. (src/ipc/handlers/tests_handlers.ts)
basicAgentMode removal is unrelated cleanup - The removed PRO_AGENT_ONLY_TOOLS set was already empty (new Set<string>()), so the removal has zero behavioral effect. Worth noting in the PR description since it's unrelated to the testing feature. (src/pro/main/ipc/handlers/local_agent/tool_definitions.ts)

Generated by Dyadbot persona-based code review

azizmejri1 · 2026-06-29T21:25:43Z

@wwwillchen Quick update :

I removed the test record feature from this PR , i'm gonna open another PR for recording tests with a more precise and reliable version.
for DB isolation , i went with a test branch for neon and test account for supabase since branches are not possible with the free plan .

I'll try to wrap up this PR in the next couple of days.

wwwillchen · 2026-06-30T19:26:05Z

@BugBot run

dyad-assistant

Claude review: 4 inline finding(s).

dyad-assistant · 2026-06-30T19:36:20Z

🔍 Dyadbot Code Review Summary

Verdict: 🤔 NOT SURE - Potential issues
Recommendation: auto-fix

This is a substantial, well-structured feature addition (~6.5k lines across 44 files) that adds AI-generated E2E testing with Playwright, a new Tests panel, isolated test databases (Neon branches + Supabase test users), and orphan reconciliation on startup.

Security posture is strong: the main/renderer boundary is preserved throughout — service-role keys stay in the main process, shell injection is blocked at multiple layers (spawn with array args, no shell on Unix, TEST_FILE_PATTERN regex before CLI args), path traversal is defended in depth (normalizeTestPath strips .., safeJoin for disk writes, symlink-aware containment check for screenshots), and the data isolation design dead-ends safely (never falls back to running tests against real data).

The code quality is high — proper DyadError usage with appropriate DyadErrorKinds, comprehensive test coverage (7 test files), well-documented schema changes with a matching Drizzle migration, and the UI covers all required states (empty, loading, setup, running, results, dev-server-not-running, error).

One HIGH-severity race condition and several MEDIUM issues should be addressed before shipping.

Issues Summary

Severity	File	Issue
🔴 HIGH	`src/ipc/handlers/tests_handlers.ts:378`	Concurrent test runs race on env-swap can cause tests to run against real data
🟡 MEDIUM	`src/ipc/utils/supabase_test_user.ts:310`	SQL identifier interpolation from external API result guarded only by regex
🟡 MEDIUM	`src/components/preview_panel/TestsPanel.tsx:599`	Streaming watcher effect leaks cancellation cleanup on fast app-switch
🟡 MEDIUM	`src/components/chat/DyadGenerateTest.tsx:86`	View in Tests button shown for aborted (unwritten) test files

🟢 Low Priority Notes (5 items)

Unbounded output accumulation - The output string in TestRunState grows without bound as bootstrap + test runner output is appended. For verbose installs or large test suites this can accumulate megabytes in the renderer. Consider capping at ~512 KB with .slice(-512_000). (src/atoms/testRuntimeAtoms.ts)
Server-ready poller is aggressive - waitForServerReady polls at 500 ms with a 3-second fetch timeout, generating up to 240 HTTP requests over 2 minutes. Consider exponential backoff or a longer interval (1-2 s), since cold dev server starts rarely complete in under 5 s. (src/ipc/services/isolated_test_db.ts)
Multi-line test titles not parsed - parseTestCases operates line-by-line and misses test("title", split across lines. Generated tests use single-line titles, but user-edited tests may not. The file can still be run as a whole; only per-test granularity is lost. (src/ipc/utils/parse_test_cases.ts)
Stale testLine produces unhelpful error - A stale line number (after the user edits a test file) causes Playwright to run zero tests, surfacing as a generic "test runner didn't produce a report" infra error rather than a specific "no test found at line N" message. (src/ipc/handlers/tests_handlers.ts)
TanStack Query not used for IPC data fetching - TestsPanel fetches spec lists via raw ipc.tests.listAppTests() in a useCallback instead of using TanStack Query patterns present elsewhere in the repo. This is acceptable for ephemeral data, but diverges from the established pattern. (src/components/preview_panel/TestsPanel.tsx)

Generated by Dyadbot persona-based code review

- normalize_test_path: fall back to a default spec filename for bare `tests`, non-spec, or empty paths to avoid EISDIR / non-test writes - DyadMarkdownParser: route all .spec.{ts,tsx,js,jsx} files to the test card - supabase_test_user: validate userId is a UUID in deleteUserBestEffort; use Postgres format(%I, %L) for owner-row cleanup - isolated_test_db: skip env restore/restart in teardown when the env was never swapped; honor the abort signal in the Supabase isolation path - spawn_streaming: document the Windows shell:true injection constraint - tests_handlers: await the prior run's full teardown before starting a new run to prevent an env-swap race against real data - DyadGenerateTest: hide "View in Tests" deep-link for aborted (unwritten) tests - TestsPanel: return cancellation cleanup from the streaming-watcher reload Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

wwwillchen · 2026-06-30T21:35:01Z

@BugBot run

azizmejri1 · 2026-06-30T21:38:20Z

🤖 Claude Code Review Summary

PR Confidence: 4/5

All 16 outstanding review threads were addressed with code changes or resolved with documented reasoning; checks (fmt/lint/typecheck) pass locally, but the concurrency-race and DB-isolation fixes touch runtime paths best confirmed by a manual test run before merge.

Unresolved Threads

No unresolved threads

Resolved Threads

Issue	Rationale	Link
`normalizeTestPath` returns a bare/dir/non-spec path → EISDIR or non-test write	Falls back to `tests/generated.spec.ts` unless the path is a real `.spec.{ts,tsx,js,jsx}` file	View
Test-card routing only matched `.spec.ts` (3 duplicate threads)	Broadened `isTestSpec` to `/\.spec\.(ts\|tsx\|js\|jsx)$/` under `tests/`	View
`deleteUserBestEffort` interpolated `userId` into URL without UUID check	Added `UUID_RE` guard, matching `cleanUpRowsOwnedBy`	View
Owner-row cleanup relied on regex as the only SQL guard	Now uses Postgres `format(%I, %L)` so the DB escapes identifiers/value	View
Neon teardown restarted the dev server even when no env swap occurred	Gated env restore + restart on an `envModified` flag	View
Abort signal not forwarded to the Supabase isolation path	Threaded `signal` in and check `aborted` before user creation	View
`spawnStreaming` `shell:true` on Windows undocumented for future callers	Added a SECURITY JSDoc note on validating args	View
Stop-then-Run race could run tests against real data	New run awaits the prior run's full prepare→run→teardown via a per-app `done` promise	View
"View in Tests" deep-link shown for aborted (unwritten) tests	Hidden when `aborted`	View
Streaming-watcher reload leaked its cancellation cleanup	Effect now returns the `loadSpecs` cleanup	View
Panel uses manual IPC + `useState` instead of TanStack Query (2 threads)	Resolved on engineering-consistency grounds: panel state mirrors the per-app jotai `previewRunStateByAppIdAtom` pattern; run state is push-driven and doesn't model cleanly as a query	View
Recorder action handler missing `event.origin` check	Obsolete — the test-recording feature was removed in `b176e5bd`	View

Product Principle Suggestions

The following suggestion could improve rules/product-principles.md:

New principle / appendix on internal architecture: "Add guidance on whether established internal renderer data-flow patterns (e.g. jotai atoms vs. TanStack Query for IPC-backed fetches) are in scope for product-principle review, or are an engineering-consistency concern out of scope for these principles." The current principles cover product/UX only, which left the TanStack-vs-atoms threads without explicit guidance.

🤖 Generated by Claude Code

dyad-assistant

Claude review: 4 inline finding(s).

dyad-assistant · 2026-06-30T21:44:27Z

+      const column = String(row?.column_name ?? "");
+      if (!SAFE_IDENT_RE.test(table) || !SAFE_IDENT_RE.test(column)) {
+        continue;
+      }


🟡 MEDIUM

Misleading comment on SQL injection defense layers

The cleanUpRowsOwnedBy query uses DO $$ BEGIN EXECUTE format('DELETE FROM public.%I WHERE %I = %L', '${table}', '${column}', '${userId}'); END $$;. The code is safe because the JS-interpolated values land as positional string-literal arguments to PostgreSQL format(), which properly escapes them. The SAFE_IDENT_RE regex provides a second layer. However, the comment misleadingly implies format() is the primary guard over the regex, when in reality the JS template literal interpolation happens first. A future maintainer reading this could misunderstand which layer is load-bearing.

💡 Suggestion: Rewrite the comment to accurately describe the defense: the regex validates the shape, and the values then pass through format(%I/%L) as a second layer. Both are required because the JS interpolation happens before Postgres sees the query.

dyad-assistant · 2026-06-30T21:44:27Z

+      // Kill the whole process tree — with a shell (Windows) or fast package
+      // managers, the spawned child forks descendants (npx/playwright/chromium)
+      // that a plain child.kill() would leave orphaned.
+      if (child.pid) {


🟡 MEDIUM

Unbounded stdout/stderr accumulation in memory

Both stdout and stderr are accumulated as in-memory strings with no size cap. A test producing excessive output (e.g., a console.log loop) could grow these unboundedly. The TestsPanel also accumulates runState.output without limit. The test handler only uses the last 1500 chars for error reporting, suggesting the full buffer is not needed.

💡 Suggestion: Cap accumulated output to a reasonable size (e.g., ring-buffer the last 100-200KB) to prevent unbounded memory growth from runaway test output.

dyad-assistant · 2026-06-30T21:44:27Z

+        })
+        .catch((err) => {
+          if (!cancelled) showError(err);
+        })


🟡 MEDIUM

Spec list fetch does not use TanStack Query

The listAppTests IPC call is managed with manual useState/useEffect and cancellation flags instead of TanStack Query, which the repo conventionally uses for IPC-backed data fetching. This loses automatic deduplication, cache invalidation, and error retry that TanStack Query provides. The streaming run itself reasonably lives outside TanStack Query, but the spec discovery query would benefit from the established pattern.

💡 Suggestion: Refactor the listAppTests call to use useQuery (or the repo's IPC query wrapper) for consistency with other panels' data fetching patterns.

dyad-assistant · 2026-06-30T21:44:27Z

+ * cascade FK are NOT removed — a documented limitation of the free-tier model.
+ */
+async function cleanUpRowsOwnedBy({
+  projectId,


🟡 MEDIUM

Silent data leak when test user row cleanup fails

When cleanUpRowsOwnedBy fails for some tables (e.g., FK constraints on tables not matching the OWNER_COLUMNS heuristic), test data silently remains in the user's real database. Since the test user is subsequently deleted, orphaned rows referencing the deleted user's id could cause integrity issues. The user receives no indication that cleanup was incomplete.

💡 Suggestion: Surface a warning in the isolation.reason field when cleanup errors occur, so the user knows some test data may remain in their database.

dyad-assistant · 2026-06-30T21:44:29Z

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge
Recommendation: ready

This is a well-designed, security-conscious PR that adds a significant new feature. The IPC boundary is carefully maintained: test file paths are validated against a strict regex before reaching Playwright CLI, screenshot reads resolve symlinks before containment checks, Zod schemas validate all inputs, and the isolation system dead-ends safely rather than running tests against real data. The DB schema changes include the required Drizzle migration and snapshot, and crash recovery is handled via startup reconciliation sweeps. Test coverage is thorough with unit tests for all major utilities.

No HIGH severity issues were found. The MEDIUM items below are worth considering for follow-up but do not block merge.

Issues Summary

Severity	File	Issue
🟡 MEDIUM	`src/ipc/utils/supabase_test_user.ts:340`	Misleading comment on SQL injection defense layers
🟡 MEDIUM	`src/ipc/utils/spawn_streaming.ts:83`	Unbounded stdout/stderr accumulation in memory
🟡 MEDIUM	`src/components/preview_panel/TestsPanel.tsx:577`	Spec list fetch does not use TanStack Query
🟡 MEDIUM	`src/ipc/utils/supabase_test_user.ts:306`	Silent data leak when test user row cleanup fails

🟢 Low Priority Notes (6 items)

Missing newline at end of migration file - drizzle/0034_gray_cerebro.sql ends without a trailing newline. (drizzle/0034_gray_cerebro.sql)
parseTestCases skips * lines overzealously - A line like */ test("after comment", ... would be skipped, causing a phantom missing test in the tree UI. Best-effort is documented. (src/ipc/utils/parse_test_cases.ts)
ensureTestScript rewrites package.json with 2-space indentation - Projects using different formatting get a noisy git diff. (src/ipc/utils/playwright_bootstrap.ts)
EMPTY_TEST_RUN_STATE frozen arrays typed as mutable - The as unknown as cast hides that runningFiles is readonly string[], risking a silent runtime failure if .push() is ever called on it. (src/atoms/testRuntimeAtoms.ts)
Unrelated cleanup bundled in PR - Removal of basicAgentMode and PRO_AGENT_ONLY_TOOLS in tool_definitions.ts is a separate concern. (src/pro/main/ipc/handlers/local_agent/tool_definitions.ts)
Scaffold .gitignore gets Playwright entries for all new apps - Harmless but adds test-related patterns even when tests are never used. (scaffold/.gitignore)

Detail on MEDIUM findings

Misleading comment on SQL injection defense layers (src/ipc/utils/supabase_test_user.ts:340): The cleanUpRowsOwnedBy query uses DO $$ BEGIN EXECUTE format('DELETE FROM public.%I WHERE %I = %L', '${table}', '${column}', '${userId}'); END $$;. The code is actually safe -- the JS-interpolated values land as positional string-literal arguments to PostgreSQL's format(), which properly escapes them via %I/%L. The SAFE_IDENT_RE regex provides a second layer. However, the comment says "Let Postgres escape the identifiers/value via format(%I, %L) instead of relying on the regex check above as the only guard" -- this is misleading because the JS template literal interpolation happens before format() processes its arguments, making it look like the values bypass format() when they actually don't. A future maintainer who loosens the regex trusting format() to catch edge cases would still be safe here, but the comment should accurately describe the layered defense.

Unbounded stdout/stderr accumulation (src/ipc/utils/spawn_streaming.ts:83): Both stdout and stderr are accumulated in-memory strings with no size cap. A test that produces excessive output (e.g., a console.log loop) could grow these unboundedly. The TestsPanel also accumulates runState.output without limit. The test handler already only uses run.stderr.trim().slice(-1500) for the infra error message, suggesting the full buffer isn't needed. Consider capping to a ring-buffer (e.g., last 100-200KB).

Spec list fetch does not use TanStack Query (src/components/preview_panel/TestsPanel.tsx:577): The listAppTests IPC call is managed with manual useState/useEffect + cancellation flags instead of TanStack Query, which the repo uses elsewhere for IPC-backed data fetching. This loses automatic deduplication, cache invalidation, and error retry. The streaming run itself reasonably lives outside TanStack Query, but the discovery query would benefit from the pattern.

Silent data leak when test user row cleanup fails (src/ipc/utils/supabase_test_user.ts:306): When cleanUpRowsOwnedBy fails for some tables (e.g., FK constraints on tables not matching the OWNER_COLUMNS heuristic), test data silently remains in the user's real database. Since the test user is subsequently deleted, orphaned rows referencing the deleted user's id could cause integrity issues. The user sees no indication that cleanup was incomplete. Consider surfacing a warning in the isolation.reason field when cleanup errors occur.

Generated by Dyadbot persona-based code review

wwwillchen · 2026-07-01T04:38:25Z

@wwwillchen Quick update :

I removed the test record feature from this PR , i'm gonna open another PR for recording tests with a more precise and reliable version.

for DB isolation , i went with a test branch for neon and test account for supabase since branches are not possible with the free plan .

I'll try to wrap up this PR in the next couple of days.

got it - thanks for teh update.

azizmejri1 temporarily deployed to ai-bots June 25, 2026 20:34 — with GitHub Actions Inactive

azizmejri1 had a problem deploying to ai-bots June 25, 2026 20:34 — with GitHub Actions Failure

azizmejri1 temporarily deployed to ai-bots June 25, 2026 20:34 — with GitHub Actions Inactive

gemini-code-assist Bot reviewed Jun 25, 2026

View reviewed changes

azizmejri1 temporarily deployed to ai-bots June 25, 2026 20:43 — with GitHub Actions Inactive

dyad-assistant Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread src/ipc/handlers/tests_handlers.ts Outdated

Comment thread src/components/preview_panel/TestsPanel.tsx

Comment thread src/components/chat/DyadMarkdownParser.tsx

Comment thread src/components/preview_panel/TestsPanel.tsx

azizmejri1 temporarily deployed to ai-bots June 27, 2026 14:48 — with GitHub Actions Inactive

azizmejri1 had a problem deploying to ai-bots June 27, 2026 14:48 — with GitHub Actions Failure

azizmejri1 temporarily deployed to ai-bots June 27, 2026 14:57 — with GitHub Actions Inactive

dyad-assistant Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread src/ipc/utils/normalize_test_path.ts Outdated

Comment thread src/components/chat/DyadMarkdownParser.tsx

Comment thread src/components/preview_panel/TestsPanel.tsx

Comment thread src/ipc/utils/normalize_test_path.ts

Comment thread src/components/preview_panel/PreviewIframe.tsx

Test account for supabase

5ef4b1c

azizmejri1 temporarily deployed to ai-bots June 28, 2026 20:28 — with GitHub Actions Inactive

azizmejri1 had a problem deploying to ai-bots June 28, 2026 20:28 — with GitHub Actions Failure

azizmejri1 temporarily deployed to ai-bots June 28, 2026 20:28 — with GitHub Actions Inactive

azizmejri1 temporarily deployed to ai-bots June 28, 2026 20:39 — with GitHub Actions Inactive

dyad-assistant Bot reviewed Jun 28, 2026

View reviewed changes

Comment thread src/components/chat/DyadMarkdownParser.tsx

Comment thread src/ipc/utils/supabase_test_user.ts

Comment thread src/components/preview_panel/TestsPanel.tsx

Comment thread src/ipc/services/isolated_test_db.ts

Comment thread src/ipc/utils/spawn_streaming.ts

azizmejri1 and others added 2 commits June 29, 2026 21:30

Consolidate test branch migrations into one

dd9beeb

Merge the neon_test_branch_id and supabase_test_user_id column additions into a single migration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

azizmejri1 temporarily deployed to ai-bots June 29, 2026 20:42 — with GitHub Actions Inactive

azizmejri1 had a problem deploying to ai-bots June 29, 2026 20:42 — with GitHub Actions Failure

azizmejri1 temporarily deployed to ai-bots June 29, 2026 20:42 — with GitHub Actions Inactive

azizmejri1 temporarily deployed to ai-bots June 29, 2026 20:57 — with GitHub Actions Inactive

dyad-assistant Bot reviewed Jun 29, 2026

View reviewed changes

Comment thread src/components/chat/DyadMarkdownParser.tsx

Comment thread src/ipc/services/isolated_test_db.ts

improvements

1409183

azizmejri1 temporarily deployed to ai-bots June 30, 2026 19:26 — with GitHub Actions Inactive

azizmejri1 had a problem deploying to ai-bots June 30, 2026 19:26 — with GitHub Actions Failure

azizmejri1 temporarily deployed to ai-bots June 30, 2026 19:26 — with GitHub Actions Inactive

azizmejri1 temporarily deployed to ai-bots June 30, 2026 19:36 — with GitHub Actions Inactive

dyad-assistant Bot reviewed Jun 30, 2026

View reviewed changes

Comment thread src/ipc/handlers/tests_handlers.ts

Comment thread src/ipc/utils/supabase_test_user.ts

Comment thread src/components/preview_panel/TestsPanel.tsx Outdated

Comment thread src/components/chat/DyadGenerateTest.tsx

azizmejri1 temporarily deployed to ai-bots June 30, 2026 21:34 — with GitHub Actions Inactive

azizmejri1 had a problem deploying to ai-bots June 30, 2026 21:34 — with GitHub Actions Failure

azizmejri1 temporarily deployed to ai-bots June 30, 2026 21:44 — with GitHub Actions Inactive

dyad-assistant Bot reviewed Jun 30, 2026

View reviewed changes

Uh oh!

Conversation

azizmejri1 commented Jun 25, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wwwillchen commented Jun 25, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dyad-assistant Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dyad-assistant Bot commented Jun 25, 2026

🔍 Dyadbot Code Review Summary

Issues Summary

Uh oh!

wwwillchen commented Jun 27, 2026

Uh oh!

azizmejri1 commented Jun 27, 2026

🤖 Claude Code Review Summary

PR Confidence: 4/5

Unresolved Threads

Resolved Threads

Uh oh!

dyad-assistant Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dyad-assistant Bot commented Jun 27, 2026

🔍 Dyadbot Code Review Summary

Issues Summary

Uh oh!

wwwillchen commented Jun 28, 2026

Uh oh!

dyad-assistant Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dyad-assistant Bot commented Jun 28, 2026

🔍 Dyadbot Code Review Summary

Issues Summary

Uh oh!

wwwillchen commented Jun 29, 2026

Uh oh!

dyad-assistant Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

dyad-assistant Bot commented Jun 29, 2026

🔍 Dyadbot Code Review Summary

Issues Summary

Uh oh!

azizmejri1 commented Jun 29, 2026

Uh oh!

wwwillchen commented Jun 30, 2026

Uh oh!

dyad-assistant Bot left a comment

Choose a reason for hiding this comment

azizmejri1 commented Jun 25, 2026 •

edited by cubic-dev-ai Bot

Loading