[Feature]: Skill and tooling improvements from a Claude Code (Opus 4.7) session building a TodoApp

## What problem are you trying to solve?

### Context for this feedback

I'm validating the `win-dev-skills` agent and skills with **Claude Code (Opus 4.7, 1M context)** rather than GitHub Copilot, to see how well they generalize across agentic harnesses. The skills work — the final app builds, launches, and is functionally correct — but the session also exposed several rough edges that are worth fixing while they're fresh. This issue is a single tracking item for those findings; happy to split into separate issues during triage.

### Repro session

- **Harness:** Claude Code, model `claude-opus-4-7` (Opus 4.7, 1M context window)
- **Plugin/skills version:** `winui` plugin v0.2.3 (installed via `/plugin`, then `/reload-plugins`)
- **Invocation:** `@agent-winui:winui-dev Can you create a simple todo app`
- **Skills exercised:** `winui-setup`, `winui-design`, `winui-dev-workflow`, `winui-ui-testing`
- **Outcome:** working WinUI 3 app at ~150 LoC across `Models/`, `ViewModels/`, `Services/`, with MVVM via `CommunityToolkit.Mvvm`, `x:Bind`, `ThemeResource`, JSON persistence under LocalAppData. App launches cleanly, all features work, persistence verified across restart.
- **Cost:** ~62 min wall time, ~112 tool uses across two agent invocations (the first invocation timed out mid-build and had to be resumed manually).

The cost-to-output ratio is what motivated this review — an LLM-driven scaffold of a todo app should not need 112 tool uses, and identifying *why* it did points directly at fixable issues in the skills and tooling.

## Proposed solution

### 1. Lint/analyzer rule for the `DataContext.X` binding footgun

The agent generated this in `MainPage.xaml`:

```xml
<Button Command="{Binding DataContext.DeleteTodoCommand, ElementName=TodoListView}" .../>
```

…but the page only exposes a `ViewModel` property for `x:Bind` and never assigns `DataContext`. The binding silently failed — clicking Delete did nothing. The XAML compiler doesn't catch this, and it only surfaced because the agent ran UI automation afterwards. **Without UI testing this would have shipped silently broken.** This is the most concrete and important item in the issue: it's a pattern the design skill or its templates could plausibly emit again, and there is no compiler-level guardrail today.

**Proposal:** add a Roslyn analyzer rule (or, as a stopgap, a `winui-code-review` lint check) that flags `{Binding DataContext.X, ElementName=Y}` when the named element's parent page/control never assigns `DataContext` in code-behind or XAML. Suggested fix-it: switch to a `Click` handler with `Tag="{x:Bind <item>}"`, which is the pattern the agent ultimately landed on after the smoke test caught it.
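
For the stopgap variant, the check can be approximated with a plain text scan. The sketch below is Python, not the Roslyn analyzer itself, and everything in it (`find_dead_datacontext_bindings`, the regexes, the file-level heuristic) is a hypothetical illustration rather than existing `winui-code-review` behavior. It assumes the XAML and its code-behind are available as strings, and it treats any `DataContext` assignment anywhere in either file as sufficient, which is coarser than the per-element analysis a real rule would want.

```python
import re

# Matches {Binding DataContext.<Member>, ElementName=<Name>} in an attribute value.
BINDING_RE = re.compile(
    r"\{Binding\s+(?:Path=)?DataContext\.(?P<member>[\w.]+)"
    r"\s*,\s*ElementName=(?P<element>\w+)[^}]*\}"
)
# Heuristics for "DataContext is assigned somewhere":
#   XAML: a DataContext="..." attribute or a <Page.DataContext> property element.
#   C#:   an assignment such as `DataContext = ViewModel;` (but not `==`).
XAML_ASSIGN_RE = re.compile(r'\bDataContext\s*=\s*"|<[\w:]+\.DataContext\b')
CODE_ASSIGN_RE = re.compile(r"\bDataContext\s*=(?!=)")


def find_dead_datacontext_bindings(
    xaml: str, code_behind: str = ""
) -> list[tuple[int, str, str]]:
    """Return (line_number, member, element_name) for each suspicious binding.

    An empty list means either no such binding exists or DataContext is
    assigned somewhere, in which case the binding is assumed to resolve.
    """
    if XAML_ASSIGN_RE.search(xaml) or CODE_ASSIGN_RE.search(code_behind):
        return []
    findings = []
    for line_no, line in enumerate(xaml.splitlines(), start=1):
        for match in BINDING_RE.finditer(line):
            findings.append((line_no, match.group("member"), match.group("element")))
    return findings


if __name__ == "__main__":
    sample = """<Page x:Class="TodoApp.MainPage">
  <ListView x:Name="TodoListView">
    <Button Command="{Binding DataContext.DeleteTodoCommand, ElementName=TodoListView}"/>
  </ListView>
</Page>"""
    for line_no, member, element in find_dead_datacontext_bindings(sample):
        print(f"line {line_no}: DataContext.{member} via ElementName={element}, "
              "but DataContext is never assigned")
```

Because the scan works line by line, a binding expression split across lines would be missed; a real rule would parse the XAML instead of matching text.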

### 2. `winui-setup` should be fallback-only, not preconditioned

The skill description says it's for "after a Windows reset, or when another winui skill reports a missing prerequisite" — i.e., a fallback. But when the parent prompt mentioned "use winui-setup first if prerequisites aren't ready", the agent ran setup checks pre-emptively because that phrasing matched *what the skill does*, not *when it should run*. On a machine with a working toolchain this is wasted turns.

**Proposal:** either (a) strengthen the description with an explicit "do NOT run speculatively; only invoke when another skill reports a specific missing-prerequisite error string," or (b) add a fast-path entry point in `winui-dev-workflow` that just attempts the build and dispatches to setup only on the specific error patterns it can recognize. Option (a) is cheaper and likely sufficient.
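
To make option (b) concrete, here is a minimal Python sketch of the dispatch logic: build first, and route to setup only when the output matches a known missing-prerequisite pattern. The function names and the two patterns are illustrative assumptions; the real list would have to come from the error strings `winui-setup` can actually repair, and the build command is left to the caller because this issue does not specify it.

```python
import re
import subprocess
from typing import Sequence

# Illustrative patterns only (assumption): the authoritative list should be
# the specific errors that winui-setup knows how to fix.
MISSING_PREREQ_PATTERNS = [
    re.compile(r"No \.NET SDKs were found", re.IGNORECASE),
    re.compile(r"\bNETSDK1045\b"),
]


def classify_build(exit_code: int, output: str) -> str:
    """Classify one build attempt as 'ok', 'needs_setup', or 'build_error'."""
    if exit_code == 0:
        return "ok"
    if any(pattern.search(output) for pattern in MISSING_PREREQ_PATTERNS):
        return "needs_setup"
    # Any other failure is an ordinary build error: fix the code, skip setup.
    return "build_error"


def try_build(command: Sequence[str]) -> str:
    """Run the caller-supplied build command and classify its result."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return classify_build(proc.returncode, proc.stdout + proc.stderr)
```

With this shape, a machine with a working toolchain never touches setup at all, and an ordinary compile error is not mistaken for a missing prerequisite.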

### 3. `winapp run` exit-code semantics

`winapp run --debug-output` exited with **255 when the app was killed** by the agent at the end of the smoke test. That return code is indistinguishable from a real failure without parsing stderr — the agent had to write extra logic to second-guess whether its own session had actually succeeded.

**Proposal:** distinguish between outcomes with separate exit codes (or, better, a structured JSON status line on stdout that agents can parse without ambiguity):

- `0` — app exited cleanly on its own
- `1` — build failed
- `2` — launch failed (built ok, but did not start)
- `130` (or similar dedicated code) — app killed by signal/Ctrl-C
- a dedicated code or status for "process still running, output detached"
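
As a rough illustration of the structured-status alternative, the Python sketch below maps the codes listed above to a JSON status line and shows how an agent could decide whether its own session succeeded. The JSON field names (`tool`, `outcome`, `exit_code`) and both helper functions are assumptions made for this example, not an existing `winapp` format. The detached case is left out because the list above does not assign it a code.

```python
import json
from enum import IntEnum


class RunExit(IntEnum):
    """Exit codes as proposed in the list above."""
    CLEAN_EXIT = 0
    BUILD_FAILED = 1
    LAUNCH_FAILED = 2
    KILLED = 130


def status_line(outcome: RunExit, **details) -> str:
    """Render the single JSON line the tool would print on stdout."""
    payload = {
        "tool": "winapp",
        "outcome": outcome.name.lower(),
        "exit_code": int(outcome),
        **details,
    }
    return json.dumps(payload, sort_keys=True)


def session_succeeded(line: str, agent_sent_kill: bool) -> bool:
    """Agent-side check: a kill counts as success only if the agent asked for it."""
    status = json.loads(line)
    if status["outcome"] == "clean_exit":
        return True
    return status["outcome"] == "killed" and agent_sent_kill


if __name__ == "__main__":
    line = status_line(RunExit.KILLED, pid=1234)
    print(line)
    print(session_succeeded(line, agent_sent_kill=True))
```

The point of the second function is that the agent no longer has to parse stderr: an expected kill at the end of a smoke test and a genuine failure produce different, machine-readable outcomes.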

### 4. `winui-ui-testing` scales tests to template, not feature surface

The skill generated **19 smoke-test cases for a 4-feature todo app** (add, check, delete, persist). Most cases were template-driven assertions (e.g., element-existence checks for every named control) rather than feature-driven validations of the user-visible behavior.

My prompt asked for a "smoke test", which partly induced this. Even so, a skill named `winui-ui-testing` should arguably produce a small *smoke* suite by default and require an explicit flag for the exhaustive batch.

**Proposal:** scale the generated test count to the declared feature surface — e.g., 1–3 tests per feature by default, with explicit opt-in for exhaustive accessibility/element audits. A `--scope=smoke|full` flag would cover this cleanly.

### 5. Cold-build first-run experience needs progress milestones

The very first build (NuGet restore + WindowsAppSDK + .NET 10 SDK pull) is a multi-minute black box. The first agent invocation timed out mid-build because it couldn't distinguish "this is normal, still working" from "this is hung." It had to be resumed by spawning a second agent that picked up where the first left off — half the wall-time cost of the session is attributable to this one issue.

**Proposal:** have `BuildAndRun.ps1` print explicit milestone lines that agents can grep for and treat as keep-alive signals, e.g.:

```
[winapp] RESTORE_START
[winapp] RESTORE_COMPLETE  duration=187s
[winapp] BUILD_START
[winapp] BUILD_COMPLETE    duration=42s  artifact=...\TodoApp.exe
[winapp] LAUNCH_COMPLETE   pid=1234
```

Bonus: document expected cold-start timing in the skill so agents can pick a single appropriate sleep instead of polling. This mostly matters for the first run on a clean machine; on warm machines the existing flow is fine.
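
On the consuming side, an agent harness could treat the milestone lines proposed above as keep-alive signals with very little code. The Python sketch below assumes the exact `[winapp] EVENT key=value` shape from the example output; `parse_milestone` and `last_event` are hypothetical helper names, and values containing spaces (for instance an unquoted path) would need a stricter format than this parser handles.

```python
import re
from typing import Iterable, Optional

# "[winapp] EVENT_NAME" followed by optional key=value fields.
MILESTONE_RE = re.compile(r"^\[winapp\]\s+(?P<event>[A-Z_]+)(?P<rest>.*)$")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_milestone(line: str) -> Optional[dict]:
    """Parse one output line; return None for ordinary build noise."""
    match = MILESTONE_RE.match(line.strip())
    if match is None:
        return None
    fields = dict(FIELD_RE.findall(match.group("rest")))
    return {"event": match.group("event"), **fields}


def last_event(lines: Iterable[str]) -> Optional[str]:
    """Return the most recent milestone name seen in the output so far."""
    event = None
    for line in lines:
        parsed = parse_milestone(line)
        if parsed is not None:
            event = parsed["event"]
    return event


if __name__ == "__main__":
    output = [
        "[winapp] RESTORE_START",
        "  Determining projects to restore...",
        "[winapp] RESTORE_COMPLETE  duration=187s",
        "[winapp] BUILD_START",
    ]
    # A harness would reset its timeout whenever this value changes.
    print(last_event(output))
```

A harness that sees `RESTORE_START` without a following `RESTORE_COMPLETE` knows the build is in a slow but expected phase, which is the distinction the first agent invocation could not make.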

## Alternatives considered

- **Filing each item as a separate issue.** I'd suggest splitting once you triage: items 1, 3, and 4 are fairly independent and could each be a small PR; items 2 and 5 are documentation/script tweaks. I kept them together so the connecting context (a real session, with the same root causes recurring) isn't lost.
- **Adding everything to `winui-code-review` instead of a Roslyn analyzer.** For item 1 specifically, a code-review skill catches the problem late (after the file is written and possibly already running). A Roslyn analyzer surfaces it at build time, which agents notice immediately and fix on the next iteration. The analyzer is the better long-term home; the code-review skill could be a short-term stopgap.

## Additional context

- This was a Claude Code session, not Copilot, so the existing `winui-session-report` analyzer (which expects Copilot session events) didn't apply directly — observations here are from reviewing the agent's tool transcripts and final artifacts manually.
- Some of the inefficiency in the session was caused by the parent prompt (asking for "use `winui-setup` first" and asking for a "smoke test"). That's a prompting issue on my end, but it points to a robustness question: skill descriptions are strong enough most of the time but get overridden by parent prompts. The stronger framing proposed in item 2 would help.
- Happy to contribute PRs for any of these, particularly items 1, 3, and 5, if there's interest.

