feat(secrets): vendor rules, scan UX, candidate visibility by wesm · Pull Request #548 · kenn-io/agentsview

wesm · 2026-05-26T18:06:13Z

Summary

Adds six well-anchored definite-tier rules to the secret scanner with placeholder validators following the existing rigor:

openai-key matches sk-proj-, sk-svcacct-, and sk-admin- prefixes. A maskOpenAIKey helper preserves the matched prefix length when redacting.
gitlab-pat matches glpat- personal/project access tokens.
npm-token matches the post-2021 npm_ + 36 base62 format (legacy classic-hex tokens are deliberately not matched).
pypi-token anchors on the literal pypi-AgEIcHlwaS5vcmcC macaroon-header bytes and enforces PyPI's documented 85-char body minimum (69-char variable tail).
huggingface-token matches hf_ + base62 body.
sendgrid-key matches the two-segment SG.<22>.<43> format, validating each segment.

Every new validator chains notTrailingRunRepeat(_, 4) before bodyLooksRandom / bodyHasNoPlaceholderShape so random-looking bodies with trailing AAAA placeholder shapes don't slip through.

Fixes the high-entropy-assignment rule so trailing base64 = padding is included in the captured span (was clipped before), and updates highEntropyValue to TrimRight the padding before measuring length and entropy.

Splits SecretScanSummary into DefiniteFindings and CandidateFindings counts across sync, service, and CLI layers. The CLI human-output line now reads with definite leaks; N findings (X definite, Y candidate). JSON consumers get the new fields automatically.

Rewrites secrets scan help text to surface the inline-sync (definite-only) vs full-scan (definite + candidate) two-tier model, and adds a hint after the human-format summary pointing users at secrets list --confidence all when candidate findings exist. JSON callers don't see the hint and can render their own UX.

Bumps rulesAlgorithmVersion from 5 to 6. Adds fixture-deny-list entries for the synthetic test bodies the new rules correctly identify.

🤖 Generated with Claude Code

Approved-via-brainstorming design for a single PR adding six vendor rules (OpenAI, GitLab, npm, PyPI, HuggingFace, SendGrid), fixing the base64 padding capture in high-entropy-assignment, plumbing a definite/candidate breakdown through the scan summary, rewriting scan help text, and adding a candidate-visibility hint after non-JSON scan output. One rulesAlgorithmVersion bump (5 -> 6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Definite vs candidate rule wording: motivation paragraph now correctly attributes JWT, basic-auth-url, and high-entropy-assignment to the candidate tier. - High-entropy padding fix: highEntropyValue must TrimRight '=' before measuring length and Shannon entropy, otherwise appended padding can push a borderline body below the 3.5-bit floor. Validator-padding regression test added. v6 doc comment updated. - PyPI boundary test: the 85-char floor applies to the base64 body after pypi-, with the 16-char macaroon-header anchor counted in. Correct boundary cases are 68-char tail (must NOT match) and 69-char tail (MUST match). - openai-key mask: replace fixed 8-char prefix preservation with a maskOpenAIKey helper that picks the matched prefix length, so sk-svcacct- (11) and sk-admin- (9) are not truncated mid-word. - openai-key citation: phrase the prefix patterns as scanner knowledge derived from observed token format, not a 2024 announcement URL we cannot cite cleanly. Project/service-account/admin concepts are cited to OpenAI's API platform docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Twelve TDD-driven tasks covering the approved spec: per-vendor-rule commits (openai-key, gitlab-pat, npm-token, pypi-token, huggingface- token, sendgrid-key), high-entropy padding capture + validator trim, SecretScanSummary breakdown plumbing, CLI help/hint UX, rulesAlgorithmVersion bump, verification pass, and scratch-file cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a definite-tier rule for OpenAI project, service-account, and admin key prefixes. Includes maskOpenAIKey helper that preserves the matched prefix length (sk-svcacct- and sk-admin- differ from sk-proj-), and notOpenAIKeyPlaceholder validator using the existing bodyLooksRandom + notTrailingRunRepeat structural checks. Legacy sk- + 48-char keys are deliberately not matched. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replaces the misleading maskKeepEnds(s, 3, 4) fallback in maskOpenAIKey with a panic — the regex guarantees one of the three known prefixes, so silently producing wrong output if a future rule change desyncs the two is worse than crashing. Rewrites the prefix-ordering comment to be honest about the framing: today no prefix shadows another, so the longest-first order is defensive against a hypothetical future addition, not protection against a current risk.

Definite-tier rule for GitLab personal/project access tokens. Uses the Google-key terminator pattern (capture-then-non-body-byte) because the token alphabet includes '-'. Validator reuses bodyLooksRandom on the trimmed body. Source: GitLab token prefix docs.

A 20-char body with 16 random-looking chars followed by 'AAAA' passes bodyLooksRandom (entropy ~3.92, AAAA covers only 20% of the body), so without notTrailingRunRepeat the validator reports glpat-...AAAA as a definite leak. Mirrors the anthropic-key and openai-key pattern.

Definite-tier rule for npm access tokens in the 2021+ format: npm_ + 36 base62 chars (30 payload + 6 CRC). Validator uses bodyHasNoPlaceholderShape on the fixed-length body (entropy gate is over-restrictive at this length, like AWS access keys). Legacy classic-format tokens (bare hex) are deliberately not matched.

Same false-positive shape that prompted the gitlab-pat fix (commit 158384f): a 36-char body with 32 random-looking chars followed by 'AAAA' passes bodyHasNoPlaceholderShape (5/36 = 14% single-byte coverage, below the 75% threshold), so without notTrailingRunRepeat the validator reports npm_...AAAA as a definite leak.

Definite-tier rule for PyPI API tokens. Anchors on the literal base64-encoded macaroon header bytes (pypi-AgEIcHlwaS5vcmcC) that every PyPI token starts with — a stronger anchor than just pypi-. The 85-char body minimum from PyPI's docs translates to a 69-char variable tail after the 16-char anchor. Validator includes notTrailingRunRepeat(s, 4) up front to catch the random-body-with-trailing-AAAA placeholder shape that bodyLooksRandom alone doesn't reject (same gap addressed in Tasks 2 and 3 for gitlab-pat and npm-token). Source: docs.pypi.org/api/secrets. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Definite-tier rule for Hugging Face access tokens. Base62 body, minimum 30 chars. Validator uses notTrailingRunRepeat + bodyLooksRandom on the trimmed body, matching the pattern established for the other variable- body vendor rules in this PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The previous body was 29 chars, below the regex's 30-char minimum, so the test passed because the regex rejected it before the validator was called. Padding by one char makes the body 30 chars, so the regex matches and notTrailingRunRepeat is the active gate.

Definite-tier rule for SendGrid API keys. Two fixed-length base64url segments separated by a literal '.'. Validator splits on the internal dot, applies bodyHasNoPlaceholderShape to the 22-char identifier (short, like AWS) and notTrailingRunRepeat + bodyLooksRandom to the 43-char secret (matching the trailing-run guard pattern in the other variable- body vendor rules in this PR).

Widens the high-entropy-assignment capture group to include trailing '=' padding so the reported span and redacted display match the literal value. Updates highEntropyValue to TrimRight the padding before measuring length and Shannon entropy: '=' is not in the body charset, so trimming guarantees padding can't affect the gate. Adds tests for the span behavior and validator trim. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds DefiniteFindings and CandidateFindings to sync.SecretScanSummary, mirrors them on service.SecretScanSummary, passes them through both direct and HTTP backends, and surfaces them in the CLI human-output line. The existing WithSecrets field keeps its meaning (sessions with ≥1 definite finding); the human label changes from 'with secrets' to 'with definite leaks' since the new line co-displays candidate counts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Rewrites 'secrets scan' Short and adds a Long that explains the inline-vs-full two-tier model and the --confidence-all step needed to see candidate findings. Retightens the --backfill help. Adds a smart hint printed after the scan summary when candidate findings exist and output is human-formatted; JSON callers get candidate_findings in the payload and decide their own UX. Includes hint-shown, hint-suppressed, and JSON-no-hint regression tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Marks the ruleset change in this PR (six new vendor rules, base64 padding capture widening, highEntropyValue trim-padding). Sessions previously scanned at v5 are flagged stale and will be re-scanned by the next 'secrets scan' or 'secrets scan --backfill', which is the designed behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds SHA-256 hashes for the 14 new test fixture values introduced by the six new vendor rules (openai-key, gitlab-pat, npm-token, pypi-token, huggingface-token, sendgrid-key). These synthetic tokens from rules_test.go were appearing as 447 definite findings across development conversations; the deny-list reduces that to 39 (all pre-existing rules, all expected). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Per the repo convention, design and plan files under docs/superpowers/ are internal scratch and are removed before the feature branch is pushed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces commit-era phrasing with a description of the rule category that stays meaningful after the PR merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

roborev-ci · 2026-05-26T18:09:23Z

roborev: Combined Review (`ff144b6`)

No Medium, High, or Critical findings were reported.

All review agents found the changes clean.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm and others added 20 commits May 25, 2026 22:21

chore: remove internal design/plan scratch before push

3f0b291

Per the repo convention, design and plan files under docs/superpowers/ are internal scratch and are removed before the feature branch is pushed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(secrets): drop ephemeral 'in this PR' from sendgrid comment

ff144b6

Replaces commit-era phrasing with a description of the rule category that stays meaningful after the PR merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(secrets): vendor rules, scan UX, candidate visibility#548

feat(secrets): vendor rules, scan UX, candidate visibility#548
wesm wants to merge 20 commits into
mainfrom
feat/secrets-new-features

wesm commented May 26, 2026

Uh oh!

roborev-ci Bot commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

wesm commented May 26, 2026

Summary

Uh oh!

roborev-ci Bot commented May 26, 2026

roborev: Combined Review (ff144b6)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

roborev: Combined Review (`ff144b6`)