feat(vex): synthesize matches from SBOM for affected VEX statements by xnox · Pull Request #3464 · anchore/grype

xnox · 2026-05-26T14:56:12Z

Summary

Lets VEX affected / under_investigation statements add findings for
packages present in the SBOM but absent from grype's vulnerability
database. This closes the gap that keeps govulncheck-style VEX docs
from round-tripping into grype output today.

- OpenVEX: after the existing ignored-match promotion loop, walk
  the package catalog and synthesize a match.Match for each statement
  whose product (or product+subcomponent) purl names a package. Version
  matching is exact, or wildcard when the statement omits a version;
  there are no implicit ranges, matching the OpenVEX spec.
- CSAF: same synthesis, with status-aware version semantics from
  the spec:
  - last_affected → pkg.version <= stmt.version (ceiling)
  - first_affected → pkg.version >= stmt.version (floor)
  - known_affected / recommended / under_investigation → exact
  - fixed / known_not_affected → never synthesize

  Comparisons use grype/version via pkg.VersionFormat, so they are
  ecosystem-aware (semver, deb, rpm, apk, go-module, …). Statement
  qualifiers must be a subset of the package's qualifiers; type,
  namespace and name must match exactly.
- The vexProcessorImplementation interface and ApplyVEX now take
  []pkg.Package, plumbed through findVEXMatches.
- Synthesis keys on (vuln ID, package purl) and skips pairs already
  present in remaining or ignored matches, so the path never duplicates
  a DB-backed finding.
- Behavior is gated by existing config (vex-add: [affected,
  under_investigation] + a matching ignore rule), so default scans are
  unchanged.
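
The status-aware version rule for CSAF can be sketched as a small predicate. This is an illustration only, not the code in the PR: `compareVersions` is a hypothetical stand-in for grype/version that only understands dotted numeric versions, `versionMatches` is a made-up name, and treating an omitted statement version as a wildcard is an assumption carried over from the OpenVEX bullet.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareVersions is a hypothetical stand-in for grype/version. It compares
// dotted numeric versions such as "v0.53.0" component by component and
// returns -1, 0, or 1. Non-numeric components count as 0 (illustration only).
func compareVersions(a, b string) int {
	as := strings.Split(strings.TrimPrefix(a, "v"), ".")
	bs := strings.Split(strings.TrimPrefix(b, "v"), ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// versionMatches applies the per-status rule from the list above.
// An empty statement version is treated as a wildcard (assumption).
func versionMatches(status, pkgVersion, stmtVersion string) bool {
	switch status {
	case "fixed", "known_not_affected":
		return false // never synthesize for these statuses
	}
	if stmtVersion == "" {
		return true
	}
	c := compareVersions(pkgVersion, stmtVersion)
	switch status {
	case "last_affected":
		return c <= 0 // ceiling: package is at or below the statement version
	case "first_affected":
		return c >= 0 // floor: package is at or above the statement version
	case "known_affected", "recommended", "under_investigation":
		return c == 0 // exact
	default:
		return false
	}
}

func main() {
	fmt.Println(versionMatches("last_affected", "v0.50.0", "v0.99.0"))
	fmt.Println(versionMatches("last_affected", "v0.50.0", "v0.10.0"))
}
```

The two calls in `main` mirror the ceiling example from the Motivation section: an SBOM version of v0.50.0 sits under a last_affected of v0.99.0 but above a last_affected of v0.10.0.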

Closes #3145, completes the augment phase from #1365.

Motivation

Stock grype:

$ grype ./step
No vulnerabilities found

govulncheck against the same binary:

$ govulncheck -mode=binary ./step
Vulnerability #1: GO-2026-5030
    Found in: golang.org/x/net@v0.53.0
    Fixed in: golang.org/x/net@v0.55.0
… (and 5 more in x/net)

With this change plus an OpenVEX doc declaring those vulns affected:

$ grype ./step --vex affected.vex.json -c grype.yaml
NAME              INSTALLED  TYPE       VULNERABILITY  SEVERITY
golang.org/x/net  v0.53.0    go-module  GO-2026-5025   Unknown
golang.org/x/net  v0.53.0    go-module  GO-2026-5026   Unknown
… (one row per VEX statement)

The same machinery works for CSAF documents, where last_affected
produces a ceiling match (e.g. SBOM v0.50.0 matches last_affected v0.99.0 but is excluded by last_affected v0.10.0).

Test plan

- go test ./grype/vex/... ./grype/ passes, including the new
  unit tests
- go build ./... and go vet ./grype/... clean
- End-to-end: built grype with these changes, ran against a real
  Go binary (step) with both OpenVEX and CSAF documents, confirmed
  synthesized findings appear in table and json output formats
- Test_UnaffectedFiltering still passes (verified independently;
  any local breakage was caused by a stale auto-generated
  listing.xxh64 fixture, not by this change)
- Coverage on touched packages: grype/vex/csaf 46.3% → 62.8%,
  grype/vex/openvex 63.4% → 77.1%; no regression elsewhere

New tests

grype/vex/openvex/implementation_test.go:

- TestAugmentMatches_SynthesizesFromPackageCatalog (7 cases):
  affected synthesizes; under_investigation synthesizes; not_affected /
  fixed do not; purl mismatch does not; empty catalog does not;
  non-matching vulnerability in ignore rule does not.
- TestAugmentMatches_DoesNotDuplicateExistingMatches: dedup against
  an existing DB-backed match.

grype/vex/csaf/implementation_test.go:

- TestPackageMatchesStatement: 16 cases covering ceiling/floor/
  exact/wildcard plus name/namespace/type mismatches across go-module
  versions.
- TestAugmentMatches_SynthesizesFromPackageCatalog: 9 cases covering
  each affected-like status (last_affected, first_affected,
  known_affected) against lower/equal/higher SBOM versions, plus
  fixed and known_not_affected negatives.
- TestAugmentMatches_DoesNotDuplicateExistingMatches_CSAF: dedup.

🤖 Generated with Claude Code

Previously, AugmentMatches could only promote a vulnerability that grype's DB had already found and that another rule had filtered into the ignored list. When the DB had no record of a (vulnerability, package) pair, an "affected" VEX statement naming that package was silently ignored, even though the statement is the strongest possible claim that the package is vulnerable. This left a visible gap versus tools like govulncheck, which report findings the grype DB simply does not carry.

This change lets VEX `affected` / `under_investigation` statements synthesize a finding directly from the SBOM:

* The vexProcessorImplementation interface and ApplyVEX now receive the package catalog, plumbed through findVEXMatches in the vulnerability matcher.
* OpenVEX: after the existing ignored-match loop, walk the catalog and add a match for each statement that names a package by purl. Version matching is exact (or wildcard when the statement omits a version), matching the OpenVEX spec (no implicit ranges).
* CSAF: same synthesis loop, but with status-aware version semantics that follow the CSAF spec:
  - last_affected → pkg.version <= stmt.version (ceiling)
  - first_affected → pkg.version >= stmt.version (floor)
  - known_affected / recommended / under_investigation → exact
  - fixed / known_not_affected → never synthesize

  Comparisons use grype/version with pkg.VersionFormat, so they are ecosystem-aware (semver, deb, rpm, apk, go-module, etc.). Statement qualifiers must be a subset of the package's qualifiers; type, namespace and name must match exactly.

Dedup: synthesis keys on (vulnerability ID, package purl) and skips any pair already present in the remaining or ignored match sets, so the new path never duplicates a DB-backed finding.

Behavior is gated by the existing VEX configuration: users still need `vex-add: [affected, under_investigation]` plus a matching ignore rule for the synthesized matches to surface, so default scans are unchanged.

Tests:

* grype/vex/openvex/implementation_test.go covers exact-match synthesis, status filtering, purl mismatch, empty catalog, ignore-rule vulnerability filtering, and dedup against existing matches.
* grype/vex/csaf/implementation_test.go adds TestPackageMatchesStatement (16 cases for ceiling/floor/exact/wildcard + identity mismatches), TestAugmentMatches_SynthesizesFromPackageCatalog (9 cases per status against lower/equal/higher SBOM versions), and a dedup test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@surgut.co.uk>
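
The dedup rule in this commit message (key on vulnerability ID plus package purl, skip pairs that already exist) might look roughly like the sketch below. The types `finding` and `vulnPackageKey` and the function `synthesize` are hypothetical simplifications, not grype's match.Match or the PR's actual helpers; recording each synthesized pair in the set as well is an extra assumption, made so that two statements naming the same pair yield one finding.

```go
package main

import "fmt"

// finding is a hypothetical, simplified stand-in for a grype match.
type finding struct {
	VulnID string
	PURL   string
}

// vulnPackageKey identifies a finding by vulnerability ID and package purl.
type vulnPackageKey struct {
	vulnID string
	purl   string
}

// synthesize returns the candidates whose (vuln ID, purl) pair is not already
// present in existing (the remaining plus ignored matches).
func synthesize(existing, candidates []finding) []finding {
	seen := make(map[vulnPackageKey]struct{}, len(existing))
	for _, f := range existing {
		seen[vulnPackageKey{f.VulnID, f.PURL}] = struct{}{}
	}
	var out []finding
	for _, c := range candidates {
		k := vulnPackageKey{c.VulnID, c.PURL}
		if _, dup := seen[k]; dup {
			continue // a DB-backed or earlier synthesized finding exists
		}
		seen[k] = struct{}{} // assumption: also collapse repeated statements
		out = append(out, c)
	}
	return out
}

func main() {
	const purl = "pkg:golang/golang.org/x/net@v0.53.0"
	existing := []finding{{"GO-2026-5025", purl}}
	candidates := []finding{
		{"GO-2026-5025", purl}, // already known, skipped
		{"GO-2026-5026", purl}, // new, synthesized
	}
	for _, f := range synthesize(existing, candidates) {
		fmt.Println(f.VulnID, f.PURL)
	}
}
```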

The affected/under_investigation synthesis added in the previous commit walked the whole package catalog for every VEX statement, and the innermost comparison re-parsed PURLs on each iteration (OpenVEX via go-vex PurlMatches -> 2x packageurl.FromString; CSAF via packageMatchesStatement -> 2x packageurl.FromString). The result was an O(statements x packages) loop with O(statements x packages) PURL parses, so cost grew quadratically with catalog size. At ~1000 packages this added over a second of pure VEX work, and it roughly quadrupled each time the catalog doubled.

This commit keeps the matching semantics identical but stops re-scanning and re-parsing:

* Package PURLs are parsed once and bucketed by (type, namespace, name) identity. PurlMatches / packageMatchesStatement both require those three to be equal, so a statement can only ever match packages sharing that key. Each statement now consults just the relevant bucket(s) instead of the entire catalog, turning the hot path from O(S x P) into roughly O(S + P + matches).
* OpenVEX: candidate packages are gathered from the statement's product and subcomponent purls via the index. Image-wide statements (an image/context product with no subcomponents), which by definition match every package, are detected and still fall back to the full catalog so behavior is unchanged.
* CSAF: per-advisory product purls are cached so CollectProductIdentificationHelpers (which walks the whole product tree) runs once per product ID instead of once per package, the per-vulnerability status map allocation is replaced with a fixed slice, and packageMatchesStatement is split so the parsed-purl form (packageMatchesParsed) is reused without re-parsing.
* existingVulnPackageKeys uses Matches.Enumerate() instead of Sorted() since ordering is irrelevant there.

No behavior change: all existing grype/vex unit tests pass unchanged, including the synthesis, status-filtering, dedup, and image-wide cases.

Benchmarks
----------

Measured with throwaway benchmarks (one affected statement per package, package-as-product for OpenVEX / known_affected for CSAF), driving AugmentMatches over catalogs of 577/1000/2000 packages:

OpenVEX (grype/vex/openvex):

  pkgs   before ns/op    after ns/op   speedup   before allocs   after allocs
   577     432,312,762     3,870,233     ~112x       3,004,023         18,556
  1000   1,310,622,527     7,120,679     ~184x       9,013,321         32,117
  2000   5,797,006,526    14,058,239     ~412x      36,026,849         64,164

CSAF (grype/vex/csaf):

  pkgs   before ns/op    after ns/op   speedup   before allocs   after allocs
   577     388,074,243     6,079,186      ~64x       2,011,000         18,003
  1000   1,086,835,592    14,001,203      ~78x       6,023,286         31,147
  2000   5,279,739,731    35,716,893     ~148x      24,046,765         62,204

Before, doubling the catalog (1000 -> 2000) multiplied time by ~4.4x (OpenVEX) / ~4.9x (CSAF) -- quadratic. After, it is ~2x / ~2.5x -- linear.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@surgut.co.uk>
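
A minimal sketch of the bucketing idea, assuming packages have already been parsed into their purl fields. `parsedPackage`, `identityKey`, and `buildIndex` here are hypothetical names and shapes (the PR's own helpers are purlIdentityKey and buildPackageIndex, whose exact bodies are not shown in this conversation), and the NUL separator in the key is an assumption made for the illustration.

```go
package main

import "fmt"

// parsedPackage is a hypothetical stand-in for a catalog package whose purl
// has already been parsed once.
type parsedPackage struct {
	Type, Namespace, Name, Version string
}

// identityKey builds the bucket key from the three fields that must be equal
// for a statement to match a package. The NUL separator is an assumption,
// chosen so adjacent fields cannot run together.
func identityKey(typ, namespace, name string) string {
	return typ + "\x00" + namespace + "\x00" + name
}

// buildIndex buckets packages by identity in a single pass over the catalog.
func buildIndex(pkgs []parsedPackage) map[string][]parsedPackage {
	index := make(map[string][]parsedPackage, len(pkgs))
	for _, p := range pkgs {
		k := identityKey(p.Type, p.Namespace, p.Name)
		index[k] = append(index[k], p)
	}
	return index
}

func main() {
	catalog := []parsedPackage{
		{"golang", "golang.org/x", "net", "v0.53.0"},
		{"golang", "golang.org/x", "crypto", "v0.40.0"},
		{"golang", "golang.org/x", "net", "v0.55.0"},
	}
	index := buildIndex(catalog)
	// A statement about golang.org/x/net inspects only its own bucket,
	// not the whole catalog.
	for _, p := range index[identityKey("golang", "golang.org/x", "net")] {
		fmt.Println(p.Name, p.Version)
	}
}
```

With one lookup per statement instead of one scan per statement, the work is proportional to the number of statements plus packages plus actual matches, which is the O(S + P + matches) shape the commit message describes.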

xnox · 2026-06-02T11:48:01Z

Pushed a follow-up (2c9f9d52) addressing the performance concerns on the affected/under_investigation synthesis path.

The original synthesis walked the whole package catalog for every VEX statement and re-parsed PURLs on each comparison — O(statements × packages) with as many PURL parses, i.e. quadratic in catalog size. Packages are now parsed once and bucketed by (type, namespace, name) identity (which PurlMatches/packageMatchesStatement already require to be equal), so each statement only looks at the handful of packages that share its identity. CSAF additionally caches per-advisory product purls so the product tree is walked once per product instead of once per package. Semantics are unchanged (image-wide statements still fall back to the full catalog) and all existing grype/vex tests pass unchanged.

Benchmark (one affected statement per package, driving AugmentMatches):

catalog	before	after	speedup
OpenVEX 1000 pkgs	1.31 s	7.1 ms	~184×
OpenVEX 2000 pkgs	5.80 s	14 ms	~412×
CSAF 1000 pkgs	1.09 s	14 ms	~78×
CSAF 2000 pkgs	5.28 s	36 ms	~148×

Scaling is now linear instead of quadratic (doubling the catalog ~doubles the time rather than ~quadrupling it). At ~1000 packages the synthesis step is single-digit-to-low-double-digit milliseconds, so the affected-VEX supplement is now effectively negligible on top of a normal scan (which is dominated by SBOM cataloging and DB matching that take seconds).

kzantow

Sorry, some of this feedback might be a bit abstract; ping me if you have questions.

I think we can probably move forward with this if we could at least flip the package indexes to vulnerability indexes and avoid passing a package slice to the implementations. It would be great to move some of the duplicated logic into the processor: e.g. add a new interface function for each implementation to return a list of Vulnerability objects (the affected records) instead of passing []pkg.Package into each implementation, so that the processor handles iterating the packages in a single spot and matching against indexed vulnerabilities. I think this will fit better into a future state: if you take away the augmentation, each VEX file is effectively a VulnerabilityProvider and the VEX processor is effectively a Matcher that handles all package types. We already want to merge vulnerabilities, so I think the aforementioned changes will require less refactoring later.

kzantow · 2026-06-22T20:30:24Z

+// only ever match a statement that shares this key. Indexing packages by it
+// lets synthesis compare each statement against the handful of packages with a
+// matching identity instead of the whole catalog.
+func purlIdentityKey(p packageurl.PackageURL) string {


we should probably use comparable types instead of concatenated strings as map keys -- these can definitely add up to noticeable time at the scale we sometimes have, e.g.:

type purlKey struct { typ, namespace, name string }

(same comment everywhere we are making strings as map keys)
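
As a rough illustration of that suggestion, the sketch below uses the `purlKey` struct from the comment directly as a map key. The `purlIndex` type and its `add` / `lookup` methods are hypothetical, added only to make the example self-contained. Go allows any struct whose fields are all comparable to be used as a map key, so no key string has to be built per lookup, and fields that happen to contain a separator character cannot collide.

```go
package main

import "fmt"

// purlKey is the comparable key type proposed in the review comment.
type purlKey struct{ typ, namespace, name string }

// purlIndex is a hypothetical index from purl identity to package versions.
type purlIndex map[purlKey][]string

// add records a version under the given identity.
func (idx purlIndex) add(k purlKey, version string) {
	idx[k] = append(idx[k], version)
}

// lookup returns the versions recorded for an identity. The key is a plain
// struct value, so no concatenated string is allocated for the lookup.
func (idx purlIndex) lookup(typ, namespace, name string) []string {
	return idx[purlKey{typ: typ, namespace: namespace, name: name}]
}

func main() {
	idx := purlIndex{}
	idx.add(purlKey{"golang", "golang.org/x", "net"}, "v0.53.0")
	idx.add(purlKey{"golang", "golang.org/x", "net"}, "v0.55.0")
	fmt.Println(idx.lookup("golang", "golang.org/x", "net"))

	// Distinct field splits stay distinct keys, which a naive "/"-joined
	// string key would not guarantee.
	a := purlKey{"golang", "golang.org", "x/net"}
	b := purlKey{"golang", "golang.org/x", "net"}
	fmt.Println(a == b)
}
```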

kzantow · 2026-06-22T20:35:31Z

+		want     bool
+	}{
+		// last_affected: ceiling
+		{"last_affected matches lower pkg version", "pkg:golang/golang.org/x/net@v0.54.0", "pkg:golang/golang.org/x/net@v0.53.0", lastAffected, true},


these would be clearer if they used the field-name (keyed) form, e.g.:

{ name: ... stmtPURL: ... ...
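
For illustration, the quoted test row could be written with keyed fields as below. Only `name`, `stmtPURL`, `want`, and the `lastAffected` identifier appear in this thread; the `pkgPURL` and `status` field names, the `status` type, and the `cases` helper are assumptions made so the sketch compiles on its own. The assignment of the two purls (statement at v0.54.0, package at v0.53.0) follows from the case name, "matches lower pkg version".

```go
package main

import "fmt"

// status and lastAffected are hypothetical stand-ins for whatever type the
// real test file uses for CSAF product statuses.
type status string

const lastAffected status = "last_affected"

// testCase names every column of the table so each row documents itself.
type testCase struct {
	name     string
	stmtPURL string
	pkgPURL  string // assumed field name
	status   status // assumed field name
	want     bool
}

// cases returns the table; in a real test this would sit inside the test
// function and be iterated with t.Run(tc.name, ...).
func cases() []testCase {
	return []testCase{
		{
			name:     "last_affected matches lower pkg version",
			stmtPURL: "pkg:golang/golang.org/x/net@v0.54.0",
			pkgPURL:  "pkg:golang/golang.org/x/net@v0.53.0",
			status:   lastAffected,
			want:     true,
		},
	}
}

func main() {
	for _, tc := range cases() {
		fmt.Printf("%s: stmt=%s pkg=%s status=%s want=%t\n",
			tc.name, tc.stmtPURL, tc.pkgPURL, tc.status, tc.want)
	}
}
```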

kzantow · 2026-06-23T13:18:52Z

+// synthesisCandidate describes a (vulnerability, package) pair that should be
+// added to grype's results based on a CSAF advisory, when no DB-backed match
+// already exists.
+type synthesisCandidate struct {


it looks like we now have a synthesisCandidate which is converted to an advisoryMatch which is converted to a match.Match... could we avoid the middlemen on these and just directly create Vulnerability, IgnoreRule/IgnoreFilter, and Match objects or similar? We could move the IgnoreFilter indexing to some shared location

kzantow · 2026-06-23T14:21:00Z

+
+// buildPackageIndex parses every package purl once and buckets the packages by
+// their (type, namespace, name) identity.
+func buildPackageIndex(pkgs []pkg.Package) map[string][]indexedPackage {


It looks like both of these implementations have similar buildPackageIndex functions that operate on the full set of packages. I think we should flip this to instead build indexes for VEX rules. It's hard to say which would be the smaller set (definitely VEX rules when there are no VEX files), but we operate on a single package at a time in the matcher world and already have indexes for IgnoreRules and other IgnoreFilters. I see a lot of similarity to matchers here, and I think a future refactoring is likely to introduce per-package streaming, which this would be incompatible with.
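
One way to picture the flipped index is sketched below: the VEX statements are indexed once by product identity, and each package is then looked up on its own, which would also work if packages arrived one at a time. Everything here (`identity`, `statement`, `pkgRef`, `statementIndex`, `indexStatements`, `matchPackage`) is hypothetical and deliberately ignores version and status handling; it is not the design that was eventually adopted, only an illustration of the direction described in the comment.

```go
package main

import "fmt"

// identity is a hypothetical comparable purl identity.
type identity struct{ Type, Namespace, Name string }

// statement is a hypothetical, simplified "affected" VEX statement.
type statement struct {
	VulnID  string
	Product identity
}

// pkgRef is a hypothetical stand-in for one catalog package.
type pkgRef struct {
	ID      identity
	Version string
}

// statementIndex maps a product identity to the statements that name it.
type statementIndex map[identity][]statement

// indexStatements builds the index once, independent of any package catalog.
func indexStatements(stmts []statement) statementIndex {
	idx := make(statementIndex, len(stmts))
	for _, s := range stmts {
		idx[s.Product] = append(idx[s.Product], s)
	}
	return idx
}

// matchPackage handles a single package, the way a matcher would, and returns
// the vulnerability IDs of statements naming it (version checks omitted).
func (idx statementIndex) matchPackage(p pkgRef) []string {
	var ids []string
	for _, s := range idx[p.ID] {
		ids = append(ids, s.VulnID)
	}
	return ids
}

func main() {
	xnet := identity{"golang", "golang.org/x", "net"}
	idx := indexStatements([]statement{
		{"GO-2026-5025", xnet},
		{"GO-2026-5026", xnet},
	})
	// Packages can be processed one by one; no full catalog is required.
	fmt.Println(idx.matchPackage(pkgRef{ID: xnet, Version: "v0.53.0"}))
}
```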

xnox force-pushed the vex-affected branch from 6d41136 to 2305bb5 · May 26, 2026 14:58

kzantow reviewed May 26, 2026

Comment thread grype/vex/csaf/csaf.go Outdated

kzantow reviewed Jun 23, 2026

xnox wants to merge 2 commits into anchore:main from xnox:vex-affected
