feat(vex): synthesize matches from SBOM for affected VEX statements #3464
xnox wants to merge 2 commits into
Previously, AugmentMatches could only promote a vulnerability that grype's
DB had already found and that another rule had filtered into the ignored
list. When the DB had no record of a (vulnerability, package) pair, an
"affected" VEX statement naming that package was silently ignored, even
though the statement is the strongest possible claim that the package is
vulnerable. This left a visible gap versus tools like govulncheck, which
report findings the grype DB simply does not carry.
This change lets VEX `affected` / `under_investigation` statements
synthesize a finding directly from the SBOM:
* The vexProcessorImplementation interface and ApplyVEX now receive the
  package catalog, plumbed through findVEXMatches in the vulnerability
  matcher.
* OpenVEX: after the existing ignored-match loop, walk the catalog and
  add a match for each statement that names a package by purl. Version
  matching is exact (or wildcard when the statement omits a version),
  matching the OpenVEX spec, with no implicit ranges.
* CSAF: same synthesis loop, but with status-aware version semantics
  that follow the CSAF spec:
    - last_affected → pkg.version <= stmt.version (ceiling)
    - first_affected → pkg.version >= stmt.version (floor)
    - known_affected / recommended / under_investigation → exact
    - fixed / known_not_affected → never synthesize
  Comparisons use grype/version with pkg.VersionFormat, so they are
  ecosystem-aware (semver, deb, rpm, apk, go-module, etc.). Statement
  qualifiers must be a subset of the package's qualifiers; type,
  namespace and name must match exactly.
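The status rules above can be sketched as a small decision function. This is a minimal illustration, not the PR's code: `versionMatches` and `compareVersions` are hypothetical names, `compareVersions` is a naive stand-in for grype/version that only understands dotted numeric versions, and treating an empty statement version as a wildcard for every affected-like status is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareVersions is a naive stand-in for grype/version: it only handles
// dotted numeric versions with an optional leading "v" and returns -1, 0 or 1.
func compareVersions(a, b string) int {
	as := strings.Split(strings.TrimPrefix(a, "v"), ".")
	bs := strings.Split(strings.TrimPrefix(b, "v"), ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// versionMatches (hypothetical) applies the per-status rule to a package
// version and a statement version.
func versionMatches(status, pkgVersion, stmtVersion string) bool {
	var allowed func(c int) bool
	switch status {
	case "last_affected":
		allowed = func(c int) bool { return c <= 0 } // ceiling
	case "first_affected":
		allowed = func(c int) bool { return c >= 0 } // floor
	case "known_affected", "recommended", "under_investigation":
		allowed = func(c int) bool { return c == 0 } // exact
	default:
		return false // fixed, known_not_affected, unknown: never synthesize
	}
	if stmtVersion == "" {
		return true // assumption: no version on the statement means wildcard
	}
	return allowed(compareVersions(pkgVersion, stmtVersion))
}

func main() {
	fmt.Println(versionMatches("last_affected", "v0.50.0", "v0.99.0")) // true
	fmt.Println(versionMatches("last_affected", "v0.50.0", "v0.10.0")) // false
}
```

The real comparison is ecosystem-aware, so a deb or rpm version would not go through a numeric comparator like the one above.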
Dedup: synthesis keys on (vulnerability ID, package purl) and skips any
pair already present in the remaining or ignored match sets, so the new
path never duplicates a DB-backed finding.
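A rough sketch of that dedup step, assuming a simplified `finding` type in place of match.Match; `vulnPackageKey` and `synthesize` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// finding is a simplified stand-in for match.Match.
type finding struct {
	VulnID string
	PURL   string
}

// vulnPackageKey identifies a finding by (vulnerability ID, package purl).
type vulnPackageKey struct {
	vulnID string
	purl   string
}

func keyOf(f finding) vulnPackageKey {
	return vulnPackageKey{vulnID: f.VulnID, purl: f.PURL}
}

// synthesize returns the candidates whose key is not already present in the
// remaining or ignored sets; a candidate listed twice is emitted only once.
func synthesize(remaining, ignored, candidates []finding) []finding {
	seen := make(map[vulnPackageKey]struct{}, len(remaining)+len(ignored))
	for _, f := range remaining {
		seen[keyOf(f)] = struct{}{}
	}
	for _, f := range ignored {
		seen[keyOf(f)] = struct{}{}
	}
	var out []finding
	for _, c := range candidates {
		k := keyOf(c)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	remaining := []finding{{"CVE-A", "pkg:golang/example.com/a@v1.0.0"}}
	candidates := []finding{
		{"CVE-A", "pkg:golang/example.com/a@v1.0.0"}, // already DB-backed
		{"CVE-B", "pkg:golang/example.com/a@v1.0.0"}, // new
	}
	fmt.Println(synthesize(remaining, nil, candidates))
}
```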
Behavior is gated by the existing VEX configuration: users still need
`vex-add: [affected, under_investigation]` plus a matching ignore rule
for the synthesized matches to surface, so default scans are unchanged.
Tests:
* grype/vex/openvex/implementation_test.go covers exact-match
synthesis, status filtering, purl mismatch, empty catalog,
ignore-rule vulnerability filtering, and dedup against existing
matches.
* grype/vex/csaf/implementation_test.go adds TestPackageMatchesStatement
(16 cases for ceiling/floor/exact/wildcard + identity mismatches),
TestAugmentMatches_SynthesizesFromPackageCatalog (9 cases per status
against lower/equal/higher SBOM versions), and a dedup test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@surgut.co.uk>
The affected/under_investigation synthesis added in the previous commit
walked the whole package catalog for every VEX statement, and the
innermost comparison re-parsed PURLs on each iteration (OpenVEX via
go-vex PurlMatches -> 2x packageurl.FromString; CSAF via
packageMatchesStatement -> 2x packageurl.FromString). The result was an
O(statements x packages) loop with O(statements x packages) PURL parses,
so cost grew quadratically with catalog size. At ~1000 packages this
added over a second of pure VEX work, and it roughly quadrupled each time
the catalog doubled.
This commit keeps the matching semantics identical but stops re-scanning
and re-parsing:
* Package PURLs are parsed once and bucketed by (type, namespace, name)
  identity. PurlMatches / packageMatchesStatement both require those
  three to be equal, so a statement can only ever match packages sharing
  that key. Each statement now consults just the relevant bucket(s)
  instead of the entire catalog, turning the hot path from
  O(S x P) into roughly O(S + P + matches).
* OpenVEX: candidate packages are gathered from the statement's product
  and subcomponent purls via the index. Image-wide statements (an
  image/context product with no subcomponents), which by definition
  match every package, are detected and still fall back to the full
  catalog so behavior is unchanged.
* CSAF: per-advisory product purls are cached so
  CollectProductIdentificationHelpers (which walks the whole product
  tree) runs once per product ID instead of once per package, the
  per-vulnerability status map allocation is replaced with a fixed
  slice, and packageMatchesStatement is split so the parsed-purl form
  (packageMatchesParsed) is reused without re-parsing.
* existingVulnPackageKeys uses Matches.Enumerate() instead of Sorted()
  since ordering is irrelevant there.
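The bucketing idea can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `identity`, `pkgEntry`, `identityKey`, `buildIndex` and `candidates` are hypothetical, packages are assumed to be parsed already, and the NUL separator in the key is chosen only for this example.

```go
package main

import "fmt"

// identity is a simplified stand-in for the (type, namespace, name) part of
// a parsed purl.
type identity struct{ Type, Namespace, Name string }

// pkgEntry is a hypothetical catalog package whose purl was parsed once.
type pkgEntry struct {
	ID      identity
	Version string
}

// identityKey builds the bucket key. The separator is an assumption made for
// this sketch.
func identityKey(id identity) string {
	return id.Type + "\x00" + id.Namespace + "\x00" + id.Name
}

// buildIndex walks the catalog once and buckets packages by identity.
func buildIndex(pkgs []pkgEntry) map[string][]pkgEntry {
	index := make(map[string][]pkgEntry, len(pkgs))
	for _, p := range pkgs {
		k := identityKey(p.ID)
		index[k] = append(index[k], p)
	}
	return index
}

// candidates returns only the packages sharing the statement's identity, so
// version checks run against one bucket rather than the whole catalog.
func candidates(index map[string][]pkgEntry, stmt identity) []pkgEntry {
	return index[identityKey(stmt)]
}

func main() {
	catalog := []pkgEntry{
		{identity{"golang", "golang.org/x", "net"}, "v0.53.0"},
		{identity{"golang", "golang.org/x", "crypto"}, "v0.30.0"},
		{identity{"deb", "debian", "openssl"}, "3.0.11"},
	}
	index := buildIndex(catalog)
	for _, p := range candidates(index, identity{"golang", "golang.org/x", "net"}) {
		fmt.Println(p.ID.Name, p.Version)
	}
}
```

Building the index costs one pass over the packages, and each statement then pays only for the size of its bucket, which is where the O(S + P + matches) estimate comes from.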
No behavior change: all existing grype/vex unit tests pass unchanged,
including the synthesis, status-filtering, dedup, and image-wide cases.
Benchmarks
----------
Measured with throwaway benchmarks (one affected statement per package,
package-as-product for OpenVEX / known_affected for CSAF), driving
AugmentMatches over catalogs of 577/1000/2000 packages:
OpenVEX (grype/vex/openvex):
  pkgs   before ns/op   after ns/op  speedup  before allocs  after allocs
   577    432,312,762     3,870,233    ~112x      3,004,023        18,556
  1000  1,310,622,527     7,120,679    ~184x      9,013,321        32,117
  2000  5,797,006,526    14,058,239    ~412x     36,026,849        64,164
CSAF (grype/vex/csaf):
  pkgs   before ns/op   after ns/op  speedup  before allocs  after allocs
   577    388,074,243     6,079,186     ~64x      2,011,000        18,003
  1000  1,086,835,592    14,001,203     ~78x      6,023,286        31,147
  2000  5,279,739,731    35,716,893    ~148x     24,046,765        62,204
Before, doubling the catalog (1000 -> 2000) multiplied time by ~4.4x
(OpenVEX) / ~4.9x (CSAF) -- quadratic. After, it is ~2x / ~2.5x -- linear.
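One way to read those ratios is as an empirical scaling exponent k in time ~ n^k, estimated from the 1000 and 2000 package rows. The helper below is a hypothetical illustration, not part of the PR; it uses only the ns/op figures from the tables above. An exponent near 2 corresponds to quadratic growth and an exponent near 1 to linear growth.

```go
package main

import (
	"fmt"
	"math"
)

// scalingExponent estimates k in time ~ n^k from two measurements (n1, t1)
// and (n2, t2).
func scalingExponent(n1, t1, n2, t2 float64) float64 {
	return math.Log(t2/t1) / math.Log(n2/n1)
}

func main() {
	fmt.Printf("OpenVEX before: k = %.2f\n", scalingExponent(1000, 1310622527, 2000, 5797006526))
	fmt.Printf("OpenVEX after:  k = %.2f\n", scalingExponent(1000, 7120679, 2000, 14058239))
	fmt.Printf("CSAF before:    k = %.2f\n", scalingExponent(1000, 1086835592, 2000, 5279739731))
	fmt.Printf("CSAF after:     k = %.2f\n", scalingExponent(1000, 14001203, 2000, 35716893))
}
```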
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@surgut.co.uk>
Pushed a follow-up ( The original synthesis walked the whole package catalog for every VEX statement and re-parsed PURLs on each comparison — Benchmark (one affected statement per package, driving
Scaling is now linear instead of quadratic (doubling the catalog ~doubles the time rather than ~quadrupling it). At ~1000 packages the synthesis step is single-digit-to-low-double-digit milliseconds, so the affected-VEX supplement is now effectively negligible on top of a normal scan (which is dominated by SBOM cataloging and DB matching that take seconds).
kzantow
left a comment
Sorry, some of this feedback might be a bit abstract; ping me if you have questions.
I think we can probably move forward with this if we at least flip the package indexes to vulnerability indexes and avoid passing a package slice to the implementations. It would also be great to move some of the duplicated logic into the processor: e.g. add a new interface function for each implementation that returns a list of Vulnerability objects (the affected records) instead of passing []pkg.Package into each implementation, so that the processor iterates the packages in a single spot and matches them against the indexed vulnerabilities. I think this will fit better into a future state: if you take away the augmentation, each VEX file is effectively a VulnerabilityProvider and the VEX processor is effectively a Matcher that handles all package types. We already want to merge vulnerabilities, so I think these changes will require less refactoring later.
// only ever match a statement that shares this key. Indexing packages by it
// lets synthesis compare each statement against the handful of packages with a
// matching identity instead of the whole catalog.
func purlIdentityKey(p packageurl.PackageURL) string {
we should probably use comparable types instead of concat'd strings as map keys -- these can definitely add up to noticeable time with the scale we have sometimes, e.g.:
type purlKey struct {
typ, namespace, name string
}
(same comment everywhere we are making strings as map keys)
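A small sketch of what the suggested struct key buys, beyond skipping the string concatenation on each lookup. `purlKey` follows the reviewer's snippet; `joinedKey` and its `/` separator are hypothetical and are not taken from the PR's `purlIdentityKey`. With a separator that can also occur inside a field, two different identities can collapse into the same string, while struct keys compare field by field.

```go
package main

import "fmt"

// purlKey is the comparable struct key from the review suggestion.
type purlKey struct {
	typ, namespace, name string
}

// joinedKey is a hypothetical concatenated key; the "/" separator is an
// assumption made to show how joined keys can become ambiguous.
func joinedKey(k purlKey) string {
	return k.typ + "/" + k.namespace + "/" + k.name
}

func main() {
	a := purlKey{typ: "golang", namespace: "golang.org/x", name: "net"}
	b := purlKey{typ: "golang", namespace: "golang.org", name: "x/net"}

	fmt.Println(a == b)                       // false: fields are compared one by one
	fmt.Println(joinedKey(a) == joinedKey(b)) // true: both join to the same string

	// Struct keys can be used directly as map keys.
	index := map[purlKey][]string{}
	index[a] = append(index[a], "v0.53.0")
	fmt.Println(len(index[a]), len(index[b]))
}
```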
want bool
}{
// last_affected: ceiling
{"last_affected matches lower pkg version", "pkg:golang/golang.org/x/net@v0.54.0", "pkg:golang/golang.org/x/net@v0.53.0", lastAffected, true},
these would be more clear if they used the property name form, e.g.:
{
name: ...
stmtPURL: ...
...
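For illustration, the case quoted above could look like this in named-field form. Only `name` and `stmtPURL` come from the review comment; `pkgPURL`, `status`, `want`, the `testCase` type and the string-typed `lastAffected` constant are assumptions about the test's shape.

```go
package main

import "fmt"

// status and lastAffected are hypothetical stand-ins for the CSAF status
// type used by the real test.
type status string

const lastAffected status = "last_affected"

// testCase names every column so each literal is self-describing.
type testCase struct {
	name     string
	stmtPURL string
	pkgPURL  string
	status   status
	want     bool
}

func cases() []testCase {
	return []testCase{
		{
			name:     "last_affected matches lower pkg version",
			stmtPURL: "pkg:golang/golang.org/x/net@v0.54.0",
			pkgPURL:  "pkg:golang/golang.org/x/net@v0.53.0",
			status:   lastAffected,
			want:     true,
		},
	}
}

func main() {
	for _, tc := range cases() {
		fmt.Printf("%s: stmt=%s pkg=%s status=%s want=%v\n",
			tc.name, tc.stmtPURL, tc.pkgPURL, tc.status, tc.want)
	}
}
```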
// synthesisCandidate describes a (vulnerability, package) pair that should be
// added to grype's results based on a CSAF advisory, when no DB-backed match
// already exists.
type synthesisCandidate struct {
it looks like we now have a synthesisCandidate which is converted to an advisoryMatch which is converted to a match.Match... could we avoid the middlemen on these and just directly create Vulnerability, IgnoreRule/IgnoreFilter, and Match objects or similar? We could move the IgnoreFilter indexing to some shared location
// buildPackageIndex parses every package purl once and buckets the packages by
// their (type, namespace, name) identity.
func buildPackageIndex(pkgs []pkg.Package) map[string][]indexedPackage {
It looks like both of these implementations have similar buildPackageIndex functions that operate on the full set of packages. I think we should flip this to instead build indexes for VEX rules. It's hard to say which would be the smaller set (definitely VEX rules when there are no VEX files), but we operate on a single package at a time in the matcher world and already have indexes for IgnoreRules and other IgnoreFilters. I see a lot of similarity to matchers here, and I think a future refactoring is likely to introduce per-package streaming, which this would be incompatible with.
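A rough sketch of the flipped shape described in this comment, with statements indexed once and packages looked up one at a time. All names here (`purlKey`, `affectedStatement`, `statementIndex`, `indexStatements`, `matchPackage`) are hypothetical, and the exact-or-wildcard version check is a simplification borrowed from the OpenVEX rule in the PR description.

```go
package main

import "fmt"

// purlKey is a comparable (type, namespace, name) identity.
type purlKey struct {
	typ, namespace, name string
}

// affectedStatement is a hypothetical, pre-parsed "affected" VEX statement.
// An empty Version means the statement applies to every version.
type affectedStatement struct {
	VulnID  string
	Key     purlKey
	Version string
}

// statementIndex buckets statements by package identity.
type statementIndex map[purlKey][]affectedStatement

// indexStatements is built once per VEX document set, independent of the
// package catalog.
func indexStatements(stmts []affectedStatement) statementIndex {
	idx := make(statementIndex, len(stmts))
	for _, s := range stmts {
		idx[s.Key] = append(idx[s.Key], s)
	}
	return idx
}

// matchPackage handles a single package, so it would fit a streaming,
// per-package matcher loop.
func (idx statementIndex) matchPackage(key purlKey, version string) []string {
	var vulnIDs []string
	for _, s := range idx[key] {
		if s.Version == "" || s.Version == version {
			vulnIDs = append(vulnIDs, s.VulnID)
		}
	}
	return vulnIDs
}

func main() {
	net := purlKey{"golang", "golang.org/x", "net"}
	idx := indexStatements([]affectedStatement{
		{VulnID: "VULN-1", Key: net, Version: "v0.53.0"},
		{VulnID: "VULN-2", Key: net}, // wildcard
	})
	fmt.Println(idx.matchPackage(net, "v0.53.0"))
	fmt.Println(idx.matchPackage(net, "v0.54.0"))
}
```

With this orientation the index size depends on the number of VEX statements, and it is empty when no VEX files are supplied.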
Summary
Lets VEX `affected` / `under_investigation` statements add findings for
packages present in the SBOM but absent from grype's vulnerability
database, the gap that makes govulncheck-style VEX docs round-trip
into grype output today.
* OpenVEX: walk the package catalog and synthesize a `match.Match` for
  each statement whose product (or product+subcomponent) purl names a
  package. Version matching is exact, or wildcard when the statement
  omits a version, with no implicit ranges, matching the OpenVEX spec.
* CSAF: same synthesis, with status-aware version semantics that follow
  the spec:
    - `last_affected` → pkg.version <= stmt.version (ceiling)
    - `first_affected` → pkg.version >= stmt.version (floor)
    - `known_affected` / `recommended` / `under_investigation` → exact
    - `fixed` / `known_not_affected` → never synthesize
  Comparisons use `grype/version` via `pkg.VersionFormat`, so they are
  ecosystem-aware (semver, deb, rpm, apk, go-module, …). Statement
  qualifiers must be a subset of the package's qualifiers; type,
  namespace and name must match exactly.
* The `vexProcessorImplementation` interface and `ApplyVEX` now take
  `[]pkg.Package`, plumbed through `findVEXMatches`.
* Dedup: synthesis keys on `(vuln ID, package purl)` and skips pairs
  already present in remaining or ignored matches, so the path never
  duplicates a DB-backed finding.
* Gated by the existing VEX configuration
  (`vex-add: [affected, under_investigation]` + a matching ignore rule),
  so default scans are unchanged.
Closes #3145, completes the augment phase from #1365.
Motivation
Stock grype:
govulncheck against the same binary:
With this change plus an OpenVEX doc declaring those vulns `affected`:
The same machinery works for CSAF documents, where `last_affected`
produces a ceiling match (e.g. SBOM `v0.50.0` matches
`last_affected v0.99.0` but is excluded by `last_affected v0.10.0`).
Test plan
* `go test ./grype/vex/... ./grype/` passes, including the new unit tests
* `go build ./...` and `go vet ./grype/...` clean
* Go binary ( step) with both OpenVEX and CSAF documents, confirmed
  synthesized findings appear in `table` and `json` output formats
* `Test_UnaffectedFiltering` still passes (verified independently;
  any local breakage was caused by a stale auto-generated
  `listing.xxh64` fixture, not by this change)
* `grype/vex/csaf` 46.3% → 62.8%, `grype/vex/openvex` 63.4% → 77.1%; no regression elsewhere
New tests
`grype/vex/openvex/implementation_test.go`:
* `TestAugmentMatches_SynthesizesFromPackageCatalog` (7 cases):
  affected synthesizes; under_investigation synthesizes; not_affected /
  fixed do not; purl mismatch does not; empty catalog does not;
  non-matching vulnerability in ignore rule does not.
* `TestAugmentMatches_DoesNotDuplicateExistingMatches`: dedup against
  an existing DB-backed match.
`grype/vex/csaf/implementation_test.go`:
* `TestPackageMatchesStatement` (16 cases): ceiling/floor/exact/wildcard
  plus name/namespace/type mismatches across go-module versions.
* `TestAugmentMatches_SynthesizesFromPackageCatalog` (9 cases): each
  affected-like status (`last_affected`, `first_affected`,
  `known_affected`) against lower/equal/higher SBOM versions, plus
  `fixed` and `known_not_affected` negatives.
* `TestAugmentMatches_DoesNotDuplicateExistingMatches_CSAF`: dedup.
🤖 Generated with Claude Code