Hardening + honesty pass: logger/FORTIFY/fuzzing, LSM-hook BTF detection fix, opt-in signal-fallback#179

Merged
ErenAri merged 6 commits into main from chore/hardening-phase0-1 on May 29, 2026

ErenAri (Owner) commented May 29, 2026

Summary

Hardening + honesty pass (Phase 0/1) plus a genuine enforcement bug fix and an
opt-in Tier-3 enforcement path. Five focused commits; verified on Linux 6.17
(BPF-LSM enabled). Full suite 292/293 — the one "failure" is the deliberate
90-day vendored-crypto review staleness gate (vendored_dependency_contract),
which is a governance control firing as designed, not a code defect.

Commits

  1. fix(logging,policy): const char* log field + correct policy-gen guard; align claims

    • Adds a field(const std::string&, const char*) overload. Without it,
      C-string field values bound to the bool overload and rendered as true
      (e.g. {program=true}) in text logs/SIEM. Adds tests/test_logging.cpp.
    • Moves the first bump_policy_generation() into the direct-apply branch (it
      was the only atomic guard for that path, not dead code) and removes an
      unnecessary audit-downgrade window during shadow population.
    • README/GUARANTEES: module-load blocking requires active kernel lockdown;
      OverlayFS copy-up is detection + async re-propagation (ENFORCED→AUDITED).
      Adds docs/MEMORY_SAFETY.md.
  2. build: strengthen userspace hardening: _FORTIFY_SOURCE=3,
    -fstack-clash-protection, -D_GLIBCXX_ASSERTIONS, each probed via
    check_<lang>_compiler_flag so the GCC/Clang × x86_64/arm64 matrix degrades
    gracefully. readelf confirms PIE, full RELRO, canary, FORTIFY _chk symbols.

  3. ci(fuzz): ClusterFuzzLite integration — free, in-repo continuous fuzzing of
    the userspace input-parsing surface using the existing libFuzzer harnesses.
    The ENABLE_FUZZING block now honors $LIB_FUZZING_ENGINE (OSS-Fuzz/CFLite)
    and falls back to local -fsanitize=fuzzer,address; the five harness defs are
    deduped into a loop. All 5 fuzzers build locally; fuzz_policy runs clean.

  4. fix(bpf): detect optional LSM hooks by their bpf_lsm_-prefixed BTF symbol
    ⚠️ Real enforcement bug. detect_missing_optional_lsm_hooks() queried the
    bare hook name (socket_connect) as a BTF FUNC. LSM attach points are
    bpf_lsm_socket_connect — the bare name never exists — so every optional LSM
    hook (network, bprm, mmap) was marked "missing" and disabled, silently
    downgrading network/exec/mmap enforcement; a fail-closed network policy made
    the daemon exit at the enforce gate. (File deny survived — those hooks are
    required, not optional.) Before: run --enforce with a network deny policy
    logged Disabling optional LSM program ... socket_connect and fail-closed.
    After: socket_connect attaches, daemon stays in ENFORCE, a connect() to a
    denied IP returns -EPERM (net_connect_block action=BLOCK), allowed IPs
    unaffected. This bug was surfaced by the logger fix in commit 1.

  5. feat(net): opt-in signal-fallback connect enforcement for non-BPF-LSM kernels

    • --enforce-fallback=signal (default off): a sys_enter_connect tracepoint
      matches the existing network deny maps and, in enforce mode, kills the
      offending process via bpf_send_signal().
    • Config flag reuses one reserved byte of agent_config (size/offsets
      unchanged; static_assert(sizeof==48) holds). Threaded via a defaulted
      daemon_run param (no churn to other callers).
    • Verified on 6.17: the program passes the verifier and attaches; a connect to
      a denied IP with --enforce-signal=none is killed by SIGKILL
      (net_connect_block action=KILL, protocol=0 — distinct from the LSM hook's
      action=BLOCK), allowed IPs unaffected.
    • Caveat (documented in GUARANTEES.md): on genuinely LSM-absent hosts the
      enforce-gate still treats the missing LSM hook as a degradation; teaching the
      gate to accept signal-fallback as primary enforcement there is a follow-up.
      Today it's verified as opt-in defense-in-depth alongside LSM enforcement.
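
As a minimal sketch of how an opt-in flag like this could be parsed, the helper below accepts `--enforce-fallback=signal` and `--enforce-fallback=off` and rejects anything else. The function name `parse_enforce_fallback` and its return convention are assumptions for illustration; the actual parsing in src/cli_run.cpp may look different.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical helper (not the project's API): returns true for "signal",
// false for "off", and std::nullopt when the argument is a different flag
// or carries an unsupported value.
inline std::optional<bool> parse_enforce_fallback(std::string_view arg) {
    constexpr std::string_view prefix = "--enforce-fallback=";
    if (arg.substr(0, prefix.size()) != prefix) return std::nullopt;
    const std::string_view value = arg.substr(prefix.size());
    if (value == "signal") return true;
    if (value == "off") return false;
    return std::nullopt;  // unknown value: let the caller report a usage error
}

inline void demo_parse_enforce_fallback() {
    assert(parse_enforce_fallback("--enforce-fallback=signal").value());
    assert(!parse_enforce_fallback("--enforce-fallback=off").value());
    assert(!parse_enforce_fallback("--enforce-fallback=kill").has_value());
}
```

Keeping the default at "off" means a caller that never sees the flag simply leaves the fallback disabled, which matches the opt-in behavior described above.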

Verification

  • Fresh build clean under -Werror; full ctest 292/293 (1 = crypto-review gate).
  • BPF object loads + attaches all programs on kernel 6.17; end-to-end connect
    block + signal-fallback kill confirmed against a TEST-NET-2 IP (198.51.100.7).
  • Go operator unaffected (not touched).

Not included

  • operator/go.mod / operator/go.sum were already modified before this work
    (pre-existing go mod tidy) and are intentionally excluded.

🤖 Generated with Claude Code

ErenAri and others added 5 commits May 29, 2026 22:27
…; align claims

Logging: add a `field(const std::string&, const char*)` overload. Without it,
C-string/string-literal field values bound to the bool overload (const char*->bool
is a standard conversion that outranks the user-defined const char*->std::string),
silently rendering every C-string field as "true" in text logs and SIEM output.
Adds tests/test_logging.cpp as a regression guard.
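
The overload-resolution trap described above can be reproduced in isolation. The sketch below uses hypothetical free functions named `field` in two namespaces as stand-ins for the logger's overload set; the real signatures in src/logging.hpp may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the logger's overloads before the fix.
namespace before_fix {
inline std::string field(const std::string& key, bool value) {
    return key + "=" + (value ? "true" : "false");
}
inline std::string field(const std::string& key, const std::string& value) {
    return key + "=" + value;
}
}  // namespace before_fix

// Same overload set plus the const char* overload.
namespace after_fix {
inline std::string field(const std::string& key, bool value) {
    return key + "=" + (value ? "true" : "false");
}
inline std::string field(const std::string& key, const std::string& value) {
    return key + "=" + value;
}
// Exact match for C strings, so it outranks the pointer-to-bool conversion.
inline std::string field(const std::string& key, const char* value) {
    return key + "=" + (value ? value : "(null)");
}
}  // namespace after_fix

// const char* -> bool is a standard conversion, so the bool overload wins here.
inline std::string render_before(const char* program) {
    return before_fix::field("program", program);
}

inline std::string render_after(const char* program) {
    return after_fix::field("program", program);
}

inline void demo_logger_overloads() {
    assert(render_before("aegisbpf") == "program=true");
    assert(render_after("aegisbpf") == "program=aegisbpf");
}
```

A regression test in this spirit only needs to assert that a C-string value round-trips into the rendered field unchanged.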

Policy: the first bump_policy_generation() was not dead code — it was the only
atomic guard for the non-shadow direct-apply path. Move it into the direct-apply
branch so each path bumps exactly once immediately before mutating live maps, and
the shadow path no longer forces an unnecessary audit-downgrade window during
shadow population.
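
The "bump exactly once, immediately before mutating live maps" rule can be modeled roughly as follows. This is a simplified sketch under stated assumptions: `PolicyState`, `apply_policy`, and the use of `std::map` in place of BPF maps are all hypothetical, and only the name `bump_policy_generation` comes from the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model of the policy state; std::map stands in for BPF maps.
struct PolicyState {
    std::atomic<std::uint64_t> generation{0};
    std::map<std::string, int> live;
    std::map<std::string, int> shadow;
};

inline void bump_policy_generation(PolicyState& state) {
    state.generation.fetch_add(1, std::memory_order_release);
}

inline void apply_policy(PolicyState& state,
                         const std::map<std::string, int>& rules,
                         bool use_shadow) {
    if (use_shadow) {
        state.shadow = rules;           // populate off to the side, no bump yet
        bump_policy_generation(state);  // single bump right before the live swap
        state.live.swap(state.shadow);
    } else {
        bump_policy_generation(state);  // single bump right before in-place mutation
        state.live = rules;
    }
}

inline void demo_generation_bumps() {
    PolicyState state;
    apply_policy(state, {{"deny_a", 1}}, /*use_shadow=*/false);
    assert(state.generation.load() == 1);
    apply_policy(state, {{"deny_b", 2}}, /*use_shadow=*/true);
    assert(state.generation.load() == 2);
    assert(state.live.count("deny_b") == 1);
}
```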

Docs: align enforcement claims with mechanisms — module-load blocking requires
active kernel lockdown; OverlayFS copy-up is detection + asynchronous userspace
re-propagation (relabeled ENFORCED->AUDITED, race window documented in
GUARANTEES.md). Add docs/MEMORY_SAFETY.md documenting the C++ userspace posture.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…, _GLIBCXX_ASSERTIONS)

Probe each flag with check_<lang>_compiler_flag so the cross-compiler/cross-arch
matrix degrades gracefully:
- _FORTIFY_SOURCE: prefer level 3 (GCC 12+/Clang 9+ w/ glibc >= 2.34), fall back to 2
- -fstack-clash-protection where supported
- -D_GLIBCXX_ASSERTIONS (C++): bounds/precondition checks in std:: containers

Verified on GCC 13 / glibc 2.39: all three probes pass, full suite stays green
(291/292; the 1 is the intentional 90-day vendored-crypto review gate), and the
linked binary carries PIE, full RELRO+BIND_NOW, stack canary, and FORTIFY _chk
symbols. The existing _FORTIFY_SOURCE / fstack-protector tokens remain in
CMakeLists so verify_trustworthiness.sh continues to match.
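
One way to sanity-check from inside a translation unit that the probed flags actually took effect is to inspect the predefined macros they set. The helper names below are hypothetical and not part of the project. Note that `-fstack-clash-protection` defines no macro, so it is not covered, and `_FORTIFY_SOURCE` only has an effect when optimization is enabled.

```cpp
#include <cassert>
#include <string>

// Returns the _FORTIFY_SOURCE level seen by this translation unit, or 0.
inline int fortify_level() {
#if defined(_FORTIFY_SOURCE)
    return _FORTIFY_SOURCE + 0;
#else
    return 0;
#endif
}

// True when libstdc++ precondition checks were requested.
inline bool glibcxx_assertions_enabled() {
#if defined(_GLIBCXX_ASSERTIONS)
    return true;
#else
    return false;
#endif
}

// True when GCC or Clang compiled this unit with a stack-protector option.
inline bool stack_protector_enabled() {
#if defined(__SSP__) || defined(__SSP_STRONG__) || defined(__SSP_ALL__)
    return true;
#else
    return false;
#endif
}

// True when this unit was compiled as position-independent executable code.
inline bool built_as_pie() {
#if defined(__PIE__) || defined(__pie__)
    return true;
#else
    return false;
#endif
}

inline std::string describe_hardening() {
    return "fortify=" + std::to_string(fortify_level()) +
           " glibcxx_assertions=" + (glibcxx_assertions_enabled() ? "on" : "off") +
           " stack_protector=" + (stack_protector_enabled() ? "on" : "off") +
           " pie=" + (built_as_pie() ? "on" : "off");
}

inline void demo_hardening_report() {
    assert(describe_hardening().rfind("fortify=", 0) == 0);
}
```

This complements, rather than replaces, inspecting the linked binary with readelf, since RELRO and BIND_NOW are link-time properties that no macro reflects.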

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds free, in-repo continuous fuzzing of the userspace input-parsing surface
(policy parser, signed-bundle parser, network rules, path validation, event
decoder) using the existing libFuzzer harnesses:

- .clusterfuzzlite/{Dockerfile,build.sh,project.yaml}: standard OSS-Fuzz build
  contract; SKIP_BPF_BUILD=ON since fuzzers target userspace; ships checked-in
  seed corpora as <fuzzer>_seed_corpus.zip.
- .github/workflows/cflite-pr.yml: diff-based (code-change) PR fuzzing for
  address + undefined sanitizers; injection-safe (no run: steps, no untrusted
  event input).
- CMakeLists.txt: the ENABLE_FUZZING block now honors $LIB_FUZZING_ENGINE so
  OSS-Fuzz/CFLite supply the engine+sanitizer, falling back to local
  -fsanitize=fuzzer,address; deduped the five harness definitions into a loop.
  Hardening flags are skipped under ENABLE_FUZZING so they don't interfere with
  sanitizer instrumentation.

Verified locally: all 5 fuzzers build with clang; fuzz_policy runs clean on its
seed corpus. Normal build still hardened; full suite 291/292 (the 1 is the
intentional 90-day vendored-crypto review gate). The CFLite container path is
exercised by the new workflow when this PR opens.
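
For readers unfamiliar with the harness shape, a libFuzzer target is a single `LLVMFuzzerTestOneInput` entry point that feeds arbitrary bytes to the code under test. The sketch below uses a made-up `parse_rules` key=value parser as the target; it is not one of the project's five harnesses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical parser under test: splits newline-separated "key=value" lines.
struct Rule {
    std::string key;
    std::string value;
};

inline std::vector<Rule> parse_rules(std::string_view text) {
    std::vector<Rule> out;
    while (!text.empty()) {
        const std::size_t eol = text.find('\n');
        const std::string_view line = text.substr(0, eol);
        text.remove_prefix(eol == std::string_view::npos ? text.size() : eol + 1);
        const std::size_t eq = line.find('=');
        if (eq == std::string_view::npos || eq == 0) continue;  // skip malformed lines
        out.push_back({std::string(line.substr(0, eq)), std::string(line.substr(eq + 1))});
    }
    return out;
}

// libFuzzer entry point: must tolerate any byte sequence without crashing.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
    const std::string_view input(reinterpret_cast<const char*>(data), size);
    (void)parse_rules(input);
    return 0;  // libFuzzer expects 0 from a normal run
}

inline void demo_fuzz_target() {
    assert(parse_rules("a=1\nbad\nb=2").size() == 2);
    assert(parse_rules("=x").empty());
}
```

Built locally with clang and `-fsanitize=fuzzer,address`, the fuzzing engine supplies `main`; under OSS-Fuzz or ClusterFuzzLite the engine comes from `$LIB_FUZZING_ENGINE` instead, as the CMake change above describes.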

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…mbol

detect_missing_optional_lsm_hooks() looked up the BARE hook name (e.g.
"socket_connect", "bprm_check_security") as a BTF_KIND_FUNC. BPF-LSM attach
points are exposed in vmlinux BTF as `bpf_lsm_<hook>` (e.g.
`bpf_lsm_socket_connect`) — the bare name never exists — so the lookup always
failed and every optional LSM hook (socket_connect/bind/listen/accept/sendmsg,
bprm_check_security, file_mmap) was marked "missing" and disabled via
set_autoload(false). The result: network deny, exec-identity, and mmap
enforcement were silently downgraded, and a fail-closed network policy made the
daemon exit at the enforce gate. (File deny survived because file_open /
inode_permission are required hooks, not in the optional set.)

This matches the symbol the `probe` command and the kernel already use.

Verified on Linux 6.17 (bpf-lsm enabled): before, `run --enforce` with a
network deny policy logged "Disabling optional LSM program ... socket_connect"
and exited fail-closed; after, socket_connect attaches, the daemon stays in
ENFORCE, a connect() to a denied IP returns -EPERM (net_connect_block,
action=BLOCK), and a connect to an allowed IP is unaffected.
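
The name mapping at the heart of the fix can be sketched without libbpf by injecting the lookup as a predicate. Everything here is illustrative: `lsm_btf_symbol`, `detect_missing_hooks`, and `fake_kernel_has` are hypothetical names, and a real implementation would answer the predicate by querying vmlinux BTF (for example with libbpf's `btf__find_by_name_kind()` and `BTF_KIND_FUNC`). The `file_mmap` exception is the one noted in this PR's later merge commit.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <string_view>
#include <vector>

// Maps an LSM hook name to the BTF FUNC symbol expected in vmlinux BTF.
inline std::string lsm_btf_symbol(std::string_view hook) {
    // Exception called out in the merge commit: file_mmap -> bpf_lsm_mmap_file.
    if (hook == "file_mmap") return "bpf_lsm_mmap_file";
    return "bpf_lsm_" + std::string(hook);
}

// Predicate answering "does this FUNC symbol exist in BTF?" (hypothetical seam).
using BtfFuncLookup = std::function<bool(const std::string&)>;

// Returns the optional hooks whose prefixed symbol is absent.
inline std::vector<std::string> detect_missing_hooks(
    const std::vector<std::string>& optional_hooks, const BtfFuncLookup& has_func) {
    std::vector<std::string> missing;
    for (const std::string& hook : optional_hooks) {
        if (!has_func(lsm_btf_symbol(hook))) missing.push_back(hook);
    }
    return missing;
}

// Fake kernel symbol table used only for this demonstration.
inline bool fake_kernel_has(const std::string& symbol) {
    static const std::set<std::string> symbols = {"bpf_lsm_socket_connect",
                                                  "bpf_lsm_mmap_file"};
    return symbols.count(symbol) > 0;
}

inline void demo_lsm_detection() {
    // The bare name is never present, which is the bug described above.
    assert(!fake_kernel_has("socket_connect"));
    const auto missing = detect_missing_hooks(
        {"socket_connect", "file_mmap", "bprm_check_security"}, fake_kernel_has);
    assert(missing.size() == 1);
    assert(missing[0] == "bprm_check_security");
}
```

Injecting the lookup also makes the detection logic unit-testable on hosts without BTF, which is one way a bug of this kind could be caught before it reaches an enforcing deployment.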

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… kernels

Adds a Tier-3 enforcement path for hosts where BPF-LSM is unavailable and
connect() cannot be denied with -EPERM. Opt-in via `--enforce-fallback=signal`
(default off): a sys_enter_connect tracepoint matches the existing network deny
maps and, in enforce mode, terminates the offending process with
bpf_send_signal() (SIGKILL by default; honors SIGKILL escalation).

- bpf: new handle_tp_connect tracepoint; reuses the deny_ipv4/ipv6/cidr/port/
  ip_port + cgroup match helpers. Protocol-agnostic (socket not resolvable at
  syscall entry). Inert unless agent_cfg.signal_fallback_enforce is set.
- config: signal_fallback_enforce reuses one reserved byte of agent_config —
  size/offsets unchanged (static_assert(sizeof==48) still holds), mirrored in
  userspace types.hpp.
- userspace: --enforce-fallback=signal|off threaded via a defaulted daemon_run
  param (no churn to other callers); tracepoint attached as optional with its
  link tracked in state.links (no BpfState field added).
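
Reusing a reserved byte without disturbing the layout can be checked at compile time. The two structs below are hypothetical 48-byte layouts invented for illustration; only the name `signal_fallback_enforce` and the 48-byte size come from the commit message, and the real `agent_config` fields are not shown in this PR text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout before the change (field names are made up).
struct AgentConfigBefore {
    std::uint64_t field_a;
    std::uint64_t field_b;
    std::uint32_t field_c;
    std::uint8_t audit_only;
    std::uint8_t reserved[27];
};

// Same layout with the first reserved byte given a name.
struct AgentConfigAfter {
    std::uint64_t field_a;
    std::uint64_t field_b;
    std::uint32_t field_c;
    std::uint8_t audit_only;
    std::uint8_t signal_fallback_enforce;  // takes over reserved[0]
    std::uint8_t reserved[26];
};

// The size is pinned so BPF and userspace keep agreeing on the struct.
static_assert(sizeof(AgentConfigBefore) == 48, "layout must stay 48 bytes");
static_assert(sizeof(AgentConfigAfter) == 48, "layout must stay 48 bytes");

// Existing fields keep their offsets; the new flag sits where reserved[0] was.
static_assert(offsetof(AgentConfigAfter, audit_only) ==
                  offsetof(AgentConfigBefore, audit_only),
              "existing field moved");
static_assert(offsetof(AgentConfigAfter, signal_fallback_enforce) ==
                  offsetof(AgentConfigBefore, reserved),
              "flag must occupy the first reserved byte");

inline std::size_t fallback_flag_offset() {
    return offsetof(AgentConfigAfter, signal_fallback_enforce);
}

inline void demo_layout() {
    assert(fallback_flag_offset() == 21);
}
```

Because a zero-initialized reserved byte reads as "off", an older userspace writing the old layout leaves the fallback disabled, which is consistent with the default-off behavior described above.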

Verified on Linux 6.17: handle_tp_connect passes the verifier and attaches; a
connect() to a denied IP with --enforce-signal=none is killed by SIGKILL
(net_connect_block action=KILL, protocol=0 from this tracepoint, distinct from
the LSM hook's action=BLOCK), while an allowed IP is unaffected. Full suite
292/293 (the 1 is the 90-day vendored-crypto review gate).

Caveat (documented in GUARANTEES.md): on genuinely LSM-absent hosts the
enforce-gate still treats the missing LSM connect hook as a degradation; wiring
the gate to accept signal-fallback as primary enforcement there is a follow-up.
Today this is verified as opt-in defense-in-depth alongside LSM enforcement.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 29, 2026 20:05

Copilot AI (Contributor) left a comment

Pull request overview

This PR hardens userspace build/logging/fuzzing posture, corrects optional BPF-LSM hook detection, and adds an opt-in tracepoint signal fallback for network connect enforcement.

Changes:

  • Adds logger const char* field handling with regression tests and adjusts policy generation bump timing.
  • Strengthens compiler hardening and introduces ClusterFuzzLite PR fuzzing infrastructure.
  • Updates BPF/daemon/CLI/config plumbing for optional signal fallback and revises enforcement documentation.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test_logging.cpp | Adds structured logger overload regression tests. |
| src/logging.hpp | Adds const char* field overload. |
| src/policy_runtime.cpp | Moves generation bump to the direct-apply mutation window. |
| src/types.hpp | Adds signal_fallback_enforce to userspace agent config layout. |
| bpf/aegis_common.h | Mirrors the new BPF agent config field/default. |
| src/daemon.hpp | Extends daemon_run with fallback flag. |
| src/daemon.cpp | Writes fallback enablement into agent config. |
| src/cli_run.cpp | Parses --enforce-fallback. |
| src/bpf_ops.cpp | Changes optional LSM BTF detection to prefixed symbols. |
| src/bpf_attach.cpp | Attaches the signal-fallback tracepoint program. |
| bpf/aegis_net.bpf.h | Adds sys_enter_connect signal-fallback enforcement logic. |
| CMakeLists.txt | Adds hardening probes, logging test, CLI test, and fuzz target loop. |
| .github/workflows/cflite-pr.yml | Adds ClusterFuzzLite PR workflow. |
| .clusterfuzzlite/project.yaml | Defines CFLite project metadata. |
| .clusterfuzzlite/Dockerfile | Defines CFLite/OSS-Fuzz build image. |
| .clusterfuzzlite/build.sh | Builds and exports fuzzers for CFLite. |
| README.md | Aligns enforcement and hardening claims. |
| docs/GUARANTEES.md | Documents async OverlayFS and signal-fallback guarantees. |
| docs/MEMORY_SAFETY.md | Adds memory-safety posture documentation. |

Comment thread src/bpf_ops.cpp Outdated
Comment thread .github/workflows/cflite-pr.yml
Comment thread bpf/aegis_net.bpf.h

Resolve conflicts from 147 upstream commits:
- src/bpf_ops.cpp: upstream independently fixed the same bare-vs-bpf_lsm_ BTF
  symbol bug via a kOptionalLsmHooks{hook_name, btf_symbol} table; took
  upstream's version (more precise — e.g. file_mmap -> bpf_lsm_mmap_file) and
  dropped my now-redundant fix.
- src/daemon.cpp + src/cli_run.cpp: combined upstream's enable_cap_drop param
  and event-dedup wiring with this branch's trailing enforce_fallback_signal
  daemon_run param (matches the auto-merged daemon.hpp signature).

Verified: build clean, full suite 386/386 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ErenAri merged commit 8e026ce into main May 29, 2026
39 of 45 checks passed

ErenAri added a commit that referenced this pull request May 30, 2026
…mpiles

The ClusterFuzzLite PR checks (added in #179) failed building aegisbpf_lib in the
OSS-Fuzz base image:

    /usr/include/bpf/bpf.h:252:6: error: ISO C++ forbids forward references to
    'enum' types

The base image's system libbpf-dev is too old; its <bpf/bpf.h> forward-declares
an enum, which clang++ rejects under -std=gnu++20. (Local builds don't hit this
because the host libbpf is newer.)
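
The language rule behind that diagnostic can be shown in a few lines. The enum and function names below are hypothetical and unrelated to libbpf's actual declarations; the point is only that C permits a bare `enum name;` as an extension in some compilers, while ISO C++ requires a fixed underlying type for an enum declared before its definition.

```cpp
#include <cassert>

// enum legacy_kind;        // ill-formed in ISO C++: underlying type unknown
enum legacy_kind : int;     // OK since C++11: opaque declaration with a fixed type

// The opaque declaration is enough to use the type in a function signature.
int kind_to_int(legacy_kind kind);

// The full definition must repeat the same underlying type.
enum legacy_kind : int { KIND_A = 1, KIND_B = 2 };

int kind_to_int(legacy_kind kind) { return static_cast<int>(kind); }

inline void demo_opaque_enum() {
    assert(kind_to_int(KIND_B) == 2);
}
```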

Fix: pre-fetch libbpf 1.6.1 source in the Dockerfile (network is available during
image build) and have build.sh build + statically link it via -DSTATIC_LIBBPF=ON
+ -DFETCHCONTENT_SOURCE_DIR_LIBBPF_SRC=/opt/libbpf-src. This gives the fuzzers a
modern, C++-clean, self-contained libbpf (no libbpf.so runtime dep for
check_build) and needs no network during the compile step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>