Fix binfmt_misc handler loss on peer distro termination (rebase of #14443) #40617

Open
benhillis wants to merge 7 commits into master from benhill/binfmt-13885-rebase

@benhillis
Member

Summary

Fixes the binfmt_misc handler loss when peer WSL distributions terminate. Without this fix, running a second systemd-enabled distro and shutting it down clears the kernel's global binfmt_misc table, breaking interop in all surviving distros ("Exec format error" on cmd.exe, mono, etc.).

This is a rebase of @yeelam-gordon's #14443 (221 commits behind master) plus a hardened test. The fix commits are unchanged — full credit to Gordon.

Fixes #13885.

PR Checklist

  • Tests added: UnitTests::BinfmtSurvivesDistroTermination
  • Tests pass locally (bin\x64\Debug\test.bat /name:UnitTests::UnitTests::BinfmtSurvivesDistroTermination)
  • Documentation: N/A
  • Localization: N/A

Detailed Description

Root cause

binfmt_misc entries are keyed by user namespace, not by mount namespace. WSL peer distros clone with CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUTS | SIGCHLD — no CLONE_NEWUSER — so they share the root user namespace and therefore the single global binfmt entry table.
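
The flag argument above can be checked mechanically. The following is a minimal sketch, not code from this PR: `kPeerDistroCloneFlags` and `StaysInParentUserNamespace` are hypothetical names, and the flag set is copied from the description rather than from the WSL sources.

```cpp
#include <cassert>
#include <sched.h>   // CLONE_* flags (exposed by glibc under _GNU_SOURCE, which g++ defines by default)
#include <signal.h>  // SIGCHLD

// Flag set quoted in the description for a peer distro (assumed, not read from the WSL sources).
constexpr int kPeerDistroCloneFlags = CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUTS | SIGCHLD;

// Hypothetical helper: a child created without CLONE_NEWUSER stays in the
// caller's user namespace, and so sees the same binfmt_misc entries.
constexpr bool StaysInParentUserNamespace(int flags)
{
    return (flags & CLONE_NEWUSER) == 0;
}

static_assert(StaysInParentUserNamespace(kPeerDistroCloneFlags),
              "peer distros are expected to share the root user namespace");
```

Adding `CLONE_NEWUSER` to the flag set would flip the result, which is the property the root-cause analysis relies on.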

When a peer distro shuts down, its systemd-binfmt.service ExecStop= runs systemd-binfmt --unregister, which writes -1 to /proc/sys/fs/binfmt_misc/status. The kernel handler for that write (parse_command()) calls clear_entries() — wiping the global WSLInterop entry, not just the peer's view of it.
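
To make the failure mode concrete, here is a toy in-memory model of the behavior described above. It is only an illustration under stated assumptions: `BinfmtTable`, `Distro`, and `SurvivorKeepsInterop` are hypothetical names, and nothing here touches the real kernel interface.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for the binfmt_misc entry table owned by one user namespace.
struct BinfmtTable
{
    std::map<std::string, std::string> entries;

    // Mimics a write to the status file: "-1" removes every entry.
    void WriteStatus(const std::string& command)
    {
        if (command == "-1")
        {
            entries.clear();
        }
    }
};

// Toy distro: refers to whichever table its user namespace owns.
struct Distro
{
    std::shared_ptr<BinfmtTable> table;

    bool HasHandler(const std::string& name) const
    {
        return table->entries.count(name) != 0;
    }
};

// Returns true if the primary distro still has WSLInterop after the peer
// runs the equivalent of `systemd-binfmt --unregister` on shutdown.
bool SurvivorKeepsInterop(bool peerHasOwnUserNamespace)
{
    auto rootTable = std::make_shared<BinfmtTable>();
    rootTable->entries["WSLInterop"] = "/init";

    Distro primary{rootTable};
    Distro peer{peerHasOwnUserNamespace ? std::make_shared<BinfmtTable>() : rootTable};

    peer.table->WriteStatus("-1");
    return primary.HasHandler("WSLInterop");
}
```

With a shared table (`peerHasOwnUserNamespace == false`) the primary loses its handler; with a separate table it does not. The model deliberately ignores mount namespaces, since the description states they do not affect which table is used.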

Fix (Gordon's commits, replayed)

  1. fix(init): prevent binfmt_misc handler loss on distro termination — writes /run/binfmt.d/WSLInterop.conf plus systemd drop-ins that neutralize systemd-binfmt --unregister and re-register WSLInterop after any clear.
  2. fix(init): split binfmt registration macros for WSL1/WSL2 compatibility — adds BINFMT_INTEROP_REGISTRATION_STRING_VM with the F flag (:FP) so the kernel preloads /init and the interop path survives clear_entries()-and-restore races.
  3. fix(init): add ExecStartPost to ensure WSL binfmt handler priority
  4. fix(init): clean up stale binfmt config when protectBinfmt is disabled
  5. fix(init): log error on binfmt config cleanup failure
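
For readers unfamiliar with the registration string that commit 2 changes, the sketch below builds one in the kernel's `:name:type:offset:magic:mask:interpreter:flags` format. `MakeMagicRegistration` is a hypothetical helper, not the macro from binfmt.h, and the `MZ` magic and `/init` interpreter used in the usage note are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Builds a magic-based ('M') binfmt_misc registration line with empty
// offset and mask fields: :name:M::magic::interpreter:flags
std::string MakeMagicRegistration(
    const std::string& name,
    const std::string& magic,
    const std::string& interpreter,
    const std::string& flags)
{
    return ":" + name + ":M::" + magic + "::" + interpreter + ":" + flags;
}
```

Under those assumptions, `MakeMagicRegistration("WSLInterop", "MZ", "/init", "P")` corresponds to the `:P` variant kept for WSL1, and passing `"FP"` corresponds to the `_VM` variant, which differs only in the trailing flags field.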

Test

BinfmtSurvivesDistroTermination (TAEF, WSL2_TEST_METHOD):

  1. Enable systemd on the primary test distro, register a peer test distro with systemd enabled, then terminate so the configuration is applied on the next start.
  2. Assert cmd.exe /c "echo alive" works in both primary and peer.
  3. wsl --terminate <peer> — triggers the unregister-on-shutdown path that previously wiped the global entry.
  4. Assert cmd.exe /c "echo alive" still works in primary.
  5. Grep /proc/sys/fs/binfmt_misc/WSLInterop for ^flags:.*F to catch a regression to :P-only registration.
  6. Cleanup: wsl --unregister <peer>.
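
The check in step 5 can be expressed without a regex. This is a rough equivalent of `^flags:.*F`, assuming the entry file lists its flags on a line starting with `flags:`. `HasFixBinaryFlag` is a hypothetical helper and is not taken from the test in this PR.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if the text of a binfmt_misc entry file has a line that
// starts with "flags:" and contains 'F' after that prefix.
bool HasFixBinaryFlag(const std::string& entryText)
{
    const std::string prefix = "flags:";
    std::istringstream lines(entryText);
    std::string line;

    while (std::getline(lines, line))
    {
        if (line.compare(0, prefix.size(), prefix) == 0)
        {
            // Search only after the prefix so the check looks at the flag letters.
            return line.find('F', prefix.size()) != std::string::npos;
        }
    }

    return false;
}
```

A test could read /proc/sys/fs/binfmt_misc/WSLInterop inside the distro and pass the contents to this function; a `:P`-only registration would make it return false.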

Why this approach over MS_PRIVATE mount propagation

I initially tried marking /proc/sys/fs/binfmt_misc as MS_PRIVATE in peer mount namespaces (#40612, #40614). That approach cannot work because the data lives in the user namespace, not the mount namespace — see CI failures on those PRs. Closed in favor of this PR.

Validation Steps

cmake .
cmake --build . -- -m
bin\x64\Debug\test.bat /name:UnitTests::UnitTests::BinfmtSurvivesDistroTermination

Local run: Total=1, Passed=1, Failed=0.

Supersedes #14443. Closes #40612, closes #40614.

yeelam-gordon and others added 6 commits May 21, 2026 04:15
Fixes the issue where terminating a WSL2 distro with systemd clears
binfmt_misc handlers across all running distros. Changes:
- Add mount unit override to prevent binfmt_misc unmount during shutdown
- Fix FP vs P flag inconsistency in WSLInterop registration
- Use declarative /run/binfmt.d/ approach for handler persistence

Fixes #13885

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce BINFMT_INTEROP_REGISTRATION_STRING_VM with ':FP' flags for
WSL2-only paths (mini_init, systemd generator), while keeping the
original BINFMT_INTEROP_REGISTRATION_STRING with ':P' for WSL1 (lxcore)
which does not support the 'F' (fix-binary) flag.

- binfmt.h: Add _VM variant macro with ':FP', restore base macro to ':P'
- main.cpp: Use _VM macro for mini_init registration (WSL2 only)
- init.cpp: Use _VM macro for /run/binfmt.d/ config (systemd, WSL2 only)
- config.cpp: Unchanged, continues using ':P' for WSL1 registration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The pure declarative approach via /run/binfmt.d/ does not handle conflicting
binfmt configs. When a distro installs its own binfmt handler with the same
name (WSLInterop), systemd-binfmt processes files alphabetically, potentially
letting the conflicting handler win.

Add ExecStartPost to the systemd-binfmt service override that forcefully
unregisters any conflicting handler and re-registers WSL's after
systemd-binfmt's normal ExecStart completes. This combines declarative config
for proper re-registration with imperative override for guaranteed priority.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When protectBinfmt or interopEnabled is turned off, remove the
/run/binfmt.d/WSLInterop.conf file from any previous run to ensure
config changes take effect without requiring a full VM restart.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tolerate ENOENT (file doesn't exist) but log unexpected errors
when removing stale /run/binfmt.d/WSLInterop.conf, matching the
pattern used in timezone.cpp and plan9.cpp.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
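
The ENOENT-tolerant cleanup described in the commit message above follows a common pattern, sketched here under stated assumptions. `RemoveStaleFile` is a hypothetical name, and logging to stderr stands in for whatever logging facility init.cpp actually uses.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <unistd.h>

// Removes `path` if present. A missing file is not an error; any other
// failure is logged. Returns true when the file is gone afterwards.
bool RemoveStaleFile(const char* path)
{
    if (unlink(path) == 0 || errno == ENOENT)
    {
        return true;
    }

    std::fprintf(stderr, "unlink(%s) failed: %s\n", path, std::strerror(errno));
    return false;
}
```

Calling it twice on the same path succeeds both times, so the cleanup can run on every boot regardless of whether a previous run wrote the file.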
Validates the core scenario this PR fixes: terminating one WSL2
distro with systemd doesn't break binfmt_misc interop in another
running distro. Imports a second distro, enables systemd on both,
terminates one, and verifies cmd.exe interop + binfmt handler
registration still works in the remaining distro.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 04:43
@benhillis benhillis requested a review from a team as a code owner May 21, 2026 04:43
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens WSL’s protection of the global binfmt_misc WSLInterop handler so Windows interop continues to function in surviving WSL2 distros when a peer systemd-enabled distro terminates (preventing the “Exec format error” regression described in #13885). It also adds/strengthens TAEF coverage to validate the generated systemd/binfmt configuration and the multi-distro termination scenario.

Changes:

  • Generate /run/binfmt.d/WSLInterop.conf plus systemd generator drop-ins to prevent unregister-on-shutdown and to re-register WSLInterop after clears/restarts.
  • Split binfmt registration strings so VM/WSL2 uses :FP (fix-binary) while retaining a non-F variant for non-VM scenarios.
  • Add a new WSL2 regression test that imports a peer distro, enables systemd, terminates it, and verifies interop + F flag persistence.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test/windows/UnitTests.cpp | Adds validation of generated binfmt/systemd files and introduces the peer-distro termination regression test. |
| src/linux/init/main.cpp | Switches binfmt registration constant to use the VM-specific :FP macro. |
| src/linux/init/init.cpp | Writes binfmt config + systemd drop-ins for protecting/re-registering WSLInterop; cleans up stale config when disabled. |
| src/linux/init/binfmt.h | Introduces VM-specific registration macro (:FP) alongside the existing :P string. |

Comment thread src/linux/init/init.cpp
Comment on lines +386 to +411
const auto serviceOverrideContent = std::format(
R"(# Note: This file is generated by WSL to prevent distributions from removing the WSL binfmt entry on shutdown.
# To disable this unit, add the following to /etc/wsl.conf:
# [boot]
# protectBinfmt=false

[Service]
ExecStop=
ExecStartPost=/bin/sh -c '(echo -1 > {}/{} 2>/dev/null || true) ; echo "{}" > {}'
)",
BINFMT_MISC_MOUNT_TARGET,
LX_INIT_BINFMT_NAME,
BINFMT_INTEROP_REGISTRATION_STRING_VM(LX_INIT_BINFMT_NAME),
BINFMT_MISC_REGISTER_FILE);
constexpr auto* mountOverrideContent = R"(# Note: This file is generated by WSL to keep binfmt_misc mounted during shutdown.
# To disable this unit, add the following to /etc/wsl.conf:
# [boot]
# protectBinfmt=false

[Unit]
DefaultDependencies=no
Before=umount.target

[Mount]
Options=nosuid,nodev,noexec
)";

Development

Successfully merging this pull request may close these issues.

binfmt config cleared on all distro when distro with systemd is terminated

3 participants