Skip to content

fix(wireplumber): reconnect when PipeWire/WirePlumber restarts (#2882)#5168

Merged
Alexays merged 3 commits into
masterfrom
fix-2882
Jul 5, 2026
Merged

fix(wireplumber): reconnect when PipeWire/WirePlumber restarts (#2882)#5168
Alexays merged 3 commits into
masterfrom
fix-2882

Conversation

@Alexays

@Alexays Alexays commented Jul 4, 2026

Copy link
Copy Markdown
Owner

Please review carefully — this refactors the wireplumber module's connection lifecycle, and I couldn't runtime-test it.

wireplumber: no reconnect when PipeWire/WirePlumber restarts (#2882). The module's connection was one-shot: the constructor called wp_core_connect() once, connected only to the object manager's installed signal, and had no WpCore disconnected handler or connect-failure recovery. So when PipeWire/WirePlumber restarted or crashed, the core silently disconnected, om_/mixer_api_/def_nodes_api_ went dead, and the module stayed stale/blank until Waybar was restarted.

This splits the one-shot init into a reusable setupConnection() / teardownConnection() pair (used by both the constructor and reconnect), connects the WpCore disconnected signal, and adds a bounded Glib::signal_timeout (2s, no busy loop) that tears down the dead connection and rebuilds it, retrying until PipeWire is back and re-arming on a later disconnect. It preserves the recent async-callback use-after-free guard (isModuleAlive()/registry) and cancels the reconnect timer + tears down cleanly in the destructor; it also clears the cached default node-name out-params on teardown to avoid a per-reconnect leak. handleScroll already no-ops while mixer_api_ is null, so the reconnect window is safe.

Previously the wireplumber module connected to PipeWire once in its
constructor and had no handling for the connection being lost. When
PipeWire or the wireplumber service restarted (or crashed), the module
went stale/blank and never recovered until Waybar itself was restarted.

Connect to the WpCore "disconnected" signal and, on disconnect, schedule
a bounded main-loop retry (Glib::signal_timeout) that tears down the now
invalid core/object-manager/mixer-api references and rebuilds the whole
connection from scratch, re-running the async API and object-manager
setup. Connection setup/teardown is factored into setupConnection() and
teardownConnection() so startup and reconnect share one code path.

The reconnect timer is cancelled in the destructor and the existing
isModuleAlive() registry guard still protects in-flight async callbacks,
so teardown during a pending reconnect stays safe.

Fixes #2882.
spdlog::error("[{}]: Could not connect to PipeWire: '{}'", name_, type_);
throw std::runtime_error("Could not connect to PipeWire\n");
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): onMixerApiLoaded (unchanged by this diff, but now reachable repeatedly) ends with:

self->event_box_.add_events(Gdk::SCROLL_MASK | Gdk::SMOOTH_SCROLL_MASK);
self->event_box_.signal_scroll_event().connect(sigc::mem_fun(*self, &Wireplumber::handleScroll));

Before this PR that ran once, from the constructor. Now setupConnection() re-runs the same async chain (asyncLoadRequiredApiModules → ... → onMixerApiLoaded) on every successful reconnect, and sigc::signal::connect doesn't dedupe — event_box_ isn't recreated by setupConnection()/teardownConnection(), so a duplicate scroll handler is added per reconnect.

Behavior stays correct: GTK's scroll-event uses the boolean handled accumulator, so emission stops at the first handler returning TRUE and the duplicates never run — they just accumulate as dead connections (plus a redundant add_events call) for the life of the module.

Since this wiring only needs to happen once per module instance, move it here and delete it from onMixerApiLoaded (linked lines above). handleScroll already no-ops safely while mixer_api_ is null, so wiring it before the first successful connect is safe.

Suggested change
}
event_box_.add_events(Gdk::SCROLL_MASK | Gdk::SMOOTH_SCROLL_MASK);
event_box_.signal_scroll_event().connect(sigc::mem_fun(*this, &Wireplumber::handleScroll));
}

Comment on lines +146 to +156
bool waybar::modules::Wireplumber::onReconnectTimeout() {
teardownConnection();
spdlog::info("[{}]: attempting to reconnect to PipeWire...", name_);
if (setupConnection()) {
spdlog::info("[{}]: reconnected to PipeWire", name_);
return false;
}
teardownConnection();
spdlog::debug("[{}]: reconnect failed; retrying in {} ms", name_, kReconnectIntervalMs);
return true;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): stale async completions from a superseded connection can corrupt the new generation's state

onReconnectTimeout rebuilds wp_core_/om_/apis_/pending_plugins_ in place on the same self, but wp_core_load_component() / wp_object_activate() are called with no GCancellable, and isModuleAlive() only checks that the C++ object still exists — not which connection generation a callback belongs to. If PipeWire drops again while the async chain (onDefaultNodesApiLoadedonMixerApiLoadedonPluginActivated) is still in flight — likely with a crash-looping service, the exact case this PR targets — the stale callback fires against the rebuilt connection's fields:

if (--self->pending_plugins_ == 0) {
wp_core_install_object_manager(self->wp_core_, self->om_);
}

A stray decrement corrupts the new generation's pending_plugins_ (never reaches 0, or hits it early), and wp_core_install_object_manager can run on the new om_/wp_core_ out of sequence — reproducing the stale/blank symptom of #2882 via a second disconnect during the reconnect window.

Fix: tag each generation (epoch counter bumped in setupConnection(), captured and compared in the callbacks), or pass a per-generation GCancellable cancelled in teardownConnection(), so completions from a torn-down generation are no-ops.

…e scroll once

Addresses review on #5168:

- Generational aliasing (blocking): setupConnection()/onReconnectTimeout()
  rebuild wp_core_/om_/pending_plugins_ in place on the same self, but the
  async load/activate callbacks carried no generation, and isModuleAlive()
  only proves self still exists. If PipeWire dropped again while a previous
  connection's async chain was still in flight, a stale completion would run
  against the rebuilt connection (a stray --pending_plugins_, an out-of-order
  install_object_manager), re-creating #2882's stale/blank state. Each async
  call now carries an AsyncCall{self, generation}; connection_generation_ is
  bumped in setupConnection(), and every callback drops out when its
  generation no longer matches (checked after isModuleAlive short-circuits).

- Duplicate scroll handlers: onMixerApiLoaded re-runs on every reconnect and
  connected a new scroll handler each time (dead but accumulating). Moved the
  one-time wiring to the constructor; handleScroll no-ops while mixer_api_ is
  null, so wiring it before the first connect is safe.
@Alexays

Alexays commented Jul 5, 2026

Copy link
Copy Markdown
Owner Author

Thanks for the thorough review — both are addressed in the latest push.

Generational aliasing (blocking): you're right that isModuleAlive() only proves self still exists, not that a completion belongs to the current connection. Each async call now carries an AsyncCall{self, generation} as its user_data; connection_generation_ is bumped in setupConnection(), and every one of the three async callbacks (onDefaultNodesApiLoaded, onMixerApiLoaded, onPluginActivated) drops out when its captured generation no longer matches — checked after the isModuleAlive short-circuit so a freed self is never dereferenced. A stale completion from a torn-down connection can no longer touch the rebuilt one's pending_plugins_/om_, so the second-disconnect-during-reconnect window you described is a no-op now.

Duplicate scroll handler (non-blocking): moved the add_events/signal_scroll_event().connect to the constructor and removed it from onMixerApiLoaded, since handleScroll already no-ops while mixer_api_ is null.

…OB write)

g_variant_lookup with the "b" format writes a gboolean (gint, 4 bytes),
but muted_ and source_muted_ are C++ bool members (1 byte). Passing their
addresses caused a 3-byte out-of-bounds write past the member (undefined
behavior). Read into a gboolean temporary and assign back to the bool,
preserving the prior value when "mute" is absent.
@Alexays Alexays merged commit 6fc2304 into master Jul 5, 2026
17 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants