feat: lazy-start picamera2 encoders and camera to save idle CPU#146
Conversation
📝 WalkthroughWalkthroughAdds CameraSession and LazyEncoder reference-counted wrappers, integrates them in CSI to defer encoder startup, updates JPEG streaming/snapshot paths to acquire/release MJPEG encoders, and coordinates H264 encoder acquire/release around WebRTC peer lifecycle and error paths. ChangesLazy Encoder Lifecycle Management
Sequence Diagram(s)No additional diagrams beyond those in the review stack artifact. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@spyglass/camera/lazy_encoder.py`:
- Around line 76-85: In release(), handle exceptions from
self._picam2.stop_encoder so that if _refs reaches zero you always clear
self._encoder and release self._session even when stop_encoder fails: wrap the
stop call in try/except/finally (or try/finally) around
self._picam2.stop_encoder(self._encoder) to ensure self._encoder = None and, if
self._session is not None, self._session.release() still run; keep the lock
around ref count manipulation but ensure exceptions from stop_encoder are
caught/handled so they don't prevent session release and future lazy starts.
- Around line 59-74: The acquire() method currently increments self._refs and
calls self._session.acquire() before the try, so if _encoder_factory() or
self._picam2.start_encoder(...) raises the ref stays incremented and the session
remains acquired; modify acquire() so that any exception after acquiring the
session also rolls back both self._refs and the session: ensure the except block
not only sets self._encoder = None and decrements self._refs, but also calls
self._session.release() when self._session is not None (or move the
self._session.acquire() into the try so the same rollback logic applies);
reference symbols: LazyEncoder.acquire, self._refs,
self._session.acquire/release, self._encoder_factory,
self._picam2.start_encoder, and self._encoder.
In `@spyglass/server/webrtc_whep.py`:
- Around line 80-121: The peer and encoder cleanup can leak because
RTCPeerConnection pc is placed in pcs before h264_encoder.acquire() and SDP
header/write steps occur outside the protected block; move the
h264_encoder.acquire() call into the main try block before inserting
pcs[str(secret)] = pc, perform the SDP construction and handler header/wfile
writes (send_default_headers, handler.send_header(...), handler.end_headers(),
handler.wfile.write(...)) inside that same try after setting
pc.localDescription, and keep the existing except block that calls
pcs.pop(str(secret), None), _release_encoder_once(), and await pc.close() so any
failure during acquire or SDP write removes the peer and releases the encoder.
Ensure references are to h264_encoder, pcs, _release_encoder_once, pc, and
handler as in the diff.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 62338d05-7b57-4c51-8e37-523017e959f8
📒 Files selected for processing (4)
spyglass/camera/csi.pyspyglass/camera/lazy_encoder.pyspyglass/server/jpeg.pyspyglass/server/webrtc_whep.py
8fae89d to
963b7cd
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
spyglass/camera/csi.py (1)
72-82: 💤 Low valueConsider logging suppressed exceptions at debug level.
The defensive
try/except/passpattern is appropriate for shutdown paths where encoders may already be stopped. However, silently swallowing exceptions can make debugging difficult if unexpected failures occur during shutdown.♻️ Optional: Add debug logging for suppressed exceptions
+from spyglass import logger + def stop(self): # Encoders / camera may already be stopped (no consumers); guard the # systemd shutdown path so we don't error out on that case. try: self.picam2.stop_encoder() - except Exception: - pass + except Exception as e: + logger.debug("stop_encoder during shutdown: %s", e) try: self.picam2.stop() - except Exception: - pass + except Exception as e: + logger.debug("stop during shutdown: %s", e)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@spyglass/camera/csi.py` around lines 72 - 82, The stop method currently swallows all exceptions from self.picam2.stop_encoder() and self.picam2.stop(); change the except blocks to log the caught exceptions at debug level instead of/passingly ignoring them so we retain shutdown resilience but surface diagnostic info. Specifically, in the stop method surrounding self.picam2.stop_encoder() and self.picam2.stop(), catch Exception as e and call the module/class logger (or processLogger if available) to log a debug message including context like "picam2.stop_encoder failed" or "picam2.stop failed" and the exception details; keep the behavior of not re-raising so shutdown remains safe.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@spyglass/server/webrtc_whep.py`:
- Around line 80-88: The release path currently may call _release_encoder_once()
even if h264_encoder.acquire() failed, causing double-decrement of
LazyEncoder._refs; fix by tracking a successful_acquire boolean (e.g., acquired
= False) that you set to True only after h264_encoder.acquire() returns
successfully, and change _release_encoder_once() (or its caller) to call
h264_encoder.release() only when acquired is True (and still guard with
encoder_released to ensure single release). Update references around
h264_encoder.acquire(), _release_encoder_once(), and any exception handlers so
they check the acquired flag before invoking release().
---
Nitpick comments:
In `@spyglass/camera/csi.py`:
- Around line 72-82: The stop method currently swallows all exceptions from
self.picam2.stop_encoder() and self.picam2.stop(); change the except blocks to
log the caught exceptions at debug level instead of/passingly ignoring them so
we retain shutdown resilience but surface diagnostic info. Specifically, in the
stop method surrounding self.picam2.stop_encoder() and self.picam2.stop(), catch
Exception as e and call the module/class logger (or processLogger if available)
to log a debug message including context like "picam2.stop_encoder failed" or
"picam2.stop failed" and the exception details; keep the behavior of not
re-raising so shutdown remains safe.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 11a8fd51-fd30-4e6d-b269-72f60f433d30
📒 Files selected for processing (4)
spyglass/camera/csi.pyspyglass/camera/lazy_encoder.pyspyglass/server/jpeg.pyspyglass/server/webrtc_whep.py
Previously, csi.py started both the MJPEG encoder and (when WebRTC was
enabled) the H264 encoder unconditionally at server startup, regardless
of whether any client was connected. Both encoders ran continuously,
burning CPU on every frame even with zero consumers.
This change introduces LazyEncoder, a small ref-counted wrapper around
picamera2's start_encoder/stop_encoder. Encoders are now started on the
first consumer (MJPEG /stream or /snapshot, WebRTC WHEP POST) and
stopped when the last consumer disconnects.
Measured CPU impact on a Raspberry Pi 4B with a Camera Module 3
(hardware encoders enabled, sampled with `top`, single python process,
percentages are of one core; the kernel reports values >100% when the
process spans multiple cores):
Default config (640x480 @ 15 FPS):
Idle (no clients): ~13-15% -> ~5%
Live MJPEG / WebRTC client: unchanged
Heavier config (1920x1080 @ 30 FPS):
Idle (no clients): ~95-100% -> ~15%
Live WebRTC only: ~120-135% -> ~60%
Live MJPEG only: ~95-104% -> ~50-60%
The biggest wins come from two effects: idle is no longer paying for
either encoder, and a live WebRTC-only client no longer pays for the
unused MJPEG encoder running in parallel (and vice versa).
The remaining idle CPU is the picamera2 capture loop itself, which
still runs continuously; pausing the camera entirely is left for a
follow-up commit.
Implementation notes:
- LazyEncoder.acquire/release are protected by a threading.Lock; a
failed start_encoder rolls back the refcount so callers can retry.
- WebRTC release is wired to RTCPeerConnection 'closed' state with an
idempotent guard, plus a try/except around setup so a failure during
SDP exchange or ICE gathering does not leak an encoder.
- USB camera path is unaffected: it never used encoders, and the new
code uses getattr(handler, ..., None) so missing attributes fall
back to no-op.
All 129 existing unit tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Building on the previous LazyEncoder commit, this commit ref-counts the
Picamera2 instance itself: the camera only runs while at least one
encoder is active. When the last encoder is released, picam2.stop() is
called; on the next consumer, picam2.start() runs again.
Adds a small CameraSession wrapper next to LazyEncoder; each LazyEncoder
takes an optional session and acquires/releases it together with the
underlying encoder, so the camera is the union of the encoders' refs.
Measured CPU impact, applied on top of the previous commit (Raspberry
Pi 4B, Camera Module 3, hardware encoders, sampled with `top`):
Default config (640x480 @ 15 FPS):
Idle (no clients): ~5% -> ~0%
Live MJPEG / WebRTC client: unchanged
Heavier config (1920x1080 @ 30 FPS):
Idle (no clients): ~15% -> ~0%
Live WebRTC client: ~60% (unchanged)
Cumulative effect of both commits at 1080p30:
Idle: ~95-100% -> ~0%
Live WebRTC: ~120-135% -> ~60%
Cold-start latency on first connect (camera off -> on -> first JPEG):
~150-500 ms for /snapshot at 1080p30, dominated by libcamera sensor
init and AE/AWB warm-up. The first cold start after process boot is
typically slowest (~500 ms); subsequent cold starts are faster
(~150-200 ms) since some libcamera state appears to be retained.
There is no warm path while no consumers are active: every snapshot
released triggers a full camera stop.
Implementation notes:
- CameraSession.acquire/release are protected by a threading.Lock and
roll back the refcount if picam2.start() raises, mirroring the
LazyEncoder pattern.
- LazyEncoder.acquire takes the session ref before calling start_encoder
so the camera is always running when the encoder starts. release does
the inverse: stop_encoder first, then drop the session ref. If
start_encoder fails, the session ref is rolled back too.
- Lock acquisition order is consistent (encoder lock then session lock),
so two encoders sharing one session cannot deadlock.
- CSI.stop() now defensively calls picam2.stop_encoder() and
picam2.stop() under try/except, since at systemd shutdown the camera
may already be stopped (no consumers) and stop_encoder/stop would
otherwise raise.
- USB camera path is unaffected; only csi.py wires the session in.
All 129 existing unit tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
963b7cd to
7ac9e50
Compare
|
I'm fine with AI PRs, but next time please write it at the top as a Disclaimer. While reading this I already thought this might be a full AI PR resulting in some kind of annoyance, as I didn't see a disclaimer till the end.
Something about the statistics seem off. Those are just statistics and the fact is, that there is actually an issue with this. A delay of 150-500ms, or to be fair 150-200ms if the AI is correct after the first start, is a dealbreaker for the actual usecase of The Adaptive MJPEG issue can be solved with the The CPU usage at high res with 30FPS is really high and imo this FPS value is too high for a webcam stream that is intended for monitoring your printer. These copying actions result in the increased CPU usage if you start to watch the WebRTC stream. I will need to do quite some testing on this before we can release such a thing. Especially in regards of CPU spikes during encoder startup. If the CPU usage spikes we will create problems in combination with Klipper. I can most likely start testing in about 2-3 weeks. Till that we need to discuss the timelapse scenario with possible solutions on how to handle this without introducing any problems for current users. |
Adds two new options to control how long each encoder (and the
underlying camera, when no other encoder is active) keeps running
after the last consumer disconnects:
--mjpeg-linger-seconds / MJPEG_LINGER_SECONDS (default: -1)
--webrtc-linger-seconds / WEBRTC_LINGER_SECONDS (default: 5)
Semantics:
-1 keeps the encoder running once started; spyglass pre-warms it at
startup so the first consumer never pays cold-start latency.
0 stops the encoder immediately when the last consumer releases
(the behavior introduced in the first two commits of this branch).
> 0 stops the encoder N seconds after the last consumer releases; a
fresh acquire within the window cancels the pending stop.
The MJPEG default (-1) restores the upstream 'always on' idle behavior
so /snapshot use cases (notably timelapse) and the Mainsail Adaptive
MJPEG service keep working without paying cold-start latency on each
request. The WebRTC default (5s) keeps the bulk of the lazy CPU win
while bridging brief peer reconnects.
Implementation:
- LazyEncoder gains a linger_seconds ctor parameter and a shared
threading.Timer for the delayed stop. A monotonically increasing
_stop_token invalidates any stale timer callback that already raced
past Timer.cancel().
- _stop_now_locked centralizes the stop_encoder / session.release
sequence so both the immediate and lingered stop paths use the same
code with the same try/finally guarantees.
- CSI.start_and_run_server pre-warms each encoder for which linger < 0
by issuing a single never-released acquire() at startup.
- Camera.start_and_run_server abstract signature and the USB override
gain the new kwargs for compatibility; USB ignores them since it
does not use LazyEncoder.
Adds focused unit tests for LazyEncoder covering acquire/release
ref-counting, all three linger modes, timer cancellation on re-acquire,
session sharing across two encoders, and rollback on start_encoder /
stop_encoder failure.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks for the review. I've added a follow-up commit with per-encoder linger. Addresses the New flags
Three modes per encoder:
MJPEG defaults to Test coverage
On-device verificationVerified on a Pi 4B with Camera Module 3 at 1920×1080 @ 30 fps: idle CPU with MJPEG warm + WebRTC lazy matches the prior "always-running MJPEG" baseline, WebRTC encoder starts on first peer / stops after the linger window, and Orthogonal CPU win — see #147Looked beyond lazy encoders and found a larger lever: picamera2's default main-stream pixel format is
Split out as #147 since it stands alone on |
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
|
Thanks for the contribution |
|
Thanks for taking the time to review! |
Use of AI
This PR was written using GitHub Copilot CLI with Claude Opus 4.7. I understand different projects have different policies wrt to AI generated code. Do let me know if this is not ok for this project. From reviewing the code changes as well as the resulting CPU usage numbers and continued usability from mainsail / direct http consumption, I can confirm this is having the desired effect.
I do not have any prior experience on this repo so please do share feedback if anything was missed of if this is taking a completely wrong approach.
Summary
Spyglass currently starts the MJPEG encoder, the H264 encoder (when WebRTC is enabled), and the picamera2 capture loop unconditionally at server startup. They run continuously regardless of whether any client is connected. On a Pi 4B with a Camera Module 3 at 1920x1080 @ 30 FPS this costs ~95-100% of one CPU core at idle (and ~120-135% during a live WebRTC session, since the unused MJPEG encoder runs in parallel).
This PR adds reference-counted lazy start/stop:
feat: lazy-start picamera2 encoders to save idle CPU) — wrapsstart_encoder/stop_encoderin a smallLazyEncoder. MJPEG is acquired by/streamand/snapshot; H264 is acquired by the WHEPPOSTand released onRTCPeerConnectionclose.feat: stop the picamera2 capture loop when no consumers) — adds aCameraSessionthat does the same forpicam2.start()/picam2.stop(), so the camera itself stops when no encoders are active.Measured CPU impact
Raspberry Pi 4B, Camera Module 3, hardware encoders, sampled with `top` (percentages of one core; >100% means the process spans multiple cores):
The biggest wins come from two effects: idle no longer pays for either encoder + the capture loop, and a live single-protocol client no longer pays for the unused other encoder.
Cold-start latency
Cold-start
/snapshotat 1080p30 (camera off → on → first JPEG): ~150-500 ms, dominated by libcamera sensor init and AE/AWB warm-up. The first cold start after process boot is typically slowest (~500 ms); subsequent cold starts are faster (~150-200 ms).This is an acceptable tradeoff for the idle savings; in practice Mainsail/Fluidd dashboards either keep a long-lived WebRTC stream open (one cold start per session) or poll
/snapshotat a cadence where 100-200 ms is invisible.Tested
/snapshot→ returns valid JPEG, camera starts and stops cleanly per request./stream(MJPEG) → starts encoder on connect, releases on disconnect./webrtc/whepfrom Mainsail → live video, encoder released on tab close (Camera stoppedin log).Possible follow-ups (not in this PR)
--camera-linger-secondsflag so consecutive snapshots within a window reuse a warm camera. Skipped here because the cold-start cost is small (~100-150 ms) and it adds non-trivial complexity (timer races, default-value debate). Easy to add later if there's demand.