perf(csi): request YUV420 main stream when supported (-28% to -39% CPU)#147
Conversation
Previously, csi.py started both the MJPEG encoder and (when WebRTC was
enabled) the H264 encoder unconditionally at server startup, regardless
of whether any client was connected. Both encoders ran continuously,
burning CPU on every frame even with zero consumers.
This change introduces LazyEncoder, a small ref-counted wrapper around
picamera2's start_encoder/stop_encoder. Encoders are now started on the
first consumer (MJPEG /stream or /snapshot, WebRTC WHEP POST) and
stopped when the last consumer disconnects.
Measured CPU impact on a Raspberry Pi 4B with a Camera Module 3
(hardware encoders enabled, sampled with `top`, single python process,
percentages are of one core; the kernel reports values >100% when the
process spans multiple cores):
Default config (640x480 @ 15 FPS):
Idle (no clients): ~13-15% -> ~5%
Live MJPEG / WebRTC client: unchanged
Heavier config (1920x1080 @ 30 FPS):
Idle (no clients): ~95-100% -> ~15%
Live WebRTC only: ~120-135% -> ~60%
Live MJPEG only: ~95-104% -> ~50-60%
The biggest wins come from two effects: idle is no longer paying for
either encoder, and a live WebRTC-only client no longer pays for the
unused MJPEG encoder running in parallel (and vice versa).
The remaining idle CPU is the picamera2 capture loop itself, which
still runs continuously; pausing the camera entirely is left for a
follow-up commit.
Implementation notes:
- LazyEncoder.acquire/release are protected by a threading.Lock; a
failed start_encoder rolls back the refcount so callers can retry.
- WebRTC release is wired to RTCPeerConnection 'closed' state with an
idempotent guard, plus a try/except around setup so a failure during
SDP exchange or ICE gathering does not leak an encoder.
- USB camera path is unaffected: it never used encoders, and the new
code uses getattr(handler, ..., None) so missing attributes fall
back to no-op.
All 129 existing unit tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Building on the previous LazyEncoder commit, this commit ref-counts the
Picamera2 instance itself: the camera only runs while at least one
encoder is active. When the last encoder is released, picam2.stop() is
called; on the next consumer, picam2.start() runs again.
Adds a small CameraSession wrapper next to LazyEncoder; each LazyEncoder
takes an optional session and acquires/releases it together with the
underlying encoder, so the camera is the union of the encoders' refs.
Measured CPU impact, applied on top of the previous commit (Raspberry
Pi 4B, Camera Module 3, hardware encoders, sampled with `top`):
Default config (640x480 @ 15 FPS):
Idle (no clients): ~5% -> ~0%
Live MJPEG / WebRTC client: unchanged
Heavier config (1920x1080 @ 30 FPS):
Idle (no clients): ~15% -> ~0%
Live WebRTC client: ~60% (unchanged)
Cumulative effect of both commits at 1080p30:
Idle: ~95-100% -> ~0%
Live WebRTC: ~120-135% -> ~60%
Cold-start latency on first connect (camera off -> on -> first JPEG):
~150-500 ms for /snapshot at 1080p30, dominated by libcamera sensor
init and AE/AWB warm-up. The first cold start after process boot is
typically slowest (~500 ms); subsequent cold starts are faster
(~150-200 ms) since some libcamera state appears to be retained.
There is no warm path while no consumers are active: every snapshot
released triggers a full camera stop.
Implementation notes:
- CameraSession.acquire/release are protected by a threading.Lock and
roll back the refcount if picam2.start() raises, mirroring the
LazyEncoder pattern.
- LazyEncoder.acquire takes the session ref before calling start_encoder
so the camera is always running when the encoder starts. release does
the inverse: stop_encoder first, then drop the session ref. If
start_encoder fails, the session ref is rolled back too.
- Lock acquisition order is consistent (encoder lock then session lock),
so two encoders sharing one session cannot deadlock.
- CSI.stop() now defensively calls picam2.stop_encoder() and
picam2.stop() under try/except, since at systemd shutdown the camera
may already be stopped (no consumers) and stop_encoder/stop would
otherwise raise.
- USB camera path is unaffected; only csi.py wires the session in.
All 129 existing unit tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds two new options to control how long each encoder (and the
underlying camera, when no other encoder is active) keeps running
after the last consumer disconnects:
--mjpeg-linger-seconds / MJPEG_LINGER_SECONDS (default: -1)
--webrtc-linger-seconds / WEBRTC_LINGER_SECONDS (default: 5)
Semantics:
-1 keeps the encoder running once started; spyglass pre-warms it at
startup so the first consumer never pays cold-start latency.
0 stops the encoder immediately when the last consumer releases
(the behavior introduced in the first two commits of this branch).
> 0 stops the encoder N seconds after the last consumer releases; a
fresh acquire within the window cancels the pending stop.
The MJPEG default (-1) restores the upstream 'always on' idle behavior
so /snapshot use cases (notably timelapse) and the Mainsail Adaptive
MJPEG service keep working without paying cold-start latency on each
request. The WebRTC default (5s) keeps the bulk of the lazy CPU win
while bridging brief peer reconnects.
Implementation:
- LazyEncoder gains a linger_seconds ctor parameter and a shared
threading.Timer for the delayed stop. A monotonically increasing
_stop_token invalidates any stale timer callback that already raced
past Timer.cancel().
- _stop_now_locked centralizes the stop_encoder / session.release
sequence so both the immediate and lingered stop paths use the same
code with the same try/finally guarantees.
- CSI.start_and_run_server pre-warms each encoder for which linger < 0
by issuing a single never-released acquire() at startup.
- Camera.start_and_run_server abstract signature and the USB override
gain the new kwargs for compatibility; USB ignores them since it
does not use LazyEncoder.
Adds focused unit tests for LazyEncoder covering acquire/release
ref-counting, all three linger modes, timer cancellation on re-acquire,
session sharing across two encoders, and rollback on start_encoder /
stop_encoder failure.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks for the PR. Did you do image comparisons between both color formats? Nevertheless this should be a configurable option and if the image quality indeed differs we should write some documentation about it. |
|
YUV420 is what JPEG & modern video formats encode. The formats do not provide options for RGB space chromas. In effect what this PR is doing is forcing the whole processing pipeline to operate in YUV space earlier (at least after sensor debayer & whatever happens there I guess?). This should not have any perceivable visual artifacts. If you look in how pipelines operate in typical video software, they will always prefer operating on color spaces which are optimal to process. This is what's happening with this proposed change. Happy to provide further context if needed. In this case though a config option IMO has 0 value and just adds needless complexity. |
|
I personally also use AI for things I don't understand or get a feature faster but you should make sure that at least some basic code quality is still given. |
|
Happy to update code to better fit your coding style. From my initial reading of the code before posting the PR it was perfectly sensible to me (after a few iterations with the AI). I definitely will not post PRs on any projects without reading the code first. Let me know what you feel would need changing on top of the 2 comments you posted earlier which I would appreciate a bit more clarity on before addressing them. Thanks! |
|
I just skipped through the code till now, so I made a mistake in one of the comments sry for that 😓 |
|
Just as a general note. I sometimes sound very harsh but it's most of the times not meant like that. |
|
No worries. I really appreciate the quick replies |
The picamera2 default main-stream format is XBGR8888 (4 bytes/pixel), but the V4L2 H264/MJPEG encoders both accept YUV420 natively. Letting libcamera deliver YUV420 directly avoids an in-encoder colour-space conversion and cuts main-stream memory bandwidth ~2.7x (1.5 bpp vs 4 bpp), reducing CPU substantially on a Pi 4B + Camera Module 3: | scenario | XBGR8888 | YUV420 | reduction | |--------------------------|---------:|-------:|----------:| | idle (encoders, no WHEP) | 89.1% | ~54% | ~39% | | live WebRTC | 93.0% | ~67% | ~28% | CSI._main_stream_config() proactively enumerates the camera's supported pixel formats via libcamera (camera.generate_configuration([Video- Recording]).formats.pixel_formats) and picks the first encoder-compatible format from a preference list. If none of the preferred formats are supported - or the libcamera query fails - we fall back to the picamera2 default (no format hint), which is the pre-change behaviour. USB cameras are untouched. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
71d323a to
e83a40a
Compare
|
I tested the code and you are right, there are no visual artifacts or similar. Performance is way better too. I would refactor a few lines of code to improve readability. |
|
Thanks. Do feel free to refactor as you see fit. |
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
|
I adjusted the |
|
Looks good thanks. |
|
I'm back home now and can finally do the finishing touches. I came across something weird in your description:
This is correct, but it's just the same set of formats, isn't it? |
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
|
Good question. jpeg encoder explicitly lists the formats here: https://github.com/raspberrypi/picamera2/blob/3e6dff099d3982c8fbdb25f7cbe2e4c17bac34f7/picamera2/encoders/jpeg_encoder.py#L33 . From reading the code, YUV420 is handled differently from the RGB colors in the code so isn't listed in FORMAT_TABLE as the encode function looks to be different for YUV vs RGB color spaces. and yeah, agreed that the AI phrasing is a bit misleading as currently the format list is identical in https://github.com/raspberrypi/picamera2/blob/3e6dff099d3982c8fbdb25f7cbe2e4c17bac34f7/picamera2/encoders/v4l2_encoder.py#L41 . Where it's actually way broader is in the actual kernel v4l2 format listing https://github.com/torvalds/linux/blob/322008f87f917e2217eeac386a9410945092eb2e/include/uapi/linux/videodev2.h#L524 . |
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
…sage (#147) The picamera2 default main-stream format is XBGR8888 (4 bytes/pixel), but the V4L2 H264/MJPEG encoders both accept YUV420 natively. Letting libcamera deliver YUV420 directly avoids an in-encoder colour-space conversion and cuts main-stream memory bandwidth ~2.7x (1.5 bpp vs 4 bpp), reducing CPU substantially. Signed-off-by: Patrick Gehrsitz <github@mryel.de> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Patrick Gehrsitz <github@mryel.de>
|
Thanks for this contribution. I think I would have never come up with this myself ^^ |
|
You're welcome. Thanks for the review & follow-ups |
Use of AI
This PR was written using GitHub Copilot CLI with Claude Opus 4.7. I've confirmed continued working behavior.
Also from past experience coding in VLC and associated projects a couple of decades ago this makes total sense to me ... video & jpeg work in the YUV space so avoiding any need to convert to a different color space is an immediate win.
Summary
Spyglass currently relies on picamera2's default main-stream pixel format, which is
XBGR8888(4 bytes/pixel). The Pi V4L2 H264 and MJPEG hardware encoders both acceptYUV420natively (1.5 bytes/pixel — 2.7× less DMA bandwidth) — so asking libcamera to deliverYUV420directly avoids both the colour-space conversion inside the encoder and a large amount of memory bandwidth on the camera→ISP→encoder path.This PR adds proactive format selection in the CSI camera path:
Camera._main_stream_config(width, height)is a new extension point on the abstractCamerabase — the default returns{"size": (w, h)}(i.e. USB cameras are unchanged).CSI._main_stream_configenumerates the formats libcamera actually advertises for theVideoRecordingrole viapicam2.camera.generate_configuration([VideoRecording]).at(0).formats.pixel_formatsand picks the first entry from a preference list("YUV420", "BGR888", "RGB888", "XBGR8888", "XRGB8888")that the camera supports. These five formats are exactly the intersection of what the V4L2 HW encoder and the SWJpegEncoderboth accept (seepicamera2.encoders.v4l2_encoder._v4l2_formatandpicamera2.encoders.jpeg_encoder); requesting anything else risks the encoder failing atstart_encoder()withRuntimeError("Unrecognised format ...").A reactive
try/exceptaroundpicam2.configure()was tried first but rejected: libcamera'svalidate()returnsAdjusted(notInvalid) for many unsupported formats, silently substituting e.g.RGB161616→YUYV.picam2.configure()then succeeds, only for the MJPEG encoder to crash downstream becauseYUYVis not in its format table. Proactive enumeration avoids that whole class of failure mode.Measured CPU impact
Raspberry Pi 4B, Camera Module 3 (IMX708), 1920×1080 @ 30 FPS, hardware encoders, sampled with
ps -p $PID -o pcpu:upstream/develop(XBGR8888)Note: this PR does not include the lazy-encoder changes in #146 — these numbers are with both encoders running eagerly, which is the current
developbehaviour. The two PRs compose: with lazy encoders enabled and only one of MJPEG/WebRTC active, idle drops to ~0% (per #146) and the live overhead also benefits from the smaller main stream.Tested
configuring streams: (0) 1920x1080-YUV420/Rec709(no “Camera configuration has been adjusted!” message — proof that libcamera honoured our pick exactly),/snapshotreturns a valid 1920×1080 JPEG.RGB161616entry to the preference list — confirmed ourINFOlog fires (Main stream using 'YUV420' (preferred 'RGB161616' not supported by camera)), service starts cleanly,/snapshotstill valid.tests/test_camera_configure.pycovers: top-preference picked, fall-through to next supported preference, no-overlap returns no format key, libcamera enumerationRuntimeErroris handled, baseCameradoes not enumerate (so USB path is untouched).