Skip to content

perf(csi): request YUV420 main stream when supported (-28% to -39% CPU)#147

Merged
mryel00 merged 19 commits into
mainsail-crew:developfrom
antoinecellerier:csi-yuv420-stream
Jun 22, 2026
Merged

perf(csi): request YUV420 main stream when supported (-28% to -39% CPU)#147
mryel00 merged 19 commits into
mainsail-crew:developfrom
antoinecellerier:csi-yuv420-stream

Conversation

@antoinecellerier

@antoinecellerier antoinecellerier commented May 22, 2026

Copy link
Copy Markdown

Use of AI

This PR was written using GitHub Copilot CLI with Claude Opus 4.7. I've confirmed continued working behavior.

Also from past experience coding in VLC and associated projects a couple of decades ago this makes total sense to me ... video & jpeg work in the YUV space so avoiding any need to convert to a different color space is an immediate win.

Summary

Spyglass currently relies on picamera2's default main-stream pixel format, which is XBGR8888 (4 bytes/pixel). The Pi V4L2 H264 and MJPEG hardware encoders both accept YUV420 natively (1.5 bytes/pixel — 2.7× less DMA bandwidth) — so asking libcamera to deliver YUV420 directly avoids both the colour-space conversion inside the encoder and a large amount of memory bandwidth on the camera→ISP→encoder path.

This PR adds proactive format selection in the CSI camera path:

  • Camera._main_stream_config(width, height) is a new extension point on the abstract Camera base — the default returns {"size": (w, h)} (i.e. USB cameras are unchanged).
  • CSI._main_stream_config enumerates the formats libcamera actually advertises for the VideoRecording role via picam2.camera.generate_configuration([VideoRecording]).at(0).formats.pixel_formats and picks the first entry from a preference list ("YUV420", "BGR888", "RGB888", "XBGR8888", "XRGB8888") that the camera supports. These five formats are exactly the intersection of what the V4L2 HW encoder and the SW JpegEncoder both accept (see picamera2.encoders.v4l2_encoder._v4l2_format and picamera2.encoders.jpeg_encoder); requesting anything else risks the encoder failing at start_encoder() with RuntimeError("Unrecognised format ...").
  • If the libcamera query fails, or none of the preferred formats are supported, we fall back to omitting the format key entirely, which preserves the previous picamera2 default — i.e. this PR can only ever match-or-improve the previous behaviour.

A reactive try/except around picam2.configure() was tried first but rejected: libcamera's validate() returns Adjusted (not Invalid) for many unsupported formats, silently substituting e.g. RGB161616YUYV. picam2.configure() then succeeds, only for the MJPEG encoder to crash downstream because YUYV is not in its format table. Proactive enumeration avoids that whole class of failure mode.

Measured CPU impact

Raspberry Pi 4B, Camera Module 3 (IMX708), 1920×1080 @ 30 FPS, hardware encoders, sampled with ps -p $PID -o pcpu:

Scenario upstream/develop (XBGR8888) This PR (YUV420) Reduction
Idle (encoders running, no WebRTC) 89.1% (87.5–90.6%) ~54% (40–72%) ~39%
Live WebRTC 93.0% (91.1–94.6%) ~67% (52–100%) ~28%

Note: this PR does not include the lazy-encoder changes in #146 — these numbers are with both encoders running eagerly, which is the current develop behaviour. The two PRs compose: with lazy encoders enabled and only one of MJPEG/WebRTC active, idle drops to ~0% (per #146) and the live overhead also benefits from the smaller main stream.

Tested

  • All existing unit tests pass (129 → 134, +5 new tests for the format-selection path).
  • Smoke-tested on a Pi 4B + Camera Module 3:
    • Happy path: libcamera log shows configuring streams: (0) 1920x1080-YUV420/Rec709 (no “Camera configuration has been adjusted!” message — proof that libcamera honoured our pick exactly), /snapshot returns a valid 1920×1080 JPEG.
    • Fall-through path: temporarily prepended a RGB161616 entry to the preference list — confirmed our INFO log fires (Main stream using 'YUV420' (preferred 'RGB161616' not supported by camera)), service starts cleanly, /snapshot still valid.
  • The new tests/test_camera_configure.py covers: top-preference picked, fall-through to next supported preference, no-overlap returns no format key, libcamera enumeration RuntimeError is handled, base Camera does not enumerate (so USB path is untouched).

antoinecellerier and others added 3 commits May 20, 2026 20:11
Previously, csi.py started both the MJPEG encoder and (when WebRTC was
enabled) the H264 encoder unconditionally at server startup, regardless
of whether any client was connected. Both encoders ran continuously,
burning CPU on every frame even with zero consumers.

This change introduces LazyEncoder, a small ref-counted wrapper around
picamera2's start_encoder/stop_encoder. Encoders are now started on the
first consumer (MJPEG /stream or /snapshot, WebRTC WHEP POST) and
stopped when the last consumer disconnects.

Measured CPU impact on a Raspberry Pi 4B with a Camera Module 3
(hardware encoders enabled, sampled with `top`, single python process,
percentages are of one core; the kernel reports values >100% when the
process spans multiple cores):

  Default config (640x480 @ 15 FPS):
    Idle (no clients):              ~13-15%  ->  ~5%
    Live MJPEG / WebRTC client:     unchanged

  Heavier config (1920x1080 @ 30 FPS):
    Idle (no clients):              ~95-100%  ->  ~15%
    Live WebRTC only:              ~120-135%  ->  ~60%
    Live MJPEG only:                ~95-104%  ->  ~50-60%

The biggest wins come from two effects: idle is no longer paying for
either encoder, and a live WebRTC-only client no longer pays for the
unused MJPEG encoder running in parallel (and vice versa).

The remaining idle CPU is the picamera2 capture loop itself, which
still runs continuously; pausing the camera entirely is left for a
follow-up commit.

Implementation notes:
- LazyEncoder.acquire/release are protected by a threading.Lock; a
  failed start_encoder rolls back the refcount so callers can retry.
- WebRTC release is wired to RTCPeerConnection 'closed' state with an
  idempotent guard, plus a try/except around setup so a failure during
  SDP exchange or ICE gathering does not leak an encoder.
- USB camera path is unaffected: it never used encoders, and the new
  code uses getattr(handler, ..., None) so missing attributes fall
  back to no-op.

All 129 existing unit tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Building on the previous LazyEncoder commit, this commit ref-counts the
Picamera2 instance itself: the camera only runs while at least one
encoder is active. When the last encoder is released, picam2.stop() is
called; on the next consumer, picam2.start() runs again.

Adds a small CameraSession wrapper next to LazyEncoder; each LazyEncoder
takes an optional session and acquires/releases it together with the
underlying encoder, so the camera is the union of the encoders' refs.

Measured CPU impact, applied on top of the previous commit (Raspberry
Pi 4B, Camera Module 3, hardware encoders, sampled with `top`):

  Default config (640x480 @ 15 FPS):
    Idle (no clients):                ~5%  ->  ~0%
    Live MJPEG / WebRTC client:       unchanged

  Heavier config (1920x1080 @ 30 FPS):
    Idle (no clients):               ~15%  ->  ~0%
    Live WebRTC client:              ~60%  (unchanged)

Cumulative effect of both commits at 1080p30:
  Idle:           ~95-100%   ->   ~0%
  Live WebRTC:   ~120-135%   ->  ~60%

Cold-start latency on first connect (camera off -> on -> first JPEG):
~150-500 ms for /snapshot at 1080p30, dominated by libcamera sensor
init and AE/AWB warm-up. The first cold start after process boot is
typically slowest (~500 ms); subsequent cold starts are faster
(~150-200 ms) since some libcamera state appears to be retained.
There is no warm path while no consumers are active: every snapshot
released triggers a full camera stop.

Implementation notes:
- CameraSession.acquire/release are protected by a threading.Lock and
  roll back the refcount if picam2.start() raises, mirroring the
  LazyEncoder pattern.
- LazyEncoder.acquire takes the session ref before calling start_encoder
  so the camera is always running when the encoder starts. release does
  the inverse: stop_encoder first, then drop the session ref. If
  start_encoder fails, the session ref is rolled back too.
- Lock acquisition order is consistent (encoder lock then session lock),
  so two encoders sharing one session cannot deadlock.
- CSI.stop() now defensively calls picam2.stop_encoder() and
  picam2.stop() under try/except, since at systemd shutdown the camera
  may already be stopped (no consumers) and stop_encoder/stop would
  otherwise raise.
- USB camera path is unaffected; only csi.py wires the session in.

All 129 existing unit tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds two new options to control how long each encoder (and the
underlying camera, when no other encoder is active) keeps running
after the last consumer disconnects:

  --mjpeg-linger-seconds  / MJPEG_LINGER_SECONDS   (default: -1)
  --webrtc-linger-seconds / WEBRTC_LINGER_SECONDS  (default: 5)

Semantics:
  -1 keeps the encoder running once started; spyglass pre-warms it at
     startup so the first consumer never pays cold-start latency.
   0 stops the encoder immediately when the last consumer releases
     (the behavior introduced in the first two commits of this branch).
 > 0 stops the encoder N seconds after the last consumer releases; a
     fresh acquire within the window cancels the pending stop.

The MJPEG default (-1) restores the upstream 'always on' idle behavior
so /snapshot use cases (notably timelapse) and the Mainsail Adaptive
MJPEG service keep working without paying cold-start latency on each
request. The WebRTC default (5s) keeps the bulk of the lazy CPU win
while bridging brief peer reconnects.

Implementation:
- LazyEncoder gains a linger_seconds ctor parameter and a shared
  threading.Timer for the delayed stop. A monotonically increasing
  _stop_token invalidates any stale timer callback that already raced
  past Timer.cancel().
- _stop_now_locked centralizes the stop_encoder / session.release
  sequence so both the immediate and lingered stop paths use the same
  code with the same try/finally guarantees.
- CSI.start_and_run_server pre-warms each encoder for which linger < 0
  by issuing a single never-released acquire() at startup.
- Camera.start_and_run_server abstract signature and the USB override
  gain the new kwargs for compatibility; USB ignores them since it
  does not use LazyEncoder.

Adds focused unit tests for LazyEncoder covering acquire/release
ref-counting, all three linger modes, timer cancellation on re-acquire,
session sharing across two encoders, and rollback on start_encoder /
stop_encoder failure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@antoinecellerier antoinecellerier marked this pull request as ready for review May 22, 2026 17:04
@mryel00

mryel00 commented May 22, 2026

Copy link
Copy Markdown
Member

Thanks for the PR.

Did you do image comparisons between both color formats?
I would expect YUV420 to look more dull in terms of colors or maybe some compression artifacts.
I'm no colorspace expert but the reduced data size obviously comes at a price and this might result in a less appealing image.
Performance should not be the only measured variable for such changes.

Nevertheless this should be a configurable option and if the image quality indeed differs we should write some documentation about it.

@antoinecellerier

antoinecellerier commented May 22, 2026

Copy link
Copy Markdown
Author

YUV420 is what JPEG & modern video formats encode. The formats do not provide options for RGB space chromas. In effect what this PR is doing is forcing the whole processing pipeline to operate in YUV space earlier (at least after sensor debayer & whatever happens there I guess?). This should not have any perceivable visual artifacts.

If you look in how pipelines operate in typical video software, they will always prefer operating on color spaces which are optimal to process. This is what's happening with this proposed change.

Happy to provide further context if needed. In this case though a config option IMO has 0 value and just adds needless complexity.

Comment thread spyglass/camera/csi.py Outdated
Comment thread spyglass/camera/camera.py
@mryel00

mryel00 commented May 22, 2026

Copy link
Copy Markdown
Member

I personally also use AI for things I don't understand or get a feature faster but you should make sure that at least some basic code quality is still given.
I don't really like to review AI PRs especially if I'm the first human reading the code.

@antoinecellerier

antoinecellerier commented May 31, 2026

Copy link
Copy Markdown
Author

Happy to update code to better fit your coding style. From my initial reading of the code before posting the PR it was perfectly sensible to me (after a few iterations with the AI). I definitely will not post PRs on any projects without reading the code first. Let me know what you feel would need changing on top of the 2 comments you posted earlier which I would appreciate a bit more clarity on before addressing them. Thanks!

Comment thread spyglass/camera/camera.py
Comment thread spyglass/camera/csi.py Outdated
@mryel00

mryel00 commented May 31, 2026

Copy link
Copy Markdown
Member

I just skipped through the code till now, so I made a mistake in one of the comments sry for that 😓
I will soon have time and will actually test and check the code in detail.
This was more like a first scan what felt kinda obvious, but as you saw I made a stupid mistake there. Sry again 🙈

@mryel00

mryel00 commented May 31, 2026

Copy link
Copy Markdown
Member

Just as a general note. I sometimes sound very harsh but it's most of the times not meant like that.
If I make a mistake I'm happy about corrections 😄

@antoinecellerier

Copy link
Copy Markdown
Author

No worries. I really appreciate the quick replies

The picamera2 default main-stream format is XBGR8888 (4 bytes/pixel),
but the V4L2 H264/MJPEG encoders both accept YUV420 natively. Letting
libcamera deliver YUV420 directly avoids an in-encoder colour-space
conversion and cuts main-stream memory bandwidth ~2.7x (1.5 bpp vs 4
bpp), reducing CPU substantially on a Pi 4B + Camera Module 3:

  | scenario                 | XBGR8888 | YUV420 | reduction |
  |--------------------------|---------:|-------:|----------:|
  | idle (encoders, no WHEP) |   89.1%  |  ~54%  |    ~39%   |
  | live WebRTC              |   93.0%  |  ~67%  |    ~28%   |

CSI._main_stream_config() proactively enumerates the camera's supported
pixel formats via libcamera (camera.generate_configuration([Video-
Recording]).formats.pixel_formats) and picks the first
encoder-compatible format from a preference list. If none of the
preferred formats are supported - or the libcamera query fails - we
fall back to the picamera2 default (no format hint), which is the
pre-change behaviour. USB cameras are untouched.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mryel00

mryel00 commented Jun 9, 2026

Copy link
Copy Markdown
Member

I tested the code and you are right, there are no visual artifacts or similar. Performance is way better too.
Thank you for the contribution ❤️

I would refactor a few lines of code to improve readability.
If you want to do it yourself, I will comment the specific lines.

Comment thread spyglass/camera/csi.py Outdated
@antoinecellerier

Copy link
Copy Markdown
Author

Thanks. Do feel free to refactor as you see fit.

Signed-off-by: Patrick Gehrsitz <github@mryel.de>
@mryel00

mryel00 commented Jun 10, 2026

Copy link
Copy Markdown
Member

I adjusted the pick_main_stream_format method to not nest the logic too much.
The logic should still be the exact same.

@antoinecellerier

Copy link
Copy Markdown
Author

Looks good thanks.
And I learned about using next() to get the first or default value from an iterable!

@mryel00

mryel00 commented Jun 21, 2026

Copy link
Copy Markdown
Member

I'm back home now and can finally do the finishing touches.

I came across something weird in your description:

These five formats are exactly the intersection of what the V4L2 HW encoder and the SW JpegEncoder both accept (see picamera2.encoders.v4l2_encoder._v4l2_format and picamera2.encoders.jpeg_encoder)

This is correct, but it's just the same set of formats, isn't it?
The JpegEncoder doesn't even explicitely list the YUV420 format in it's FORMAT_TABLE but checks for it in a condition in the encode_func.
So I just wanna check, if I'm missing something here, as the description is written as if there should be different formats in those two encoders.

mryel00 added 3 commits June 21, 2026 15:14
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
@antoinecellerier

Copy link
Copy Markdown
Author

Good question.

jpeg encoder explicitly lists the formats here: https://github.com/raspberrypi/picamera2/blob/3e6dff099d3982c8fbdb25f7cbe2e4c17bac34f7/picamera2/encoders/jpeg_encoder.py#L33 . From reading the code, YUV420 is handled differently from the RGB colors in the code so isn't listed in FORMAT_TABLE as the encode function looks to be different for YUV vs RGB color spaces.

and yeah, agreed that the AI phrasing is a bit misleading as currently the format list is identical in https://github.com/raspberrypi/picamera2/blob/3e6dff099d3982c8fbdb25f7cbe2e4c17bac34f7/picamera2/encoders/v4l2_encoder.py#L41 . Where it's actually way broader is in the actual kernel v4l2 format listing https://github.com/torvalds/linux/blob/322008f87f917e2217eeac386a9410945092eb2e/include/uapi/linux/videodev2.h#L524 .

mryel00 added 7 commits June 21, 2026 19:46
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Signed-off-by: Patrick Gehrsitz <github@mryel.de>
@mryel00 mryel00 merged commit 5d3e62f into mainsail-crew:develop Jun 22, 2026
5 checks passed
mryel00 added a commit that referenced this pull request Jun 22, 2026
…sage (#147)

The picamera2 default main-stream format is XBGR8888 (4 bytes/pixel),
but the V4L2 H264/MJPEG encoders both accept YUV420 natively. Letting
libcamera deliver YUV420 directly avoids an in-encoder colour-space
conversion and cuts main-stream memory bandwidth ~2.7x (1.5 bpp vs 4
bpp), reducing CPU substantially.

Signed-off-by: Patrick Gehrsitz <github@mryel.de>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Patrick Gehrsitz <github@mryel.de>
@mryel00

mryel00 commented Jun 22, 2026

Copy link
Copy Markdown
Member

Thanks for this contribution. I think I would have never come up with this myself ^^

@antoinecellerier

Copy link
Copy Markdown
Author

You're welcome. Thanks for the review & follow-ups

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants