Skip to content

Drive DisplayID-tiled (5K) monitors as one output#2041

Open
algesten wants to merge 15 commits into
Smithay:masterfrom
algesten:feature/tiled-output
Open

Drive DisplayID-tiled (5K) monitors as one output#2041
algesten wants to merge 15 commits into
Smithay:masterfrom
algesten:feature/tiled-output

Conversation

@algesten

Copy link
Copy Markdown

Some high-res monitors can't fit their full resolution down a single cable, so they present as several DRM connectors, one per tile, each carrying a slice of the image. The connectors share a DisplayID Tile Topology block describing how the tiles fit together (the LG UltraFine 5K and Dell UP2715K are the common ones).

This drives such a monitor as a single logical output. The tiles sit on separate CRTCs but page-flip together in one atomic commit, so they stay frame-locked. The primary tile's vblank stands in for the whole group.

The pieces:

  • DrmDevice::tile_info reads a connector's TILE property into a TileInfo. Connectors sharing a group_id belong to the same monitor.
  • DrmCompositor::new_tiled drives N CRTCs as one logical output. It's a general primitive (a single-GPU video wall qualifies too), so it takes geometry as a TilePlacement per CRTC rather than TileInfo directly. All surfaces must live on one device so they can share the atomic commit.
  • DrmOutputManager::initialize_tiled_output wraps that as one DrmOutput, keyed by the primary tile's CRTC.

Each tile renders its slice of the logical scene into its own framebuffer. The relocation offset is derived through the output transform, so it stays correct under rotation and flip. That part is pure geometry and unit-tested across all eight transforms.

Exercised on a real LG UltraFine 5K over its two DisplayPort streams (via cosmic-comp): display, live rotation, and screencopy all work.

Making use of this requires changes to cosmic-comp to read the TileInfo and using new_tiled. This is deliberate, since cosmic-comp needs to present this correctly to the user in settings.

I have prepared the cosmic-comp changes, but will keep that PR away until we settled the smithay side. It needs tidying up before PR, but for reference, here it is: pop-os/cosmic-comp@master...algesten:cosmic-comp:tiled-output

Close #2034

algesten added 8 commits May 25, 2026 17:32
The first generation of 5K monitors (LG UltraFine 5K, Dell UP2715K,
mid-2010s) couldn't fit 5120x2880@60 down a single DP 1.2 cable and
instead carry the image as two streams, one per tile half. Each half
presents as its own DRM connector, with a DisplayID Tile Topology
block in its EDID describing where it sits in the monitor. The
kernel parses that block and exposes the result as the `TILE`
immutable blob, formatted as eight ':'-delimited decimals:
group_id:flags:num_h_tiles:num_v_tiles:loc_h:loc_v:tile_w:tile_h.

Newer 5K panels carry the image on a single cable using DSC over
DP 1.4, or uncompressed over DP 2.1 UHBR.

`DrmDevice::tile_info()` reads the blob, parses it into a new
`backend::drm::TileInfo` struct, and returns `None` for connectors
without one. Compositors can use the shared `group_id` to identify
connectors that belong to the same physical monitor.

The kernel's `flags` field is parsed and discarded. It isn't
documented and Mutter ignores it.

AI-Assisted
A tiled DrmCompositor (added in a following commit) drives several CRTCs
as one logical, frame-locked output, each CRTC scanning out a sub-rect of
that output. `TilePlacement` describes one CRTC's contribution: its
`DrmSurface`, the planes it may use, and its `Rectangle` sub-rect in the
logical output's untransformed coordinate space.

`validate_tile_regions` checks that a set of regions exactly tiles a
gap-free, overlap-free rectangle anchored at the origin, returning the
size of the logical output they compose. Pure geometry, factored into a
free function with unit tests. The same-device requirement is enforced
later by the constructor.

AI-Assisted
A tiled DrmCompositor renders the logical scene per CRTC by relocating
elements so each tile's slice lands in its framebuffer. The offset must
be derived *through* the output transform: tile regions are fixed in
framebuffer space, but the transform is applied to the whole logical
scene before tiles are sliced, so a raw framebuffer-space offset is only
correct at Transform::Normal.

`tile_logical_region` maps a tile's fixed framebuffer-space region into
the logical (post-transform) coordinate space via `transform_rect_in`,
and `tile_render_offset` returns the element relocation that places that
slice at the tile framebuffer's origin. Both are exposed as
`TilePlacement` methods for the renderer to call.

The geometry is pure, so rotation correctness is unit-tested across all
eight transforms — the property test asserts the fixed halves still
partition the rotated logical output with no gaps or overlaps, which a
framebuffer-space offset would violate for non-Normal transforms.

AI-Assisted
Pure refactor in preparation for tiled output: move the per-scan-out
pipeline fields (surface, planes, swapchain, frame-state machine,
element-state caches, cursor plane, …) out of DrmCompositor into a new
private TileState struct, and have DrmCompositor hold a
SmallVec<[TileState; 1]>. The logical-output state (output_mode_source,
framebuffer_exporter, cursor_size, debug_flags, span) stays on
DrmCompositor.

With exactly one tile this is behaviour-identical — every method now
operates on self.tiles[0]; a following commit adds the multi-tile
constructor and render path. Debug is now a manual impl that reports the
tile count, since deriving it through SmallVec<TileState> would demand a
Debug bound on the allocator buffer that the previous flat derive avoided.

No functional change; the full lib test suite is unchanged (89 passing).

AI-Assisted
Add the multi-CRTC constructor. `new_tiled(output_mode_source, config:
impl Into<TileConfig>, …)` validates that the tiles tile a gap-free
origin-anchored rectangle (TileConfig::logical_size) and share one DRM
device, then builds one TileState per CRTC — each with an
OutputModeSource::Static sized to its region and carrying the logical
output's scale/transform.

The per-surface construction is factored into a shared build_tile helper;
new() is reduced to the single-tile case that delegates to it (region =
the whole output), so there is one construction path. FrameError gains an
InvalidTileConfig variant (From<TileConfigError>) for the validation
failures.

The TileState.region field is set but not yet read — the multi-tile
render path that consumes it lands next. No change to single-tile
behaviour; 89 lib tests unchanged.

AI-Assisted
The plane-assignment helpers (try_assign_element and the per-plane
try_assign_* / element_config it drives) always operated on
`self.tiles[0]`. Thread a `tile: usize` through them so they act on an
arbitrary tile, in preparation for the multi-tile render loop. The sole
caller (render_frame) still passes 0, so behaviour is unchanged.

AI-Assisted
Generalize the render path from a single CRTC to N. The former
render_frame body becomes `render_tile`, which renders one tile's slice
into that tile's own swapchain and damage tracker (its mode is read from
the tile's tracker, so a `Static` per-tile slice and the single-tile
whole-output case are handled uniformly) and stashes the prepared frame
in `self.tiles[tile].next_frame`.

`render_frame` keeps its signature and, for the single-tile case,
delegates straight to `render_tile(0, ..)` — behaviour is unchanged. For
a tiled output it relocates the elements into each tile's framebuffer
space (offset derived through the output transform, so rotation stays
correct) and renders every tile, then returns one frame-locked
`RenderFrameResult`:

  - `is_empty` is the AND across tiles, so a change confined to one tile
    still queues the whole group (avoids dropping to one tile's refresh);
  - the per-tile `RenderElementStates` are merged the same way a single
    render folds an element seen on several planes;
  - the plane elements describe the primary tile, recovered as references
    into the original elements via a new `RelocateRenderElement::element`
    accessor (full-logical screencopy is left for a later change).

Pure aggregation logic is covered by unit tests.

AI-Assisted
A tiled DrmCompositor flips all of its tiles in a single atomic commit, so
the caller's per-frame `user_data` token is one value for the whole group,
not one per tile (and it generally is not `Clone` — e.g. cosmic-comp passes
presentation feedback that must be submitted exactly once).

Move `user_data` out of the per-tile `QueuedFrame`/`PendingFrame` wrappers
and onto the `DrmCompositor` as `queued_user_data`/`pending_user_data`. The
wrappers (and `TileState`) lose their `U` parameter and keep only the
scan-out `FrameState`. `submit`/`handle_flip`/`frame_submitted` thread the
token through the group fields. Single-tile behaviour is unchanged; this
prepares the lifecycle for driving several tiles from one queued frame.

AI-Assisted
@algesten

Copy link
Copy Markdown
Author

@Drakulix I hope this is in line with what you were imagining. Happy to take pointers and rework stuff if it's not right. The PR should be possible to both compile and review commit-by-commit.

@algesten algesten force-pushed the feature/tiled-output branch from e9f2d77 to 2c01971 Compare May 26, 2026 10:04
algesten added 2 commits May 26, 2026 12:20
A tiled DrmCompositor flips all its CRTCs together in a single atomic
commit so they stay frame-locked. To assemble that commit, every CRTC's
state has to land in one `AtomicModeReq`. The property handles all resolve
through the device's shared `PropMapping`, so one request can carry every
CRTC's properties.

`DrmSurface` gains three internal primitives. `append_to_request` emits a
surface's page-flip or modeset state into a shared `&mut AtomicModeReq`
without committing. `commit_request` commits such a request (for a modeset
it test-commits first, since a multi-CRTC commit can fail where each CRTC
passes alone). `mark_request_submitted` advances state after success. The
release-mode `AtomicRequest` property logic is extracted into free
functions (`req_set_plane` etc.) so the per-surface builder and the append
path share one implementation. Legacy devices have no atomic API and
return `Error::AtomicUnsupported`.

`queue_frame`/`commit_frame`/`submit`/`frame_submitted` dispatch to tiled
variants when driving more than one tile. They store even an unchanged
tile's frame (`render_tile`'s `store_empty`) so the tiles always flip
together, the primary tile included, then build one request from all
tiles, commit it via the primary surface, and advance every tile on
success. Because the flip is atomic, the primary CRTC's vblank stands in
for the whole group, so the caller routes that one vblank to
`frame_submitted` (unchanged signature) to advance every tile. The other
CRTCs' redundant vblanks belong to no logical output and are ignored. A
pending modeset on any tile routes the whole commit through the modeset
path. The single-tile path is untouched.

AI-Assisted
`DrmCompositor::new_tiled` drives several CRTCs as one frame-locked
output, but consumers go through the `DrmOutputManager`/`DrmOutput` layer,
which only had a single-CRTC `initialize_output`. Add the tiled
counterpart: it takes one `DrmTilePlacement` per CRTC (CRTC + mode +
connectors + planes + sub-rect), creates a surface for each, and builds
one `new_tiled` compositor wrapped in a `DrmOutput` keyed by the primary
tile's CRTC. The output's render/queue/frame_submitted then delegate to
that compositor exactly as for a single-CRTC output.

Unlike `initialize_output` it does not yet retry under bandwidth pressure
(overlay-plane disabling / implicit modifiers); a tiled output is one
dedicated monitor, so the simple path is used for now.

AI-Assisted
@algesten algesten force-pushed the feature/tiled-output branch from 2c01971 to 7c0432c Compare May 26, 2026 10:20

@Drakulix Drakulix left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So on a first pass, this looks pretty good. I commented on a few bits, that stood out to me.

I am very sure we can proceed with this approach.

The whole PR is quite comment-heavy. This isn't inherently a bad thing, but a good amount of bits and pieces are quite repetitive or needlessly verbose. I assume this is due to the AI-assisted nature of the PR. (Code-wise this is pretty good though, good job cleaning it up.) I'd suggest to focus mostly on documenting reasoning and design decisions, but try to cut down on repetition and obvious explanations.

One thing I am wondering is, whether the code in DrmCompositor would already be able to direct-scanout a client buffer spanning multiple outputs (think a fullscreened window). I think it should be able to figure out the necessary parameters for CRTC_W/H and SRC_W/H properties, which is exciting. For compositing though I think going with multiple swapchains is a good approach (instead of going for a shared framebuffer covering the whole space), since it is likely more compatible.

Last but not least: It isn't 100% obvious to me, if a compositor could use this to make a "tiled output", that doesn't actually use the TILE-property, but simply wants to combine multiple outputs. (Assuming the outputs can each use atomic-commits and are connected to the same DrmDevice.) I don't see a logical reason, why that shouldn't work, so I just want to make sure the code allows it.

Ping @cmeissl. Since you wrote most of the code that is refactored here, I would really like a review from you, if you have the time.

Comment thread src/backend/drm/compositor/mod.rs Outdated
Comment thread src/backend/drm/surface/atomic.rs
Comment thread src/backend/drm/output.rs Outdated
algesten added 5 commits May 26, 2026 15:01
Trim repetitive comments around the tiled compositor path and document new_tiled as a generic same-device multi-CRTC primitive, not something tied to the DRM TILE property.

AI-Assisted
Replace the deprecated Point::from tuple conversion in tile-region validation.

AI-Assisted
Mirror initialize_output's fallback sequence for tiled outputs. If the initial multi-CRTC compositor setup fails, first force existing outputs through composited frames to lower overlay-plane bandwidth, then retry with implicit modifiers before reporting the original initialization failure.

AI-Assisted
Expose a typed wrapper for assembling one atomic commit from multiple DrmSurfaces. The primary surface begins the request and is included immediately, additional same-device surfaces can be appended, and commit consumes the builder while marking contributors submitted only after the shared commit succeeds.

Use the new builder in the tiled compositor paths so callers no longer manually coordinate append_to_request, commit_request, and mark_request_submitted.

AI-Assisted
Format the tiled-output DRM changes with cargo fmt.

AI-Assisted
@algesten

Copy link
Copy Markdown
Author

I am very sure we can proceed with this approach.

🎉

The whole PR is quite comment-heavy. This isn't inherently a bad thing, but a good amount of bits and pieces are quite repetitive or needlessly verbose. I assume this is due to the AI-assisted nature of the PR. (Code-wise this is pretty good though, good job cleaning it up.) I'd suggest to focus mostly on documenting reasoning and design decisions, but try to cut down on repetition and obvious explanations.

I have made a pass, making it terser and removed some redundant comments.

One thing I am wondering is, whether the code in DrmCompositor would already be able to direct-scanout a client buffer spanning multiple outputs (think a fullscreened window). I think it should be able to figure out the necessary parameters for CRTC_W/H and SRC_W/H properties, which is exciting. For compositing though I think going with multiple swapchains is a good approach (instead of going for a shared framebuffer covering the whole space), since it is likely more compatible.

It may work on drivers that accept clipped/off-screen plane destinations, but the code does not explicitly compute per-tile SRC_X/Y/W/H crops. We could potentially add crops to make it more compatible.

Last but not least: It isn't 100% obvious to me, if a compositor could use this to make a "tiled output", that doesn't actually use the TILE-property, but simply wants to combine multiple outputs. (Assuming the outputs can each use atomic-commits and are connected to the same DrmDevice.) I don't see a logical reason, why that shouldn't work, so I just want to make sure the code allows it.

The idea is that it should allow that. The only restriction on new_tile should be that it must be the same card, since it's an atomic commit. The values you read out of the TILE property of EDID needs to be converted into the TilePlacement, but there is no hard connection between these two.

/// # Ok(())
/// # }
/// ```
pub fn begin_atomic_commit<'a, 'b>(

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting that this API could be made more strict. We can't use "the most elegant Rust", of locking the lifetime of a &'a mut because inside DrmSurface we have Arc<Mutex<XXXInner>>, meaning a mut doesn't lock anything.

That means this API currently is not guaranteeing atomic commits across threads (or even re-entrant/interleaved calls on the same thread). This could be fixed by adding some Arc<Mutex<CommitLock>> that can be exclusively taken for the lifetime of the DrmAtomicCommit.

However I was unsure about introducing more internal Arc<Mutex - so didn't do it straight away.

@Drakulix Drakulix requested a review from cmeissl May 26, 2026 15:28
@cmeissl

cmeissl commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Ping @cmeissl. Since you wrote most of the code that is refactored here, I would really like a review from you, if you have the time.

Sure, I am going to review it today.

@algesten

algesten commented Jun 8, 2026

Copy link
Copy Markdown
Author

Sure, I am going to review it today.

Maybe we should close #2034 and move the comments here, since it takes the TILE property PR and builds on it?

@cmeissl

cmeissl commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Sure, I am going to review it today.

Maybe we should close #2034 and move the comments here, since it takes the TILE property PR and builds on it?

Yeah, sorry, I had both PRs open and picked the wrong one when answering.

I kept it on top of FrameState/ElementState rather than tile-aware on purpose. I didn't want the CRTC level to know anything about the TILE property. A tile is just a DrmSurface with its own planes and swapchain, and the composition sits above that.

I don't know if there is a use case here, but I imagined it would be possible to use the tiled output with several normal screens on the same card too.

I did not mean to add it to DrmSurface directly. With the current approach it looks like we would end up with a framebuffer per tile which sounds unnecessary. It should work fine to have the same exporter and framebuffer on multiple crtc. What I was thinking about is to make the instances in the ElementState and the planes in FrameState CRTC aware. This would also allow to keep the user data in PendingFrame, moving it out of the Frame opens the door for unnecessary desync of the frame state and the user data.

You're right. I'll make a new test using the same DrmAtomicCommit builder so it spans all tiles, rather than make FrameState tile-aware.

What I though about is adding an abstraction for holding multiple DrmSurface and their relationship, just like you did with the tiles, but as a separate abstraction. It could provide the whole test/commit/page_flip logic and allow us to keep a single interface for the frame.

We also have to initialize all planes states once at the start so that each test can incrementally extend the planes. Doing it per Tile and ignoring the atomic properties of all other tiles will lead to uncatched edge cases and might prevent using planes in some cases.
So I would suggest to not special case for tiled/single crtc, but to always run the same logic and only branch in the abstraction responsible for calling test_state and so on.
So for example each render_frame first initializes the frame with all primary planes and no overlay planes and then incrementally run the per TILE/CRTC logic.

I think we also have to provide atomic requests for all other functionality like enabling VRR, not sure doing it one-by-one will work.
The previously mentioned abstraction could resemble the DrmSurface interface and transparently build the atomic requests for all surfaces. The legacy path could just return an error for now in case there are multiple surfaces defined.

@algesten

algesten commented Jun 9, 2026

Copy link
Copy Markdown
Author

Thank you for that feedback! I will process it and fix the PR. A bit busy with work right now so it may take me until the weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants