perf: read ahead optimization for OSC/DCS/APC PUT by homanp · Pull Request #5832 · xtermjs/xterm.js

homanp · 2026-04-23T16:02:14Z

Follow-up from #5825
Replaces the multi-condition comparison chains in OSC_PUT, DCS_PUT, and APC_PUT inner loops with static Uint8Array lookup tables and clean while loops.

Parser benchmark:

benchmark	master	this PR	change
OSC string (short)	123.8	107.1	-13%
OSC string (long)	134.0	151.6	+13%
OSC class (short)	157.3	147.7	-6%
OSC class (long)	536.0	544.6	+2%
DCS string (short)	91.9	88.0	-4%
DCS string (long)	125.0	153.0	+22%
DCS class (short)	116.9	119.6	+2%
DCS class (long)	431.5	540.4	+25%

Long payloads improve by 2-25%, with the larger gains (+13% to +25%) on OSC string, DCS string, and DCS class. Short OSC regresses by 6-13% from the table indirection, but short OSC payloads are rare in real terminal data (most carry window titles, URLs, or image data). DCS is positive except for short strings (-4%).

homanp · 2026-04-23T16:04:28Z

@jerch I tried a couple of different approaches here and landed on a lookup table as an option. Happy to get feedback on this.

jerch · 2026-04-23T16:16:46Z

Oh wow - that is an interesting approach. Tbh, I would not have thought that another table could give an advantage here. Lemme play around with this for a bit.

homanp · 2026-04-23T16:20:28Z

Couldn't find a better way without nuking short string.

jerch · 2026-04-23T17:41:10Z

@homanp The table eval is actually slightly wrong, as it only respects byte values. The input is UTF32, thus the final condition against NON_ASCII_PRINTABLE should include all codepoints in 0xa0..0x10FFFF.

homanp · 2026-04-23T17:49:07Z

Added a fix: codepoints > 0xff now bypass the table lookup
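
A minimal sketch of why a byte-sized table needs this bypass. The PR's actual table is not shown in this thread, so the table name, its encoding (1 = keep scanning) and both scan functions are assumptions made for illustration. Indexing a 256-entry Uint8Array with a larger codepoint yields undefined, which a truthiness check treats as "stop":

```typescript
const NON_ASCII_PRINTABLE = 0xa0;

// Hypothetical table: 1 = payload codepoint (keep scanning), 0 = stop.
const DCS_CONTINUE = new Uint8Array(256).fill(1);
DCS_CONTINUE[0x18] = 0;
DCS_CONTINUE[0x1a] = 0;
DCS_CONTINUE[0x1b] = 0;
for (let code = 0x80; code < NON_ASCII_PRINTABLE; code++) DCS_CONTINUE[code] = 0;

// Table-only scan: codepoints > 0xff read `undefined` and end the scan early.
function scanTableOnly(data: Uint32Array, start: number, end: number): number {
  let c = start;
  while (c < end && DCS_CONTINUE[data[c]]) c++;
  return c;
}

// Scan with bypass: anything above 0xff is payload and skips the table.
function scanWithBypass(data: Uint32Array, start: number, end: number): number {
  let c = start;
  while (c < end && (data[c] > 0xff || DCS_CONTINUE[data[c]])) c++;
  return c;
}

// "a", EURO SIGN (U+20AC), "b", ESC
const sample = new Uint32Array([0x61, 0x20ac, 0x62, 0x1b]);
console.log(scanTableOnly(sample, 0, sample.length));  // 1 (stops at U+20AC)
console.log(scanWithBypass(sample, 0, sample.length)); // 3 (stops at ESC)
```

The extra comparison in the bypass runs for every codepoint, which may be part of why this variant benchmarks slower.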

jerch · 2026-04-23T18:45:02Z

@homanp This runs slower now for me.

I have played around with DCS_PUT only - this is the fastest I could find:

        case ParserAction.DCS_PUT:
          // inner loop - exit DCS_PUT: 0x18, 0x1a, 0x1b, 0x7f, 0x80 - 0x9f
          // unhook triggered by: 0x1b, 0x9c (success) and 0x18, 0x1a (abort)
          c = i;
          for (; c < l4;) {
            if (
              (data[++c] <= 0x1b || data[c] > 0x7f) &&
              (data[c] === 0x1b || data[c] === 0x1a || data[c] === 0x18 || (data[c] > 0x7f && data[c] < 160))
            ) break;
            if (
              (data[++c] <= 0x1b || data[c] > 0x7f) &&
              (data[c] === 0x1b || data[c] === 0x1a || data[c] === 0x18 || (data[c] > 0x7f && data[c] < 160))
            ) break;
            if (
              (data[++c] <= 0x1b || data[c] > 0x7f) &&
              (data[c] === 0x1b || data[c] === 0x1a || data[c] === 0x18 || (data[c] > 0x7f && data[c] < 160))
            ) break;
            if (
              (data[++c] <= 0x1b || data[c] > 0x7f) &&
              (data[c] === 0x1b || data[c] === 0x1a || data[c] === 0x18 || (data[c] > 0x7f && data[c] < 160))
            ) break;
          }
          if (c >= l4) {
            while (c < length) {
              if (
                (data[c] <= 0x1b || data[c] > 0x7f) &&
                (data[c] === 0x1b || data[c] === 0x1a || data[c] === 0x18 || (data[c] > 0x7f && data[c] < 160))
              ) break;
              c++;
            }
          }
          this._dcsParser.put(data, i, c);
          i = c - 1;
          break;

Idea: Most DCS payloads are ASCII printables (base64 data), so this optimizes for ASCII.

Edit: Simplified code.
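
The two-stage condition in the snippet above can be checked in isolation. This is a sketch with hypothetical helper names, not code from the PR: it pulls the exit test out into functions and confirms that adding the range prefilter (<= 0x1b or > 0x7f) never changes the result, it only lets printable ASCII skip the detailed comparison chain.

```typescript
const NON_ASCII_PRINTABLE = 0xa0;

// Detailed exit check: CAN, SUB, ESC and the C1 range 0x80..0x9f.
function isDcsExitFull(code: number): boolean {
  return code === 0x1b || code === 0x1a || code === 0x18
    || (code > 0x7f && code < NON_ASCII_PRINTABLE);
}

// Two-stage variant: codepoints in 0x1c..0x7f fail the cheap prefilter
// and never reach the detailed check.
function isDcsExitTwoStage(code: number): boolean {
  return (code <= 0x1b || code > 0x7f) && isDcsExitFull(code);
}

// Compare both predicates over the whole Unicode codepoint range.
function countMismatches(): number {
  let mismatches = 0;
  for (let code = 0; code <= 0x10ffff; code++) {
    if (isDcsExitFull(code) !== isDcsExitTwoStage(code)) mismatches++;
  }
  return mismatches;
}

console.log(countMismatches()); // 0
```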

homanp · 2026-04-23T19:44:43Z

Will play around with it in an hour or so.

jerch · 2026-04-23T19:56:51Z

-          // inner loop - exit APC_PUT: 0x18, 0x1a, 0x1b, 0x9c
-          for (let j = i + 1; ; ++j) {
-            if (j >= length || (code = data[j]) === 0x18 || code === 0x1a || code === 0x1b || code === 0x9c || (code > 0x7f && code < NON_ASCII_PRINTABLE)) {
-              this._apcParser.put(data, i, j);


FIXME: The code === 0x9c condition is redundant here, since 0x9c already falls within code > 0x7f && code < NON_ASCII_PRINTABLE.

Dang, tests and benchmarks for the APC parser are missing 😱
--> #5834

jerch · 2026-04-23T20:28:32Z

And for OSC_PUT I currently have this (very similar to PRINT):

        case ParserAction.OSC_PUT:
          // inner loop: 0x20 (SP) included, 0x7F (DEL) included
          c = i;
          while (c < l4
            && data[++c] >= 0x20 && (data[c] <= 0x7f || data[c] >= NON_ASCII_PRINTABLE)
            && data[++c] >= 0x20 && (data[c] <= 0x7f || data[c] >= NON_ASCII_PRINTABLE)
            && data[++c] >= 0x20 && (data[c] <= 0x7f || data[c] >= NON_ASCII_PRINTABLE)
            && data[++c] >= 0x20 && (data[c] <= 0x7f || data[c] >= NON_ASCII_PRINTABLE)
          ) {}
          if (c >= l4) {
            while (c < length && data[c] >= 0x20 && (data[c] <= 0x7f || data[c] >= NON_ASCII_PRINTABLE)) {
              c++;
            }
          }
          this._oscParser.put(data, i, c);
          i = c - 1;
          break;
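
For experimenting outside the parser, the same unrolled scan can be written as a standalone function. This is a sketch under assumptions: the function names are hypothetical, l4 is taken to be length - 4 (its definition is not shown in this thread), and data[start] is assumed to be a payload codepoint already, mirroring how the snippet pre-increments from i. The predicate is factored into a helper for readability, whereas the snippet above inlines it.

```typescript
const NON_ASCII_PRINTABLE = 0xa0;

// OSC payload: 0x20..0x7f plus everything from 0xa0 upwards.
function isOscPayload(code: number): boolean {
  return code >= 0x20 && (code <= 0x7f || code >= NON_ASCII_PRINTABLE);
}

// Returns the index of the first non-payload codepoint after `start`,
// or `length` if the payload runs to the end of the chunk.
function scanOscPayload(data: Uint32Array, start: number, length: number): number {
  const l4 = length - 4; // assumption: keeps four pre-increments in bounds
  let c = start;
  // Unrolled part: four codepoints per loop iteration.
  while (c < l4
    && isOscPayload(data[++c])
    && isOscPayload(data[++c])
    && isOscPayload(data[++c])
    && isOscPayload(data[++c])
  ) { /* all work happens in the condition */ }
  // Tail: fewer than four codepoints may be left unchecked.
  if (c >= l4) {
    while (c < length && isOscPayload(data[c])) c++;
  }
  return c;
}

// "hello world", BEL, "x"
const osc = new Uint32Array([
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x07, 0x78
]);
console.log(scanOscPayload(osc, 0, osc.length)); // 11
```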

homanp · 2026-04-23T20:32:48Z

@jerch Switched to your approach: a two-stage condition with an ASCII fast filter for DCS/APC and PRINT-style unrolling for OSC. Dropped the lookup tables.

parser benchmark:

benchmark	master	this PR	change
OSC string (short)	123.8	116.6	-6%
OSC string (long)	134.0	150.6	+12%
OSC class (short)	157.3	143.9	-9%
OSC class (long)	536.0	547.2	+2%
DCS string (short)	91.9	87.1	-5%
DCS string (long)	125.0	150.4	+20%
DCS class (short)	116.9	118.0	+1%
DCS class (long)	431.5	522.6	+21%

Apply loop unrolling with two-stage condition checks optimized for ASCII payloads. DCS/APC use a fast filter (data[c] <= 0x1b || data[c] > 0x7f) that skips the detailed exit check for common ASCII bytes. OSC uses the same pattern as PRINT. The c and l4 variables are shared across the parse method to avoid repeated declarations.

homanp · 2026-04-23T20:44:57Z

Maybe we should hold on this until the new benchmarks and tests are done?

jerch · 2026-04-23T20:46:52Z

@homanp Not as overwhelming as the other perf optimizations, but better than before. The tiny penalty on short OSC is ok. The real burner for OSC, DCS & APC are the image sequences, where one sequence can have >10 MB of payload. There, even a 20% speedup makes a huge difference.

homanp · 2026-04-23T21:01:00Z

I've been thinking about the buffer and what type of optimizations could be done there. I'm guessing it's heavily optimized already, but still. Would be interesting to take a stab at next.

jerch · 2026-04-23T21:10:32Z

Sure, feel free to give it a go. I think the primitives in BufferLine.ts do not have much room for speedup. Perf-wise, the more interesting parts during data input are:

InputHandler.print - takes a huge portion of the runtime (was it >60% of the input chain? don't remember)
UTF32 encoding in src/common/input/TextDecoder.ts
other frequently used handlers like SGR (no clue about its runtime)

homanp · 2026-04-23T21:13:42Z

Will look into it.

jerch · 2026-04-24T08:37:52Z

@homanp FYI: the current APC handling is quite off and needs a major rework (see #5834). So we definitely need to hold this back until that is fixed.

homanp · 2026-04-24T08:48:23Z

Makes sense. Get back to me here when you have it worked out and I can run another pass.

jerch · 2026-05-06T11:14:37Z

@homanp The APC handling is fixed with #5840, feel free to continue the perf investigations.

homanp · 2026-05-06T11:15:39Z

Thanks, will merge into this and keep working on it.

jerch · 2026-05-06T11:29:01Z

You can update to master, as it is already merged there.

homanp · 2026-05-11T07:27:53Z

@jerch Taking a stab at this today. Saw you merged the changes.

homanp · 2026-05-11T20:06:40Z

@jerch Pushed another pass here.

The main change is a parser-local fast path for simple 7-bit OSC / DCS / APC string payloads. It only takes the fast path for plain printable payloads, and falls back to the existing transition table for the more complex forms.
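
The diff itself is not reproduced in this thread, so the following is only an illustrative sketch of what such a fast-path check could look like. The function name, the 0x20..0x7e range and the choice of ESC as the only accepted terminator are all assumptions, not the PR's actual conditions.

```typescript
const ESC = 0x1b;

// Hypothetical fast-path probe: returns the index of the ESC that follows a
// run of printable 7-bit codepoints, or -1 if anything else shows up first
// (in which case the caller would fall back to the transition table).
function scanSimplePayload(data: Uint32Array, start: number, length: number): number {
  let c = start;
  while (c < length && data[c] >= 0x20 && data[c] <= 0x7e) c++;
  return c < length && data[c] === ESC ? c : -1;
}

// Test helper: UTF-16 code units are sufficient for these BMP-only samples.
function codes(s: string): Uint32Array {
  const out = new Uint32Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i);
  return out;
}

const plain = codes("title\x1b\\");
const accented = codes("caf\u00e9\x1b\\");
console.log(scanSimplePayload(plain, 0, plain.length));       // 5
console.log(scanSimplePayload(accented, 0, accented.length)); // -1
```

Returning -1 for anything unusual keeps the fast path small: C1 controls, non-ASCII text and payloads split across chunks all stay on the existing code path.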

I also added a couple of regression tests around the read-ahead behavior.

The APC benchmarks that were added now look much better on my machine:

Case	Baseline	Current	Change
APC string short	60.77 MB/s	156.82 MB/s	+158.04%
APC string long	81.85 MB/s	114.17 MB/s	+39.49%
APC class short	86.88 MB/s	279.73 MB/s	+221.98%
APC class long	308.73 MB/s	545.83 MB/s	+76.80%

Happy to get your feedback on this.

I tried a couple of different approaches with minimal perf gains. I think I might be on to something here, but in all honesty the diff is too big for a PR for my taste :)

homanp · 2026-05-11T20:09:39Z

Here's the full benchmark:

Case	Baseline	Current	Change	Eval
PRINT - a	529.04 MB/s	598.81 MB/s	+13.19%	OK
EXECUTE - `\n`	243.26 MB/s	265.90 MB/s	+9.31%	OK
ESCAPE - ESC E	58.17 MB/s	56.23 MB/s	-3.32%	OK
ESCAPE with collect - ESC % G	76.22 MB/s	68.09 MB/s	-10.67%	FAIL
CSI - CSI A	224.14 MB/s	282.07 MB/s	+25.85%	OK
CSI with collect - CSI ? p	300.91 MB/s	290.91 MB/s	-3.32%	OK
CSI params short	180.64 MB/s	272.68 MB/s	+50.95%	OK
CSI params long	215.47 MB/s	321.01 MB/s	+48.98%	OK
OSC string short	89.97 MB/s	133.25 MB/s	+48.11%	OK
OSC string long	106.05 MB/s	110.88 MB/s	+4.56%	OK
OSC class short	93.33 MB/s	199.26 MB/s	+113.49%	OK
OSC class long	314.14 MB/s	651.01 MB/s	+107.24%	OK
DCS string short	67.90 MB/s	131.69 MB/s	+93.96%	OK
DCS string long	91.06 MB/s	114.48 MB/s	+25.72%	OK
DCS class short	95.90 MB/s	419.30 MB/s	+337.24%	OK
DCS class long	263.03 MB/s	514.02 MB/s	+95.42%	OK
APC string short	60.77 MB/s	156.82 MB/s	+158.04%	OK
APC string long	81.85 MB/s	114.17 MB/s	+39.49%	OK
APC class short	86.88 MB/s	279.73 MB/s	+221.98%	OK
APC class long	308.73 MB/s	545.83 MB/s	+76.80%	OK

homanp force-pushed the perf/payload-loop-unrolling branch from b25ed1e to 6a6157c Compare April 23, 2026 17:48

jerch reviewed Apr 23, 2026

homanp force-pushed the perf/payload-loop-unrolling branch from 6a6157c to bfefc1d Compare April 23, 2026 20:31

homanp force-pushed the perf/payload-loop-unrolling branch from bfefc1d to a3dda7f Compare April 23, 2026 20:34

homanp requested a review from jerch April 23, 2026 20:44

Tyriar assigned jerch Apr 26, 2026

jerch added 2 commits May 10, 2026 18:14

Merge branch 'master' into perf/payload-loop-unrolling

db3119d

fix APC_PUT state

a69072a

jerch changed the title ~~perf: table-driven payload loops for OSC/DCS/APC PUT~~ perf: read ahead optimization for OSC/DCS/APC PUT May 10, 2026

perf: fast-path simple string payloads

a4f910d
