Make V8 string to NSString conversions UTF-16 faithful by adrian-niculescu · Pull Request #392 · NativeScript/ios

adrian-niculescu · 2026-06-13T21:55:44Z

V8 strings are UTF-16, but a few of the V8 <-> NSString bridge points round-trip through UTF-8 first, and that loses data in two ways:

Lone surrogates get corrupted. A JS string can hold an unpaired surrogate (for example "\uD834"), which has no valid UTF-8 encoding, so the round-trip replaces it with U+FFFD. This affects every site that goes through the UTF-8 path, including tns::ToUtf16String itself.
Embedded NUL gets truncated at the sites that use C-string APIs. DictionaryAdapter builds strings with [NSString stringWithUTF8String:result.c_str()], and a couple of reverse-direction spots fold [obj UTF8String] into a std::string, both of which stop at the first \0.

The fix is to keep everything in UTF-16:

tns::ToUtf16String now reads the V8 string's native two-byte buffer directly via v8::String::Value instead of ToString + std::codecvt_utf8_utf16. As a bonus that drops the deprecated codecvt (there was already a // FIXME on it) and is faster.
The JS -> NSString sites in DictionaryAdapter, Interop and ArgConverter now build the NSString with stringWithCharacters:length: off ToUtf16String, instead of ToNSString / stringWithUTF8String:.
The two NSString -> V8 spots (DictionaryAdapter objectForKey: and the indexed getter in ArgConverter) pass the NSString straight to ToV8String instead of going through a UTF8String C string. ToV8String(NSString*) already exists and uses an explicit byte length, so it is NUL safe.

I left tns::ToVector alone on purpose. Its input is already a UTF-8 std::string, so it is a different conversion and not part of this bug.

Test: added a TestRunner case that bridges a JS string with a lone high surrogate to an NSString and checks the code unit survives. Without the fix it comes back as U+FFFD.

Formatting note: I matched each file's existing indentation (DictionaryAdapter.mm is 4-space, the others are 2-space) and did not run clang-format, to keep the diff to just the real changes.

We have shipped this in our downstream fork for a while.

Summary by CodeRabbit

Bug Fixes
- Improved JavaScript-to-native string interop by using UTF-16 consistently, improving correctness for edge-case Unicode values (including surrogate pairs and lone surrogates).
- Updated dictionary/map key and object property enumeration to preserve string values accurately during interop.
Tests
- Added Jasmine coverage ensuring lone high (\uD834) and lone low (\uDC00) surrogate strings are preserved by verifying UTF-16 length and code unit values.

coderabbitai · 2026-06-13T21:56:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ac4f49b9-3444-474a-8c05-5d5086d13e23

📥 Commits

Reviewing files that changed from the base of the PR and between 705533c and 8cc59bf.

📒 Files selected for processing (5)

NativeScript/runtime/ArgConverter.mm
NativeScript/runtime/DictionaryAdapter.mm
NativeScript/runtime/Helpers.mm
NativeScript/runtime/Interop.mm
TestRunner/app/tests/ApiTests.js

🚧 Files skipped from review as they are similar to previous changes (5)

NativeScript/runtime/Helpers.mm
TestRunner/app/tests/ApiTests.js
NativeScript/runtime/Interop.mm
NativeScript/runtime/ArgConverter.mm
NativeScript/runtime/DictionaryAdapter.mm

📝 Walkthrough

Walkthrough

This PR updates NativeScript iOS string bridging to use UTF-16 data directly in helper, adapter, and interop paths. It also adds tests confirming that JS strings with lone high and low surrogates preserve their UTF-16 code units across the V8-Objective-C boundary.

Changes

UTF-16 Bridging Conversion

Layer / File(s)	Summary
UTF-16 helper core implementation `NativeScript/runtime/Helpers.mm`	`tns::ToUtf16String` now reads V8's native UTF-16 buffer directly via `v8::String::Value`, handling empty values and `StringObject` wrappers, and removes the codecvt-based conversion path.
ArgConverter V8 to Objective-C string conversion `NativeScript/runtime/ArgConverter.mm`	`SetValue` builds `NSString` from UTF-16 via the updated helper; `IndexedPropertyGetterCallback` passes `NSString*` directly to `tns::ToV8String`.
DictionaryAdapter key enumeration and access `NativeScript/runtime/DictionaryAdapter.mm`	Map and object key enumeration, plus direct key lookup, switch to UTF-16-based `NSString` construction and direct `NSString*` forwarding.
Interop WriteValue and ToObject conversion `NativeScript/runtime/Interop.mm`	`WriteValue` and `ToObject` replace `tns::ToNSString` with direct UTF-16 construction for string-like values.
Surrogate code unit preservation tests `TestRunner/app/tests/ApiTests.js`	New tests verify that lone high and low surrogates survive bridging into `NSString` as individual UTF-16 code units.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through strings both old and new,
With UTF-16 shining bright and true.
No UTF-8 detours in sight today,
Lone surrogates hop safe along the way.
Hooray for code units, crisp and neat! 🥕

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely describes the main change: updating V8 string to NSString conversions to preserve UTF-16 fidelity instead of using lossy UTF-8 round-trips.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

TestRunner/app/tests/ApiTests.js (1)
15-22: ⚡ Quick win

Test description in PR objectives claims "round trip" but implementation is one-way.

The PR objectives state this test verifies "a JS lone high surrogate survives the JS → NSString → JS round trip," but the test only converts JS → NSString and checks the NSString properties. To truly verify a round trip, the test should convert the NSString back to a JS string and assert it still equals "\uD834".

The current assertions (ns.length === 1 and UTF-8 byte length === 0) do prove the NSString preserves the lone surrogate internally, but a complete round-trip verification would strengthen confidence in both directions of the bridge.
🔄 Suggested enhancement to complete the round trip
 it("preserves a lone surrogate when bridging a JS string to NSString", function () {
     // A lone high surrogate is valid in a JS string but has no UTF-8 encoding.
     // Faithful UTF-16 bridging keeps it, so the NSString reports 0 UTF-8 bytes.
     // A UTF-8 round-trip would have replaced it with U+FFFD, which is 3 UTF-8 bytes.
-    var ns = NSString.stringWithString("\uD834");
+    var original = "\uD834";
+    var ns = NSString.stringWithString(original);
     expect(ns.length).toBe(1);
     expect(ns.lengthOfBytesUsingEncoding(NSUTF8StringEncoding)).toBe(0);
+    
+    // Complete the round trip: NSString → JS
+    var roundTripped = ns.toString();
+    expect(roundTripped).toBe(original);
+    expect(roundTripped.charCodeAt(0)).toBe(0xD834);
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TestRunner/app/tests/ApiTests.js` around lines 15 - 22, The test function
"preserves a lone surrogate when bridging a JS string to NSString" verifies the
first direction of the round trip (JS string to NSString) but does not complete
the round trip by converting the NSString back to a JS string. After the
existing assertions on ns.length and ns.lengthOfBytesUsingEncoding, add an
assertion that converts the NSString (stored in variable ns) back to a JS string
and verifies it equals the original lone surrogate character "\uD834" to fully
validate the round-trip conversion.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@TestRunner/app/tests/ApiTests.js`:
- Around line 15-22: The test function "preserves a lone surrogate when bridging
a JS string to NSString" verifies the first direction of the round trip (JS
string to NSString) but does not complete the round trip by converting the
NSString back to a JS string. After the existing assertions on ns.length and
ns.lengthOfBytesUsingEncoding, add an assertion that converts the NSString
(stored in variable ns) back to a JS string and verifies it equals the original
lone surrogate character "\uD834" to fully validate the round-trip conversion.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a85c5090-e1e6-4cf0-b02b-428ae8103d44

📥 Commits

Reviewing files that changed from the base of the PR and between 16439b5 and a37e7f2.

📒 Files selected for processing (5)

NativeScript/runtime/ArgConverter.mm
NativeScript/runtime/DictionaryAdapter.mm
NativeScript/runtime/Helpers.mm
NativeScript/runtime/Interop.mm
TestRunner/app/tests/ApiTests.js

🚧 Files skipped from review as they are similar to previous changes (4)

NativeScript/runtime/Helpers.mm
NativeScript/runtime/Interop.mm
NativeScript/runtime/ArgConverter.mm
NativeScript/runtime/DictionaryAdapter.mm

adrian-niculescu · 2026-06-13T22:36:49Z

@coderabbitai thanks, but I'm going to skip this suggestion, it would actually break the test.

Completing the round trip with ns.toString() would fail: the reverse NSString to JS direction still goes through UTF-8 (ToV8String(NSString*) uses [value UTF8String]), so it turns the lone surrogate back into U+FFFD. expect(ns.toString()).toBe("\uD834") and charCodeAt(0) === 0xD834 would both fail. I confirmed that by running the TestRunner suite locally.

That reverse path is intentionally not touched for surrogates in this PR. The reverse-direction edits here only fix NUL truncation (passing an explicit byte length instead of folding through a C string), not surrogate fidelity, so there is no faithful NSString to JS path to round-trip a lone surrogate through. That is exactly why the test observes the NSString with a numeric native query (lengthOfBytesUsingEncoding: is 0 for a faithfully kept lone surrogate, 3 once it has become U+FFFD) instead of converting back to a JS string.

The bidirectional NUL case is already covered by the existing Marshals NSString with null character test in NSStringTests.js.

coderabbitai · 2026-06-13T22:37:17Z

Tip

For best results, initiate chat on the files or code changes.

@adrian-niculescu, thanks for the thorough explanation — that makes perfect sense. The ToV8String(NSString*) path still uses [value UTF8String] which mangles lone surrogates back to U+FFFD, so a round-trip assertion via ns.toString() would indeed be a false expectation at this stage. Using lengthOfBytesUsingEncoding(NSUTF8StringEncoding) === 0 as a numeric native proxy is a clean way to confirm the surrogate was kept intact without attempting the broken reverse path. I'll withdraw my suggestion.

(╯°□°）╯ 🐇 ← gently backing away from that suggestion

✏️ Learnings added

Learnt from: adrian-niculescu
URL: https://github.com/NativeScript/ios/pull/392

Timestamp: 2026-06-13T22:37:15.839Z
Learning: In NativeScript iOS (NativeScript/ios), the `ToV8String(NSString*)` function in `Helpers.mm` still goes through `[value UTF8String]`, which means lone (unpaired) UTF-16 surrogates get corrupted to U+FFFD on the NSString→JS direction. PR `#392` intentionally fixes only the JS→NSString direction (surrogate fidelity) and the NUL-truncation issue on both directions, but does NOT fix NSString→JS surrogate fidelity. Therefore, a test for a lone surrogate round-trip through NSString cannot assert `ns.toString() === "\uD834"` — it uses `ns.lengthOfBytesUsingEncoding(NSUTF8StringEncoding) === 0` as a native numeric proxy instead (faithfully stored lone surrogate yields 0 UTF-8 bytes; a corrupted U+FFFD would yield 3).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai

🧹 Nitpick comments (1)

TestRunner/app/tests/ApiTests.js (1)

15-31: ⚡ Quick win

Consider adding a lone low surrogate test case for completeness.

The current test verifies a lone high surrogate (U+D834, range 0xD800-0xDBFF). Adding a similar test for a lone low surrogate (e.g., "\uDC00", range 0xDC00-0xDFFF) would strengthen coverage of the UTF-16 fidelity fix, since both surrogate ranges should be preserved but have different bit patterns.

📝 Optional additional test case

it("preserves a lone low surrogate when bridging a JS string to NSString", function () {
    var ns = NSString.stringWithString("\uDC00");
    expect(ns.length).toBe(1);
    
    var buffer = interop.alloc(interop.sizeof(interop.types.uint16));
    ns.getCharactersRange(buffer, NSMakeRange(0, 1));
    var codeUnit = new interop.Reference(interop.types.uint16, buffer).value;
    interop.free(buffer);
    
    expect(codeUnit).toBe(0xDC00);
});

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TestRunner/app/tests/ApiTests.js` around lines 15 - 31, The test suite
currently only covers high surrogates (U+D834) but lacks coverage for low
surrogates in the UTF-16 bridging behavior. Add a new test case after the
existing "preserves a lone surrogate when bridging a JS string to NSString" test
that follows the same pattern but uses a low surrogate code unit (U+DC00 instead
of U+D834) to verify that lone low surrogates are also preserved correctly when
bridging JS strings to NSString. The new test should verify that the bridged
NSString preserves the low surrogate value as 0xDC00.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@TestRunner/app/tests/ApiTests.js`:
- Around line 15-31: The test suite currently only covers high surrogates
(U+D834) but lacks coverage for low surrogates in the UTF-16 bridging behavior.
Add a new test case after the existing "preserves a lone surrogate when bridging
a JS string to NSString" test that follows the same pattern but uses a low
surrogate code unit (U+DC00 instead of U+D834) to verify that lone low
surrogates are also preserved correctly when bridging JS strings to NSString.
The new test should verify that the bridged NSString preserves the low surrogate
value as 0xDC00.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e750c080-3819-4ab5-b5d6-c546b25801a3

📥 Commits

Reviewing files that changed from the base of the PR and between a37e7f2 and 705533c.

📒 Files selected for processing (5)

NativeScript/runtime/ArgConverter.mm
NativeScript/runtime/DictionaryAdapter.mm
NativeScript/runtime/Helpers.mm
NativeScript/runtime/Interop.mm
TestRunner/app/tests/ApiTests.js

🚧 Files skipped from review as they are similar to previous changes (4)

NativeScript/runtime/Interop.mm
NativeScript/runtime/Helpers.mm
NativeScript/runtime/DictionaryAdapter.mm
NativeScript/runtime/ArgConverter.mm

V8 strings are UTF-16, but several bridge points round-tripped them through UTF-8 before building an NSString (and one reverse direction did too). That path corrupts lone surrogates, which get replaced with U+FFFD, and where it used C-string APIs (stringWithUTF8String:, [NSString UTF8String] folded into a std::string) it truncated at an embedded NUL. Rework tns::ToUtf16String to read the V8 string's native two-byte buffer directly, which also drops the deprecated std::codecvt_utf8_utf16. Switch the DictionaryAdapter, Interop and ArgConverter string sites to ToUtf16String + stringWithCharacters:length:, and pass NSString straight to ToV8String instead of going through a UTF8String C string. Add TestRunner cases asserting lone surrogates survive the JS to NSString bridge: each reads the bridged string's first UTF-16 code unit straight out of its buffer and checks it is unchanged (high U+D834 and low U+DC00), whereas the old UTF-8 round trip would have turned either into U+FFFD.

adrian-niculescu · 2026-06-13T23:50:09Z

🧹 Nitpick comments (1)

TestRunner/app/tests/ApiTests.js (1)> 15-31: ⚡ Quick win

Consider adding a lone low surrogate test case for completeness.
The current test verifies a lone high surrogate (U+D834, range 0xD800-0xDBFF). Adding a similar test for a lone low surrogate (e.g., "\uDC00", range 0xDC00-0xDFFF) would strengthen coverage of the UTF-16 fidelity fix, since both surrogate ranges should be preserved but have different bit patterns.

📝 Optional additional test case

it("preserves a lone low surrogate when bridging a JS string to NSString", function () {
    var ns = NSString.stringWithString("\uDC00");
    expect(ns.length).toBe(1);
    
    var buffer = interop.alloc(interop.sizeof(interop.types.uint16));
    ns.getCharactersRange(buffer, NSMakeRange(0, 1));
    var codeUnit = new interop.Reference(interop.types.uint16, buffer).value;
    interop.free(buffer);
    
    expect(codeUnit).toBe(0xDC00);
});

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TestRunner/app/tests/ApiTests.js` around lines 15 - 31, The test suite
currently only covers high surrogates (U+D834) but lacks coverage for low
surrogates in the UTF-16 bridging behavior. Add a new test case after the
existing "preserves a lone surrogate when bridging a JS string to NSString" test
that follows the same pattern but uses a low surrogate code unit (U+DC00 instead
of U+D834) to verify that lone low surrogates are also preserved correctly when
bridging JS strings to NSString. The new test should verify that the bridged
NSString preserves the low surrogate value as 0xDC00.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@TestRunner/app/tests/ApiTests.js`:
- Around line 15-31: The test suite currently only covers high surrogates
(U+D834) but lacks coverage for low surrogates in the UTF-16 bridging behavior.
Add a new test case after the existing "preserves a lone surrogate when bridging
a JS string to NSString" test that follows the same pattern but uses a low
surrogate code unit (U+DC00 instead of U+D834) to verify that lone low
surrogates are also preserved correctly when bridging JS strings to NSString.
The new test should verify that the bridged NSString preserves the low surrogate
value as 0xDC00.

ℹ️ Review info

@coderabbitai, re the low surrogate nitpick: added a lone low surrogate case (U+DC00) next to the high surrogate one in 8cc59bf, and renamed the original so the pair reads clearly. Both read the bridged code unit back as a number via getCharacters:range: instead of a UTF-8 conversion, since converting a lone surrogate to UTF-8 hangs the test runner on CI's iOS 17.5 simulator.

coderabbitai · 2026-06-13T23:57:22Z

Tip

For best results, initiate chat on the files or code changes.

@adrian-niculescu, great — both high (\uD834) and low (\uDC00) surrogate cases are now covered, and the renaming makes the pair read cleanly together. The getCharacters:range: / numeric approach is exactly right given the CI constraint you surfaced.

Good to know about the iOS 17.5 simulator hang on UTF-8 lone surrogate conversion — I'll record that for future reference.

✏️ Learnings added

Learnt from: adrian-niculescu
URL: https://github.com/NativeScript/ios/pull/392

Timestamp: 2026-06-13T23:57:12.035Z
Learning: In NativeScript iOS (NativeScript/ios), attempting to convert a lone (unpaired) UTF-16 surrogate to UTF-8 (e.g., via `[NSString UTF8String]` or `lengthOfBytesUsingEncoding:NSUTF8StringEncoding` in a way that triggers actual string conversion) can **hang the test runner** on the iOS 17.5 simulator in CI. Tests that verify lone surrogate preservation (high surrogates U+D800–U+DBFF or low surrogates U+DC00–U+DFFF) must read the code unit back as a numeric value via `getCharacters:range:` into a `uint16` buffer rather than attempting any UTF-8 string conversion. This was discovered while implementing tests for PR `#392`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

adrian-niculescu force-pushed the fix/utf16-string-conversion-fidelity branch from 16439b5 to a37e7f2 Compare June 13, 2026 22:28

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

adrian-niculescu force-pushed the fix/utf16-string-conversion-fidelity branch from a37e7f2 to 705533c Compare June 13, 2026 23:28

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

adrian-niculescu force-pushed the fix/utf16-string-conversion-fidelity branch from 705533c to 8cc59bf Compare June 13, 2026 23:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make V8 string to NSString conversions UTF-16 faithful#392

Make V8 string to NSString conversions UTF-16 faithful#392
adrian-niculescu wants to merge 1 commit into
NativeScript:mainfrom
adrian-niculescu:fix/utf16-string-conversion-fidelity

adrian-niculescu commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading

Walkthrough

Changes

Estimated Code Review Effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

adrian-niculescu commented Jun 13, 2026

Uh oh!

coderabbitai Bot commented Jun 13, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

adrian-niculescu commented Jun 13, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

adrian-niculescu commented Jun 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated Code Review Effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

adrian-niculescu commented Jun 13, 2026

Uh oh!

coderabbitai Bot commented Jun 13, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

adrian-niculescu commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

adrian-niculescu commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading

adrian-niculescu commented Jun 13, 2026 •

edited

Loading