Skip to content

fix: escape all CDATA end sequences in builder#724

Open
spokodev wants to merge 1 commit into
Leonidas-from-XIV:masterfrom
spokodev:fix/cdata-escape-all-occurrences
Open

fix: escape all CDATA end sequences in builder#724
spokodev wants to merge 1 commit into
Leonidas-from-XIV:masterfrom
spokodev:fix/cdata-escape-all-occurrences

Conversation

@spokodev

Copy link
Copy Markdown

Problem

When cdata: true is set on the Builder, a string value that contains two or more ]]> sequences produces invalid XML and is corrupted on round-trip.

const xml2js = require('xml2js')

const value = 'a ]]> b ]]> c'
const xml = new xml2js.Builder({ cdata: true }).buildObject({ r: value })
// emits: <r><![CDATA[a ]]]]><![CDATA[> b ]]> c]]></r>
//                                          ^^^ second ]]> left intact

xml2js.parseString(xml, (err, res) => {
  console.log(res.r) // 'a ]]> b  c]]>'  (corrupted; expected 'a ]]> b ]]> c')
})

The first ]]> is split correctly, but the second one survives intact. Per XML 1.0 section 2.7, a CDATA section ends at the first literal ]]>, so that second sequence prematurely terminates the section. The remaining text is then re-parsed as element content, dropping/garbling characters and (depending on the value) emitting malformed XML. A value with a single ]]> round-trips fine, which is why this slipped through.

Root cause

escapeCDATA uses a non-global string replace, which only replaces the first occurrence:

escapeCDATA = function(entry) {
  return entry.replace(']]>', ']]]]><![CDATA[>');
};

Fix

Use a global regex so every ]]> occurrence is split:

return entry.replace(/]]>/g, ']]]]><![CDATA[>');

Applied to both the CoffeeScript source (src/builder.coffee) and the committed compiled output (lib/builder.js); the compiled change matches coffee -c output exactly.

Tests

Added a round-trip test asserting that a value with two ]]> sequences survives Builder({ cdata: true }) -> parseString unchanged.

  • Before the fix: the new test fails with 'a ]]> b c]]>' == 'a ]]> b ]]> c'.
  • After the fix: full suite green (100 tests pass, exit 0).

escapeCDATA used a non-global string replace, so only the first `]]>`
in a value was split. A second `]]>` survived intact and prematurely
terminated the CDATA section (XML 1.0 section 2.7), producing invalid
XML and corrupting the value on round-trip.

Make the replace global so every `]]>` occurrence is split.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant