Skip to content

KAFKA-20658: Cast SMT honors ByteBuffer remaining bytes#22534

Open
BK202503 wants to merge 1 commit into
apache:trunkfrom
BK202503:KAFKA-20658
Open

KAFKA-20658: Cast SMT honors ByteBuffer remaining bytes#22534
BK202503 wants to merge 1 commit into
apache:trunkfrom
BK202503:KAFKA-20658

Conversation

@BK202503

Copy link
Copy Markdown

JIRA: KAFKA-20658

What

Cast.castToString Base64-encoded ByteBuffer BYTES values via Utils.readBytes(ByteBuffer). Utils.readBytes(ByteBuffer) reads from index 0 to the buffer's limit and ignores position, so a sliced or partially consumed buffer produced a string for bytes outside the logical BYTES value.

Switched the one call site to Utils.toArray(ByteBuffer), which the sibling Connect conversion paths already use. It respects the buffer's position..limit contract and also supports direct buffers. Utils was already imported in this file.

Tests

Added one CastTest case that fails against the previous implementation and passes with this change:

  • castSlicedByteBufferFieldToStringUsesRemainingBytes wraps {1,2,3,4}, slides the buffer to position=1/limit=3, casts the struct field to string, and asserts the Base64 output is AgM= ({2,3}) instead of AQID ({1,2,3}).

The existing castFieldsWithSchema case still passes (covers the byte[] path and a heap-backed-from-position-0 ByteBuffer).

Validation

./gradlew :connect:transforms:test \
  --tests org.apache.kafka.connect.transforms.CastTest.castSlicedByteBufferFieldToStringUsesRemainingBytes \
  --tests org.apache.kafka.connect.transforms.CastTest.castFieldsWithSchema

Both tests pass locally on JDK 17.

Scope

This PR targets only the Cast SMT. KAFKA-20657 (JsonConverter) is already up as #22533; KAFKA-20656 (Struct.getBytes/equals) and KAFKA-20666 (Connect offset backing stores) cover the same ByteBuffer.array()-class bug in their own components and are intentionally left for separate PRs.

Committer Checklist

  • Verified design and implementation
  • Verified test coverage and CI build status
  • Verified documentation (including upgrade notes) updates (no user-facing surface change beyond bug-fix behavior)

`Cast.castToString` Base64-encoded `ByteBuffer` BYTES values via
`Utils.readBytes(ByteBuffer)`, which reads from index `0` to the
buffer's `limit` and ignores `position`. A sliced or partially consumed
buffer therefore emitted bytes that were never part of the logical
BYTES value.

Switch the call site to `Utils.toArray(ByteBuffer)`, which is what the
sibling Connect conversion paths already use; it respects the buffer's
remaining-bytes contract and supports direct buffers as a bonus. The
fix is a one-line change at the only ByteBuffer call site in `Cast`;
no other transforms in this file touch a `ByteBuffer`.

Added one `CastTest` case that fails against the previous
implementation and passes with this fix:

- `castSlicedByteBufferFieldToStringUsesRemainingBytes` wraps
  `{1,2,3,4}`, slides the buffer to position=1/limit=3, casts the
  struct field to string, and asserts the Base64 output is `AgM=`
  (`{2,3}`) instead of `AQID` (`{1,2,3}`).

Signed-off-by: BK202503 <199436087+BK202503@users.noreply.github.com>
@nileshkumar3

Copy link
Copy Markdown
Contributor

@BK202503 I already have a pr opened for this #22503 :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

connect small Small PRs triage PRs from the community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants