
KAFKA-20656: Struct honors ByteBuffer remaining bytes #22536

Open
BK202503 wants to merge 1 commit into apache:trunk from BK202503:KAFKA-20656


@BK202503


JIRA: KAFKA-20656

What

Kafka Connect BYTES values may be either byte[] or ByteBuffer, and ConnectSchema recommends ByteBuffer because plain arrays do not implement content-based equals()/hashCode(). Struct did not honor that contract:

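  • getBytes(...) called ByteBuffer.array(), which ignores the buffer's position, limit and array offset. It could therefore return the entire backing array instead of the logical remaining bytes, and it threw UnsupportedOperationException for direct buffers, which have no accessible backing array.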
  • equals and hashCode delegated straight to Arrays.deepEquals/deepHashCode over the raw values array, so two structs holding the same logical BYTES value supplied as byte[] vs. ByteBuffer were not equal.
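
Both failure modes can be reproduced with the JDK alone. The sketch below is illustrative only: `ArrayPitfalls` and `oldGetBytes` are hypothetical names, and `oldGetBytes` is a simplified stand-in for the previous `ByteBuffer.array()` unwrapping described above, not Kafka's actual code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ArrayPitfalls {
    // Hypothetical stand-in for the previous getBytes(...) logic:
    // a ByteBuffer value is unwrapped with array().
    static byte[] oldGetBytes(Object value) {
        return value instanceof ByteBuffer ? ((ByteBuffer) value).array() : (byte[]) value;
    }

    public static void main(String[] args) {
        // A view over bytes 2..3 of a 6-byte backing array.
        ByteBuffer sliced = ByteBuffer.wrap(new byte[]{0, 1, 2, 3, 4, 5}, 2, 2).slice();
        System.out.println(sliced.remaining());                   // 2
        // array() ignores position, limit and arrayOffset(): the whole backing array leaks out.
        System.out.println(Arrays.toString(oldGetBytes(sliced))); // [0, 1, 2, 3, 4, 5]

        // A direct buffer has no accessible backing array, so array() throws.
        ByteBuffer direct = ByteBuffer.allocateDirect(2);
        try {
            oldGetBytes(direct);
        } catch (UnsupportedOperationException e) {
            System.out.println("UnsupportedOperationException");
        }

        // Arrays.deepEquals falls back to Object.equals for byte[] vs ByteBuffer,
        // so the same logical bytes in different representations compare unequal.
        Object[] asArray = {new byte[]{1, 2}};
        Object[] asBuffer = {ByteBuffer.wrap(new byte[]{1, 2})};
        System.out.println(Arrays.deepEquals(asArray, asBuffer)); // false
    }
}
```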

Changes

  • Struct.getBytes(String) now goes through Utils.toArray(ByteBuffer), which copies the buffer's remaining bytes and works for direct buffers. This is the same approach Values.convertToBytes already uses elsewhere in this module.
  • Struct.equals(Object) and Struct.hashCode() now compare/hash a normalizedBytesValues() view of the struct that copies top-level BYTES fields stored as ByteBuffer into byte[]. The underlying values array is not mutated, so get(String) callers still see whatever representation was put in. Non-BYTES fields go through Arrays.deepEquals/deepHashCode unchanged.
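
A minimal sketch of the normalization idea, assuming a simplified record made of positional values plus a per-field flag marking BYTES fields. `BytesNormalizingRecord`, `remainingBytes` and `isBytesField` are hypothetical names; per the description above, the real `getBytes` uses Kafka's `Utils.toArray(ByteBuffer)`, and the real `Struct` identifies BYTES fields from its schema rather than a flag array.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical, simplified stand-in for a struct; NOT Kafka's Struct.
final class BytesNormalizingRecord {
    private final Object[] values;
    private final boolean[] isBytesField; // stands in for "field schema type is BYTES"

    BytesNormalizingRecord(Object[] values, boolean[] isBytesField) {
        this.values = values;
        this.isBytesField = isBytesField;
    }

    // Copies the remaining bytes; duplicate() leaves the caller's position untouched,
    // and relative get(byte[]) also works for direct buffers.
    static byte[] remainingBytes(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.duplicate().get(out);
        return out;
    }

    // Returns whatever representation was stored (mirrors get(String)).
    Object get(int index) {
        return values[index];
    }

    // Returns the logical bytes of a BYTES field however it was stored.
    byte[] getBytes(int index) {
        Object v = values[index];
        return v instanceof ByteBuffer ? remainingBytes((ByteBuffer) v) : (byte[]) v;
    }

    // Copy of values with top-level ByteBuffer BYTES fields turned into byte[];
    // the values array itself is never mutated.
    private Object[] normalizedBytesValues() {
        Object[] copy = values.clone();
        for (int i = 0; i < copy.length; i++) {
            if (isBytesField[i] && copy[i] instanceof ByteBuffer) {
                copy[i] = remainingBytes((ByteBuffer) copy[i]);
            }
        }
        return copy;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BytesNormalizingRecord)) return false;
        BytesNormalizingRecord other = (BytesNormalizingRecord) o;
        return Arrays.equals(isBytesField, other.isBytesField)
                && Arrays.deepEquals(normalizedBytesValues(), other.normalizedBytesValues());
    }

    @Override
    public int hashCode() {
        // deepHashCode hashes byte[] elements by content, consistent with equals above.
        return Arrays.deepHashCode(normalizedBytesValues());
    }

    public static void main(String[] args) {
        boolean[] schema = {false, true};
        BytesNormalizingRecord a =
                new BytesNormalizingRecord(new Object[]{"id-1", new byte[]{1, 2}}, schema);
        BytesNormalizingRecord b = new BytesNormalizingRecord(
                new Object[]{"id-1", ByteBuffer.wrap(new byte[]{9, 1, 2}, 1, 2).slice()}, schema);
        System.out.println(a.equals(b));                    // true
        System.out.println(a.hashCode() == b.hashCode());   // true
        System.out.println(b.get(1) instanceof ByteBuffer); // true: stored value untouched
    }
}
```

Normalizing into a temporary copy costs one allocation per `ByteBuffer` field on every `equals`/`hashCode` call; in exchange, callers of `get` keep seeing exactly the representation they stored.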

The normalization is scoped to top-level BYTES fields, matching the reporter's reproducer. Nested BYTES inside ARRAY/MAP/STRUCT fields keep the previous behavior in this PR.
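
To illustrate what "previous behavior" means for nested values: Connect stores ARRAY values as `java.util.List`, and `Arrays.deepEquals` delegates to `List.equals`, which compares elements with their own `equals`. This JDK-only sketch (the class name is hypothetical) shows the outcome for `byte[]` and `ByteBuffer` list elements.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

final class NestedBytesEquality {
    public static void main(String[] args) {
        // Two ARRAY<BYTES>-style values holding the same logical bytes as byte[].
        Object[] left = {List.of(new byte[]{1, 2})};
        Object[] right = {List.of(new byte[]{1, 2})};
        // List.equals uses byte[].equals, which is identity-based.
        System.out.println(Arrays.deepEquals(left, right));   // false

        // Mixing byte[] and ByteBuffer elements is not equal either.
        Object[] mixed = {List.of(ByteBuffer.wrap(new byte[]{1, 2}))};
        System.out.println(Arrays.deepEquals(left, mixed));   // false

        // ByteBuffer elements compare by remaining content.
        Object[] buffers = {List.of(ByteBuffer.wrap(new byte[]{1, 2}))};
        System.out.println(Arrays.deepEquals(mixed, buffers)); // true
    }
}
```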

Tests

Added three regression tests in StructTest that fail against the previous implementation and pass with this change:

  • testGetBytesPreservesByteBufferRemainingBytes: a sliced ByteBuffer returns only the logical bytes.
  • testGetBytesSupportsDirectByteBuffer: getBytes returns the contents of a direct buffer instead of throwing.
  • testEqualsAndHashCodeWithEquivalentByteArrayAndByteBufferValues: a byte[]-valued struct and a ByteBuffer-valued struct with the same logical content are .equals and share a hashCode.

Validation

./gradlew :connect:api:test --tests "org.apache.kafka.connect.data.StructTest.*"

All StructTest tests pass on JDK 17, including the three new tests, the existing testFlatStruct, and the existing testEqualsAndHashCodeWithByteArrayValue (which exercises the unchanged byte[]-only equality path).

Scope

This PR completes the four-ticket ByteBuffer.array() cluster for Kafka Connect BYTES values; the other three are split into independent PRs to keep each reviewable:

Committer Checklist

  • Verified design and implementation
  • Verified test coverage and CI build status
  • Verified documentation (including upgrade notes) updates (no public API surface change beyond bug-fix behavior; equality semantics now match the documented BYTES contract)

Kafka Connect BYTES values may be either `byte[]` or `ByteBuffer`, and
`ConnectSchema` recommends `ByteBuffer` because plain arrays do not
implement content-based `equals()`/`hashCode()`. `Struct` did not
honor that contract:

- `getBytes(...)` called `ByteBuffer.array()`, returning the whole
  backing array instead of the logical remaining bytes and throwing
  `UnsupportedOperationException` for direct buffers.
- `equals` and `hashCode` compared raw stored objects, so two structs
  holding the same logical BYTES value supplied as `byte[]` and
  `ByteBuffer` did not compare equal.

`getBytes` now goes through `Utils.toArray(ByteBuffer)`, which copies
the buffer's remaining bytes and supports direct buffers (the same
approach already used by `Values.convertToBytes` in this module).

`equals`/`hashCode` now normalize top-level BYTES fields stored as
`ByteBuffer` into `byte[]` (without mutating the underlying `values`
array) before delegating to `Arrays.deepEquals`/`deepHashCode`, so the
representation a caller used to `put(...)` no longer leaks into struct
identity.

Added three regression tests in `StructTest`:

- `testGetBytesPreservesByteBufferRemainingBytes`: a sliced
  `ByteBuffer` returns only the logical bytes.
- `testGetBytesSupportsDirectByteBuffer`: `getBytes` returns the
  contents of a direct buffer instead of throwing.
- `testEqualsAndHashCodeWithEquivalentByteArrayAndByteBufferValues`:
  a `byte[]`-valued struct and a `ByteBuffer`-valued struct with the
  same content are `.equals` and share a `hashCode`.

The existing `testFlatStruct` and `testEqualsAndHashCodeWithByteArrayValue`
still pass on JDK 17.

Signed-off-by: BK202503 <199436087+BK202503@users.noreply.github.com>
github-actions bot added the triage (PRs from the community) and connect labels on Jun 10, 2026.