
KAFKA-20623: Heartbeat extension for streams group topology description plugin (1/3) #22551

Open
frankvicky wants to merge 1 commit into apache:trunk from frankvicky:KAFKA-20623-1-heartbeat

frankvicky commented Jun 12, 2026


JIRA: KAFKA-20623

This PR is a part of KIP-1331.

Wires the plugin reference into GroupCoordinatorService and adds the
heartbeat-path gate that asks streams clients to push their topology
description.

Plugin reference on the service

  • GroupCoordinatorService.Builder.build() resolves the plugin internally via
    config.streamsGroupTopologyDescriptionPlugin(Map.of()).
  • The service constructor accepts
    Optional<StreamsGroupTopologyDescriptionPlugin> and hands it to a new
    TopologyDescriptionManager, which owns the plugin reference and the
    per-group push back-off.
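
The wiring above can be pictured with a minimal, self-contained sketch. Every type here is a hypothetical stand-in (suffixed `Sketch`), not the real Kafka class; only the call shape `config.streamsGroupTopologyDescriptionPlugin(Map.of())` and the `Optional` hand-off to a manager come from the description. The `pluginEnabled` switch and the `isPluginConfigured()` accessor are assumptions made purely for illustration.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for StreamsGroupTopologyDescriptionPlugin; its real methods are not shown here.
interface TopologyDescriptionPluginSketch { }

// Hypothetical stand-in for TopologyDescriptionManager: it owns the optional plugin reference.
final class TopologyDescriptionManagerSketch {
    private final Optional<TopologyDescriptionPluginSketch> plugin;

    TopologyDescriptionManagerSketch(Optional<TopologyDescriptionPluginSketch> plugin) {
        this.plugin = plugin;
    }

    // Assumed accessor: true when a plugin was resolved from the config.
    boolean isPluginConfigured() {
        return plugin.isPresent();
    }
}

// Hypothetical stand-in for the coordinator config; "pluginEnabled" is an illustrative switch.
final class CoordinatorConfigSketch {
    private final boolean pluginEnabled;

    CoordinatorConfigSketch(boolean pluginEnabled) {
        this.pluginEnabled = pluginEnabled;
    }

    // Mirrors the call shape from the description; the Optional return type is an assumption.
    Optional<TopologyDescriptionPluginSketch> streamsGroupTopologyDescriptionPlugin(Map<String, Object> overrides) {
        if (!pluginEnabled) {
            return Optional.empty();
        }
        TopologyDescriptionPluginSketch plugin = new TopologyDescriptionPluginSketch() { };
        return Optional.of(plugin);
    }
}

// Hypothetical stand-in for GroupCoordinatorService and its builder step.
final class CoordinatorServiceSketch {
    final TopologyDescriptionManagerSketch topologyDescriptionManager;

    CoordinatorServiceSketch(Optional<TopologyDescriptionPluginSketch> plugin) {
        // The service does not keep the plugin itself; the manager owns it.
        this.topologyDescriptionManager = new TopologyDescriptionManagerSketch(plugin);
    }

    // The plugin is resolved inside build(), so callers never pass it in.
    static CoordinatorServiceSketch build(CoordinatorConfigSketch config) {
        return new CoordinatorServiceSketch(config.streamsGroupTopologyDescriptionPlugin(Map.of()));
    }
}
```

Resolving the plugin inside the builder keeps the lookup in one place, so callers of the builder need no changes.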

Heartbeat post-processing

  • StreamsGroupHeartbeatResult now carries three epoch fields
    (currentTopologyEpoch, storedDescriptionTopologyEpoch, and
    failedDescriptionTopologyEpoch) so the service-layer gate can decide
    whether to set TopologyDescriptionRequired=true without re-reading the
    group on every heartbeat.
  • GroupMetadataManager populates these fields at the four existing
    StreamsGroupHeartbeatResult construction sites.
  • TopologyDescriptionManager.maybeSetTopologyDescriptionRequired(...) runs
    in the .thenApply(...) after the heartbeat write. The flag is set only
    when all of the following hold: the plugin is configured, the response has
    no error, the current epoch is resolved, that epoch is neither stored nor
    permanently failed at the plugin, the response does not carry a
    STALE_TOPOLOGY status, and the per-group back-off window is not in effect.
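
As a rough illustration of the gate conditions listed above, the following sketch evaluates them in order. It is not the actual `maybeSetTopologyDescriptionRequired(...)` implementation: the record, the `UNRESOLVED_EPOCH = -1` sentinel, the `errorCode == 0` convention for "no error", the boolean `staleTopology` field, and the `BooleanSupplier` used for the back-off check are all assumptions introduced for this example.

```java
import java.util.function.BooleanSupplier;

// Hypothetical, trimmed-down view of the heartbeat result: just the fields the gate reads.
record HeartbeatResultSketch(
    short errorCode,                      // assumed: 0 means "no error"
    boolean staleTopology,                // assumed: true when the response carries STALE_TOPOLOGY
    int currentTopologyEpoch,
    int storedDescriptionTopologyEpoch,
    int failedDescriptionTopologyEpoch) { }

final class TopologyDescriptionGateSketch {
    // Assumed sentinel for "current epoch not resolved"; the real encoding may differ.
    static final int UNRESOLVED_EPOCH = -1;

    // Returns true when the response should set TopologyDescriptionRequired=true.
    static boolean shouldRequireDescription(
            boolean pluginConfigured,
            HeartbeatResultSketch result,
            BooleanSupplier armBackoffIfNotActive) {
        if (!pluginConfigured) return false;
        if (result.errorCode() != 0) return false;
        int current = result.currentTopologyEpoch();
        if (current == UNRESOLVED_EPOCH) return false;
        if (current == result.storedDescriptionTopologyEpoch()) return false;  // already stored
        if (current == result.failedDescriptionTopologyEpoch()) return false;  // permanently failed
        if (result.staleTopology()) return false;
        // Checked last because arming the back-off is a side effect that should
        // only happen when every other condition already holds.
        return armBackoffIfNotActive.getAsBoolean();
    }
}
```

Because the three epochs travel with the result, the check is a pure function of the response plus the back-off state, which is what allows it to run in the post-write callback without touching the group again.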

Back-off

  • StreamsGroupTopologyDescriptionBackoff is a broker-level, per-group
    exponential back-off (30 s → 1 h, doubled on each arm at the same topology
    epoch, reset on topology-epoch advance).
  • Check-and-arm is folded into a single atomic armIfNotActive compute, so
    two concurrent heartbeats for the same group cannot both arm the back-off
    and double its window beyond the intended length.
  • The back-off is non-timeline, non-replayed state and is rebuilt from
    scratch on broker restart. Convergence after a restart is driven by the
    persisted StoredDescriptionTopologyEpoch / FailedDescriptionTopologyEpoch
    fields on each streams group.
  • The clear(...) and armOrExtend(...) sites consumed by the push and
    DeleteGroups paths land in the follow-up PRs, along with their respective
    entry points on TopologyDescriptionManager.
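
The back-off behaviour described above can be sketched as follows, assuming a `ConcurrentHashMap` keyed by group ID and an explicit `nowMs` clock parameter. The class name, the `State` holder, the method signature, and the `windowMs` helper are hypothetical; only the 30 s to 1 h range, the doubling at the same epoch, the reset on epoch advance, and the single atomic check-and-arm are taken from the description.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a per-group exponential back-off; not the real class.
final class TopologyDescriptionBackoffSketch {
    static final long INITIAL_WINDOW_MS = 30_000L;     // 30 s
    static final long MAX_WINDOW_MS = 3_600_000L;      // 1 h

    // Immutable per-group state: the epoch the window belongs to, its length, and its deadline.
    private static final class State {
        final int topologyEpoch;
        final long windowMs;
        final long deadlineMs;

        State(int topologyEpoch, long windowMs, long deadlineMs) {
            this.topologyEpoch = topologyEpoch;
            this.windowMs = windowMs;
            this.deadlineMs = deadlineMs;
        }
    }

    private final ConcurrentHashMap<String, State> states = new ConcurrentHashMap<>();

    // Returns true if this call armed the back-off (the caller may ask for a push),
    // false if a window for the same epoch is still active.
    boolean armIfNotActive(String groupId, int topologyEpoch, long nowMs) {
        boolean[] armed = {false};
        // compute() runs atomically per key, so the check and the arm cannot interleave
        // between two concurrent heartbeats of the same group.
        states.compute(groupId, (id, prev) -> {
            boolean sameEpoch = prev != null && prev.topologyEpoch == topologyEpoch;
            if (sameEpoch && nowMs < prev.deadlineMs) {
                return prev; // window still in effect: leave it untouched
            }
            // Same epoch: double the previous window up to the cap. New epoch: start over.
            long window = sameEpoch ? Math.min(prev.windowMs * 2, MAX_WINDOW_MS) : INITIAL_WINDOW_MS;
            armed[0] = true;
            return new State(topologyEpoch, window, nowMs + window);
        });
        return armed[0];
    }

    // Illustrative helper: current window length for a group, or -1 if none is tracked.
    long windowMs(String groupId) {
        State s = states.get(groupId);
        return s == null ? -1L : s.windowMs;
    }
}
```

With separate "is active?" and "arm" calls, two heartbeats could both observe an expired window and both arm it, doubling the window twice. Folding both steps into one `compute` lets exactly one of them arm it.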

…on plugin (1/3)

Wires the plugin reference into GroupCoordinatorService and adds the
heartbeat-path gate that asks streams clients to push their topology
description. Plugin lookup is resolved internally from groupCoordinatorConfig
(no BrokerServer change).
