feat(epp): add topology-extractor datalayer plugin #1678
Open
elevran wants to merge 6 commits into
elevran commented Jun 18, 2026
vMaroon reviewed Jun 21, 2026
Introduces a topology-extractor endpoint extractor that stamps each endpoint
with a Topology attribute (hostname) at creation time.
The hostname is resolved once when the endpoint is created:
- Default: the Pod hostname field (EndpointMetadata.PodName).
- With hostnameLabel configured: the value of the named pod label. If the
label is absent, the attribute is not set.
Works for both k8s-discovered and file-based endpoints. The attribute is a
static Cloneable value stored in the endpoint's AttributeMap. The extractor
self-registers with the endpoint-notification-source, creating a default
source if none is configured.
Planned scorer (not implemented): topology-locality-scorer would score 1.0
for endpoints whose Topology.Hostname matches a key set on the request
attribute store, and 0.0 otherwise. The request key would be published by a
DataProducer reading a request header (e.g. x-topology-key) or from EPP node
metadata. When no key is present on the request, all endpoints score 0.0.
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
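The resolution rule in this commit message can be sketched as follows. This is a minimal illustration, not the plugin's code: the `Topology`, `endpointMetadata`, and `resolveTopology` definitions below are hypothetical stand-ins, and the label name `example.com/host` is made up for the example.

```go
package main

import "fmt"

// Topology is a hypothetical stand-in for the attribute described above.
type Topology struct {
	Hostname string
}

// Clone returns an independent copy. The struct holds only value fields,
// so a plain copy is sufficient.
func (t Topology) Clone() Topology { return t }

// endpointMetadata is a hypothetical subset of the endpoint metadata.
type endpointMetadata struct {
	PodName string
	Labels  map[string]string
}

// resolveTopology returns the attribute to stamp and whether it should be
// set at all. With no label configured it uses the pod name; with a label
// configured it uses the label value and reports false when the label is
// absent.
func resolveTopology(meta endpointMetadata, hostnameLabel string) (Topology, bool) {
	if hostnameLabel == "" {
		return Topology{Hostname: meta.PodName}, true
	}
	value, ok := meta.Labels[hostnameLabel]
	if !ok {
		return Topology{}, false
	}
	return Topology{Hostname: value}, true
}

func main() {
	meta := endpointMetadata{
		PodName: "worker-1",
		Labels:  map[string]string{"example.com/host": "node-a"},
	}
	for _, label := range []string{"", "example.com/host", "example.com/missing"} {
		topo, ok := resolveTopology(meta, label)
		fmt.Printf("label=%q set=%t hostname=%q\n", label, ok, topo.Hostname)
	}
}
```

Returning a boolean alongside the value keeps "label absent" distinct from "label present but empty", which matches the stated behavior that the attribute is left unset when the label is missing.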
…fications
- Remove auto-generated release note fragment (generated by CI action)
- Register for both the endpoint notification source and a Pod k8s
notification source via RegisterDependencies
- With hostnameLabel configured: endpoint handler extracts the label value
and stamps the Topology attribute; Pod handler is a no-op
- Without hostnameLabel: endpoint handler tracks the live Endpoint in an
internal map; Pod notification handler reads spec.hostname from the Pod
object and stamps the matching endpoint
- Maintain the endpoint map under a RWMutex; remove entries on delete
The prior implementation only handled endpoint events, which do not carry
the full Pod object. Pod notifications provide spec.hostname for the
no-label path.
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
…ogy-extractor
The prior implementation keyed endpoint lookup by endpoint NamespacedName
(e.g. worker-1-rank-0), which never matches the pod notification key
(worker-1). Also, pod notifications fire before endpoints are created, so
the attribute was never stamped.
Fix:
- Key both internal maps by pod identity {PodName, Namespace}.
- endpoints map holds a []Endpoint per pod to cover all rank entries.
- hostnames map caches spec.hostname from pod notifications; whichever
event fires first, the attribute is written once both have been seen.
- Only cache hostnames for ready pods; evict on not-ready or pod delete.
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
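The order-independent stamping described in this fix can be sketched roughly as below. All names (`podKey`, `tracker`, `onEndpointAdded`, `onPodEvent`) are hypothetical, and the `Hostname` field on `endpoint` stands in for the Topology attribute; the real handlers and types are not shown in this PR text.

```go
package main

import (
	"fmt"
	"sync"
)

// podKey identifies a pod; both maps are keyed by it (hypothetical type).
type podKey struct {
	PodName   string
	Namespace string
}

// endpoint is a stand-in; Hostname plays the role of the Topology attribute.
type endpoint struct {
	Name     string
	Hostname string
}

type tracker struct {
	mu        sync.RWMutex
	endpoints map[podKey][]*endpoint // one slice entry per rank endpoint, as in this commit
	hostnames map[podKey]string      // spec.hostname cached from pod notifications
}

func newTracker() *tracker {
	return &tracker{
		endpoints: make(map[podKey][]*endpoint),
		hostnames: make(map[podKey]string),
	}
}

// onEndpointAdded records the endpoint and stamps it if the pod was seen first.
func (t *tracker) onEndpointAdded(key podKey, ep *endpoint) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.endpoints[key] = append(t.endpoints[key], ep)
	if h, ok := t.hostnames[key]; ok {
		ep.Hostname = h
	}
}

// onPodEvent caches the hostname for ready pods and stamps endpoints that
// arrived earlier; a not-ready pod evicts the cached hostname.
func (t *tracker) onPodEvent(key podKey, specHostname string, ready bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if !ready {
		delete(t.hostnames, key)
		return
	}
	t.hostnames[key] = specHostname
	for _, ep := range t.endpoints[key] {
		ep.Hostname = specHostname
	}
}

// cachedHostname reports the cached hostname, if any.
func (t *tracker) cachedHostname(key podKey) (string, bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	h, ok := t.hostnames[key]
	return h, ok
}

func main() {
	tr := newTracker()
	key := podKey{PodName: "worker-1", Namespace: "default"}
	tr.onPodEvent(key, "host-a", true) // pod notification fires first
	ep := &endpoint{Name: "worker-1-rank-0"}
	tr.onEndpointAdded(key, ep)
	fmt.Println(ep.Name, ep.Hostname)
}
```

Either handler performs the stamp, so the attribute is written as soon as both the endpoint and the pod hostname are known, regardless of which event arrived first.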
README: remove stale 'at endpoint creation' phrasing; clarify that Hostname
is sourced from spec.hostname or a configured pod label.
Tests: drop always-constant parameters from helper functions (unparam);
introduce constants for repeated string literals (goconst).
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Add zone and region fields to the Topology attribute and the params struct.
Each param names the pod label to read; defaults are the standard Kubernetes
topology labels (corev1.LabelHostname, LabelTopologyZone,
LabelTopologyRegion).
The hostname fallback to spec.hostname is preserved: when the hostname label
is absent, the endpoint is tracked and stamped once the Pod notification
fires. Zone and region have no fallback.
Zone and region values are not read from Node objects -- that would require
RBAC for Node GET/LIST. They are populated when the pod itself carries the
corresponding labels.
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
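A rough sketch of the parameter defaulting and label lookup described in this commit, assuming a flat params struct with one label name per field. The `params`, `Topology`, and `topologyFromLabels` definitions are hypothetical; the label constants are declared locally with the same string values as the `k8s.io/api/core/v1` constants so the example has no external dependencies.

```go
package main

import "fmt"

// Local copies of the standard Kubernetes topology label names; these match
// corev1.LabelHostname, corev1.LabelTopologyZone, and corev1.LabelTopologyRegion.
const (
	labelHostname = "kubernetes.io/hostname"
	labelZone     = "topology.kubernetes.io/zone"
	labelRegion   = "topology.kubernetes.io/region"
)

// params names the label to read for each topology field (hypothetical shape).
type params struct {
	Hostname string
	Zone     string
	Region   string
}

// withDefaults fills omitted label names with the standard topology labels.
func (p params) withDefaults() params {
	if p.Hostname == "" {
		p.Hostname = labelHostname
	}
	if p.Zone == "" {
		p.Zone = labelZone
	}
	if p.Region == "" {
		p.Region = labelRegion
	}
	return p
}

// Topology is a stand-in for the attribute with the two added fields.
type Topology struct {
	Hostname string
	Zone     string
	Region   string
}

// topologyFromLabels reads the three fields from the labels. needsPodHostname
// is true when the hostname label is absent, signalling that the caller should
// wait for spec.hostname from the Pod notification. Zone and region have no
// fallback and are simply left empty when their labels are missing.
func topologyFromLabels(labels map[string]string, p params) (topo Topology, needsPodHostname bool) {
	p = p.withDefaults()
	hostname, found := labels[p.Hostname]
	topo = Topology{
		Hostname: hostname,
		Zone:     labels[p.Zone],
		Region:   labels[p.Region],
	}
	return topo, !found
}

func main() {
	labels := map[string]string{
		labelZone:   "zone-1",
		labelRegion: "region-1",
	}
	topo, needsPodHostname := topologyFromLabels(labels, params{})
	fmt.Printf("%+v needsPodHostname=%t\n", topo, needsPodHostname)
}
```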
Change endpoints map inner type from []Endpoint to
map[NamespacedName]Endpoint. EventAddOrUpdate re-fires on endpoint updates,
so the previous append caused duplicate entries for the same endpoint. Map
assignment is idempotent. Remove the now-unused slices import.
Fix README wording: "pod label" -> "endpoint label".
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
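The difference between the two inner types can be illustrated with a small standalone example. The `namespacedName` and `endpoint` types and both helper functions are hypothetical; they only model an add-or-update event that fires more than once for the same endpoint.

```go
package main

import "fmt"

// namespacedName and endpoint are simplified stand-ins for the real types.
type namespacedName struct {
	Namespace string
	Name      string
}

type endpoint struct {
	Address string
}

// trackInSlice models the previous behavior: every event appends an entry.
func trackInSlice(eps []endpoint, ep endpoint) []endpoint {
	return append(eps, ep)
}

// trackInMap models the new behavior: assignment overwrites the entry for
// the same key, so repeated events leave exactly one entry.
func trackInMap(eps map[namespacedName]endpoint, name namespacedName, ep endpoint) {
	eps[name] = ep
}

func main() {
	name := namespacedName{Namespace: "default", Name: "worker-1-rank-0"}
	var asSlice []endpoint
	asMap := make(map[namespacedName]endpoint)

	// The same endpoint is reported twice, as happens on an update.
	for _, addr := range []string{"10.0.0.1", "10.0.0.2"} {
		asSlice = trackInSlice(asSlice, endpoint{Address: addr})
		trackInMap(asMap, name, endpoint{Address: addr})
	}
	fmt.Println("slice entries:", len(asSlice), "map entries:", len(asMap))
}
```

The map form also keeps the most recent value for each endpoint, whereas the slice would retain a stale copy next to the updated one.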
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds a topology-extractor endpoint extractor that stamps each endpoint with a
Topology attribute (hostname, zone, region) for use by topology-aware scoring.
The plugin registers for both endpoint lifecycle events and Pod k8s notification
events. Label names for each topology field are configured via the hostname,
zone, and region parameters; when omitted, the standard Kubernetes topology
labels are used as defaults:
- kubernetes.io/hostname
- topology.kubernetes.io/zone
- topology.kubernetes.io/region
Labels are read from the endpoint's pod metadata at endpoint event time. When the
hostname label is absent on a pod, the plugin falls back to spec.hostname from
the Pod notification event. Zone and region have no fallback.
Note: zone and region values are not read from k8s Node objects -- that would
require additional RBAC to allow GET/LIST on Nodes. The topology fields are
populated when the pod itself carries the corresponding labels (e.g. propagated
via the Downward API or admission webhook from the node's labels).
This is groundwork for topology-aware scoring in disaggregated inference: a planned
topology-locality-scorer will score endpoints whose Topology.Hostname matches a
key on the request attribute store, enabling locality-aware routing that reduces KV
cache transfer latency across nodes.
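Since the scorer is only planned, the following is a speculative sketch of the matching rule stated in the commit history (1.0 on a hostname match, 0.0 otherwise, and 0.0 for every endpoint when the request carries no key). The `scoreLocality` function and the `endpoint` and `Topology` types are hypothetical and do not reflect any existing scorer interface.

```go
package main

import "fmt"

// Topology and endpoint are simplified stand-ins for the real types.
type Topology struct {
	Hostname string
}

type endpoint struct {
	Name     string
	Topology *Topology // nil when the attribute was never stamped
}

// scoreLocality returns a score per endpoint name: 1.0 when the endpoint's
// hostname equals the request key, 0.0 otherwise. An empty request key
// yields 0.0 for every endpoint.
func scoreLocality(requestKey string, endpoints []endpoint) map[string]float64 {
	scores := make(map[string]float64, len(endpoints))
	for _, ep := range endpoints {
		score := 0.0
		if requestKey != "" && ep.Topology != nil && ep.Topology.Hostname == requestKey {
			score = 1.0
		}
		scores[ep.Name] = score
	}
	return scores
}

func main() {
	endpoints := []endpoint{
		{Name: "decode-0", Topology: &Topology{Hostname: "node-a"}},
		{Name: "decode-1", Topology: &Topology{Hostname: "node-b"}},
		{Name: "decode-2"}, // attribute not set
	}
	fmt.Println(scoreLocality("node-a", endpoints))
	fmt.Println(scoreLocality("", endpoints))
}
```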
Which issue(s) this PR fixes:
Refs #545
Test plan:
Release note (write NONE if no user-facing change):