isugimpy/node-readiness-manager

Node Readiness Manager

A Kubernetes controller that watches cluster conditions and dynamically manages node labels, taints, and annotations to enforce dependency ordering for infrastructure components.

Overview

In Kubernetes clusters, node readiness depends on a cascade of infrastructure components: CNI plugins, device plugins, CSI drivers, custom node agents, and system daemons. These components typically run as DaemonSets. Without coordination, their pods start in no particular order, so a component can land on a node before the components it depends on are ready.

This controller solves that problem by watching cluster conditions and applying node modifications based on declarative policies.

Features

  • Declarative configuration: All behavior defined via Custom Resource Definitions
  • Runtime composition: Policies reference each other through dependsOn; the controller builds the dependency graph and evaluation order at runtime
  • Generic resource watching: Can observe any Kubernetes resource type, not just nodes
  • Safe node modification: Add and remove labels, annotations, and taints on nodes
  • Time-based guarantees: Minimum duration requirements before actions trigger (prevents flapping)
  • Dependency sequencing: Enable proper startup ordering for DaemonSets and other workloads
  • Dry-run mode: Preview actions without applying them
  • Prometheus metrics: Policy, action, watch, and reconciliation metrics exposed for Prometheus
  • Helm chart: Easy deployment via Helm

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        API Server                               │
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐  │
│  │ WatchSource  │  │NodeDepPolicy │  │   Cluster Resources   │  │
│  │ (CRD)        │  │   (CRD)      │  │   (Pods, Nodes, etc.) │  │
│  └──────┬───────┘  └──────┬───────┘  └──────────┬───────────┘  │
└─────────┼──────────────────┼──────────────────────┼──────────────┘
          │                  │                      │
          ▼                  ▼                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Node Readiness Manager                       │
│                                                                 │
│  ┌─────────────────┐    ┌─────────────────┐    ┌────────────┐  │
│  │  Controller     │───▶│  Graph Builder  │───▶│ Reconciler │  │
│  │  (Watch CRDs)   │    │  (DAG + Order)  │    │  (Per Node)│  │
│  └─────────────────┘    └─────────────────┘    └─────┬──────┘  │
│                                                       │         │
│  ┌─────────────────┐    ┌─────────────────┐          │         │
│  │  Watch Manager  │◀───│  Watch Sources  │◀─────────┘         │
│  │  (Generic)      │    │  (Per CRD Type) │                    │
│  └────────┬────────┘    └─────────────────┘                    │
│           │                                                     │
│           ▼                                                     │
│  ┌─────────────────┐    ┌─────────────────┐                    │
│  │  Condition      │    │  Action         │                    │
│  │  Evaluator      │───▶│  Executor       │                    │
│  └─────────────────┘    └────────┬────────┘                    │
│                                 │                               │
└─────────────────────────────────┼───────────────────────────────┘
                                  │
                                  ▼
                          ┌─────────────────┐
                          │   Node Labels,  │
                          │  Taints,        │
                          │  Annotations    │
                          └─────────────────┘
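
The Graph Builder step (DAG + Order) can be pictured as a topological sort over the dependsOn references. The following is a minimal sketch under that assumption, not the controller's actual code; OrderPolicies and its map-based input are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// OrderPolicies returns policy names so that every policy appears after all
// of its dependencies. deps maps a policy name to the names it depends on.
// It fails if a dependency is unknown or if the graph contains a cycle.
// (Hypothetical helper: the real Graph Builder may work differently.)
func OrderPolicies(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string, len(deps))
	for name, ds := range deps {
		if _, ok := indegree[name]; !ok {
			indegree[name] = 0
		}
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("policy %q depends on unknown policy %q", name, d)
			}
			indegree[name]++
			dependents[d] = append(dependents[d], name)
		}
	}

	// Start with policies that have no dependencies; sort for stable output.
	var ready []string
	for name, n := range indegree {
		if n == 0 {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready)

	order := make([]string, 0, len(deps))
	for len(ready) > 0 {
		name := ready[0]
		ready = ready[1:]
		order = append(order, name)
		next := dependents[name]
		sort.Strings(next)
		for _, dependent := range next {
			indegree[dependent]--
			if indegree[dependent] == 0 {
				ready = append(ready, dependent)
			}
		}
	}

	// Any policy left unordered is part of a cycle.
	if len(order) != len(deps) {
		return nil, fmt.Errorf("dependency cycle detected among policies")
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"workload-ready":        {"cni-ready"},
		"cni-ready":             {"cni-prereqs-satisfied"},
		"cni-prereqs-satisfied": nil,
	}
	order, err := OrderPolicies(deps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order) // [cni-prereqs-satisfied cni-ready workload-ready]
}
```

Rejecting cycles up front matches the "cycles" error case listed under Kubernetes Events: a cyclic set of policies has no valid order.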

Custom Resources

WatchSource

A reusable resource that declares what the controller should observe in the cluster.

apiVersion: node-readiness-manager.io/v1alpha1
kind: WatchSource
metadata:
  name: cni-ready
  namespace: node-readiness-manager
spec:
  resource:
    apiVersion: v1
    kind: Pod
    namespace: kube-system
    labelSelector:
      app: calico-node
  condition:
    type: podReady
    match: all
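
To make the podReady / match: all pair concrete, here is a small sketch of how such a condition could be evaluated for one node. The Pod struct and AllPodsReadyOnNode are hypothetical and simplified. Evaluating per node, and treating "no matching pods" as unsatisfied, are assumptions of this sketch rather than documented behavior.

```go
package main

import "fmt"

// Pod is a hypothetical, trimmed-down view of a pod with only the fields
// this sketch needs. A real controller would read them from the Pod object.
type Pod struct {
	Name     string
	NodeName string
	Labels   map[string]string
	Ready    bool // true when the pod's Ready condition is "True"
}

// matchesSelector reports whether labels contains every key/value in selector.
func matchesSelector(labels, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// AllPodsReadyOnNode evaluates "podReady" with match "all" for one node:
// at least one selected pod must run on the node, and every selected pod
// on that node must be Ready.
func AllPodsReadyOnNode(pods []Pod, selector map[string]string, node string) bool {
	found := false
	for _, p := range pods {
		if p.NodeName != node || !matchesSelector(p.Labels, selector) {
			continue
		}
		found = true
		if !p.Ready {
			return false
		}
	}
	return found
}

func main() {
	selector := map[string]string{"app": "calico-node"}
	pods := []Pod{
		{Name: "calico-node-a", NodeName: "node-1", Labels: map[string]string{"app": "calico-node"}, Ready: true},
		{Name: "calico-node-b", NodeName: "node-2", Labels: map[string]string{"app": "calico-node"}, Ready: false},
	}
	for _, node := range []string{"node-1", "node-2", "node-3"} {
		fmt.Printf("%s satisfied=%v\n", node, AllPodsReadyOnNode(pods, selector, node))
	}
}
```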

NodeDependencyPolicy

The primary policy resource. Defines what to watch (via a WatchSource), what action to take on nodes, and what other policies must be satisfied first.

apiVersion: node-readiness-manager.io/v1alpha1
kind: NodeDependencyPolicy
metadata:
  name: cni-node-ready
  namespace: node-readiness-manager
spec:
  displayName: "CNI Node Ready"
  description: "Mark nodes as CNI-ready once calico pods are running"

  watchSourceRef:
    name: cni-ready
    namespace: node-readiness-manager

  targetNodes:
    matchLabels:
      kubernetes.io/os: linux
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists

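  # Illustrates the add/remove action types for taints and labels.
  # A real policy would normally not add and remove the same key.
  actions: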
    - type: addTaint
      taint:
        key: node.cni.io/not-ready
        effect: NoSchedule
    - type: addLabel
      label:
        key: node.cni.io/ready
        value: "true"
    - type: removeTaint
      taint:
        key: node.cni.io/not-ready
        effect: NoSchedule
    - type: removeLabel
      label:
        key: node.cni.io/ready
        value: "true"

  timing:
    satisfiedDuration: 30s
    evaluationInterval: 10s

  dependsOn:
    - node-readiness-manager/cni-prereqs-satisfied

  nodeSelectorTemplate:
    annotations:
      node-readiness-manager.io/policy-status: "active"

Getting Started

Prerequisites

  • Kubernetes v1.28+
  • Helm 3.x (for Helm chart deployment)
  • Go 1.22+ (for building from source)

Installing via Helm

helm install node-readiness-manager ./charts/node-readiness-manager \
  --namespace node-readiness-manager \
  --create-namespace

Building from Source

# Build the controller
make build

# Run locally (requires kubeconfig)
make run

# Run tests
make test

# Generate CRD and RBAC manifests
make manifests

Deploying via Kustomize

# Install CRDs
make install

# Deploy the controller
make deploy

Example Scenarios

CNI Readiness Before Workloads

The three policies below are chained through dependsOn so that taints are lifted in stages: prerequisites first, then the CNI, then workloads. The examples omit metadata.namespace; the dependsOn references assume the policies are applied to the node-readiness-manager namespace.

# Step 1: Wait for CNI prerequisites (kubelet, container runtime)
apiVersion: node-readiness-manager.io/v1alpha1
kind: NodeDependencyPolicy
metadata:
  name: cni-prereqs-satisfied
spec:
  watchSourceRef:
    name: node-conditions-ok
  targetNodes:
    matchLabels:
      kubernetes.io/os: linux
  actions:
    - type: addTaint
      taint:
        key: node.cni.io/preparing
        effect: NoSchedule
    - type: addTaint
      taint:
        key: node.cni.io/not-ready
        effect: NoSchedule
  dependsOn: []
---
# Step 2: Wait for CNI pods to be ready
apiVersion: node-readiness-manager.io/v1alpha1
kind: NodeDependencyPolicy
metadata:
  name: cni-ready
spec:
  watchSourceRef:
    name: cni-ready
  targetNodes:
    matchLabels:
      kubernetes.io/os: linux
  actions:
    - type: removeTaint
      taint:
        key: node.cni.io/preparing
        effect: NoSchedule
    - type: addLabel
      label:
        key: node.cni.io/cni-ready
        value: "true"
  dependsOn:
    - node-readiness-manager/cni-prereqs-satisfied
---
# Step 3: Allow workload DaemonSets to schedule
apiVersion: node-readiness-manager.io/v1alpha1
kind: NodeDependencyPolicy
metadata:
  name: workload-ready
spec:
  watchSourceRef:
    name: cni-ready
  targetNodes:
    matchLabels:
      kubernetes.io/os: linux
  actions:
    - type: removeTaint
      taint:
        key: node.cni.io/not-ready
        effect: NoSchedule
    - type: addLabel
      label:
        key: node.cni.io/workload-ready
        value: "true"
  dependsOn:
    - node-readiness-manager/cni-ready
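
The chain above relies on two things: resolving each "namespace/name" entry in dependsOn, and holding a policy back on a node until its dependencies are satisfied there. The sketch below illustrates both. PolicyRef, ParsePolicyRef, and BlockingDependencies are hypothetical names, and falling back to a default namespace for a bare name is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// PolicyRef identifies a policy by namespace and name.
type PolicyRef struct {
	Namespace string
	Name      string
}

// ParsePolicyRef parses a dependsOn entry of the form "namespace/name".
// A bare "name" falls back to defaultNamespace (an assumption of this sketch).
func ParsePolicyRef(s, defaultNamespace string) (PolicyRef, error) {
	ns, name, found := strings.Cut(s, "/")
	if !found {
		ns, name = defaultNamespace, s
	}
	if ns == "" || name == "" || strings.Contains(name, "/") {
		return PolicyRef{}, fmt.Errorf("invalid policy reference %q", s)
	}
	return PolicyRef{Namespace: ns, Name: name}, nil
}

// BlockingDependencies returns the dependencies that are not yet satisfied
// for a node. An empty result means the policy may apply its actions.
func BlockingDependencies(dependsOn []PolicyRef, satisfied map[PolicyRef]bool) []PolicyRef {
	var blocking []PolicyRef
	for _, dep := range dependsOn {
		if !satisfied[dep] {
			blocking = append(blocking, dep)
		}
	}
	return blocking
}

func main() {
	dep, err := ParsePolicyRef("node-readiness-manager/cni-ready", "default")
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// Per-node state: cni-ready has not been satisfied on this node yet,
	// so workload-ready must wait.
	satisfied := map[PolicyRef]bool{}
	fmt.Println("blocked by:", BlockingDependencies([]PolicyRef{dep}, satisfied))

	// Once cni-ready is satisfied, nothing blocks workload-ready.
	satisfied[dep] = true
	fmt.Println("blocked by:", BlockingDependencies([]PolicyRef{dep}, satisfied))
}
```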

Node Condition Monitoring

apiVersion: node-readiness-manager.io/v1alpha1
kind: WatchSource
metadata:
  name: node-memory-pressure
spec:
  resource:
    apiVersion: v1
    kind: Node
  condition:
    type: nodeCondition
    nodeConditionType: MemoryPressure
    operator: equals
    value: "True"  # satisfied while the node reports memory pressure
---
apiVersion: node-readiness-manager.io/v1alpha1
kind: NodeDependencyPolicy
metadata:
  name: memory-pressure-not-ready
spec:
  watchSourceRef:
    name: node-memory-pressure
  targetNodes:
    matchLabels:
      kubernetes.io/os: linux
  actions:
    - type: addTaint
      taint:
        key: node.memory.io/pressure
        effect: NoSchedule
    - type: addAnnotation
      annotation:
        key: node-readiness-manager.io/memory-pressure
        value: "true"
  dependsOn: []
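
A nodeCondition check with operator: equals comes down to finding the named condition on the node and comparing its status string. Here is a minimal sketch of that comparison. NodeCondition and ConditionEquals are hypothetical names, and treating a missing condition as unsatisfied is an assumption.

```go
package main

import "fmt"

// NodeCondition mirrors the two fields of a Kubernetes node condition that
// this sketch needs. Status is "True", "False", or "Unknown".
type NodeCondition struct {
	Type   string
	Status string
}

// ConditionEquals reports whether the node has a condition of the given type
// whose status equals want. A missing condition counts as not satisfied.
func ConditionEquals(conds []NodeCondition, condType, want string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == want
		}
	}
	return false
}

func main() {
	healthy := []NodeCondition{
		{Type: "Ready", Status: "True"},
		{Type: "MemoryPressure", Status: "False"},
	}
	pressured := []NodeCondition{
		{Type: "Ready", Status: "True"},
		{Type: "MemoryPressure", Status: "True"},
	}

	// A policy that taints nodes under memory pressure should match only
	// the second node, so the expected status is "True".
	fmt.Println(ConditionEquals(healthy, "MemoryPressure", "True"))   // false
	fmt.Println(ConditionEquals(pressured, "MemoryPressure", "True")) // true
}
```

The status strings are case-sensitive, so a policy expecting "true" would never match.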

Observability

Prometheus Metrics

| Metric                              | Type      | Labels                       | Description                         |
|-------------------------------------|-----------|------------------------------|-------------------------------------|
| ndm_policies_total                  | Gauge     | name, namespace, status      | Number of policies by status        |
| ndm_policies_evaluated_total        | Counter   | name, namespace, result      | Total evaluations (success/failure) |
| ndm_actions_applied_total           | Counter   | name, namespace, action_type | Total actions applied               |
| ndm_watch_connections               | Gauge     | source                       | Active watch connections            |
| ndm_reconciliation_duration_seconds | Histogram | policy                       | Time to reconcile a policy          |
| ndm_node_condition_duration_seconds | Histogram | policy, node                 | Time condition was satisfied        |

Kubernetes Events

The controller emits Kubernetes events on:

  • Policy creation/update/deletion
  • Condition state changes (watching → satisfied, satisfied → timeout)
  • Action application (add/remove label, taint, annotation)
  • Errors (missing references, cycles, API failures)

Safety Features

  1. Idempotent actions: Reapplying a label, taint, or annotation action that is already in effect changes nothing
  2. Minimum duration: Conditions must be satisfied for satisfiedDuration before actions apply
  3. Graceful degradation: If the controller is unavailable, nodes retain their current state
  4. Manual override: Operators can remove controller-managed labels/taints manually
  5. Dry-run support: Policies can be validated in dry-run mode
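
As an illustration of the first point, idempotent taint handling can be written so that each operation reports whether it changed anything, which lets the caller skip the API update when nothing did. This is a sketch only. Taint, AddTaint, and RemoveTaint are hypothetical helpers. They identify a taint by key and effect, the same pair Kubernetes uses to match taints.

```go
package main

import "fmt"

// Taint is a simplified stand-in for a node taint.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// AddTaint returns the taint list with t present exactly once, plus a flag
// saying whether the list changed. Taints are matched by key and effect.
func AddTaint(taints []Taint, t Taint) ([]Taint, bool) {
	for i, existing := range taints {
		if existing.Key == t.Key && existing.Effect == t.Effect {
			if existing.Value == t.Value {
				return taints, false // already present: nothing to do
			}
			updated := append([]Taint(nil), taints...)
			updated[i] = t // same key/effect, new value
			return updated, true
		}
	}
	return append(append([]Taint(nil), taints...), t), true
}

// RemoveTaint returns the taint list without any taint matching key and
// effect, plus a flag saying whether the list changed.
func RemoveTaint(taints []Taint, key, effect string) ([]Taint, bool) {
	kept := make([]Taint, 0, len(taints))
	for _, existing := range taints {
		if existing.Key == key && existing.Effect == effect {
			continue
		}
		kept = append(kept, existing)
	}
	return kept, len(kept) != len(taints)
}

func main() {
	notReady := Taint{Key: "node.cni.io/not-ready", Effect: "NoSchedule"}

	taints, changed := AddTaint(nil, notReady)
	fmt.Println(len(taints), changed) // 1 true

	taints, changed = AddTaint(taints, notReady) // repeat: no change
	fmt.Println(len(taints), changed)            // 1 false

	taints, changed = RemoveTaint(taints, notReady.Key, notReady.Effect)
	fmt.Println(len(taints), changed) // 0 true

	taints, changed = RemoveTaint(taints, notReady.Key, notReady.Effect) // repeat: no change
	fmt.Println(len(taints), changed)                                    // 0 false
}
```

The changed flag also fits dry-run mode: a preview can report which actions would change a node without applying them.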

Project Structure

node-readiness-manager/
├── api/
│   └── node-readiness-manager.io/
│       └── v1alpha1/
│           ├── groupversion_info.go
│           ├── watchsource_types.go
│           ├── watchsource_validation.go
│           ├── nodedependencypolicy_types.go
│           └── nodedependencypolicy_validation.go
├── cmd/
│   └── manager/
│       └── main.go
├── config/
│   ├── crd/
│   │   └── bases/
│   ├── rbac/
│   ├── manager/
│   ├── samples/
│   └── default/
├── internal/
│   ├── controller/
│   ├── graph/
│   ├── watch/
│   ├── evaluator/
│   ├── reconciler/
│   ├── action/
│   ├── metrics/
│   └── webhook/
├── charts/
│   └── node-readiness-manager/
├── Dockerfile
├── Makefile
├── go.mod
└── go.sum

License

Apache License 2.0 - see LICENSE file for details.

Contributing

Contributions are welcome! Please submit pull requests or open issues for bugs and feature requests.
