diff --git a/docs/metrics.md b/docs/metrics.md
index 5b968f8c65..0bd916804a 100644
--- a/docs/metrics.md
+++ b/docs/metrics.md
@@ -42,6 +42,78 @@ Metrics defined by llm-d Router are in addition to Inference Gateway metrics. Fo
 > This metric is maintained for backward compatibility with the deprecated
 > `pd-profile-handler`. New deployments should use `disagg_decision_total`.
 
+## Flow Control Metrics
+
+Exposed when the `flowControl` feature gate is enabled. All carry the `llm_d_epp_` prefix.
+
+### `flow_control_request_queue_duration_seconds`
+
+*   **Type:** Histogram
+*   **Labels:**
+    *   `fairness_id`: string (the tenant or flow identifier for fairness rotation)
+    *   `priority`: string (the priority band, e.g., "0", "10")
+    *   `outcome`: string (`Dispatched`, `RejectedCapacity`, `RejectedOther`, `EvictedTTL`, `EvictedContextCancelled`, `EvictedOther`)
+    *   `inference_pool`: string
+    *   `model_name`: string
+    *   `target_model_name`: string
+*   **Release Stage:** ALPHA
+*   **Description:** Total time a request spends in the Flow Control layer, from enqueue to final outcome.
+*   **Usage:** Primary latency signal for flow control. Rising p99 indicates backends are saturated or capacity limits are too tight.
+
+### `flow_control_dispatch_cycle_duration_seconds`
+
+*   **Type:** Histogram
+*   **Release Stage:** ALPHA
+*   **Description:** Time taken for each internal dispatch cycle.
+*   **Usage:** Measures the overhead of the dispatch loop itself. Rising values indicate increasing cost per cycle from saturation detection, priority band iteration, or fairness evaluation.
+
+### `flow_control_request_enqueue_duration_seconds`
+
+*   **Type:** Histogram
+*   **Labels:**
+    *   `fairness_id`: string (the tenant or flow identifier)
+    *   `priority`: string (the priority band)
+    *   `outcome`: string
+*   **Release Stage:** ALPHA
+*   **Description:** Time taken to enqueue a request into the Flow Control layer.
+*   **Usage:** Measures the time spent in capacity checks and queue insertion within the processor.
+
+### `flow_control_queue_size`
+
+*   **Type:** Gauge
+*   **Labels:**
+    *   `fairness_id`: string (the tenant or flow identifier)
+    *   `priority`: string (the priority band)
+    *   `inference_pool`: string
+    *   `model_name`: string
+    *   `target_model_name`: string
+*   **Release Stage:** ALPHA
+*   **Description:** Current number of requests actively held in the Flow Control queue.
+*   **Usage:** Tracks queue depth per priority band and tenant. A steadily growing value indicates the dispatch rate is lower than the arrival rate.
+
+### `flow_control_queue_bytes`
+
+*   **Type:** Gauge
+*   **Labels:**
+    *   `fairness_id`: string (the tenant or flow identifier)
+    *   `priority`: string (the priority band)
+    *   `inference_pool`: string
+    *   `model_name`: string
+    *   `target_model_name`: string
+*   **Release Stage:** ALPHA
+*   **Description:** Current total size in bytes of requests actively held in the Flow Control queue.
+*   **Usage:** Tracks memory pressure from queued requests. Compare against the configured `maxBytes` capacity to gauge how close a band is to rejecting new requests.
+
+### `flow_control_pool_saturation`
+
+*   **Type:** Gauge
+*   **Labels:**
+    *   `inference_pool`: string
+*   **Release Stage:** ALPHA
+*   **Description:** Current saturation level of the inference pool (0.0 = empty, 1.0 = fully saturated).
+*   **Usage:** When saturation reaches the usage limit threshold, the dispatch cycle skips dispatching and requests remain queued. Sustained 1.0 indicates all backends are at capacity.
+
+
 ## Opt-in ext_proc Stream Metrics
 
 Three metrics covering ext_proc gRPC stream lifecycle. Disabled by default; enable with `--enable-grpc-stream-metrics`. These metrics are emitted under the `llm_d_epp_` prefix (separate from `llm_d_inference_scheduler_*`).