diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index e53a901..08f98cf 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -58,7 +58,7 @@ jobs: -ldflags="-s -w -X ${BUILDINFO_PACKAGE}.Version=${VERSION} -X ${BUILDINFO_PACKAGE}.Commit=${COMMIT} -X ${BUILDINFO_PACKAGE}.Date=${BUILD_DATE}" \ -o "${work}/${cmd}${ext}" "./cmd/${cmd}" done - cp README.md README.zh-CN.md QUICKSTART.md CHANGELOG.md AI_CONTRACT.md LICENSE NOTICE SECURITY.md "${work}/" + cp README.md README.zh-CN.md QUICKSTART.md UPGRADING.md CHANGELOG.md AI_CONTRACT.md LICENSE NOTICE SECURITY.md "${work}/" mkdir -p "${work}/docs" cp -R docs/assets "${work}/docs/" mkdir -p "${work}/local" diff --git a/AI_CONTRACT.md b/AI_CONTRACT.md index 04f0d9c..aefd4bc 100644 --- a/AI_CONTRACT.md +++ b/AI_CONTRACT.md @@ -40,6 +40,15 @@ The response always includes: - `PV` - `StorageClass` - `CSIDriver` +- `Service` +- `Ingress` +- `EndpointSlice` +- `NetworkPolicy` +- `ServiceAccount` +- `RoleBinding` +- `ClusterRoleBinding` +- `Node` +- `VolumeAttachment` - `HelmRelease` - `HelmChart` @@ -101,7 +110,8 @@ best-effort reasoning. `recipe` and `lanes` are additive Incident Context Pack v1 metadata. `recipe` is a product label for how the graph should be interpreted; it does not change the underlying graph identity contract. v1 recipes are: `pod-incident`, -`workload-incident`, `helm-ownership`, and `helm-upgrade-runtime-failure`. +`workload-incident`, `storage-csi`, `service-routing`, `identity`, +`node-context`, `helm-ownership`, and `helm-upgrade-runtime-failure`. `warnings`, `partial`, `degradedSources`, `budgets`, `rankedEvidence`, and `conflicts` are additive diagnostic metadata. 
Agents should read them before @@ -232,6 +242,14 @@ The following edge kinds are intended as stable phase-1 diagnostic semantics: - `affected_by_webhook` - `managed_by_csi_controller` - `served_by_csi_node_agent` +- `attaches_pv` + +### Routing, policy, and scaling +- `routes_to_service` +- `targets_pod` +- `applies_to_pod` +- `scales_workload` +- `protects_pod` ### Helm / package provenance - `managed_by_helm_release` diff --git a/CHANGELOG.md b/CHANGELOG.md index 0e97125..bdf462d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,36 @@ # Changelog +## v0.1.8 - 2026-06-01 + +### Added + +- Added diagnostic recipe vocabulary for `storage-csi`, `service-routing`, + `identity`, and `node-context` across the CLI/API/viewer contract. +- Added richer Event and Pod runtime evidence fields, including Event counts + and timestamps plus container waiting, terminated, and restart status. +- Added fallback Event correlation by involved object kind/name when UID + evidence is absent. +- Added topology coverage for Ingress, EndpointSlice, NetworkPolicy, + HorizontalPodAutoscaler, PodDisruptionBudget, and VolumeAttachment. +- Added offline failure-mode fixtures for CrashLoopBackOff, FailedScheduling, + Service selector mismatch, missing ConfigMap/Secret, RBAC forbidden, + admission webhook failure, and PVC/CSI provisioning failure. +- Added viewer support for `focusNode` URL state and the expanded recipe and + entry-kind vocabulary. +- Added `UPGRADING.md` with user-facing compatibility and upgrade guidance. + +### Changed + +- Changed the Helm chart default to `rbac.readSecrets=false`; Secret topology + collection now requires an explicit opt-in. +- Updated release, installation, documentation, and access skill examples for + `v0.1.8`. + +### Validation + +- Local Go test, viewer, Helm, binary, client, schema, and sample checks are + expected before tagging. 
+ ## v0.1.7 - 2026-05-06 ### Added diff --git a/QUICKSTART.md b/QUICKSTART.md index 1ac1cde..8e4095b 100644 --- a/QUICKSTART.md +++ b/QUICKSTART.md @@ -62,7 +62,7 @@ the CLI at that local server. Set the version and choose the release archive for your machine: ```bash -export KO_VERSION=v0.1.7 +export KO_VERSION=v0.1.8 curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz" tar -xzf "kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz" cd "kubernetes-ontology_${KO_VERSION}_linux_amd64" @@ -158,8 +158,9 @@ path above. Set the version and image namespace you want to use: ```bash -export KO_VERSION=v0.1.7 +export KO_VERSION=v0.1.8 export KO_IMAGE=ghcr.io/colvin-y/kubernetes-ontology +curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology-0.1.8.tgz" ``` Use the `KO_VERSION` value for the release tag you want to install. If you @@ -169,7 +170,7 @@ image reference. Install the Helm chart: ```bash -helm upgrade --install kubernetes-ontology ./charts/kubernetes-ontology \ +helm upgrade --install kubernetes-ontology kubernetes-ontology-0.1.8.tgz \ --namespace kubernetes-ontology \ --create-namespace \ --set image.repository="${KO_IMAGE}" \ @@ -183,14 +184,14 @@ read-only RBAC. The daemon uses those in-cluster credentials only for `get`/`list`/`watch` collection. Inside the pod, the server listens on `:18080` rather than `0.0.0.0:18080` so Kubernetes IPv4, IPv6, and dual-stack networking can use the wildcard listener supported by the runtime. By default the chart -grants `secrets` `get`/`list`/`watch` permission so Secret nodes and -`uses_secret` edges can be collected. To run without Secret collection: +does not grant Secret reads. 
To include Secret nodes and `uses_secret` edges, +opt in explicitly: ```bash -helm upgrade --install kubernetes-ontology ./charts/kubernetes-ontology \ +helm upgrade --install kubernetes-ontology kubernetes-ontology-0.1.8.tgz \ --namespace kubernetes-ontology \ --reuse-values \ - --set rbac.readSecrets=false + --set rbac.readSecrets=true ``` Wait for the server: @@ -564,6 +565,9 @@ fault-diagnosis workflow and downstream AI-agent consumption. It includes additive `schemaVersion`, `recipe`, `lanes`, `partial`, `warnings`, `budgets`, `rankedEvidence`, `degradedSources`, and `conflicts` fields so agents can tell bounded evidence from complete cluster truth. +Canonical recipe hints are `pod-incident`, `workload-incident`, `storage-csi`, +`service-routing`, `identity`, `node-context`, `helm-ownership`, and +`helm-upgrade-runtime-failure`. When resources carry standard Helm metadata, diagnostic graphs can also include `HelmRelease` and `HelmChart` nodes connected by `managed_by_helm_release` and `installs_chart` edges. These edges are label evidence with confidence scores, @@ -590,6 +594,7 @@ the Helm output. An offline reference fixture for this story is available at `samples/helm-upgrade-failure/diagnostic-graph.json`. +Additional failure-mode fixtures live under `samples/failure-modes/`. Pod-centered diagnostic queries keep shared nodes bounded by default. For example, a pod's `ServiceAccount` is shown, but the traversal does not continue @@ -650,6 +655,13 @@ recipe metadata, freshness, budget truncation, warnings, conflicts, degraded sources, and ranked evidence. Evidence and conflict entries focus the related node or edge when the fixture or daemon response includes IDs. 
+Viewer URLs can carry state for handoff between docs, issues, and agents: + +```text +http://127.0.0.1:8765/?diagnostic=1&kind=Pod&namespace=default&name=my-pod&recipe=pod-incident +http://127.0.0.1:8765/?file=samples/failure-modes/crashloopbackoff/diagnostic-graph.json&focusNode=demo-cluster/core/Pod/default/api-7c9d/pod-crash/_ +``` + Select a node and use `Expand 1 hop` to fetch the next layer from the daemon. The CLI equivalent is: diff --git a/README.md b/README.md index dc92235..1825ed7 100644 --- a/README.md +++ b/README.md @@ -56,6 +56,14 @@ This project turns those object reads into a graph: - `PV` - `StorageClass` - `CSIDriver` +- `Service` +- `Ingress` +- `EndpointSlice` +- `NetworkPolicy` +- `ServiceAccount` +- `RoleBinding` +- `ClusterRoleBinding` +- `Node` - `HelmRelease` - `HelmChart` @@ -89,6 +97,10 @@ The graph can recover and correlate: - display-only controller ownership rules for controller pods that Kubernetes does not expose through owner references - service selector matches +- Ingress backend Service references +- EndpointSlice Pod target references +- NetworkPolicy, HPA, and PodDisruptionBudget selector/target evidence +- VolumeAttachment to PersistentVolume evidence - pod to node placement - pod to Secret, ConfigMap, ServiceAccount, image, PVC, PV, StorageClass, and CSI driver paths @@ -224,9 +236,9 @@ There are three deployment modes: - Helm mode installs this project's own Deployment, Service, ServiceAccount, ConfigMap, and read-only RBAC so the daemon and viewer can run in-cluster. That install-time footprint is expected. The granted RBAC is limited to - `get`, `list`, and `watch` for collected resources. Secret reads are enabled - by default so Secret nodes and `uses_secret` edges can be collected; set - `rbac.readSecrets=false` to disable them. + `get`, `list`, and `watch` for collected resources. Secret reads are disabled + by default; set `rbac.readSecrets=true` only when Secret nodes and + `uses_secret` edges are needed. 
The HTTP API is intended for local or controlled environments, not public multi-tenant exposure. @@ -241,7 +253,7 @@ the published GHCR image. The release archive includes the server viewer `kubernetes-ontology-viewer`. ```bash -export KO_VERSION=v0.1.7 +export KO_VERSION=v0.1.8 curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz" tar -xzf "kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz" cd "kubernetes-ontology_${KO_VERSION}_linux_amd64" @@ -297,10 +309,11 @@ clusters, mirror `ghcr.io/colvin-y/kubernetes-ontology` to an internal registry and set `KO_IMAGE` to that mirror, or use the release binary path above. ```bash -export KO_VERSION=v0.1.7 +export KO_VERSION=v0.1.8 export KO_IMAGE=ghcr.io/colvin-y/kubernetes-ontology +curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology-0.1.8.tgz" -helm upgrade --install kubernetes-ontology ./charts/kubernetes-ontology \ +helm upgrade --install kubernetes-ontology kubernetes-ontology-0.1.8.tgz \ --namespace kubernetes-ontology \ --create-namespace \ --set image.repository="${KO_IMAGE}" \ @@ -328,6 +341,15 @@ The Helm chart creates the project Deployment, Service, ServiceAccount, ConfigMap, and read-only RBAC required to run in-cluster. It also deploys the topology viewer by default: +To include Secret nodes and `uses_secret` edges, opt in explicitly: + +```bash +helm upgrade --install kubernetes-ontology kubernetes-ontology-0.1.8.tgz \ + --namespace kubernetes-ontology \ + --reuse-values \ + --set rbac.readSecrets=true +``` + ```bash kubectl -n kubernetes-ontology port-forward svc/kubernetes-ontology-viewer 8765:8765 ``` @@ -468,6 +490,10 @@ Diagnostic responses include additive `schemaVersion`, `recipe`, `lanes`, `partial`, `warnings`, `budgets`, `rankedEvidence`, `degradedSources`, and `conflicts` fields. 
Agents should use those fields to distinguish bounded evidence from complete cluster truth. +Current recipe hints are `pod-incident`, `workload-incident`, `storage-csi`, +`service-routing`, `identity`, `node-context`, `helm-ownership`, and +`helm-upgrade-runtime-failure`. Offline failure-mode fixtures live under +`samples/failure-modes/`. Expand one graph node: @@ -610,7 +636,7 @@ Tagged releases publish: `kubernetes-ontology-viewer`, Quickstart docs, release notes, and a local config example - a packaged Helm chart archive, for example - `kubernetes-ontology-0.1.7.tgz` + `kubernetes-ontology-0.1.8.tgz` - a multi-architecture image at `ghcr.io/colvin-y/kubernetes-ontology:` - SemVer aliases without the leading `v`, plus `latest` diff --git a/README.zh-CN.md b/README.zh-CN.md index 140015d..9bcbd78 100644 --- a/README.zh-CN.md +++ b/README.zh-CN.md @@ -42,8 +42,8 @@ daemon 运行时不会: - Helm 模式会安装本项目自己的 Deployment、Service、ServiceAccount、 ConfigMap 和只读 RBAC,让 daemon 和 viewer 能在集群内运行。这些是安装阶段 的预期资源。chart 授予的 RBAC 只包含对采集资源的 `get`、`list`、 - `watch` 权限;Secret 读取默认开启,以便采集 Secret 节点和 `uses_secret` - 关系;如需关闭可设置 `rbac.readSecrets=false`。 + `watch` 权限;Secret 读取默认关闭。只有需要采集 Secret 节点和 + `uses_secret` 关系时,才显式设置 `rbac.readSecrets=true`。 HTTP API 建议只暴露在本机或受控内网环境中,不要直接作为公网多租户服务使用。 @@ -53,12 +53,28 @@ HTTP API 建议只暴露在本机或受控内网环境中,不要直接作为 - `Pod` - `Workload` +- `PVC` +- `PV` +- `StorageClass` +- `CSIDriver` +- `Service` +- `Ingress` +- `EndpointSlice` +- `NetworkPolicy` +- `ServiceAccount` +- `RoleBinding` +- `ClusterRoleBinding` +- `Node` +- `HelmRelease` +- `HelmChart` 可以恢复和展示的关系包括: - `Pod -> ReplicaSet -> Deployment` 等 ownerReference 链路 - 自定义 workload 资源,例如 OpenKruise ASTS、Redis Cluster 等 CRD 对象 - Service selector 到 Pod 的匹配关系 +- Ingress 到 Service、EndpointSlice 到 Pod、NetworkPolicy 到 Pod 的路由与策略证据 +- HPA 到 Workload、PodDisruptionBudget 到 Pod、VolumeAttachment 到 PV 的关系 - Pod 到 Node、Secret、ConfigMap、ServiceAccount、Image、PVC 的关系 - PVC、PV、StorageClass、CSI Driver 的存储链路 - ServiceAccount 到 
RoleBinding、ClusterRoleBinding 的证据 @@ -143,7 +159,7 @@ Skill marketplace 对外链接故意指向默认分支,这样 Agent 会拿到 `kubernetes-ontology-viewer`。 ```bash -export KO_VERSION=v0.1.7 +export KO_VERSION=v0.1.8 curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz" tar -xzf "kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz" cd "kubernetes-ontology_${KO_VERSION}_linux_amd64" @@ -205,10 +221,11 @@ streamMode: informer `KO_IMAGE` 改成内部地址,或者使用上面的二进制方式。 ```bash -export KO_VERSION=v0.1.7 +export KO_VERSION=v0.1.8 export KO_IMAGE=ghcr.io/colvin-y/kubernetes-ontology +curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology-0.1.8.tgz" -helm upgrade --install kubernetes-ontology ./charts/kubernetes-ontology \ +helm upgrade --install kubernetes-ontology kubernetes-ontology-0.1.8.tgz \ --namespace kubernetes-ontology \ --create-namespace \ --set image.repository="${KO_IMAGE}" \ @@ -231,6 +248,15 @@ release tag,然后查询状态: kubernetes-ontology --server "http://127.0.0.1:18080" --status ``` +如果需要采集 Secret 节点和 `uses_secret` 边,显式开启: + +```bash +helm upgrade --install kubernetes-ontology kubernetes-ontology-0.1.8.tgz \ + --namespace kubernetes-ontology \ + --reuse-values \ + --set rbac.readSecrets=true +``` + 短期试用结束后,用 `Ctrl-C` 停止 `kubectl port-forward`;如果不再长期使用, 可以卸载集群内资源: @@ -353,6 +379,9 @@ OpenKruise,这是正常情况,不需要为此中断启动。 诊断返回会额外包含 `partial`、`warnings`、`budgets`、 `rankedEvidence`、`degradedSources` 和 `conflicts`。Agent 应优先读取 这些字段,区分“有边界的证据图”和“完整集群事实”。 +当前 recipe 包括 `pod-incident`、`workload-incident`、`storage-csi`、 +`service-routing`、`identity`、`node-context`、`helm-ownership` 和 +`helm-upgrade-runtime-failure`。离线故障样例位于 `samples/failure-modes/`。 展开一个图节点: diff --git a/SECURITY.md b/SECURITY.md index 8eef46b..7b5169a 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -41,8 +41,9 @@ Expected behavior: - The daemon and viewer should be exposed only on localhost or controlled 
   internal networks unless an operator adds external protection.
 - The HTTP API has no built-in authentication or TLS yet.
-- Secret reads are used only to model Secret nodes and `uses_secret` edges.
-  Disable them with `rbac.readSecrets=false` when that evidence is not needed.
+- Secret reads are disabled by default in the Helm chart. If enabled with
+  `rbac.readSecrets=true`, they are used only to model Secret nodes and
+  `uses_secret` edges.
 
 Out of scope for the current MVP:
 
diff --git a/UPGRADING.md b/UPGRADING.md
new file mode 100644
index 0000000..2b7cc84
--- /dev/null
+++ b/UPGRADING.md
@@ -0,0 +1,103 @@
+# Upgrading kubernetes-ontology
+
+This guide covers user-facing upgrades across the CLI, daemon, Helm chart,
+viewer, and access skill.
+
+## Compatibility Policy
+
+Use the same released version for:
+
+- `kubernetes-ontology` CLI
+- `kubernetes-ontologyd` daemon
+- `kubernetes-ontology-viewer`
+- Helm chart
+- container image tag
+
+Patch releases may add fields to diagnostic JSON. AI-agent clients should ignore
+unknown fields and rely on `schemaVersion` plus the stable fields documented in
+`AI_CONTRACT.md`.
+
+## Upgrade To v0.1.8
+
+`v0.1.8` changes the Helm chart default:
+
+```yaml
+rbac:
+  readSecrets: false
+```
+
+This keeps the default RBAC footprint smaller. Pod-to-Secret graph edges and
+Secret nodes are collected only when you explicitly opt in:
+
+```bash
+helm upgrade --install kubernetes-ontology kubernetes-ontology-0.1.8.tgz \
+  --namespace kubernetes-ontology \
+  --reuse-values \
+  --set rbac.readSecrets=true
+```
+
+If Secret reads remain disabled, diagnostics should treat Secret evidence as
+unavailable rather than as proof of absence.
+
+## Binary Upgrade
+
+Download the matching release archive for your platform:
+
+```bash
+export KO_VERSION=v0.1.8
+curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz"
+tar -xzf "kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz"
+./kubernetes-ontology_${KO_VERSION}_linux_amd64/kubernetes-ontology --version
+```
+
+Replace any older local binaries with the new CLI, daemon, and viewer from the
+same archive.
+
+## Helm Upgrade
+
+Download the packaged chart from the release assets:
+
+```bash
+export KO_VERSION=v0.1.8
+curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology-0.1.8.tgz"
+helm upgrade --install kubernetes-ontology kubernetes-ontology-0.1.8.tgz \
+  --namespace kubernetes-ontology \
+  --create-namespace \
+  --set image.repository=ghcr.io/colvin-y/kubernetes-ontology \
+  --set image.tag="${KO_VERSION}" \
+  --set cluster="your-logical-cluster" \
+  --set contextNamespaces='{default,kube-system}'
+```
+
+Verify:
+
+```bash
+kubectl -n kubernetes-ontology rollout status deploy/kubernetes-ontology
+kubectl -n kubernetes-ontology port-forward svc/kubernetes-ontology 18080:18080
+kubernetes-ontology --server http://127.0.0.1:18080 --status
+```
+
+## Skill Upgrade
+
+The Codex skill is distributed from the repository path. Reinstall it after a
+runtime upgrade so setup commands and recipe guidance match the latest release:
+
+```bash
+npx skills add https://github.com/Colvin-Y/kubernetes-ontology/tree/main/skills/kubernetes-ontology-access -g --agent codex
+```
+
+Restart Codex after reinstalling the skill.
+
+## Rollback
+
+Use the previous matching binary archive, chart archive, and image tag:
+
+```bash
+helm upgrade --install kubernetes-ontology kubernetes-ontology-0.1.7.tgz \
+  --namespace kubernetes-ontology \
+  --reuse-values \
+  --set image.tag=v0.1.7
+```
+
+The daemon stores graph state in memory only, so rollback rebuilds the current
+graph from the Kubernetes API after the pod restarts.
diff --git a/charts/kubernetes-ontology/Chart.yaml b/charts/kubernetes-ontology/Chart.yaml
index e9317b3..02df2c3 100644
--- a/charts/kubernetes-ontology/Chart.yaml
+++ b/charts/kubernetes-ontology/Chart.yaml
@@ -2,8 +2,8 @@ apiVersion: v2
 name: kubernetes-ontology
 description: Read-only Kubernetes ontology server with an optional dependency-free viewer.
 type: application
-version: 0.1.7
-appVersion: "0.1.7"
+version: 0.1.8
+appVersion: "0.1.8"
 keywords:
   - kubernetes
   - ontology
diff --git a/charts/kubernetes-ontology/templates/rbac.yaml b/charts/kubernetes-ontology/templates/rbac.yaml
index 6f081a2..37d86e8 100644
--- a/charts/kubernetes-ontology/templates/rbac.yaml
+++ b/charts/kubernetes-ontology/templates/rbac.yaml
@@ -34,6 +34,23 @@ rules:
     resources:
       - jobs
     verbs: ["get", "list", "watch"]
+  - apiGroups: ["autoscaling"]
+    resources:
+      - horizontalpodautoscalers
+    verbs: ["get", "list", "watch"]
+  - apiGroups: ["discovery.k8s.io"]
+    resources:
+      - endpointslices
+    verbs: ["get", "list", "watch"]
+  - apiGroups: ["networking.k8s.io"]
+    resources:
+      - ingresses
+      - networkpolicies
+    verbs: ["get", "list", "watch"]
+  - apiGroups: ["policy"]
+    resources:
+      - poddisruptionbudgets
+    verbs: ["get", "list", "watch"]
   - apiGroups: ["rbac.authorization.k8s.io"]
     resources:
       - clusterrolebindings
@@ -43,6 +60,7 @@ rules:
     resources:
       - csidrivers
       - storageclasses
+      - volumeattachments
     verbs: ["get", "list", "watch"]
   - apiGroups: ["admissionregistration.k8s.io"]
     resources:
diff --git a/charts/kubernetes-ontology/values.yaml b/charts/kubernetes-ontology/values.yaml
index 3824776..51d6c5c 100644
--- a/charts/kubernetes-ontology/values.yaml
+++ b/charts/kubernetes-ontology/values.yaml
@@ -16,7 +16,7 @@ serviceAccount:
 
 rbac:
   create: true
-  readSecrets: true
+  readSecrets: false
 
 podAnnotations: {}
 podLabels: {}
diff --git a/cmd/kubernetes-ontology/main.go b/cmd/kubernetes-ontology/main.go
index a568510..821f9a6 100644
--- a/cmd/kubernetes-ontology/main.go
+++ b/cmd/kubernetes-ontology/main.go
@@ -98,7 +98,7 @@ func main() {
 	flag.IntVar(&storageMaxDepth, "storage-max-depth", 5, "Maximum BFS depth for storage and CSI related traversal")
 	flag.IntVar(&maxNodes, "max-nodes", 0, "Maximum diagnostic nodes to return. Empty uses the built-in safe default.")
 	flag.IntVar(&maxEdges, "max-edges", 0, "Maximum diagnostic edges to return. Empty uses the built-in safe default.")
-	flag.StringVar(&diagnosticRecipe, "recipe", "", "Diagnostic recipe hint: pod-incident, workload-incident, helm-ownership, or helm-upgrade-runtime-failure")
+	flag.StringVar(&diagnosticRecipe, "recipe", "", "Diagnostic recipe hint: pod-incident, workload-incident, storage-csi, service-routing, identity, node-context, helm-ownership, or helm-upgrade-runtime-failure")
 	flag.BoolVar(&expandTerminalNodes, "expand-terminal-nodes", false, "Traverse through diagnostic terminal nodes instead of stopping at them")
 	flag.DurationVar(&bootstrapTimeout, "bootstrap-timeout", 2*time.Minute, "Timeout for initial full snapshot bootstrap")
 	flag.BoolVar(&statusOnly, "status-only", false, "Bootstrap runtime and print runtime status instead of querying a diagnostic subgraph")
diff --git a/docs/agent-recipes.md b/docs/agent-recipes.md
index e26a5f3..7a73a50 100644
--- a/docs/agent-recipes.md
+++ b/docs/agent-recipes.md
@@ -13,6 +13,10 @@ The `--recipe` CLI flag and `recipe=...` HTTP parameter are available in
 | --- | --- | --- |
 | `pod-incident` | `Pod` | Start from a bad Pod and rank runtime evidence. |
 | `workload-incident` | `Workload` | Start from a Deployment/StatefulSet-style controller and inspect rollout dependencies. |
+| `storage-csi` | `PVC`, `PV`, `StorageClass`, `CSIDriver`, `VolumeAttachment` | Inspect PVC/PV/StorageClass/CSI and attachment evidence. |
+| `service-routing` | `Service`, `Ingress`, `EndpointSlice`, `NetworkPolicy` | Inspect route, Service selector, EndpointSlice target, and policy evidence. |
+| `identity` | `ServiceAccount`, `RoleBinding`, `ClusterRoleBinding` | Inspect identity and RBAC binding evidence. |
+| `node-context` | `Node` or unscheduled/scheduled `Pod` | Inspect node placement, node conditions, and nearby runtime evidence. |
 | `helm-ownership` | `HelmRelease` or `HelmChart` | Inspect label-derived Helm release/chart provenance. |
 | `helm-upgrade-runtime-failure` | `Pod`, `Workload`, or `HelmRelease` | Diagnose the part of a failed Helm upgrade that reached Kubernetes when Helm CLI output is missing. |
 
@@ -58,3 +62,8 @@ http://127.0.0.1:8765/?file=samples/helm-upgrade-failure/diagnostic-graph.json
 
 The fixture includes `schemaVersion`, `recipe`, `lanes`, `warnings`,
 `degradedSources`, `budgets`, `rankedEvidence`, `conflicts`, and `freshness`.
+
+Additional offline failure-mode fixtures are available under
+`samples/failure-modes/` for CrashLoopBackOff, FailedScheduling, Service
+selector mismatch, missing ConfigMap/Secret, RBAC forbidden, admission webhook,
+and PVC/CSI provisioning cases.
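Reviewer sketch, not part of the patch: a caller that builds a `--recipe` value from user input may want to check it against the vocabulary in the `docs/agent-recipes.md` table before calling the CLI. The snippet below is a minimal illustration under that assumption. `ko_is_known_recipe` is a hypothetical helper name that does not exist in the repository, and the list is copied by hand from this change, so it would need updating whenever the vocabulary grows.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with kubernetes-ontology): returns 0 only
# when the argument is one of the recipe hints listed in docs/agent-recipes.md.
ko_is_known_recipe() {
  case "$1" in
    pod-incident|workload-incident|storage-csi|service-routing) return 0 ;;
    identity|node-context|helm-ownership|helm-upgrade-runtime-failure) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: validate a value before passing it on as a recipe hint.
recipe="storage-csi"
if ko_is_known_recipe "$recipe"; then
  echo "recipe ok: $recipe"
else
  echo "unknown recipe: $recipe" >&2
fi
```

Keeping the check on the client side is only a convenience. The daemon and CLI remain the authority on which recipe names are accepted.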
diff --git a/docs/articles/kubernetes-ontology-intro.zh-CN.md b/docs/articles/kubernetes-ontology-intro.zh-CN.md index 5175450..1538642 100644 --- a/docs/articles/kubernetes-ontology-intro.zh-CN.md +++ b/docs/articles/kubernetes-ontology-intro.zh-CN.md @@ -141,8 +141,8 @@ AI Agent 理解集群状态,这个项目应该会有用。 Helm 模式会安装项目自己的 Deployment、Service、ServiceAccount、ConfigMap 和只读 RBAC,这是服务运行所需的安装 footprint。默认 RBAC 只包含采集资源的 -`get`、`list`、`watch` 权限;Secret 读取默认开启,以便恢复 Secret 节点和 -`uses_secret` 关系;如需关闭可设置 `rbac.readSecrets=false`。 +`get`、`list`、`watch` 权限;Secret 读取默认关闭。只有需要恢复 Secret 节点和 +`uses_secret` 关系时,才显式设置 `rbac.readSecrets=true`。 HTTP API 建议只暴露在本机或受控内网环境,不建议直接作为公网多租户服务。 diff --git a/docs/design/open-source-diagnostics-evolution-plan.md b/docs/design/open-source-diagnostics-evolution-plan.md index d2642fe..ab9f750 100644 --- a/docs/design/open-source-diagnostics-evolution-plan.md +++ b/docs/design/open-source-diagnostics-evolution-plan.md @@ -561,11 +561,12 @@ Target: under 5 minutes. Champion path: ```bash -export KO_VERSION=v0.2.0 -helm upgrade --install kubernetes-ontology oci://ghcr.io/colvin-y/charts/kubernetes-ontology \ - --version "${KO_VERSION}" \ +export KO_VERSION=v0.1.8 +curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology-0.1.8.tgz" +helm upgrade --install kubernetes-ontology kubernetes-ontology-0.1.8.tgz \ --namespace kubernetes-ontology \ --create-namespace \ + --set image.tag="${KO_VERSION}" \ --set cluster="demo" \ --set contextNamespaces='{default}' kubectl -n kubernetes-ontology port-forward svc/kubernetes-ontology 18080:18080 @@ -579,8 +580,8 @@ path. | Issue | Decision | | --- | --- | -| Helm install currently requires repo checkout | Publish chart as OCI or release `.tgz`; make zero-clone install primary. | -| Secret reads default to enabled | Change planned default to `rbac.readSecrets=false`; make Helm exact evidence opt-in. 
| +| Zero-clone Helm install | Ship a release `.tgz` chart as the primary path; OCI packaging can remain a follow-up. | +| Secret reads are sensitive | Keep `rbac.readSecrets=false` by default; make exact Secret evidence opt-in. | | Skill tracks `main`, runtime tracks tags | Skill becomes version-aware bootstrapper or gets tagged/tested per runtime release. | | Recipe names differ across CLI/API/viewer/docs | Define one recipe vocabulary and keep old flags/routes as aliases. | | Errors lack next operator action | Add `nextCommand` or `nextActions` to CLI/server/viewer where useful. | @@ -610,8 +611,8 @@ Compatibility: | Stage | Developer does | Current friction | Plan fix | | --- | --- | --- | --- | -| Discover | Opens README | Product value is clear, but install paths are long | Put one zero-clone path first | -| Install | Chooses Helm or binary | Helm path requires checkout | Publish OCI/release chart | +| Discover | Opens README | Product value is clear, but install paths are still dense | Keep one zero-clone path first | +| Install | Chooses Helm or binary | Helm path depends on GitHub Release reachability | Publish release chart now; evaluate OCI later | | First run | Runs status | Needs CLI binary plus port-forward | Add copy-paste recipe with expected output | | Diagnose | Runs PodIncident | Recipe vocabulary not unified | Add canonical recipe commands | | Inspect | Opens viewer URL | URL state contract incomplete | Add recipe/focus/evidence URL contract | @@ -624,10 +625,10 @@ Compatibility: I open the README because I have a stuck Pod and want to know whether this tool can explain ownership and blast radius. I see the safety boundary, which matters. Then I hit a split: binary path, Helm path, source path, skill path. If I choose -Helm today, the command uses `./charts/kubernetes-ontology`, so I need a repo -checkout before I can evaluate the product. That breaks the five-minute test. 
+Helm today, the command downloads a release chart instead of requiring a repo +checkout, so I can evaluate the product without compiling from source. -The better path is: install chart from a release location, port-forward, run one +The path is: install chart from a release location, port-forward, run one recipe command, get JSON plus a viewer URL. If Secret access is off by default and the output says "exact Helm evidence unavailable, label evidence only," I trust it more, not less. Honest unknowns beat magic. diff --git a/docs/ontology/kubernetes-ontology.owl b/docs/ontology/kubernetes-ontology.owl index 6471ede..72345da 100644 --- a/docs/ontology/kubernetes-ontology.owl +++ b/docs/ontology/kubernetes-ontology.owl @@ -99,6 +99,36 @@ Service + + Ingress + A Kubernetes Ingress with Service backend references. + + Ingress + + + EndpointSlice + A Kubernetes EndpointSlice with Pod target references. + + EndpointSlice + + + NetworkPolicy + A Kubernetes NetworkPolicy with Pod selector evidence. + + NetworkPolicy + + + HPA + A Kubernetes HorizontalPodAutoscaler with scale target evidence. + + HPA + + + PodDisruptionBudget + A Kubernetes PodDisruptionBudget with selector evidence. + + PodDisruptionBudget + ConfigMap A Kubernetes ConfigMap referenced by Pods. @@ -145,6 +175,12 @@ PV + + VolumeAttachment + A Kubernetes VolumeAttachment for a PersistentVolume on a Node. + + VolumeAttachment + StorageClass A Kubernetes StorageClass. @@ -478,6 +514,66 @@ inferred helm-labels/v1 + + routes_to_service + Relates an ingress-like route resource to a Service referenced by backend routing rules. + routes_to_service + + + explicit_ref + asserted + ingress-backend/v1 + + + targets_pod + Relates an EndpointSlice to a Pod targetRef observed in its endpoint list. + targets_pod + + + explicit_ref + asserted + endpointslice-targetref/v1 + + + applies_to_pod + Relates a NetworkPolicy to Pods selected by its podSelector. 
+ applies_to_pod + + + selector_match + asserted + networkpolicy-selector/v1 + + + scales_workload + Relates a HorizontalPodAutoscaler to the workload named by scaleTargetRef. + scales_workload + + + explicit_ref + asserted + hpa-scale-target/v1 + + + protects_pod + Relates a PodDisruptionBudget to Pods selected by its selector. + protects_pod + + + selector_match + asserted + pdb-selector/v1 + + + attaches_pv + Relates a VolumeAttachment to the PersistentVolume named by its source. + attaches_pv + + + explicit_ref + asserted + volumeattachment-source/v1 + related_to Generic relation for topology slices and future relations that do not yet have a specific edge kind. diff --git a/docs/release.md b/docs/release.md index 87f5d5b..0c61718 100644 --- a/docs/release.md +++ b/docs/release.md @@ -77,11 +77,11 @@ queries against the in-cluster daemon. Use semantic version tags: ```bash -git tag v0.1.7 -git push origin v0.1.7 +git tag v0.1.8 +git push origin v0.1.8 ``` -Replace `v0.1.7` with the release tag you are publishing. +Replace `v0.1.8` with the release tag you are publishing. 
Pushing the tag starts both workflows: @@ -93,8 +93,8 @@ Pushing the tag starts both workflows: - `.github/workflows/docker.yml` builds and pushes a multi-architecture image: ```text - ghcr.io/colvin-y/kubernetes-ontology:v0.1.7 - ghcr.io/colvin-y/kubernetes-ontology:0.1.7 + ghcr.io/colvin-y/kubernetes-ontology:v0.1.8 + ghcr.io/colvin-y/kubernetes-ontology:0.1.8 ghcr.io/colvin-y/kubernetes-ontology:latest ``` @@ -113,35 +113,35 @@ gh run list --workflow Docker --limit 5 Check the release assets: ```bash -gh release view v0.1.7 +gh release view v0.1.8 ``` Inspect one binary archive and the Helm chart archive when release packaging changes: ```bash -gh release download v0.1.7 --pattern 'kubernetes-ontology_v0.1.7_linux_amd64.tar.gz' --clobber -tar -tzf kubernetes-ontology_v0.1.7_linux_amd64.tar.gz | grep -E 'kubernetes-ontologyd$|kubernetes-ontology$|QUICKSTART.md|local/kubernetes-ontology.yaml.example' -tar -xzf kubernetes-ontology_v0.1.7_linux_amd64.tar.gz -./kubernetes-ontology_v0.1.7_linux_amd64/kubernetes-ontology --version -gh release download v0.1.7 --pattern 'kubernetes-ontology-0.1.7.tgz' --clobber -helm show chart kubernetes-ontology-0.1.7.tgz +gh release download v0.1.8 --pattern 'kubernetes-ontology_v0.1.8_linux_amd64.tar.gz' --clobber +tar -tzf kubernetes-ontology_v0.1.8_linux_amd64.tar.gz | grep -E 'kubernetes-ontologyd$|kubernetes-ontology$|QUICKSTART.md|UPGRADING.md|local/kubernetes-ontology.yaml.example' +tar -xzf kubernetes-ontology_v0.1.8_linux_amd64.tar.gz +./kubernetes-ontology_v0.1.8_linux_amd64/kubernetes-ontology --version +gh release download v0.1.8 --pattern 'kubernetes-ontology-0.1.8.tgz' --clobber +helm show chart kubernetes-ontology-0.1.8.tgz ``` Pull the image: ```bash -docker pull ghcr.io/colvin-y/kubernetes-ontology:v0.1.7 +docker pull ghcr.io/colvin-y/kubernetes-ontology:v0.1.8 ``` Deploy through Helm: ```bash -helm upgrade --install kubernetes-ontology ./charts/kubernetes-ontology \ +helm upgrade --install kubernetes-ontology 
kubernetes-ontology-0.1.8.tgz \
   --namespace kubernetes-ontology \
   --create-namespace \
   --set image.repository=ghcr.io/colvin-y/kubernetes-ontology \
-  --set image.tag=v0.1.7 \
+  --set image.tag=v0.1.8 \
   --set cluster=your-logical-cluster \
   --set contextNamespaces='{default,kube-system}'
 ```
diff --git a/index.html b/index.html
index 1078633..f55ec56 100644
--- a/index.html
+++ b/index.html
@@ -5,9 +5,17 @@
 kubernetes-ontology - Kubernetes evidence graph for diagnostics
 [removed and added `index.html` lines follow here; their HTML markup did not survive extraction]
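Reviewer sketch, not part of the patch: the install and release commands in this change pair a `v`-prefixed release tag (`v0.1.8`) with a chart archive named without the `v` (`kubernetes-ontology-0.1.8.tgz`), and several of them hard-code the chart version next to `${KO_VERSION}`. Assuming the asset naming stays as shown in this diff, both values can be derived from one variable. `KO_CHART_VERSION`, `KO_CHART`, and `KO_CHART_URL` are illustrative names, not variables the project defines.

```bash
#!/usr/bin/env bash
# Release tag as used throughout the docs in this change.
KO_VERSION="v0.1.8"

# Strip a single leading "v" to get the chart version (0.1.8).
KO_CHART_VERSION="${KO_VERSION#v}"

# Assumed asset naming, copied from the commands in this patch.
KO_CHART="kubernetes-ontology-${KO_CHART_VERSION}.tgz"
KO_CHART_URL="https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/${KO_CHART}"

echo "chart archive: ${KO_CHART}"
echo "download URL:  ${KO_CHART_URL}"
```

The documented `curl -LO` and `helm upgrade --install` commands could then take `"${KO_CHART_URL}"` and `"${KO_CHART}"` in place of the literal file name, so bumping `KO_VERSION` is the only edit needed for the next release.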