kadeksuryam/infra-overview

Infrastructure Overview

Multi-provider infrastructure dashboard. Discovers servers from Hetzner Cloud, Hetzner Robot, and Cloudflare, collects metrics via Prometheus, tracks health via blackbox_exporter probes, and shows costs across all resources.

Features

  • Multi-provider discovery — Hetzner Cloud, Hetzner Robot (dedicated/auction), Cloudflare (zones, domains, R2, Workers, Pages)
  • Prometheus metrics — CPU, RAM, disk, network, containers (via node_exporter + cAdvisor)
  • Health monitoring — blackbox_exporter probes (HTTP/TCP/ICMP) configured from the admin UI, with uptime % tracking and incident detection
  • Cost tracking — auto-detected from providers + manual cost overrides for auction servers
  • Probe targets API: GET /api/probe-targets/{module} serves Prometheus http_sd_configs for blackbox_exporter
  • Multi-customer/project — organize servers by customer and project
  • Admin UI — manage providers, customers, probe targets, cost overrides, external services
  • Optimization suggestions — right-sizing recommendations based on utilization trends
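To make the probe targets API concrete, here is a minimal sketch of the JSON body such an endpoint could return. Prometheus HTTP service discovery expects a JSON list of target groups, each with a "targets" list and an optional "labels" map of string values. The function name build_http_sd_payload and the "module" label are assumptions for illustration, not taken from this repository.

```python
from __future__ import annotations

import json
from typing import Iterable


def build_http_sd_payload(targets: Iterable[str], module: str) -> list[dict]:
    """Return target groups in the Prometheus HTTP service discovery format.

    Hypothetical helper: the real endpoint may group or label targets differently.
    """
    # De-duplicate and sort so the response is stable between scrapes.
    target_list = sorted(set(targets))
    if not target_list:
        # An empty list is a valid response meaning "no targets".
        return []
    # Label values must be strings; "module" is an assumed label name.
    return [{"targets": target_list, "labels": {"module": module}}]


if __name__ == "__main__":
    payload = build_http_sd_payload(
        ["https://example.com", "https://example.org"], "http_2xx"
    )
    print(json.dumps(payload, indent=2))
```

Prometheus expects this body with an HTTP 200 status and a Content-Type of application/json. Relabeling rules on the Prometheus side would then pass each target to blackbox_exporter.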

Quick Start

cp .env.example .env
# Edit .env — set DATABASE_URL, OVERVIEW_AUTH_PASSWORD, ENCRYPTION_KEY
docker-compose up -d
# Dashboard: http://localhost:8080

Local Development

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Requires PostgreSQL running
DATABASE_URL=postgresql://overview:overview@localhost:5432/overview python run_local.py

Running Tests

pip install pytest pytest-asyncio respx
# Without PostgreSQL (unit tests only):
python -m pytest tests/ -v
# With PostgreSQL (full suite):
TEST_DATABASE_URL=postgresql://overview:overview@localhost:5432/overview_test python -m pytest tests/ -v

Configuration

All configuration is via environment variables (see .env.example):

| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | PostgreSQL connection string |
| OVERVIEW_AUTH_PASSWORD | Yes (prod) | Dashboard login password |
| ENCRYPTION_KEY | Yes (prod) | Fernet key for encrypting API keys at rest |
| POLL_INTERVAL_SECONDS | No | Sync interval (default: 300s) |
| INFRA_REPO_URL | No | Git repo for hostcfg/WireGuard peer discovery |
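As a rough illustration of how these variables might be validated at startup, the sketch below reads them from a mapping and generates a Fernet-compatible key using only the standard library. A Fernet key is 32 random bytes encoded as URL-safe base64. The Settings class, load_settings, and the production flag are assumptions for this example; the application's actual loader may differ.

```python
import base64
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def generate_fernet_key() -> str:
    """Return a new key in Fernet format: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; field names mirror the variables in the table.
    database_url: str
    auth_password: Optional[str]
    encryption_key: Optional[str]
    poll_interval_seconds: int
    infra_repo_url: Optional[str]


def load_settings(env: Mapping[str, str], production: bool = False) -> Settings:
    """Build Settings from an environment mapping such as os.environ."""
    required = ["DATABASE_URL"]
    if production:
        # Assumption: "Yes (prod)" means required only in production.
        required += ["OVERVIEW_AUTH_PASSWORD", "ENCRYPTION_KEY"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        auth_password=env.get("OVERVIEW_AUTH_PASSWORD"),
        encryption_key=env.get("ENCRYPTION_KEY"),
        # Falls back to the documented default of 300 seconds.
        poll_interval_seconds=int(env.get("POLL_INTERVAL_SECONDS", "300")),
        infra_repo_url=env.get("INFRA_REPO_URL"),
    )


if __name__ == "__main__":
    print("ENCRYPTION_KEY=" + generate_fernet_key())
```

Running the module prints a line that can be pasted into .env. Calling load_settings(os.environ, production=True) fails early if a production-only variable is missing.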

Providers, customers, and probe targets are managed through the admin UI at /admin/settings.

Health Monitoring Architecture

Admin UI (probe targets)
    ↓
GET /api/probe-targets/{module}  ←  Prometheus (http_sd_configs, every 30s)
    ↓
blackbox_exporter (HTTP/TCP/ICMP probes)
    ↓
Prometheus (probe_success, probe_duration_seconds)
    ↓
ProbeCollector (reads from Prometheus)
    ↓
HealthTracker (history, uptime %, incidents)
    ↓
Dashboard (server page: health checks, uptime, incidents)
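The last stages of this pipeline turn a series of probe_success samples into an uptime percentage and a list of incidents. The sketch below shows one way that could work, assuming an incident is a run of consecutive failed probes. The Incident class, the function names, and the min_failures threshold are hypothetical; the real HealthTracker may use different rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# (unix timestamp, True if probe_success was 1)
Sample = Tuple[float, bool]


@dataclass
class Incident:
    started_at: float
    resolved_at: Optional[float]  # None while the incident is still open


def uptime_percent(samples: Sequence[Sample]) -> Optional[float]:
    """Share of successful probes, or None when there is no data."""
    if not samples:
        return None
    up = sum(1 for _, ok in samples if ok)
    return 100.0 * up / len(samples)


def detect_incidents(samples: Sequence[Sample], min_failures: int = 2) -> List[Incident]:
    """Group runs of at least min_failures consecutive failures into incidents."""
    incidents: List[Incident] = []
    run_start: Optional[float] = None
    run_length = 0
    for ts, ok in sorted(samples, key=lambda s: s[0]):
        if not ok:
            if run_start is None:
                run_start = ts  # first failure of a new run
            run_length += 1
            continue
        # A success ends the current run; keep it only if it was long enough.
        if run_start is not None and run_length >= min_failures:
            incidents.append(Incident(run_start, ts))
        run_start, run_length = None, 0
    # A run that reaches the end of the data is an open incident.
    if run_start is not None and run_length >= min_failures:
        incidents.append(Incident(run_start, None))
    return incidents
```

Requiring more than one consecutive failure is a common way to avoid opening incidents for a single dropped probe. With 30-second probes and min_failures=2, an outage has to last about a minute before it is recorded.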

Project Structure

app/
├── admin/          Admin UI routes (providers, customers, cost overrides)
├── api/            REST API + probe targets endpoint
├── connectors/     ProbeCollector, Uptime Kuma
├── db/             PostgreSQL async layer (mixins for each domain)
├── health/         HealthTracker (incidents, uptime)
├── metrics/        Prometheus client
├── providers/      Hetzner Cloud, Hetzner Robot, Cloudflare
├── web/            Dashboard routes + Jinja2 templates
├── scheduler.py    Background collection loop
└── reconciler.py   Merges provider data into customer hierarchy

Tech Stack

FastAPI, asyncpg (PostgreSQL), Jinja2, HTMX, httpx, Pydantic
