[Bug]: Watchdog restart loop when a Bearer Token is set — /api/health is behind auth #6

Description

@circa1665

App Version

0.2.10

Home Assistant Version

2026.6.2

Installation Type

Home Assistant OS (HAOS)

Hardware

Raspberry Pi 5

What happened?

With a Bearer Token configured, the add-on enters a permanent Supervisor watchdog
restart loop (~every 4 minutes), even though the agent itself is completely healthy.

The agent wraps its entire mux — including /api/health — in the auth middleware
whenever a token is set (agent/main.go):

    var handler http.Handler = mux
    if cfg.Token != "" {
        handler = authMiddleware(cfg.Token, mux)   // /api/health now requires the token
    }
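The issue doesn't quote authMiddleware itself, so what follows is a hypothetical reconstruction for reproducing the bug locally. It assumes the middleware compares the Authorization header with `Bearer <token>` and answers 401 with the `{"error":"unauthorized"}` body the probe receives. `newHandler` and `healthStatus` are illustrative names, not agent code. An unauthenticated GET of /api/health then returns 200 with no token and 401 once one is set:

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// authMiddleware is a HYPOTHETICAL stand-in for the agent's middleware:
// every request must carry "Authorization: Bearer <token>".
func authMiddleware(token string, next http.Handler) http.Handler {
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		if subtle.ConstantTimeCompare(got, want) != 1 {
			w.Header().Set("Content-Type", "application/json")
			w.WriteHeader(http.StatusUnauthorized)
			fmt.Fprint(w, `{"error":"unauthorized"}`)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newHandler mirrors the excerpt above: the whole mux, /api/health
// included, is wrapped whenever a token is configured.
func newHandler(token string) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	var handler http.Handler = mux
	if token != "" {
		handler = authMiddleware(token, mux)
	}
	return handler
}

// healthStatus sends an unauthenticated GET /api/health, as the watchdog does,
// and returns the HTTP status code.
func healthStatus(token string) int {
	req := httptest.NewRequest(http.MethodGet, "/api/health", nil)
	rec := httptest.NewRecorder()
	newHandler(token).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("no token:", healthStatus(""))        // 200
	fmt.Println("token set:", healthStatus("s3cret")) // 401
}
```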

The add-on's watchdog probe (config.yaml) targets that same endpoint:

    watchdog: "http://[HOST]:[PORT:9099]/api/health"

But Supervisor's watchdog sends an unauthenticated GET and only treats status < 300
as alive (supervisor/apps/app.py → watchdog_application):

    async with self.sys_websession.get(url, timeout=WATCHDOG_TIMEOUT, ssl=False) as req:
        if req.status < 300:
            return True
    return False

So the probe gets 401 {"error":"unauthorized"}, Supervisor logs "missing application
response", and after two consecutive failures restarts a perfectly healthy container.
The trigger is purely setting a token; with the token empty the middleware isn't
installed, /api/health returns 200, and the watchdog passes.

Note the code fix lives in the agent (smart-sniffer repo, agent/main.go), but I'm
filing here since the watchdog URL is declared in this repo's config.yaml.

Suggested fix: exempt the liveness endpoint from auth, either by letting requests with
r.URL.Path == "/api/health" through before the token check in authMiddleware, or by
registering /api/health on a bare mux and wrapping only the data routes.

A weaker one-line alternative that doesn't touch the agent is to switch config.yaml to
a TCP check:

    watchdog: "tcp://[HOST]:[PORT:9099]"

That only confirms the port is open, not that the HTTP app is serving.

Steps to Reproduce

  1. Install the SMART Sniffer App and the integration, and set a matching Bearer Token in both.
  2. Leave the add-on's Watchdog toggle on (default).
  3. Start the add-on.
  4. Supervisor log loops every ~4 min: "Watchdog missing application response" →
    "Watchdog found a problem ... application!" → Stopping/Cleaning/Starting.
  5. The add-on's own log shows the agent healthy the whole time (listening on
    0.0.0.0:9099, polling every 60s), killed only by the watchdog-triggered stop.

App Logs

    # --- Add-on log: one full cycle. Agent is healthy start to finish. ---
    [19:42:37] INFO: Running preflight checks...
    [19:42:37] INFO: smartctl version: 7.5
    [19:42:37] INFO: Drives detected by smartctl: 1
    [19:42:37] INFO: Drive access: OK (/dev/nvme0)
    2026/06/09 19:42:38 SMART Sniffer Agent v0.5.14
    2026/06/09 19:42:38 Listening on: 0.0.0.0:9099
    2026/06/09 19:42:38 Auth: enabled
    2026/06/09 19:42:38 first poll: using --scan-open for protocol detection, waking drives for SMART baseline
    2026/06/09 19:42:38 cache refreshed: 1 drive(s)
    2026/06/09 19:43:38 cache refreshed: 1 drive(s)
    2026/06/09 19:44:38 cache refreshed: 1 drive(s)
    2026/06/09 19:45:38 cache refreshed: 1 drive(s)
    2026/06/09 19:46:36 Shutting down…          # <-- stopped by Supervisor, not a crash
    2026/06/09 19:46:36 shutdown complete elapsed=1.374062ms

    # --- Supervisor log: the watchdog driving the restart ---
    WARNING [supervisor.misc.tasks] Watchdog missing application response from 0449a086_smart_sniffer_agent
    WARNING [supervisor.misc.tasks] Watchdog found a problem with 0449a086_smart_sniffer_agent application!
    INFO    [supervisor.docker.manager] Stopping addon_0449a086_smart_sniffer_agent application
    INFO    [supervisor.docker.app] Starting Docker app 0449a086/aarch64-addon-smart_sniffer_agent with version 0.2.10

Additional Context

  • Single NVMe boot drive (/dev/nvme0), smartctl 7.5, bundled agent v0.5.14.
  • Loop ran ~180 times over 12 hours until the token was removed.
  • Workarounds confirmed: (a) clear the Bearer Token, or (b) disable the add-on's
    Watchdog toggle. Both stop the loop; (a) also restores a passing watchdog.
