WATCHER: clean up orphaned producer XCom-backup Variable on terminal failure (no retries left) #2823

@tatiana

Context

Follow-up from #2816 (which added enable_watcher_reliable_retry). Surfaced during review of that PR.

Problem

The WATCHER producer backs up its per-node dbt statuses to an Airflow Variable so that a retry can restore them. The Variable is deleted after a successful run, and a retry deletes it after restoring from it. But if the producer fails gracefully with no retries left (retries=0, or the final retry is exhausted), there is no cleanup path and the backup Variable is orphaned. Orphaned Variables accumulate over time in the metadata DB (and in an external secrets backend, as stale secrets).

This behaviour is pre-existing: it dates from the introduction of the per-node Variable backup in #2559, and it affects both enable_watcher_reliable_retry=True (eager) and False (on-failure callback) modes equally. #2816 does not change this behaviour; this issue tracks fixing it separately.
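
The lifecycle described above can be modelled with a small in-memory sketch. Everything in it is a hypothetical stand-in: `store` is a plain dict in place of Airflow Variables, and `run_producer` and the key name are invented for illustration, not taken from Cosmos. It also simplifies both modes into a single write on failure.

```python
# Toy model of the backup lifecycle; a dict stands in for Airflow Variables.
store: dict[str, dict[str, str]] = {}


def run_producer(
    key: str, statuses: dict[str, str], succeeded: bool, is_retry: bool
) -> None:
    """Simulate one producer attempt (hypothetical, not the Cosmos code)."""
    if is_retry:
        # A retry restores the previous statuses, then deletes the backup.
        store.pop(key, None)
    if succeeded:
        # A successful run leaves no backup behind.
        store.pop(key, None)
    else:
        # A graceful failure backs up statuses so a later retry can restore them.
        store[key] = dict(statuses)


key = "hypothetical_backup__my_dag__run_1"

# retries=0: the first and only attempt fails, and nothing runs afterwards.
run_producer(
    key,
    {"model_a": "success", "model_b": "failed"},
    succeeded=False,
    is_retry=False,
)
leaked = key in store  # the backup survives because no later step deletes it
```

In this model the only deletions happen inside a later attempt, which is why a failure with no later attempt leaves the entry in place.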

When it happens

  1. Producer fails gracefully (e.g. a dbt model error).
  2. The producer has no further retries (retries=0, or try_number >= max_tries).
  3. The backup Variable was written (eagerly per-node, or once via the on-failure callback) and is never deleted.
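
The three conditions above can be written as a single predicate. This is a minimal sketch: the function names are hypothetical, and the `try_number >= max_tries` comparison is copied from the condition as stated in this issue. How `try_number` is counted should be checked against the Airflow version in use before relying on it.

```python
def is_terminal_attempt(try_number: int, max_tries: int) -> bool:
    """No further retries, using the comparison given in this issue."""
    return try_number >= max_tries


def leaves_orphan(
    failed_gracefully: bool,
    try_number: int,
    max_tries: int,
    backup_written: bool,
) -> bool:
    """True when all three conditions for an orphaned backup hold."""
    return (
        failed_gracefully
        and backup_written
        and is_terminal_attempt(try_number, max_tries)
    )
```

For example, `leaves_orphan(True, try_number=1, max_tries=0, backup_written=True)` corresponds to the retries=0 case, while the same call with `max_tries=3` still has retries left and so leaves the backup for the next attempt to clean up.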

Proposed approaches

  • Delete surviving producer backup Variables on DAG-run completion via the existing cosmos/listeners/dag_run_listener.py (on_dag_run_success/on_dag_run_failed). This is robust and also covers hard-kill orphans. It was earmarked in the original BOSS-439 plan.
  • Or a lighter in-operator guard: on the final failed attempt, skip the on-failure write and delete any eager backup.
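
Both approaches can be sketched against an in-memory store. This is an illustration under stated assumptions, not the Cosmos implementation: the key naming scheme, `BACKUP_PREFIX`, and all function names are hypothetical, and a dict stands in for Airflow Variables. In a real listener the deletion would go through the Variable API rather than `del`.

```python
from typing import MutableMapping

# Assumed naming scheme for illustration only; not the scheme Cosmos uses.
BACKUP_PREFIX = "hypothetical_watcher_backup"

Store = MutableMapping[str, dict]


def backup_key(dag_id: str, run_id: str, task_id: str) -> str:
    """Build a backup key scoped to one producer task in one DAG run."""
    return f"{BACKUP_PREFIX}__{dag_id}__{run_id}__{task_id}"


def cleanup_on_dag_run_end(store: Store, dag_id: str, run_id: str) -> list[str]:
    """Approach 1: called from the DAG-run success/failed hooks.

    Deletes every surviving backup for the run, regardless of how the
    producer ended, so it also covers hard-killed producers.
    """
    prefix = f"{BACKUP_PREFIX}__{dag_id}__{run_id}__"
    orphaned = [k for k in store if k.startswith(prefix)]
    for k in orphaned:
        del store[k]
    return orphaned


def on_producer_failure(
    store: Store, key: str, statuses: dict, is_final_attempt: bool
) -> None:
    """Approach 2: in-operator guard on the failure path."""
    if is_final_attempt:
        # No retry will read the backup: skip the write, drop any eager copy.
        store.pop(key, None)
        return
    # A retry is still coming, so keep the statuses for it to restore.
    store[key] = dict(statuses)
```

The trade-off matches the one described above: the guard only runs if the operator's failure path runs, whereas the DAG-run-level cleanup does not depend on the producer exiting gracefully.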

Long term

This will be superseded by #2771 (Airflow 3.3 Task & Asset Store), which removes the Variable-backup mechanism entirely.
