Add Modal deploy guardrail tooling by therealityreport · Pull Request #145 · therealityreport/trr-backend

therealityreport · 2026-06-10T14:42:41Z

Summary

- recover the missing Modal deploy wrapper, API canary helper, and wrong-workspace cleanup script
- add tests for deploy safety and lazy imports
- make wrong-workspace Modal stop non-interactive with --yes

Verification

.venv/bin/python -m pytest tests/scripts/test_cleanup_wrong_workspace_modal.py tests/scripts/test_deploy_backend_modal.py tests/utils/test_lazy_imports.py tests/scripts/test_verify_modal_readiness.py
git diff --check

Orchestrator

target: therealityreport/trr-backend
action: publish-and-sync
mode: takeover
source branch: codex/recover-modal-deploy-guardrails

Summary by CodeRabbit

Documentation
- Added an incident report detailing a backend API crash loop, root cause, remediation, verification, and deploy history.
New Features
- Added an API cold-start /health canary run after deployments.
- Added a CLI to detect and optionally stop mistaken workspace deployments.
- Added a local loopback internal-admin proxy bypass for admin requests.
- Added a lazy-import helper to defer optional module loading.
Chores
- Improved deploy safety: workspace validation, image payload checks, readiness verification, and incident stamping.
Tests
- Added unit tests covering deploy tooling, canary behavior, cleanup CLI, auth proxy behavior, and lazy-imports.

coderabbitai · 2026-06-10T14:43:44Z

📝 Walkthrough

Walkthrough

Adds Modal deployment and observability tooling: lazy imports, API cold-start canary helpers, a deploy orchestration CLI that enforces workspace and stamps deploy history, a wrong-workspace cleanup CLI, local loopback admin-proxy support, unit tests, and an incident runbook.

Changes

Modal Deployment Tooling and Observability

| Layer / File(s) | Summary |
|---|---|
| Lazy module loading utility: `trr_backend/utils/lazy_imports.py`, `tests/utils/test_lazy_imports.py` | `LazyModule` defers `importlib.import_module` until first attribute access; test verifies import occurs only on first access. |
| API canary health check helpers: `scripts/modal/api_canary.py` | `health_url()` normalizes base URL and appends `/health`; `run_api_cold_start_canary()` retries `/health` with configurable attempts/timeout and returns structured results; `skipped_api_canary()` returns a not-ran payload. |
| Deploy backend orchestration and verification: `scripts/modal/deploy_backend.py`, `tests/scripts/test_deploy_backend_modal.py` | Orchestrator pins `MODAL_PROFILE`, enforces required workspace, builds/runs `modal deploy`, runs readiness probe and optional API cold-start canary, fetches deploy history, and stamps incident notes. Tests cover env pinning, workspace validation, command builders, readiness/canary behavior, history stamping, and note path resolution. |
| Wrong workspace deployment cleanup utility: `scripts/modal/cleanup_wrong_workspace_deploy.py`, `tests/scripts/test_cleanup_wrong_workspace_modal.py` | CLI detects and optionally stops Modal apps deployed to a wrong workspace by verifying authoritative readiness, validating profile/workspace resolution, checking app presence/history, and optionally issuing `modal app stop`. Tests validate readiness blocking, successful stop, and rejection when the wrong profile resolves to authoritative. |
| Local loopback internal-admin proxy: `api/auth.py`, `tests/api/test_auth.py` | Adds `x-trr-local-admin-proxy` handling plus host normalization and loopback detection; `require_internal_admin` can short-circuit to a constructed internal-admin identity when proxied from loopback. Tests verify acceptance from loopback and rejection of remote spoofing. |
| Incident report documentation: `docs/observability/modal-v439-v440-serve-backend-api-crash-loop-2026-05-28.md` | Runbook documenting Modal `serve_backend_api` crash loop (v439→v440), root cause (eager `scripts` imports + missing mount), remediations (lazy-loading, minimal mounts, payload validation, post-deploy canary), follow-up guard regression (v442) and fix (v443), verification, and deploy history stamp. |
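The lazy-loading behavior described in the table can be sketched as follows. This is a minimal illustration, assuming only what the summary states (the import is deferred until first attribute access); the private attribute names `_name` and `_module` are hypothetical and may not match `trr_backend/utils/lazy_imports.py`.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None  # hypothetical cache slot

    def _load(self) -> ModuleType:
        # Import once, then reuse the cached module object.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr: str) -> Any:
        # Only invoked for attributes that are not set on the proxy itself,
        # so _name and _module never trigger an import.
        return getattr(self._load(), attr)


json_mod = LazyModule("json")      # nothing is imported yet
print(json_mod.dumps({"a": 1}))    # first access triggers the import
```

A module-level `LazyModule("scripts.sync.something")` would therefore not fail at import time when the `scripts` package is not mounted; the failure, if any, would surface only when the attribute is first used.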

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A lazy hop, imports delayed till need,

Canaries knock softly on /health indeed,
Deploys get stamped, wrong-workspace apps unmade,
A loopback whisper grants an admin’s aid,
The rabbit nibbles logs and hums with glee.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 2.74% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Add Modal deploy guardrail tooling' directly and accurately describes the main objective: adding Modal deploy safety infrastructure including deploy wrapper, API canary helpers, wrong-workspace cleanup script, and supporting tests. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

github-actions · 2026-06-10T14:47:02Z

Codex Exhaustive Code Review

Findings

High - deploy wrapper cannot complete post-deploy validation
In scripts/modal/deploy_backend.py, build_readiness_command() passes --probe-api-canary and --api-canary-timeout-seconds to scripts/modal/verify_modal_readiness.py. That readiness CLI does not define either argument in its parser, so after modal deploy succeeds, verify_deployed_readiness() will raise from subprocess.run(..., check=True), main() will print "Modal deploy canary failed", and the wrapper exits 1.

Practical impact: the new “required deploy path” reports failed deploys even when the deploy succeeded, and never reaches the intended fallback canary or incident stamp.

Smallest fix: either add those flags and api_canary output to verify_modal_readiness.py, or remove the unsupported flags and rely on the existing fallback run_api_cold_start_canary(readiness["api_web_url"]). Add a test that validates the built readiness command against the real readiness parser.
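One way such a parser-compatibility test could look is sketched below. The parser here is a hypothetical stand-in (its `--json` flag is invented for illustration); a real test would import the actual parser from verify_modal_readiness.py and the actual command from build_readiness_command(). Only the two flag names `--probe-api-canary` and `--api-canary-timeout-seconds` come from the finding above.

```python
import argparse
from typing import List


def build_readiness_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the parser in verify_modal_readiness.py.
    parser = argparse.ArgumentParser(prog="verify_modal_readiness")
    parser.add_argument("--json", action="store_true")  # illustrative flag only
    return parser


def unknown_flags(parser: argparse.ArgumentParser, argv: List[str]) -> List[str]:
    """Return the arguments in argv that the parser does not recognise."""
    _, unknown = parser.parse_known_args(argv)
    return unknown


# Arguments the deploy wrapper is reported to pass (the value 45 is made up).
built_args = ["--json", "--probe-api-canary", "--api-canary-timeout-seconds", "45"]
leftover = unknown_flags(build_readiness_parser(), built_args)
print(leftover)  # a non-empty list means the wrapper and the CLI have drifted apart
```

Using `parse_known_args` rather than `parse_args` keeps the check from exiting the test process on the first unknown flag, so the assertion message can list every mismatch at once.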

No other blocking findings found in the PR diff.

Notes

The incident note says api.routers.admin_show_sync now lazy-loads scripts.sync.*, but admin_show_sync.py is unchanged in this PR range and still imports those modules at import time. I did not count that as a newly introduced regression because it is outside the changed diff, but if this PR is meant to deliver that fix, the implementation is missing from the reviewed changes.

Validation: git diff --check passed, and AST parsing of all added Python files passed. I could not run the targeted pytest files because this runner has no pytest installed (python -m pytest -> No module named pytest).

backend_surface: Modal deployment/cleanup scripts, canary helper, lazy import utility.
contracts_changed: No API or DB contract changes in the diff.
migrations_added: None.
downstream_updates: None required from reviewed code.

Copilot

Pull request overview

Adds back operational guardrails around deploying the TRR backend on Modal (pinned workspace/profile deploy wrapper, API /health cold-start canary, and wrong-workspace cleanup), plus a small lazy-import helper and tests/docs to prevent a repeat of prior startup regressions.

Changes:

- Introduces scripts/modal/deploy_backend.py wrapper that pins the Modal profile/workspace, runs readiness checks, and performs an API cold-start canary, optionally stamping an incident note with deploy history.
- Adds scripts/modal/cleanup_wrong_workspace_deploy.py to stop mistaken deployments from a non-authoritative Modal workspace (non-interactive --yes).
- Adds trr_backend/utils/lazy_imports.py and corresponding tests, plus new tests for the deploy/cleanup tooling and an incident note document.
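The profile pinning and workspace gating mentioned in the list above might be structured roughly like this. It is a sketch under stated assumptions: the function names `pinned_modal_env` and `require_workspace` and the example values are hypothetical, and the way the real wrapper resolves the active workspace is not shown in this PR page. Only the `MODAL_PROFILE` environment variable is taken from the review summary.

```python
import os
from typing import Dict, Mapping, Optional


def pinned_modal_env(profile: str, base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with MODAL_PROFILE forced to profile."""
    env = dict(os.environ if base_env is None else base_env)
    env["MODAL_PROFILE"] = profile  # overrides any value inherited from the shell
    return env


def require_workspace(resolved: str, required: str) -> None:
    """Raise if the workspace the profile resolves to is not the required one."""
    if resolved.strip() != required.strip():
        raise RuntimeError(
            f"Refusing to deploy: profile resolves to workspace {resolved!r}, "
            f"expected {required!r}."
        )


# Example with made-up values: the inherited profile is replaced, the rest is kept.
env = pinned_modal_env("prod-profile", {"MODAL_PROFILE": "personal", "PATH": "/usr/bin"})
print(env["MODAL_PROFILE"])
require_workspace("team-workspace", "team-workspace")  # passes silently
```

Passing the pinned environment explicitly to each subprocess, instead of mutating `os.environ`, keeps the check and the deploy command on the same profile even if the calling shell exports a different one.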

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
|---|---|
| `trr_backend/utils/lazy_imports.py` | Adds a simple lazy module proxy to defer optional imports until first use. |
| `tests/utils/test_lazy_imports.py` | Tests lazy-import behavior (import only occurs on first attribute access). |
| `scripts/modal/api_canary.py` | Adds shared `/health` canary helper for post-deploy cold-start verification. |
| `scripts/modal/deploy_backend.py` | Adds Modal deploy wrapper with pinned profile/workspace, readiness gating, canary, and incident stamping. |
| `tests/scripts/test_deploy_backend_modal.py` | Adds unit coverage for deploy wrapper behavior (workspace gating, command construction, stamping helpers). |
| `scripts/modal/cleanup_wrong_workspace_deploy.py` | Adds non-authoritative workspace cleanup tool (readiness gate + optional stop). |
| `tests/scripts/test_cleanup_wrong_workspace_modal.py` | Adds unit coverage for wrong-workspace cleanup behavior and stop command. |
| `docs/observability/modal-v439-v440-serve-backend-api-crash-loop-2026-05-28.md` | Adds incident note used by the deploy wrapper for stamping deploy history context. |
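For readers unfamiliar with the cold-start canary pattern, a retrying `/health` probe can be sketched as below. This is not the code in `scripts/modal/api_canary.py`: the names `fetch_status` and `run_canary`, the default values, and the injectable `fetch` parameter are assumptions made for illustration. The result keys `url`, `status`, and `attempt` mirror the keys referenced elsewhere in this review.

```python
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Dict


def fetch_status(url: str, timeout_seconds: float) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # non-2xx responses still carry a status code


def run_canary(
    url: str,
    attempts: int = 3,
    timeout_seconds: float = 30.0,
    delay_seconds: float = 1.0,
    fetch: Callable[[str, float], int] = fetch_status,
) -> Dict[str, Any]:
    """Probe url up to `attempts` times and return a structured result."""
    last_error = ""
    for attempt in range(1, attempts + 1):
        try:
            status = fetch(url, timeout_seconds)
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            last_error = str(exc)
        else:
            if status == 200:
                return {"ok": True, "url": url, "status": status, "attempt": attempt}
            last_error = f"unexpected status {status}"
        if attempt < attempts:
            time.sleep(delay_seconds)
    return {"ok": False, "url": url, "error": last_error, "attempts": attempts}
```

Making `fetch` injectable lets unit tests simulate a slow cold start (first call fails, second succeeds) without any network access.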

+  `https://modal.com/apps/admin-56995/main/deployed/trr-backend-jobs`
+- Readiness:
+  - `ok = true`
+  - `modal_workspace.workspace = admin-56995`


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/modal/api_canary.py`:
- Around line 13-17: The health_url function currently only checks for non-empty
base but doesn't validate the URL scheme; update health_url to parse api_web_url
(e.g., using urllib.parse.urlparse) and ensure the scheme is either 'https' or
'http' before returning the health path, raising a RuntimeError if the scheme is
missing/invalid; keep the existing trimming/rstrip behavior and build the
returned string from the validated base.

In `@scripts/modal/deploy_backend.py`:
- Around line 160-189: The function format_deploy_history_stamp directly indexes
canary['url'], canary['status'], and canary['attempt'] which can raise KeyError
if canary is missing those keys or is None; update format_deploy_history_stamp
to defensively read canary via safe access (e.g., check if canary is truthy and
use canary.get('url', '<unknown>') / canary.get('status', '<unknown>') /
canary.get('attempt', '<unknown>') or otherwise substitute a clear default like
'N/A'), and optionally add a short warning log when expected keys are absent so
the rest of the stamp generation always succeeds without exceptions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5388d109-873f-4707-878e-cee9eaebaa38

📥 Commits

Reviewing files that changed from the base of the PR and between 21f774b and 525a343.

📒 Files selected for processing (8)

docs/observability/modal-v439-v440-serve-backend-api-crash-loop-2026-05-28.md
scripts/modal/api_canary.py
scripts/modal/cleanup_wrong_workspace_deploy.py
scripts/modal/deploy_backend.py
tests/scripts/test_cleanup_wrong_workspace_modal.py
tests/scripts/test_deploy_backend_modal.py
tests/utils/test_lazy_imports.py
trr_backend/utils/lazy_imports.py

coderabbitai · 2026-06-10T14:54:28Z

+def health_url(api_web_url: str) -> str:
+    base = str(api_web_url or "").strip().rstrip("/")
+    if not base:
+        raise RuntimeError("Modal readiness did not return api_web_url for cold-start canary.")
+    return f"{base}/health"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add URL scheme validation for defense-in-depth.

The health_url function validates that the base URL is non-empty but does not verify the URL scheme. While api_web_url is expected to come from Modal's readiness API in trusted contexts, adding scheme validation (ensuring https:// or http://) would prevent potential SSRF if the input source changes or is compromised.

🛡️ Proposed fix to validate URL scheme

 def health_url(api_web_url: str) -> str:
     base = str(api_web_url or "").strip().rstrip("/")
     if not base:
         raise RuntimeError("Modal readiness did not return api_web_url for cold-start canary.")
+    if not base.startswith(("https://", "http://")):
+        raise RuntimeError(f"Invalid URL scheme for cold-start canary: {base}")
     return f"{base}/health"
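The agent prompt for this comment suggests `urllib.parse.urlparse` rather than a prefix check. A possible variant along those lines is sketched here; the additional non-empty host check and the error wording are assumptions, not part of the proposed fix.

```python
from urllib.parse import urlparse


def health_url(api_web_url: str) -> str:
    """Normalize the base URL, validate its scheme, and append /health."""
    base = str(api_web_url or "").strip().rstrip("/")
    if not base:
        raise RuntimeError("Modal readiness did not return api_web_url for cold-start canary.")
    parsed = urlparse(base)
    # urlparse lower-cases the scheme, so "HTTPS://host" is accepted.
    # Requiring a netloc (assumption) also rejects inputs such as "https:///path".
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        raise RuntimeError(f"Invalid URL for cold-start canary: {base}")
    return f"{base}/health"


print(health_url(" https://api.example.test/ "))
```

Compared with `str.startswith`, parsing also catches scheme-less hosts such as `example.test`, which would otherwise produce a relative path when passed to an HTTP client.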


coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

scripts/modal/deploy_backend.py (1)
292-295: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Unguarded dict access can crash the deploy script if canary keys are missing.

Line 294 directly accesses canary['url'], canary['status'], and canary['attempt'] outside the try block (lines 281-291). If the readiness check returns api_canary with ok: true but missing these keys (e.g., due to an upstream API contract change in verify_modal_readiness.py), this will raise an unhandled KeyError and crash the deploy script.

The same issue was previously fixed in format_deploy_history_stamp at lines 164-166 by using .get() with defaults.
🛡️ Proposed fix to use safe dict access
     print(
         "Modal API cold-start canary passed: "
-        f"url={canary['url']} status={canary['status']} attempt={canary['attempt']}",
+        f"url={canary.get('url', '<unknown>')} status={canary.get('status', '<unknown>')} attempt={canary.get('attempt', '<unknown>')}",
         flush=True,
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/modal/deploy_backend.py` around lines 292 - 295, The print statement
that formats canary info uses direct dict access (canary['url'],
canary['status'], canary['attempt']) which can raise KeyError if keys are
missing; change those to safe accesses using canary.get('url', '<unknown>'),
canary.get('status', '<unknown>') and canary.get('attempt', '<unknown>')
(mirroring the safe access used in format_deploy_history_stamp) so the
deploy_backend.py output never crashes even if verify_modal_readiness.py omits
keys; update the same pattern anywhere else in this function that reads from the
canary dict.
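The defensive access pattern recommended here can be isolated in a small helper, sketched below. The helper name `format_canary_summary` is hypothetical; the message text and the `<unknown>` default are copied from the proposed fix.

```python
from typing import Any, Mapping, Optional


def format_canary_summary(canary: Optional[Mapping[str, Any]]) -> str:
    """Build the canary success line without raising on missing keys."""
    data = canary or {}  # tolerate None as well as an empty payload
    return (
        "Modal API cold-start canary passed: "
        f"url={data.get('url', '<unknown>')} "
        f"status={data.get('status', '<unknown>')} "
        f"attempt={data.get('attempt', '<unknown>')}"
    )


print(format_canary_summary({"url": "https://api.example.test/health", "status": 200}))
print(format_canary_summary(None))
```

Sharing one helper between the print statement and `format_deploy_history_stamp` would keep the two call sites from drifting apart again, though whether that refactor fits the real module is a judgment call for the author.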


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e0863550-7fbe-42f0-8219-e3e2302fdcee

📥 Commits

Reviewing files that changed from the base of the PR and between 525a343 and 4c1c05c.

📒 Files selected for processing (4)

docs/observability/modal-v439-v440-serve-backend-api-crash-loop-2026-05-28.md
scripts/modal/api_canary.py
scripts/modal/deploy_backend.py
tests/scripts/test_deploy_backend_modal.py

✅ Files skipped from review due to trivial changes (1)

docs/observability/modal-v439-v440-serve-backend-api-crash-loop-2026-05-28.md

🚧 Files skipped from review as they are similar to previous changes (2)

scripts/modal/api_canary.py
tests/scripts/test_deploy_backend_modal.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/auth.py`:
- Around line 55-67: The _local_internal_admin_proxy_allowed check currently
trusts a client-supplied marker header plus loopback addresses which allows
bypass; make this opt-in by requiring a server-controlled credential before
returning _build_local_internal_admin_identity(): add a server-side
secret/config (e.g. _LOCAL_INTERNAL_ADMIN_SECRET) and validate the incoming
marker header value against that secret (or validate a proxy-signed token) in
_local_internal_admin_proxy_allowed (and similar logic used around lines
~300-302) in addition to the loopback checks; only if the header matches the
configured secret and both client.host and Host are loopback should you allow
minting the local internal admin identity.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5aaad9f1-c17a-46a0-9312-152904ff8018

📥 Commits

Reviewing files that changed from the base of the PR and between 4c1c05c and 9ca1486.

📒 Files selected for processing (3)

api/auth.py
scripts/modal/deploy_backend.py
tests/api/test_auth.py

🚧 Files skipped from review as they are similar to previous changes (1)

scripts/modal/deploy_backend.py

coderabbitai · 2026-06-10T15:48:36Z

+def _local_internal_admin_proxy_allowed(request: Request) -> bool:
+    if (request.headers.get(_LOCAL_INTERNAL_ADMIN_PROXY_HEADER) or "").strip().lower() not in {
+        "1",
+        "true",
+        "yes",
+        "on",
+    }:
+        return False
+    client_host = request.client.host if request.client else ""
+    if not _is_loopback_host(client_host):
+        return False
+    return _is_loopback_host(request.headers.get("host"))
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This opens an unauthenticated internal-admin path behind any loopback proxy.

This branch trusts a caller-set marker header plus request.client.host/Host to mint an internal_admin identity. If the app is ever fronted by a same-host reverse proxy, public requests will also arrive with a loopback client address, so forwarding x-trr-local-admin-proxy lets a remote caller bypass the JWT/shared-secret checks and even choose x-trr-admin-* identity fields. Please make this path opt-in and require a server-controlled credential before returning _build_local_internal_admin_identity().

Minimal hardening direction

 def _local_internal_admin_proxy_allowed(request: Request) -> bool:
+    if not _env_flag_strict("TRR_INTERNAL_ADMIN_ALLOW_LOCAL_LOOPBACK_PROXY", False):
+        return False
     if (request.headers.get(_LOCAL_INTERNAL_ADMIN_PROXY_HEADER) or "").strip().lower() not in {
         "1",
         "true",
         "yes",
         "on",
     }:
         return False
     client_host = request.client.host if request.client else ""
     if not _is_loopback_host(client_host):
         return False
-    return _is_loopback_host(request.headers.get("host"))
+    return _is_loopback_host(request.headers.get("host")) and _internal_admin_secret_matches(request)

Also applies to: 300-302
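The hardening direction above combines three conditions: an explicit opt-in, loopback checks on both the client address and the Host header, and a server-controlled secret. A framework-free sketch of that decision logic follows. All names and the parameter layout are hypothetical (the real code operates on a `Request` object and its own helpers), and the secret comparison uses `hmac.compare_digest` as one reasonable constant-time option.

```python
import hmac
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True for localhost or a loopback IP, ignoring any port."""
    value = (host or "").strip().lower()
    if value.startswith("["):            # bracketed IPv6, e.g. [::1]:8000
        value = value[1:].split("]", 1)[0]
    elif value.count(":") == 1:          # host:port or IPv4:port
        value = value.split(":", 1)[0]
    if value == "localhost":
        return True
    try:
        return ipaddress.ip_address(value).is_loopback
    except ValueError:
        return False


def local_admin_proxy_allowed(
    enabled: bool,
    marker: str,
    client_host: str,
    host_header: str,
    presented_secret: str,
    expected_secret: str,
) -> bool:
    """Allow the bypass only when opted in, on loopback, and with a matching secret."""
    if not enabled or not expected_secret:
        return False
    if (marker or "").strip().lower() not in {"1", "true", "yes", "on"}:
        return False
    if not (is_loopback_host(client_host) and is_loopback_host(host_header)):
        return False
    # Constant-time comparison of the server-controlled credential.
    return hmac.compare_digest((presented_secret or "").encode(), expected_secret.encode())
```

With this shape, a request forwarded by a same-host reverse proxy still fails unless the proxy itself injects the secret, which addresses the spoofing scenario described in the comment.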


Add Modal deploy guardrail tooling

525a343

Copilot AI review requested due to automatic review settings June 10, 2026 14:42

Copilot started reviewing on behalf of therealityreport June 10, 2026 14:42

Copilot AI reviewed Jun 10, 2026

coderabbitai Bot reviewed Jun 10, 2026

Address Modal deploy guardrail review

4c1c05c

coderabbitai Bot reviewed Jun 10, 2026

therealityreport added 2 commits June 10, 2026 11:30

Harden Modal canary output

dea0566

Allow loopback internal admin proxy

9ca1486

coderabbitai Bot reviewed Jun 10, 2026
