Add Modal deploy guardrail tooling#145
Conversation
📝 WalkthroughWalkthroughAdds Modal deployment and observability tooling: lazy imports, API cold-start canary helpers, a deploy orchestration CLI that enforces workspace and stamps deploy history, a wrong-workspace cleanup CLI, local loopback admin-proxy support, unit tests, and an incident runbook. ChangesModal Deployment Tooling and Observability
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
Codex Exhaustive Code ReviewFindings
No other blocking findings found in the PR diff. Notes The incident note says Validation: backend_surface: Modal deployment/cleanup scripts, canary helper, lazy import utility. |
There was a problem hiding this comment.
Pull request overview
Adds back operational guardrails around deploying the TRR backend on Modal (pinned workspace/profile deploy wrapper, API /health cold-start canary, and wrong-workspace cleanup), plus a small lazy-import helper and tests/docs to prevent a repeat of prior startup regressions.
Changes:
- Introduces
scripts/modal/deploy_backend.pywrapper that pins the Modal profile/workspace, runs readiness checks, and performs an API cold-start canary, optionally stamping an incident note with deploy history. - Adds
scripts/modal/cleanup_wrong_workspace_deploy.pyto stop mistaken deployments from a non-authoritative Modal workspace (non-interactive--yes). - Adds
trr_backend/utils/lazy_imports.pyand corresponding tests, plus new tests for the deploy/cleanup tooling and an incident note document.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
trr_backend/utils/lazy_imports.py |
Adds a simple lazy module proxy to defer optional imports until first use. |
tests/utils/test_lazy_imports.py |
Tests lazy-import behavior (import only occurs on first attribute access). |
scripts/modal/api_canary.py |
Adds shared /health canary helper for post-deploy cold-start verification. |
scripts/modal/deploy_backend.py |
Adds Modal deploy wrapper with pinned profile/workspace, readiness gating, canary, and incident stamping. |
tests/scripts/test_deploy_backend_modal.py |
Adds unit coverage for deploy wrapper behavior (workspace gating, command construction, stamping helpers). |
scripts/modal/cleanup_wrong_workspace_deploy.py |
Adds non-authoritative workspace cleanup tool (readiness gate + optional stop). |
tests/scripts/test_cleanup_wrong_workspace_modal.py |
Adds unit coverage for wrong-workspace cleanup behavior and stop command. |
docs/observability/modal-v439-v440-serve-backend-api-crash-loop-2026-05-28.md |
Adds incident note used by the deploy wrapper for stamping deploy history context. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| `https://modal.com/apps/admin-56995/main/deployed/trr-backend-jobs` | ||
| - Readiness: | ||
| - `ok = true` | ||
| - `modal_workspace.workspace = admin-56995` |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/modal/api_canary.py`:
- Around line 13-17: The health_url function currently only checks for non-empty
base but doesn't validate the URL scheme; update health_url to parse api_web_url
(e.g., using urllib.parse.urlparse) and ensure the scheme is either 'https' or
'http' before returning the health path, raising a RuntimeError if the scheme is
missing/invalid; keep the existing trimming/rstrip behavior and build the
returned string from the validated base.
In `@scripts/modal/deploy_backend.py`:
- Around line 160-189: The function format_deploy_history_stamp directly indexes
canary['url'], canary['status'], and canary['attempt'] which can raise KeyError
if canary is missing those keys or is None; update format_deploy_history_stamp
to defensively read canary via safe access (e.g., check if canary is truthy and
use canary.get('url', '<unknown>') / canary.get('status', '<unknown>') /
canary.get('attempt', '<unknown>') or otherwise substitute a clear default like
'N/A'), and optionally add a short warning log when expected keys are absent so
the rest of the stamp generation always succeeds without exceptions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5388d109-873f-4707-878e-cee9eaebaa38
📒 Files selected for processing (8)
docs/observability/modal-v439-v440-serve-backend-api-crash-loop-2026-05-28.mdscripts/modal/api_canary.pyscripts/modal/cleanup_wrong_workspace_deploy.pyscripts/modal/deploy_backend.pytests/scripts/test_cleanup_wrong_workspace_modal.pytests/scripts/test_deploy_backend_modal.pytests/utils/test_lazy_imports.pytrr_backend/utils/lazy_imports.py
| def health_url(api_web_url: str) -> str: | ||
| base = str(api_web_url or "").strip().rstrip("/") | ||
| if not base: | ||
| raise RuntimeError("Modal readiness did not return api_web_url for cold-start canary.") | ||
| return f"{base}/health" |
There was a problem hiding this comment.
Add URL scheme validation for defense-in-depth.
The health_url function validates that the base URL is non-empty but does not verify the URL scheme. While api_web_url is expected to come from Modal's readiness API in trusted contexts, adding scheme validation (ensuring https:// or http://) would prevent potential SSRF if the input source changes or is compromised.
🛡️ Proposed fix to validate URL scheme
def health_url(api_web_url: str) -> str:
base = str(api_web_url or "").strip().rstrip("/")
if not base:
raise RuntimeError("Modal readiness did not return api_web_url for cold-start canary.")
+ if not base.startswith(("https://", "http://")):
+ raise RuntimeError(f"Invalid URL scheme for cold-start canary: {base}")
return f"{base}/health"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/modal/api_canary.py` around lines 13 - 17, The health_url function
currently only checks for non-empty base but doesn't validate the URL scheme;
update health_url to parse api_web_url (e.g., using urllib.parse.urlparse) and
ensure the scheme is either 'https' or 'http' before returning the health path,
raising a RuntimeError if the scheme is missing/invalid; keep the existing
trimming/rstrip behavior and build the returned string from the validated base.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/modal/deploy_backend.py (1)
292-295:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winUnguarded dict access can crash the deploy script if canary keys are missing.
Line 294 directly accesses
canary['url'],canary['status'], andcanary['attempt']outside the try block (lines 281-291). If the readiness check returnsapi_canarywithok: truebut missing these keys (e.g., due to an upstream API contract change inverify_modal_readiness.py), this will raise an unhandledKeyErrorand crash the deploy script.The same issue was previously fixed in
format_deploy_history_stampat lines 164-166 by using.get()with defaults.🛡️ Proposed fix to use safe dict access
print( "Modal API cold-start canary passed: " - f"url={canary['url']} status={canary['status']} attempt={canary['attempt']}", + f"url={canary.get('url', '<unknown>')} status={canary.get('status', '<unknown>')} attempt={canary.get('attempt', '<unknown>')}", flush=True, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/modal/deploy_backend.py` around lines 292 - 295, The print statement that formats canary info uses direct dict access (canary['url'], canary['status'], canary['attempt']) which can raise KeyError if keys are missing; change those to safe accesses using canary.get('url', '<unknown>'), canary.get('status', '<unknown>') and canary.get('attempt', '<unknown>') (mirroring the safe access used in format_deploy_history_stamp) so the deploy_backend.py output never crashes even if verify_modal_readiness.py omits keys; update the same pattern anywhere else in this function that reads from the canary dict.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@scripts/modal/deploy_backend.py`:
- Around line 292-295: The print statement that formats canary info uses direct
dict access (canary['url'], canary['status'], canary['attempt']) which can raise
KeyError if keys are missing; change those to safe accesses using
canary.get('url', '<unknown>'), canary.get('status', '<unknown>') and
canary.get('attempt', '<unknown>') (mirroring the safe access used in
format_deploy_history_stamp) so the deploy_backend.py output never crashes even
if verify_modal_readiness.py omits keys; update the same pattern anywhere else
in this function that reads from the canary dict.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e0863550-7fbe-42f0-8219-e3e2302fdcee
📒 Files selected for processing (4)
docs/observability/modal-v439-v440-serve-backend-api-crash-loop-2026-05-28.mdscripts/modal/api_canary.pyscripts/modal/deploy_backend.pytests/scripts/test_deploy_backend_modal.py
✅ Files skipped from review due to trivial changes (1)
- docs/observability/modal-v439-v440-serve-backend-api-crash-loop-2026-05-28.md
🚧 Files skipped from review as they are similar to previous changes (2)
- scripts/modal/api_canary.py
- tests/scripts/test_deploy_backend_modal.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/auth.py`:
- Around line 55-67: The _local_internal_admin_proxy_allowed check currently
trusts a client-supplied marker header plus loopback addresses which allows
bypass; make this opt-in by requiring a server-controlled credential before
returning _build_local_internal_admin_identity(): add a server-side
secret/config (e.g. _LOCAL_INTERNAL_ADMIN_SECRET) and validate the incoming
marker header value against that secret (or validate a proxy-signed token) in
_local_internal_admin_proxy_allowed (and similar logic used around lines
~300-302) in addition to the loopback checks; only if the header matches the
configured secret and both client.host and Host are loopback should you allow
minting the local internal admin identity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5aaad9f1-c17a-46a0-9312-152904ff8018
📒 Files selected for processing (3)
api/auth.pyscripts/modal/deploy_backend.pytests/api/test_auth.py
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/modal/deploy_backend.py
| def _local_internal_admin_proxy_allowed(request: Request) -> bool: | ||
| if (request.headers.get(_LOCAL_INTERNAL_ADMIN_PROXY_HEADER) or "").strip().lower() not in { | ||
| "1", | ||
| "true", | ||
| "yes", | ||
| "on", | ||
| }: | ||
| return False | ||
| client_host = request.client.host if request.client else "" | ||
| if not _is_loopback_host(client_host): | ||
| return False | ||
| return _is_loopback_host(request.headers.get("host")) | ||
|
|
There was a problem hiding this comment.
This opens an unauthenticated internal-admin path behind any loopback proxy.
This branch trusts a caller-set marker header plus request.client.host/Host to mint an internal_admin identity. If the app is ever fronted by a same-host reverse proxy, public requests will also arrive with a loopback client address, so forwarding x-trr-local-admin-proxy lets a remote caller bypass the JWT/shared-secret checks and even choose x-trr-admin-* identity fields. Please make this path opt-in and require a server-controlled credential before returning _build_local_internal_admin_identity().
Minimal hardening direction
def _local_internal_admin_proxy_allowed(request: Request) -> bool:
+ if not _env_flag_strict("TRR_INTERNAL_ADMIN_ALLOW_LOCAL_LOOPBACK_PROXY", False):
+ return False
if (request.headers.get(_LOCAL_INTERNAL_ADMIN_PROXY_HEADER) or "").strip().lower() not in {
"1",
"true",
"yes",
"on",
}:
return False
client_host = request.client.host if request.client else ""
if not _is_loopback_host(client_host):
return False
- return _is_loopback_host(request.headers.get("host"))
+ return _is_loopback_host(request.headers.get("host")) and _internal_admin_secret_matches(request)Also applies to: 300-302
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/auth.py` around lines 55 - 67, The _local_internal_admin_proxy_allowed
check currently trusts a client-supplied marker header plus loopback addresses
which allows bypass; make this opt-in by requiring a server-controlled
credential before returning _build_local_internal_admin_identity(): add a
server-side secret/config (e.g. _LOCAL_INTERNAL_ADMIN_SECRET) and validate the
incoming marker header value against that secret (or validate a proxy-signed
token) in _local_internal_admin_proxy_allowed (and similar logic used around
lines ~300-302) in addition to the loopback checks; only if the header matches
the configured secret and both client.host and Host are loopback should you
allow minting the local internal admin identity.
Summary
Verification
Orchestrator
Summary by CodeRabbit
Documentation
New Features
Chores
Tests