fix(roast): persist jobs + SSE in Redis to stop "job not found" 404s #4
Merged
Roast job state and the SSE event log lived in process RAM, so any worker restart (idle spin-down, OOM, redeploy on Render free tier) wiped the store and made /stream and poll return 404 mid-run.

- New services/job_store.py: pluggable store. RedisJobStore (when REDIS_URL is set) persists job state, a replayable event log, and the cancel flag with a TTL; InMemoryJobStore keeps the original behaviour for dev/test. Factory falls back to in-memory if Redis is unreachable so boot never fails.
- Pipeline checkpoints state at each stage + on completion; SSE now replays the full event log from the start so reconnecting/late clients catch up.
- Staleness guard: a non-terminal job with a dead pipeline thread surfaces as failed instead of hanging forever.
- Cancel uses a Redis flag and lets state expire via TTL (reconnect sees cancelled, not 404).
- Add redis dep (requirements/-prod/pyproject/uv.lock), REDIS_URL + ROAST_JOB_TTL + ROAST_STALE_SECONDS config, render.yaml + .env.example.
- Tests: backend/tests/test_job_store.py (both backends, factory fallback).
- Bump transitive form-data 4.0.5 -> 4.0.6 (audit gate, GHSA-hmw2-7cc7-3qxx).
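For readers without the diff open, here is a minimal sketch of what a pluggable store with a fallback factory could look like. It is not the code from services/job_store.py: the method names (save, load), the key prefix roast:job:, the default TTL, and the make_store signature are all assumptions made for illustration. Only the class names, REDIS_URL, and the fall-back-to-memory behaviour come from the description above.

```python
import json
import os
from typing import Optional


class InMemoryJobStore:
    """Process-local store: contents vanish when the worker restarts."""

    def __init__(self) -> None:
        self._jobs: dict = {}

    def save(self, job_id: str, state: dict) -> None:
        self._jobs[job_id] = dict(state)

    def load(self, job_id: str) -> Optional[dict]:
        state = self._jobs.get(job_id)
        return dict(state) if state is not None else None


class RedisJobStore:
    """Keeps job state in Redis so it survives a worker restart."""

    def __init__(self, client, ttl_seconds: int) -> None:
        self._client = client
        self._ttl = ttl_seconds

    def _key(self, job_id: str) -> str:
        # Hypothetical key layout, chosen for this sketch only.
        return f"roast:job:{job_id}"

    def save(self, job_id: str, state: dict) -> None:
        # SET with ex= writes the value and its expiry in one call.
        self._client.set(self._key(job_id), json.dumps(state), ex=self._ttl)

    def load(self, job_id: str) -> Optional[dict]:
        raw = self._client.get(self._key(job_id))
        return json.loads(raw) if raw is not None else None


def make_store(redis_url: Optional[str] = None, ttl_seconds: int = 3600):
    """Return a Redis-backed store when possible, else an in-memory one."""
    url = redis_url if redis_url is not None else os.environ.get("REDIS_URL")
    if not url:
        return InMemoryJobStore()
    try:
        import redis  # redis-py; imported lazily so dev/test can run without it

        client = redis.Redis.from_url(url, socket_connect_timeout=2)
        client.ping()  # fail fast if the server is unreachable
        return RedisJobStore(client, ttl_seconds)
    except Exception:
        # Any import or connection problem degrades to in-memory
        # instead of failing at boot.
        return InMemoryJobStore()
```

Swallowing every exception in the factory is a deliberate trade-off in this sketch: boot never fails, but a misconfigured REDIS_URL silently leaves the service on the in-memory store, so a real implementation would probably log the fallback.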
Problem
Roast job state + the SSE event log lived in process RAM. Any worker restart on Render free tier (idle spin-down, OOM during a swarm, redeploy) wiped the store, so GET /api/roast/<id>/stream and the poll endpoint returned 404 job not found mid-run.
Fix
State store, not a task queue: the pipeline still runs in-process; Redis just makes state survive restarts.
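To make the "state store, not a task queue" distinction concrete, the sketch below runs a pipeline on an ordinary in-process thread that checkpoints state and appends events to a store, then replays the whole event log as SSE frames from index 0. Everything here is hypothetical: EventLogStore, run_pipeline, replay_sse, the event shapes, and the status values are illustrative names, and the in-memory dictionaries stand in for whatever the Redis-backed store actually keeps.

```python
import json
import threading
from typing import Iterator, List


class EventLogStore:
    """Hypothetical stand-in for the job store (state plus append-only event log)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: dict = {}
        self._events: dict = {}

    def checkpoint(self, job_id: str, **fields) -> None:
        with self._lock:
            self._state.setdefault(job_id, {}).update(fields)

    def append_event(self, job_id: str, event: dict) -> None:
        with self._lock:
            self._events.setdefault(job_id, []).append(event)

    def events_since(self, job_id: str, index: int) -> List[dict]:
        with self._lock:
            return list(self._events.get(job_id, [])[index:])

    def state(self, job_id: str) -> dict:
        with self._lock:
            return dict(self._state.get(job_id, {}))


def run_pipeline(store: EventLogStore, job_id: str, stages: List[str]) -> None:
    """Runs in the worker process; the store only records progress."""
    for name in stages:
        store.checkpoint(job_id, status="running", stage=name)
        store.append_event(job_id, {"type": "stage", "stage": name})
    store.checkpoint(job_id, status="done")
    store.append_event(job_id, {"type": "done"})


def replay_sse(store: EventLogStore, job_id: str) -> Iterator[str]:
    """Yield one SSE frame per logged event, starting from the first event."""
    for index, event in enumerate(store.events_since(job_id, 0)):
        yield f"id: {index}\ndata: {json.dumps(event)}\n\n"


if __name__ == "__main__":
    store = EventLogStore()
    worker = threading.Thread(target=run_pipeline, args=(store, "job-1", ["fetch", "roast"]))
    worker.start()
    worker.join()
    # A client connecting after the run still receives every event.
    for frame in replay_sse(store, "job-1"):
        print(frame, end="")
```

Because the work itself still happens on a thread inside the worker, a restart loses the run in progress; the store only guarantees that whatever was recorded before the restart can still be read and replayed.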
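- services/job_store.py: pluggable store.
  - RedisJobStore (when REDIS_URL set): job state, a replayable event log, and the cancel flag, all with a TTL.
  - InMemoryJobStore (default): original behaviour, so dev/test/CI are unchanged.
  - make_store() falls back to in-memory if Redis is unreachable, so boot never fails.
- Pipeline checkpoints state at each stage and on completion; SSE replays the full event log from the start, so reconnecting or late clients catch up.
- Staleness guard: a non-terminal job with a dead pipeline thread surfaces as failed instead of hanging forever.
- Cancel uses a Redis flag and lets state expire via TTL, so a reconnect sees cancelled, not 404.
- Config: REDIS_URL, ROAST_JOB_TTL, ROAST_STALE_SECONDS. Wired in render.yaml + .env.example.
- redis>=5.0.0 added across requirements/-prod/pyproject/uv.lock.
- Bumped transitive form-data 4.0.5 → 4.0.6 (audit gate, GHSA-hmw2-7cc7-3qxx; pre-existing, unrelated).
Tests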
- backend/tests/test_job_store.py: both backends (Redis via in-process fake), serialization roundtrip, factory fallback.
- ci-local.sh green: backend pytest (259 passed, 9 skipped) + ruff, frontend eslint/vitest/build/prod-audit.
Deploy step (required)
Create an Upstash Redis DB → set REDIS_URL=rediss://... in the Render swarmie-backend env → redeploy. Without it, the store stays in-memory and the 404 persists.
Scope note
A mid-run OOM still ends that live run (the thread dies with the worker), but it now surfaces as a clean "interrupted, start again" instead of a 404. Surviving an OOM mid-run needs the deferred task-queue path.
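One way a staleness guard of this kind could work is sketched below. The PR does not say how a dead pipeline thread is detected, so this assumes the pipeline stamps an updated_at timestamp on every checkpoint and that a reader treats a non-terminal job as failed once that stamp is older than the configured threshold (the role ROAST_STALE_SECONDS presumably plays). The function name, field names, and status values are hypothetical.

```python
import time
from typing import Optional

# Assumed set of terminal statuses; the real store may use different names.
TERMINAL_STATUSES = {"done", "failed", "cancelled"}


def effective_state(state: dict, stale_after: float, now: Optional[float] = None) -> dict:
    """Return the state a client should see, downgrading stale jobs to failed."""
    current = time.time() if now is None else now
    if state.get("status") in TERMINAL_STATUSES:
        return state  # finished jobs are never rewritten
    last_update = state.get("updated_at", 0.0)
    if current - last_update > stale_after:
        # No checkpoint for too long: assume the worker thread died.
        return {**state, "status": "failed", "error": "interrupted, start again"}
    return state


if __name__ == "__main__":
    stuck = {"status": "running", "stage": "roast", "updated_at": 100.0}
    print(effective_state(stuck, stale_after=60.0, now=200.0)["status"])
```

Computing the downgrade at read time keeps the guard free of background sweepers: the stored record is untouched, and any poll or stream request that arrives after the threshold simply sees the job as failed.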