Add automatic provider instance fallback #3482
```typescript
  activeThreadKey,
  new Set(threadActivities.map((activity) => String(activity.id))),
);
return;
```
First open skips fallback toast
Medium Severity
The fallback toast effect seeds every existing activity id the first time a thread key is seen, then returns without showing toasts. If automatic fallback finishes while the user is on another thread (or before they open that thread in the session), the success or failure activity is already in the list and no toast is shown.
Reviewed by Cursor Bugbot for commit 26e4553.
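To make the reported behavior concrete, here is a minimal model of the toast effect as the comment describes it. It assumes a per-thread map of already-handled activity ids; `Activity`, `seenByThread`, and `activitiesToToast` are hypothetical names, not identifiers from the PR. The first call for a thread key seeds every existing id and returns nothing, so a fallback activity that was recorded before the thread was opened never produces a toast.

```typescript
// Hypothetical activity shape; only the fields needed for the illustration.
interface Activity {
  id: string;
  kind: string; // e.g. "provider.fallback.succeeded"
}

const FALLBACK_KINDS = new Set(["provider.fallback.succeeded", "provider.fallback.failed"]);

// Hypothetical per-thread record of activity ids that have already been handled.
const seenByThread = new Map<string, Set<string>>();

// Returns the fallback activities that should produce a toast on this run.
function activitiesToToast(threadKey: string, activities: Activity[]): Activity[] {
  const seen = seenByThread.get(threadKey);
  if (seen === undefined) {
    // First time this thread key is seen: seed every existing id and return early.
    // This is the reported issue: an existing fallback activity is marked as seen
    // here and is never toasted.
    seenByThread.set(threadKey, new Set(activities.map((activity) => String(activity.id))));
    return [];
  }
  // Later runs: only activities not seen before, and only fallback kinds, are toasted.
  const fresh = activities.filter((activity) => !seen.has(activity.id) && FALLBACK_KINDS.has(activity.kind));
  for (const activity of activities) seen.add(activity.id);
  return fresh;
}
```

Calling it once with a thread that already contains a `provider.fallback.succeeded` activity returns an empty list; only activities that arrive after that first call are toasted.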
Approvability verdict: Needs human review. This PR introduces automatic provider instance fallback, a substantial new feature with complex orchestration logic, new state management, and significant runtime behavior changes. New features of this scope warrant human review. An unresolved bug report in the toast handling adds further reason for review.
- Preserve the original instance/session in fallback chains
- Emit restore metadata and clearer chat status when no fallback succeeds


What Changed
Adds opt-in automatic fallback between multiple instances of the same provider driver.
Why
A provider instance can become unusable because of a usage limit, expired authentication, a process failure, a transport failure, or temporary unavailability. Previously, users with another configured instance of the same provider saw the error, had to switch instances manually, and then had to restart or continue the task themselves.
This keeps those failures recoverable without changing normal behavior when the feature is disabled or when the error is not operational.
How It Works
Failure classification
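Fallback is attempted only for operational failures, such as the usage-limit, expired-authentication, process, transport, and temporary-unavailability failures described above.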
Validation errors, cancellation, permission decisions, malformed prompts, and unrelated provider errors continue through the existing error path.
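A minimal sketch of this gate, assuming failures have already been mapped to a small set of kinds. The kind names, `ProviderFailure`, and `shouldAttemptFallback` are hypothetical; the five operational kinds mirror the failure causes listed under "Why", and the real classifier works on the provider's service and runtime errors rather than a ready-made enum.

```typescript
// Hypothetical failure kinds; the PR's real error types are not shown in this description.
type FailureKind =
  | "usage-limit"
  | "auth-expired"
  | "process-failure"
  | "transport-failure"
  | "temporarily-unavailable"
  | "validation"
  | "cancelled"
  | "permission-denied"
  | "malformed-prompt"
  | "other";

interface ProviderFailure {
  kind: FailureKind;
  message: string;
}

// Only these kinds are treated as operational (recoverable by switching instances).
const OPERATIONAL: ReadonlySet<FailureKind> = new Set<FailureKind>([
  "usage-limit",
  "auth-expired",
  "process-failure",
  "transport-failure",
  "temporarily-unavailable",
]);

function isOperationalFailure(failure: ProviderFailure): boolean {
  return OPERATIONAL.has(failure.kind);
}

// Fallback runs only when the global opt-in is on AND the failure is operational;
// everything else keeps the existing error path.
function shouldAttemptFallback(fallbackEnabled: boolean, failure: ProviderFailure): boolean {
  return fallbackEnabled && isOperationalFailure(failure);
}
```

Keeping the operational set as an allowlist means any new or unrecognized error kind defaults to the existing error path instead of triggering a switch.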
Candidate selection
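Candidates are considered in provider-list order. An instance is skipped when it fails the model-match or availability check, has not opted in through its per-instance `allowFallback` flag, is not compatible with the conversation's continuation requirements, or has already been tried in the current fallback chain.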
Each skipped instance records a user-facing reason. The workflow stops immediately when one candidate accepts the turn; later candidates are not attempted.
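The planning step can be sketched as a single pass over the provider list that keeps candidates in order and records a reason for every skip. This is a sketch under stated assumptions: `ProviderInstance`, `FallbackContext`, `planCandidates`, the order of the checks, and the reason strings are all hypothetical, and the failed instance is assumed to be excluded outright rather than reported as skipped.

```typescript
// Hypothetical instance description; field names are illustrative only.
interface ProviderInstance {
  id: string;
  driver: string;
  models: string[];
  available: boolean;
  allowFallback: boolean;
  continuationGroup: string;
}

interface FallbackContext {
  failedInstanceId: string;
  driver: string; // only same-driver instances are candidates
  model: string;
  requiredContinuationGroup: string | null; // null for the first user message
  triedInChain: ReadonlySet<string>;
}

interface SkippedCandidate {
  instanceId: string;
  reason: string; // user-facing explanation shown in the toast details
}

function planCandidates(
  instances: ProviderInstance[],
  ctx: FallbackContext,
): { candidates: ProviderInstance[]; skipped: SkippedCandidate[] } {
  const candidates: ProviderInstance[] = [];
  const skipped: SkippedCandidate[] = [];
  for (const instance of instances) {
    // Assumption: the failing instance and other drivers are not candidates at all.
    if (instance.id === ctx.failedInstanceId || instance.driver !== ctx.driver) continue;
    let reason: string | null = null;
    if (ctx.triedInChain.has(instance.id)) reason = "Already tried in this fallback chain";
    else if (!instance.allowFallback) reason = "Fallback is disabled for this instance";
    else if (!instance.available) reason = "Instance is unavailable";
    else if (!instance.models.includes(ctx.model)) reason = `Model ${ctx.model} is not available`;
    else if (ctx.requiredContinuationGroup !== null && instance.continuationGroup !== ctx.requiredContinuationGroup)
      reason = "Cannot continue this conversation";
    if (reason === null) candidates.push(instance);
    else skipped.push({ instanceId: instance.id, reason });
  }
  return { candidates, skipped };
}
```

Planning is separate from attempting: the caller walks `candidates` in order and stops at the first one that accepts the turn, so later candidates are never started.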
First request versus an active task
For the first user message, a compatible same-driver instance receives the original prompt, attachments, model, runtime mode, and interaction mode.
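The two request shapes this section distinguishes (the first-message case above and the continuation case in the next paragraph) can be sketched as a small builder. It assumes a plain `TurnRequest` record; `buildHandoff`, `HandoffPlan`, and the field names are hypothetical. The only literal taken from the PR is the hidden `Continue.` message.

```typescript
// Hypothetical shapes; the PR's real contracts are not shown in this description.
interface Attachment {
  name: string;
}

interface TurnRequest {
  prompt: string;
  attachments: Attachment[];
  model: string;
  runtimeMode: string;
  interactionMode: string;
}

type HandoffMode = "first-request" | "continuation";

type HandoffPlan =
  | { kind: "fresh"; request: TurnRequest }
  | { kind: "resume"; resumeCursor: string; hiddenMessage: string };

const HIDDEN_CONTINUE_MESSAGE = "Continue.";

function buildHandoff(original: TurnRequest, mode: HandoffMode, resumeCursor: string | null): HandoffPlan {
  if (mode === "first-request") {
    // First user message: the candidate receives the original prompt, attachments,
    // model, runtime mode, and interaction mode unchanged.
    return { kind: "fresh", request: { ...original, attachments: [...original.attachments] } };
  }
  if (resumeCursor === null) {
    throw new Error("continuation fallback needs the original native resume cursor");
  }
  // Existing conversation or mid-task failure: resume from the provider's own state
  // and send only a hidden "Continue." turn; no summary or synthesized context is added.
  return { kind: "resume", resumeCursor, hiddenMessage: HIDDEN_CONTINUE_MESSAGE };
}
```

Using a discriminated union makes the caller handle the two cases explicitly, so a continuation can never accidentally replay the original prompt.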
For an existing conversation or a failure during an active task, fallback additionally requires the same provider-native continuation group. It starts the candidate with the original native resume cursor and sends a hidden `Continue.` turn. No summary or synthesized context is inserted, so the provider resumes from its own conversation state.
Success and total failure
On success, the server commits the candidate session and model selection, persists one fallback activity, and the UI shows a toast announcing the switch.
Skipped candidates are available under expandable toast details.
If every candidate is skipped or fails, the server restores the original provider binding when necessary, preserves the original model selection, emits one fallback-failed activity, and then allows the original provider error to follow the existing error path. If fallback infrastructure itself fails unexpectedly, that failure is logged and the original error is still surfaced.
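The success and total-failure paths can be sketched as a sequential loop plus a wrapper that decides which error, if any, to surface. All names here (`attemptFallback`, `errorToSurface`, `StartResult`, `FallbackOutcome`) are hypothetical; committing the session, persisting the activity, and the per-thread lock mentioned in the overview are left to the caller.

```typescript
interface Candidate {
  instanceId: string;
}

type StartResult = { ok: true; sessionId: string } | { ok: false; reason: string };

interface FallbackOutcome {
  kind: "succeeded" | "failed";
  activeInstanceId: string; // instance the thread is bound to afterwards
  attempted: { instanceId: string; reason: string }[];
}

// Try candidates in order and stop at the first one that accepts the turn.
async function attemptFallback(
  originalInstanceId: string,
  candidates: Candidate[],
  start: (candidate: Candidate) => Promise<StartResult>,
): Promise<FallbackOutcome> {
  const attempted: { instanceId: string; reason: string }[] = [];
  for (const candidate of candidates) {
    let result: StartResult;
    try {
      result = await start(candidate);
    } catch (err) {
      result = { ok: false, reason: err instanceof Error ? err.message : String(err) };
    }
    if (result.ok) {
      // Success: the caller commits this session; later candidates are not attempted.
      return { kind: "succeeded", activeInstanceId: candidate.instanceId, attempted };
    }
    attempted.push({ instanceId: candidate.instanceId, reason: result.reason });
  }
  // Exhausted: stay bound to (or restore) the original instance.
  return { kind: "failed", activeInstanceId: originalInstanceId, attempted };
}

// Returns the error to surface, or null when a candidate took over the turn.
async function errorToSurface(originalError: Error, run: () => Promise<FallbackOutcome>): Promise<Error | null> {
  try {
    const outcome = await run();
    return outcome.kind === "succeeded" ? null : originalError;
  } catch (infraError) {
    // Unexpected failure in the fallback machinery: log it, keep the original error.
    console.error("provider fallback failed unexpectedly", infraError);
    return originalError;
  }
}
```

Surfacing `originalError` rather than a candidate's error keeps the existing error path unchanged when fallback cannot help.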
UI Changes
Before this change, the Providers settings page had no fallback controls and operational provider errors surfaced immediately. The screenshots below show the new states.
Screenshots: global opt-in, per-instance participation, disabled control explanation, exhausted fallback details.
Successful switching demo
The recording shows an operational failure on the active provider instance, the automatic handoff, the final switch toast, and the updated active instance/model selection.
https://raw.githubusercontent.com/edoedac0/t3code/7322c743ac3586fe82839bd25a8bda40b4019c39/pr-assets/provider-instance-fallback/successful-switch.mp4
Validation
- `./node_modules/.bin/vp check`: passed (0 errors; 20 existing warnings)
- `./node_modules/.bin/vp run typecheck`: passed
- `./node_modules/.bin/vp test`: 536 files passed, 2 skipped; 4,073 tests passed, 7 skipped
Note
Add automatic provider instance fallback when a provider turn fails
- `provider.fallback.succeeded` or `provider.fallback.failed` activity records.
- `providerFallback.enabled` server setting (default `false`) and a per-instance `allowFallback` flag so operators control participation.

Macroscope summarized 43d4637.
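The two switches combine as a simple conjunction. In this sketch, `providerFallback.enabled`, its `false` default, and `allowFallback` come from the summary; the interface shapes and the `participatesInFallback` helper are assumptions made for illustration, not the PR's contract definitions.

```typescript
// Hypothetical shapes around the setting names given in the summary.
interface ServerSettings {
  providerFallback: { enabled: boolean };
}

interface ProviderInstanceSettings {
  id: string;
  allowFallback: boolean; // per-instance participation flag
}

// Fallback is off by default at the server level.
const defaultServerSettings: ServerSettings = { providerFallback: { enabled: false } };

// An instance takes part only when the global opt-in and its own flag are both on.
function participatesInFallback(settings: ServerSettings, instance: ProviderInstanceSettings): boolean {
  return settings.providerFallback.enabled && instance.allowFallback;
}
```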
Note
High Risk
Changes core provider session binding, turn recovery, and runtime event ingestion; misclassification or trial/handoff bugs could hide errors, leak partial output, or leave threads on the wrong instance.
Overview
Introduces automatic provider instance fallback (off by default) so threads can recover from operational failures by trying other same-driver instances before surfacing errors.
Server orchestration classifies service and runtime failures, plans candidates in provider-list order (model match, availability, per-instance `allowFallback`, continuation compatibility, no retries in the current chain), and runs `attemptProviderFallback` under a per-thread lock. Turn-start failures in `ProviderCommandReactor` attempt fallback before the usual failure activity; `ProviderRuntimeIngestion` does the same on mid-task failures with a hidden `Continue.` turn, filters stale instance events, and uses a trial gate so provisional candidate output is held until a handoff commits or is discarded.
Contracts & UI:
`ServerSettings.providerFallback.enabled` and per-instance `allowFallback`; settings switches and ChatView toasts for `provider.fallback.succeeded`/`provider.fallback.failed` with skipped-instance details.
Reviewed by Cursor Bugbot for commit 43d4637.
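The trial gate and stale-event filtering described in the overview can be illustrated with a small buffer keyed by instance id. This is a sketch under assumptions: `RuntimeEvent`, `TrialGate`, and its methods are hypothetical names, events from the currently bound instance are assumed to pass straight through during a trial, and the real ingestion code in `ProviderRuntimeIngestion` is not reproduced here.

```typescript
// Hypothetical runtime event; only the fields needed for routing.
interface RuntimeEvent {
  instanceId: string;
  payload: string;
}

class TrialGate {
  private activeInstanceId: string;
  private trialInstanceId: string | null = null;
  private held: RuntimeEvent[] = [];
  private readonly emit: (event: RuntimeEvent) => void;

  constructor(activeInstanceId: string, emit: (event: RuntimeEvent) => void) {
    this.activeInstanceId = activeInstanceId;
    this.emit = emit;
  }

  // Start trying a candidate; its output is provisional until commit() or discard().
  beginTrial(candidateInstanceId: string): void {
    this.trialInstanceId = candidateInstanceId;
    this.held = [];
  }

  ingest(event: RuntimeEvent): void {
    if (event.instanceId === this.trialInstanceId) {
      this.held.push(event); // held back so partial candidate output never leaks
    } else if (event.instanceId === this.activeInstanceId) {
      this.emit(event); // events from the bound instance pass through
    }
    // Any other instance is stale for this thread, so its event is dropped.
  }

  // Handoff committed: the candidate becomes the bound instance and held output is released.
  commit(): void {
    if (this.trialInstanceId === null) return;
    this.activeInstanceId = this.trialInstanceId;
    this.trialInstanceId = null;
    const released = this.held;
    this.held = [];
    for (const event of released) this.emit(event);
  }

  // Handoff abandoned: provisional output is thrown away.
  discard(): void {
    this.trialInstanceId = null;
    this.held = [];
  }
}
```

After a commit, late events from the previously bound instance no longer match either id and are dropped, which is one way to keep a thread from showing output from the wrong instance.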