fix(providers): honor llama.cpp per-model context windows #3311
Read per-model llama.cpp meta.n_ctx values during discovery, refresh selected models after lazy load, and bypass fresh cache reuse for llama.cpp refreshes so server restarts update context windows.

Fixes #3310
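The cache bypass mentioned in this description can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `DiscoveryCacheRow` shape, the `providerApi` field, and the `canReuseCacheRow` helper are hypothetical names, and the 24h freshness window is taken from the cause analysis in this PR.

```typescript
// Hypothetical shape of a cached discovery row; not the project's real type.
interface DiscoveryCacheRow {
  providerApi: string; // assumed identifier, e.g. "llama.cpp"
  fetchedAt: number; // epoch milliseconds when the row was written
}

// 24h freshness window, as described in the PR's cause analysis.
const CACHE_TTL_MS = 24 * 60 * 60 * 1000;

// Decide whether a cached row may be reused instead of re-running discovery.
function canReuseCacheRow(row: DiscoveryCacheRow, now: number): boolean {
  // llama.cpp context windows can change on every server restart,
  // so a fresh row is never treated as authoritative for this provider.
  if (row.providerApi === "llama.cpp") {
    return false;
  }
  return now - row.fetchedAt < CACHE_TTL_MS;
}
```

With this shape, other providers keep their normal cache reuse while every llama.cpp refresh asks the server again.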
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 24adad2890
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
    const override = this.#resolveLiveModelOverride(current);
    const patch: ModelPatch = {};
    if (override?.contextWindow === undefined && current.contextWindow !== contextWindow) {
        patch.contextWindow = contextWindow;
Respect custom llama.cpp model limits
When a llama.cpp provider uses discovery and also defines a same-id custom model in models.yml, user-pinned contextWindow/maxTokens from that custom model are merged via #mergeCustomModels, not stored in #modelOverrides. This guard only checks modelOverrides (and the maxTokens guard below does the same), so simply selecting the model can replace the configured limits with /models metadata and mutate the registry until reload; please treat matching custom model fields as overrides here as well.
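A rough sketch of the guard this comment asks for, assuming a simplified `ModelLimits` type and a hypothetical `buildLimitPatch` helper (neither is the registry's actual API). The idea is that a limit counts as pinned when it comes from either a model override or a same-id custom model, and only unpinned fields are patched from discovery metadata.

```typescript
// Simplified stand-in for the registry's model limit fields (assumption).
interface ModelLimits {
  contextWindow?: number;
  maxTokens?: number;
}

// Build a patch from discovered metadata, skipping any field the user pinned
// through a model override or a same-id custom model definition.
function buildLimitPatch(
  current: ModelLimits,
  discovered: ModelLimits,
  override?: ModelLimits,
  custom?: ModelLimits,
): ModelLimits {
  const pinnedContextWindow = override?.contextWindow ?? custom?.contextWindow;
  const pinnedMaxTokens = override?.maxTokens ?? custom?.maxTokens;
  const patch: ModelLimits = {};

  if (
    pinnedContextWindow === undefined &&
    discovered.contextWindow !== undefined &&
    current.contextWindow !== discovered.contextWindow
  ) {
    patch.contextWindow = discovered.contextWindow;
  }
  if (
    pinnedMaxTokens === undefined &&
    discovered.maxTokens !== undefined &&
    current.maxTokens !== discovered.maxTokens
  ) {
    patch.maxTokens = discovered.maxTokens;
  }
  return patch;
}
```

Each field is checked independently, so a custom model that pins only `contextWindow` still lets discovery update `maxTokens`.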
Treat same-id custom llama.cpp model contextWindow and maxTokens fields as pinned limits during selected-model metadata refresh.

Fixes #3310
Updated #3311:
💡 Codex Review
Reviewed commit: 15d0e95d6f
    if (!isLlamaCppDiscovery) {
        return model;
    }
    const contextWindow = await discoverLlamaCppModelContextWindow(model, this.#discoveryContext());
Avoid resolving keys while switching llama.cpp models
When the selected llama.cpp provider is configured with apiKey: "!..." (or an OAuth credential that needs refresh), this call goes through #discoveryContext(), whose resolver calls getApiKeyForProvider; command-backed keys run execSync with a 10s timeout before the model switch completes. setModel intentionally used hasConfiguredAuth so /model and role switches stayed non-blocking and credentials were resolved lazily per request, so selecting a protected llama.cpp model can now freeze the UI just to refresh metadata. Use a non-resolving auth probe here, or defer this metadata refresh to the request path where the key is already being resolved.
### Fixed

- Fixed llama.cpp discovery to prefer per-model `/v1/models` `meta.n_ctx`/`meta.n_ctx_train` values, refresh selected models after lazy load, and bypass fresh-cache reuse so server restarts update context windows. ([#3310](https://github.com/can1357/oh-my-pi/issues/3310))
Move the changelog entry to Unreleased
This new entry is under the versioned ## [16.1.16] - 2026-06-23 section while ## [Unreleased] is empty. The root AGENTS.md says new entries must go under Unreleased and released sections are immutable, so the next release notes will omit this fix while also rewriting historical changelog content.
Use a non-resolving discovery context for selected-model llama.cpp metadata refresh so command-backed and OAuth credentials stay lazy during model switches.

Fixes #3310
Updated #3311:
Move the llama.cpp per-model context window entry from the released 16.1.16 section to [Unreleased] so the next release notes carry it.

Fixes #3310
Updated #3311:
💡 Codex Review
Reviewed commit: e431f5caa6
    if (!isLlamaCppDiscovery) {
        return model;
    }
    const contextWindow = await discoverLlamaCppModelContextWindow(model, this.#nonResolvingDiscoveryContext());
Preserve llama.cpp auth for metadata refresh
When a llama.cpp server is protected by a configured or stored API key, this new selected-model refresh path uses #nonResolvingDiscoveryContext, whose resolver always returns undefined, so discoverLlamaCppModelContextWindow calls /models without Authorization. A protected endpoint returns 401 and the helper silently falls back to the cached/default model, leaving the loaded model's context window unpatched; avoid resolving command/OAuth credentials without dropping already available bearer credentials.
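One way to satisfy both constraints (never block on command-backed or OAuth resolution, but keep credentials that are already available) might look like the sketch below. The `nonBlockingApiKey` and `buildDiscoveryHeaders` helpers and the `cachedToken` parameter are hypothetical, introduced only for illustration; the `!` prefix for command-backed keys follows the `apiKey: "!..."` form mentioned earlier in this review.

```typescript
// Return a credential that can be used without blocking, or undefined.
// `configured` is the provider's apiKey setting; `cachedToken` is a token
// that was already resolved earlier (hypothetical parameter).
function nonBlockingApiKey(
  configured: string | undefined,
  cachedToken?: string,
): string | undefined {
  if (configured === undefined) {
    return cachedToken;
  }
  // Command-backed keys ("!...") would require running a command,
  // so only an already resolved token is acceptable here.
  if (configured.startsWith("!")) {
    return cachedToken;
  }
  // A literal key is available immediately and can be sent as-is.
  return configured;
}

// Build request headers for the metadata call; omit Authorization when
// no credential is available without blocking.
function buildDiscoveryHeaders(apiKey: string | undefined): Record<string, string> {
  return apiKey ? { Authorization: `Bearer ${apiKey}` } : {};
}
```

Under these assumptions a server protected by a literal key still receives its bearer token, while a command-backed key is skipped unless a token has already been resolved on the request path.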
Repro

A focused `bun -e "$REPRO_SCRIPT"` probe called `discoverLlamaCppModels()` with `/v1/models` returning `ctx-88k` as `meta.n_ctx: 88832` and `ctx-65k` as `meta.n_ctx: 65536` while `/props` returned `n_ctx: 0`; before the fix both discovered models reported `contextWindow=128000`.

Cause

`packages/coding-agent/src/config/model-discovery.ts` typed llama.cpp `/v1/models` entries as `{ id: string }`, discarding `data[].meta.n_ctx` and `meta.n_ctx_train`, then built every model from the single `/props` server context window. `packages/coding-agent/src/config/model-registry.ts` also treated fresh llama.cpp discovery cache rows as reusable for 24h, so a restarted server could keep stale context metadata until manual cache deletion.

Fix

- Read `/v1/models` entries with `meta.n_ctx` and `meta.n_ctx_train`, preferring those per-model values over `/props` and falling back to `/props` only for unloaded models.
- Refresh the selected model's `contextWindow` and discovery-derived `maxTokens` when `meta.n_ctx` appears.

Verification

Ran `bun -e "$REPRO_SCRIPT"` and observed `ctx-88k contextWindow=88832` plus `ctx-65k contextWindow=65536`; ran `bun test packages/coding-agent/test/model-discovery.test.ts` with 22 passing tests; ran `bun check` successfully.

Fixes #3310
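The per-model preference described under Fix can be illustrated with a small sketch. It is not the code from `model-discovery.ts`: the `LlamaCppModelEntry` type and `resolveContextWindow` helper are hypothetical, the precedence between `meta.n_ctx` and `meta.n_ctx_train` is an assumption (the PR only states that per-model values win over `/props`), and the 128000 default is inferred from the pre-fix value in the repro.

```typescript
// Hypothetical typing of one /v1/models entry, keeping the meta fields
// that the old `{ id: string }` type discarded.
interface LlamaCppModelEntry {
  id: string;
  meta?: { n_ctx?: number; n_ctx_train?: number };
}

// Assumed default, inferred from the pre-fix contextWindow in the repro.
const DEFAULT_CONTEXT_WINDOW = 128000;

// Treat missing, zero, negative, and non-finite values as "not reported".
function isPositive(value: number | undefined): value is number {
  return typeof value === "number" && Number.isFinite(value) && value > 0;
}

// Pick a context window: per-model metadata first, then the server-wide
// /props value, then the default. The n_ctx before n_ctx_train order is
// an assumption made for this sketch.
function resolveContextWindow(entry: LlamaCppModelEntry, propsNCtx?: number): number {
  const loaded = entry.meta?.n_ctx;
  if (isPositive(loaded)) return loaded;
  const trained = entry.meta?.n_ctx_train;
  if (isPositive(trained)) return trained;
  if (isPositive(propsNCtx)) return propsNCtx;
  return DEFAULT_CONTEXT_WINDOW;
}
```

Applied to the repro's inputs, where `/props` reports `n_ctx: 0`, the two models resolve to their own `meta.n_ctx` values instead of sharing a single server-wide number.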