fix(providers): honor llama.cpp per-model context windows by roboomp · Pull Request #3311 · can1357/oh-my-pi

roboomp · 2026-06-23T12:25:03Z

Repro

A focused bun -e "$REPRO_SCRIPT" probe called discoverLlamaCppModels() with /v1/models returning ctx-88k as meta.n_ctx: 88832 and ctx-65k as meta.n_ctx: 65536 while /props returned n_ctx: 0; before the fix both discovered models reported contextWindow=128000.

Cause

packages/coding-agent/src/config/model-discovery.ts typed llama.cpp /v1/models entries as { id: string }, discarding data[].meta.n_ctx and meta.n_ctx_train, then built every model from the single /props server context window. packages/coding-agent/src/config/model-registry.ts also treated fresh llama.cpp discovery cache rows as reusable for 24h, so a restarted server could keep stale context metadata until manual cache deletion.

Fix

Parse llama.cpp /v1/models entries with meta.n_ctx and meta.n_ctx_train, preferring those per-model values over /props and falling back to /props only for unloaded models.
Add selected-model metadata refresh so a lazy-loaded/sleeping llama.cpp model patches its registry contextWindow and discovery-derived maxTokens when meta.n_ctx appears.
Bypass fresh-cache reuse for llama.cpp refreshes so normal refresh paths re-query the local server after restarts, while retaining cached fallback on discovery failure.
Add regression coverage for per-model context discovery, lazy selected-model patching, and fresh-cache bypass.

Verification

Ran bun -e "$REPRO_SCRIPT" and observed ctx-88k contextWindow=88832 plus ctx-65k contextWindow=65536; ran bun test packages/coding-agent/test/model-discovery.test.ts with 22 passing tests; ran bun check successfully. Fixes #3310

Read per-model llama.cpp meta.n_ctx values during discovery, refresh selected models after lazy load, and bypass fresh cache reuse for llama.cpp refreshes so server restarts update context windows.\n\nFixes #3310

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 24adad2890

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-23T12:29:27Z

+		const override = this.#resolveLiveModelOverride(current);
+		const patch: ModelPatch = {};
+		if (override?.contextWindow === undefined && current.contextWindow !== contextWindow) {
+			patch.contextWindow = contextWindow;


Respect custom llama.cpp model limits

When a llama.cpp provider uses discovery and also defines a same-id custom model in models.yml, user-pinned contextWindow/maxTokens from that custom model are merged via #mergeCustomModels, not stored in #modelOverrides. This guard only checks modelOverrides (and the maxTokens guard below does the same), so simply selecting the model can replace the configured limits with /models metadata and mutate the registry until reload; please treat matching custom model fields as overrides here as well.

Useful? React with 👍 / 👎.

Treat same-id custom llama.cpp model contextWindow and maxTokens fields as pinned limits during selected-model metadata refresh.\n\nFixes #3310

roboomp · 2026-06-23T12:33:35Z

Updated #3311:

Selected-model llama.cpp metadata refresh now treats same-id custom models.yml contextWindow and maxTokens as pinned limits, matching modelOverrides behavior.
Added a regression test covering a discovered llama.cpp provider with a same-id custom model whose limits must survive meta.n_ctx refresh.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15d0e95d6f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-23T12:39:07Z

+		if (!isLlamaCppDiscovery) {
+			return model;
+		}
+		const contextWindow = await discoverLlamaCppModelContextWindow(model, this.#discoveryContext());


Avoid resolving keys while switching llama.cpp models

When the selected llama.cpp provider is configured with apiKey: "!..." (or an OAuth credential that needs refresh), this call goes through #discoveryContext(), whose resolver calls getApiKeyForProvider; command-backed keys run execSync with a 10s timeout before the model switch completes. setModel intentionally used hasConfiguredAuth so /model and role switches stayed non-blocking and credentials were resolved lazily per request, so selecting a protected llama.cpp model can now freeze the UI just to refresh metadata. Use a non-resolving auth probe here, or defer this metadata refresh to the request path where the key is already being resolved.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-23T12:39:07Z


 ### Fixed

+- Fixed llama.cpp discovery to prefer per-model `/v1/models` `meta.n_ctx`/`meta.n_ctx_train` values, refresh selected models after lazy load, and bypass fresh-cache reuse so server restarts update context windows. ([#3310](https://github.com/can1357/oh-my-pi/issues/3310))


Move the changelog entry to Unreleased

This new entry is under the versioned ## [16.1.16] - 2026-06-23 section while ## [Unreleased] is empty. The root AGENTS.md says new entries must go under Unreleased and released sections are immutable, so the next release notes will omit this fix while also rewriting historical changelog content.

Useful? React with 👍 / 👎.

Use a non-resolving discovery context for selected-model llama.cpp metadata refresh so command-backed and OAuth credentials stay lazy during model switches.\n\nFixes #3310

roboomp · 2026-06-23T12:46:02Z

Updated #3311:

Selected-model llama.cpp metadata refresh now uses a non-resolving discovery context, so /model switches do not execute command-backed keys or refresh OAuth just to read /v1/models metadata.
Added a regression test proving command-backed apiKey: "!..." is not invoked during selected-model metadata refresh.

Move the llama.cpp per-model context window entry from the released 16.1.16 section to [Unreleased] so the next release notes carry it.\n\nFixes #3310

roboomp · 2026-06-23T12:48:08Z

Updated #3311:

Moved the llama.cpp per-model context window changelog entry from the released [16.1.16] section to [Unreleased] so the next release carries it.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e431f5caa6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-23T12:51:14Z

+		if (!isLlamaCppDiscovery) {
+			return model;
+		}
+		const contextWindow = await discoverLlamaCppModelContextWindow(model, this.#nonResolvingDiscoveryContext());


Preserve llama.cpp auth for metadata refresh

When a llama.cpp server is protected by a configured or stored API key, this new selected-model refresh path uses #nonResolvingDiscoveryContext, whose resolver always returns undefined, so discoverLlamaCppModelContextWindow calls /models without Authorization. A protected endpoint returns 401 and the helper silently falls back to the cached/default model, leaving the loaded model's context window unpatched; avoid resolving command/OAuth credentials without dropping already available bearer credentials.

Useful? React with 👍 / 👎.

fix(providers): honored llama cpp model context

24adad2

Read per-model llama.cpp meta.n_ctx values during discovery, refresh selected models after lazy load, and bypass fresh cache reuse for llama.cpp refreshes so server restarts update context windows.\n\nFixes #3310

roboomp mentioned this pull request Jun 23, 2026

llama.cpp: context window stale and shared across all models — ignores per-model meta.n_ctx from /v1/models #3310

Closed

github-actions Bot added the vouched Passed the vouch gate label Jun 23, 2026

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

fix(providers): preserved custom llama cpp limits

15d0e95

Treat same-id custom llama.cpp model contextWindow and maxTokens fields as pinned limits during selected-model metadata refresh.\n\nFixes #3310

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

fix(providers): avoided llama cpp key resolution on switch

f552f26

Use a non-resolving discovery context for selected-model llama.cpp metadata refresh so command-backed and OAuth credentials stay lazy during model switches.\n\nFixes #3310

docs(changelog): moved llama cpp fix under unreleased

e431f5c

Move the llama.cpp per-model context window entry from the released 16.1.16 section to [Unreleased] so the next release notes carry it.\n\nFixes #3310

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

can1357 merged commit 94060e8 into main Jun 24, 2026
20 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(providers): honor llama.cpp per-model context windows#3311

fix(providers): honor llama.cpp per-model context windows#3311
can1357 merged 4 commits into
mainfrom
farm/b1c3c65a/fix-llama-cpp-context-window

roboomp commented Jun 23, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 23, 2026

Uh oh!

roboomp commented Jun 23, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 23, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 23, 2026

Uh oh!

roboomp commented Jun 23, 2026

Uh oh!

roboomp commented Jun 23, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		### Fixed

		- Fixed llama.cpp discovery to prefer per-model `/v1/models` `meta.n_ctx`/`meta.n_ctx_train` values, refresh selected models after lazy load, and bypass fresh-cache reuse so server restarts update context windows. ([#3310](https://github.com/can1357/oh-my-pi/issues/3310))

Conversation

roboomp commented Jun 23, 2026

Repro

Cause

Fix

Verification

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

roboomp commented Jun 23, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

roboomp commented Jun 23, 2026

Uh oh!

roboomp commented Jun 23, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants