
feat: add DeepSeek/Mistral/OpenAI-compatible providers and model CLI #9

Merged: joungminsung merged 6 commits into main from feat/add-more-models on May 21, 2026

@joungminsung (Owner)

Summary

Expands OpenDocuments's model ecosystem from 5 to 8 providers and introduces a first-class CLI for model management so users no longer need to hand-edit opendocuments.config.ts or .env.

New model plugins (3)

| Plugin | Models | Embedding | Notes |
|---|---|---|---|
| model-deepseek | DeepSeek-V3.2 / R1 / V4 (upcoming) | None (LLM-only) | OpenAI-compatible API, 164K context, cheap reasoning |
| model-mistral | Small 4 (MoE) / Large 2.1 / Codestral / Pixtral | mistral-embed (1024-dim) | Chat + embeddings + vision |
| model-openai-compatible | Any OpenAI-compatible endpoint | Optional | vLLM / LM Studio / Together / Fireworks / Groq / DeepInfra / SiliconFlow / OpenRouter; supports extraHeaders and disableEmbedding |

Gemma 3/4 and other Ollama models continue to work through the existing model-ollama plugin (ollama pull gemma3:27b + model.llm in config).
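As a rough illustration of the model-openai-compatible plugin listed above, a `model:` block for an LLM-only endpoint might look like the sketch below. The names `llm`, `embedding`, `embeddingProvider`, `baseUrl`, `extraHeaders`, and `disableEmbedding` appear in this PR; the `provider` key, the URL, the model name, and the header are placeholders and assumptions, so the actual schema may differ.

```typescript
// Hypothetical fragment of opendocuments.config.ts; not the confirmed schema.
export default {
  model: {
    provider: 'openai-compatible', // assumed key name
    baseUrl: 'https://api.example.com/v1', // placeholder endpoint
    llm: 'my-served-model', // placeholder model name
    disableEmbedding: true, // the endpoint is treated as LLM-only
    embeddingProvider: 'ollama', // secondary provider for embeddings
    embedding: 'bge-m3',
    extraHeaders: { 'X-Example-Header': 'value' }, // placeholder header
  },
}
```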

New CLI: opendocuments model (7 subcommands)

  • model list [--suggestions] — current config + installed Ollama models (size, params, quantization) + curated catalog of local/cloud options
  • model pull <a> <b> <c> — batch pull with per-model size estimates, summed disk-headroom check, in-place progress, and a failure summary at the end
  • model install-ollama — runs the official Ollama install script on macOS/Linux and waits for the daemon (noop if already running; clear fallback for Windows)
  • model set-key <provider> — password-masked prompt (or --key inline), updates existing .env line instead of appending, warns if .env isn't in .gitignore
  • model test — round-trips the configured LLM + embedder, reports latency / chunks / embedding dimension
  • model switch — interactive provider swap that rewrites only the model: block of the config (preserves everything else)
  • model rm <name> — delete an Ollama model
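The "update instead of append" behavior described for `model set-key` can be sketched as a pure string transform. This is a minimal sketch, not the PR's code: `upsertEnvLine` and `isEnvIgnored` are hypothetical names, and the `.gitignore` check only recognizes a few literal patterns.

```typescript
// Hypothetical helper: replace an existing KEY=... line or append a new one.
export function upsertEnvLine(envText: string, key: string, value: string): string {
  const line = `${key}=${value}`
  // Drop a single trailing newline so split() does not yield an empty last entry.
  const lines = envText === '' ? [] : envText.replace(/\n$/, '').split('\n')
  const idx = lines.findIndex((l) => l.startsWith(`${key}=`))
  if (idx >= 0) {
    lines[idx] = line // update in place, keeping the original ordering
  } else {
    lines.push(line)
  }
  return lines.join('\n') + '\n'
}

// Hypothetical check: does .gitignore contain a literal entry covering .env?
// Real gitignore matching is richer (negation, directories, globs).
export function isEnvIgnored(gitignoreText: string): boolean {
  return gitignoreText
    .split('\n')
    .map((l) => l.trim())
    .some((l) => l === '.env' || l === '/.env' || l === '.env*')
}
```

Keeping the transform separate from file I/O makes the in-place update easy to unit-test.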

init wizard

  • Cloud menu adds DeepSeek and Mistral
  • Third backend option: OpenAI-compatible endpoint with baseUrl prompt (vLLM / LM Studio / Groq / Together / Fireworks / OpenRouter)
  • API-key validation extended to grok, deepseek, mistral (previously openai / anthropic / google only)
  • Secondary embedding provider flow generalized from anthropic-only to any provider without embeddings (deepseek joins anthropic)
  • Ollama auto-install: offers to run the official install script and waits up to 15s for daemon startup
  • Pre-pull disk-space check with per-model size estimates and 1.5GB headroom
  • Local model recommendations refreshed for April 2026 (Gemma 3 / Qwen 3.5 / Llama 4 / DeepSeek R1 distilled)
  • Progress updates render in place instead of spamming stdio
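The pre-pull disk check above amounts to comparing the summed size estimates plus the 1.5GB headroom against the free space. A minimal sketch, assuming sizes are given in bytes; `hasDiskHeadroom` is a hypothetical name, and the available-bytes figure would come from something like the `getAvailableDiskBytes` utility this PR mentions.

```typescript
const GB = 1024 ** 3
const HEADROOM_BYTES = 1.5 * GB // headroom figure taken from the PR description

// Hypothetical helper: true if all pulls fit while leaving the headroom free.
export function hasDiskHeadroom(
  estimatedSizes: number[],
  availableBytes: number,
  headroomBytes: number = HEADROOM_BYTES,
): boolean {
  const total = estimatedSizes.reduce((sum, n) => sum + n, 0)
  return total + headroomBytes <= availableBytes
}
```

For a batch pull, passing every model's estimate in one call gives the summed check rather than one check per model.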

doctor diagnostics

  • Per-provider API ping section for openai, anthropic, google, grok, deepseek, mistral, openai-compatible
  • Distinguishes 401/403 (bad key) from network errors and missing env vars, and links to each provider's key-provisioning URL
  • Secondary embedding provider is pinged independently
  • Ollama model-presence check now only fires for models Ollama actually owns (avoids spurious failures when LLM is cloud + embedder is Ollama)
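The distinction doctor draws between a missing key, a rejected key, and a network failure could be expressed as a small classifier. This is an illustrative sketch only; `classifyPing`, `PingInput`, and the outcome labels are hypothetical and not taken from the PR's code.

```typescript
type PingOutcome = 'ok' | 'missing-key' | 'bad-key' | 'network-error' | 'http-error'

interface PingInput {
  apiKey?: string // value of the provider's env var, if set
  status?: number // HTTP status, if a response arrived
  networkError?: boolean // true if the request itself failed
}

// Hypothetical classifier mirroring the cases listed above.
export function classifyPing(p: PingInput): PingOutcome {
  if (!p.apiKey) return 'missing-key'
  if (p.networkError || p.status === undefined) return 'network-error'
  if (p.status === 401 || p.status === 403) return 'bad-key'
  if (p.status >= 200 && p.status < 300) return 'ok'
  return 'http-error'
}
```

Checking for the missing key first avoids sending a request that is certain to fail, and lets the message point at the provider's key-provisioning URL instead.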

Shared utility

packages/cli/src/utils/ollama.ts: isOllamaRunning, listOllamaModels, pullOllamaModel with streaming progress, deleteOllamaModel, getAvailableDiskBytes (via statfsSync), estimateModelSize (curated hint table), getOllamaInstallCommand (platform-aware).
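To illustrate the streaming-progress part, Ollama's pull endpoint streams newline-delimited JSON objects carrying a `status` and, during downloads, `total` and `completed` byte counts. The sketch below turns one such line into a short progress string; `formatPullLine` is a hypothetical name and not necessarily how `pullOllamaModel` is written.

```typescript
interface PullEvent {
  status: string
  total?: number
  completed?: number
}

// Hypothetical helper: one NDJSON line in, one progress string out.
// Returns null for blank or unparseable lines.
export function formatPullLine(line: string): string | null {
  const trimmed = line.trim()
  if (!trimmed) return null
  let parsed: unknown
  try {
    parsed = JSON.parse(trimmed)
  } catch {
    return null
  }
  if (typeof parsed !== 'object' || parsed === null) return null
  const ev = parsed as Partial<PullEvent>
  if (typeof ev.status !== 'string') return null
  if (typeof ev.total === 'number' && ev.total > 0 && typeof ev.completed === 'number') {
    const pct = Math.floor((ev.completed / ev.total) * 100)
    return `${ev.status} ${pct}%`
  }
  return ev.status
}
```

Writing the returned string with a leading carriage return is one way to get an in-place progress line in a terminal.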

Server bootstrap

PROVIDER_MAP and EMBEDDING_DIMENSIONS extended to cover the three new providers. No breaking changes to existing config.

Test plan

  • npm run build — 30/30 workspaces build clean
  • npm run typecheck — all packages pass
  • New plugin tests: 14/14 (deepseek 5 + mistral 4 + openai-compatible 5)
  • rewriteModelBlock helper tests: 4/4 (cloud swap, openai-compatible + baseUrl, round-trip to ollama, missing-block error)
  • Smoke-tested opendocuments model --help, model list --suggestions against real Ollama install
  • Pre-existing CLI ask.test.ts RAG timeout failure reproduces on unchanged bootstrap (tracked separately, not caused by this PR)
  • Interactive test of init new-provider flow (manual)
  • doctor ping against at least one configured cloud provider (manual)

Out of scope / follow-ups

  • Web UI model manager (CLI only for now)
  • Ollama-native progress parity with ollama pull (we stream the API; it's close but not identical)
  • Changeset entries for the three new plugin packages (add when cutting a release)

joungminsung and others added 4 commits April 20, 2026 12:50
Expands the model plugin ecosystem beyond the existing 5 providers so users
can plug in a much wider range of LLM backends without writing glue code.

- model-deepseek: DeepSeek-V3.2 / R1 / V4 via OpenAI-compatible API (cheap
  reasoning, 164K context). LLM-only; pair with a secondary embedding
  provider.
- model-mistral: Mistral Small 4 (MoE), Large 2.1, Codestral, Pixtral. LLM +
  1024-dim `mistral-embed`, vision-capable.
- model-openai-compatible: generic provider for any OpenAI-compatible
  endpoint — vLLM, LM Studio, Together, Fireworks, Groq, DeepInfra,
  SiliconFlow, OpenRouter. Supports `extraHeaders` and `disableEmbedding`.

Also wires the three new providers into `PROVIDER_MAP` / `EMBEDDING_DIMENSIONS`
in server bootstrap, adds documentation and Gemma 4 Ollama examples, and
updates the README Cloud Providers table.

Gemma 3/4 (and any other Ollama model) is already supported via the existing
`model-ollama` plugin — only requires `ollama pull gemma3:27b` and setting
`model.llm` in the config.

All 14 new unit tests pass. Existing `packages/cli/tests/commands/ask.test.ts`
RAG timeout failure reproduces on unchanged bootstrap code and is unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Makes model setup and switching a first-class CLI flow instead of requiring
hand-editing opendocuments.config.ts. Also extends init to cover the three new
providers added in the previous commit (DeepSeek, Mistral, openai-compatible).

New command: opendocuments model
- model list [--suggestions] — current config + installed Ollama models with
  per-model disk footprint, plus a curated catalog of local/cloud options.
- model pull <name> — streams /api/pull progress, checks Ollama reachability,
  estimates disk footprint, and refuses to clobber a low-disk machine
  silently. Offers the official install command when Ollama is missing.
- model rm <name> — deletes a local Ollama model.
- model test [-p prompt] — round-trips a short prompt against the configured
  LLM and embedder, reports latency/chunks/embedding-dim. Surfaces
  degraded-mode issues immediately.
- model switch — interactive provider swap. Rewrites only the `model:` block
  of opendocuments.config.ts (preserves the rest of the config). Supports
  all 8 providers including openai-compatible with baseUrl.

init improvements
- Cloud menu now includes DeepSeek and Mistral.
- Third backend option: "OpenAI-compatible endpoint" (vLLM / LM Studio / Groq
  / Together / Fireworks / OpenRouter) with baseUrl prompt.
- API key validation extended to grok, deepseek, mistral (previously openai /
  anthropic / google only).
- "Secondary embedding provider" flow generalised from anthropic-only to any
  provider that lacks an embeddings API (deepseek joins anthropic).
- Ollama auto-install: on macOS/Linux, offers to run the official install
  script and waits for the daemon to come up.
- Pre-pull disk-space check with per-model size estimates (1.5GB headroom).
- Progress line updates in place instead of spamming stdio.inherit.
- Local model recommendations refreshed for April 2026 (Gemma 3/Qwen 3.5/
  Llama 4/DeepSeek R1 distilled).

Supporting utilities
- packages/cli/src/utils/ollama.ts — shared Ollama client (isRunning,
  listModels, pullModel w/ progress, deleteModel, disk-space + size
  estimator, install-command selector).
- 4 unit tests for rewriteModelBlock (cloud swap, openai-compatible with
  baseUrl, round-trip back to ollama, missing-block error).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 2 of model UX improvements. Covers the paper cuts left after the
initial `model` command landed.

model command
- pull now accepts multiple names: `opendocuments model pull gemma3:27b bge-m3`
  runs pulls sequentially, prints per-model + total size estimates, sums disk
  headroom, reports which pulls failed at the end.
- install-ollama: standalone command for the common "Ollama missing" case.
  Runs the official install script on macOS/Linux, waits up to 15s for the
  daemon to come up, exits with a clear message if it doesn't. Windows users
  get a direct download link.
- set-key <provider>: prompts (password mode) or accepts --key inline, then
  writes `<PROVIDER>_API_KEY=...` to .env. Updates an existing line instead
  of appending, and warns when .env isn't in .gitignore.

doctor command
- Adds a per-provider diagnostics section for openai, anthropic, google, grok,
  deepseek, mistral, and openai-compatible. Pings each provider's models
  endpoint and reports 401/403 distinctly from network errors. Includes the
  API-key provisioning URL in the "key not set" message.
- Secondary embedding provider (e.g. deepseek + ollama) is also pinged
  independently.
- Ollama model presence check now only fires for models Ollama is actually
  responsible for (previously it would fail noisily when the LLM was on a
  cloud provider and only the embedding was Ollama).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lists the full set of opendocuments model subcommands added in the last two
commits (install-ollama, batch pull, set-key, test, switch) alongside the
existing admin commands. Also clarifies that `opendocuments doctor` now does
per-provider API pings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c4c71ad3de

Comment on lines +167 to +169
log.info('Using local Ollama BGE-M3 as secondary embedding provider.')
embeddingProvider = 'ollama'
embeddingModel = 'bge-m3'


P1: Set disableEmbedding when choosing Ollama embed fallback

When users answer “no” to Does this endpoint provide embeddings?, this branch only sets embeddingProvider = 'ollama', but the primary provider remains openai-compatible with embedding still enabled. In bootstrap.loadModelPlugin, the main plugin is probed for embeddings before secondary providers are loaded, so LLM-only endpoints (e.g. Groq) fail that probe and fall back to stub models instead of using Ollama embeddings. This makes the default no-embedding path degrade at runtime.
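One way to address this is to set `disableEmbedding` in the same branch that picks the Ollama fallback. The sketch below is illustrative: `ModelConfigSketch` and `openAiCompatibleModelConfig` are hypothetical names, and the `provider` key is an assumption about the config shape.

```typescript
// Hypothetical shape; the real config type in the repo may differ.
interface ModelConfigSketch {
  provider: string
  llm: string
  baseUrl: string
  embedding?: string
  embeddingProvider?: string
  disableEmbedding?: boolean
}

export function openAiCompatibleModelConfig(opts: {
  llm: string
  baseUrl: string
  endpointHasEmbeddings: boolean
  embedding?: string
}): ModelConfigSketch {
  const base = { provider: 'openai-compatible', llm: opts.llm, baseUrl: opts.baseUrl }
  if (opts.endpointHasEmbeddings) {
    return { ...base, embedding: opts.embedding }
  }
  // LLM-only endpoint: turn off the primary plugin's embedder so it is not probed,
  // and route embeddings to the local Ollama BGE-M3 model instead.
  return { ...base, disableEmbedding: true, embeddingProvider: 'ollama', embedding: 'bge-m3' }
}
```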


* Preserves surrounding config — throws if the model block can't be located.
*/
export function rewriteModelBlock(src: string, update: ModelBlockUpdate): string {
  const re = /model:\s*\{[\s\S]*?\n\s*\},/m


P2: Parse model block safely before rewriting config

This regex stops at the first \n\s*}, inside model: { ... }, so a nested object such as extraHeaders: { ... }, causes partial replacement and can leave a syntactically broken config after model switch. Since the new openai-compatible provider documents nested model options, this string-based matcher should be replaced with brace-aware parsing (or at least detect nested objects and abort).
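A brace-aware locator along the lines suggested here could look like the following sketch. `findModelBlock` is a hypothetical helper, not the PR's fix. It tracks brace depth and skips quoted strings, but it does not handle comments or `${...}` inside template literals, so treat it as a starting point.

```typescript
// Returns the [start, end) span of the `model: { ... }` object literal, or throws.
export function findModelBlock(src: string): { start: number; end: number } {
  const m = /\bmodel\s*:\s*\{/.exec(src)
  if (!m) throw new Error('model block not found')
  const open = m.index + m[0].length - 1 // index of the opening brace
  let depth = 0
  let quote: string | null = null
  for (let i = open; i < src.length; i++) {
    const ch = src.charAt(i)
    if (quote !== null) {
      if (ch === '\\') i++ // skip the escaped character
      else if (ch === quote) quote = null
      continue
    }
    if (ch === '"' || ch === "'" || ch === '`') {
      quote = ch
      continue
    }
    if (ch === '{') {
      depth++
    } else if (ch === '}') {
      depth--
      if (depth === 0) return { start: m.index, end: i + 1 }
    }
  }
  throw new Error('model block is not closed')
}
```

With the span in hand, a rewrite can splice a new block between `src.slice(0, start)` and `src.slice(end)`, so nested objects such as `extraHeaders` stay intact.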


grok: { llm: 'grok-4', embedding: 'grok-2-embed' },
deepseek: { llm: 'deepseek-chat', embedding: 'bge-m3', embeddingProvider: 'ollama' },
mistral: { llm: 'mistral-small-latest', embedding: 'mistral-embed' },
'openai-compatible': { llm: '', embedding: 'bge-m3', embeddingProvider: 'ollama' },


P2: Require a non-empty LLM default for openai-compatible switch

The openai-compatible default uses an empty llm, so pressing Enter on the prompt can produce an empty value; rewriteModelBlock then omits the llm line entirely (update.llm ? ... : null). The config loader will backfill its generic default model, which is usually invalid for OpenAI-compatible endpoints and leads to generation failures after switching providers.
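A small guard before the rewrite would cover this case. In the sketch below, `resolveLlmName` is a hypothetical helper: it falls back to the provider default when the prompt is left empty and fails loudly when both are empty, as they are for openai-compatible.

```typescript
// Hypothetical guard: never hand an empty llm name to the config rewrite.
export function resolveLlmName(input: string, providerDefault: string): string {
  const name = input.trim() || providerDefault.trim()
  if (!name) {
    throw new Error('An LLM model name is required for this provider')
  }
  return name
}
```

In an interactive flow, the thrown error could instead trigger a re-prompt, so the user cannot proceed with an empty value.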


@joungminsung joungminsung merged commit 68b7404 into main May 21, 2026
2 checks passed