Is there a comprehensive way of using different models or is elbow grease needed? #225

ErickMaRi · 2023-05-17T00:56:36Z

Hey everyone,

I've been using private-GPT, and it's been working great so far. However, I'm curious about incorporating different models into it. I was wondering if anyone has found a comprehensive way of using multiple models with private-GPT, or if it requires some manual effort.

There are two main questions I have:

1. Is it possible to integrate models other than GPT4All into private-GPT? If so, how can I go about doing that?
2. Are there any specific considerations or challenges when using another model within private-GPT?

Looking forward to your answers. I'm amazed it actually runs on my laptop!

Best regards,
Erick Marín Rojas

tedraymond · 2023-12-06T03:25:15Z

My guess today is elbow grease.

Something like redefining the model in settings.yaml and restarting.

I've been wondering the same while exploring orca and dolphin, with the idea of giving users a different experience based on the type of interaction.

2 replies

FaizalJnu Mar 11, 2024

Did you find anything yet? I am stuck on the same issue. Is there a way to simply change some files to use a different GGUF-format model from Hugging Face, such as a smaller 2B-parameter model?

dbzoo Mar 18, 2024

I suggest using Ollama, which lets you switch models. For example, to use dolphin-mistral:
$ ollama pull dolphin-mistral
Then select the model like this (see settings-ollama.yaml for expanded setup details):

llm:
  mode: ollama
  
ollama:
  llm_model: dolphin-mistral

kinthaiofficial · 2026-04-29T00:36:44Z

Model swapping in PrivateGPT is the right thing to want: different models have different cost/quality tradeoffs, and locking in a single model is expensive when query types vary.

The architecture that works well for this: a routing layer that decides which model to use based on query characteristics, rather than having users manually select.

For local inference specifically:

Short, factual queries (dates, definitions, lookups) → small fast model (3B-7B)
Summarization over retrieved context → mid-tier model (13B-34B), latency-tolerant
Complex reasoning (analysis, synthesis, explanation) → largest available model

The routing decision can be simple: token count of the prompt + query complexity score (does it contain "why", "how", "compare" → complex; does it contain a specific entity name → factual lookup).
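
A minimal sketch of that heuristic in Python, in case it helps to see it spelled out. Everything here is an assumption for illustration: the `route` and `has_entity` names, the 2000-token threshold, the whitespace word count standing in for a real tokenizer, the capitalization check standing in for real entity detection, and the mid-tier fallback when no signal fires. None of it is part of PrivateGPT.

```python
import re

# Words the heuristic above treats as signs of a complex query.
COMPLEX_WORDS = {"why", "how", "compare"}

def has_entity(query: str) -> bool:
    # Crude stand-in for entity detection: any capitalized token after the first.
    return any(tok[:1].isupper() for tok in query.split()[1:])

def route(query: str, context_tokens: int = 0, long_prompt: int = 2000) -> str:
    """Return a tier label: 'complex', 'summarize', or 'factual'."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    # Whitespace word count as a rough proxy for the real token count.
    prompt_tokens = len(query.split()) + context_tokens
    if words & COMPLEX_WORDS:
        return "complex"      # largest available model
    if prompt_tokens >= long_prompt:
        return "summarize"    # mid-tier model over retrieved context
    if has_entity(query):
        return "factual"      # small fast model
    return "summarize"        # assumed fallback when no signal fires

print(route("When was Python released?"))
```

The returned label would then be mapped to whichever models are actually installed. The capitalization check will misfire on words like "I", so treat it as a placeholder for a proper entity recognizer.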

For PrivateGPT specifically, the generation step is usually the bottleneck, not the retrieval step: embedding queries and doing ANN search is fast, while the model call is slower. Caching frequent query embeddings and pre-warming the model on common prompt prefixes helps more than model selection for most users.

If you're running multiple models simultaneously (one for retrieval reranking, one for generation), you need to be careful about VRAM allocation: a 13B model at 4-bit and a 7B model at 4-bit can coexist on a 24GB GPU, but only with explicit memory planning.
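
As a rough back-of-the-envelope check of those numbers, weight memory is approximately parameters times bits per weight divided by 8. The sketch below assumes exactly 4 bits per weight and a hypothetical 6 GB of headroom for KV cache, activations, and runtime overhead. Real quantized files are somewhat larger, since the formats store scaling data alongside the weights, and the right headroom depends on context length.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (10^9 bytes): parameters x bits / 8."""
    return params_billion * bits_per_weight / 8

def fits(models, vram_gb: float, headroom_gb: float):
    """models is a list of (params_billion, bits_per_weight) pairs."""
    total = sum(weight_gb(p, b) for p, b in models)
    # headroom_gb is an assumed allowance for KV cache and runtime overhead.
    return total + headroom_gb <= vram_gb, total

ok, total = fits([(13, 4), (7, 4)], vram_gb=24, headroom_gb=6)
print(ok, total)
```

Under these assumptions the two sets of weights come to about 10 GB (6.5 GB plus 3.5 GB), which leaves the rest of the 24 GB for everything else.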

We built a multi-tier routing model for KinthAI's agent inference. https://blog.kinthai.ai/openclaw-multi-tenancy-why-vm-per-user-doesnt-scale covers the resource isolation side, and https://blog.kinthai.ai/agent-wallet-economic-models-autonomous-agents covers the economic routing model.

What hardware are you running on — CPU-only, single GPU, or multi-GPU?

0 replies
