Replies: 2 comments 2 replies
-
My guess today is elbow grease: something to redefine the model in settings.yaml and restart. I've been wondering the same while exploring Orca and Dolphin, with the aim of giving users a different experience based on the type of interaction.
-
Model swapping in PrivateGPT is the right thing to want: different models have different cost/quality tradeoffs, and locking to one model is expensive when query types are diverse. The architecture that works well for this is a routing layer that decides which model to use based on query characteristics, rather than having users select one manually.

For local inference specifically, the routing decision can be simple: the token count of the prompt plus a query complexity score (does it contain "why", "how", or "compare" → complex; does it contain a specific entity name → factual lookup).

In PrivateGPT, the generation step is usually the bottleneck, not the retrieval step: embedding queries and doing ANN search is fast; the model call is slower. Caching frequent query embeddings and pre-warming the model on common prompt prefixes helps more than model selection for most users.

If you're running multiple models simultaneously (one for retrieval reranking, one for generation), you need to be careful about VRAM allocation: a 13B model at 4-bit plus a 7B model at 4-bit can coexist on a 24 GB GPU, but that requires explicit memory planning.

We built a multi-tier routing model for KinthAI's agent inference. https://blog.kinthai.ai/openclaw-multi-tenancy-why-vm-per-user-doesnt-scale covers the resource isolation side, and https://blog.kinthai.ai/agent-wallet-economic-models-autonomous-agents covers the economic routing model.

What hardware are you running on: CPU-only, single GPU, or multi-GPU?
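To make the routing heuristic above concrete, here is a minimal sketch in Python. It is not PrivateGPT code: the model names, the 512-token threshold, the whitespace token count, and the capitalized-word entity check are all assumptions chosen for illustration, and the `route` function is hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical model tiers; placeholder names, not PrivateGPT settings.
SMALL_MODEL = "small-7b-q4"
LARGE_MODEL = "large-13b-q4"

# Cue words taken from the heuristic described above.
COMPLEX_CUES = {"why", "how", "compare"}
# Assumed threshold; tune it for your context window and hardware.
LONG_PROMPT_TOKENS = 512


@dataclass
class Route:
    model: str
    reason: str


def estimate_tokens(text: str) -> int:
    """Crude whitespace token count; a real tokenizer would be more accurate."""
    return len(text.split())


def has_entity(query: str) -> bool:
    """Rough entity check: any capitalized word after the first word."""
    words = query.split()
    return any(w[:1].isupper() for w in words[1:])


def route(query: str, context: str = "") -> Route:
    """Pick a model tier from cue words, prompt length, and entity presence."""
    words = set(re.findall(r"[a-z']+", query.lower()))
    if words & COMPLEX_CUES:
        return Route(LARGE_MODEL, "complex cue word")
    if estimate_tokens(context) + estimate_tokens(query) > LONG_PROMPT_TOKENS:
        return Route(LARGE_MODEL, "long prompt")
    if has_entity(query):
        return Route(SMALL_MODEL, "factual lookup")
    return Route(SMALL_MODEL, "default")
```

Calling `route("Why is the sky blue?")` selects the larger tier because of the cue word, while a short query that only names an entity stays on the smaller one. The retrieved context can be passed as `context` so that long prompts are also sent to the larger tier.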
-
Hey everyone,
I've been using private-GPT, and it's been working great so far. However, I'm curious about incorporating different models into it. I was wondering if anyone has found a comprehensive way of using multiple models with private-GPT, or if it requires some manual effort.
There are two main questions I have:
1. Is it possible to integrate models other than GPT4All into private-GPT? If so, how can I go about doing that?
2. Are there any specific considerations or challenges when using another model within private-GPT?
Looking forward to your answers. I'm amazed it actually runs on my laptop!
Best regards,
Erick Marín Rojas