Complete technical reference for the ResilientLLM library API.
A unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, Google/Gemini, OpenRouter, Ollama) with built-in resilience features including rate limiting, retries, circuit breakers, and error handling.
Creates a new ResilientLLM instance.
Signature:
new ResilientLLM(options?: ResilientLLMOptions)
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `options` | `ResilientLLMOptions` | No | `{}` | Configuration options for the ResilientLLM instance |
ResilientLLMOptions:
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| `aiService` | `string` | No | `process.env.PREFERRED_AI_SERVICE` or `"anthropic"` | AI service provider: `"openai"`, `"anthropic"`, `"google"`, `"openrouter"`, or `"ollama"` |
| `model` | `string` | No | `process.env.PREFERRED_AI_MODEL` or `"claude-3-5-sonnet-20240620"` | Model identifier for the selected AI service |
| `temperature` | `number` | No | `process.env.AI_TEMPERATURE` or `0` | Temperature parameter (0-2) controlling randomness in responses |
| `maxTokens` | `number` | No | `process.env.MAX_TOKENS` or `2048` | Maximum number of tokens in the response |
| `timeout` | `number` | No | `process.env.LLM_TIMEOUT` or `60000` | Request timeout in milliseconds |
| `cacheStore` | `Object` | No | `{}` | Cache store object for storing successful responses |
| `maxInputTokens` | `number` | No | `process.env.MAX_INPUT_TOKENS` or `100000` | Maximum number of input tokens allowed |
| `topP` | `number` | No | `process.env.AI_TOP_P` or `0.95` | Top-p sampling parameter (0-1) |
| `rateLimitConfig` | `RateLimitConfig` | No | `{ requestsPerMinute: 10, llmTokensPerMinute: 150000 }` | Rate limiting configuration |
| `retries` | `number` | No | `3` | Number of retry attempts for failed requests |
| `backoffFactor` | `number` | No | `2` | Exponential backoff multiplier between retries |
| `onRateLimitUpdate` | `Function` | No | `undefined` | Callback function called when rate limit information is updated |
| `onError` | `Function` | No | `undefined` | Currently not used (reserved for future use) |
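To make the interaction between `retries` and `backoffFactor` concrete, here is a minimal sketch of an exponential backoff schedule. The `backoffDelays` helper and the 1000 ms base delay are assumptions made for illustration; this reference does not state the library's initial delay or whether jitter is applied.

```javascript
// Hypothetical helper, not part of the ResilientLLM API.
// Returns the wait time (in ms) before each retry attempt.
function backoffDelays(retries, backoffFactor, baseDelayMs = 1000) {
  const delays = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    // The delay grows geometrically: base, base * factor, base * factor^2, ...
    delays.push(baseDelayMs * Math.pow(backoffFactor, attempt));
  }
  return delays;
}

console.log(backoffDelays(3, 2)); // [ 1000, 2000, 4000 ]
```

With the documented defaults (`retries: 3`, `backoffFactor: 2`), each wait is twice the previous one under this assumed base delay.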
RateLimitConfig:
| Property | Type | Description |
|---|---|---|
| `requestsPerMinute` | `number` | Maximum number of requests allowed per minute |
| `llmTokensPerMinute` | `number` | Maximum number of LLM tokens allowed per minute |
Returns: ResilientLLM instance
Example:
const llm = new ResilientLLM({
  aiService: 'openai',
  model: 'gpt-5-nano',
  maxTokens: 2048,
  temperature: 0.7,
  rateLimitConfig: {
    requestsPerMinute: 60,
    llmTokensPerMinute: 90000
  }
});
Sends a chat completion request to the configured LLM provider.
Signature:
chat(conversationHistory: Message[], llmOptions?: ChatOptions): Promise<ChatResponse>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `conversationHistory` | `Message[]` | Yes | Array of message objects representing the conversation history |
| `llmOptions` | `ChatOptions` | No | Override options for this specific request |
Message:
| Property | Type | Required | Description |
|---|---|---|---|
| `role` | `string` | Yes | Message role: `"system"`, `"user"`, `"assistant"`, or `"tool"` |
| `content` | `string` | Yes | Message content |
ChatOptions:
| Property | Type | Description |
|---|---|---|
| `aiService` | `string` | Override AI service for this request |
| `model` | `string` | Override model for this request |
| `maxTokens` | `number` | Override max tokens for this request |
| `temperature` | `number` | Override temperature for this request |
| `topP` | `number` | Override top-p for this request |
| `maxInputTokens` | `number` | Override max input tokens for this request |
| `maxCompletionTokens` | `number` | Maximum completion tokens (for reasoning models) |
| `reasoningEffort` | `string` | Reasoning effort level: `"low"`, `"medium"`, or `"high"` (for reasoning models) |
| `apiKey` | `string` | Override API key for this request (takes precedence over ProviderRegistry) |
| `tools` | `Tool[]` | Array of tool definitions for function calling |
| `responseFormat` | `Object \| string` | Response format specification (`json_object`/`json_schema` object shapes, plain schema-like object, or JSON aliases: `"json"`, `"object"`, `"json_object"`) |
| `outputConfig` | `Object` | Legacy/migration support. Anthropic-style alternative structured-output input shape, normalized internally via `responseFormat`. Prefer `responseFormat` for all new usage. |
| `response_format` | `Object \| string` | Legacy/migration support. Snake_case alias for `responseFormat`; passthrough-friendly for provider-native payloads. Prefer `responseFormat` for all new usage. |
| `output_config` | `Object` | Legacy/migration support. Snake_case alias for `outputConfig`; passed through as-is when provided. Prefer `responseFormat` for all new usage. |
Use one naming style per field to avoid ambiguity:
- Prefer camelCase (`responseFormat` or its alias `outputConfig`) in app code.
- Prefer snake_case (`response_format`, `output_config`) when reusing raw provider payload snippets.
- Do not send both aliases for the same field in one request; conflicting values may result in an error.
Tool:
| Property | Type | Description |
|---|---|---|
| `type` | `string` | Tool type, typically `"function"` |
| `function` | `Object` | Function definition |
| `function.name` | `string` | Function name |
| `function.description` | `string` | Function description |
| `function.parameters` | `Object` | Function parameters schema (OpenAI format) |
| `function.input_schema` | `Object` | Function input schema (Anthropic format) |
Returns: Promise<ChatResponse>
- Always returns a predictable envelope:
  - `response.content` is the assistant output (string in text mode, parsed object in JSON/schema mode)
  - `response.toolCalls` is included when tool calls are returned
  - `response.metadata` is always included
ChatResponse:
| Property | Type | Description |
|---|---|---|
| `content` | `string \| Object \| null` | The assistant content (text by default, normalized JSON object in JSON modes) |
| `toolCalls` | `Array` | Array of tool call objects (if tools were used) |
| `metadata` | `OperationMetadata` | Always included (request id, config, timing, retries, rate limiting, usage, etc.) |
Throws:
- `ResilientLLMError`: Normalized failures from `chat()` (after internal retries when applicable). Use `error.code` (`ResilientLLMErrorCode`), `error.retryable`, `error.metadata`, and `error.cause` (log server-side). The canonical code list is in `lib/ResilientLLMError.ts`.
- Structured output failures use codes such as `JSON_PARSE_ERROR`, `JSON_MODE_FAILURE`, `SCHEMA_MISMATCH`, or `VALIDATION_ERROR`; details may appear on `error.cause`.
Notes:
- API keys can be provided via `llmOptions.apiKey`, `ProviderRegistry.configure()`, or environment variables
- The implementation uses `ProviderRegistry` to manage providers and their configurations
- Response parsing is handled generically using provider-specific `chatConfig` settings
- For schema mode, validation checks top-level required fields and primitive types (`string`, `number`, `boolean`, `integer`). Schema mismatch errors include a `validation` object with `missingFields`, `extraFields`, and `typeMismatches` arrays
Example:
const conversationHistory = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is the capital of France?' }
];
const { content } = await llm.chat(conversationHistory);
console.log(content); // "The capital of France is Paris."
Example with tools:
const response = await llm.chat(conversationHistory, {
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' }
        }
      }
    }
  }]
});
// response: { content: null, toolCalls: [...] }
Example with API key override:
// Override API key for this specific request
const response = await llm.chat(conversationHistory, {
  apiKey: 'sk-custom-key-here',
  aiService: 'openai',
  model: 'gpt-5-nano'
});
Example with operation metadata:
const llm = new ResilientLLM({
  aiService: 'openai',
  model: 'gpt-5-nano',
});
const { content, metadata } = await llm.chat(conversationHistory);
console.log(content); // Assistant reply text
console.log(metadata?.requestId);
console.log(metadata?.timing?.totalTimeMs);
console.log(metadata?.usage); // prompt_tokens, completion_tokens, total_tokens
Cancels all ongoing LLM operations for this instance.
Signature:
abort(): void
Returns: void
Description:
- Aborts all active HTTP requests initiated by this `ResilientLLM` instance
- Clears all resilient operation instances
- Resets the internal abort controller
Example:
const promise = llm.chat(conversationHistory);
llm.abort(); // Cancels the ongoing request
Note: For API URLs and key checks, import `ProviderRegistry`: use `ProviderRegistry.getChatApiUrl(providerName)` and `ProviderRegistry.buildApiUrl(providerName, baseUrl, null)` for URLs; use `ProviderRegistry.hasApiKey(providerName)` to check if a key is present (keys are not exposed). See Custom Provider Guide for details.
Converts a messages array to the format required by Anthropic's API.
Signature:
formatMessageForAnthropic(messages: Message[]): { system?: string, messages: Message[] }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `messages` | `Message[]` | Yes | Array of message objects |
Returns: Object with properties:
- `system` - `string | undefined` - System message content if present
- `messages` - `Message[]` - Messages array without system messages
Description:
- Extracts system messages from the messages array
- Returns system content separately and remaining messages without system role
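The extraction described above can be sketched as a small standalone function. This is an illustrative re-implementation rather than the library's code: the name `splitSystemMessages` is hypothetical, and joining several system messages with a newline is an assumption, since this reference does not say how multiple system messages are combined.

```javascript
// Illustrative sketch; the library's own method may behave differently,
// for example in how it combines several system messages.
function splitSystemMessages(messages) {
  const systemParts = [];
  const rest = [];
  for (const message of messages) {
    if (message.role === 'system') {
      systemParts.push(message.content);
    } else {
      rest.push(message);
    }
  }
  return {
    // Assumption: multiple system messages are joined with a newline.
    system: systemParts.length > 0 ? systemParts.join('\n') : undefined,
    messages: rest
  };
}

const result = splitSystemMessages([
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello!' }
]);
console.log(result.system);          // You are helpful.
console.log(result.messages.length); // 1
```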
Example:
const conversation = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello!' }
];
// The input uses its own name: redeclaring `messages` in the destructuring would be a SyntaxError
const { system, messages } = llm.formatMessageForAnthropic(conversation);
// system: "You are helpful."
// messages: [{ role: 'user', content: 'Hello!' }]
Normalizes an error into ResilientLLMError. Used internally when chat() fails; you can call it directly if you need the same mapping (e.g. tests).
Signature:
parseError(statusCode: number | null, error: Error, operationMetadata?: OperationMetadata | null): never
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `statusCode` | `number \| null` | Yes | Provider HTTP status when known, or `null` |
| `error` | `Error` | Yes | Underlying error |
| `operationMetadata` | `OperationMetadata \| null` | No | Merged onto the thrown error’s metadata |
Status Code Mappings:
| Status Code | Error Message |
|---|---|
| 400 | "Bad request" |
| 401 | "Invalid API Key" |
| 403 | "You are not authorized to access this resource" |
| 404 | "Not found" |
| 429 | "Rate limit exceeded" |
| 500 | "Internal server error" |
| 503 | "Service unavailable" |
| 529 | "API temporarily overloaded" |
| Other | "Unknown error" |
Note: This method is called internally by the chat() method when errors occur. You typically don't need to call it directly.
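As a rough illustration of the status code table above, the mapping can be expressed as a lookup with a fallback. `messageForStatus` is a hypothetical name, and treating a `null` status like any unlisted code is an assumption; the real `parseError()` additionally builds and throws a `ResilientLLMError`, which this sketch does not attempt.

```javascript
// Messages copied from the status code table above.
const STATUS_MESSAGES = new Map([
  [400, 'Bad request'],
  [401, 'Invalid API Key'],
  [403, 'You are not authorized to access this resource'],
  [404, 'Not found'],
  [429, 'Rate limit exceeded'],
  [500, 'Internal server error'],
  [503, 'Service unavailable'],
  [529, 'API temporarily overloaded']
]);

// Hypothetical helper: returns the message for a provider HTTP status.
function messageForStatus(statusCode) {
  // Assumption: a null status is treated like any unlisted code.
  return STATUS_MESSAGES.get(statusCode) ?? 'Unknown error';
}

console.log(messageForStatus(429));  // Rate limit exceeded
console.log(messageForStatus(null)); // Unknown error
```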
Generic method to parse chat completion response using provider configuration. This is the preferred method used internally.
Signature:
parseChatCompletion(data: Object, chatConfig: Object, tools?: Tool[]): string | ChatResponse
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `data` | `Object` | Yes | API response object |
| `chatConfig` | `Object` | Yes | Chat configuration from provider (contains `responseParsePath`) |
| `tools` | `Tool[]` | No | Tools array if function calling was used |
Returns: string | ChatResponse
- If `tools` provided and tool calls found: Returns `ChatResponse` with `content` and `toolCalls`
- Otherwise: Returns `string` content
chatConfig.responseParsePath:
- Path to extract content from response (e.g., `'choices[0].message.content'`, `'content[0].text'`, `'response'`)
- Supports dot notation and bracket notation for nested values
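To show what resolving such a path involves, here is a minimal sketch of a resolver that handles dot and bracket notation. `getByPath` is a hypothetical helper written for this example; the library's internal resolver is not shown in this reference and may handle edge cases differently.

```javascript
// Hypothetical helper: resolves paths such as "choices[0].message.content".
function getByPath(data, path) {
  // Rewrite "[0]" as ".0" so the whole path can be split on dots.
  const keys = path.replace(/\[(\d+)\]/g, '.$1').split('.').filter(Boolean);
  let current = data;
  for (const key of keys) {
    // Stop early when an intermediate value is missing.
    if (current === null || current === undefined) return undefined;
    current = current[key];
  }
  return current;
}

const openAiStyle = { choices: [{ message: { content: 'Hello!' } }] };
console.log(getByPath(openAiStyle, 'choices[0].message.content')); // Hello!
```

The same function covers the other two example paths: `'content[0].text'` walks into an array of content blocks, and `'response'` reads a single top-level key.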
Example:
const chatConfig = {
  responseParsePath: 'choices[0].message.content',
  toolSchemaType: 'openai'
};
const data = {
  choices: [{
    message: {
      content: "Hello!",
      tool_calls: []
    }
  }]
};
const content = llm.parseChatCompletion(data, chatConfig);
// "Hello!"
Parses OpenAI chat completion response.
Signature:
parseOpenAIChatCompletion(data: Object, tools?: Tool[]): string | ChatResponse
Status: use `parseChatCompletion()` with `chatConfig` instead.
Parses Anthropic chat completion response.
Signature:
parseAnthropicChatCompletion(data: Object, tools?: Tool[]): string
Status: use `parseChatCompletion()` with `chatConfig` instead.
Parses Ollama chat completion response.
Signature:
parseOllamaChatCompletion(data: Object, tools?: Tool[]): string
Status: use `parseChatCompletion()` with `chatConfig` instead.
Parses Google chat completion response (OpenAI-compatible endpoint).
Signature:
parseGoogleChatCompletion(data: Object, tools?: Tool[]): string
Status: use `parseChatCompletion()` with `chatConfig` instead.
Retries the chat request with an alternate AI service when the current service returns rate limit errors (429, 529).
Signature:
retryChatWithAlternateService(conversationHistory: Message[], llmOptions?: ChatOptions): Promise<ChatResponse>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `conversationHistory` | `Message[]` | Yes | Array of message objects |
| `llmOptions` | `ChatOptions` | No | LLM options for the request |
Returns: Promise<ChatResponse> - Response from the alternate service
Throws:
- `Error` - If no alternative service is available
Description:
- Automatically switches to the next available service from `ProviderRegistry.getDefaultModels()`
- Skips services that have already failed
- Uses default model for each service
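The selection logic described above might look roughly like the following. This is a sketch under stated assumptions: the default models are assumed to be a plain object mapping service name to model (the actual return shape of `ProviderRegistry.getDefaultModels()` is not documented here), and `pickAlternateService` is a hypothetical name.

```javascript
// Assumption: default models are given as { serviceName: defaultModel }.
function pickAlternateService(defaultModels, failedServices) {
  for (const [service, model] of Object.entries(defaultModels)) {
    // Skip services that have already failed for this request.
    if (!failedServices.includes(service)) {
      return { aiService: service, model };
    }
  }
  throw new Error('No alternative service available');
}

const defaults = {
  anthropic: 'claude-3-5-sonnet-20240620',
  openai: 'gpt-5-nano',
  google: 'gemini-2.0-flash'
};
console.log(pickAlternateService(defaults, ['anthropic']));
// { aiService: 'openai', model: 'gpt-5-nano' }
```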
Example:
// Automatically called internally when rate limit errors occur
// Can also be called manually if needed
const response = await llm.retryChatWithAlternateService(conversationHistory);
Estimates the number of tokens in a given text string.
Signature:
static estimateTokens(text: string): number
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `text` | `string` | Yes | Text to estimate tokens for |
Returns: number - Estimated token count
Description:
- For texts longer than 10,000 characters: Uses approximation (~4 characters per token)
- For shorter texts: Uses accurate tokenization with Tiktoken encoder (o200k_base encoding)
- Uses lazy initialization of the encoder
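The two-branch strategy above can be sketched without a tokenizer dependency. In this example `estimateTokensSketch` and the `exactCount` callback are hypothetical: the callback stands in for the Tiktoken encoder, and rounding up in the approximation branch is an assumption.

```javascript
const APPROXIMATION_THRESHOLD = 10000; // characters, from the description above
const CHARS_PER_TOKEN = 4;             // rough ratio, from the description above

// `exactCount` is a stand-in for a real tokenizer supplied by the caller.
function estimateTokensSketch(text, exactCount) {
  if (text.length > APPROXIMATION_THRESHOLD) {
    // Long input: skip tokenization and approximate (rounding up is assumed).
    return Math.ceil(text.length / CHARS_PER_TOKEN);
  }
  return exactCount(text);
}

// Toy tokenizer for the demo only: counts whitespace-separated words.
const wordCount = (text) => text.split(/\s+/).filter(Boolean).length;

console.log(estimateTokensSketch('a'.repeat(12000), wordCount)); // 3000
console.log(estimateTokensSketch('Hello, world!', wordCount));   // 2
```

The trade-off is speed against accuracy: exact tokenization cost grows with input size, so very long inputs fall back to the cheap character ratio.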
Example:
const tokenCount = ResilientLLM.estimateTokens("Hello, world!");
// Returns estimated token count
Represents a single message in a conversation.
interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}
Response envelope returned by chat() on every call.
- `content` is the assistant output:
  - text mode -> `string`
  - JSON/schema mode -> parsed JS object
- `toolCalls` is present when tool calls were returned
- `metadata` is always included
interface ChatResponse {
  content: string | Object | null;
  toolCalls?: Array<any>;
  metadata: OperationMetadata;
}
Operation metadata attached to ChatResponse.metadata on every call. Used for observability, logging, and debugging.
interface OperationMetadata {
  requestId: string;
  operationId: string;
  startTime: number;
  finishReason?: string | null;
  config: {
    aiService: string;
    model: string;
    temperature: number | null;
    maxTokens: number | null;
    topP: number | null;
    maxInputTokens: number;
    estimatedInputTokens: number;
    enableCache: boolean;
    // ... resilience config (retries, rateLimitConfig, etc.)
  };
  events: Array<any>;
  timing: {
    totalTimeMs: number | null;
    rateLimitWaitMs: number;
    httpRequestMs: number | null;
  };
  retries: Array<any>;
  rateLimiting: { requestedTokens: number; totalWaitMs: number; [key: string]: any };
  circuitBreaker: Object;
  http: {
    url: string;
    method: string;
    statusCode: number | null;
    headers: Record<string, string>;
    durationMs?: number;
    error?: string;
  };
  cache: { enabled: boolean; [key: string]: any };
  service: { attempted: string[]; final: string };
  usage?: {
    prompt_tokens: number | null;
    completion_tokens: number | null;
    total_tokens: number | null;
  };
}
Configuration for rate limiting.
interface RateLimitConfig {
  requestsPerMinute: number;
  llmTokensPerMinute: number;
}
Constructor options for ResilientLLM.
interface ResilientLLMOptions {
  aiService?: string;
  model?: string;
  temperature?: number;
  maxTokens?: number;
  timeout?: number;
  cacheStore?: Object;
  maxInputTokens?: number;
  topP?: number;
  rateLimitConfig?: RateLimitConfig;
  retries?: number;
  backoffFactor?: number;
  onRateLimitUpdate?: (info: RateLimitInfo) => void;
  onError?: (error: Error) => void;
}
Options for individual chat requests.
interface ChatOptions {
  aiService?: string;
  model?: string;
  maxTokens?: number;
  temperature?: number;
  topP?: number;
  maxInputTokens?: number;
  maxCompletionTokens?: number;
  reasoningEffort?: 'low' | 'medium' | 'high';
  apiKey?: string;
  tools?: Tool[];
  responseFormat?: Object | string; // strings are the JSON aliases: 'json', 'object', 'json_object'
  outputConfig?: Object;
}
Use responseFormat when you need the assistant response as JSON, optionally matching a particular schema.
- JSON mode (no schema): ensures the reply is a single JSON object (library parses it for you).
- Schema mode: provides a JSON Schema so the library can validate the parsed object and throw `SCHEMA_MISMATCH` when required keys/types don’t match.
Supplying a schema
You can supply a schema in any of these equivalent shapes (pick one and stick to it):
- OpenAI-style wrapper (recommended when you want to be explicit):
responseFormat: {
  type: 'json_schema',
  json_schema: {
    name: 'my_payload',
    schema: {
      type: 'object',
      properties: {
        answer: { type: 'string' },
        citations: { type: 'array', items: { type: 'string' } }
      },
      required: ['answer']
    }
  }
}
- Short wrapper (schema directly on the object):
responseFormat: {
  type: 'json_schema',
  schema: {
    type: 'object',
    properties: { answer: { type: 'string' } },
    required: ['answer']
  }
}
- Plain schema-like object (auto-detected as a schema):
responseFormat: {
  type: 'object',
  properties: { answer: { type: 'string' } },
  required: ['answer']
}
End-to-end example (schema mode)
const llm = new ResilientLLM({ aiService: 'openai', model: 'gpt-5-nano' });
const result = await llm.chat(
  [{ role: 'user', content: 'Return an answer and citations.' }],
  {
    responseFormat: {
      type: 'json_schema',
      json_schema: {
        name: 'answer_payload',
        schema: {
          type: 'object',
          properties: {
            answer: { type: 'string' },
            citations: { type: 'array', items: { type: 'string' } }
          },
          required: ['answer']
        }
      }
    }
  }
);
// `result.content` is a parsed JS object when `responseFormat` requests JSON/schema mode.
Validation scope (important)
The built-in validator is intentionally lightweight: it checks required keys, extra keys, and primitive types at the top level (string, number, boolean, integer).
- Extra keys are enforced only when your schema sets `additionalProperties: false` (and the schema has `properties`).
- For deeper validation needs (nested objects, enums, regex, oneOf/anyOf, etc.), run your own schema validator after the call.
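To illustrate how narrow a top-level check of this kind is, here is a sketch of a validator with the same stated scope. It is not the library's implementation: `validateTopLevel` is a hypothetical name, and reporting mismatches as a list of field names is an assumption about the shape of `typeMismatches`.

```javascript
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Illustrative validator: required keys, extra keys, and primitive types,
// all at the top level only. Nested objects and arrays are not inspected.
function validateTopLevel(value, schema) {
  const properties = schema.properties || {};
  const required = schema.required || [];
  const missingFields = required.filter((key) => !hasOwn(value, key));
  // Extra keys count only when additionalProperties is false and properties exist.
  const extraFields =
    schema.additionalProperties === false && schema.properties
      ? Object.keys(value).filter((key) => !hasOwn(properties, key))
      : [];
  const typeMismatches = [];
  for (const [key, spec] of Object.entries(properties)) {
    if (!hasOwn(value, key)) continue;
    const actual = value[key];
    let ok = true; // other types (object, array, ...) are left unchecked
    if (spec.type === 'string') ok = typeof actual === 'string';
    else if (spec.type === 'number') ok = typeof actual === 'number';
    else if (spec.type === 'boolean') ok = typeof actual === 'boolean';
    else if (spec.type === 'integer') ok = Number.isInteger(actual);
    if (!ok) typeMismatches.push(key);
  }
  return { missingFields, extraFields, typeMismatches };
}

const exampleSchema = {
  type: 'object',
  additionalProperties: false,
  properties: { answer: { type: 'string' }, count: { type: 'integer' } },
  required: ['answer']
};
console.log(validateTopLevel({ count: 1.5, extra: true }, exampleSchema));
// { missingFields: [ 'answer' ], extraFields: [ 'extra' ], typeMismatches: [ 'count' ] }
```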
Example: additionalProperties: false + required
const result = await llm.chat(messages, {
  responseFormat: {
    type: 'json_schema',
    json_schema: {
      name: 'answer_payload',
      schema: {
        type: 'object',
        additionalProperties: false,
        properties: {
          answer: { type: 'string' }
        },
        required: ['answer']
      }
    }
  }
});
// `result.content` is { answer: string } when the model output matches the schema.
// If the model returns invalid JSON or extra keys, `llm.chat(...)` throws StructuredOutputError (e.g. `SCHEMA_MISMATCH`).

// JSON alias strings (equivalent to { type: 'json_object' })
'json'
'object'
'json_object'
// OpenAI-compatible JSON mode
{ type: 'json_object' }
// When `responseFormat` requests JSON, `llm.chat(...)` resolves to a response envelope
// where `.content` is the parsed JS object.
Tool definition for function calling.
interface Tool {
  type: string;
  function: {
    name: string;
    description: string;
    parameters?: Object; // OpenAI format
    input_schema?: Object; // Anthropic format
  };
}
Failures from chat() are thrown as ResilientLLMError (see chat() Throws above). That type is the consumer-facing surface: code, retryable, optional metadata (same shape as success), and cause for logging.
Stable string codes are defined as `ResilientLLMErrorCode` in `lib/ResilientLLMError.ts` (including `PROVIDER_*`, structured-output codes, resilience-related codes, and configuration/capability codes). `retryable` is defined there for codes where a simple retry might help.
Use error.code for branching, not raw HTTP status. When a provider HTTP status was available to the library, it may also appear under metadata (e.g. provider.httpStatus / http).
API keys are required for most LLM providers. They can be provided in three ways (in order of precedence):
- Per-request via `llmOptions.apiKey` (highest priority)
- Via `ProviderRegistry.configure()` with direct `apiKey` parameter
- Via environment variables (lowest priority)
For advanced use cases (custom providers, multiple API keys, or programmatic configuration), see the Custom Provider Guide - Authentication Configuration.
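The precedence order listed above can be pictured as a simple fall-through. `resolveApiKey` and its parameter names are hypothetical and exist only for this sketch; the environment is passed in explicitly so the example stays self-contained.

```javascript
// Hypothetical resolver mirroring the documented precedence order.
function resolveApiKey({ requestKey, configuredKey, envVarNames = [], env = {} }) {
  if (requestKey) return requestKey;       // 1. llmOptions.apiKey
  if (configuredKey) return configuredKey; // 2. ProviderRegistry.configure()
  for (const name of envVarNames) {        // 3. environment variables
    if (env[name]) return env[name];
  }
  return undefined; // no key found
}

const fakeEnv = { OPENAI_API_KEY: 'sk-from-env' };
console.log(resolveApiKey({
  requestKey: 'sk-per-request',
  envVarNames: ['OPENAI_API_KEY'],
  env: fakeEnv
})); // sk-per-request
console.log(resolveApiKey({ envVarNames: ['OPENAI_API_KEY'], env: fakeEnv })); // sk-from-env
```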
Set at least one API key for your chosen service:
| Variable | Service | Required |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI | Yes (if using OpenAI) |
| `ANTHROPIC_API_KEY` | Anthropic | Yes (if using Anthropic) |
| `GOOGLE_API_KEY` or `GOOGLE_GENERATIVE_AI` or `GEMINI_API_KEY` | Google | Yes (if using Google) |
| `OPENROUTER_API_KEY` | OpenRouter | Yes (if using OpenRouter) |
| `OLLAMA_API_KEY` | Ollama | No (optional) |
Note: For custom providers, use the environment variable names specified in ProviderRegistry.configure() via envVarNames.
| Variable | Default | Description |
|---|---|---|
| `PREFERRED_AI_SERVICE` | `"anthropic"` | Default AI service |
| `PREFERRED_AI_MODEL` | `"claude-3-5-sonnet-20240620"` | Default model |
| `AI_TEMPERATURE` | `0` | Default temperature |
| `MAX_TOKENS` | `2048` | Default max tokens |
| `LLM_TIMEOUT` | `60000` | Default timeout (ms) |
| `MAX_INPUT_TOKENS` | `100000` | Default max input tokens |
| `AI_TOP_P` | `0.95` | Default top-p value |
| `OLLAMA_API_URL` | `"http://localhost:11434/api/generate"` | Ollama API URL |
| `OPENROUTER_HTTP_REFERER` | `undefined` | Optional attribution header (`HTTP-Referer`) for OpenRouter |
| `OPENROUTER_APP_TITLE` | `undefined` | Optional attribution header (`X-Title`) for OpenRouter |
| `STORE_AI_API_CALLS` | `undefined` | Set to `"true"` to store API calls (OpenAI) |
{
  "id": "chatcmpl-123456",
  "object": "chat.completion",
  "created": 1728933352,
  "model": "gpt-4o-2024-08-06",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text",
      "tool_calls": []
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 10,
    "total_tokens": 29
  }
}

{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [{
    "type": "text",
    "text": "Response text"
  }],
  "model": "claude-3-5-sonnet-20240620",
  "usage": {
    "input_tokens": 19,
    "output_tokens": 10
  }
}
Same format as OpenAI response.
{
  "model": "llama3.1:8b",
  "created_at": "2024-01-01T00:00:00.000Z",
  "response": "Response text",
  "done": true,
  "context": [],
  "total_duration": 1000,
  "load_duration": 500,
  "prompt_eval_count": 10,
  "prompt_eval_duration": 200,
  "eval_count": 20,
  "eval_duration": 300
}
Each service has a default model configured. Use ProviderRegistry.getDefaultModels() to get all default models:
- Anthropic: `claude-3-5-sonnet-20240620`
- OpenAI: `gpt-5-nano`
- Google: `gemini-2.0-flash`
- Ollama: `llama3.1:8b`
Models starting with "o" (e.g., "o1", "o3") or "gpt-5" are treated as reasoning models and use different parameters:
- `max_completion_tokens` instead of `max_tokens`
- `reasoning_effort` parameter (`"low"`, `"medium"`, `"high"`, defaults to `"medium"`)
- No `temperature` or `top_p` parameters
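A small sketch may help show how these rules change the request body. Both function names are hypothetical, and the prefix check applies the rule above literally; the library's actual detection (for example for provider-prefixed IDs such as those used with OpenRouter) may be more refined.

```javascript
// Literal reading of the rule above: names starting with "o" or "gpt-5".
function isReasoningModel(model) {
  return model.startsWith('o') || model.startsWith('gpt-5');
}

// Hypothetical builder for the model-dependent part of a request body.
function buildModelParams(model, { maxTokens, temperature, topP, reasoningEffort = 'medium' }) {
  if (isReasoningModel(model)) {
    // Reasoning models: no temperature/top_p, different token limit name.
    return { max_completion_tokens: maxTokens, reasoning_effort: reasoningEffort };
  }
  return { max_tokens: maxTokens, temperature, top_p: topP };
}

const options = { maxTokens: 1000, temperature: 0.7, topP: 0.9 };
console.log(buildModelParams('o3', options));
// { max_completion_tokens: 1000, reasoning_effort: 'medium' }
console.log(buildModelParams('gpt-4o', options));
// { max_tokens: 1000, temperature: 0.7, top_p: 0.9 }
```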
The library uses a token bucket algorithm with two buckets:
- Request Bucket: Limits requests per minute
- LLM Token Bucket: Limits LLM tokens per minute
Rate limits can be updated dynamically from API response headers:
- `retry-after` header is respected
- Rate limit information from responses updates buckets automatically
- `onRateLimitUpdate` callback is invoked when limits change
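For readers unfamiliar with the two-bucket scheme, the following is a minimal sketch, not the library's implementation. The `TokenBucket` class and `tryAcquire` function are hypothetical, continuous refill over a one-minute window is an assumption, and the sketch returns `false` where a production limiter would more likely wait until capacity is available.

```javascript
// Hypothetical bucket that refills continuously up to a per-minute capacity.
class TokenBucket {
  constructor(capacityPerMinute, now = Date.now()) {
    this.capacity = capacityPerMinute;
    this.tokens = capacityPerMinute;
    this.lastRefill = now;
  }

  refill(now = Date.now()) {
    const elapsedMs = now - this.lastRefill;
    const refillAmount = (elapsedMs / 60000) * this.capacity;
    this.tokens = Math.min(this.capacity, this.tokens + refillAmount);
    this.lastRefill = now;
  }
}

// A request needs one slot in the request bucket and enough LLM tokens.
function tryAcquire(buckets, estimatedTokens, now = Date.now()) {
  buckets.requests.refill(now);
  buckets.llmTokens.refill(now);
  if (buckets.requests.tokens < 1 || buckets.llmTokens.tokens < estimatedTokens) {
    return false; // nothing is consumed when either bucket is short
  }
  buckets.requests.tokens -= 1;
  buckets.llmTokens.tokens -= estimatedTokens;
  return true;
}

// Timestamps are passed explicitly so the behaviour is easy to follow.
const demoBuckets = { requests: new TokenBucket(2, 0), llmTokens: new TokenBucket(1000, 0) };
console.log(tryAcquire(demoBuckets, 400, 0));     // true
console.log(tryAcquire(demoBuckets, 400, 0));     // true
console.log(tryAcquire(demoBuckets, 100, 0));     // false (request bucket is empty)
console.log(tryAcquire(demoBuckets, 400, 30000)); // true (half a minute refilled both buckets)
```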
- Each retry attempt counts as a separate failure
- Circuit opens after configured failure threshold
- Cooldown period prevents immediate retries
- Success resets the failure count
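The behaviour in the list above corresponds to a basic circuit breaker, sketched below. The class, its constructor parameters, and the choice to fully reset the failure count once the cooldown has elapsed are assumptions for illustration; this reference does not give the library's threshold, cooldown length, or half-open behaviour.

```javascript
// Hypothetical circuit breaker; threshold and cooldown are caller-supplied.
class CircuitBreaker {
  constructor(failureThreshold, cooldownMs) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failureCount = 0;
    this.openedAt = null;
  }

  // True while requests should be rejected without calling the provider.
  isOpen(now = Date.now()) {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.cooldownMs) {
      // Assumption: after the cooldown the breaker closes and starts fresh.
      this.openedAt = null;
      this.failureCount = 0;
      return false;
    }
    return true;
  }

  // Each failed attempt (including each retry) counts separately.
  recordFailure(now = Date.now()) {
    this.failureCount += 1;
    if (this.failureCount >= this.failureThreshold) this.openedAt = now;
  }

  recordSuccess() {
    this.failureCount = 0;
    this.openedAt = null;
  }
}

const breaker = new CircuitBreaker(3, 1000);
breaker.recordFailure(0);
breaker.recordFailure(0);
breaker.recordFailure(0);
console.log(breaker.isOpen(500));  // true
console.log(breaker.isOpen(1000)); // false
```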
Provide a cache store object in constructor options:
const cacheStore = {};
const llm = new ResilientLLM({ cacheStore });
Cache keys are SHA-256 hashes of:
- API URL
- Request body (JSON stringified)
- Headers (JSON stringified)
- Only successful responses (status 200) are cached
- Cache is checked before making HTTP requests
- Cache hits return immediately without API call
Use abort() method to cancel all ongoing operations:
const llm = new ResilientLLM({ /* ... */ });
const promise = llm.chat(conversationHistory);
llm.abort(); // Cancels the request
Timeouts are enforced using AbortController:
- Timeout applies to entire operation (including retries)
- On timeout, `AbortController` aborts the HTTP request
- `chat()` rejects with `ResilientLLMError`; the original timeout is typically on `error.cause` (`name` may be `TimeoutError`)
All providers are managed through ProviderRegistry. The implementation uses:
- `ProviderRegistry.get(providerName)` - Get provider configuration
- `ProviderRegistry.getChatApiUrl(providerName)` - Get chat API URL
- `ProviderRegistry.getChatConfig(providerName)` - Get chat configuration
- `ProviderRegistry.buildApiUrl(providerName, url)` - Build API URL with query params if needed
- `ProviderRegistry.buildAuthHeaders(providerName, apiKey, defaultHeaders)` - Build authentication headers
- `ProviderRegistry.hasApiKey(providerName)` - Check if API key is available
See Custom Provider Guide for details on configuring providers.
- System messages are extracted and sent separately
- Tool definitions use `input_schema` instead of `parameters`
- API version header: `anthropic-version: 2023-06-01`
- Uses `x-api-key` header instead of `Authorization`
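The Anthropic-specific points above translate into two small transformations, sketched here. Both function names are hypothetical and this is not the library's code (which routes headers through `ProviderRegistry.buildAuthHeaders`); the `content-type` header is included as an assumption about a typical JSON request.

```javascript
// Hypothetical helper: headers for an Anthropic request.
function buildAnthropicHeaders(apiKey) {
  return {
    'content-type': 'application/json', // assumed for a JSON request body
    'x-api-key': apiKey,                // used instead of an Authorization header
    'anthropic-version': '2023-06-01'
  };
}

// Hypothetical helper: converts a Tool (as defined in this reference)
// into the flat shape that uses input_schema.
function toAnthropicTool(tool) {
  const fn = tool.function;
  return {
    name: fn.name,
    description: fn.description,
    // Accept either schema field; OpenAI-style `parameters` is renamed.
    input_schema: fn.input_schema || fn.parameters
  };
}

const weatherTool = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get the weather for a location',
    parameters: { type: 'object', properties: { location: { type: 'string' } } }
  }
};
console.log(toAnthropicTool(weatherTool).input_schema.type); // object
```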
- Supports function calling with `tools` parameter
- Supports `response_format` for JSON mode
- Uses standard `Authorization: Bearer <token>` header
- Can store API calls if `STORE_AI_API_CALLS=true`
- Uses OpenAI-compatible endpoint `https://openrouter.ai/api/v1/chat/completions`
- Uses `Authorization: Bearer <token>` header
- Works with provider-prefixed model IDs (for example `openai/gpt-5-nano` or `openai/o1`)
- Choosing the `openrouter/free` model selects a free model, but output quality may degrade severely
- Optional attribution headers can be set via `OPENROUTER_HTTP_REFERER` and `OPENROUTER_APP_TITLE`
- Uses OpenAI-compatible endpoint
- Same format as OpenAI for requests/responses
- Requires `GEMINI_API_KEY` environment variable
- Authentication: Uses header authentication (`Authorization: Bearer {key}`) for chat endpoints, query parameter authentication (`?key=...`) for models endpoint
- Defaults to `http://localhost:11434/api/generate`
- Can override with `OLLAMA_API_URL` environment variable
- API key is optional
- Uses a different response format than the OpenAI-compatible providers