Reference

Complete technical reference for the ResilientLLM library API.

ResilientLLM
Types and Interfaces
- OperationMetadata
Error Codes
Environment Variables

ResilientLLM

A unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, Google/Gemini, OpenRouter, Ollama) with built-in resilience features including rate limiting, retries, circuit breakers, and error handling.

ResilientLLM Constructor

Creates a new ResilientLLM instance.

Signature:

new ResilientLLM(options?: ResilientLLMOptions)

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `options` | `ResilientLLMOptions` | No | `{}` | Configuration options for the ResilientLLM instance |

ResilientLLMOptions:

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `aiService` | `string` | No | `process.env.PREFERRED_AI_SERVICE` or `"anthropic"` | AI service provider: `"openai"`, `"anthropic"`, `"google"`, `"openrouter"`, or `"ollama"` |
| `model` | `string` | No | `process.env.PREFERRED_AI_MODEL` or `"claude-3-5-sonnet-20240620"` | Model identifier for the selected AI service |
| `temperature` | `number` | No | `process.env.AI_TEMPERATURE` or `0` | Temperature parameter (0-2) controlling randomness in responses |
| `maxTokens` | `number` | No | `process.env.MAX_TOKENS` or `2048` | Maximum number of tokens in the response |
| `timeout` | `number` | No | `process.env.LLM_TIMEOUT` or `60000` | Request timeout in milliseconds |
| `cacheStore` | `Object` | No | `{}` | Cache store object for storing successful responses |
| `maxInputTokens` | `number` | No | `process.env.MAX_INPUT_TOKENS` or `100000` | Maximum number of input tokens allowed |
| `topP` | `number` | No | `process.env.AI_TOP_P` or `0.95` | Top-p sampling parameter (0-1) |
| `rateLimitConfig` | `RateLimitConfig` | No | `{ requestsPerMinute: 10, llmTokensPerMinute: 150000 }` | Rate limiting configuration |
| `retries` | `number` | No | `3` | Number of retry attempts for failed requests |
| `backoffFactor` | `number` | No | `2` | Exponential backoff multiplier between retries |
| `onRateLimitUpdate` | `Function` | No | `undefined` | Callback function called when rate limit information is updated |
| `onError` | `Function` | No | `undefined` | Currently not used (reserved for future use) |
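
The `retries` and `backoffFactor` options together shape how long the library waits between attempts. The following is a minimal sketch of an exponential schedule, not the library's implementation: the `backoffDelays` helper and the 1000 ms base delay are hypothetical, since this reference does not state the base delay.

```javascript
// Illustrative only: `backoffDelays` and `baseDelayMs` are hypothetical names,
// not part of the ResilientLLM API.
function backoffDelays(retries, backoffFactor, baseDelayMs = 1000) {
  const delays = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    // Each retry waits `backoffFactor` times longer than the previous one.
    delays.push(baseDelayMs * Math.pow(backoffFactor, attempt));
  }
  return delays;
}

// With the documented defaults (retries: 3, backoffFactor: 2):
console.log(backoffDelays(3, 2)); // [ 1000, 2000, 4000 ]
```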

RateLimitConfig:

| Property | Type | Description |
| --- | --- | --- |
| `requestsPerMinute` | `number` | Maximum number of requests allowed per minute |
| `llmTokensPerMinute` | `number` | Maximum number of LLM tokens allowed per minute |

Returns: ResilientLLM instance

Example:

const llm = new ResilientLLM({
  aiService: 'openai',
  model: 'gpt-5-nano',
  maxTokens: 2048,
  temperature: 0.7,
  rateLimitConfig: {
    requestsPerMinute: 60,
    llmTokensPerMinute: 90000
  }
});

ResilientLLM Instance Methods

`chat(conversationHistory, llmOptions?)`

Sends a chat completion request to the configured LLM provider.

Signature:

chat(conversationHistory: Message[], llmOptions?: ChatOptions): Promise<ChatResponse>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `conversationHistory` | `Message[]` | Yes | Array of message objects representing the conversation history |
| `llmOptions` | `ChatOptions` | No | Override options for this specific request |

Message:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| `role` | `string` | Yes | Message role: `"system"`, `"user"`, `"assistant"`, or `"tool"` |
| `content` | `string` | Yes | Message content |

ChatOptions:

| Property | Type | Description |
| --- | --- | --- |
| `aiService` | `string` | Override AI service for this request |
| `model` | `string` | Override model for this request |
| `maxTokens` | `number` | Override max tokens for this request |
| `temperature` | `number` | Override temperature for this request |
| `topP` | `number` | Override top-p for this request |
| `maxInputTokens` | `number` | Override max input tokens for this request |
| `maxCompletionTokens` | `number` | Maximum completion tokens (for reasoning models) |
| `reasoningEffort` | `string` | Reasoning effort level: `"low"`, `"medium"`, or `"high"` (for reasoning models) |
| `apiKey` | `string` | Override API key for this request (takes precedence over ProviderRegistry) |
| `tools` | `Tool[]` | Array of tool definitions for function calling |
| `responseFormat` | `Object \| string` | Response format specification (`json_object`/`json_schema` object shapes, plain schema-like object, or JSON aliases: `"json"`, `"object"`, `"json_object"`) |
| `outputConfig` | `Object` | Legacy/migration support. Anthropic-style alternative structured-output input shape, normalized internally via `responseFormat`. Prefer `responseFormat` for all new usage. |
| `response_format` | `Object \| string` | Legacy/migration support. Snake_case alias for `responseFormat`; passthrough-friendly for provider-native payloads. Prefer `responseFormat` for all new usage. |
| `output_config` | `Object` | Legacy/migration support. Snake_case alias for `outputConfig`; passed through as-is when provided. Prefer `responseFormat` for all new usage. |

Use one naming style per field to avoid ambiguity:

Prefer camelCase (responseFormat or its alias outputConfig) in app code.
Prefer snake_case (response_format, output_config) when reusing raw provider payload snippets.
Do not send both aliases for the same field in one request; conflicting values may result in an error.

Tool:

| Property | Type | Description |
| --- | --- | --- |
| `type` | `string` | Tool type, typically `"function"` |
| `function` | `Object` | Function definition |
| `function.name` | `string` | Function name |
| `function.description` | `string` | Function description |
| `function.parameters` | `Object` | Function parameters schema (OpenAI format) |
| `function.input_schema` | `Object` | Function input schema (Anthropic format) |

Returns: Promise<ChatResponse>

Always returns a predictable envelope:
- response.content is the assistant output (string in text mode, parsed object in JSON/schema mode)
- response.toolCalls is included when tool calls are returned
- response.metadata is always included

ChatResponse:

| Property | Type | Description |
| --- | --- | --- |
| `content` | `string \| Object \| null` | The assistant content (text by default, normalized JSON object in JSON modes) |
| `toolCalls` | `Array` | Array of tool call objects (if tools were used) |
| `metadata` | `OperationMetadata` | Always included (request id, config, timing, retries, rate limiting, usage, etc.) |

Throws:

ResilientLLMError — Normalized failures from chat() (after internal retries when applicable). Use error.code (ResilientLLMErrorCode), error.retryable, error.metadata, and error.cause (log server-side). The canonical code list is in lib/ResilientLLMError.ts.
Structured output failures use codes such as JSON_PARSE_ERROR, JSON_MODE_FAILURE, SCHEMA_MISMATCH, or VALIDATION_ERROR; details may appear on error.cause.

Notes:

API keys can be provided via llmOptions.apiKey, ProviderRegistry.configure(), or environment variables
The implementation uses ProviderRegistry to manage providers and their configurations
Response parsing is handled generically using provider-specific chatConfig settings
For schema mode, validation checks top-level required fields and primitive types (string, number, boolean, integer). Schema mismatch errors include a validation object with missingFields, extraFields, and typeMismatches arrays

Example:

const conversationHistory = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is the capital of France?' }
];

const { content } = await llm.chat(conversationHistory);
console.log(content); // "The capital of France is Paris."

Example with tools:

const response = await llm.chat(conversationHistory, {
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' }
        }
      }
    }
  }]
});
// response: { content: null, toolCalls: [...], metadata: {...} }

Example with API key override:

// Override API key for this specific request
const response = await llm.chat(conversationHistory, {
  apiKey: 'sk-custom-key-here',
  aiService: 'openai',
  model: 'gpt-5-nano'
});

Example with operation metadata:

const llm = new ResilientLLM({
  aiService: 'openai',
  model: 'gpt-5-nano',
});

const { content, metadata } = await llm.chat(conversationHistory);
console.log(content);           // Assistant reply text
console.log(metadata?.requestId);
console.log(metadata?.timing?.totalTimeMs);
console.log(metadata?.usage);    // prompt_tokens, completion_tokens, total_tokens

`abort()`

Cancels all ongoing LLM operations for this instance.

Signature:

abort(): void

Returns: void

Description:

Aborts all active HTTP requests initiated by this ResilientLLM instance
Clears all resilient operation instances
Resets the internal abort controller
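
The three steps above can be pictured with a shared `AbortController`. This is a sketch under assumptions: the `AbortableOperations` class and its fields are hypothetical stand-ins, not the library's internals.

```javascript
// Hypothetical stand-in for the instance-level cancellation state.
class AbortableOperations {
  constructor() {
    this.controller = new AbortController();
    this.operations = new Set();
  }

  // Register an operation and hand it the shared abort signal.
  start(name) {
    this.operations.add(name);
    return this.controller.signal;
  }

  abort() {
    this.controller.abort();                 // 1. abort every request using the shared signal
    this.operations.clear();                 // 2. drop the tracked operations
    this.controller = new AbortController(); // 3. fresh controller for later calls
  }
}

const ops = new AbortableOperations();
const signal = ops.start('chat-1');
ops.abort();
console.log(signal.aborted);                // true
console.log(ops.controller.signal.aborted); // false
```

Resetting the controller matters because an `AbortController` cannot be reused once aborted; without the reset, every later request would start out cancelled.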

Example:

const promise = llm.chat(conversationHistory);
llm.abort(); // Cancels the ongoing request

Note: For API URLs and key checks, import ProviderRegistry: use ProviderRegistry.getChatApiUrl(providerName) and ProviderRegistry.buildApiUrl(providerName, baseUrl, null) for URLs; use ProviderRegistry.hasApiKey(providerName) to check if a key is present (keys are not exposed). See Custom Provider Guide for details.

`formatMessageForAnthropic(messages)`

Converts a messages array to the format required by Anthropic's API.

Signature:

formatMessageForAnthropic(messages: Message[]): { system?: string, messages: Message[] }

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `messages` | `Message[]` | Yes | Array of message objects |

Returns: Object with properties:

system - string | undefined - System message content if present
messages - Message[] - Messages array without system messages

Description:

Extracts system messages from the messages array
Returns system content separately and remaining messages without system role

Example:

const messages = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello!' }
];

// Rename on destructuring: `messages` is already declared above.
const { system, messages: anthropicMessages } = llm.formatMessageForAnthropic(messages);
// system: "You are helpful."
// anthropicMessages: [{ role: 'user', content: 'Hello!' }]
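
For intuition, the split described above can be sketched as a standalone function. `splitSystemMessages` is a hypothetical name, and joining several system messages with a newline is an assumption; this reference does not say how multiple system messages are combined.

```javascript
// Sketch only: not the library's implementation.
function splitSystemMessages(messages) {
  const systemParts = messages
    .filter((m) => m.role === 'system')
    .map((m) => m.content);

  return {
    // Assumption: multiple system messages are joined with a newline.
    system: systemParts.length > 0 ? systemParts.join('\n') : undefined,
    // Everything that is not a system message keeps its original order.
    messages: messages.filter((m) => m.role !== 'system')
  };
}

const split = splitSystemMessages([
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello!' }
]);
console.log(split.system);          // You are helpful.
console.log(split.messages.length); // 1
```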

`parseError(statusCode, error, operationMetadata?)`

Normalizes an error into ResilientLLMError. Used internally when chat() fails; you can call it directly if you need the same mapping (e.g. tests).

Signature:

parseError(statusCode: number | null, error: Error, operationMetadata?: OperationMetadata | null): never

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `statusCode` | `number \| null` | Yes | Provider HTTP status when known, or `null` |
| `error` | `Error` | Yes | Underlying error |
| `operationMetadata` | `OperationMetadata \| null` | No | Merged onto the thrown error’s `metadata` |

Status Code Mappings:

| Status Code | Error Message |
| --- | --- |
| `400` | "Bad request" |
| `401` | "Invalid API Key" |
| `403` | "You are not authorized to access this resource" |
| `404` | "Not found" |
| `429` | "Rate limit exceeded" |
| `500` | "Internal server error" |
| `503` | "Service unavailable" |
| `529` | "API temporarily overloaded" |
| Other | "Unknown error" |

Note: This method is called internally by the chat() method when errors occur. You typically don't need to call it directly.
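
The status code mappings listed for this method amount to a lookup with a fallback. The sketch below covers only that lookup; `messageForStatus` is a hypothetical helper, whereas the real `parseError()` throws a ResilientLLMError rather than returning a string.

```javascript
// Messages copied from the status code mapping table for parseError().
const STATUS_MESSAGES = {
  400: 'Bad request',
  401: 'Invalid API Key',
  403: 'You are not authorized to access this resource',
  404: 'Not found',
  429: 'Rate limit exceeded',
  500: 'Internal server error',
  503: 'Service unavailable',
  529: 'API temporarily overloaded'
};

// Hypothetical helper: any unmapped status (including null) falls back.
function messageForStatus(statusCode) {
  return STATUS_MESSAGES[statusCode] ?? 'Unknown error';
}

console.log(messageForStatus(429));  // Rate limit exceeded
console.log(messageForStatus(418));  // Unknown error
console.log(messageForStatus(null)); // Unknown error
```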

`parseChatCompletion(data, chatConfig, tools?)`

Generic method to parse chat completion response using provider configuration. This is the preferred method used internally.

Signature:

parseChatCompletion(data: Object, chatConfig: Object, tools?: Tool[]): string | ChatResponse

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `data` | `Object` | Yes | API response object |
| `chatConfig` | `Object` | Yes | Chat configuration from provider (contains `responseParsePath`) |
| `tools` | `Tool[]` | No | Tools array if function calling was used |

Returns: string | ChatResponse

If tools provided and tool calls found: Returns ChatResponse with content and toolCalls
Otherwise: Returns string content

chatConfig.responseParsePath:

Path to extract content from response (e.g., 'choices[0].message.content', 'content[0].text', 'response')
Supports dot notation and bracket notation for nested values

Example:

const chatConfig = {
  responseParsePath: 'choices[0].message.content',
  toolSchemaType: 'openai'
};
const data = {
  choices: [{
    message: {
      content: "Hello!",
      tool_calls: []
    }
  }]
};
const content = llm.parseChatCompletion(data, chatConfig);
// "Hello!"
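
To illustrate how a `responseParsePath` with dot and bracket notation can be resolved, here is a small path walker. `getByPath` is a hypothetical helper written for this illustration; the library's own resolver may differ.

```javascript
// Hypothetical helper: resolves paths such as 'choices[0].message.content'.
function getByPath(obj, path) {
  // 'choices[0].message.content' -> ['choices', '0', 'message', 'content']
  const keys = path.replace(/\[(\d+)\]/g, '.$1').split('.').filter(Boolean);
  let current = obj;
  for (const key of keys) {
    if (current == null) return undefined; // stop early on a missing branch
    current = current[key];
  }
  return current;
}

const openaiLike = { choices: [{ message: { content: 'Hello!' } }] };
const anthropicLike = { content: [{ type: 'text', text: 'Hi there' }] };
const ollamaLike = { response: 'Hey' };

console.log(getByPath(openaiLike, 'choices[0].message.content')); // Hello!
console.log(getByPath(anthropicLike, 'content[0].text'));         // Hi there
console.log(getByPath(ollamaLike, 'response'));                   // Hey
```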

`parseOpenAIChatCompletion(data, tools?)` (Deprecated)

Parses OpenAI chat completion response.

Signature:

parseOpenAIChatCompletion(data: Object, tools?: Tool[]): string | ChatResponse

Status: ⚠️ Deprecated - Use parseChatCompletion() with chatConfig instead.

`parseAnthropicChatCompletion(data, tools?)` (Deprecated)

Parses Anthropic chat completion response.

Signature:

parseAnthropicChatCompletion(data: Object, tools?: Tool[]): string

Status: ⚠️ Deprecated - Use parseChatCompletion() with chatConfig instead.

`parseOllamaChatCompletion(data, tools?)` (Deprecated)

Parses Ollama chat completion response.

Signature:

parseOllamaChatCompletion(data: Object, tools?: Tool[]): string

Status: ⚠️ Deprecated - Use parseChatCompletion() with chatConfig instead.

`parseGoogleChatCompletion(data, tools?)` (Deprecated)

Parses Google chat completion response (OpenAI-compatible endpoint).

Signature:

parseGoogleChatCompletion(data: Object, tools?: Tool[]): string

Status: ⚠️ Deprecated - Use parseChatCompletion() with chatConfig instead.

`retryChatWithAlternateService(conversationHistory, llmOptions?)`

Retries the chat request with an alternate AI service when the current service returns rate limit errors (429, 529).

Signature:

retryChatWithAlternateService(conversationHistory: Message[], llmOptions?: ChatOptions): Promise<ChatResponse>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `conversationHistory` | `Message[]` | Yes | Array of message objects |
| `llmOptions` | `ChatOptions` | No | LLM options for the request |

Returns: Promise<ChatResponse> - Response from the alternate service

Throws:

Error - If no alternative service is available

Description:

Automatically switches to the next available service from ProviderRegistry.getDefaultModels()
Skips services that have already failed
Uses default model for each service

Example:

// Automatically called internally when rate limit errors occur
// Can also be called manually if needed
const response = await llm.retryChatWithAlternateService(conversationHistory);
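
The selection logic described above (next available service, skipping failed ones, using each service's default model) might look roughly like this. Both the `nextService` helper and the assumed shape of the default-models map (service name to model name) are illustrative assumptions, not confirmed by this reference.

```javascript
// Hypothetical helper. Assumes the default-models map looks like
// { anthropic: 'claude-3-5-sonnet-20240620', openai: 'gpt-5-nano', ... }.
function nextService(defaultModels, failedServices) {
  for (const [aiService, model] of Object.entries(defaultModels)) {
    // Skip any service that has already failed for this request.
    if (!failedServices.includes(aiService)) {
      return { aiService, model };
    }
  }
  throw new Error('No alternative service available');
}

const defaults = {
  anthropic: 'claude-3-5-sonnet-20240620',
  openai: 'gpt-5-nano',
  google: 'gemini-2.0-flash'
};
console.log(nextService(defaults, ['anthropic']));
// { aiService: 'openai', model: 'gpt-5-nano' }
```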

ResilientLLM Static Methods

`estimateTokens(text)`

Estimates the number of tokens in a given text string.

Signature:

static estimateTokens(text: string): number

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `text` | `string` | Yes | Text to estimate tokens for |

Returns: number - Estimated token count

Description:

For texts longer than 10,000 characters: Uses approximation (~4 characters per token)
For shorter texts: Uses accurate tokenization with Tiktoken encoder (o200k_base encoding)
Uses lazy initialization of the encoder

Example:

const tokenCount = ResilientLLM.estimateTokens("Hello, world!");
// Returns estimated token count
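
The two branches described above can be sketched with an injectable encoder. Everything here is illustrative: `estimateTokensSketch` is a hypothetical name, the whitespace splitter only stands in for the Tiktoken encoder, and rounding up with `Math.ceil` is an assumption.

```javascript
// Sketch of the length-based branching; not the library's implementation.
function estimateTokensSketch(text, encode) {
  if (text.length > 10000) {
    // Long text: cheap approximation of ~4 characters per token.
    return Math.ceil(text.length / 4);
  }
  // Short text: count the tokens produced by a real tokenizer.
  return encode(text).length;
}

// Stand-in encoder for the demo; the library uses Tiktoken (o200k_base).
const fakeEncode = (t) => t.split(/\s+/).filter(Boolean);

console.log(estimateTokensSketch('a'.repeat(12000), fakeEncode)); // 3000
console.log(estimateTokensSketch('Hello, world!', fakeEncode));   // 2 (with the stand-in encoder)
```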

Types and Interfaces

Message

Represents a single message in a conversation.

interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

ChatResponse

Response envelope returned by chat() on every call.

content is the assistant output:
- text mode -> string
- JSON/schema mode -> parsed JS object
toolCalls is present when tool calls were returned
metadata is always included

interface ChatResponse {
  content: string | Object | null;
  toolCalls?: Array<any>;
  metadata: OperationMetadata;
}

OperationMetadata

Operation metadata attached to ChatResponse.metadata on every call. Used for observability, logging, and debugging.

interface OperationMetadata {
  requestId: string;
  operationId: string;
  startTime: number;
  finishReason?: string | null;
  config: {
    aiService: string;
    model: string;
    temperature: number | null;
    maxTokens: number | null;
    topP: number | null;
    maxInputTokens: number;
    estimatedInputTokens: number;
    enableCache: boolean;
    // ... resilience config (retries, rateLimitConfig, etc.)
  };
  events: Array<any>;
  timing: {
    totalTimeMs: number | null;
    rateLimitWaitMs: number;
    httpRequestMs: number | null;
  };
  retries: Array<any>;
  rateLimiting: { requestedTokens: number; totalWaitMs: number; [key: string]: any };
  circuitBreaker: Object;
  http: {
    url: string;
    method: string;
    statusCode: number | null;
    headers: Record<string, string>;
    durationMs?: number;
    error?: string;
  };
  cache: { enabled: boolean; [key: string]: any };
  service: { attempted: string[]; final: string };
  usage?: {
    prompt_tokens: number | null;
    completion_tokens: number | null;
    total_tokens: number | null;
  };
}

RateLimitConfig

Configuration for rate limiting.

interface RateLimitConfig {
  requestsPerMinute: number;
  llmTokensPerMinute: number;
}

ResilientLLMOptions

Constructor options for ResilientLLM.

interface ResilientLLMOptions {
  aiService?: string;
  model?: string;
  temperature?: number;
  maxTokens?: number;
  timeout?: number;
  cacheStore?: Object;
  maxInputTokens?: number;
  topP?: number;
  rateLimitConfig?: RateLimitConfig;
  retries?: number;
  backoffFactor?: number;
  onRateLimitUpdate?: (info: RateLimitInfo) => void;
  onError?: (error: Error) => void;
}

ChatOptions

Options for individual chat requests.

interface ChatOptions {
  aiService?: string;
  model?: string;
  maxTokens?: number;
  temperature?: number;
  topP?: number;
  maxInputTokens?: number;
  maxCompletionTokens?: number;
  reasoningEffort?: 'low' | 'medium' | 'high';
  apiKey?: string;
  tools?: Tool[];
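  responseFormat?: Object | string;
  outputConfig?: Object;
  response_format?: Object | string; // legacy snake_case alias for responseFormat
  output_config?: Object;            // legacy snake_case alias for outputConfig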
}

`responseFormat` (JSON mode + schema mode)

Use responseFormat when you need the assistant response as JSON, optionally matching a particular schema.

JSON mode (no schema): ensures the reply is a single JSON object (library parses it for you).
Schema mode: provides a JSON Schema so the library can validate the parsed object and throw SCHEMA_MISMATCH when required keys/types don’t match.

Supplying a schema

You can supply a schema in any of these equivalent shapes (pick one and stick to it):

OpenAI-style wrapper (recommended when you want to be explicit):

responseFormat: {
  type: 'json_schema',
  json_schema: {
    name: 'my_payload',
    schema: {
      type: 'object',
      properties: {
        answer: { type: 'string' },
        citations: { type: 'array', items: { type: 'string' } }
      },
      required: ['answer']
    }
  }
}

Short wrapper (schema directly on the object):

responseFormat: {
  type: 'json_schema',
  schema: {
    type: 'object',
    properties: { answer: { type: 'string' } },
    required: ['answer']
  }
}

Plain schema-like object (auto-detected as a schema):

responseFormat: {
  type: 'object',
  properties: { answer: { type: 'string' } },
  required: ['answer']
}

End-to-end example (schema mode)

const llm = new ResilientLLM({ aiService: 'openai', model: 'gpt-5-nano' });

const result = await llm.chat(
  [{ role: 'user', content: 'Return an answer and citations.' }],
  {
    responseFormat: {
      type: 'json_schema',
      json_schema: {
        name: 'answer_payload',
        schema: {
          type: 'object',
          properties: {
            answer: { type: 'string' },
            citations: { type: 'array', items: { type: 'string' } }
          },
          required: ['answer']
        }
      }
    }
  }
);

// `result.content` is a parsed JS object when `responseFormat` requests JSON/schema mode.

Validation scope (important)

The built-in validator is intentionally lightweight: it checks required keys, extra keys, and primitive types at the top level (string, number, boolean, integer).

Extra keys are enforced only when your schema sets additionalProperties: false (and the schema has properties).
For deeper validation needs (nested objects, enums, regex, oneOf/anyOf, etc.), run your own schema validator after the call.
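
To make the validation scope concrete, here is a sketch of a top-level validator that reports `missingFields`, `extraFields`, and `typeMismatches`. `validateTopLevel` is a hypothetical function, and reporting mismatches as plain key names is an assumption about the entry format.

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Sketch only: checks required keys, extra keys, and primitive types
// at the top level, and nothing deeper.
function validateTopLevel(value, schema) {
  const result = { missingFields: [], extraFields: [], typeMismatches: [] };
  const properties = schema.properties || {};

  for (const key of schema.required || []) {
    if (!has(value, key)) result.missingFields.push(key);
  }

  // Extra keys are only enforced with additionalProperties: false and properties set.
  if (schema.additionalProperties === false && schema.properties) {
    for (const key of Object.keys(value)) {
      if (!has(properties, key)) result.extraFields.push(key);
    }
  }

  const checks = {
    string: (v) => typeof v === 'string',
    number: (v) => typeof v === 'number',
    boolean: (v) => typeof v === 'boolean',
    integer: (v) => Number.isInteger(v)
  };
  for (const [key, spec] of Object.entries(properties)) {
    const check = checks[spec.type]; // non-primitive types (array, object) are skipped
    if (has(value, key) && check && !check(value[key])) {
      result.typeMismatches.push(key);
    }
  }
  return result;
}

const report = validateTopLevel(
  { answer: 42, extra: true },
  {
    type: 'object',
    additionalProperties: false,
    properties: { answer: { type: 'string' }, id: { type: 'integer' } },
    required: ['answer', 'id']
  }
);
console.log(report);
// { missingFields: [ 'id' ], extraFields: [ 'extra' ], typeMismatches: [ 'answer' ] }
```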

Example: additionalProperties: false + required

const result = await llm.chat(messages, {
  responseFormat: {
    type: 'json_schema',
    json_schema: {
      name: 'answer_payload',
      schema: {
        type: 'object',
        additionalProperties: false,
        properties: {
          answer: { type: 'string' }
        },
        required: ['answer']
      }
    }
  }
});

// `result.content` is { answer: string } when the model output matches the schema.
// If the model returns invalid JSON or extra keys, `llm.chat(...)` throws a ResilientLLMError with a structured-output code (e.g. `JSON_PARSE_ERROR` or `SCHEMA_MISMATCH`).

`responseFormat` examples (quick)

// JSON alias strings (equivalent to { type: 'json_object' })
'json'
'object'
'json_object'

// OpenAI-compatible JSON mode
{ type: 'json_object' }

// When `responseFormat` requests JSON, `llm.chat(...)` resolves to a response envelope
// where `.content` is the parsed JS object.

Tool

Tool definition for function calling.

interface Tool {
  type: string;
  function: {
    name: string;
    description: string;
    parameters?: Object;  // OpenAI format
    input_schema?: Object; // Anthropic format
  };
}

Error Codes

Failures from chat() are thrown as ResilientLLMError (see chat() Throws above). That type is the consumer-facing surface: code, retryable, optional metadata (same shape as success), and cause for logging.

Codes are stable strings, defined as ResilientLLMErrorCode in lib/ResilientLLMError.ts (including PROVIDER_*, structured-output codes, resilience-related codes, and configuration/capability codes). retryable is defined there for codes where a simple retry might help.

Use error.code for branching, not raw HTTP status. When a provider HTTP status was available to the library, it may also appear under metadata (e.g. provider.httpStatus / http).
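
A sketch of branching on `error.code` and `error.retryable` follows. The `FakeResilientLLMError` class only mimics the documented fields so the example runs on its own; in real code the error comes from `llm.chat()`. The `decideAction` helper and its return values are hypothetical.

```javascript
// Stand-in with the documented fields (code, retryable, cause).
class FakeResilientLLMError extends Error {
  constructor(code, retryable, cause) {
    super(code);
    this.code = code;
    this.retryable = retryable;
    this.cause = cause;
  }
}

// Hypothetical policy: branch on the stable code, not on raw HTTP status.
function decideAction(error) {
  if (error.retryable) return 'retry-later';
  switch (error.code) {
    case 'JSON_PARSE_ERROR':
    case 'JSON_MODE_FAILURE':
    case 'SCHEMA_MISMATCH':
    case 'VALIDATION_ERROR':
      return 'fix-prompt-or-schema';
    default:
      return 'report';
  }
}

try {
  throw new FakeResilientLLMError('SCHEMA_MISMATCH', false, new Error('missing "answer"'));
} catch (error) {
  console.log(decideAction(error)); // fix-prompt-or-schema
  console.log(error.cause.message); // missing "answer"
}
```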

Environment Variables

API Key Configuration

API keys are required for most LLM providers. They can be provided in three ways (in order of precedence):

Per-request via llmOptions.apiKey (highest priority)
Via ProviderRegistry.configure() with direct apiKey parameter
Via environment variables (lowest priority)
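
The precedence order above can be sketched as a simple fallthrough. `resolveApiKey` and its parameter names are hypothetical, chosen for illustration rather than taken from the library.

```javascript
// Hypothetical helper: the first non-empty source wins.
function resolveApiKey({ requestKey, registryKey, envVarNames = [], env = process.env }) {
  if (requestKey) return requestKey;   // 1. llmOptions.apiKey
  if (registryKey) return registryKey; // 2. ProviderRegistry.configure({ apiKey })
  for (const name of envVarNames) {    // 3. environment variables, in listed order
    if (env[name]) return env[name];
  }
  return undefined;
}

const env = { GEMINI_API_KEY: 'env-key' };
const names = ['GOOGLE_API_KEY', 'GOOGLE_GENERATIVE_AI', 'GEMINI_API_KEY'];

console.log(resolveApiKey({ requestKey: 'req-key', registryKey: 'reg-key', envVarNames: names, env })); // req-key
console.log(resolveApiKey({ registryKey: 'reg-key', envVarNames: names, env }));                        // reg-key
console.log(resolveApiKey({ envVarNames: names, env }));                                                // env-key
```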

For advanced use cases (custom providers, multiple API keys, or programmatic configuration), see the Custom Provider Guide - Authentication Configuration.

Required (Service-Specific)

Set at least one API key for your chosen service:

| Variable | Service | Required |
| --- | --- | --- |
| `OPENAI_API_KEY` | OpenAI | Yes (if using OpenAI) |
| `ANTHROPIC_API_KEY` | Anthropic | Yes (if using Anthropic) |
| `GOOGLE_API_KEY` or `GOOGLE_GENERATIVE_AI` or `GEMINI_API_KEY` | Google | Yes (if using Google) |
| `OPENROUTER_API_KEY` | OpenRouter | Yes (if using OpenRouter) |
| `OLLAMA_API_KEY` | Ollama | No (optional) |

Note: For custom providers, use the environment variable names specified in ProviderRegistry.configure() via envVarNames.

Optional Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `PREFERRED_AI_SERVICE` | `"anthropic"` | Default AI service |
| `PREFERRED_AI_MODEL` | `"claude-3-5-sonnet-20240620"` | Default model |
| `AI_TEMPERATURE` | `0` | Default temperature |
| `MAX_TOKENS` | `2048` | Default max tokens |
| `LLM_TIMEOUT` | `60000` | Default timeout (ms) |
| `MAX_INPUT_TOKENS` | `100000` | Default max input tokens |
| `AI_TOP_P` | `0.95` | Default top-p value |
| `OLLAMA_API_URL` | `"http://localhost:11434/api/generate"` | Ollama API URL |
| `OPENROUTER_HTTP_REFERER` | `undefined` | Optional attribution header (`HTTP-Referer`) for OpenRouter |
| `OPENROUTER_APP_TITLE` | `undefined` | Optional attribution header (`X-Title`) for OpenRouter |
| `STORE_AI_API_CALLS` | `undefined` | Set to `"true"` to store API calls (OpenAI) |
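
Environment variables are always strings, so numeric settings need converting before use. The sketch below shows one way the documented defaults could be resolved; `numberFromEnv` and `resolveDefaults` are hypothetical helpers, and falling back to the default on an unparsable value is an assumption rather than documented behavior.

```javascript
// Hypothetical helper: parse a numeric env var, or use the fallback.
function numberFromEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  const parsed = Number(raw);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Defaults copied from the Optional Configuration table.
function resolveDefaults(env) {
  return {
    aiService: env.PREFERRED_AI_SERVICE || 'anthropic',
    model: env.PREFERRED_AI_MODEL || 'claude-3-5-sonnet-20240620',
    temperature: numberFromEnv(env, 'AI_TEMPERATURE', 0),
    maxTokens: numberFromEnv(env, 'MAX_TOKENS', 2048),
    timeout: numberFromEnv(env, 'LLM_TIMEOUT', 60000),
    maxInputTokens: numberFromEnv(env, 'MAX_INPUT_TOKENS', 100000),
    topP: numberFromEnv(env, 'AI_TOP_P', 0.95)
  };
}

const resolved = resolveDefaults({ AI_TEMPERATURE: '0.7', MAX_TOKENS: 'abc' });
console.log(resolved.temperature); // 0.7
console.log(resolved.maxTokens);   // 2048
console.log(resolved.aiService);   // anthropic
```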

API Response Formats

OpenAI Response

{
  "id": "chatcmpl-123456",
  "object": "chat.completion",
  "created": 1728933352,
  "model": "gpt-4o-2024-08-06",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text",
      "tool_calls": []
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 10,
    "total_tokens": 29
  }
}

Anthropic Response

{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [{
    "type": "text",
    "text": "Response text"
  }],
  "model": "claude-3-5-sonnet-20240620",
  "usage": {
    "input_tokens": 19,
    "output_tokens": 10
  }
}

Gemini Response (OpenAI-Compatible)

Same format as OpenAI response.

Ollama Response

{
  "model": "llama3.1:8b",
  "created_at": "2024-01-01T00:00:00.000Z",
  "response": "Response text",
  "done": true,
  "context": [],
  "total_duration": 1000,
  "load_duration": 500,
  "prompt_eval_count": 10,
  "prompt_eval_duration": 200,
  "eval_count": 20,
  "eval_duration": 300
}

Supported Models

Default Models

Each service has a default model configured. Use ProviderRegistry.getDefaultModels() to get all default models:

Anthropic: claude-3-5-sonnet-20240620
OpenAI: gpt-5-nano
Google: gemini-2.0-flash
Ollama: llama3.1:8b

Reasoning Models

Models starting with "o" (e.g., "o1", "o3") or "gpt-5" are treated as reasoning models and use different parameters:

max_completion_tokens instead of max_tokens
reasoning_effort parameter ("low", "medium", "high", defaults to "medium")
No temperature or top_p parameters
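
The parameter switch described above can be sketched as follows. `isReasoningModel` and `buildSamplingParams` are hypothetical helpers; in particular, mapping `maxTokens` onto `max_completion_tokens` is an assumption, since this reference also documents a separate `maxCompletionTokens` option.

```javascript
// Prefix rule taken from the text above: "o..." or "gpt-5...".
function isReasoningModel(model) {
  return model.startsWith('o') || model.startsWith('gpt-5');
}

// Hypothetical helper that builds only the model-dependent request fields.
function buildSamplingParams(model, { maxTokens, temperature, topP, reasoningEffort = 'medium' }) {
  if (isReasoningModel(model)) {
    // Reasoning models: no temperature or top_p.
    return { max_completion_tokens: maxTokens, reasoning_effort: reasoningEffort };
  }
  return { max_tokens: maxTokens, temperature, top_p: topP };
}

const options = { maxTokens: 2048, temperature: 0, topP: 0.95 };
console.log(buildSamplingParams('gpt-5-nano', options));
// { max_completion_tokens: 2048, reasoning_effort: 'medium' }
console.log(buildSamplingParams('claude-3-5-sonnet-20240620', options));
// { max_tokens: 2048, temperature: 0, top_p: 0.95 }
```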

Rate Limiting Behavior

Token Bucket Algorithm

The library uses a token bucket algorithm with two buckets:

Request Bucket: Limits requests per minute
LLM Token Bucket: Limits LLM tokens per minute
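
As a rough illustration of the two-bucket idea, the sketch below refills each bucket in proportion to elapsed time and admits a request only when both buckets have capacity. The `Bucket` class and `tryAcquire` function are hypothetical and pass timestamps explicitly to keep the example deterministic; the library's limiter may wait instead of rejecting.

```javascript
// Minimal token bucket that refills continuously up to its per-minute capacity.
class Bucket {
  constructor(capacityPerMinute, now = 0) {
    this.capacity = capacityPerMinute;
    this.tokens = capacityPerMinute;
    this.lastRefill = now;
  }

  refill(now) {
    const elapsedMs = now - this.lastRefill;
    const refilled = (elapsedMs / 60000) * this.capacity;
    this.tokens = Math.min(this.capacity, this.tokens + refilled);
    this.lastRefill = now;
  }
}

// A request needs 1 unit from the request bucket and `llmTokens` from the token bucket.
function tryAcquire(requestBucket, tokenBucket, llmTokens, now) {
  requestBucket.refill(now);
  tokenBucket.refill(now);
  if (requestBucket.tokens < 1 || tokenBucket.tokens < llmTokens) return false;
  requestBucket.tokens -= 1;
  tokenBucket.tokens -= llmTokens;
  return true;
}

const requests = new Bucket(2);  // 2 requests per minute
const tokens = new Bucket(1000); // 1000 LLM tokens per minute

console.log(tryAcquire(requests, tokens, 400, 0));     // true
console.log(tryAcquire(requests, tokens, 400, 0));     // true
console.log(tryAcquire(requests, tokens, 100, 0));     // false (request bucket is empty)
console.log(tryAcquire(requests, tokens, 100, 30000)); // true (half a minute refills one request)
```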

Dynamic Updates

Rate limits can be updated dynamically from API response headers:

retry-after header is respected
Rate limit information from responses updates buckets automatically
onRateLimitUpdate callback is invoked when limits change

Circuit Breaker Integration

Each retry attempt counts as a separate failure
Circuit opens after configured failure threshold
Cooldown period prevents immediate retries
Success resets the failure count
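
The four behaviors above can be sketched as a small state machine. `CircuitBreakerSketch`, `failureThreshold`, and `cooldownMs` are hypothetical names, and clearing the failure count once the cooldown elapses is an assumption; the actual threshold and cooldown values are not stated in this reference.

```javascript
// Sketch only: timestamps are passed in to keep the example deterministic.
class CircuitBreakerSketch {
  constructor(failureThreshold, cooldownMs) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  canRequest(now) {
    if (this.openedAt === null) return true;
    if (now - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: close the circuit and allow requests again.
      this.openedAt = null;
      this.failures = 0;
      return true;
    }
    return false;
  }

  // Called once per failed attempt, so each retry counts separately.
  recordFailure(now) {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) this.openedAt = now;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }
}

const breaker = new CircuitBreakerSketch(3, 5000);
breaker.recordFailure(0);
breaker.recordFailure(10);
breaker.recordFailure(20);             // third failure opens the circuit
console.log(breaker.canRequest(1000)); // false (still cooling down)
console.log(breaker.canRequest(5020)); // true (cooldown elapsed)
```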

Caching

Cache Store

Provide a cache store object in constructor options:

const cacheStore = {};
const llm = new ResilientLLM({ cacheStore });

Cache Key Generation

Cache keys are SHA-256 hashes of:

API URL
Request body (JSON stringified)
Headers (JSON stringified)
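
A sketch of how such a key could be derived with Node's built-in `crypto` module follows. The `cacheKey` helper, the order in which the three parts are hashed, and the hex encoding are assumptions; only the inputs and the SHA-256 algorithm come from the description above.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: hash the URL, then the stringified body and headers.
function cacheKey(url, body, headers) {
  return createHash('sha256')
    .update(url)
    .update(JSON.stringify(body))
    .update(JSON.stringify(headers))
    .digest('hex');
}

const cacheStore = {};
const key = cacheKey(
  'https://api.example.com/v1/chat',
  { model: 'example-model', messages: [{ role: 'user', content: 'Hi' }] },
  { 'content-type': 'application/json' }
);

cacheStore[key] = { status: 200, body: 'cached response' };
console.log(key.length);        // 64
console.log(key in cacheStore); // true
```

Because `JSON.stringify` is sensitive to property order, two logically identical requests built with different key orders would produce different cache keys under this sketch.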

Cache Behavior

Only successful responses (status 200) are cached
Cache is checked before making HTTP requests
Cache hits return immediately without API call

AbortController Support

Cancellation

Use the abort() method to cancel all ongoing operations:

const llm = new ResilientLLM({ /* ... */ });
const promise = llm.chat(conversationHistory);
llm.abort(); // Cancels the request

Timeout

Timeouts are enforced using AbortController:

Timeout applies to entire operation (including retries)
On timeout, AbortController aborts the HTTP request
chat() rejects with ResilientLLMError; the original timeout is typically on error.cause (name may be TimeoutError)

Service-Specific Notes

Provider Management

All providers are managed through ProviderRegistry. The implementation uses:

ProviderRegistry.get(providerName) - Get provider configuration
ProviderRegistry.getChatApiUrl(providerName) - Get chat API URL
ProviderRegistry.getChatConfig(providerName) - Get chat configuration
ProviderRegistry.buildApiUrl(providerName, url) - Build API URL with query params if needed
ProviderRegistry.buildAuthHeaders(providerName, apiKey, defaultHeaders) - Build authentication headers
ProviderRegistry.hasApiKey(providerName) - Check if API key is available

See Custom Provider Guide for details on configuring providers.

Anthropic

System messages are extracted and sent separately
Tool definitions use input_schema instead of parameters
API version header: anthropic-version: 2023-06-01
Uses x-api-key header instead of Authorization
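
The tool-schema and header differences above can be sketched as two small helpers. `toAnthropicTool` and `anthropicHeaders` are hypothetical names; the output shape follows Anthropic's public Messages API tool format (`name`, `description`, `input_schema`), and how the library performs this conversion internally is not documented here.

```javascript
// Hypothetical helper: convert an OpenAI-style tool definition to Anthropic's shape.
function toAnthropicTool(tool) {
  const fn = tool.function;
  return {
    name: fn.name,
    description: fn.description,
    // Prefer an explicit input_schema; otherwise reuse the OpenAI-style parameters.
    input_schema: fn.input_schema ?? fn.parameters
  };
}

// Hypothetical helper: Anthropic uses x-api-key instead of Authorization.
function anthropicHeaders(apiKey) {
  return {
    'x-api-key': apiKey,
    'anthropic-version': '2023-06-01',
    'content-type': 'application/json'
  };
}

const converted = toAnthropicTool({
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get the weather for a location',
    parameters: { type: 'object', properties: { location: { type: 'string' } } }
  }
});
console.log(converted.name);                   // get_weather
console.log(Object.keys(converted));           // [ 'name', 'description', 'input_schema' ]
console.log('authorization' in anthropicHeaders('key')); // false
```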

OpenAI

Supports function calling with tools parameter
Supports response_format for JSON mode
Uses standard Authorization: Bearer <token> header
Can store API calls if STORE_AI_API_CALLS=true

OpenRouter

Uses OpenAI-compatible endpoint https://openrouter.ai/api/v1/chat/completions
Uses Authorization: Bearer <token> header
Works with provider-prefixed model IDs (for example openai/gpt-5-nano or openai/o1)
Choosing the openrouter/free model selects a free model, but output quality may degrade severely
Optional attribution headers can be set via OPENROUTER_HTTP_REFERER and OPENROUTER_APP_TITLE

Google

Uses OpenAI-compatible endpoint
Same format as OpenAI for requests/responses
Requires one of the GOOGLE_API_KEY, GOOGLE_GENERATIVE_AI, or GEMINI_API_KEY environment variables
Authentication: Uses header authentication (Authorization: Bearer {key}) for chat endpoints, query parameter authentication (?key=...) for models endpoint

Ollama

Defaults to http://localhost:11434/api/generate
Can override with OLLAMA_API_URL environment variable
API key is optional
Uses different response format
