Complete reference for the Nataris REST API.
- Base URL: https://api.nataris.ai/v1
- Authentication: Bearer token (API key)
- Content-Type: application/json (unless otherwise noted)
Include your API key in the Authorization header:
Authorization: Bearer nat_live_xxxxxxxxxxxxxxxxxxxxxxxx
API keys are prefixed with:
- nat_live_: Production keys
- nat_test_: Test/sandbox keys
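As a minimal sketch of the header and prefix rules above, the helper below builds the request headers and rejects keys that carry neither documented prefix. The function names `auth_headers` and `is_test_key` are hypothetical and not part of any Nataris SDK.

```python
# Documented key prefixes: production and test/sandbox.
VALID_PREFIXES = ("nat_live_", "nat_test_")


def auth_headers(api_key):
    """Build request headers; reject keys without a documented prefix."""
    if not api_key.startswith(VALID_PREFIXES):
        raise ValueError("API key must start with nat_live_ or nat_test_")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def is_test_key(api_key):
    """True when the key is a test/sandbox key."""
    return api_key.startswith("nat_test_")
```

Checking the prefix on the client side is only a convenience for catching a pasted-in wrong key early; the server remains the authority on whether a key is valid.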
OpenAI-compatible endpoint for conversational AI. Recommended for multi-turn conversations.
Important: Nataris is stateless. Each request may hit a different provider device. You must send the full conversation history (messages array) with each request.
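Because the API keeps no state between requests, the client has to carry the history itself. The sketch below shows one way to do that; the `ChatHistory` class is a hypothetical helper, and it assumes the response shape shown in the example response (`choices[0].message`).

```python
ALLOWED_ROLES = ("system", "user", "assistant")


class ChatHistory:
    """Client-side store for the messages array sent with every request."""

    def __init__(self, model, system_prompt=None):
        self.model = model
        self.messages = []
        if system_prompt:
            self.add("system", system_prompt)

    def add(self, role, content):
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {role}")
        self.messages.append({"role": role, "content": content})

    def payload(self, user_text, **options):
        """Append the new user turn and return the full request body."""
        self.add("user", user_text)
        return {"model": self.model, "messages": list(self.messages), **options}

    def record_reply(self, response):
        """Store the assistant reply so the next request includes it."""
        reply = response["choices"][0]["message"]
        self.add(reply["role"], reply["content"])
        return reply["content"]
```

Each call to `payload` returns the whole conversation so far, so it does not matter which provider device serves the next request.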
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier (e.g., "llama-3.2-1b-instruct-q4_k_m") |
| messages | array | Yes | Conversation history (see below) |
| max_tokens | integer | No | Maximum tokens to generate (default: 256) |
| temperature | float | No | Randomness 0-2 (default: 0.7) |
| top_p | float | No | Nucleus sampling 0-1 (default: 0.9) |
| stream | boolean | No | Enable SSE streaming (default: false) |
| conversation_id | string | No | Optional ID for your analytics |
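The ranges and defaults in the table can be checked on the client before a request is sent. The following is a sketch only: `build_chat_request` is a hypothetical name, and the requirement that `max_tokens` be a positive integer is an assumption, since the table states only its default.

```python
def build_chat_request(model, messages, max_tokens=256, temperature=0.7,
                       top_p=0.9, stream=False, conversation_id=None):
    """Validate the documented fields and return a request body dict."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required and must not be empty")
    # Assumption: max_tokens must be a positive integer.
    if not isinstance(max_tokens, int) or max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    # Optional field: only sent when the caller supplies it.
    if conversation_id is not None:
        body["conversation_id"] = conversation_id
    return body
```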
Message Format:
{
"role": "system" | "user" | "assistant",
"content": "Message text"
}
Example Request:
curl -X POST https://api.nataris.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.2-1b-instruct-q4_k_m",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is Python?"},
{"role": "assistant", "content": "Python is a programming language..."},
{"role": "user", "content": "Show me an example"}
],
"max_tokens": 256
}'
Example Response:
{
"id": "chatcmpl-abc123xyz",
"object": "chat.completion",
"created": 1706000000,
"model": "llama-3.2-1b-instruct-q4_k_m",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Here's a simple Python example:\n\n```python\nprint('Hello, World!')\n```"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 45,
"completion_tokens": 20,
"total_tokens": 65
}
}
Streaming Response (SSE):
When stream is true, the response is returned as Server-Sent Events:
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Here's"}}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" a"}}]}
data: {"id":"chatcmpl-abc123","choices":[{"finish_reason":"stop"}]}
data: [DONE]
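The event stream above can be reassembled into the final text by concatenating each `delta.content` until the `[DONE]` sentinel arrives. The sketch below works on any iterable of already-decoded lines; the function names are hypothetical, and it assumes one choice per chunk, as in the sample events.

```python
import json


def iter_sse_data(lines):
    """Yield the parsed JSON payload of each data: line until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_stream(lines):
    """Concatenate streamed content; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in iter_sse_data(lines):
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

In an interactive client, the body of the loop in `collect_stream` is where each fragment would be printed as it arrives instead of being buffered.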
Generate text using a language model.
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
| prompt | string | Yes | Input text prompt |
| max_tokens | integer | No | Maximum tokens to generate (default: 100) |
| temperature | float | No | Randomness 0-2 (default: 0.7) |
| stream | boolean | No | Enable streaming (default: false) |
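For callers working from Python rather than curl, the fields in the table can be assembled into a request with the standard library alone. This is a sketch under the assumption that the endpoint is `POST /v1/inference` on the base URL given at the top of this reference; `build_inference_request` is a hypothetical helper, and it only constructs the request without sending it.

```python
import json
import urllib.request

INFERENCE_URL = "https://api.nataris.ai/v1/inference"


def build_inference_request(api_key, model, prompt, max_tokens=100,
                            temperature=0.7):
    """Return an unsent urllib Request for the inference endpoint."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        INFERENCE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; keeping construction separate from sending makes the request easy to inspect or log first.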
Example Request:
curl -X POST https://api.nataris.ai/v1/inference \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-0.5b-instruct-q6_k",
"prompt": "Write a haiku about AI",
"max_tokens": 50,
"temperature": 0.8
}'
Example Response:
{
"id": "inf_1234567890",
"object": "inference",
"created": 1706000000,
"model": "qwen2.5-0.5b-instruct-q6_k",
"output": "Silicon minds wake\nLearning from humanity\nFuture uncertain",
"usage": {
"prompt_tokens": 6,
"completion_tokens": 12,
"total_tokens": 18
},
"provider": {
"region": "IN",
"latency_ms": 245
}
}
List available models.
Example Request:
curl https://api.nataris.ai/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"
Example Response:
{
"object": "list",
"data": [
{
"id": "qwen2.5-0.5b-instruct-q6_k",
"object": "model",
"type": "llm",
"description": "Qwen 2.5 0.5B - Fast, efficient language model",
"available": true
},
{
"id": "llama-3.2-1b-instruct-q4_k_m",
"object": "model",
"type": "llm",
"description": "Llama 3.2 1B - General purpose language model",
"available": true
}
]
}
Get current billing period usage.
Example Request:
curl https://api.nataris.ai/v1/usage \
-H "Authorization: Bearer YOUR_API_KEY"
Example Response:
{
"object": "usage",
"period_start": "2026-01-01T00:00:00Z",
"period_end": "2026-01-31T23:59:59Z",
"balance_usd": 4.75,
"total_requests": 1250,
"total_tokens": 125000,
"by_model": {
"qwen2.5-0.5b-instruct-q6_k": {
"requests": 1000,
"tokens": 100000
},
"whisper-small": {
"requests": 250,
"seconds": 3600
}
}
}
| HTTP Code | Error Code | Description |
|---|---|---|
| 400 | INVALID_REQUEST | Malformed request body |
| 401 | INVALID_API_KEY | Missing or invalid API key |
| 402 | INSUFFICIENT_CREDITS | Account balance depleted |
| 404 | MODEL_NOT_FOUND | Model doesn't exist |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests |
| 503 | model_warming | No provider has this model available yet (may be downloading). Retry in 5–10 min. |
| 408 | REQUEST_TIMEOUT | Job assigned but provider did not respond in time. Not charged. |
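The table suggests that only three of these statuses are worth retrying. One possible client-side policy is sketched below; `retry_delay` is a hypothetical helper, the 60-second fallback for 429 and the exponential backoff for 408 are assumptions, and `error` is the object found under the `error` key of an error response body.

```python
def retry_delay(status, error, attempt):
    """Return seconds to wait before retrying, or None if not retryable.

    status  -- HTTP status code of the failed response
    error   -- dict from the "error" key of the response body
    attempt -- zero-based count of retries already made
    """
    if status == 429:
        # Prefer the server-provided hint; 60 s fallback is an assumption.
        return float(error.get("retry_after", 60))
    if status == 503:
        # Model may still be downloading: lower bound of the 5-10 min window.
        return 300.0
    if status == 408:
        # Timed-out jobs are not charged, so retrying is safe.
        # Assumption: exponential backoff capped at 30 s.
        return min(2.0 ** attempt, 30.0)
    # 400, 401, 402, 404: retrying the same request will not help.
    return None
```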
Error Response Format:
{
"error": {
"code": "RATE_LIMIT_EXCEEDED",
"message": "Rate limit exceeded. Retry after 60 seconds.",
"type": "rate_limit_error",
"retry_after": 60
}
}
Run multi-step AI workflows through a single API call.
Additional Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| orchestration.enabled | boolean | No | Enable multi-step workflow |
| orchestration.workflow | string | No | research, code, agent, map_reduce, auto |
| orchestration.max_steps | integer | No | Maximum inference steps (default: 10) |
| orchestration.max_cost_usd | number | No | Budget cap in USD |
Example Request:
curl -X POST https://api.nataris.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.2-1b-instruct-q4_k_m",
"messages": [{"role": "user", "content": "Research renewable energy trends"}],
"orchestration": {
"enabled": true,
"workflow": "research",
"max_steps": 10,
"max_cost_usd": 1.0
}
}'
Workflow Types:
| Type | Steps | Use Case |
|---|---|---|
| research | research → analyze → write | Deep research synthesis |
| code | plan → implement → review | Code generation with review |
| agent | think → act (loop) | ReAct reasoning agent |
| map_reduce | chunk → map → reduce | Large document analysis |
| auto | Auto-selected | Based on input |
Pricing: Orchestrated steps are billed at the same base model rate (no surcharge). Use POST /v1/workflows/estimate to preview costs.
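A small helper can attach the orchestration block to an ordinary chat completions body and catch an unknown workflow name before the request is sent. This is a sketch: `with_orchestration` is a hypothetical name, and the checks that `max_steps` and `max_cost_usd` be positive are assumptions rather than documented rules.

```python
WORKFLOWS = {"research", "code", "agent", "map_reduce", "auto"}


def with_orchestration(body, workflow, max_steps=10, max_cost_usd=None):
    """Return a copy of a chat request body with orchestration enabled."""
    if workflow not in WORKFLOWS:
        raise ValueError(f"unknown workflow: {workflow}")
    # Assumption: at least one step is required.
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    orchestration = {
        "enabled": True,
        "workflow": workflow,
        "max_steps": max_steps,
    }
    if max_cost_usd is not None:
        # Assumption: a budget cap must be a positive amount.
        if max_cost_usd <= 0:
            raise ValueError("max_cost_usd must be positive")
        orchestration["max_cost_usd"] = max_cost_usd
    # Build a new dict so the caller's body is left untouched.
    return {**body, "orchestration": orchestration}
```

Setting `max_cost_usd` is worth considering for the looping `agent` workflow, where the number of steps is not known in advance.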
Create a conversation for server-side message persistence.
Example Response:
{
"id": "conv_xyz789",
"title": null,
"message_count": 0,
"created_at": "2026-01-25T10:00:00Z"
}
Then pass conversation_id in chat completions:
{
"model": "llama-3.2-1b-instruct-q4_k_m",
"messages": [{"role": "user", "content": "Hello!"}],
"conversation_id": "conv_xyz789"
}
Messages are automatically persisted. Titles are auto-generated. Older messages are summarized to maintain context efficiently.
Other conversation endpoints:
- GET /conversations: List conversations
- GET /conversations/:id: Get context (messages + summary)
- DELETE /conversations/:id: Delete conversation
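The three endpoints above map naturally onto a method and URL pair. The sketch below assumes the listed paths are relative to the base URL `https://api.nataris.ai/v1`, which this reference does not state explicitly; `conversation_route` is a hypothetical helper.

```python
from urllib.parse import quote

# Assumption: conversation paths are relative to the documented base URL.
BASE_URL = "https://api.nataris.ai/v1"


def conversation_route(action, conversation_id=None):
    """Return (method, url) for a list, get, or delete conversation call."""
    if action == "list":
        return "GET", f"{BASE_URL}/conversations"
    if action in ("get", "delete"):
        if not conversation_id:
            raise ValueError("conversation_id is required")
        method = "GET" if action == "get" else "DELETE"
        # Percent-encode the ID so it cannot alter the path.
        safe_id = quote(conversation_id, safe="")
        return method, f"{BASE_URL}/conversations/{safe_id}"
    raise ValueError(f"unknown action: {action}")
```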
List your orchestration workflows.
Example Response:
{
"workflows": [
{
"id": "wf_123",
"type": "RESEARCH",
"status": "COMPLETED",
"task": "Research renewable energy",
"total_cost_usd": 0.042,
"steps_executed": 3,
"created_at": "2026-01-25T10:00:00Z"
}
],
"total": 1
}
Preview cost before running a workflow.
curl -X POST https://api.nataris.ai/v1/workflows/estimate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "llama-3.2-1b-instruct-q4_k_m", "workflow": "research"}'
Configure webhooks to receive notifications for:
- Job completion
- Low balance alerts
- Usage thresholds
Support:
- Email: support@nataris.ai
- Docs: https://nataris.ai/docs