Chat Completions
The chat API allows you to interact with AI models through conversational interfaces, supporting multi-turn dialogues, streaming responses, and function calling.
Endpoint
POST https://aiapi.services/v1/chat/completions
Authentication
All requests must include your API key in the HTTP header:
Authorization: Bearer YOUR_API_KEY
Supported Models
Anthropic Claude
- claude-sonnet-4-5 / claude-sonnet-4-5@20250929 - Latest Claude Sonnet 4.5
- claude-sonnet-4 / claude-sonnet-4-20250514 - Claude Sonnet 4th generation
- claude-opus-4-1 / claude-opus-4-1@20250805 - Claude Opus 4.1, strongest reasoning
- claude-opus-4 - Claude Opus 4 base version
- claude-3-7-sonnet - Claude 3.7 Sonnet
- claude-3-5-haiku@20241022 - Claude 3.5 Haiku
- claude-3-haiku@20240307 - Claude 3 Haiku
Google Gemini
- gemini-2.5-pro - Gemini 2.5 Pro, most powerful multimodal model
- gemini-2.5-flash - Gemini 2.5 Flash, fast and efficient
- gemini-2.5-flash-image-preview - Gemini 2.5 Flash image preview
- gemini-2.5-flash-lite - Gemini 2.5 Flash lite version
- gemini-2.0-flash - Gemini 2.0 Flash
- gemini-2.0-flash-lite - Gemini 2.0 Flash lite version
DeepSeek
- deepseek-ai/deepseek-r1-0528-maas - DeepSeek R1 reasoning-enhanced model
See the complete model list for pricing and details.
Request Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Model ID to use, e.g., gemini-2.5-flash or claude-sonnet-4-5 |
| messages | array | Array of message objects with role and content |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| temperature | number | 1.0 | Sampling temperature between 0 and 2 |
| max_tokens | integer | - | Maximum number of tokens to generate |
| top_p | number | 1.0 | Nucleus sampling parameter between 0 and 1 |
| stream | boolean | false | Enable streaming responses |
| stop | string/array | - | Sequence(s) to stop generation |
| presence_penalty | number | 0 | Presence penalty between -2.0 and 2.0 |
| frequency_penalty | number | 0 | Frequency penalty between -2.0 and 2.0 |
| user | string | - | Unique identifier for the end user |
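For example, a request body combining several optional parameters might look like this (values are purely illustrative):
{
  "model": "gemini-2.5-flash",
  "messages": [
    { "role": "user", "content": "Suggest three names for a coffee shop." }
  ],
  "temperature": 0.9,
  "max_tokens": 300,
  "stop": ["\n\n"],
  "user": "user-1234"
}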
Message Format
Each message contains the following fields:
{
"role": "system" | "user" | "assistant",
"content": string
}
- system: System message to set AI behavior and context
- user: User message representing user input
- assistant: Assistant message representing AI response
Code Examples
cURL
curl https://aiapi.services/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash",
"messages": [
{
"role": "system",
"content": "You are a helpful AI assistant."
},
{
"role": "user",
"content": "Explain the history of artificial intelligence."
}
],
"temperature": 0.7,
"max_tokens": 1000
}'
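The same request in JavaScript using fetch (a minimal sketch; assumes Node 18+ or a browser runtime with fetch):
JavaScript
const response = await fetch('https://aiapi.services/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: [
      { role: 'system', content: 'You are a helpful AI assistant.' },
      { role: 'user', content: 'Explain the history of artificial intelligence.' }
    ],
    temperature: 0.7,
    max_tokens: 1000
  })
});
console.log((await response.json()).choices[0].message.content);

Response Format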
Standard Response
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "gemini-2.5-flash",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Artificial intelligence (AI) has a rich history dating back to the 1950s..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 56,
"completion_tokens": 150,
"total_tokens": 206
}
}
Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique request identifier |
| object | string | Object type, always chat.completion |
| created | integer | Unix timestamp |
| model | string | Model name used |
| choices | array | Array of generated response options |
| choices[].message | object | AI-generated message |
| choices[].finish_reason | string | Stop reason: stop, length, content_filter |
| usage | object | Token usage statistics |
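For example, you can read the generated text and check finish_reason to detect truncated output (a short sketch, assuming response is the fetch result from the request above):
const data = await response.json();
const choice = data.choices[0];
console.log(choice.message.content);

// finish_reason "length" means the reply was cut off by max_tokens.
if (choice.finish_reason === 'length') {
  console.warn('Response was truncated; consider increasing max_tokens.');
}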
Streaming Responses
Enable streaming to receive AI responses in real-time, ideal for scenarios requiring quick feedback.
Enabling Streaming
Set the stream: true parameter:
cURL
curl https://aiapi.services/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash",
"messages": [{"role": "user", "content": "Hello"}],
"stream": true
}'
Streaming Response Format
Responses are delivered as server-sent events. Each chunk has the format:
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}
data: [DONE]
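A minimal sketch of consuming the stream in JavaScript (assumes Node 18+ or another runtime where fetch exposes a ReadableStream body):
const response = await fetch('https://aiapi.services/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: [{ role: 'user', content: 'Hello' }],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Events are newline-delimited; keep any partial line for the next read.
  const lines = buffer.split('\n');
  buffer = lines.pop();

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') continue; // stream is finished
    const delta = JSON.parse(payload).choices[0].delta;
    if (delta.content) process.stdout.write(delta.content); // print tokens as they arrive
  }
}

Multi-turn Conversations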
To maintain conversation context, include previous messages in the messages array:
const messages = [
{ role: 'system', content: 'You are a helpful AI assistant.' },
{ role: 'user', content: 'What is machine learning?' },
{ role: 'assistant', content: 'Machine learning is a branch of artificial intelligence...' },
{ role: 'user', content: 'What are its applications?' } // New question based on context
];
const response = await fetch('https://aiapi.services/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'gemini-2.5-flash',
messages: messages
})
});
⚠️ Token Limits: Each model has a maximum context length. For long conversations, you need to truncate or summarize earlier messages.
Parameter Tuning Guide
Temperature
- 0.0-0.3: More deterministic and consistent (good for code generation, data extraction)
- 0.4-0.7: Balanced creativity and consistency (good for daily conversations)
- 0.8-1.0: More creative and diverse (good for creative writing)
- 1.0-2.0: Highly random and creative (good for brainstorming)
Max Tokens
Set reasonable token limits based on needs:
- Short responses: 100-300 tokens
- Medium responses: 500-1000 tokens
- Detailed responses: 1500-3000 tokens
Top P (Nucleus Sampling)
- 0.1: Only consider the top 10% most likely words
- 0.5: Consider words with cumulative probability up to 50%
- 1.0: Consider all possible words (default)
💡 Best Practice: Usually adjust either temperature or top_p, not both simultaneously.
Error Handling
Common Error Codes
| Status Code | Description | Solution |
|---|---|---|
| 401 | Unauthorized | Check that your API key is correct |
| 400 | Bad Request | Check request format and required parameters |
| 429 | Too Many Requests | Reduce request frequency or upgrade quota |
| 500 | Server Error | Retry later |
| 503 | Service Unavailable | Retry later |
Error Response Example
{
"error": {
"message": "Invalid API key provided",
"type": "invalid_request_error",
"code": "invalid_api_key"
}
}
Error Handling Examples
JavaScript
try {
const response = await fetch('https://aiapi.services/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'gemini-2.5-flash',
messages: [{role: 'user', content: 'Hello'}]
})
});
if (!response.ok) {
const error = await response.json();
console.error('API Error:', error.error.message);
return;
}
const data = await response.json();
console.log(data.choices[0].message.content);
} catch (error) {
console.error('Network Error:', error);
}
Best Practices
1. Set Clear System Prompts
Use the system role to define AI behavior and context:
{
"role": "system",
"content": "You are a professional technical support agent. Answer questions concisely and friendly."
}2. Manage Conversation Context Wisely
For long conversations, implement smart truncation strategies (a minimal sketch follows the list below):
- Keep system messages
- Keep the most recent N turns
- Optional: Summarize earlier conversation content
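Such a strategy might look like this (truncateHistory is a hypothetical helper, not part of the API):
// Hypothetical helper: keep the system prompt plus the last `maxTurns` turns.
function truncateHistory(messages, maxTurns = 10) {
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  // One turn = one user message + one assistant reply = 2 messages.
  return [...system, ...rest.slice(-maxTurns * 2)];
}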
3. Implement Retry Mechanisms
For temporary errors (429, 500, 503), use exponential backoff retry strategy:
async function callWithRetry(maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    let response;
    try {
      response = await fetch('https://aiapi.services/v1/chat/completions', {
        // ... request configuration
      });
    } catch (error) {
      // Network error: retry unless this was the last attempt.
      if (i === maxRetries - 1) throw error;
    }
    if (response) {
      if (response.ok) {
        return await response.json();
      }
      // Fail fast on non-retryable errors (e.g., 400, 401).
      if (response.status !== 429 && response.status < 500) {
        throw new Error(`HTTP ${response.status}`);
      }
    }
    const delay = Math.pow(2, i) * 1000; // Exponential backoff: 1s, 2s, 4s...
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  throw new Error('Max retries exceeded');
}
4. Monitor Token Usage
Track token consumption per request via the usage field:
const data = await response.json();
console.log(`Used ${data.usage.total_tokens} tokens`);
console.log(`Prompt: ${data.usage.prompt_tokens}, Completion: ${data.usage.completion_tokens}`);
Related Resources
- Authentication - Learn how to get and use API keys
- Available Models - View all supported models and pricing
- Usage Query - Query API usage and quotas
- Embedding API - Text embedding vector generation
- Image API - AI image generation