
Chat Completions

The chat API allows you to interact with AI models through conversational interfaces, supporting multi-turn dialogues, streaming responses, and function calling.

Endpoint

POST https://aiapi.services/v1/chat/completions

Authentication

All requests must include your API key in the HTTP header:

Authorization: Bearer YOUR_API_KEY
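
A minimal JavaScript sketch of an authenticated request is shown below; it assumes Node.js and reads the key from an environment variable (API_KEY here is a placeholder name) rather than hard-coding it:

const response = await fetch('https://aiapi.services/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.API_KEY}`, // key loaded from the environment
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: [{ role: 'user', content: 'Hello' }]
  })
});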

Supported Models

Anthropic Claude

  • claude-sonnet-4-5 / claude-sonnet-4-5@20250929 - Latest Claude Sonnet 4.5
  • claude-sonnet-4 / claude-sonnet-4-20250514 - Claude Sonnet 4th generation
  • claude-opus-4-1 / claude-opus-4-1@20250805 - Claude Opus 4.1, strongest reasoning
  • claude-opus-4 - Claude Opus 4 base version
  • claude-3-7-sonnet - Claude 3.7 Sonnet
  • claude-3-5-haiku@20241022 - Claude 3.5 Haiku
  • claude-3-haiku@20240307 - Claude 3 Haiku

Google Gemini

  • gemini-2.5-pro - Gemini 2.5 Pro, most powerful multimodal model
  • gemini-2.5-flash - Gemini 2.5 Flash, fast and efficient
  • gemini-2.5-flash-image-preview - Gemini 2.5 Flash image preview
  • gemini-2.5-flash-lite - Gemini 2.5 Flash lite version
  • gemini-2.0-flash - Gemini 2.0 Flash
  • gemini-2.0-flash-lite - Gemini 2.0 Flash lite version

DeepSeek

  • deepseek-ai/deepseek-r1-0528-maas - DeepSeek R1 reasoning-enhanced model

See the complete model list for pricing and details.

Request Parameters

Required Parameters

Parameter  Type    Description
model      string  Model ID to use, e.g., gemini-2.5-flash or claude-sonnet-4-5
messages   array   Array of message objects with role and content

Optional Parameters

Parameter          Type          Default  Description
temperature        number        1.0      Sampling temperature between 0 and 2
max_tokens         integer       -        Maximum number of tokens to generate
top_p              number        1.0      Nucleus sampling parameter between 0 and 1
stream             boolean       false    Enable streaming responses
stop               string/array  -        Sequence(s) to stop generation
presence_penalty   number        0        Presence penalty between -2.0 and 2.0
frequency_penalty  number        0        Frequency penalty between -2.0 and 2.0
user               string        -        Unique identifier for the end user
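
For illustration, a request combining several optional parameters might look like the following sketch (the values are arbitrary examples, not recommendations):

const body = {
  model: 'gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Write a haiku about the sea.' }],
  temperature: 0.9,  // favor more varied output
  max_tokens: 200,   // cap the length of the completion
  stop: ['\n\n'],    // stop at the first blank line
  user: 'user-1234'  // opaque end-user identifier for your own tracking
};

const response = await fetch('https://aiapi.services/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(body)
});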

Message Format

Each message contains the following fields:

{ "role": "system" | "user" | "assistant", "content": string }
  • system: Sets the AI's behavior and context
  • user: Input from the end user
  • assistant: A prior model response, included to carry conversation history

Code Examples

curl https://aiapi.services/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [
      { "role": "system", "content": "You are a helpful AI assistant." },
      { "role": "user", "content": "Explain the history of artificial intelligence." }
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'

Response Format

Standard Response

{ "id": "chatcmpl-123", "object": "chat.completion", "created": 1677652288, "model": "gemini-2.5-flash", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Artificial intelligence (AI) has a rich history dating back to the 1950s..." }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 56, "completion_tokens": 150, "total_tokens": 206 } }

Response Fields

Field                    Type     Description
id                       string   Unique request identifier
object                   string   Object type, always chat.completion
created                  integer  Unix timestamp
model                    string   Model name used
choices                  array    Array of generated response options
choices[].message        object   AI-generated message
choices[].finish_reason  string   Stop reason: stop, length, content_filter
usage                    object   Token usage statistics

Streaming Responses

Enable streaming to receive AI responses in real time, ideal for scenarios that require quick feedback.

Enabling Streaming

Set the stream: true parameter:

curl https://aiapi.services/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

Streaming Response Format

Each chunk has the following format:

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]} data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]} data: [DONE]

Multi-turn Conversations

To maintain conversation context, include previous messages in the messages array:

const messages = [
  { role: 'system', content: 'You are a helpful AI assistant.' },
  { role: 'user', content: 'What is machine learning?' },
  { role: 'assistant', content: 'Machine learning is a branch of artificial intelligence...' },
  { role: 'user', content: 'What are its applications?' } // New question based on context
];

const response = await fetch('https://aiapi.services/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: messages
  })
});

⚠️ Token Limits: Each model has a maximum context length. For long conversations, you need to truncate or summarize earlier messages.

Parameter Tuning Guide

Temperature

  • 0.0-0.3: More deterministic and consistent (good for code generation, data extraction)
  • 0.4-0.7: Balanced creativity and consistency (good for daily conversations)
  • 0.8-1.0: More creative and diverse (good for creative writing)
  • 1.0-2.0: Highly random and creative (good for brainstorming)
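
As an illustration, the same request body can be tuned for different tasks simply by changing temperature (the values and prompts below are arbitrary examples):

const codeGenRequest = {
  model: 'gemini-2.5-flash',
  temperature: 0.2, // low temperature: deterministic, repeatable output
  messages: [{ role: 'user', content: 'Write a function that reverses a string.' }]
};

const brainstormRequest = {
  model: 'gemini-2.5-flash',
  temperature: 1.3, // high temperature: more varied, exploratory output
  messages: [{ role: 'user', content: 'Suggest ten names for a weather app.' }]
};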

Max Tokens

Set reasonable token limits based on needs:

  • Short responses: 100-300 tokens
  • Medium responses: 500-1000 tokens
  • Detailed responses: 1500-3000 tokens

Top P (Nucleus Sampling)

  • 0.1: Sample only from tokens within the top 10% of cumulative probability
  • 0.5: Sample from tokens with cumulative probability up to 50%
  • 1.0: Consider all tokens (default)

💡 Best Practice: Usually adjust either temperature or top_p, not both simultaneously.

Error Handling

Common Error Codes

Status Code  Description          Solution
401          Unauthorized         Check that your API key is correct
400          Bad Request          Check request format and required parameters
429          Too Many Requests    Reduce request frequency or upgrade quota
500          Server Error         Retry later
503          Service Unavailable  Retry later

Error Response Example

{ "error": { "message": "Invalid API key provided", "type": "invalid_request_error", "code": "invalid_api_key" } }

Error Handling Examples

try {
  const response = await fetch('https://aiapi.services/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gemini-2.5-flash',
      messages: [{ role: 'user', content: 'Hello' }]
    })
  });

  if (!response.ok) {
    const error = await response.json();
    console.error('API Error:', error.error.message);
    return;
  }

  const data = await response.json();
  console.log(data.choices[0].message.content);
} catch (error) {
  console.error('Network Error:', error);
}

Best Practices

1. Set Clear System Prompts

Use the system role to define AI behavior and context:

{ "role": "system", "content": "You are a professional technical support agent. Answer questions concisely and friendly." }

2. Manage Conversation Context Wisely

For long conversations, implement a smart truncation strategy, as sketched after this list:

  • Keep system messages
  • Keep the most recent N turns
  • Optional: Summarize earlier conversation content
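
A minimal sketch of such a strategy keeps every system message plus the most recent N turns; maxTurns is an illustrative parameter, not part of the API, and the sketch assumes strict user/assistant alternation:

function truncateHistory(messages, maxTurns = 5) {
  // System messages define behavior and should survive truncation
  const systemMessages = messages.filter(m => m.role === 'system');
  const dialogue = messages.filter(m => m.role !== 'system');

  // One turn = one user message plus one assistant reply
  const recent = dialogue.slice(-maxTurns * 2);
  return [...systemMessages, ...recent];
}

The returned array is passed as messages in the next request.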

3. Implement Retry Mechanisms

For temporary errors (429, 500, 503), use exponential backoff retry strategy:

async function callWithRetry(maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    let response;
    try {
      response = await fetch('https://aiapi.services/v1/chat/completions', {
        // ... request configuration
      });
    } catch (error) {
      // Network failure: retry unless this was the last attempt
      if (i === maxRetries - 1) throw error;
      continue;
    }
    if (response.ok) {
      return await response.json();
    }
    // Retry only temporary errors (429, 5xx); fail fast on anything else
    if ((response.status === 429 || response.status >= 500) && i < maxRetries - 1) {
      const delay = Math.pow(2, i) * 1000; // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise(resolve => setTimeout(resolve, delay));
      continue;
    }
    throw new Error(`HTTP ${response.status}`);
  }
}
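
Calling it is then a single await; any error that survives all retries propagates to the caller:

try {
  const data = await callWithRetry();
  console.log(data.choices[0].message.content);
} catch (error) {
  console.error('Request failed after retries:', error);
}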

4. Monitor Token Usage

Track token consumption per request via the usage field:

const data = await response.json();
console.log(`Used ${data.usage.total_tokens} tokens`);
console.log(`Prompt: ${data.usage.prompt_tokens}, Completion: ${data.usage.completion_tokens}`);