Chat Completions
The chat API allows you to interact with AI models through conversational interfaces, supporting multi-turn dialogues, streaming responses, and function calling.
Endpoint
POST https://aiapi.services/v1/chat/completions
Authentication
All requests must include your API key in the HTTP header:
Authorization: Bearer YOUR_API_KEY
Supported Models
Anthropic Claude
- claude-sonnet-4-5 / claude-sonnet-4-5@20250929 - Latest Claude Sonnet 4.5
- claude-sonnet-4 / claude-sonnet-4-20250514 - Claude Sonnet 4th generation
- claude-opus-4-1 / claude-opus-4-1@20250805 - Claude Opus 4.1, strongest reasoning
- claude-opus-4 - Claude Opus 4 base version
- claude-3-7-sonnet - Claude 3.7 Sonnet
- claude-3-5-haiku@20241022 - Claude 3.5 Haiku
- claude-3-haiku@20240307 - Claude 3 Haiku
Google Gemini
- gemini-2.5-pro - Gemini 2.5 Pro, most powerful multimodal model
- gemini-2.5-flash - Gemini 2.5 Flash, fast and efficient
- gemini-2.5-flash-image-preview - Gemini 2.5 Flash image preview
- gemini-2.5-flash-lite - Gemini 2.5 Flash lite version
- gemini-2.0-flash - Gemini 2.0 Flash
- gemini-2.0-flash-lite - Gemini 2.0 Flash lite version
DeepSeek
- deepseek-ai/deepseek-r1-0528-maas - DeepSeek R1 reasoning-enhanced model
See the complete model list for pricing and details.
Request Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Model ID to use, e.g., gemini-2.5-flash or claude-sonnet-4-5 |
| messages | array | Array of message objects with role and content |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| temperature | number | 1.0 | Sampling temperature between 0 and 2 |
| max_tokens | integer | - | Maximum number of tokens to generate |
| top_p | number | 1.0 | Nucleus sampling parameter between 0 and 1 |
| stream | boolean | false | Enable streaming responses |
| stop | string/array | - | Sequence(s) to stop generation |
| presence_penalty | number | 0 | Presence penalty between -2.0 and 2.0 |
| frequency_penalty | number | 0 | Frequency penalty between -2.0 and 2.0 |
| user | string | - | Unique identifier for the end user |
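For example, a request body combining several optional parameters might look like this (values are purely illustrative):
{
  "model": "gemini-2.5-flash",
  "messages": [
    { "role": "user", "content": "Suggest three names for a coffee shop." }
  ],
  "temperature": 0.9,
  "max_tokens": 300,
  "stop": ["\n\n"],
  "user": "user-1234"
}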
Message Format
Each message contains the following fields:
{
"role": "system" | "user" | "assistant",
"content": string
}
- system: System message to set AI behavior and context
- user: User message representing user input
- assistant: Assistant message representing AI response
Code Examples
cURL
curl https://aiapi.services/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash",
"messages": [
{
"role": "system",
"content": "You are a helpful AI assistant."
},
{
"role": "user",
"content": "Explain the history of artificial intelligence."
}
],
"temperature": 0.7,
"max_tokens": 1000
}'
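The same request in JavaScript using fetch (a minimal sketch; assumes Node 18+ or a browser runtime with fetch):
JavaScript
const response = await fetch('https://aiapi.services/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: [
      { role: 'system', content: 'You are a helpful AI assistant.' },
      { role: 'user', content: 'Explain the history of artificial intelligence.' }
    ],
    temperature: 0.7,
    max_tokens: 1000
  })
});
console.log((await response.json()).choices[0].message.content);

Response Format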
Standard Response
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "gemini-2.5-flash",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Artificial intelligence (AI) has a rich history dating back to the 1950s..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 56,
"completion_tokens": 150,
"total_tokens": 206
}
}
Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique request identifier |
| object | string | Object type, always chat.completion |
| created | integer | Unix timestamp |
| model | string | Model name used |
| choices | array | Array of generated response options |
| choices[].message | object | AI-generated message |
| choices[].finish_reason | string | Stop reason: stop, length, content_filter |
| usage | object | Token usage statistics |
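For example, you can read the generated text and check finish_reason to detect truncated output (a short sketch, assuming response is the fetch result from the request above):
const data = await response.json();
const choice = data.choices[0];
console.log(choice.message.content);

// finish_reason "length" means the reply was cut off by max_tokens.
if (choice.finish_reason === 'length') {
  console.warn('Response was truncated; consider increasing max_tokens.');
}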
Streaming Responses
Enable streaming to receive AI responses in real-time, ideal for scenarios requiring quick feedback.
Enabling Streaming
Set the stream: true parameter:
cURL
curl https://aiapi.services/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash",
"messages": [{"role": "user", "content": "Hello"}],
"stream": true
}'
Streaming Response Format
Responses are delivered as server-sent events. Each chunk has the format:
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gemini-2.5-flash","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}
data: [DONE]
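A minimal sketch of consuming the stream in JavaScript (assumes Node 18+ or another runtime where fetch exposes a ReadableStream body):
const response = await fetch('https://aiapi.services/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gemini-2.5-flash',
    messages: [{ role: 'user', content: 'Hello' }],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Events are newline-delimited; keep any partial line for the next read.
  const lines = buffer.split('\n');
  buffer = lines.pop();

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') continue; // stream is finished
    const delta = JSON.parse(payload).choices[0].delta;
    if (delta.content) process.stdout.write(delta.content); // print tokens as they arrive
  }
}

Multi-turn Conversations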
To maintain conversation context, include previous messages in the messages array:
const messages = [
{ role: 'system', content: 'You are a helpful AI assistant.' },
{ role: 'user', content: 'What is machine learning?' },
{ role: 'assistant', content: 'Machine learning is a branch of artificial intelligence...' },
{ role: 'user', content: 'What are its applications?' } // New question based on context
];
const response = await fetch('https://aiapi.services/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'gemini-2.5-flash',
messages: messages
})
});
⚠️ Token Limits: Each model has a maximum context length. For long conversations, you need to truncate or summarize earlier messages.
Parameter Tuning Guide
Temperature
- 0.0-0.3: More deterministic and consistent (good for code generation, data extraction)
- 0.4-0.7: Balanced creativity and consistency (good for daily conversations)
- 0.8-1.0: More creative and diverse (good for creative writing)
- 1.0-2.0: Highly random and creative (good for brainstorming)
Max Tokens
Set reasonable token limits based on needs:
- Short responses: 100-300 tokens
- Medium responses: 500-1000 tokens
- Detailed responses: 1500-3000 tokens
Top P (Nucleus Sampling)
- 0.1: Only consider the top 10% most likely words
- 0.5: Consider words with cumulative probability up to 50%
- 1.0: Consider all possible words (default)
💡 Best Practice: Usually adjust either temperature or top_p, not both simultaneously.
Error Handling
Common Error Codes
| Status Code | Description | Solution |
|---|---|---|
| 401 | Unauthorized | Check that your API key is correct |
| 400 | Bad Request | Check request format and required parameters |
| 429 | Too Many Requests | Reduce request frequency or upgrade quota |
| 500 | Server Error | Retry later |
| 503 | Service Unavailable | Retry later |
Error Response Example
{
"error": {
"message": "Invalid API key provided",
"type": "invalid_request_error",
"code": "invalid_api_key"
}
}
Error Handling Examples
JavaScript
try {
const response = await fetch('https://aiapi.services/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'gemini-2.5-flash',
messages: [{role: 'user', content: 'Hello'}]
})
});
if (!response.ok) {
const error = await response.json();
console.error('API Error:', error.error.message);
return;
}
const data = await response.json();
console.log(data.choices[0].message.content);
} catch (error) {
console.error('Network Error:', error);
}
Best Practices
1. Set Clear System Prompts
Use the system role to define AI behavior and context:
{
"role": "system",
"content": "You are a professional technical support agent. Answer questions concisely and friendly."
}2. Manage Conversation Context Wisely
For long conversations, implement smart truncation strategies (a minimal sketch follows the list below):
- Keep system messages
- Keep the most recent N turns
- Optional: Summarize earlier conversation content
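Such a strategy might look like this (truncateHistory is a hypothetical helper, not part of the API):
// Hypothetical helper: keep the system prompt plus the last `maxTurns` turns.
function truncateHistory(messages, maxTurns = 10) {
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  // One turn = one user message + one assistant reply = 2 messages.
  return [...system, ...rest.slice(-maxTurns * 2)];
}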
3. Implement Retry Mechanisms
For temporary errors (429, 500, 503), use exponential backoff retry strategy:
async function callWithRetry(maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    let response;
    try {
      response = await fetch('https://aiapi.services/v1/chat/completions', {
        // ... request configuration
      });
    } catch (error) {
      // Network error: retry unless this was the last attempt.
      if (i === maxRetries - 1) throw error;
    }
    if (response) {
      if (response.ok) {
        return await response.json();
      }
      // Fail fast on non-retryable errors (e.g., 400, 401).
      if (response.status !== 429 && response.status < 500) {
        throw new Error(`HTTP ${response.status}`);
      }
    }
    const delay = Math.pow(2, i) * 1000; // Exponential backoff: 1s, 2s, 4s...
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  throw new Error('Max retries exceeded');
}
4. Monitor Token Usage
Track token consumption per request via the usage field:
const data = await response.json();
console.log(`Used ${data.usage.total_tokens} tokens`);
console.log(`Prompt: ${data.usage.prompt_tokens}, Completion: ${data.usage.completion_tokens}`);
Related Resources
- Authentication - Learn how to get and use API keys
- Available Models - View all supported models and pricing
- Usage Query - Query API usage and quotas
- Embedding API - Text embedding vector generation
- Image API - AI image generation