Audio API

The Audio API provides Text-to-Speech (TTS): high-quality, natural-sounding voice synthesis in multiple languages.

Endpoint

Text-to-Speech (TTS)

POST https://aiapi.services/v1/audio/speech

Authentication

All requests must include your API key in the HTTP header:

Authorization: Bearer YOUR_API_KEY
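
As a minimal sketch, the header above can be built in Python from an environment variable so the key never appears in source code. The variable name AIAPI_API_KEY and the helper auth_headers are assumptions made for this illustration, not names defined by the API.

```python
import os


def auth_headers(env_var: str = "AIAPI_API_KEY") -> dict[str, str]:
    """Build the Authorization header from an environment variable.

    The variable name is hypothetical; use whatever your deployment provides.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # Header format taken from the documentation: "Bearer " followed by the key.
    return {"Authorization": f"Bearer {api_key}"}
```

Failing early on a missing key gives a clearer message than the authentication error the server would otherwise return.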

Supported Models

Text-to-Speech (TTS)

  • text-to-speech-multilingual - Multilingual TTS supporting natural voice synthesis in multiple languages
  • text-to-speech-neural - Neural network TTS with high-quality natural voice synthesis
  • text-to-speech-001 - Standard TTS model for basic text-to-speech functionality
  • text-to-speech-standard - Standard TTS version with stable voice synthesis service

See Available Models for the complete model list.

Text-to-Speech

Request Parameters

Required Parameters

Parameter | Type | Description
model | string | Model ID, e.g., text-to-speech-001
input | string | Text content to convert to speech
voice | string | Voice type: alloy, echo, fable, onyx, nova, shimmer

Optional Parameters

Parameter | Type | Default | Description
response_format | string | mp3 | Output format: mp3, opus, aac, flac, wav, pcm
speed | number | 1.0 | Speech speed (0.25 - 4.0)
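
The parameter constraints listed above can be checked on the client before a request is sent. The following is a sketch only: build_payload is a hypothetical helper, and it assumes the speed bounds are inclusive and that the 4096-character input limit (documented under Common Errors) is counted the way Python's len() counts characters.

```python
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096  # limit documented for the input_too_long error


def build_payload(text, voice, model="text-to-speech-001",
                  response_format="mp3", speed=1.0):
    """Validate the documented parameters and return the JSON request body as a dict."""
    if not text:
        raise ValueError("input must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # Assumption: both ends of the documented range are allowed.
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The server remains the authority on what is valid; a local check like this only avoids round trips for obviously bad requests.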

Code Examples

curl https://aiapi.services/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-to-speech-001",
    "input": "The weather is nice today, perfect for a walk.",
    "voice": "alloy",
    "speed": 1.0
  }' \
  --output speech.mp3
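
A rough Python equivalent of the curl call, using only the standard library. The function names build_speech_request and synthesize are illustrative, not part of any official client, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://aiapi.services/v1/audio/speech"


def build_speech_request(api_key, text, voice="alloy",
                         model="text-to-speech-001", speed=1.0):
    """Prepare the POST request without sending it."""
    body = json.dumps(
        {"model": model, "input": text, "voice": voice, "speed": speed}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key, text, **options) -> bytes:
    """Send the request and return the raw audio bytes.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    request = build_speech_request(api_key, text, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()


# Usage (performs a real network call):
#   audio = synthesize("YOUR_API_KEY", "The weather is nice today, perfect for a walk.")
#   with open("speech.mp3", "wb") as f:
#       f.write(audio)
```

Separating request construction from sending keeps the payload easy to inspect or log before any quota is spent.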

Response Format

Success Response

The response is binary audio data (not JSON format). HTTP response headers include:

Content-Type: audio/mpeg    # MP3 format
Content-Type: audio/opus    # Opus format
Content-Type: audio/aac     # AAC format
Content-Type: audio/flac    # FLAC format
Content-Type: audio/wav     # WAV format
Content-Type: audio/pcm     # PCM format
Content-Length: 45678       # File size (bytes)

Usage:

# Save as file; `response` is the HTTP response object (for example, from the requests library)
with open('output.mp3', 'wb') as f:
    f.write(response.content)
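
When response_format varies, the file extension can be derived from the Content-Type header rather than hard-coded. The sketch below uses the header values listed above; save_audio and its parameters are hypothetical names. It also refuses to write anything that is not one of the documented audio types, which guards against saving a JSON error body as an audio file.

```python
from pathlib import Path

# Content-Type values as listed in the response headers above.
EXTENSION_BY_CONTENT_TYPE = {
    "audio/mpeg": "mp3",
    "audio/opus": "opus",
    "audio/aac": "aac",
    "audio/flac": "flac",
    "audio/wav": "wav",
    "audio/pcm": "pcm",
}


def save_audio(audio: bytes, content_type: str,
               stem: str = "output", directory: str = ".") -> Path:
    """Write audio bytes to <directory>/<stem>.<ext>, choosing ext from Content-Type."""
    # Drop any parameters such as "; charset=..." and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    try:
        extension = EXTENSION_BY_CONTENT_TYPE[media_type]
    except KeyError:
        raise ValueError(f"Unexpected Content-Type: {content_type!r}") from None
    path = Path(directory) / f"{stem}.{extension}"
    path.write_bytes(audio)
    return path
```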

Audio Format Comparison

Format | File Size | Quality | Compatibility | Recommended Use
mp3 | Medium | Good | Excellent | General purpose, default
opus | Smallest | Excellent | Good | Bandwidth-limited, real-time
aac | Medium | Excellent | Good | iOS/Mac applications
flac | Large | Lossless | Fair | High-quality audio needs
wav | Largest | Lossless | Excellent | Professional audio
pcm | Largest | Lossless | Poor | Low-level audio development

File Size Estimation

Approximate relationship between text length and audio file size (MP3 format):

Text Length | Audio Duration | MP3 File Size
100 chars | ~10 seconds | ~20KB
500 chars | ~50 seconds | ~100KB
1000 chars | ~100 seconds | ~200KB
4096 chars (max) | ~400 seconds | ~800KB
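
The table implies roughly 10 characters per second of audio and about 2 KB of MP3 per second. A small estimator built on those two ratios might look like the sketch below; the ratios are read off the table, not documented constants, and real output will vary with the text, the voice, and the speed setting.

```python
CHARS_PER_SECOND = 10  # from the table: 100 chars is about 10 seconds
KB_PER_SECOND = 2      # from the table: 10 seconds is about 20 KB of MP3


def estimate_mp3(char_count: int) -> tuple[float, float]:
    """Return (duration in seconds, size in KB) estimated from text length."""
    duration_s = char_count / CHARS_PER_SECOND
    size_kb = duration_s * KB_PER_SECOND
    return duration_s, size_kb


print(estimate_mp3(500))  # (50.0, 100.0)
```

For the 4096-character maximum this gives about 410 seconds and 820 KB, which the table rounds to 400 seconds and 800KB.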

Error Response

When a request fails, a JSON-formatted error is returned. See the Error Handling documentation for details.

{
  "code": "invalid_request_error",
  "message": "Invalid parameter: input text too long",
  "data": null
}

Common Errors:

  • input_too_long - Text exceeds maximum length (4096 characters)
  • invalid_voice - Unsupported voice type
  • quota_not_enough - Insufficient quota
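
A client can turn the JSON error body into an exception carrying the code and message fields. This is a sketch under the assumption that error bodies follow the shape shown above; SpeechAPIError and parse_error are hypothetical names, and "unknown_error" is a local placeholder for bodies that cannot be parsed, not a code the API is known to return.

```python
import json


class SpeechAPIError(Exception):
    """Carries the `code` and `message` fields of an error response."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_error(body: bytes) -> SpeechAPIError:
    """Convert a raw error body into a SpeechAPIError (returned, not raised)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        payload = None
    if not isinstance(payload, dict):
        # Body is not the documented JSON object, e.g. a proxy's HTML error page.
        return SpeechAPIError("unknown_error", body.decode("utf-8", errors="replace"))
    return SpeechAPIError(
        str(payload.get("code", "unknown_error")),
        str(payload.get("message", "")),
    )
```

A caller would typically write raise parse_error(body) after seeing a non-2xx status, then branch on the code attribute, for example to split the text and retry on input_too_long.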

Voice Types

Voice Type | Characteristics | Use Cases
alloy | Neutral, clear | General purpose
echo | Male, steady | Business, news
fable | Warm, friendly | Storytelling
onyx | Deep, authoritative | Formal occasions
nova | Female, energetic | Advertising, marketing
shimmer | Soft, elegant | Assistant, customer service

Best Practices

Performance Optimization:

  • Maximum text length per request: 4096 characters
  • For longer texts, process in segments
  • Use appropriate speech speed; default 1.0 is most natural
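
One way to process longer texts in segments is to split at sentence boundaries so that each request stays within the limit. The helper below is a sketch, not a prescribed method: split_text is a hypothetical name, the sentence detection is a simple punctuation heuristic, and a single sentence longer than the limit is cut at a fixed offset, which may fall mid-word.

```python
import re

MAX_INPUT_CHARS = 4096


def split_text(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Split text into segments of at most `limit` characters, preferring sentence ends."""
    # Sentences end at '.', '!' or '?' followed by whitespace; that whitespace is dropped.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit is hard-split into fixed-size pieces.
        while len(sentence) > limit:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow: close the segment, start a new one.
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each segment can then be sent as its own request and the resulting audio played or joined in order. Joining compressed formats such as MP3 byte-for-byte may not always produce a clean file, so a format like pcm or an audio tool may be a safer choice for concatenation.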

Important Notes:

  • Generated audio file size is proportional to text length
  • Generation time may vary slightly between voice types
  • Use HTTPS to ensure secure audio data transmission