Speech-to-Text (STT)
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file (multipart/form-data) |
| model | string | Yes | Model name: whisper-1, gpt-4o-transcribe |
| language | string | No | Audio language (ISO-639-1 format), e.g. zh, en, ja |
| response_format | string | No | Output format: json (default), text, srt, verbose_json, vtt |
| temperature | number | No | Sampling temperature, 0-1 |
| prompt | string | No | Prompt to help the model understand context |
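
The constraints in the table can be checked on the client before anything is uploaded. The following is a minimal sketch, not part of the API itself: the helper name `build_transcription_fields` is hypothetical, and it only assembles the non-file form fields described above.

```python
# Allowed values taken from the response_format row of the parameter table.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcription_fields(model, language=None, response_format="json",
                               temperature=None, prompt=None):
    """Return the non-file form fields for a transcription request.

    Hypothetical helper: it validates values against the parameter table
    and omits optional fields that were not provided.
    """
    if not model:
        raise ValueError("model is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    fields = {"model": model, "response_format": response_format}
    if language is not None:
        fields["language"] = language
    if temperature is not None:
        # multipart/form-data fields are transmitted as text.
        fields["temperature"] = str(temperature)
    if prompt is not None:
        fields["prompt"] = prompt
    return fields
```

The returned dictionary can be passed as the form data of a multipart request, with the audio file attached separately as the `file` part.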
Supported Audio Formats
mp3, mp4, mpeg, mpga, m4a, wav, webm
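
A quick pre-upload check against this list might look like the sketch below. It is an illustration only: `has_supported_extension` is a hypothetical helper, and it inspects the file name rather than the actual audio encoding, so a misnamed file would still pass.

```python
from pathlib import Path

# Extensions copied from the supported formats list above.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def has_supported_extension(filename):
    """Return True if the file name ends in one of the listed extensions."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS
```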
Request Examples
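
The original request samples are not reproduced here. As a hedged sketch, assuming the service is compatible with the OpenAI Python SDK, a transcription call could be wrapped as follows. The wrapper name `transcribe` is hypothetical; it accepts any client object that exposes `audio.transcriptions.create`, which the SDK's `OpenAI` client does.

```python
def transcribe(client, path, model="whisper-1", language=None,
               response_format="json"):
    """Send one audio file for transcription and return the client's result.

    Assumption: `client` follows the OpenAI Python SDK interface, e.g.
        from openai import OpenAI
        client = OpenAI()  # pass base_url=... for a compatible service
    """
    options = {"model": model, "response_format": response_format}
    if language is not None:
        # Only send the optional field when the caller supplied it.
        options["language"] = language
    # The file is opened in binary mode and uploaded as the `file` part.
    with open(path, "rb") as audio:
        return client.audio.transcriptions.create(file=audio, **options)
```

Taking the client as an argument keeps the sketch independent of any particular base URL or API key configuration.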
Response Examples
JSON Format
verbose_json Format
SRT Format
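
When `response_format` is `srt`, the response body is subtitle text rather than JSON. The sketch below parses standard SubRip cues into dictionaries. It assumes the service emits conventional SRT (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then text); `parse_srt` is a hypothetical helper, and `SAMPLE` is hand-written for illustration, not output captured from the API.

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def _to_seconds(stamp):
    """Convert an SRT timestamp such as 00:00:02,500 to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text):
    """Split SRT text into cues with index, start, end, and text."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue  # skip empty or malformed blocks
        start, end = lines[1].split("-->")
        cues.append({
            "index": int(lines[0]),
            "start": _to_seconds(start),
            "end": _to_seconds(end),
            "text": " ".join(lines[2:]),
        })
    return cues


# Hand-written sample for illustration only.
SAMPLE = """1
00:00:00,000 --> 00:00:02,500
Hello and welcome.

2
00:00:02,500 --> 00:00:05,000
Let's get started.
"""
```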
Audio Translation
Python
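
The original sample is not reproduced here. A minimal sketch follows, assuming the service mirrors the OpenAI Python SDK, where translation is exposed as `client.audio.translations.create` and produces English text. Whether this service behaves identically is an assumption, and the wrapper name `translate_audio` is hypothetical.

```python
def translate_audio(client, path, model="whisper-1"):
    """Upload an audio file for translation and return the client's result.

    Assumption: `client` follows the OpenAI Python SDK interface, e.g.
        from openai import OpenAI
        client = OpenAI()  # pass base_url=... for a compatible service
    """
    # The file is opened in binary mode and uploaded as the `file` part.
    with open(path, "rb") as audio:
        return client.audio.translations.create(file=audio, model=model)
```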
Specifying the language parameter can improve transcription accuracy. The audio file size limit is 25 MB.
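
The size limit can be enforced locally before uploading. The following is a sketch only: `check_upload_size` is a hypothetical helper, and it assumes "25 MB" means 25 × 1024 × 1024 bytes, which the text does not confirm.

```python
import os

# Assumption: the 25 MB limit is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload_size(path, limit=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"{path} is {size} bytes, which exceeds the {limit}-byte limit"
        )
    return size
```

Files above the limit would need to be compressed or split into shorter segments before submission.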