Text-to-Speech (TTS)

from openai import OpenAI

client = OpenAI(api_key="sk-xxx", base_url="https://crazyrouter.com/v1")

# Generate speech
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello, welcome to the Crazyrouter API service."
)

# Save as MP3 file
response.stream_to_file("output.mp3")
print("Audio saved to output.mp3")
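In newer versions of the openai Python SDK, calling stream_to_file on the plain response object is deprecated in favor of the streaming variant, which writes audio to disk as the bytes arrive instead of buffering the whole file in memory. A minimal sketch, assuming a recent SDK version (the save_speech helper name is our own, not part of the API):

```python
def save_speech(client, text, path):
    """Stream TTS audio straight to disk as it is generated.

    `client` is an OpenAI client as created above; `path` is the output file.
    """
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice="alloy",
        input=text,
    ) as response:
        response.stream_to_file(path)
```

Usage: save_speech(client, "Hello, welcome to the Crazyrouter API service.", "output.mp3")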

Available Voices

Voice     Characteristics
alloy     Neutral, balanced
echo      Male, calm
fable     Male, warm
onyx      Male, deep
nova      Female, lively
shimmer   Female, soft

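To compare the voices above, one short loop can generate a sample for each. The sketch below reuses the `client` created earlier; VOICES, generate_voice_samples, and the sample_<voice>.mp3 naming scheme are our own, not part of the API:

```python
# The six voices listed in the table above
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]

def generate_voice_samples(client, text):
    """Generate one MP3 per voice; returns the list of file names written."""
    files = []
    for voice in VOICES:
        response = client.audio.speech.create(
            model="tts-1",
            voice=voice,
            input=text,
        )
        filename = f"sample_{voice}.mp3"  # hypothetical naming scheme
        response.stream_to_file(filename)
        files.append(filename)
    return files
```

Usage: generate_voice_samples(client, "Testing this voice.")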
High Quality TTS

# Use tts-1-hd for higher audio quality
response = client.audio.speech.create(
    model="tts-1-hd",
    voice="nova",
    input="High quality speech synthesis example",
    response_format="opus",  # Supports mp3, opus, aac, flac
    speed=1.0  # 0.25 to 4.0
)

response.stream_to_file("output_hd.opus")
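Since the API rejects speed values outside the 0.25 to 4.0 range, it can be worth clamping user-supplied values before making the request. The clamp_speed guard below is our own helper, not part of the SDK:

```python
def clamp_speed(speed):
    """Clamp a requested playback speed into the API's accepted 0.25-4.0 range."""
    return max(0.25, min(4.0, speed))
```

For example, clamp_speed(10) returns 4.0, so a request built with it never fails range validation.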

Speech-to-Text (STT)

# Whisper speech recognition
with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en"  # Optional; specifying the language improves accuracy
    )

print(transcript.text)

Transcription with Timestamps

with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["segment"]
    )

# Recent SDK versions return segments as objects with attribute access
for segment in transcript.segments:
    print(f"[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}")

Audio Translation

Translate non-English audio to English text:

with open("foreign_audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file
    )

print(translation.text)  # English output

Supported TTS models include tts-1, tts-1-hd, and gpt-4o-mini-tts. STT uses the whisper-1 model.