Overview

Audio transcription allows you to convert speech from audio files into text using state-of-the-art AI models. CyrionAI provides access to Whisper models for accurate, multilingual transcription.

Basic Usage

Simple Audio Transcription

import openai

client = openai.OpenAI(
    api_key="your-api-key",
    base_url="https://ai.cyrionlabs.org/v1"
)

# Transcribe an audio file
with open("meeting_recording.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(response.text)  # Transcribed text

Transcription with Language Specification

with open("spanish_interview.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="es"  # Spanish
    )

print(response.text)

Parameters

Required Parameters

| Parameter | Type | Description |
|---|---|---|
| file | file | The audio file to transcribe (MP3, MP4, M4A, WAV, etc.) |

Optional Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | "whisper-1" | The model to use for transcription |
| language | string | null | Language code (e.g., "en", "es", "fr") |
| prompt | string | null | Contextual prompt to improve accuracy |
| response_format | string | "json" | Response format ("json", "text", "srt", "verbose_json") |
| temperature | number | 0 | Controls randomness (0-1) |

Supported Audio Formats

CyrionAI supports a wide range of audio formats:
| Format | Extension | Description |
|---|---|---|
| MP3 | .mp3 | Most common audio format |
| MP4 | .mp4 | Video files (audio will be extracted) |
| M4A | .m4a | Apple audio format |
| WAV | .wav | Uncompressed audio |
| FLAC | .flac | Lossless audio |
| OGG | .ogg | Open-source audio format |
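Before uploading, it can help to verify that a file's extension is one the endpoint accepts. A minimal sketch (the accepted set mirrors the table above and is illustrative; adjust it if your account supports other formats):

```python
from pathlib import Path

# Extensions from the table above; treat this set as illustrative.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".flac", ".ogg"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file's extension is in the supported set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS

print(is_supported_audio("meeting_recording.mp3"))  # True
print(is_supported_audio("slides.pdf"))             # False
```

Lowercasing the suffix means uppercase extensions like `.MP3` also pass the check.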

Limits

  • Maximum file size: 25 MB
  • Recommended duration: Up to 60 minutes
  • Supported languages: 99+ languages

Language Support

Automatic Language Detection

# Let the model detect the language automatically
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

Specify Language

# Specify the language for better accuracy
with open("french_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="fr"  # French
    )

Common Language Codes

| Language | Code | Language | Code |
|---|---|---|---|
| English | en | Spanish | es |
| French | fr | German | de |
| Italian | it | Portuguese | pt |
| Russian | ru | Chinese | zh |
| Japanese | ja | Korean | ko |
| Arabic | ar | Hindi | hi |

Response Formats

JSON Format (Default)

with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json"
    )

print(response.text)  # Transcribed text

Text Format

with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )

print(response)  # Plain text

SRT Format (Subtitles)

with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt"
    )

print(response)  # SRT subtitle format

Verbose JSON Format

with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )

# Access detailed information
print(response.text)  # Transcribed text
print(response.language)  # Detected language
print(response.duration)  # Audio duration
print(response.segments)  # Timestamped segments

Using Prompts

Contextual Prompts

Provide context to improve transcription accuracy:
with open("technical_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is a technical meeting about software development and AI implementation."
    )

Specialized Vocabulary

with open("medical_interview.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This conversation includes medical terminology, drug names, and clinical procedures."
    )

Names and Proper Nouns

with open("interview.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="The speaker's name is Dr. Sarah Johnson, and they work at the Community Health Center."
    )

Best Practices

1. Audio Quality

# Good: Clear audio with minimal background noise
# - Use high-quality microphones
# - Record in quiet environments
# - Ensure proper audio levels

# Avoid: Poor audio quality
# - Background noise and echo
# - Low volume or clipping
# - Multiple speakers talking simultaneously

2. File Preparation

# Ensure file is within size limits
import os

file_size = os.path.getsize("audio.mp3") / (1024 * 1024)  # Size in MB
if file_size > 25:
    print("File is too large. Please compress or split the audio.")
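If a file exceeds the limit, one option is to split it into chunks and transcribe each piece separately. The helper below only computes chunk boundaries in seconds; the actual splitting would be done with an external tool such as ffmpeg (the boundary-planning logic is a sketch, not part of the API):

```python
def plan_chunks(duration_s: float, chunk_s: float = 600.0):
    """Split a duration into consecutive (start, end) windows of at most chunk_s seconds."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks

# A 25-minute recording split into 10-minute chunks:
print(plan_chunks(1500.0))  # [(0.0, 600.0), (600.0, 1200.0), (1200.0, 1500.0)]
```

Each (start, end) pair can then be cut out of the source file and sent as its own transcription request; the resulting texts are concatenated in order.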

3. Language Specification

# Specify language when known for better accuracy
with open("spanish_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="es"  # Better accuracy than auto-detection
    )

4. Use Appropriate Prompts

# Provide relevant context
prompt = "This is a nonprofit board meeting discussing fundraising strategies and community outreach programs."

with open("board_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt=prompt
    )

Common Use Cases

Meeting Transcription

# Transcribe board meetings
with open("board_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is a nonprofit board meeting with discussions about budgets, programs, and strategic planning."
    )

# Save transcription to file
with open("meeting_transcript.txt", "w") as f:
    f.write(response.text)

Interview Transcription

# Transcribe donor interviews
with open("donor_interview.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is an interview with a major donor discussing their philanthropic goals and giving history."
    )

Training Content

# Transcribe volunteer training sessions
with open("training_session.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is a volunteer training session covering safety procedures and program guidelines."
    )

Podcast Transcription

# Transcribe nonprofit podcasts
with open("podcast_episode.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt"  # Generate subtitles
    )

# Save as SRT file for video platforms
with open("podcast_subtitles.srt", "w") as f:
    f.write(response)

Error Handling

try:
    with open("audio.mp3", "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
except openai.BadRequestError:
    print("Invalid audio file or format.")
except openai.RateLimitError:
    print("Rate limit exceeded. Please wait before making more requests.")
except openai.APIError as e:
    print(f"API error: {e}")

Advanced Features

Batch Processing

import os

audio_files = ["meeting1.mp3", "meeting2.mp3", "meeting3.mp3"]
transcriptions = []

for file_name in audio_files:
    with open(file_name, "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
        transcriptions.append({
            "file": file_name,
            "text": response.text
        })

# Save all transcriptions
for transcript in transcriptions:
    output_file = transcript["file"].replace(".mp3", "_transcript.txt")
    with open(output_file, "w") as f:
        f.write(transcript["text"])

Timestamped Segments

with open("long_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )

# Access timestamped segments
for segment in response.segments:
    start_time = segment.start
    end_time = segment.end
    text = segment.text
    print(f"[{start_time:.2f}s - {end_time:.2f}s] {text}")
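The segment timestamps can also be converted into SRT-style time strings if you want to build subtitles yourself rather than requesting response_format="srt". A small formatting helper (pure Python, no API assumptions):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

print(srt_timestamp(3.5))      # 00:00:03,500
print(srt_timestamp(3725.25))  # 01:02:05,250
```

Applying this to each segment's start and end values gives the timing lines of an SRT cue.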

Language Detection

with open("unknown_language.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )

print(f"Detected language: {response.language}")
print(f"Transcribed text: {response.text}")

Examples

Nonprofit Board Meeting

with open("board_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is a nonprofit board meeting discussing quarterly financial reports, upcoming fundraising events, and strategic planning for community programs.",
        response_format="verbose_json"
    )

# Create meeting minutes
with open("meeting_minutes.txt", "w") as f:
    f.write("BOARD MEETING TRANSCRIPT\n")
    f.write("=" * 50 + "\n\n")
    f.write(response.text)

Volunteer Training Session

with open("training.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is a volunteer training session covering safety protocols, program guidelines, and volunteer responsibilities.",
        response_format="srt"
    )

# Save as training documentation
with open("training_subtitles.srt", "w") as f:
    f.write(response)

Donor Interview

with open("donor_interview.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is an interview with a major donor discussing their philanthropic interests, giving history, and future donation plans.",
        response_format="json"
    )

from datetime import datetime

# Create donor profile
donor_notes = f"""
DONOR INTERVIEW NOTES
====================
Date: {datetime.now().strftime('%Y-%m-%d')}
Interviewer: Development Team

Key Points:
{response.text}

Action Items:
- Follow up on discussed donation opportunities
- Send thank you letter
- Schedule follow-up meeting
"""

with open("donor_notes.txt", "w") as f:
    f.write(donor_notes)

Performance Considerations

Processing Time

  • Transcription typically takes 30-60 seconds per minute of audio
  • Processing time varies by file size and complexity
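For batch planning, a rough estimate of total wait time can be computed from these figures (30-60 seconds per minute of audio, per the bullet above; the helper and the 45 s/min midpoint are illustrative):

```python
def estimate_processing_seconds(audio_minutes: float, per_minute_s: float = 45.0) -> float:
    """Rough wall-clock estimate: per_minute_s seconds of processing per minute of audio."""
    return audio_minutes * per_minute_s

print(estimate_processing_seconds(30))  # 1350.0 seconds for a 30-minute recording
```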

Rate Limits

  • Audio transcription: 50 requests per minute
  • Plan accordingly for batch processing
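To stay under the limit in a batch job, you can space requests out and retry with exponential backoff when a rate-limit error occurs. A sketch (the 50-requests-per-minute figure comes from above; the retry wrapper itself is illustrative):

```python
import time

def with_backoff(fn, max_retries=3, base_delay=1.0, is_rate_limit=lambda e: True):
    """Call fn(), retrying with exponential backoff when is_rate_limit(error) is True."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as e:
            if attempt == max_retries or not is_rate_limit(e):
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# 50 requests/minute => at most one request every 1.2 seconds on average.
MIN_INTERVAL = 60.0 / 50
```

In real use you would pass `is_rate_limit=lambda e: isinstance(e, openai.RateLimitError)` and wrap each transcription call, sleeping at least MIN_INTERVAL between requests.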

File Optimization

  • Compress audio files to reduce upload time
  • Use appropriate audio formats (MP3 recommended)
  • Ensure good audio quality for better accuracy
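Compression is typically done with an external tool such as ffmpeg before upload. The snippet below only builds the command line (it assumes ffmpeg is installed if you actually run it; the mono downmix and 64k bitrate are illustrative choices for speech):

```python
def ffmpeg_compress_cmd(src: str, dst: str, bitrate: str = "64k"):
    """Build an ffmpeg command that downmixes to mono (-ac 1) and re-encodes at a low audio bitrate (-b:a)."""
    return ["ffmpeg", "-i", src, "-ac", "1", "-b:a", bitrate, dst]

cmd = ffmpeg_compress_cmd("raw_meeting.wav", "meeting_compressed.mp3")
print(" ".join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

Speech remains intelligible at low bitrates, so this usually brings long recordings well under the 25 MB limit with little accuracy cost.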

Next Steps