Overview

Audio transcription allows you to convert speech from audio files into text using state-of-the-art AI models. CyrionAI provides access to Whisper models for accurate, multilingual transcription.

Basic Usage

Simple Audio Transcription

import openai

client = openai.OpenAI(
    api_key="your-api-key",
    base_url="https://ai.cyrionlabs.org/v1"
)

# Transcribe an audio file
with open("meeting_recording.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(response.text)  # Transcribed text

Transcription with Language Specification

with open("spanish_interview.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="es"  # Spanish
    )

print(response.text)

Parameters

Required Parameters

| Parameter | Type | Description |
|---|---|---|
| file | file | The audio file to transcribe (MP3, MP4, M4A, WAV, etc.) |

Optional Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | "whisper-1" | The model to use for transcription |
| language | string | null | Language code (e.g., "en", "es", "fr") |
| prompt | string | null | Contextual prompt to improve accuracy |
| response_format | string | "json" | Response format ("json", "text", "srt", "verbose_json") |
| temperature | number | 0 | Controls randomness (0-1) |

Supported Audio Formats

CyrionAI supports a wide range of audio formats:
| Format | Extension | Description |
|---|---|---|
| MP3 | .mp3 | Most common audio format |
| MP4 | .mp4 | Video files (audio will be extracted) |
| M4A | .m4a | Apple audio format |
| WAV | .wav | Uncompressed audio |
| FLAC | .flac | Lossless audio |
| OGG | .ogg | Open-source audio format |
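Before uploading, it can help to verify that a file's extension is one the endpoint accepts. A minimal sketch (the accepted set mirrors the table above and is illustrative; adjust it if your account supports other formats):

```python
from pathlib import Path

# Extensions from the table above; treat this set as illustrative.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".flac", ".ogg"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file's extension is in the supported set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS

print(is_supported_audio("meeting_recording.mp3"))  # True
print(is_supported_audio("slides.pdf"))             # False
```

Lowercasing the suffix means uppercase extensions like `.MP3` also pass the check.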

Limits

  • Maximum file size: 25 MB
  • Recommended duration: Up to 60 minutes
  • Supported languages: 99+ languages

Language Support

Automatic Language Detection

# Let the model detect the language automatically
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

Specify Language

# Specify the language for better accuracy
with open("french_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="fr"  # French
    )

Common Language Codes

| Language | Code | Language | Code |
|---|---|---|---|
| English | en | Spanish | es |
| French | fr | German | de |
| Italian | it | Portuguese | pt |
| Russian | ru | Chinese | zh |
| Japanese | ja | Korean | ko |
| Arabic | ar | Hindi | hi |

Response Formats

JSON Format (Default)

with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json"
    )

print(response.text)  # Transcribed text

Text Format

with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )

print(response)  # Plain text

SRT Format (Subtitles)

with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt"
    )

print(response)  # SRT subtitle format

Verbose JSON Format

with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )

# Access detailed information
print(response.text)  # Transcribed text
print(response.language)  # Detected language
print(response.duration)  # Audio duration
print(response.segments)  # Timestamped segments

Using Prompts

Contextual Prompts

Provide context to improve transcription accuracy:
with open("technical_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is a technical meeting about software development and AI implementation."
    )

Specialized Vocabulary

with open("medical_interview.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This conversation includes medical terminology, drug names, and clinical procedures."
    )

Names and Proper Nouns

with open("interview.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="The speaker's name is Dr. Sarah Johnson, and they work at the Community Health Center."
    )

Best Practices

1. Audio Quality

# Good: Clear audio with minimal background noise
# - Use high-quality microphones
# - Record in quiet environments
# - Ensure proper audio levels

# Avoid: Poor audio quality
# - Background noise and echo
# - Low volume or clipping
# - Multiple speakers talking simultaneously

2. File Preparation

# Ensure file is within size limits
import os

file_size = os.path.getsize("audio.mp3") / (1024 * 1024)  # Size in MB
if file_size > 25:
    print("File is too large. Please compress or split the audio.")
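If a file exceeds the limit, one option is to split it into chunks and transcribe each piece separately. The helper below only computes chunk boundaries in seconds; the actual splitting would be done with an external tool such as ffmpeg (the boundary-planning logic is a sketch, not part of the API):

```python
def plan_chunks(duration_s: float, chunk_s: float = 600.0):
    """Split a duration into consecutive (start, end) windows of at most chunk_s seconds."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks

# A 25-minute recording split into 10-minute chunks:
print(plan_chunks(1500.0))  # [(0.0, 600.0), (600.0, 1200.0), (1200.0, 1500.0)]
```

Each (start, end) pair can then be cut out of the source file and sent as its own transcription request; the resulting texts are concatenated in order.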

3. Language Specification

# Specify language when known for better accuracy
with open("spanish_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="es"  # Better accuracy than auto-detection
    )

4. Use Appropriate Prompts

# Provide relevant context
prompt = "This is a nonprofit board meeting discussing fundraising strategies and community outreach programs."

with open("board_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt=prompt
    )

Common Use Cases

Meeting Transcription

# Transcribe board meetings
with open("board_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is a nonprofit board meeting with discussions about budgets, programs, and strategic planning."
    )

# Save transcription to file
with open("meeting_transcript.txt", "w") as f:
    f.write(response.text)

Interview Transcription

# Transcribe donor interviews
with open("donor_interview.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is an interview with a major donor discussing their philanthropic goals and giving history."
    )

Training Content

# Transcribe volunteer training sessions
with open("training_session.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is a volunteer training session covering safety procedures and program guidelines."
    )

Podcast Transcription

# Transcribe nonprofit podcasts
with open("podcast_episode.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt"  # Generate subtitles
    )

# Save as SRT file for video platforms
with open("podcast_subtitles.srt", "w") as f:
    f.write(response)

Error Handling

try:
    with open("audio.mp3", "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
except openai.BadRequestError:
    print("Invalid audio file or format.")
except openai.RateLimitError:
    print("Rate limit exceeded. Please wait before making more requests.")
except openai.APIError as e:
    print(f"API error: {e}")

Advanced Features

Batch Processing

import os

audio_files = ["meeting1.mp3", "meeting2.mp3", "meeting3.mp3"]
transcriptions = []

for file_name in audio_files:
    with open(file_name, "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
        transcriptions.append({
            "file": file_name,
            "text": response.text
        })

# Save all transcriptions
for transcript in transcriptions:
    output_file = transcript["file"].replace(".mp3", "_transcript.txt")
    with open(output_file, "w") as f:
        f.write(transcript["text"])

Timestamped Segments

with open("long_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )

# Access timestamped segments
for segment in response.segments:
    start_time = segment.start
    end_time = segment.end
    text = segment.text
    print(f"[{start_time:.2f}s - {end_time:.2f}s] {text}")
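The segment timestamps can also be converted into SRT-style time strings if you want to build subtitles yourself rather than requesting response_format="srt". A small formatting helper (pure Python, no API assumptions):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

print(srt_timestamp(3.5))      # 00:00:03,500
print(srt_timestamp(3725.25))  # 01:02:05,250
```

Applying this to each segment's start and end values gives the timing lines of an SRT cue.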

Language Detection

with open("unknown_language.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json"
    )

print(f"Detected language: {response.language}")
print(f"Transcribed text: {response.text}")

Examples

Nonprofit Board Meeting

with open("board_meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is a nonprofit board meeting discussing quarterly financial reports, upcoming fundraising events, and strategic planning for community programs.",
        response_format="verbose_json"
    )

# Create meeting minutes
with open("meeting_minutes.txt", "w") as f:
    f.write("BOARD MEETING TRANSCRIPT\n")
    f.write("=" * 50 + "\n\n")
    f.write(response.text)

Volunteer Training Session

with open("training.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is a volunteer training session covering safety protocols, program guidelines, and volunteer responsibilities.",
        response_format="srt"
    )

# Save as training documentation
with open("training_subtitles.srt", "w") as f:
    f.write(response)

Donor Interview

with open("donor_interview.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="This is an interview with a major donor discussing their philanthropic interests, giving history, and future donation plans.",
        response_format="json"
    )

from datetime import datetime

# Create donor profile
donor_notes = f"""
DONOR INTERVIEW NOTES
====================
Date: {datetime.now().strftime('%Y-%m-%d')}
Interviewer: Development Team

Key Points:
{response.text}

Action Items:
- Follow up on discussed donation opportunities
- Send thank you letter
- Schedule follow-up meeting
"""

with open("donor_notes.txt", "w") as f:
    f.write(donor_notes)

Performance Considerations

Processing Time

  • Transcription typically takes 30-60 seconds per minute of audio
  • Processing time varies by file size and complexity
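For batch planning, a rough estimate of total wait time can be computed from these figures (30-60 seconds per minute of audio, per the bullet above; the helper and the 45 s/min midpoint are illustrative):

```python
def estimate_processing_seconds(audio_minutes: float, per_minute_s: float = 45.0) -> float:
    """Rough wall-clock estimate: per_minute_s seconds of processing per minute of audio."""
    return audio_minutes * per_minute_s

print(estimate_processing_seconds(30))  # 1350.0 seconds for a 30-minute recording
```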

Rate Limits

  • Audio transcription: 50 requests per minute
  • Plan accordingly for batch processing
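To stay under the limit in a batch job, you can space requests out and retry with exponential backoff when a rate-limit error occurs. A sketch (the 50-requests-per-minute figure comes from above; the retry wrapper itself is illustrative):

```python
import time

def with_backoff(fn, max_retries=3, base_delay=1.0, is_rate_limit=lambda e: True):
    """Call fn(), retrying with exponential backoff when is_rate_limit(error) is True."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as e:
            if attempt == max_retries or not is_rate_limit(e):
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# 50 requests/minute => at most one request every 1.2 seconds on average.
MIN_INTERVAL = 60.0 / 50
```

In real use you would pass `is_rate_limit=lambda e: isinstance(e, openai.RateLimitError)` and wrap each transcription call, sleeping at least MIN_INTERVAL between requests.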

File Optimization

  • Compress audio files to reduce upload time
  • Use appropriate audio formats (MP3 recommended)
  • Ensure good audio quality for better accuracy
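Compression is typically done with an external tool such as ffmpeg before upload. The snippet below only builds the command line (it assumes ffmpeg is installed if you actually run it; the mono downmix and 64k bitrate are illustrative choices for speech):

```python
def ffmpeg_compress_cmd(src: str, dst: str, bitrate: str = "64k"):
    """Build an ffmpeg command that downmixes to mono (-ac 1) and re-encodes at a low audio bitrate (-b:a)."""
    return ["ffmpeg", "-i", src, "-ac", "1", "-b:a", bitrate, dst]

cmd = ffmpeg_compress_cmd("raw_meeting.wav", "meeting_compressed.mp3")
print(" ".join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

Speech remains intelligible at low bitrates, so this usually brings long recordings well under the 25 MB limit with little accuracy cost.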

Next Steps