assemblyai-performance-tuning

'Optimize AssemblyAI API performance with caching, parallel processing,

v1.0.0

Jeremy Longshore

MIT

3 Tools

assemblyai-pack Plugin

saas packs Category

Allowed Tools
        ReadWriteEdit
      

Provided by Plugin

assemblyai-pack

Claude Code skill pack for AssemblyAI (18 skills)

saas packs v1.0.0

View Plugin

Installation

This skill is included in the assemblyai-pack plugin:

/plugin install assemblyai-pack@claude-code-plugins-plus

Click to copy

Instructions

AssemblyAI Performance Tuning

Overview

Optimize AssemblyAI transcription performance through model selection, parallel processing, caching, and webhook-based architectures.

Prerequisites

assemblyai package installed
Understanding of async patterns
Redis or in-memory cache available (optional)

Latency Benchmarks (Actual)

Async Transcription

Audio Duration	Approx. Processing Time	Notes
30 seconds	~10-15 seconds	Includes queue time
5 minutes	~30-60 seconds	Scales sub-linearly
1 hour	~3-5 minutes	Depends on queue load
10 hours	~15-30 minutes	Max async duration

Streaming

Metric	Value
First partial transcript	~300ms (P50)
Final transcript latency	~500ms (P50)
End-of-turn detection	Automatic with endpointing

Model Speed vs. Accuracy

Model	Speed	Accuracy	Price/hr
`nano`	Fastest	Good	$0.12
`best` (Universal-3)	Standard	Highest	$0.37
`nova-3` (streaming)	Real-time	High	$0.47
`nova-3-pro` (streaming)	Real-time	Highest	$0.47

Instructions

Step 1: Choose the Right Model


import { AssemblyAI } from 'assemblyai';

const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY!,
});

// For highest accuracy (default)
const accurate = await client.transcripts.transcribe({
  audio: audioUrl,
  speech_model: 'best',
});

// For fastest processing and lowest cost
const fast = await client.transcripts.transcribe({
  audio: audioUrl,
  speech_model: 'nano',
});

Step 2: Parallel Batch Processing


import PQueue from 'p-queue';

const queue = new PQueue({ concurrency: 10 });

async function batchTranscribe(audioUrls: string[]) {
  const results = await Promise.all(
    audioUrls.map(url =>
      queue.add(() =>
        client.transcripts.transcribe({ audio: url, speech_model: 'nano' })
      )
    )
  );

  return results.filter(t => t.status === 'completed');
}

// Process 100 files with 10 concurrent jobs
const urls = Array.from({ length: 100 }, (_, i) => `https://storage.example.com/audio-${i}.mp3`);
const transcripts = await batchTranscribe(urls);
console.log(`Completed: ${transcripts.length}/${urls.length}`);

Step 3: Use Webhooks Instead of Polling


// SLOW: transcribe() polls every 3 seconds until done
const slow = await client.transcripts.transcribe({ audio: audioUrl });

// FAST: submit() returns immediately, webhook notifies on completion
const fast = await client.transcripts.submit({
  audio: audioUrl,
  webhook_url: 'https://your-app.com/webhooks/assemblyai',
});
// Your webhook handler processes the result — no polling overhead

Step 4: Cache Transcript Results


import { LRUCache } from 'lru-cache';
import type { Transcript } from 'assemblyai';

const transcriptCache = new LRUCache<string, Transcript>({
  max: 500,
  ttl: 60 * 60 * 1000, // 1 hour
});

async function getCachedTranscript(transcriptId: string): Promise<Transcript> {
  const cached = transcriptCache.get(transcriptId);
  if (cached) return cached;

  const transcript = await client.transcripts.get(transcriptId);
  if (transcript.status === 'completed') {
    transcriptCache.set(transcriptId, transcript);
  }
  return transcript;
}

Step 5: Redis Cache for Distributed Systems


import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

async function getCachedTranscriptRedis(transcriptId: string): Promise<Transcript> {
  const cached = await redis.get(`transcript:${transcriptId}`);
  if (cached) return JSON.parse(cached);

  const transcript = await client.transcripts.get(transcriptId);
  if (transcript.status === 'completed') {
    await redis.setex(
      `transcript:${transcriptId}`,
      3600, // 1 hour TTL
      JSON.stringify(transcript)
    );
  }
  return transcript;
}

Step 6: Minimize Feature Overhead


// Only enable features you actually need — each adds processing time

// Minimal (fastest)
const minimal = await client.transcripts.transcribe({
  audio: audioUrl,
  speech_model: 'nano',
  punctuate: true,
  format_text: true,
});

// Full intelligence (slower, more expensive)
const full = await client.transcripts.transcribe({
  audio: audioUrl,
  speech_model: 'best',
  speaker_labels: true,
  sentiment_analysis: true,
  entity_detection: true,
  auto_highlights: true,
  content_safety: true,
  iab_categories: true,
  summarization: true,
  summary_type: 'bullets',
});

Step 7: Performance Monitoring


async function timedTranscribe(audioUrl: string, options: Record<string, any> = {}) {
  const start = Date.now();
  const transcript = await client.transcripts.transcribe({
    audio: audioUrl,
    ...options,
  });
  const durationMs = Date.now() - start;

  const stats = {
    transcriptId: transcript.id,
    status: transcript.status,
    audioDuration: transcript.audio_duration,
    processingTimeMs: durationMs,
    ratio: transcript.audio_duration
      ? (durationMs / 1000 / transcript.audio_duration).toFixed(2)
      : 'N/A',
    wordCount: transcript.words?.length ?? 0,
    model: options.speech_model ?? 'best',
  };

  console.log('Transcription stats:', stats);
  return { transcript, stats };
}

Output

Optimal model selection based on speed/accuracy/cost trade-offs
Parallel batch processing with concurrency control
Webhook-based architecture (eliminates polling overhead)
In-memory and Redis caching for transcript retrieval
Performance monitoring with processing time ratios

Error Handling

Issue	Cause	Solution
Slow transcription	Large file + best model	Use `nano` model or split audio
Queue backlog	Too many concurrent submissions	Limit concurrency with p-queue
Cache stale data	Transcript re-processed	Set appropriate TTL, invalidate on webhook
Polling overhead	Using `transcribe()` for many files	Switch to `submit()` + webhooks

Resources

Next Steps

For cost optimization, see assemblyai-cost-tuning.

Allowed Tools

Provided by Plugin

assemblyai-pack

Installation

Instructions

AssemblyAI Performance Tuning

Overview

Prerequisites

Latency Benchmarks (Actual)

Async Transcription

Streaming

Model Speed vs. Accuracy

Instructions

Step 1: Choose the Right Model

Step 2: Parallel Batch Processing

Step 3: Use Webhooks Instead of Polling

Step 4: Cache Transcript Results

Step 5: Redis Cache for Distributed Systems

Step 6: Minimize Feature Overhead

Step 7: Performance Monitoring

Output

Error Handling

Resources

Next Steps

Ready to use assemblyai-pack?

Related Skills

abridge-ci-integration

abridge-common-errors

abridge-core-workflow-a

abridge-core-workflow-b

abridge-cost-tuning

abridge-debug-bundle