How long does it take to transcribe one hour of audio?
Short answer
AI typically transcribes one hour of audio in 1-3 minutes; a skilled human transcriptionist needs 4-6 hours plus review time.
Turnaround time depends primarily on whether a human or an AI does the work and, for AI, on which model size is involved.
AI transcription: A 60-minute audio file typically processes in 1-3 minutes on modern GPU infrastructure. Larger models (Whisper Large v3 or premium-tier models tuned for accuracy) take a bit longer, around 3-5 minutes per hour of audio. These figures cover processing only: if the provider's queue is busy, your wait grows by the time the job spends queued. Providers like DeepScript run a priority queue for premium jobs so they jump ahead of standard ones.
Human transcription: A skilled transcriptionist gets through roughly 15 minutes of audio per hour of typing, which works out to about 4 hours of work per audio hour. In practice, one hour of audio takes 3-5 hours of human time on clean material and 6-8 hours when the material is hard (multiple speakers, dialect, jargon, poor audio). Add 24-48 hours of queue or booking lead time on top, since work usually sits in a backlog.
Hybrid approach: Many professional services use AI as a first pass and have a human edit it. That cuts human time to 1-2 hours per audio hour at near-human accuracy. Typical turnarounds: 24 hours standard, 4-6 hours with a rush fee.
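To make the arithmetic concrete, here is a minimal Python sketch that scales the per-hour figures above to a recording of any length. It assumes effort grows linearly with audio length, which is a simplification, and it leaves out queue and booking lead time. The names `Range`, `PER_AUDIO_HOUR` and `estimate_work_time` are made up for this illustration and are not part of any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A low/high estimate in a single unit (minutes or hours)."""
    low: float
    high: float

    def scaled(self, factor: float) -> "Range":
        return Range(self.low * factor, self.high * factor)


# Work time per ONE hour of audio, copied from the figures in this article.
PER_AUDIO_HOUR = {
    "ai_standard_minutes": Range(1, 3),
    "ai_large_model_minutes": Range(3, 5),
    "human_clean_hours": Range(3, 5),
    "human_hard_hours": Range(6, 8),
    "hybrid_edit_hours": Range(1, 2),
}


def estimate_work_time(audio_hours: float) -> dict:
    """Scale each per-hour range by the audio length.

    Assumption: effort grows linearly with audio length. Queue and
    booking lead time are NOT included.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return {name: r.scaled(audio_hours) for name, r in PER_AUDIO_HOUR.items()}


if __name__ == "__main__":
    # Example: a 2.5-hour recording.
    for name, r in estimate_work_time(2.5).items():
        print(f"{name}: {r.low:g} to {r.high:g}")
```

For a delivery date rather than pure work time, add the lead time on top: 24-48 hours for human work, or the 24-hour standard and 4-6 hour rush turnarounds for hybrid services.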
What slows things down: long silences still get processed, so they save no time; poor audio quality can lengthen processing; and many tiny files are often slower than a few big ones because every job carries a fixed overhead.
Live streaming: For live transcription over WebSocket, latency is the relevant metric instead of turnaround. Good systems return interim results in 300-800 ms and finalized text in 1-2 seconds.
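The point about many small files can be illustrated with a rough model: total time is processing time plus a fixed overhead per job. The sketch below assumes jobs run one after another, uses a hypothetical 10-second overhead per job (an illustrative number, not a measured one), and takes 2 minutes per audio hour as the midpoint of the 1-3 minute range quoted above.

```python
def batch_minutes(
    file_minutes,
    per_job_overhead_s=10.0,            # hypothetical fixed cost per job
    processing_min_per_audio_hour=2.0,  # midpoint of the 1-3 minute range
):
    """Estimate total minutes to transcribe a batch of files sequentially."""
    processing = sum(file_minutes) / 60 * processing_min_per_audio_hour
    overhead = len(file_minutes) * per_job_overhead_s / 60
    return processing + overhead


# The same hour of audio, packaged two ways.
one_big_file = batch_minutes([60])
sixty_small_files = batch_minutes([1] * 60)
print(f"1 x 60 min: {one_big_file:.1f} min")
print(f"60 x 1 min: {sixty_small_files:.1f} min")
```

Under these assumptions the sixty small files take several times longer than the single file, even though the audio is identical. Real providers may process jobs in parallel, which shrinks the gap.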
Rule of thumb: budget half a day to a full day per audio hour when a human is in the loop. For pure AI transcription: by the time you've made coffee, it's done.