What is automatic transcription?
Short answer
Automatic transcription is the AI-driven conversion of spoken audio into written text, done in seconds to minutes without a human typist in the loop.
Automatic transcription is the process in which software converts spoken audio (from a recording or a live stream) into written text. Instead of a human listening and typing, an AI model (typically a neural network such as Whisper, Wav2Vec, or a model built on the Conformer architecture) does the work in seconds to minutes.
Under the hood, a classic pipeline runs several steps in sequence. The audio is sliced into short, overlapping time windows, turned into a spectrogram, and passed through an acoustic model that detects phonemes and sub-word units. A language model then assembles those units into the most likely words and sentences, using context, grammar, and vocabulary as guides. Modern end-to-end models fuse the acoustic and language stages into one neural network, and many can also emit word-level timestamps; speaker labels (diarization) usually come from a separate model run on the same audio.
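As a rough illustration of the first two steps (slicing the audio into short time windows and turning them into a spectrogram), here is a minimal pure-Python sketch. The 16 kHz sample rate, the 400-sample (25 ms) window, the 160-sample (10 ms) hop and the Hann window are common ASR choices assumed for this example, not details of any particular engine. Production systems use an FFT library instead of the naive DFT below.

```python
import cmath
import math

# Assumed parameters (common ASR choices, not taken from any specific engine):
# 16 kHz audio, 400-sample (25 ms) windows, 160-sample (10 ms) hop.
SAMPLE_RATE = 16000
FRAME_LEN = 400
HOP = 160


def frame_signal(samples, frame_len=FRAME_LEN, hop=HOP):
    """Slice the signal into overlapping windows; a trailing partial window is dropped."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def hann_window(n):
    """Periodic Hann window, tapering each frame's edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(frame, window):
    """Naive DFT magnitudes for bins 0..n/2 (an FFT library would be used in practice)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, window)]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_len=FRAME_LEN, hop=HOP):
    """One magnitude spectrum per time window: rows are time, columns are frequency bins."""
    window = hann_window(frame_len)
    return [magnitude_spectrum(f, window) for f in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    # 100 ms of a pure 1 kHz test tone.
    tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE // 10)]
    spec = spectrogram(tone)
    peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
    print(len(spec), "frames x", len(spec[0]), "bins")  # 8 frames x 201 bins
    print("peak at", peak_bin * SAMPLE_RATE / FRAME_LEN, "Hz")  # peak at 1000.0 Hz
```

Each row is one time slice. The 1 kHz tone shows up as a peak in bin 25 (25 × 16000 / 400 = 1000 Hz) of every frame. Real systems usually go one step further and map the bins onto the mel scale with log compression (Whisper, for example, takes a log-mel spectrogram as input) before the acoustic model sees the data. This sketch stops at the linear magnitude spectrogram.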
Compared with human transcription, the automatic variant is dramatically faster (a one-hour file usually processes in 1-3 minutes) and cheaper: good providers charge around €0.18-0.30 per hour of audio, whereas human transcription runs €60-120 per hour. On clean recordings with clear speech, the best systems today achieve word error rates (WER) below 5%, which corresponds to roughly 95% word accuracy or better.
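Word error rate is the usual way to score these systems: WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the system's output into a reference transcript, and N is the number of words in the reference. A minimal sketch using word-level Levenshtein distance follows. It assumes simple lowercasing and whitespace tokenization; real evaluation tools usually also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Assumed normalization: lowercase + whitespace split only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (word-level Levenshtein distance).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over a lazy dog"
    # 2 substitutions out of 9 reference words; prints "WER: 22.2%"
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Because insertions count against the system, WER can exceed 100%, so reading "accuracy = 1 − WER" is only a rough approximation. It is a reasonable one at the low error rates quoted above.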
Where automatic transcription still struggles: heavy regional accents (Swiss German, broad Glaswegian), overlapping speakers, jargon without custom vocabulary support, poor audio quality, and very soft voices. For the highest-stakes use cases, such as court records, clinical documentation, and broadcast subtitles, teams typically use automatic transcription as a first pass and have a human edit the output.
DeepScript uses a Whisper-compatible engine but runs it entirely on its own servers in Germany — no offloading to US cloud APIs.
Still have a question?
Try three transcriptions for free, or drop us a line. We answer within 24 hours, compliance questions included.