DeepScript
Question

What is automatic transcription?

Short answer

Automatic transcription is the AI-driven conversion of spoken audio into written text — in seconds, without a human typist in the loop.

Automatic transcription is the process by which software converts spoken audio (from a recording or a live stream) into written text. Instead of a human listening and typing, an AI model does the work in seconds to minutes. That model is typically a neural network such as Whisper, wav2vec 2.0, or a Conformer architecture.

Under the hood, several steps happen in sequence. The audio is sliced into short, overlapping time windows, each window is converted into a frequency spectrum (stacked together, these form a spectrogram), and the result is passed through an acoustic model that detects phonemes or sub-word units. A language model then assembles those units into the most likely words and sentences, using context, grammar, and vocabulary as guides. Modern end-to-end models fuse the acoustic and language stages into one neural network. Many systems also emit word-level timestamps and speaker labels (diarization), although diarization is often handled by a separate component rather than by the speech model itself.
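To make the first two steps concrete, here is a minimal sketch of windowing and spectrogram computation using only the Python standard library. The frame length, hop size, sample rate, and all function names are illustrative assumptions, not DeepScript's actual settings or code; production systems use an FFT library and usually a mel-scaled spectrogram.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Slice a list of samples into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def hann(n):
    """Hann window: tapers the frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (demo only, not an FFT)."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_len=64, hop=32):
    """One magnitude spectrum per frame: rows are time, columns are frequency."""
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]


# Synthetic input (assumption): 400 samples of a 1 kHz tone at 8 kHz sampling.
SAMPLE_RATE = 8000
FRAME_LEN = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(400)]

spec = spectrogram(tone, frame_len=FRAME_LEN, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(len(spec), "frames,", len(spec[0]), "frequency bins, peak near", peak_hz, "Hz")
```

The acoustic model never sees the raw waveform in this kind of pipeline. It sees the grid of time frames by frequency bins that `spectrogram` returns, which is why a pure tone shows up as a single bright column.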

Compared to human transcription, the automatic variant is dramatically faster: a one-hour file usually processes in 1-3 minutes, roughly 20 to 60 times faster than real time. It is also far cheaper. Good providers charge around €0.18-0.30 per hour of audio, whereas human transcription runs €60-120 per hour. On clean recordings with clear speech, the best systems today achieve word error rates (WER) below 5%, which corresponds to a word accuracy of roughly 95% or higher.
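Word error rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance. The function name and the simple lowercase-and-split normalization are assumptions for illustration; real evaluations normally also strip punctuation and normalize numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}")  # one substitution + one deletion over 9 words
```

Because insertions count as errors, WER can exceed 100% on very poor output, so "accuracy = 100% minus WER" is a convenient approximation rather than an exact identity.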

Where automatic transcription still struggles: heavy regional accents (Swiss German, broad Glaswegian), overlapping speakers, jargon without custom vocabulary support, poor audio quality, and very soft voices. For the highest-stakes use cases — court records, clinical documentation, broadcast subtitles — teams typically use automatic transcription as a first pass and have a human edit the output.

DeepScript uses a Whisper-compatible engine but runs it entirely on its own servers in Germany — no offloading to US cloud APIs.

Still have a question?

Three transcriptions free to try. Or drop us a line — we answer within 24 hours, compliance questions included.
