DeepScript
Question

What's the difference between transcription, captions, and subtitles?

Short answer

Transcription is the raw text; captions display it in sync with video; subtitles also translate it into another language.

These three terms get mixed up daily, but they describe different things.

A transcription is the full written text of an audio or video recording — usually as flowing prose, often with speaker labels and timestamps. It comes out as .txt, .docx, or structured .json and serves as the foundation for further work: search indexes, quote pulling, summaries, translations, or caption creation.

Captions are short text segments timed precisely to the audio and displayed on the video as it plays. They're aimed primarily at deaf and hard-of-hearing viewers and often include non-speech cues like [laughter], [music], or [door slams]. Caption files come in formats like SRT (.srt) or WebVTT (.vtt). Both pair each short text block of 1-2 lines with a start and an end timestamp at millisecond precision: HH:MM:SS,mmm in SRT (comma before the milliseconds) and HH:MM:SS.mmm in WebVTT (period before the milliseconds).
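To make the two timestamp styles concrete, here is a minimal Python sketch that formats a single cue both ways. The function names (format_timestamp, srt_cue, vtt_cue) are made up for this illustration and are not part of any DeepScript API.

```python
def format_timestamp(seconds: float, fmt: str = "srt") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."  # the only difference between the two
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """One SRT cue: sequence number, timing line, then the text."""
    return f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"


def vtt_cue(start: float, end: float, text: str) -> str:
    """One WebVTT cue: timing line, then the text (no sequence number needed)."""
    return (
        f"{format_timestamp(start, 'vtt')} --> {format_timestamp(end, 'vtt')}\n"
        f"{text}\n"
    )


print(srt_cue(1, 0.0, 2.5, "[door slams]"))
print(vtt_cue(0.0, 2.5, "[door slams]"))
```

A complete WebVTT file additionally starts with a line reading WEBVTT followed by a blank line; SRT files have no header.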

Subtitles, in the traditional sense, are translations of the spoken content into another language: for example, an English film with German subtitles. Technically they use the same formats as captions (SRT, VTT) but typically omit non-speech sound cues. In everyday usage the words "captions" and "subtitles" are often used interchangeably (British English, for instance, commonly uses "subtitles" for both), which adds to the confusion.

A practical workflow: start with a transcription, break it into readable blocks of one or two lines with 32-42 characters per line, add precise start/end timestamps, and export as SRT for YouTube/LinkedIn or VTT for HTML5 web players. Tools like DeepScript export SRT, VTT, TXT, and JSON from a single transcription job, so you don't have to build the conversion yourself.
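For readers curious what that conversion involves, the workflow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how DeepScript does it: the input is assumed to be a list of (start, end, text) tuples in seconds, all names (to_cues, to_srt, to_vtt, MAX_CHARS, MAX_LINES) are hypothetical, and when a segment is split into several cues its duration is divided in proportion to character count, which is only an approximation. A production tool would more likely use word-level timestamps.

```python
import textwrap

MAX_CHARS = 42  # assumed cap: upper end of the 32-42 range
MAX_LINES = 2   # assumed cap: two lines per cue


def _timestamp(seconds, sep):
    """Format seconds as HH:MM:SS<sep>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_cues(segments):
    """Turn (start, end, text) segments into (start, end, text) cues."""
    cues = []
    for start, end, text in segments:
        lines = textwrap.wrap(text, width=MAX_CHARS)
        if not lines:
            continue  # skip empty segments
        # Group wrapped lines into blocks of at most MAX_LINES lines.
        blocks = [lines[i:i + MAX_LINES] for i in range(0, len(lines), MAX_LINES)]
        total_chars = sum(len(line) for line in lines)
        cursor = start
        for n, block in enumerate(blocks):
            if n == len(blocks) - 1:
                block_end = end  # last cue ends exactly where the segment ends
            else:
                # Assumption: time share is proportional to character count.
                share = sum(len(line) for line in block) / total_chars
                block_end = cursor + (end - start) * share
            cues.append((cursor, block_end, "\n".join(block)))
            cursor = block_end
    return cues


def to_srt(cues):
    """Numbered cues separated by blank lines, comma before milliseconds."""
    return "\n".join(
        f"{i}\n{_timestamp(s, ',')} --> {_timestamp(e, ',')}\n{text}\n"
        for i, (s, e, text) in enumerate(cues, start=1)
    )


def to_vtt(cues):
    """WEBVTT header, then cues with a period before milliseconds."""
    body = "\n".join(
        f"{_timestamp(s, '.')} --> {_timestamp(e, '.')}\n{text}\n"
        for s, e, text in cues
    )
    return "WEBVTT\n\n" + body


segments = [(0.0, 4.0, "These three terms get mixed up daily, "
                       "but they describe different things.")]
print(to_srt(to_cues(segments)))
```

The same cue list feeds both exporters, which mirrors the idea of producing several formats from a single transcription.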

In short: transcription = content. Captions = content + timing. Subtitles = content + timing + translation.
