How do I add timestamps to a transcript?
Short answer
Modern AI transcription emits word-level timestamps automatically; for readability, markers every 30-60 seconds or at speaker changes are usually enough.
Timestamps are small anchors that tie each section of text to an exact time in the audio. They make transcripts navigable: you can jump to any moment, verify a quote against the original audio, or time subtitles precisely.
What granularity do you need?
1. Word-level timestamps (one per word). Format like `[00:01:23.450] Hello`. Used for subtitle authoring, voice editing, and karaoke-style highlighting. Modern AI models emit them automatically; DeepScript includes one per word in the JSON export.
2. Sentence or phrase timestamps. Format like `[00:01:23] Hello, welcome to the show.` Useful for caption generation (SRT/VTT) and for interactive players where users can click on sentences.
3. Block timestamps (every 30-60 seconds). Format like `[00:01:00] (topic shift) Now we're talking about …`. Common in qualitative research and journalism — readable for humans without cluttering the transcript.
4. Speaker-change timestamps. Format like `[00:01:23] Interviewer: …` and `[00:01:35] Maria: …`. Useful for interviews with clean turn-taking.
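To make the speaker-change style concrete, here is a minimal Python sketch that turns diarized segments into one timestamped line per speaker turn. The input shape (a time-ordered list of `(start_seconds, speaker, text)` tuples) and the helper names are hypothetical assumptions for illustration, not DeepScript's export format.

```python
def hhmmss(seconds: float) -> str:
    """Format seconds as [HH:MM:SS], dropping any fractional part."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"[{h:02d}:{m:02d}:{s:02d}]"


def speaker_change_transcript(segments):
    """Emit one timestamped line per speaker turn.

    Assumes segments are (start_seconds, speaker, text) tuples in time order
    (a hypothetical shape). Consecutive segments by the same speaker are merged
    into one line stamped with the turn's first start time.
    """
    lines = []
    current_speaker = None
    for start, speaker, text in segments:
        if speaker != current_speaker:
            lines.append(f"{hhmmss(start)} {speaker}: {text}")
            current_speaker = speaker
        else:
            lines[-1] += " " + text
    return "\n".join(lines)


segments = [
    (83.0, "Interviewer", "Thanks for joining us."),
    (86.2, "Interviewer", "Where did you grow up?"),
    (95.4, "Maria", "In a small town near the coast."),
]
print(speaker_change_transcript(segments))
```

Merging consecutive segments keeps one marker per turn, which is what keeps this style readable for interviews with clean turn-taking.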
Format conventions
- HH:MM:SS (hours:minutes:seconds) is standard. For short clips, MM:SS is fine.
- Separators: colons between units, and a period or comma before milliseconds. SRT uses a comma, VTT uses a period, and many tools get this wrong.
- Bracket format `[00:01:23]` for plain text; `00:00:01,500 --> 00:00:05,000` for SRT.
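Because the SRT and VTT separators differ by a single character, a small formatter helps avoid mix-ups. This is a minimal sketch, assuming times arrive as floating-point seconds; the function names are hypothetical. It rounds to the nearest millisecond, and the SRT and VTT variants differ only in the character before the milliseconds.

```python
def split_ms(seconds: float):
    """Split seconds into (hours, minutes, seconds, milliseconds), rounding to the nearest ms."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return h, m, s, ms


def bracket_time(seconds: float) -> str:
    """Plain-text marker with whole seconds, e.g. [00:01:23]."""
    h, m, s, _ = split_ms(seconds)
    return f"[{h:02d}:{m:02d}:{s:02d}]"


def srt_time(seconds: float) -> str:
    """SRT: comma before milliseconds, e.g. 00:00:01,500."""
    h, m, s, ms = split_ms(seconds)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def vtt_time(seconds: float) -> str:
    """WebVTT: period before milliseconds, e.g. 00:00:01.500."""
    h, m, s, ms = split_ms(seconds)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """One SRT cue: sequence number, timing line, then the text."""
    return f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"


print(srt_cue(1, 1.5, 5.0, "Hello, welcome to the show."))
print(vtt_time(1.5))
```

Rounding (rather than truncating) keeps floating-point values such as 1.4999999 from printing as `,499`.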
Generate them automatically
Adding timestamps manually is grunt work, so skip it. Any modern transcription tool emits them. With DeepScript:
- TXT export: block timestamps every 30 seconds
- SRT/VTT export: sentence timestamps with start and end times
- JSON export: word-level timestamps with `start`, `end`, and `confidence` per word
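If you want to derive block timestamps from word-level data yourself, you can group words into 30-second windows. This is a minimal sketch, assuming the JSON is a top-level list of objects with `start` and `end` in seconds plus a `word` field holding the text. The `word` key and the list shape are assumptions; only `start`, `end`, and `confidence` are named above.

```python
import json


def hhmmss(seconds: float) -> str:
    """Format seconds as [HH:MM:SS], dropping any fractional part."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"[{h:02d}:{m:02d}:{s:02d}]"


def block_transcript(words, block_seconds=30):
    """Group time-ordered words into fixed windows, stamping each block with its window start.

    Windows that contain no words are skipped.
    """
    blocks = []  # list of (window_start_seconds, [word, ...])
    current_window = None
    for w in words:
        window = int(w["start"] // block_seconds)
        if window != current_window:
            blocks.append((window * block_seconds, []))
            current_window = window
        blocks[-1][1].append(w["word"])  # "word" is an assumed field name
    return "\n".join(f"{hhmmss(start)} {' '.join(text)}" for start, text in blocks)


raw = """[
  {"word": "Hello,", "start": 0.0, "end": 0.4, "confidence": 0.98},
  {"word": "welcome", "start": 0.5, "end": 0.9, "confidence": 0.97},
  {"word": "Now", "start": 31.2, "end": 31.5, "confidence": 0.95},
  {"word": "switching", "start": 31.5, "end": 32.0, "confidence": 0.93},
  {"word": "topics", "start": 32.0, "end": 32.4, "confidence": 0.96}
]"""
words = json.loads(raw)
print(block_transcript(words))
```

Stamping each block with its window boundary (rather than the first word's start) produces round markers like `[00:00:30]`, which is what readers expect from block timestamps.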
Post-processing
AI timestamps are typically accurate to within about ±200 ms. Broadcast subtitling often requires ±50 ms; tools like Aegisub or Subtitle Edit let you fine-tune cue timings. For research and journalism, AI accuracy is more than enough.
Tip
For interviews, the best approach is block timestamps in the visible transcript, plus the JSON export with word-level timestamps in case you need finer navigation later.
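Keeping the word-level JSON pays off when you need to resolve a moment in the audio to the exact word being spoken, for example after clicking a block timestamp. A minimal sketch using binary search, assuming words are sorted by `start`, do not overlap, and carry a hypothetical `word` text field alongside `start` and `end`:

```python
from bisect import bisect_right

words = [
    {"word": "Hello,", "start": 0.0, "end": 0.4},
    {"word": "welcome", "start": 0.5, "end": 0.9},
    {"word": "Now", "start": 31.2, "end": 31.5},
    {"word": "switching", "start": 31.5, "end": 32.0},
]

# Precompute start times once so each lookup is O(log n).
starts = [w["start"] for w in words]


def word_at(words, starts, t: float):
    """Return the word whose [start, end) span contains t, or None during a pause."""
    i = bisect_right(starts, t) - 1  # index of the last word starting at or before t
    if i >= 0 and t < words[i]["end"]:
        return words[i]
    return None


print(word_at(words, starts, 31.7)["word"])
print(word_at(words, starts, 10.0))
```

A time that falls in a pause returns `None`; a player could instead snap forward to `words[i + 1]` if it exists.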