The most accurate transcription for German dialects
We specialise in one thing: audio from the DACH region. Swiss German, Bavarian, Viennese, Low German — where generic models guess, we listen carefully.
3 free transcriptions · no credit card · data stays in Germany
Schön, dass es mit dem Termin geklappt hat.
Sehr gerne. Sollen wir direkt starten?
Ja — ich nehme das Gespräch auf, ist das ok?
Most transcription services use a generic multilingual model that treats all 99 languages roughly the same. For clean Standard German that's fine. But the moment a Bernese client mumbles, a Viennese speaker goes rapid-fire or a Bavarian meeting tips into dialect, accuracy collapses. DeepScript took the other route. Our Premium model is fine-tuned on over 50,000 hours of DACH audio from real business meetings, interviews and podcasts. That dataset stems from the years Aliru GmbH spent running Sally (sally.io), an AI meeting assistant, before building DeepScript. The result: ~3–5% Word Error Rate (WER) on clean German, versus 7–9% for generic Whisper-large on the same test set.
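For readers who want to check figures like these on their own recordings, here is a minimal sketch of how Word Error Rate is commonly computed: the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference transcript. The function name and the simple lowercase, whitespace-based normalisation are assumptions for illustration. This is not DeepScript's evaluation code, and real benchmarks usually also normalise punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumption: lowercase + whitespace split is enough normalisation here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


reference = "schön dass es mit dem termin geklappt hat"
hypothesis = "schön das es mit dem termin geklappt"
# One substitution (dass -> das) and one deletion (hat) over 8 reference words.
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

A WER of 4% therefore means roughly one wrong, missing or extra word per 25 words of reference text.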
Proof
Why we can claim this
50,000+ hours of DACH audio in training
Real meetings, interviews and phone calls from Germany, Austria and Switzerland — no synthetic data.
~3–5% WER on clean German (Premium)
Measured against a curated Standard-German test set. Generic Whisper-large scores 7–9% on the same set.
Heritage: Sally (sally.io)
Aliru GmbH has been building meeting AI for DACH companies for years. That experience went straight into the DeepScript model.
ISO 27001 / 9001 / 14001 certified
Information security, quality management and environmental management — three certifications, one operation.
Servers in Nuremberg & Falkenstein
Our own Hetzner hardware in German data centres. No US-cloud subprocessor anywhere in the transcription path.
In practice
What this looks like in practice
- Dialect handling for Swiss German, Bavarian, Viennese, Low German and Saxon — where generic models drift into English, we stay in context.
- Speaker diarization is included in both tiers. Standard reliably handles 2–6 speakers; Premium also separates voices that sound alike.
- Custom vocabulary for proper nouns, jargon and company-specific acronyms, which measurably improves recognition of terms like "Schwarz-Schilling" or "T1-weighted MRI".
- 99 languages available (auto-detect or manual selection); DACH languages are prioritised for accuracy and latency.
- Word-level timestamps with per-word confidence scores — visible in the editor, exportable as JSON for downstream pipelines.
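As an illustration of what a downstream pipeline could do with the JSON export, the sketch below flags low-confidence words for manual review. The export schema shown here (a `words` array with `text`, `start`, `end`, `confidence` and `speaker` fields) and the 0.8 threshold are assumptions made for this example; check an actual export for the real field names.

```python
import json

# Hypothetical export shape, invented for this example.
SAMPLE_EXPORT = """
{"words": [
  {"text": "Sollen",  "start": 0.00, "end": 0.31, "confidence": 0.97, "speaker": "Speaker 2"},
  {"text": "wir",     "start": 0.31, "end": 0.45, "confidence": 0.99, "speaker": "Speaker 2"},
  {"text": "direkt",  "start": 0.45, "end": 0.82, "confidence": 0.58, "speaker": "Speaker 2"},
  {"text": "starten", "start": 0.82, "end": 1.30, "confidence": 0.93, "speaker": "Speaker 2"}
]}
"""


def low_confidence_words(export_json: str, threshold: float = 0.8) -> list[dict]:
    """Return the words whose confidence is below the threshold."""
    data = json.loads(export_json)
    return [word for word in data["words"] if word["confidence"] < threshold]


for word in low_confidence_words(SAMPLE_EXPORT):
    print(f'{word["start"]:.2f}s  {word["speaker"]}: {word["text"]} ({word["confidence"]:.2f})')
```

Because each flagged word carries its timestamp, a reviewer can jump straight to the matching audio position instead of re-listening to the whole recording.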
How to use it
Up and running in a few steps
1. Pick a model
Standard (€0.18/h) for clean recordings and everyday conversations. Premium (€0.27/h) for dialect, noise, multiple speakers, or whenever the transcript needs to hold up to scrutiny.
2. Add a vocabulary (optional)
Drop proper nouns, product names and jargon into a list. The same vocabulary is then applied to every transcription in the project.
3. Upload or record live
Drop in an audio or video file, or use live transcription via the microphone. Premium runs on the priority queue.
4. Review and export
In the editor, rename speakers, click any word to jump to its position in the audio, and use the confidence colouring to spot uncertain words. Export as TXT, SRT, VTT or JSON.
FAQ
Frequently asked questions
How is DeepScript better than generic Whisper?
Generic Whisper-large is trained on a broad mix of 99 languages — average across the board, excellent at none. We take the same architecture and continue training on over 50,000 hours of DACH-specific audio. On clean German that gives us ~3–5% WER, where generic Whisper-large measures 7–9% on the same set. The gap widens significantly on dialects.
Why focus on DACH?
Before DeepScript, Aliru GmbH spent years running Sally (sally.io), an AI meeting assistant with a mostly German-speaking customer base. That produced a training corpus and operational experience generic providers simply don't have. We're building the product we ourselves needed.
Do I get the same quality for English?
English is well supported through the Whisper base, with WER typically at 4–6% on clean recordings. Our fine-tuning doesn't measurably improve it, because English already dominated the original Whisper training. We outperform on DACH audio; on English we're roughly on par with the big providers.
What does speaker diarization mean in practice?
Each word gets a speaker label in addition to its text ("Speaker 1", "Speaker 2" …). In the editor you can rename the labels to real names. In SRT/VTT exports the label appears as a prefix before each subtitle; in JSON it is a field on every word. You can override any label manually.
How is Premium different from Standard?
Three things: (1) the DACH fine-tuning is only active in the Premium model — Standard uses a leaner variant; (2) Premium runs on a priority queue with lower wait times; (3) speaker diarization is more finely tuned for similar-sounding voices. Standard is €0.18/h, Premium €0.27/h.
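To put the two hourly rates side by side for a given backlog of audio, a rough estimate could look like the sketch below. The prices are the ones quoted above; billing pro rata by the second and rounding to the nearest cent are assumptions for illustration, since the actual billing granularity is not stated here.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hourly prices as quoted above.
PRICE_PER_HOUR_EUR = {
    "standard": Decimal("0.18"),
    "premium": Decimal("0.27"),
}


def estimate_cost_eur(duration_seconds: int, tier: str) -> Decimal:
    """Estimate the cost of transcribing audio of the given length.

    Assumption: billing is pro rata per second, rounded to the nearest cent.
    """
    hours = Decimal(duration_seconds) / Decimal(3600)
    cost = hours * PRICE_PER_HOUR_EUR[tier]
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


ten_hours = 10 * 3600
for tier in PRICE_PER_HOUR_EUR:
    print(f"{tier}: EUR {estimate_cost_eur(ten_hours, tier)}")
```

Under these assumptions, ten hours of audio come to EUR 1.80 on Standard and EUR 2.70 on Premium, so the Premium surcharge for that backlog is 90 cents.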
See it for yourself
Upload a file and see the result in minutes. Three transcriptions free, no credit card.