DeepScript
Question

What is custom vocabulary and when do I need it?

Short answer

A list of jargon and proper nouns that you give the model before transcription. In our internal benchmarks, it lifts recognition of those terms from roughly 30% to roughly 95%.

Colleagues' full names, product names, acronyms, medical and legal jargon: a standard language model either doesn't know these words or has them stored with the wrong spelling, so it falls back to phonetically similar guesses. "Müller-Lüdenscheidt" becomes "Mueller Loot Mitt," and "pembrolizumab" becomes "Pemberlitsu Map."

Custom vocabulary (also called "hot words," "word boosting," or a "pronunciation dictionary") fixes this. Before transcription, you upload a list of the words that matter in your context. The model then weights those words higher during recognition.
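To make "weights those words higher" concrete, here is a toy sketch. It is not how DeepScript or any specific engine implements boosting; the candidate scores, the `rescore` function, and the `boost` value are all made up for illustration. It only shows the general idea: a hypothesis containing a listed term gets a score bonus, which can be enough to beat a phonetically similar guess.

```python
import math

def rescore(candidates, vocabulary, boost=2.0):
    """Pick the best hypothesis after adding a bonus for each boosted term.

    candidates: (text, log_score) pairs from a hypothetical recognizer.
    vocabulary: terms to favor.
    boost: log-score bonus per matched word (an assumed, illustrative value).
    """
    vocab = {term.lower() for term in vocabulary}
    best_text, best_score = None, -math.inf
    for text, log_score in candidates:
        words = text.lower().split()
        bonus = boost * sum(1 for w in words if w in vocab)
        if log_score + bonus > best_score:
            best_text, best_score = text, log_score + bonus
    return best_text

# Two made-up hypotheses for the same audio; the wrong one scores higher at first.
candidates = [
    ("the patient received Pemberlitsu Map", -4.0),
    ("the patient received pembrolizumab", -5.5),
]
print(rescore(candidates, []))                 # no vocabulary: the phonetic guess wins
print(rescore(candidates, ["pembrolizumab"]))  # boosted: the listed term wins
```

The same mechanism hints at why an oversized list can hurt: every boosted word also gets a head start in places where it does not belong.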

When you need it

- You work in a field with specialized vocabulary (medical, legal, pharma, IT, manufacturing).
- You transcribe interviews with recurring people (customers, research partners).
- Your company has product names that aren't in the general lexicon.
- You work with brands or companies with unusual spellings.

When you don't need it

- General conversations without domain language.
- Short clips with common words.

How to build a good list

1. Collect 20-50 terms per domain. With more than that, the model "learns" too much and starts making errors elsewhere.
2. Write each term with its final spelling (correct diacritics, hyphens, capitalization).
3. For very rare words, supply a phonetic hint or alternate form if the provider supports it (e.g. "GDPR | DSGVO").
4. Update the list as new terms come up.
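The list-building steps above can be partly automated. The following is a minimal sketch, assuming you keep your terms in a plain Python list; `build_vocabulary` and `MAX_TERMS` are hypothetical names, not part of any DeepScript tooling. It normalizes Unicode so diacritics are stored consistently, drops blanks and case-insensitive duplicates, and refuses lists longer than 50 terms.

```python
import unicodedata

MAX_TERMS = 50  # upper end of the 20-50 range recommended above

def build_vocabulary(raw_terms, max_terms=MAX_TERMS):
    """Clean a term list: normalize Unicode, trim, drop duplicates, cap the size."""
    seen = set()
    cleaned = []
    for term in raw_terms:
        # NFC stores "ü" as one code point, so visually identical terms compare equal.
        term = unicodedata.normalize("NFC", term).strip()
        if not term:
            continue
        key = term.casefold()
        if key in seen:
            continue  # keep the first spelling that was seen
        seen.add(key)
        cleaned.append(term)  # capitalization and hyphens are left untouched
    if len(cleaned) > max_terms:
        raise ValueError(f"{len(cleaned)} terms; keep the list to {max_terms} or fewer")
    return cleaned

terms = build_vocabulary(["Müller-Lüdenscheidt", " pembrolizumab ", "Pembrolizumab", ""])
print(terms)
```

Because the first spelling wins when duplicates differ only in case, put the spelling you want in the transcript first.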

With DeepScript

At upload time, you can pick a saved vocabulary or paste an inline comma-separated list. Via the API, use `vocabularyId` (saved list) or `vocabulary` (inline array) in the POST /v1/transcriptions request. For existing customers, it pays to set up a central glossary in the UI once and reuse it across all jobs.
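As a rough sketch of the API option, the snippet below builds only the vocabulary part of the request body. The field names `vocabularyId` and `vocabulary` come from the text above; everything else is an assumption: the helper `vocabulary_fields` is hypothetical, the ID `vocab_123` is made up, and the rule that exactly one of the two fields is sent is a guess. The host, authentication, and audio fields are left out because they are not described here.

```python
import json

def vocabulary_fields(vocabulary_id=None, terms=None):
    """Return the vocabulary part of a POST /v1/transcriptions body.

    Assumption: exactly one of the two fields is sent per request.
    """
    if (vocabulary_id is None) == (terms is None):
        raise ValueError("pass either vocabulary_id or terms, not both or neither")
    if vocabulary_id is not None:
        return {"vocabularyId": vocabulary_id}  # reference to a saved list
    return {"vocabulary": list(terms)}          # inline array of terms

# Inline list for a one-off job.
inline = vocabulary_fields(terms=["pembrolizumab", "Müller-Lüdenscheidt"])
print(json.dumps(inline, ensure_ascii=False))

# Saved list; "vocab_123" is a made-up ID for illustration.
saved = vocabulary_fields(vocabulary_id="vocab_123")
print(json.dumps(saved))
```

Merge the returned dictionary into the rest of your request body before sending it.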

Real impact

In our internal benchmarks, a well-tuned custom vocabulary lifts domain-term recognition from roughly 30% to roughly 95%. For medical transcription, that's often the deciding factor between unusable and production-ready.
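If you want to check the effect on your own recordings, one simple approach is to count how many occurrences of your domain terms from a hand-corrected reference also show up in the automatic transcript. The sketch below is an assumption about how such a measurement could look, not a description of the internal benchmark; `term_recognition_rate` is a hypothetical helper, and the sample sentences are invented.

```python
import re

def term_recognition_rate(reference, transcript, terms):
    """Share of domain-term occurrences in the reference found in the transcript.

    Matching is case-insensitive and on whole terms. Returns None if the
    reference contains none of the terms.
    """
    def count(term, text):
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        return len(re.findall(pattern, text, flags=re.IGNORECASE))

    expected = 0
    found = 0
    for term in terms:
        n_ref = count(term, reference)
        expected += n_ref
        # Cap at n_ref so extra occurrences in the transcript do not inflate the rate.
        found += min(n_ref, count(term, transcript))
    return found / expected if expected else None

reference = "Pembrolizumab was started. Dr. Müller-Lüdenscheidt reviewed the pembrolizumab dose."
transcript = "Pemberlitsu Map was started. Dr. Müller-Lüdenscheidt reviewed the pembrolizumab dose."
rate = term_recognition_rate(reference, transcript, ["pembrolizumab", "Müller-Lüdenscheidt"])
print(f"{rate:.0%}")  # 2 of 3 term occurrences recognized
```

Running this once without and once with your vocabulary on the same audio gives a before-and-after comparison for your own domain.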


