DeepScript
Question

Will my audio be used to train AI models?

Short answer

With some US providers, yes, unless you actively opt out. Reputable providers exclude it by contract. Ask, and read the small print.

The answer depends entirely on the provider, and it determines how safe your data really is. Providers generally follow one of three models:

1. Default training on customer data (opt-out). Some US cloud providers use customer data to train their models unless you actively object. OpenAI, for example, used data submitted through its API for model training by default until March 2023, then switched that default off. Defaults vary across providers and change over time. Speech-to-text vendors often use customer audio as an "improvement signal" for their acoustic and language models.

2. Opt-in to training. Other providers don't train by default but offer incentives to opt in, such as discounts ("Enable data sharing for 30% off"). This is more transparent, but risky: employees can tick the box without thinking through the consequences.

3. Contractual exclusion. Reputable providers exclude training use both in their standard terms and in the data processing agreement (DPA). GDPR Art. 28(3)(a) already requires processors to process personal data only on the controller's documented instructions, so training models for the processor's own purposes would be a violation.

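Why this matters: if your audio ends up in training data, rare or unique utterances (a name, an address, an account number spoken on a call) can create re-identification risks. The well-known case is large language models reproducing private data from their training corpora: training-data extraction attacks recover memorized text, and membership-inference attacks reveal whether a specific record was part of the training set. For speech models the risk is lower but not zero, and voice recordings can count as biometric data under GDPR Art. 4(14) when they are processed to identify a speaker.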

How to check:
- Search the terms for "training," "improvement," "machine learning," and "customer data."
- Check the privacy policy for "used to improve our service"; this is typical hedging language.
- Request the DPA template and look for an explicit clause forbidding training use.
- For large enterprise vendors: request a compliance statement that confirms the exclusion for your account specifically.
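The keyword search in the first two checks can be automated with a short script. This is a minimal sketch, assuming you have saved the provider's terms, privacy policy, or DPA as plain-text files; the script name `scan_terms.py` and the file names in the usage line are hypothetical. It does a plain case-insensitive substring search and prints each hit with some surrounding context. A hit only tells you where to read closely, and finding no hits does not prove the provider excludes training.

```python
import re
import sys
from pathlib import Path

# Phrases from the checklist; extend the list for your own review.
PHRASES = [
    "training",
    "improvement",
    "machine learning",
    "customer data",
    "used to improve our service",
]


def find_phrases(text, phrases=PHRASES, context=60):
    """Return (phrase, snippet) pairs for every case-insensitive match in text."""
    hits = []
    for phrase in phrases:
        # re.escape treats the phrase literally; IGNORECASE catches "Customer Data" etc.
        for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            # Collapse newlines and runs of spaces so each hit prints on one line.
            snippet = " ".join(text[start:end].split())
            hits.append((phrase, snippet))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python scan_terms.py terms.txt dpa.txt
    for path in sys.argv[1:]:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for phrase, snippet in find_phrases(text):
            print(f"{path}: [{phrase}] ...{snippet}...")
```

Because this is substring matching, "training" also matches "retraining" and "improvement" matches "improvements", which is deliberate here: over-reporting is cheaper than missing a clause.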

DeepScript does not train any models on customer data. We use a pretrained Whisper-compatible model and fine-tune it only on openly licensed datasets (Common Voice, LibriSpeech, and freely licensed German corpora). This commitment is written into our DPA and Terms.

Still have a question?

Try three transcriptions for free, or drop us a line: we answer within 24 hours, compliance questions included.