Should I self-host Whisper or use an API?
Short answer
Under ~500 audio hours/month, a managed API almost always wins; above that, self-hosting can be cheaper, but only with GPU experience and a DevOps budget.
Whisper is open source, so you can run it on your own hardware without paying license fees. That sounds tempting, but there is nuance. Here is an honest calculation.
What you need to host Whisper
- GPU: at least 10 GB VRAM for Whisper-Large. An RTX 4090 (~$2,000) or an A100 in the cloud (~$2/h spot).
- Docker, NVIDIA Container Toolkit, possibly Kubernetes.
- Job queue (Redis + BullMQ, Sidekiq, Celery).
- File storage for audio (S3-compatible, or local with backup).
- Monitoring (Prometheus, Grafana, Sentry).
- Scaling logic for traffic spikes.
- Possible add-ons: WhisperX for diarization, your own API wrapper around Whisper.
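To make the VRAM requirement concrete, here is a minimal sketch that picks a model size for a given GPU and then transcribes one file. It assumes the open-source `openai-whisper` Python package with ffmpeg installed. The VRAM figures are the approximate values from that package's README, and the helper names `pick_model` and `transcribe_file` are hypothetical, not part of any library.

```python
# Approximate VRAM needs in GB, smallest to largest model, taken from the
# openai-whisper README. Rough guidance only, not a guarantee.
MODELS_BY_SIZE = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits the GPU."""
    fitting = [name for name, need in MODELS_BY_SIZE if need <= available_vram_gb]
    if not fitting:
        raise ValueError("no Whisper model fits in the available VRAM")
    return fitting[-1]  # list is ordered small to large


def transcribe_file(path: str, available_vram_gb: float) -> str:
    """Transcribe one audio file with the largest model that fits."""
    # Imported lazily so the sizing helper works without the package.
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(pick_model(available_vram_gb))
    result = model.transcribe(path)
    return result["text"]
```

A 24 GB card such as the RTX 4090 resolves to `large`, while an 8 GB card falls back to `medium`. Everything else in the list above (queue, storage, monitoring, scaling) still has to be built around a call like this.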
What Whisper doesn't do out of the box
- Speaker diarization (needs WhisperX or pyannote.audio).
- Real-time streaming (needs Faster-Whisper + custom code).
- Word-level timestamps are imprecise in some variants.
- Webhooks, rate limits, multi-tenancy: build them yourself.
- GDPR compliance: you're the controller, and no vendor shares liability.
Cost at 500h/month
- DeepScript Standard: 500h × €0.18 = €90/month.
- Self-hosted on a Hetzner GPU server (GEX44 with RTX 6000 ADA, ~€700/month): €700 plus power surcharge and DevOps time.
- Self-hosted on AWS (g5.xlarge, ~$1.00/h spot, 24/7): ~$720/month plus storage and network egress.
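The break-even point follows directly from these figures. The sketch below uses only the quoted API rate (€0.18/h) and the quoted Hetzner server price (~€700/month). The `devops_eur` parameter is a hypothetical placeholder for monthly staff cost. The power surcharge is ignored, and the sketch assumes a single server can actually process the volume.

```python
API_EUR_PER_HOUR = 0.18        # DeepScript Standard rate quoted above
SERVER_EUR_PER_MONTH = 700.0   # Hetzner GPU server price quoted above


def api_cost(hours: float) -> float:
    """Monthly API cost in EUR for the given audio hours."""
    return hours * API_EUR_PER_HOUR


def self_host_cost(devops_eur: float = 0.0) -> float:
    """Monthly self-hosting cost: fixed server price plus DevOps time."""
    return SERVER_EUR_PER_MONTH + devops_eur


def break_even_hours(devops_eur: float = 0.0) -> float:
    """Audio hours per month at which both options cost the same."""
    return self_host_cost(devops_eur) / API_EUR_PER_HOUR


if __name__ == "__main__":
    print(f"API at 500 h: EUR {api_cost(500):.2f}")
    print(f"Break-even, server only: {break_even_hours():.0f} h/month")
    # 1,000 EUR/month of DevOps time is an illustrative assumption.
    print(f"Break-even with DevOps: {break_even_hours(1000):.0f} h/month")
```

With the server price alone, break-even sits at roughly 3,889 h/month. Any DevOps cost pushes it higher, which is why the fixed monthly price is only part of the comparison.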
When self-hosting makes sense
- You transcribe > 2,000 h/month and the load stays stable over months.
- You already have GPU infrastructure in house (ML research, leftover game-server fleet).
- You need extreme data sovereignty (e.g. air-gapped environments for government).
- You fine-tune Whisper on your domain; APIs rarely allow this.
When an API wins
- < 500h/month: API is cheaper, simpler, faster to ship.
- You need diarization, custom vocabulary, webhooks without building them.
- You need an SLA and 24/7 support.
- You need written GDPR compliance (DPA, sub-processor list).
- You want to focus on your product, not GPU drivers.
Hybrid variant
Some teams use DeepScript for GDPR-relevant customer data and self-hosted Whisper for internal, low-stakes workloads. This is a clean cost optimization, provided the DevOps capacity is there.
Rule of thumb
If you don't already have a team with GPU experience, use an API. If you do and the volume justifies it, self-host. But run the numbers honestly: most people underestimate ongoing operations costs and overestimate the savings.
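The rule of thumb can be written down as a small decision helper. This is only a sketch: the 500 h and 2,000 h thresholds come from the sections above, while the function name, parameters, and return strings are hypothetical. Special cases such as air-gapped environments or fine-tuning are deliberately left out.

```python
def recommend(hours_per_month: float, has_gpu_team: bool, stable_load: bool = True) -> str:
    """Encode the rule of thumb as a rough first-pass recommendation."""
    if not has_gpu_team:
        return "api"              # no GPU experience in house: use an API
    if hours_per_month < 500:
        return "api"              # low volume: API is cheaper and simpler
    if hours_per_month > 2000 and stable_load:
        return "self-host"        # high, stable volume with a capable team
    return "run the numbers"      # in between: do the cost comparison


if __name__ == "__main__":
    print(recommend(300, has_gpu_team=True))
    print(recommend(3000, has_gpu_team=True))
    print(recommend(1200, has_gpu_team=True))
```

The middle band has no clear winner, so the helper returns a prompt to do the cost calculation instead of a verdict.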
Still have a question?
Three transcriptions free to try. Or drop us a line — we answer within 24 hours, compliance questions included.