Speech-to-Text API for Developers: Getting Started with DeepScript

Integrate speech-to-text into your app with the DeepScript API. Code examples for cURL, Python, and JavaScript. 99 languages, GDPR-compliant, hosted in Germany.

DeepScript Team · April 6, 2026 · 6 min read

If you need to transcribe audio programmatically – whether for a SaaS product, internal tool, or automated pipeline – DeepScript provides a REST API that handles the heavy lifting. Upload a file, get a transcript back. No ML infrastructure to manage, no model hosting to worry about.

This guide walks you through authentication, file upload, status polling, and result retrieval with working code examples in cURL, Python, and JavaScript.

Full API reference: api.deepscript.com/docs

Authentication

All API requests require an API key. You can generate one in your DeepScript dashboard under Settings > API Keys.

Include the key in the Authorization header of every request:

Authorization: Bearer YOUR_API_KEY

Keep your API key secret. Do not commit it to version control or expose it in client-side code. Use environment variables or a secrets manager.
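
As a minimal sketch of the environment-variable approach, the helper below builds the Authorization header from DEEPSCRIPT_API_KEY (the variable name used in the examples in this guide). The function name auth_headers is hypothetical and not part of any DeepScript SDK.

```python
import os


def auth_headers(env_var: str = "DEEPSCRIPT_API_KEY") -> dict[str, str]:
    """Build the Authorization header from an environment variable."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {api_key}"}
```

Passing the result as headers=auth_headers() to each request keeps the key out of source files and makes a missing key fail before any network call is made.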

Core Workflow

The transcription workflow follows three steps:

1. Upload an audio or video file to create a transcription job
2. Poll the job status until processing completes (or use webhooks)
3. Retrieve the finished transcript

Upload a File

Send a POST request with your file to create a new transcription job.

cURL

curl -X POST https://api.deepscript.com/v1/transcriptions \
  -H "Authorization: Bearer $DEEPSCRIPT_API_KEY" \
  -F "file=@meeting-recording.mp3" \
  -F "model=standard" \
  -F "language=auto"

Python

import os

import requests

API_KEY = os.environ["DEEPSCRIPT_API_KEY"]
BASE_URL = "https://api.deepscript.com/v1"

def create_transcription(file_path, model="standard", language="auto"):
    with open(file_path, "rb") as f:
        response = requests.post(
            f"{BASE_URL}/transcriptions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"model": model, "language": language},
        )
    response.raise_for_status()
    return response.json()

job = create_transcription("meeting-recording.mp3")
print(f"Job created: {job['id']}")

JavaScript (Node.js)

import { readFile } from "node:fs/promises";
import path from "node:path";

const API_KEY = process.env.DEEPSCRIPT_API_KEY;
const BASE_URL = "https://api.deepscript.com/v1";

async function createTranscription(filePath, model = "standard", language = "auto") {
  // Uses the fetch, FormData, and Blob globals built into Node.js 18+.
  // The form-data npm package does not work as a body for the built-in fetch.
  const form = new FormData();
  const file = new Blob([await readFile(filePath)]);
  form.append("file", file, path.basename(filePath));
  form.append("model", model);
  form.append("language", language);

  const response = await fetch(`${BASE_URL}/transcriptions`, {
    method: "POST",
    // fetch adds the multipart Content-Type header and boundary itself.
    headers: { Authorization: `Bearer ${API_KEY}` },
    body: form,
  });

  if (!response.ok) throw new Error(`Upload failed: ${response.status}`);
  return response.json();
}

const job = await createTranscription("meeting-recording.mp3");
console.log(`Job created: ${job.id}`);

Response

{
  "id": "txn_abc123",
  "status": "processing",
  "model": "standard",
  "language": "auto",
  "created_at": "2026-04-06T10:30:00Z"
}

Poll for Status

Transcription takes roughly 20-50% of the audio duration, depending on file length and the model selected, so a 30-minute recording should finish in about 6 to 15 minutes. Poll the status endpoint until the job completes.

cURL

curl https://api.deepscript.com/v1/transcriptions/txn_abc123 \
  -H "Authorization: Bearer $DEEPSCRIPT_API_KEY"

Python

import time

def wait_for_transcription(job_id, interval=5):
    while True:
        response = requests.get(
            f"{BASE_URL}/transcriptions/{job_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
        response.raise_for_status()
        data = response.json()

        if data["status"] == "completed":
            return data
        elif data["status"] == "failed":
            raise Exception(f"Transcription failed: {data.get('error')}")

        time.sleep(interval)

result = wait_for_transcription(job["id"])

JavaScript (Node.js)

async function waitForTranscription(jobId, interval = 5000) {
  while (true) {
    const response = await fetch(`${BASE_URL}/transcriptions/${jobId}`, {
      headers: { Authorization: `Bearer ${API_KEY}` },
    });

    if (!response.ok) throw new Error(`Poll failed: ${response.status}`);
    const data = await response.json();

    if (data.status === "completed") return data;
    if (data.status === "failed") throw new Error(`Transcription failed: ${data.error}`);

    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

const result = await waitForTranscription(job.id);

Response (completed)

{
  "id": "txn_abc123",
  "status": "completed",
  "model": "standard",
  "language": "en",
  "duration_seconds": 1847,
  "created_at": "2026-04-06T10:30:00Z",
  "completed_at": "2026-04-06T10:37:12Z"
}
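
The polling loops above run until the job reports completed or failed, so a job that never leaves processing would block the caller indefinitely. The sketch below adds a deadline and a gradual backoff. The name wait_with_timeout, the 30-minute default timeout, and the backoff factor are illustrative assumptions, not API requirements. The status lookup is passed in as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable


def wait_with_timeout(
    fetch_status: Callable[[], dict],
    timeout: float = 1800.0,
    interval: float = 5.0,
    max_interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the job completes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        if data["status"] == "completed":
            return data
        if data["status"] == "failed":
            raise RuntimeError(f"Transcription failed: {data.get('error')}")
        # Give up if the next poll would land after the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Job still {data['status']!r} after {timeout} s")
        sleep(interval)
        # Back off gradually so long jobs generate fewer requests.
        interval = min(interval * 1.5, max_interval)
```

With the requests-based code above, fetch_status could be a lambda that performs the GET request and returns response.json(). Given the 20-50% processing estimate, a timeout of roughly the audio duration leaves some headroom.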

Get the Transcript

Once the job status is completed, retrieve the full transcript.

cURL

curl https://api.deepscript.com/v1/transcriptions/txn_abc123/result \
  -H "Authorization: Bearer $DEEPSCRIPT_API_KEY"

Python

def get_transcript(job_id):
    response = requests.get(
        f"{BASE_URL}/transcriptions/{job_id}/result",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    response.raise_for_status()
    return response.json()

transcript = get_transcript(job["id"])
print(transcript["text"])

JavaScript (Node.js)

async function getTranscript(jobId) {
  const response = await fetch(`${BASE_URL}/transcriptions/${jobId}/result`, {
    headers: { Authorization: `Bearer ${API_KEY}` },
  });

  if (!response.ok) throw new Error(`Fetch failed: ${response.status}`);
  return response.json();
}

const transcript = await getTranscript(job.id);
console.log(transcript.text);

Response

{
  "text": "Welcome everyone to the quarterly review. Let's start with...",
  "segments": [
    {
      "start": 0.0,
      "end": 3.42,
      "text": "Welcome everyone to the quarterly review.",
      "speaker": "Speaker 1"
    },
    {
      "start": 3.42,
      "end": 5.87,
      "text": "Let's start with the financials.",
      "speaker": "Speaker 1"
    }
  ],
  "language": "en",
  "duration_seconds": 1847
}

The response includes both the full text and timestamped segments with speaker labels (when available). Use segments for subtitle generation, search indexing, or any use case that requires time alignment.
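
For subtitle generation, the segments array maps fairly directly onto the SubRip (SRT) format. The following is a minimal sketch that assumes each segment carries the start, end, and text fields shown in the response above. The function names are hypothetical, and speaker labels are ignored.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Convert transcript segments into SRT subtitle text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # Each cue: sequence number, time range, then the subtitle text.
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text']}")
    return "\n\n".join(blocks) + "\n"
```

Calling segments_to_srt(transcript["segments"]) and writing the result to a .srt file gives a subtitle track that most video players accept.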

Models

DeepScript offers two transcription models:

Model	Use Case	Speed	Accuracy
`standard`	Clean audio, meetings, podcasts	Faster	High
`premium`	Noisy environments, accents, technical content	Slower	Highest

Pass the model parameter during upload. Default is standard.

The premium model is recommended for recordings with background noise, overlapping speakers, heavy accents, or domain-specific terminology.

Language Support

DeepScript supports 99 languages. Set language to a BCP-47 code (e.g., en, de, fr, ja) or use auto for automatic detection.

Auto-detection works well for single-language recordings. If you know the language in advance, specifying it explicitly can improve accuracy slightly.

Webhooks

Instead of polling, you can register a webhook URL to receive a notification when a transcription completes.

curl -X POST https://api.deepscript.com/v1/transcriptions \
  -H "Authorization: Bearer $DEEPSCRIPT_API_KEY" \
  -F "file=@recording.mp3" \
  -F "model=standard" \
  -F "webhook_url=https://yourapp.com/api/transcription-callback"

When the job finishes, DeepScript sends a POST request to your webhook URL with the job ID and status. Your server can then fetch the full result.

{
  "event": "transcription.completed",
  "job_id": "txn_abc123",
  "status": "completed"
}

Webhooks eliminate the need for polling and reduce unnecessary API calls, making them the preferred approach for production systems.
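
A receiving endpoint could look roughly like the sketch below, which uses only the Python standard library and assumes the payload shape shown above. The names parse_webhook, WebhookHandler, and serve, the port, and the 204 response are illustrative choices. This guide does not say whether webhook requests are signed, so the sketch performs no signature check; consult the API reference before relying on it in production.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def parse_webhook(body: bytes) -> Optional[str]:
    """Return the job ID for a completed transcription, otherwise None."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None
    if not isinstance(payload, dict):
        return None
    if payload.get("event") == "transcription.completed" and payload.get("status") == "completed":
        return payload.get("job_id")
    return None


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        job_id = parse_webhook(self.rfile.read(length))
        if job_id:
            # Hand off to a queue or worker here, then fetch
            # /transcriptions/{job_id}/result outside the request handler.
            print(f"Transcription ready: {job_id}")
        # Acknowledge quickly so the sender is not kept waiting.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the webhook receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Keeping the payload parsing in a plain function makes it easy to test separately from the HTTP server.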

Custom Vocabulary

For domain-specific terms – product names, medical terminology, legal jargon, company-internal acronyms – you can pass a custom vocabulary list to improve recognition accuracy.

curl -X POST https://api.deepscript.com/v1/transcriptions \
  -H "Authorization: Bearer $DEEPSCRIPT_API_KEY" \
  -F "file=@recording.mp3" \
  -F "model=premium" \
  -F 'custom_vocabulary=["DeepScript", "Hetzner", "DSGVO", "Kubernetes"]'

The speech recognition engine will bias toward these terms when the audio is ambiguous. This is especially useful for proper nouns and technical terms that do not appear in general-purpose dictionaries.
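
In Python, the same request fields can be assembled as shown in this sketch. It assumes, based on the cURL example above, that custom_vocabulary is sent as a JSON array inside a single form field. The helper name build_form_fields is hypothetical.

```python
import json


def build_form_fields(model="standard", language="auto",
                      custom_vocabulary=None, webhook_url=None):
    """Assemble the non-file form fields for a transcription request."""
    fields = {"model": model, "language": language}
    if custom_vocabulary:
        # Mirror the cURL example: one form field holding a JSON array.
        fields["custom_vocabulary"] = json.dumps(
            list(custom_vocabulary), ensure_ascii=False
        )
    if webhook_url:
        fields["webhook_url"] = webhook_url
    return fields
```

The returned dictionary can be passed as the data argument of requests.post alongside files={"file": f}, as in the upload example earlier in this guide.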

Rate Limits and Pricing

API usage is billed based on audio duration. Check the pricing page for current rates. Rate limits depend on your plan – the API returns 429 Too Many Requests if you exceed them, along with a Retry-After header.
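
One way to handle 429 responses is to wait for the number of seconds given in Retry-After and try again. The sketch below assumes the header carries a delay in seconds (the HTTP specification also allows a date, which is not handled here) and falls back to a fixed wait otherwise. The name send_with_retry, the attempt limit, and the fallback delay are illustrative assumptions. The request is passed in as a callable that returns an object with status_code and headers, such as a requests response.

```python
import time
from typing import Any, Callable


def send_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 5,
    default_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() and retry on 429, waiting for the Retry-After delay."""
    for attempt in range(1, max_attempts + 1):
        response = send()
        # Return anything that is not rate-limited, or the last 429 we get.
        if response.status_code != 429 or attempt == max_attempts:
            return response
        retry_after = response.headers.get("Retry-After", "")
        # Only the delay-in-seconds form is parsed; anything else uses the default.
        wait = float(retry_after) if retry_after.isdigit() else default_wait
        sleep(wait)
```

The caller still needs to check the returned response, since the final attempt may itself be a 429.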

Privacy

All API processing happens on dedicated servers in Germany (Hetzner). No audio data is sent to third-party AI providers. Files are deleted after processing. DeepScript provides a Data Processing Agreement (DPA, known in German as an AVV) for business customers.

This makes the API suitable for processing sensitive audio – legal recordings, medical dictations, HR interviews – where GDPR compliance is mandatory.

Next Steps

Read the full API reference at api.deepscript.com/docs
Generate your API key in the DeepScript dashboard
Explore the pricing plans for API usage
Contact support@deepscript.com for enterprise needs or custom integrations
