New file: authors/rohit_m.md

Author: Rohit M.
Title: Technical Writer and Builder
Description: Rohit writes practical, hands-on developer guides focused on clear setup steps, reproducible workflows, and useful troubleshooting notes. He enjoys turning messy tool setup into simple instructions that engineers can follow without guesswork.
Author Image: ![rohit-m](https://github.com/rohitmulani63-ops.png)
Author LinkedIn:
Author Twitter:
Company Name: Independent
Company Description: Independent contributor focused on practical developer tooling guides.
Company Logo Dark:
Company Logo White:

New file: definitions/20260619_definition_speech_to_text_transcription.md

---
title: "Speech-to-Text Transcription"
description: "The process of converting spoken audio into written text with software."
date: 2026-06-19
author: "Rohit M."
---

# Speech-to-Text Transcription

## Definition

Speech-to-text transcription is the process of converting spoken words from an
audio or video recording into written text. It can be performed by cloud APIs,
local machine-learning models, or hybrid workflows that extract audio first and
then send it to a transcription engine.

## Context and Usage

In development workflows, speech-to-text transcription is often used to turn
demos, meetings, interviews, support calls, lectures, and screen recordings into
searchable notes. Tools such as SAPAT combine media processing with provider
APIs so engineers can convert videos into transcript files from the command
line.

New file: guides/20260619_benchmark_sapat_transcription_providers_in_daytona.md

---
title: "Benchmark SAPAT Transcription Providers in Daytona"
description: "Compare SAPAT transcriptions from OpenAI, Groq, and Azure OpenAI inside a repeatable Daytona workspace."
date: 2026-06-19
author: "Rohit M."
tags: ["daytona", "sapat", "transcription", "benchmarking", "ai"]
---

# Benchmark SAPAT Transcription Providers in Daytona

SAPAT is a Python command-line tool that turns video files into written
transcripts. It extracts audio with `ffmpeg`, sends that audio to a supported
speech-to-text provider, and writes a `.txt` transcript beside the input file.
The repository currently exposes three CLI provider choices: `openai`, `groq`,
and `azure`.

A setup guide gets SAPAT running, but AI engineers usually need to go one step
further and find out which provider works best for their recordings. One
provider may be faster for short demos. Another may handle accents or noisy
audio better. A team already on Azure may care more about operational fit and
data governance than raw speed. This guide shows how to benchmark SAPAT
providers inside a Daytona workspace so the comparison is repeatable, safe, and
easy to review.

The workflow below uses the same source clip, the same prompt, and the same
quality setting across providers. You will produce separate transcripts, inspect
basic quality signals, and fill in a lightweight scorecard. The goal is not to
claim that one provider is always best. The goal is to create a clean Daytona
workflow that lets your team decide using its own audio and requirements.

## TL;DR

- Create a Daytona workspace from the SAPAT repository.
- Install `ffmpeg`, Python dependencies, and the SAPAT wheel.
- Configure only the provider keys you plan to test in `.env`.
- Run the same `.mp4` through `openai`, `groq`, and `azure` where available.
- Save each transcript separately so results do not overwrite each other.
- Compare accuracy, terminology, latency, setup effort, and operational fit.
- Keep API keys, private recordings, and generated transcripts out of Git.

## Materials checklist

You need the following before starting:

- Daytona installed and connected to your preferred IDE.
- Python available in the Daytona workspace.
- `ffmpeg` installed inside the workspace.
- One short `.mp4` sample that is safe to use for testing.
- API access for at least one of OpenAI, Groq, or Azure OpenAI.
- A basic understanding of [APIs](../definitions/20241212_definition_api.md),
[environment variables](../definitions/20241126_definition_environment_variables.md),
and [speech-to-text transcription](../definitions/20260619_definition_speech_to_text_transcription.md).

## Benchmark workflow overview

![SAPAT provider benchmark workflow](assets/20260619_benchmark_sapat_transcription_providers_in_daytona_workflow.svg)

The benchmark has four stages: prepare one clean input clip, run SAPAT with each
provider, save each transcript under a provider-specific name, and review the
outputs with the same scorecard.

| Stage | What you do | Why it matters |
| --- | --- | --- |
| Prepare | Choose one representative `.mp4` file. | Every provider receives the same input. |
| Run | Use SAPAT with one `--api` value at a time. | Each transcript has a clear source. |
| Preserve | Rename outputs into a `transcripts/` folder. | Results do not overwrite each other. |
| Review | Compare quality, speed, and setup effort. | The final choice is evidence-based. |

## Step 1: Create the Daytona workspace

Create a workspace from the current SAPAT repository:

```bash
daytona create https://github.com/nibzard/sapat --code
```

Some older references use `https://github.com/nkkko/sapat`. GitHub currently
redirects that repository to `nibzard/sapat`, so this guide uses the current
repository URL directly.

When Daytona opens the project, stay at the repository root. You should see
files such as `README.md`, `requirements.txt`, `.env.example`, and the
`src/sapat` package directory.

## Step 2: Install dependencies

SAPAT needs `ffmpeg` to extract audio from video files before transcription. In
a Debian or Ubuntu-based Daytona workspace, install it with:

```bash
sudo apt-get update
sudo apt-get install -y ffmpeg
```

Confirm that it is available:

```bash
ffmpeg -version
```

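Install the Python dependencies, then build and install the package.
`python -m build` is provided by the separate `build` package, so install that
too if `requirements.txt` does not already include it:

```bash
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install build
python -m build
python -m pip install dist/sapat-*.whl
```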

If your shell does not expand `dist/sapat-*.whl`, list the `dist` folder and
install the exact wheel filename.
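If you would rather not type the filename, a small Bash helper can pick the newest wheel in `dist/` and fail clearly when none exists. This is a sketch. `latest_sapat_wheel` and `install_latest_sapat_wheel` are hypothetical helpers that are not part of SAPAT, and they assume wheels follow the `sapat-*.whl` naming used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SAPAT): print the newest sapat-*.whl in a folder.
latest_sapat_wheel() {
  local dist_dir="${1:-dist}"
  # ls -t lists newest first; "|| true" keeps an empty match from being an error.
  { ls -t "$dist_dir"/sapat-*.whl 2>/dev/null || true; } | head -n 1
}

# Install the newest wheel, or explain what is missing.
install_latest_sapat_wheel() {
  local wheel
  wheel="$(latest_sapat_wheel "${1:-dist}")"
  if [ -z "$wheel" ]; then
    echo "No sapat-*.whl found in ${1:-dist}/; run 'python -m build' first." >&2
    return 1
  fi
  python -m pip install "$wheel"
}
```

Run `install_latest_sapat_wheel` from the repository root after `python -m build` finishes.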

## Step 3: Configure only the providers you will test

Copy the environment template:

```bash
cp .env.example .env
```

Then add only the credentials you need. Do not paste every possible key into the
file. For OpenAI, use:

```bash
OPENAI_API_KEY=your_openai_api_key_here
```

For Groq, use:

```bash
GROQ_API_KEY=your_groq_api_key_here
```

For Azure OpenAI, use:

```bash
AZURE_OPENAI_API_KEY=your_azure_api_key_here
AZURE_OPENAI_ENDPOINT=https://DEPLOYMENTENDPOINTNAME.openai.azure.com
AZURE_OPENAI_STT_MODEL_NAME=whisper
AZURE_OPENAI_STT_API_VERSION=2024-06-01
```

SAPAT's `.env.example` lists many possible provider names. Treat the CLI's
provider choices as `openai`, `groq`, and `azure` unless new provider code has
been added. Until then, keys for the other listed providers will not help a
benchmark run, so there is no reason to add them.
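Before running a provider, you can confirm that its keys are present and no longer hold placeholder values. This is a sketch rather than part of SAPAT. `check_provider_env` and `require_env_keys` are hypothetical helpers. They assume plain `KEY=value` lines in `.env` and treat any value that starts with `your_` as an unfilled placeholder, which matches the examples above.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not part of SAPAT). Assumes plain KEY=value lines
# in .env and treats values starting with "your_" as unfilled placeholders.
require_env_keys() {
  local env_file="$1"
  shift
  local key line value status=0
  for key in "$@"; do
    # Use the last KEY=... line if the key appears more than once.
    line="$(grep -E "^${key}=" "$env_file" 2>/dev/null | tail -n 1 || true)"
    value="${line#*=}"
    case "$value" in
      "" | your_*) echo "missing or placeholder: $key" >&2; status=1 ;;
    esac
  done
  return "$status"
}

check_provider_env() {
  local provider="$1" env_file="${2:-.env}"
  case "$provider" in
    openai) require_env_keys "$env_file" OPENAI_API_KEY ;;
    groq)   require_env_keys "$env_file" GROQ_API_KEY ;;
    azure)  require_env_keys "$env_file" AZURE_OPENAI_API_KEY AZURE_OPENAI_ENDPOINT \
              AZURE_OPENAI_STT_MODEL_NAME AZURE_OPENAI_STT_API_VERSION ;;
    *) echo "unsupported provider: $provider (use openai, groq, or azure)" >&2; return 2 ;;
  esac
}
```

For example, `check_provider_env azure && echo "azure ready"` prints one line for each missing value and exits non-zero if any value is missing. A passing check only means the values are present. It cannot tell whether a key is valid.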

**Note:** Never commit `.env`. Keep real keys, private recordings, and generated
transcripts out of version control.
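To make that rule harder to break, you can add ignore patterns for the files this benchmark creates. This is a sketch. `ignore_benchmark_artifacts` is a hypothetical helper, and the patterns assume the `samples/` and `transcripts/` folders used in this guide. It writes to `.git/info/exclude` by default, so the rules stay local to your clone and the project's tracked `.gitignore` is not changed. It skips patterns that are already listed, so running it twice is safe.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add ignore patterns for this benchmark's local files.
# Defaults to .git/info/exclude, which Git applies locally and never commits.
ignore_benchmark_artifacts() {
  local ignore_file="${1:-.git/info/exclude}" pattern
  mkdir -p "$(dirname "$ignore_file")"
  touch "$ignore_file"
  # If the file does not end with a newline, add one before appending.
  if [ -s "$ignore_file" ] && [ -n "$(tail -c 1 "$ignore_file")" ]; then
    printf '\n' >> "$ignore_file"
  fi
  for pattern in .env samples/ transcripts/; do
    # -x matches whole lines and -F treats the pattern literally,
    # so entries that already exist are not duplicated.
    grep -qxF -- "$pattern" "$ignore_file" || printf '%s\n' "$pattern" >> "$ignore_file"
  done
}
```

Run `ignore_benchmark_artifacts` at the repository root, then run `git status --short` to confirm that `.env`, `samples/`, and `transcripts/` no longer appear as untracked files.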

## Step 4: Prepare a repeatable sample folder

Use one short test video that represents the type of recording your team cares
about, such as a product demo, design review, support clip, or narrated screen
recording.

Create folders for inputs and outputs:

```bash
mkdir -p samples transcripts
```

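Copy your test file into `samples`. The source path below is only an example.
If the workspace runs remotely, upload the file into the workspace first: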

```bash
cp ~/Downloads/demo.mp4 samples/demo.mp4
```

If your original file has a private or customer-specific name, rename it before
running the benchmark.
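To show later that every provider received the same clip, you can record a checksum of the input in the notes file. This is a sketch that assumes either `sha256sum` (GNU coreutils, common on Linux) or `shasum -a 256` (common on macOS) is installed. `record_input_checksum` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "<sha256>  <file>" for the benchmark input to a notes file.
record_input_checksum() {
  local input="$1" notes="${2:-transcripts/benchmark_notes.txt}"
  if [ ! -f "$input" ]; then
    echo "input not found: $input" >&2
    return 1
  fi
  mkdir -p "$(dirname "$notes")"
  # Prefer GNU sha256sum; fall back to shasum, which prints the same format.
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$input" >> "$notes"
  else
    shasum -a 256 "$input" >> "$notes"
  fi
}
```

For example, run `record_input_checksum samples/demo.mp4` once before the first provider run. If a teammate repeats the benchmark later, matching checksums show that both runs used the same clip.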

## Step 5: Run the OpenAI benchmark

Run SAPAT with the OpenAI provider:

```bash
sapat samples/demo.mp4 \
--api openai \
--quality M \
--language en \
--prompt "The recording discusses Daytona workspaces, SAPAT, ffmpeg, provider APIs, and speech-to-text transcription." \
--temperature 0.2
```

Move the generated transcript to a provider-specific filename:

```bash
mv samples/demo.txt transcripts/demo_openai.txt
```

Log the run in a shared notes file, then add the runtime you observed. A simple
note is enough:

```bash
date >> transcripts/benchmark_notes.txt
echo "openai: completed" >> transcripts/benchmark_notes.txt
```
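If you would rather measure runtime than estimate it, a small wrapper can time any command and append the result to the same notes file. This is a sketch. `timed_run` and the `NOTES_FILE` override are hypothetical, and the wrapper measures wall-clock seconds with `date +%s`, so it cannot show sub-second differences.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, then log its label, exit status, and
# elapsed wall-clock seconds to a notes file.
# Usage: timed_run LABEL COMMAND [ARGS...]
timed_run() {
  local label="$1"
  shift
  local notes="${NOTES_FILE:-transcripts/benchmark_notes.txt}"
  local start end status=0
  mkdir -p "$(dirname "$notes")"
  start="$(date +%s)"
  "$@" || status=$?
  end="$(date +%s)"
  printf '%s: exit=%d seconds=%d\n' "$label" "$status" "$((end - start))" >> "$notes"
  return "$status"
}
```

For example, prefix the Step 5 command with `timed_run openai`, so it starts with `timed_run openai sapat samples/demo.mp4 --api openai ...`, and keep the remaining flags unchanged.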

## Step 6: Run the Groq benchmark

Run the same source file with Groq:

```bash
sapat samples/demo.mp4 \
--api groq \
--quality M \
--language en \
--prompt "The recording discusses Daytona workspaces, SAPAT, ffmpeg, provider APIs, and speech-to-text transcription." \
--temperature 0.2
```

Save the transcript separately:

```bash
mv samples/demo.txt transcripts/demo_groq.txt
```

Use the same prompt, quality setting, and language code wherever possible. That
makes the comparison fairer.

## Step 7: Run the Azure OpenAI benchmark

If your workspace is configured for Azure OpenAI, run:

```bash
sapat samples/demo.mp4 \
--api azure \
--quality M \
--language en \
--prompt "The recording discusses Daytona workspaces, SAPAT, ffmpeg, provider APIs, and speech-to-text transcription." \
--temperature 0.2
```

Save that output too:

```bash
mv samples/demo.txt transcripts/demo_azure.txt
```

Azure setup is usually more sensitive to deployment names, endpoint format, and
API version. If Azure fails while OpenAI or Groq works, check the deployment
configuration before changing the SAPAT command.
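Once each provider works on its own, the three runs can be collapsed into one loop. This guarantees that every provider receives identical flags and that each transcript is moved before the next run starts. This is a sketch under a few assumptions. SAPAT is assumed to write `samples/demo.txt` beside the input, as described above. `run_provider` is a hypothetical helper, and `SAPAT_BIN` is a hypothetical override that is useful for dry runs.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of SAPAT): run the same SAPAT command once per
# provider and save each transcript under a provider-specific name.
PROMPT="The recording discusses Daytona workspaces, SAPAT, ffmpeg, provider APIs, and speech-to-text transcription."

run_provider() {
  local provider="$1" input="${2:-samples/demo.mp4}" outdir="${3:-transcripts}"
  local raw="${input%.*}.txt"   # assumes SAPAT writes <name>.txt beside the input
  local base
  base="$(basename "${input%.*}")"
  local dest="$outdir/${base}_${provider}.txt"

  if [ ! -f "$input" ]; then
    echo "input not found: $input" >&2
    return 1
  fi
  # Refuse to overwrite an earlier result for the same provider.
  if [ -e "$dest" ]; then
    echo "already exists: $dest" >&2
    return 1
  fi
  mkdir -p "$outdir"

  # SAPAT_BIN defaults to the installed sapat CLI.
  "${SAPAT_BIN:-sapat}" "$input" --api "$provider" --quality M --language en \
    --prompt "$PROMPT" --temperature 0.2 || return 1

  if [ ! -f "$raw" ]; then
    echo "no transcript found at $raw" >&2
    return 1
  fi
  mv "$raw" "$dest"
  echo "$provider: saved $dest"
}

for provider in openai groq azure; do
  run_provider "$provider" samples/demo.mp4 || echo "$provider: skipped" >&2
done
```

The loop reports `skipped` for any provider that fails, so one misconfigured provider does not stop the others. Because `run_provider` refuses to overwrite an existing file, move or delete old transcripts before you repeat a run.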

## Step 8: Compare outputs with a scorecard

Create a small scorecard file:

```bash
cat > transcripts/scorecard.md <<'EOF'
# SAPAT Provider Benchmark Scorecard

| Provider | Setup effort | Terminology accuracy | Speaker/name handling | Formatting cleanup needed | Runtime notes | Best fit |
| --- | --- | --- | --- | --- | --- | --- |
| OpenAI | | | | | | |
| Groq | | | | | | |
| Azure OpenAI | | | | | | |
EOF
```

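Then inspect the start of each transcript. `fold -s` wraps long lines at spaces,
which helps when a provider returns the whole transcript as a single line:

```bash
fold -s -w 100 transcripts/demo_openai.txt | head -n 40
fold -s -w 100 transcripts/demo_groq.txt | head -n 40
fold -s -w 100 transcripts/demo_azure.txt | head -n 40
```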

Look for practical quality signals:

- Did the provider spell product names correctly?
- Did it preserve technical terms such as Daytona, SAPAT, `ffmpeg`, and API?
- Did it hallucinate section breaks, names, or actions?
- Did it handle pauses, accents, and background noise well enough?
- How much manual cleanup would be needed before publication?
- Was the provider easy to configure in the Daytona workspace?

For a quick size comparison, count words:

```bash
wc -w transcripts/demo_*.txt
```

A much shorter transcript can mean missed speech. A much longer transcript can
mean repeated phrases, filler, or extra model cleanup text. Always inspect the
actual text before making a decision.
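To turn the word counts into a quick triage list, you can compare each transcript with the median count and flag large gaps. This is a sketch. `flag_length_outliers` is a hypothetical helper, and it assumes filenames without spaces. The 20 percent threshold is an arbitrary starting point that you can override with `THRESHOLD_PCT`. The helper uses the median rather than the mean so that one outlier among three transcripts does not shift the baseline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag transcripts whose word count differs from the
# median by more than THRESHOLD_PCT percent (default 20, an arbitrary choice).
flag_length_outliers() {
  local threshold="${THRESHOLD_PCT:-20}"
  # wc -w prints "<count> <file>" per file, plus a "total" line for several files.
  wc -w "$@" | grep -v ' total$' | sort -n | awk -v t="$threshold" '
    { n++; words[n] = $1; name[n] = $2 }
    END {
      if (n == 0) exit 1
      # Input is sorted by count, so the median is the middle value(s).
      if (n % 2) median = words[(n + 1) / 2]
      else median = (words[n / 2] + words[n / 2 + 1]) / 2
      for (i = 1; i <= n; i++) {
        pct = (median > 0) ? 100 * (words[i] - median) / median : 0
        flag = (pct > t || pct < -t) ? "CHECK" : "ok"
        printf "%s %d words %+.1f%% %s\n", name[i], words[i], pct, flag
      }
    }'
}
```

Run `flag_length_outliers transcripts/demo_*.txt` and read any transcript marked `CHECK` first.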

## Step 9: Choose a provider by workflow, not hype

A useful provider choice depends on the job:

| Use case | What to prioritize | What to test |
| --- | --- | --- |
| Internal demo notes | Speed and low cleanup effort. | Short engineering demos. |
| Customer research | Accuracy and privacy controls. | Realistic call audio with safe test data. |
| Developer tutorials | Technical vocabulary and formatting. | Screen recordings with product names. |
| Enterprise workflows | Governance and existing cloud setup. | Azure deployment and access policies. |
| Batch processing | Reliability over many files. | A folder of short `.mp4` clips. |

Whichever use case applies, Daytona keeps the benchmark consistent. The same
input, commands, dependencies, and review file live in one workspace, so another
teammate can repeat the benchmark without guessing which packages or environment
variables were used.

## Common issues and troubleshooting

**Problem:** `ffmpeg` is not found.

**Solution:** Install it inside the Daytona workspace and confirm with
`ffmpeg -version`. Installing it only on your host machine may not help the
workspace.
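A quick preflight check, run in the workspace terminal, shows which commands the workspace itself can see. `check_tools` is a hypothetical helper that only looks up names on `PATH` with `command -v`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check: report which commands are available on PATH
# in the current environment. Returns the number of missing commands.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Run `check_tools ffmpeg python sapat`. Any line marked `missing` points to a tool that still needs to be installed inside the workspace.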

**Problem:** SAPAT says the API choice is unsupported.

**Solution:** Use the current CLI choices: `openai`, `groq`, or `azure`. Extra
provider names in `.env.example` need matching implementation before they are
safe to document as active CLI options.

**Problem:** Authentication fails.

**Solution:** Check the matching key in `.env`. For Azure OpenAI, also verify
the endpoint, speech-to-text deployment name, and API version.

**Problem:** The second provider overwrites the first transcript.

**Solution:** Move `samples/demo.txt` into `transcripts/` after every run and
rename it with the provider name before starting the next run.

**Problem:** The transcript misses technical terms.

**Solution:** Add a focused `--prompt` with expected terms. Keep the prompt short
and factual. Do not include private information or secrets.

**Problem:** A provider is faster but less accurate.

**Solution:** Use the scorecard. Fast output is valuable for rough internal
notes, but publication-quality transcripts may need the provider with fewer
technical mistakes.

## Conclusion

You now have a repeatable benchmark for SAPAT transcription providers inside a
Daytona workspace. Instead of choosing a provider by reputation, you can test
OpenAI, Groq, and Azure OpenAI against the same source clip, preserve each
transcript, and compare results with a practical scorecard.

This workflow is intentionally conservative. It documents only the provider
choices currently exposed by SAPAT's CLI, keeps secrets in `.env`, avoids
committing private recordings, and gives reviewers a clear way to reproduce the
comparison. If SAPAT adds more provider implementations later, the same benchmark
structure can be reused: add a new provider row, run the same clip, and compare
the output against the existing transcripts.

## References

- [Daytona content issue #13: AI Transcription Tool](https://github.com/daytonaio/content/issues/13)
- [Daytona content contribution guide](https://github.com/daytonaio/content/blob/main/CONTRIBUTING.md)
- [Daytona guide template](https://github.com/daytonaio/content/blob/main/guides/YYYYMMDD_guide_template.md)
- [SAPAT repository](https://github.com/nibzard/sapat)
- [SAPAT README](https://github.com/nibzard/sapat/blob/main/README.md)
- [SAPAT CLI source](https://github.com/nibzard/sapat/blob/main/src/sapat/script.py)
- [SAPAT environment example](https://github.com/nibzard/sapat/blob/main/.env.example)