From ca16f767f43c1fb4a0fe7412f348089124f3962e Mon Sep 17 00:00:00 2001
From: "Rohit M." <rohitmulani63-ops@users.noreply.github.com>
Date: Fri, 19 Jun 2026 19:54:12 +0400
Subject: [PATCH] Add SAPAT provider benchmarking guide

---
 authors/rohit_m.md                            |  10 +
 ...definition_speech_to_text_transcription.md |  23 ++
 ...apat_transcription_providers_in_daytona.md | 359 ++++++++++++++++++
 ...cription_providers_in_daytona_workflow.svg |  37 ++
 4 files changed, 429 insertions(+)
 create mode 100644 authors/rohit_m.md
 create mode 100644 definitions/20260619_definition_speech_to_text_transcription.md
 create mode 100644 guides/20260619_benchmark_sapat_transcription_providers_in_daytona.md
 create mode 100644 guides/assets/20260619_benchmark_sapat_transcription_providers_in_daytona_workflow.svg

diff --git a/authors/rohit_m.md b/authors/rohit_m.md
new file mode 100644
index 00000000..22d3e76d
--- /dev/null
+++ b/authors/rohit_m.md
@@ -0,0 +1,10 @@
+Author: Rohit M.
+Title: Technical Writer and Builder
+Description: Rohit writes practical, hands-on developer guides focused on clear setup steps, reproducible workflows, and useful troubleshooting notes. He enjoys turning messy tool setup into simple instructions that engineers can follow without guesswork.
+Author Image: ![rohit-m](https://github.com/rohitmulani63-ops.png)
+Author LinkedIn:
+Author Twitter:
+Company Name: Independent
+Company Description: Independent contributor focused on practical developer tooling guides.
+Company Logo Dark:
+Company Logo White:
\ No newline at end of file
diff --git a/definitions/20260619_definition_speech_to_text_transcription.md b/definitions/20260619_definition_speech_to_text_transcription.md
new file mode 100644
index 00000000..c9db636a
--- /dev/null
+++ b/definitions/20260619_definition_speech_to_text_transcription.md
@@ -0,0 +1,23 @@
+---
+title: "Speech-to-Text Transcription"
+description: "The process of converting spoken audio into written text with software."
+date: 2026-06-19
+author: "Rohit M."
+---
+
+# Speech-to-Text Transcription
+
+## Definition
+
+Speech-to-text transcription is the process of converting spoken words from an
+audio or video recording into written text. It can be performed by cloud APIs,
+local machine-learning models, or hybrid workflows that extract audio first and
+then send it to a transcription engine.
+
+## Context and Usage
+
+In development workflows, speech-to-text transcription is often used to turn
+demos, meetings, interviews, support calls, lectures, and screen recordings into
+searchable notes. Tools such as SAPAT combine media processing with provider
+APIs so engineers can convert videos into transcript files from the command
+line.
diff --git a/guides/20260619_benchmark_sapat_transcription_providers_in_daytona.md b/guides/20260619_benchmark_sapat_transcription_providers_in_daytona.md
new file mode 100644
index 00000000..8c2ad19a
--- /dev/null
+++ b/guides/20260619_benchmark_sapat_transcription_providers_in_daytona.md
@@ -0,0 +1,359 @@
+---
+title: "Benchmark SAPAT Transcription Providers in Daytona"
+description: "Compare SAPAT transcriptions from OpenAI, Groq, and Azure OpenAI inside a repeatable Daytona workspace."
+date: 2026-06-19
+author: "Rohit M."
+tags: ["daytona", "sapat", "transcription", "benchmarking", "ai"]
+---
+
+# Benchmark SAPAT Transcription Providers in Daytona
+
+SAPAT is a Python command-line tool that turns video files into written
+transcripts. It extracts audio with `ffmpeg`, sends that audio to a supported
+speech-to-text provider, and writes a `.txt` transcript beside the input file.
+The repository currently exposes three CLI provider choices: `openai`, `groq`,
+and `azure`.
+
+A simple setup guide is useful, but AI engineers usually need to go a step
+further: they need to know which provider suits their recordings. One provider may
+be faster for short demos. Another may handle accents or noisy audio better. A
+team already on Azure may care more about operational fit and data governance
+than raw speed. This guide shows how to benchmark SAPAT providers inside a
+Daytona workspace so the comparison is repeatable, safe, and easy to review.
+
+The workflow below uses the same source clip, the same prompt, and the same
+quality setting across providers. You will produce separate transcripts, inspect
+basic quality signals, and fill in a lightweight scorecard. The goal is not to
+claim that one provider is always best. The goal is to create a clean Daytona
+workflow that lets your team decide using its own audio and requirements.
+
+## TL;DR
+
+- Create a Daytona workspace from the SAPAT repository.
+- Install `ffmpeg`, Python dependencies, and the SAPAT wheel.
+- Configure only the provider keys you plan to test in `.env`.
+- Run the same `.mp4` through `openai`, `groq`, and `azure` where available.
+- Save each transcript separately so results do not overwrite each other.
+- Compare accuracy, terminology, latency, setup effort, and operational fit.
+- Keep API keys, private recordings, and generated transcripts out of Git.
+
+## Materials checklist
+
+You need the following before starting:
+
+- Daytona installed and connected to your preferred IDE.
+- Python available in the Daytona workspace.
+- `ffmpeg` installed inside the workspace.
+- One short `.mp4` sample that is safe to use for testing.
+- API access for at least one of OpenAI, Groq, or Azure OpenAI.
+- A basic understanding of [APIs](../definitions/20241212_definition_api.md),
+  [environment variables](../definitions/20241126_definition_environment_variables.md),
+  and [speech-to-text transcription](../definitions/20260619_definition_speech_to_text_transcription.md).
+
+## Benchmark workflow overview
+
+![SAPAT provider benchmark workflow](assets/20260619_benchmark_sapat_transcription_providers_in_daytona_workflow.svg)
+
+The benchmark has four stages: prepare one clean input clip, run SAPAT with each
+provider, save each transcript under a provider-specific name, and review the
+outputs with the same scorecard.
+
+| Stage | What you do | Why it matters |
+| --- | --- | --- |
+| Prepare | Choose one representative `.mp4` file. | Every provider receives the same input. |
+| Run | Use SAPAT with one `--api` value at a time. | Each transcript has a clear source. |
+| Preserve | Rename outputs into a `transcripts/` folder. | Results do not overwrite each other. |
+| Review | Compare quality, speed, and setup effort. | The final choice is evidence-based. |
+
+## Step 1: Create the Daytona workspace
+
+Create a workspace from the current SAPAT repository:
+
+```bash
+daytona create https://github.com/nibzard/sapat --code
+```
+
+Some older references use `https://github.com/nkkko/sapat`. GitHub currently
+redirects that repository to `nibzard/sapat`, so this guide uses the current
+repository URL directly.
+
+When Daytona opens the project, stay at the repository root. You should see
+files such as `README.md`, `requirements.txt`, `.env.example`, and the
+`src/sapat` package directory.
+
+## Step 2: Install dependencies
+
+SAPAT needs `ffmpeg` to extract audio from video files before transcription. In
+a Debian or Ubuntu-based Daytona workspace, install it with:
+
+```bash
+sudo apt-get update
+sudo apt-get install -y ffmpeg
+```
+
+Confirm that it is available:
+
+```bash
+ffmpeg -version
+```
+
+Install the Python dependencies and build the package:
+
+```bash
+python -m pip install --upgrade pip
+python -m pip install -r requirements.txt
+python -m build
+python -m pip install dist/sapat-*.whl
+```
+
+If your shell does not expand `dist/sapat-*.whl`, list the `dist` folder and
+install the exact wheel filename.
+
+## Step 3: Configure only the providers you will test
+
+Copy the environment template:
+
+```bash
+cp .env.example .env
+```
+
+Then add only the credentials you need. Do not paste every possible key into the
+file. For OpenAI, use:
+
+```bash
+OPENAI_API_KEY=your_openai_api_key_here
+```
+
+For Groq, use:
+
+```bash
+GROQ_API_KEY=your_groq_api_key_here
+```
+
+For Azure OpenAI, use:
+
+```bash
+AZURE_OPENAI_API_KEY=your_azure_api_key_here
+AZURE_OPENAI_ENDPOINT=https://DEPLOYMENTENDPOINTNAME.openai.azure.com
+AZURE_OPENAI_STT_MODEL_NAME=whisper
+AZURE_OPENAI_STT_API_VERSION=2024-06-01
+```
+
+SAPAT's `.env.example` lists many possible provider names, but treat the CLI's
+supported `--api` values as `openai`, `groq`, and `azure` unless new provider
+code has been added. Limiting the benchmark to these three avoids relying on
+support that is not active in the CLI yet.
+
+**Note:** Never commit `.env`. Keep real keys, private recordings, and generated
+transcripts out of Git.
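+
+Before running a benchmark, it can help to confirm which provider keys are
+actually set in `.env` without printing their values. The sketch below assumes
+the variable names shown above; `list_configured_keys` is a hypothetical helper
+name and is not part of SAPAT.
+
+```bash
+# Hypothetical helper: print the names (never the values) of provider keys
+# that have a non-empty value in the given env file (default: .env).
+list_configured_keys() {
+  local env_file="${1:-.env}"
+  grep -E '^(OPENAI_API_KEY|GROQ_API_KEY|AZURE_OPENAI_API_KEY)=.+' "$env_file" \
+    | cut -d= -f1
+}
+
+# Usage (not executed here):
+# list_configured_keys .env
+```
+
+A placeholder such as `your_openai_api_key_here` still counts as set, so this
+only catches missing or empty keys, not invalid ones.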
+
+## Step 4: Prepare a repeatable sample folder
+
+Use one short test video that represents the type of recording your team cares
+about, such as a product demo, design review, support clip, or narrated screen
+recording.
+
+Create folders for inputs and outputs:
+
+```bash
+mkdir -p samples transcripts
+```
+
+Copy your test file into `samples`:
+
+```bash
+cp ~/Downloads/demo.mp4 samples/demo.mp4
+```
+
+If your original file has a private or customer-specific name, rename it before
+running the benchmark.
+
+## Step 5: Run the OpenAI benchmark
+
+Run SAPAT with the OpenAI provider:
+
+```bash
+sapat samples/demo.mp4 \
+  --api openai \
+  --quality M \
+  --language en \
+  --prompt "The recording discusses Daytona workspaces, SAPAT, ffmpeg, provider APIs, and speech-to-text transcription." \
+  --temperature 0.2
+```
+
+Move the generated transcript to a provider-specific filename:
+
+```bash
+mv samples/demo.txt transcripts/demo_openai.txt
+```
+
+Record the runtime you observe by hand. The commands below log a timestamp and a completion marker:
+
+```bash
+date >> transcripts/benchmark_notes.txt
+echo "openai: completed" >> transcripts/benchmark_notes.txt
+```
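+
+If you would rather capture the runtime automatically, a small wrapper can time
+any command and append the result to the notes file. This is a minimal sketch:
+`timed_run` is a hypothetical helper name, and it measures whole-second
+wall-clock time with `date +%s`, which includes network latency and audio
+extraction, not just provider processing.
+
+```bash
+# Hypothetical helper: run a command and append "<label>: <seconds>s (exit N)"
+# to a notes file. Returns the wrapped command's exit status.
+timed_run() {
+  local label="$1" notes="$2"
+  shift 2
+  local start end status=0
+  start=$(date +%s)
+  "$@" || status=$?
+  end=$(date +%s)
+  echo "${label}: $((end - start))s (exit ${status})" >> "$notes"
+  return "$status"
+}
+
+# Usage (not executed here):
+# timed_run openai transcripts/benchmark_notes.txt \
+#   sapat samples/demo.mp4 --api openai --quality M --language en
+```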
+
+## Step 6: Run the Groq benchmark
+
+Run the same source file with Groq:
+
+```bash
+sapat samples/demo.mp4 \
+  --api groq \
+  --quality M \
+  --language en \
+  --prompt "The recording discusses Daytona workspaces, SAPAT, ffmpeg, provider APIs, and speech-to-text transcription." \
+  --temperature 0.2
+```
+
+Save the transcript separately:
+
+```bash
+mv samples/demo.txt transcripts/demo_groq.txt
+```
+
+Use the same prompt, quality setting, and language code wherever possible. That
+makes the comparison fairer.
+
+## Step 7: Run the Azure OpenAI benchmark
+
+If your workspace is configured for Azure OpenAI, run:
+
+```bash
+sapat samples/demo.mp4 \
+  --api azure \
+  --quality M \
+  --language en \
+  --prompt "The recording discusses Daytona workspaces, SAPAT, ffmpeg, provider APIs, and speech-to-text transcription." \
+  --temperature 0.2
+```
+
+Save that output too:
+
+```bash
+mv samples/demo.txt transcripts/demo_azure.txt
+```
+
+Azure setup is usually more sensitive to deployment names, endpoint format, and
+API version. If Azure fails while OpenAI or Groq works, check the deployment
+configuration before changing the SAPAT command.
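+
+The three runs above differ only in the `--api` value, so they can be folded
+into one helper that applies identical flags and renames the output each time.
+The sketch below assumes SAPAT writes `<input name>.txt` beside the input file,
+as described earlier in this guide. `run_provider` and the `SAPAT_CMD` override
+are hypothetical names introduced for illustration.
+
+```bash
+# Shared prompt so every provider receives identical context.
+PROMPT="The recording discusses Daytona workspaces, SAPAT, ffmpeg, provider APIs, and speech-to-text transcription."
+
+# Hypothetical helper: transcribe one clip with one provider, then move the
+# transcript to <outdir>/<clip name>_<provider>.txt.
+# SAPAT_CMD is an assumed override hook for dry runs; it defaults to `sapat`.
+run_provider() {
+  local provider="$1" clip="$2" outdir="$3"
+  local base="${clip%.*}"   # samples/demo.mp4 -> samples/demo
+  "${SAPAT_CMD:-sapat}" "$clip" \
+    --api "$provider" \
+    --quality M \
+    --language en \
+    --prompt "$PROMPT" \
+    --temperature 0.2 || return 1
+  mv "${base}.txt" "${outdir}/$(basename "$base")_${provider}.txt"
+}
+
+# Usage (not executed here); list only the providers configured in .env:
+# for p in openai groq azure; do
+#   run_provider "$p" samples/demo.mp4 transcripts
+# done
+```
+
+Stopping on a failed run, rather than moving on, keeps a stale `demo.txt` from
+being renamed under the wrong provider.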
+
+## Step 8: Compare outputs with a scorecard
+
+Create a small scorecard file:
+
+```bash
+cat > transcripts/scorecard.md <<'EOF'
+# SAPAT Provider Benchmark Scorecard
+
+| Provider | Setup effort | Terminology accuracy | Speaker/name handling | Formatting cleanup needed | Runtime notes | Best fit |
+| --- | --- | --- | --- | --- | --- | --- |
+| OpenAI |  |  |  |  |  |  |
+| Groq |  |  |  |  |  |  |
+| Azure OpenAI |  |  |  |  |  |  |
+EOF
+```
+
+Then inspect the first 40 lines of each transcript:
+
+```bash
+sed -n '1,40p' transcripts/demo_openai.txt
+sed -n '1,40p' transcripts/demo_groq.txt
+sed -n '1,40p' transcripts/demo_azure.txt
+```
+
+Look for practical quality signals:
+
+- Did the provider spell product names correctly?
+- Did it preserve technical terms such as Daytona, SAPAT, `ffmpeg`, and API?
+- Did it hallucinate section breaks, names, or actions?
+- Did it handle pauses, accents, and background noise well enough?
+- How much manual cleanup would be needed before publication?
+- Was the provider easy to configure in the Daytona workspace?
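+
+The terminology check in the list above can be partly automated by counting how
+often each expected term appears in each transcript. This is a rough sketch:
+`count_terms` is a hypothetical helper name, it handles single-word terms only,
+and it matches substrings case-insensitively, so `API` also matches `APIs`.
+
+```bash
+# Hypothetical helper: for each transcript and each expected term, print
+# "file<TAB>term<TAB>count" using case-insensitive fixed-string matching.
+count_terms() {
+  local terms="$1"
+  shift
+  local file term count
+  for file in "$@"; do
+    for term in $terms; do
+      # grep exits 1 when nothing matches, so guard it and count lines instead.
+      count=$({ grep -o -i -F -- "$term" "$file" || true; } | wc -l | tr -d ' ')
+      printf '%s\t%s\t%s\n' "$file" "$term" "$count"
+    done
+  done
+}
+
+# Usage (not executed here):
+# count_terms "Daytona SAPAT ffmpeg API" transcripts/demo_*.txt
+```
+
+A count of zero for a term you know was spoken is a prompt to read that part of
+the transcript, not a verdict on the provider.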
+
+For a quick size comparison, count words:
+
+```bash
+wc -w transcripts/demo_*.txt
+```
+
+A much shorter transcript can mean missed speech. A much longer transcript can
+mean repeated phrases, filler words, or text the model added that was never
+spoken. Always inspect the actual text before making a decision.
+
+## Step 9: Choose a provider by workflow, not hype
+
+A useful provider choice depends on the job:
+
+| Use case | What to prioritize | What to test |
+| --- | --- | --- |
+| Internal demo notes | Speed and low cleanup effort. | Short engineering demos. |
+| Customer research | Accuracy and privacy controls. | Realistic call audio with safe test data. |
+| Developer tutorials | Technical vocabulary and formatting. | Screen recordings with product names. |
+| Enterprise workflows | Governance and existing cloud setup. | Azure deployment and access policies. |
+| Batch processing | Reliability over many files. | A folder of short `.mp4` clips. |
+
+This is why Daytona is valuable for the benchmark. You can keep the same input,
+same commands, same dependencies, and same review file in one workspace. Another
+teammate can repeat the benchmark without guessing which packages or environment
+variables were used.
+
+## Common issues and troubleshooting
+
+**Problem:** `ffmpeg` is not found.
+
+**Solution:** Install it inside the Daytona workspace and confirm with
+`ffmpeg -version`. Installing it only on your host machine may not help the
+workspace.
+
+**Problem:** SAPAT says the API choice is unsupported.
+
+**Solution:** Use the current CLI choices: `openai`, `groq`, or `azure`. Extra
+provider names in `.env.example` need matching implementation before they are
+safe to document as active CLI options.
+
+**Problem:** Authentication fails.
+
+**Solution:** Check the matching key in `.env`. For Azure OpenAI, also verify
+the endpoint, speech-to-text deployment name, and API version.
+
+**Problem:** The second provider overwrites the first transcript.
+
+**Solution:** Move `samples/demo.txt` into `transcripts/` after every run and
+rename it with the provider name before starting the next run.
+
+**Problem:** The transcript misses technical terms.
+
+**Solution:** Add a focused `--prompt` with expected terms. Keep the prompt short
+and factual. Do not include private information or secrets.
+
+**Problem:** A provider is faster but less accurate.
+
+**Solution:** Use the scorecard. Fast output is valuable for rough internal
+notes, but publication-quality transcripts may need the provider with fewer
+technical mistakes.
+
+## Conclusion
+
+You now have a repeatable benchmark for SAPAT transcription providers inside a
+Daytona workspace. Instead of choosing a provider by reputation, you can test
+OpenAI, Groq, and Azure OpenAI against the same source clip, preserve each
+transcript, and compare results with a practical scorecard.
+
+This workflow is intentionally conservative. It documents only the provider
+choices currently exposed by SAPAT's CLI, keeps secrets in `.env`, avoids
+committing private recordings, and gives reviewers a clear way to reproduce the
+comparison. If SAPAT adds more provider implementations later, the same benchmark
+structure can be reused: add a new provider row, run the same clip, and compare
+the output against the existing transcripts.
+
+## References
+
+- [Daytona content issue #13: AI Transcription Tool](https://github.com/daytonaio/content/issues/13)
+- [Daytona content contribution guide](https://github.com/daytonaio/content/blob/main/CONTRIBUTING.md)
+- [Daytona guide template](https://github.com/daytonaio/content/blob/main/guides/YYYYMMDD_guide_template.md)
+- [SAPAT repository](https://github.com/nibzard/sapat)
+- [SAPAT README](https://github.com/nibzard/sapat/blob/main/README.md)
+- [SAPAT CLI source](https://github.com/nibzard/sapat/blob/main/src/sapat/script.py)
+- [SAPAT environment example](https://github.com/nibzard/sapat/blob/main/.env.example)
diff --git a/guides/assets/20260619_benchmark_sapat_transcription_providers_in_daytona_workflow.svg b/guides/assets/20260619_benchmark_sapat_transcription_providers_in_daytona_workflow.svg
new file mode 100644
index 00000000..ac05dd93
--- /dev/null
+++ b/guides/assets/20260619_benchmark_sapat_transcription_providers_in_daytona_workflow.svg
@@ -0,0 +1,37 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="1280" height="720" viewBox="0 0 1280 720" role="img" aria-labelledby="title desc">
+  <title id="title">SAPAT provider benchmark workflow in Daytona</title>
+  <desc id="desc">One video clip is transcribed through multiple SAPAT providers in a Daytona workspace, then compared with a shared scorecard.</desc>
+  <defs>
+    <linearGradient id="bg" x1="0" x2="1" y1="0" y2="1">
+      <stop offset="0" stop-color="#f7fbff" />
+      <stop offset="1" stop-color="#e0f0ec" />
+    </linearGradient>
+    <marker id="arrow" markerWidth="12" markerHeight="12" refX="10" refY="6" orient="auto">
+      <path d="M2 2 L10 6 L2 10 Z" fill="#24786a" />
+    </marker>
+  </defs>
+  <rect width="1280" height="720" fill="url(#bg)" />
+  <text x="70" y="82" font-family="Segoe UI, Arial, sans-serif" font-size="42" font-weight="700" fill="#1d3340">Benchmark SAPAT transcription providers in Daytona</text>
+  <text x="72" y="124" font-family="Segoe UI, Arial, sans-serif" font-size="20" fill="#49626e">Same video, same prompt, separate transcripts, one scorecard.</text>
+  <g font-family="Segoe UI, Arial, sans-serif">
+    <rect x="80" y="205" width="230" height="140" rx="16" fill="#ffffff" stroke="#79a69e" stroke-width="3" />
+    <text x="110" y="262" font-size="27" font-weight="700" fill="#1d3340">Input clip</text>
+    <text x="110" y="302" font-size="18" fill="#52656e">samples/demo.mp4</text>
+    <path d="M330 275 H390" stroke="#24786a" stroke-width="5" marker-end="url(#arrow)" />
+    <rect x="410" y="170" width="260" height="210" rx="16" fill="#ffffff" stroke="#79a69e" stroke-width="3" />
+    <text x="445" y="220" font-size="27" font-weight="700" fill="#1d3340">SAPAT runs</text>
+    <text x="445" y="262" font-size="19" fill="#52656e">--api openai</text>
+    <text x="445" y="296" font-size="19" fill="#52656e">--api groq</text>
+    <text x="445" y="330" font-size="19" fill="#52656e">--api azure</text>
+    <path d="M690 275 H750" stroke="#24786a" stroke-width="5" marker-end="url(#arrow)" />
+    <rect x="770" y="205" width="230" height="140" rx="16" fill="#ffffff" stroke="#79a69e" stroke-width="3" />
+    <text x="802" y="262" font-size="27" font-weight="700" fill="#1d3340">Transcripts</text>
+    <text x="802" y="302" font-size="18" fill="#52656e">provider-named .txt</text>
+    <path d="M1020 275 H1080" stroke="#24786a" stroke-width="5" marker-end="url(#arrow)" />
+    <rect x="1100" y="205" width="120" height="140" rx="16" fill="#ffffff" stroke="#79a69e" stroke-width="3" />
+    <text x="1122" y="262" font-size="24" font-weight="700" fill="#1d3340">Score</text>
+    <text x="1122" y="302" font-size="18" fill="#52656e">choose</text>
+    <rect x="130" y="445" width="1020" height="92" rx="18" fill="#ffffff" stroke="#c3d7d2" stroke-width="2" />
+    <text x="166" y="500" font-size="24" fill="#334b55">Keep .env, private recordings, and generated transcripts out of Git.</text>
+  </g>
+</svg>