Last month, I started using a speech-to-text service. I’d never used one before, but I got hooked almost immediately. It was fast, accurate, and made writing feel effortless. For a moment, it felt like a productivity cheat code.
Then the free trial ran out.
Turns out the full version costs $250. Which is… what I spend on groceries in a month. I’m not saying speech-to-text isn’t great, but at that price I half-expected it to also cook dinner and tell me everything would be okay.
So I built my own.
The Problem
As a data engineer, I type a lot. Meeting notes, documentation, quick messages, random thoughts I’ll forget in five minutes. I tried several speech-to-text tools, but they all shared the same issues:
- Subscription costs that add up fast
- Internet dependency - no Wi-Fi, no transcription
- Privacy concerns - my voice recordings going to third-party servers
What I wanted was simple: hold a key, speak, release, and have the text ready to paste. No accounts. No cloud. No monthly fees.
The Solution
I built Speech2Text using OpenAI’s Whisper model, running entirely on your local machine.
What it does:
- Hold Ctrl + Space → speak → release → text is copied to your clipboard
- Runs quietly in the background with a system tray icon
- Works fully offline - no internet required
- 100% free and open source
The transcription quality is surprisingly good. Whisper is the same model behind many commercial services; the difference is that here, you run it.
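To make the hold, speak, release flow concrete, here is a minimal sketch of it as a small state machine. This is not the project's actual code: the `PushToTalk` class, its method names, and the injected `transcribe` and `copy_to_clipboard` callables are all hypothetical, chosen so the logic can be read and run without a microphone, a hotkey library, or the model itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PushToTalk:
    """Hold-to-record flow. All I/O is injected (hypothetical design)."""
    transcribe: Callable[[List[bytes]], str]    # audio chunks -> text
    copy_to_clipboard: Callable[[str], None]    # text -> clipboard
    recording: bool = False
    _chunks: List[bytes] = field(default_factory=list)

    def on_hotkey_down(self) -> None:
        # Holding a key usually fires repeated "down" events; ignore repeats.
        if self.recording:
            return
        self.recording = True
        self._chunks = []

    def on_audio(self, chunk: bytes) -> None:
        # Keep only the audio captured while the hotkey is held.
        if self.recording:
            self._chunks.append(chunk)

    def on_hotkey_up(self) -> str:
        # Releasing the key ends the recording and triggers transcription.
        if not self.recording:
            return ""
        self.recording = False
        text = self.transcribe(self._chunks).strip()
        if text:
            self.copy_to_clipboard(text)
        return text


# Stand-ins for the real transcriber and clipboard, for demonstration only.
clipboard: List[str] = []
ptt = PushToTalk(
    transcribe=lambda chunks: f" heard {len(chunks)} chunks ",
    copy_to_clipboard=clipboard.append,
)
ptt.on_audio(b"dropped")      # key not held yet, so this is discarded
ptt.on_hotkey_down()
ptt.on_audio(b"a")
ptt.on_audio(b"b")
ptt.on_hotkey_up()
print(clipboard)              # ['heard 2 chunks']
```

In a real build, the `transcribe` callable would wrap the model. With the `openai-whisper` package, for instance, that could mean saving the recording to a file and calling `whisper.load_model(...)` once at startup, then `model.transcribe(path)["text"]` per recording. Which model size the project loads is not something this sketch assumes.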
Why Local AI Matters
We’re at an interesting moment where genuinely powerful AI models can run on regular laptops. You don’t need a subscription, an API key, or someone else’s servers. Your data stays on your machine, under your control.
Local AI flips the usual trade-off on its head. Instead of paying with money and privacy, you pay in disk space and processing power. No telemetry, no silent uploads, no “we may use your data to improve our services.”
For speech-to-text specifically, this means:
- Your voice never leaves your computer
- No one is training models on your private conversations
- It keeps working when your internet goes down
It’s not only about privacy; it’s also about resilience and ownership.
Try It Yourself
The project is open source on GitHub:
https://github.com/AdrienSourdilleTIL/Speech_2_Text_4_free.git
It requires Python and FFmpeg. If you’re comfortable with the command line, setup takes about five minutes.
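If you want to confirm both prerequisites before cloning, a quick check along these lines can help. It is a sketch rather than part of the project: the `missing_prerequisites` function is hypothetical, and the minimum Python version of 3.8 is my assumption, so defer to the repository's README for the real requirement.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 8)  # assumed minimum, not taken from the project


def missing_prerequisites(
    min_python: Tuple[int, int] = MIN_PYTHON,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; empty means ready to go."""
    problems: List[str] = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    print("Ready to install." if not issues else "\n".join(issues))
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use you would call `missing_prerequisites()` with no arguments.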
The first time you launch it, it downloads the Whisper model (~1.5 GB). After that, it starts in seconds and lives quietly in your system tray.
What I Learned
Building this reminded me that sometimes the best tool is the one you make yourself. It’s not perfect: the initial load time is slow, it’s Windows-only, and it’s limited to English for now. But it does exactly what I need, and nothing more.
If you’re frustrated with existing tools, consider building your own. The barrier is lower than you think.
