I built a free, local Speech-to-Text tool and so can you

Image: three crumpled yellow papers on a green surface, surrounded by yellow lined papers. Photo by Volodymyr Hryshchenko / Unsplash

Last month, I started using a speech-to-text service. I’d never used one before, but I got hooked almost immediately. It was fast, accurate, and made writing feel effortless. For a moment, it felt like a productivity cheat code.

Then the free trial ran out.

Turns out the full version costs $250. Which is… what I spend on groceries in a month. I’m not saying speech-to-text isn’t great, but at that price I half-expected it to also cook dinner and tell me everything would be okay.

So I built my own.

The Problem

As a data engineer, I type a lot. Meeting notes, documentation, quick messages, random thoughts I’ll forget in five minutes. I tried several speech-to-text tools, but they all shared the same issues:

Subscription costs that add up fast
Internet dependency - no Wi-Fi, no transcription
Privacy concerns - my voice recordings going to third-party servers

What I wanted was simple: hold a key, speak, release, and have the text ready to paste. No accounts. No cloud. No monthly fees.

The Solution

I built Speech2Text using OpenAI’s Whisper model, running entirely on your local machine.

What it does:

Hold Ctrl + Space → speak → release → text is copied to your clipboard
Runs quietly in the background with a system tray icon
Works fully offline - no internet required
100% free and open source

The transcription quality is surprisingly good. Whisper is the same model behind many commercial services; the difference is that here, you run it.
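The hold, speak, release flow above can be sketched as a small state machine. This is an illustration of the idea, not the project's actual code: the class name PushToTalk and its four injected callables are hypothetical, and the recorder, transcriber, and clipboard are passed in as plain functions so the logic can be read (and run) without a microphone or a model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PushToTalk:
    """Hold-to-record state machine. All I/O is injected (hypothetical design)."""

    start_recording: Callable[[], None]       # begin capturing microphone audio
    stop_recording: Callable[[], str]         # stop and return the audio file path
    transcribe: Callable[[str], str]          # audio file path -> transcribed text
    copy_to_clipboard: Callable[[str], None]  # place text on the clipboard
    recording: bool = False

    def on_press(self) -> None:
        # Holding a key usually fires repeated press events; only the first counts.
        if not self.recording:
            self.recording = True
            self.start_recording()

    def on_release(self) -> str:
        # A release without a matching press does nothing.
        if not self.recording:
            return ""
        self.recording = False
        audio_path = self.stop_recording()
        text = self.transcribe(audio_path).strip()
        if text:  # skip the clipboard when nothing was recognized
            self.copy_to_clipboard(text)
        return text


# Demo with stand-in components instead of a real microphone and model.
clipboard = []
ptt = PushToTalk(
    start_recording=lambda: None,
    stop_recording=lambda: "clip.wav",
    transcribe=lambda path: "  hello world ",
    copy_to_clipboard=clipboard.append,
)
ptt.on_press()
ptt.on_release()
print(clipboard)
```

With the openai-whisper package installed, the transcribe callable could be something like lambda path: model.transcribe(path, language="en")["text"], where model = whisper.load_model("medium"). The model size there is a guess, not something the post states.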

Why Local AI Matters

We’re at an interesting moment where genuinely powerful AI models can run on regular laptops. You don’t need a subscription, an API key, or someone else’s servers. Your data stays on your machine, under your control.

Local AI flips the usual trade-off on its head. Instead of paying with money and privacy, you pay in disk space and processing power. No telemetry, no silent uploads, no “we may use your data to improve our services.”

For speech-to-text specifically, this means:

Your voice never leaves your computer
No one is training models on your private conversations
It keeps working when your internet goes down

It’s not only about privacy; it’s also about resilience and ownership.

Try It Yourself

The project is open source on GitHub:
https://github.com/AdrienSourdilleTIL/Speech_2_Text_4_free.git

It requires Python and FFmpeg, and setup takes about five minutes if you’re comfortable with the command line.
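Before installing, it can help to confirm both prerequisites are in place. The sketch below uses only the standard library; the function name check_prerequisites and the minimum Python version of 3.8 are assumptions for illustration, so check the repository's README for the real requirement.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8), which=shutil.which):
    """Return a list of problems found; an empty list means both checks passed.

    min_python is an assumed floor, not a documented requirement of the project.
    `which` is injectable so the check can be exercised without touching PATH.
    """
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is required")
    if which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH")
    return problems


for problem in check_prerequisites():
    print("Missing:", problem)
```

If nothing is printed, Python and FFmpeg are both visible from the current shell.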

The first time you launch it, it downloads the Whisper model (~1.5 GB). After that, it starts in seconds and lives quietly in your system tray.
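To see whether that one-time download has already happened, you can look for the model file on disk. The sketch below assumes the tool relies on the openai-whisper package's default cache location (a "whisper" folder under XDG_CACHE_HOME, falling back to ~/.cache) and that the model is the "medium" checkpoint, which is only a guess based on the ~1.5 GB size. The helper names are hypothetical.

```python
import os
from pathlib import Path


def whisper_cache_dir(env=None) -> Path:
    """Mirror openai-whisper's default download folder (assumed, not project-specific)."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "whisper"


def is_model_cached(model_name="medium", cache_dir=None) -> bool:
    """True if <cache_dir>/<model_name>.pt exists. 'medium' is an assumed default."""
    folder = whisper_cache_dir() if cache_dir is None else Path(cache_dir)
    return (folder / f"{model_name}.pt").is_file()


if is_model_cached():
    print("Model already downloaded; the next launch should be quick.")
else:
    print("Model not found in", whisper_cache_dir(), "so expect a download.")
```

If the project stores the model somewhere else, pass that folder as cache_dir.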

What I Learned

Building this reminded me that sometimes the best tool is the one you make yourself. It’s not perfect: the initial load time is slow, it’s Windows-only, and it’s limited to English for now. But it does exactly what I need, and nothing more.

If you’re frustrated with existing tools, consider building your own. The barrier is lower than you think.

Author:

Adrien Sourdille