Convert Audio to Text

Upload any common audio format and get an accurate, editable transcript. Free with no sign-up — export TXT, SRT, or VTT.

No sign-up · No watermark · TXT, SRT, and VTT exports · Files auto-delete in 24h

Drag & drop your file here

or browse your files

Free: 3 files a day · up to 30 min & 100 MB each

Spoken language

Bigger files or more uploads? Free account: 5 files/day, 1-hour files · Pro: 10-hour files + speaker labels

MP3 · WAV · M4A · FLAC · OGG · AAC · WMA · Opus

Every common audio format, one transcriber

This page accepts the audio formats people actually have: MP3 and AAC from podcasts and downloads, WAV and AIFF from recording gear, M4A from iPhone and Android voice memo apps, FLAC from archival rips, OGG and Opus from messaging apps, and WMA from older Windows recorders. There is no separate converter step — the transcriber decodes the file itself, so you never need to convert WAV to MP3 first.

The spoken language is auto-detected from 90+ supported languages. If your recording mixes languages and detection guesses wrong, you can pin the language in the options.

From recording to usable text

Once processed, the transcript opens in an editor with per-sentence timestamps. Click any line to correct it, search for a phrase, then export: TXT for documents and notes, SRT or VTT if the audio belongs under a video. Signed-in users also get DOCX and PDF for sharing polished documents.

Frequently asked questions

Which audio formats are supported?

MP3, WAV, M4A, FLAC, OGG, AAC, WMA, Opus, AIFF, and AMR. If your recorder produces something rarer, try uploading the original first, since the decoder covers most containers. If that fails, converting to WAV or MP3 with any free converter is a reliable fallback.

Will noisy or low-quality recordings work?

Usually, with caveats. The model is trained on a lot of imperfect real-world audio, so hiss, room echo, and compression artifacts rarely break it. What genuinely hurts accuracy is speech that is quiet relative to the noise — a recorder far from the speaker, or music louder than the voice. If you can, record closer to the source; the editor makes fixing the remaining errors quick.

Can it handle multiple speakers on one track?

Yes — everyone's words are transcribed regardless of how many people speak. On the free tier the transcript doesn't say who said what; speaker labels (diarization) are a Pro feature that tags each segment SPEAKER 1, SPEAKER 2, etc., and lets you rename them in the editor.

Do I get timestamps?

Yes. Every segment carries start and end times. In the editor you can toggle timestamp display on or off, and the SRT/VTT exports embed them in standard subtitle timing format. The TXT export is clean prose without timestamps.
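
For readers who want to see what "standard subtitle timing format" means in practice, here is a minimal Python sketch of how segments with start and end times map onto SRT and VTT text. It is not this service's export code. The function names and the (start, end, text) tuple layout are assumptions made for illustration; only the SRT and WebVTT timing conventions themselves are standard (SRT uses a comma before the milliseconds and numbers each cue, VTT uses a period and begins with a WEBVTT header).

```python
from typing import Iterable, Tuple

# Hypothetical segment shape for this sketch: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float, decimal_mark: str) -> str:
    """Render seconds as HH:MM:SS<mark>mmm, e.g. 00:01:02,500 for SRT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """SRT: numbered cues, comma before milliseconds, blank line between cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(segments: Iterable[Segment]) -> str:
    """WebVTT: 'WEBVTT' header line, period before milliseconds."""
    cues = []
    for start, end, text in segments:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        cues.append(f"{timing}\n{text}\n")
    return "WEBVTT\n\n" + "\n".join(cues)


if __name__ == "__main__":
    demo = [(0.0, 2.5, "Hello and welcome."), (2.5, 5.0, "Let's get started.")]
    print(to_srt(demo))
    print(to_vtt(demo))
```

The two formats carry the same timing information, so the choice between them usually comes down to what the video player or platform accepts.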

Can I edit the transcript before exporting?

Yes — that's the default workflow. The transcript opens in an editor where each segment is click-to-edit with autosave. Fix names, jargon, or misheard words, then export; your edits are included in every format.

How long can my audio file be?

Anonymous: 30 minutes / 100 MB per file, 3 files a day. Free account: 1 hour / 500 MB, 5 files a day. Pro or a credit pack: up to 10 hours and 5 GB per file. Duration is what matters — a long, small MP3 hits the cap before a short, huge WAV does.
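
If you want to check a file against these limits before uploading, the tiers above can be written down as a small lookup. The Python sketch below is illustrative only: the Tier class and smallest_tier_for function are hypothetical names, the caps are assumed to be inclusive, 5 GB is treated as 5000 MB, and the Pro daily file count is left unset because it is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_minutes: int
    max_mb: int
    files_per_day: Optional[int]  # None: not stated in the limits above


# Limits copied from the answer above; 5 GB is assumed to mean 5000 MB.
TIERS = [
    Tier("anonymous", 30, 100, 3),
    Tier("free account", 60, 500, 5),
    Tier("pro", 600, 5000, None),
]


def smallest_tier_for(duration_min: float, size_mb: float) -> Optional[str]:
    """Return the first tier whose duration AND size caps both fit the file.

    Caps are assumed inclusive. Returns None if no tier accepts the file.
    """
    for tier in TIERS:
        if duration_min <= tier.max_minutes and size_mb <= tier.max_mb:
            return tier.name
    return None


if __name__ == "__main__":
    print(smallest_tier_for(25, 24))    # short, small MP3
    print(smallest_tier_for(45, 43))    # longer MP3, over the 30-minute cap
    print(smallest_tier_for(700, 650))  # longer than any tier allows
```

A file has to fit both caps of a tier, so the sketch checks duration and size together rather than either one alone.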