Kanji + kana output · 標準語 · Kansai-ben · Business Japanese
How Japanese comes out: script and segmentation
The transcript is written the way Japanese is actually written: a natural mix of kanji, hiragana, and katakana, chosen by context — 会議 not かいぎ, katakana for loanwords and foreign names, kana for grammar. Japanese punctuation (、and 。) is inserted, and since Japanese doesn't use spaces, the text flows unsegmented, exactly as a native writer would produce it.
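If you want a quick sanity check that an exported transcript really is mixed-script rather than all kana, counting characters per script is enough. The sketch below is not part of the product; the function names are made up for illustration, and it classifies characters only by their basic Unicode blocks (hiragana U+3040–309F, katakana U+30A0–30FF, CJK Unified Ideographs U+4E00–9FFF), so rarer kanji in extension blocks and marks such as 々 land in "other".

```python
from collections import Counter


def script_of(ch: str) -> str:
    """Classify one character by its basic Unicode block (illustrative helper)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"  # punctuation, Latin letters, digits, rare kanji, etc.


def script_counts(text: str) -> Counter:
    """Count how many characters of each script appear in the text."""
    return Counter(script_of(ch) for ch in text)


counts = script_counts("会議でプレゼンをします。")
print(dict(counts))
```

A transcript that comes back with zero kanji, or with loanwords counted as hiragana, would be worth a closer look in the editor.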
One consequence for subtitles: cue lengths are based on the model's phrase segmentation rather than word counts. The SRT/VTT exports produce short, readable cues, but Japanese subtitle conventions (13–16 characters per line) may mean you split a few cues in the editor for broadcast-style captioning.
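For broadcast-style captioning, the manual split described above can be roughed out in a few lines. This is a minimal sketch, not the exporter's actual logic: `split_cue` and its `max_chars` parameter are hypothetical, and the only rule it applies is "break after the last 、 or 。 that fits, otherwise cut at the limit". It knows nothing about word boundaries, so a hard cut can land mid-word and still needs a human check.

```python
def split_cue(text: str, max_chars: int = 16) -> list[str]:
    """Split Japanese cue text into lines of at most max_chars characters,
    preferring to break right after 、 or 。 when one falls inside the window."""
    lines = []
    rest = text.strip()
    while len(rest) > max_chars:
        window = rest[:max_chars]
        # Position of the last punctuation mark inside the window, or -1.
        cut = max(window.rfind("、"), window.rfind("。"))
        # Break just after the punctuation; otherwise fall back to a hard cut.
        end = cut + 1 if cut >= 0 else max_chars
        lines.append(rest[:end])
        rest = rest[end:]
    if rest:
        lines.append(rest)
    return lines


for line in split_cue("本日の会議では、来期の予算について話し合います。"):
    print(line)
```

With the default limit of 16, the example sentence breaks after the 、 and the remainder fits on a second line. Tightening `max_chars` to 13 forces a hard cut inside the second clause, which is the kind of case to fix by hand.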
Registers and dialects
Standard Japanese (標準語), whether in meetings, lectures, podcasts, or keigo-heavy business speech, is the model's strong suit and transcribes with high accuracy. Kansai-ben and other regional forms are usually rendered correctly or lightly normalized; strong regional dialect from older rural speakers (Tōhoku, Kagoshima) is the genuinely hard case. Because Japanese is so rich in homophones, the model occasionally picks the wrong kanji for a name, so names are worth a scan in the editor.
Frequently asked questions
Does the transcript use kanji or kana?
Both, mixed naturally by context — standard written Japanese, not a romaji or all-kana rendering. Common words appear in kanji, grammatical elements in hiragana, loanwords and foreign names in katakana. Rare or ambiguous readings sometimes get the wrong kanji; those are quick edits.
How is segmentation handled without spaces?
Japanese text is written without spaces and the transcript follows that convention. Segments are split on natural phrase boundaries with 、and 。 punctuation inserted by the model, so the text reads like written Japanese, and SRT cues break at phrase edges rather than mid-word.
How accurate is Japanese compared to English?
High, but a notch below the top European languages. Pitch accent and a very large homophone inventory make kanji selection, rather than mishearing, the main source of errors. Clear studio or meeting audio produces very usable transcripts; the editor pass is mostly homophone/kanji checks.
How well does it handle Kansai-ben and other dialects?
Kansai-ben is common enough in media that the model handles it well, usually writing dialect forms as spoken (あかん, ~へん negatives). Heavier regional dialects — rural Tōhoku, Okinawan-influenced speech — degrade accuracy noticeably. Standard-register speakers with regional accents are no problem.
Can I get English text from Japanese audio?
This page produces a Japanese transcript, which is the most accurate route. For English, export the TXT and machine-translate it: Japanese→English translation generally works much better from clean text than from direct speech translation. In-app translation is on the roadmap.
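Many translation tools work best one sentence at a time, so it can help to split the exported TXT on sentence-final punctuation first. The sketch below assumes the export is plain UTF-8 text ending sentences with 。, ！ or ？; `split_sentences` is a hypothetical helper, and the translation step itself is left to whichever tool you use.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split Japanese text into sentences, keeping the closing 。, ！ or ？."""
    # Each match is a run of non-terminators followed by any closing marks.
    parts = re.findall(r"[^。！？]+[。！？]*", text)
    return [p.strip() for p in parts if p.strip()]


transcript = "おはようございます。本日はよろしくお願いします。質問はありますか？"
for sentence in split_sentences(transcript):
    print(sentence)
```

Each printed sentence can then be passed to the translator separately, which also makes it easier to line the English up against the original when reviewing.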
Will it handle keigo and business Japanese?
Yes — polite and honorific registers are heavily represented in training data (news, presentations, service interactions) and transcribe accurately, including humble/honorific verb forms. Casual speech with contractions (っす, じゃん) also comes out as spoken.