Missing sentences in German auto-transcription

Milan_Radanovic · January 23, 2025, 6:25pm

Hi, Steve! Entire sentences are missing from the German auto-transcription near the end of the audio. It happens every time I upload a certain MP3. I could share the file privately if needed.

Cheers!
Milan

Steve · January 23, 2025, 7:51pm

Sure, feel free to email me the file at steve@readlang.com and I’ll take a look. It could just be a limitation of the Whisper AI model, but it might be made worse by the extra compression that I apply to the file to make it fit within OpenAI’s 25 MB file size limit.

In the meantime, if you want, you can edit the transcript via the Edit tab in the reader page sidebar. The new text you add won’t be synced on a word-by-word basis, but the text around it should remain synced as before.

Steve · January 31, 2025, 2:35pm

Thanks for sending me the file! I can confirm that the same problem happened when I tried your file. I’ve now fixed it so that this problem will not happen again for this particular file. To explain:

Readlang was converting all uploaded MP3s to OGG format, which is much smaller, allowing us to squeeze 2 hours’ worth of audio into the 25 MB limit that OpenAI impose on their Whisper API, which is what we use for transcribing. Most of the time this works fine, but in some cases, including this one, the resulting transcription has some gaps. I’ve now changed Readlang’s behavior so that shorter MP3 files won’t get converted to OGG format before uploading to OpenAI, resulting in fewer of these kinds of transcription errors. For longer MP3 files, conversion to OGG is still necessary.
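
For readers curious what that decision might look like, here is a minimal sketch. It is not Readlang's actual code. It assumes that "shorter" means "already fits under the 25 MB upload limit" and that the limit is counted as 25 * 1024 * 1024 bytes; the function names and the byte interpretation are hypothetical.

```python
# Hypothetical sketch: decide whether an uploaded MP3 can be sent as-is
# or must first be converted to the smaller OGG format.

# Assumption: the 25 MB limit mentioned in the thread, counted in binary megabytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def needs_ogg_conversion(size_bytes: int, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True when the file is too large to upload without recompression."""
    return size_bytes > limit_bytes


def choose_upload_format(size_bytes: int) -> str:
    """Keep the original MP3 when it fits; otherwise fall back to OGG."""
    return "ogg" if needs_ogg_conversion(size_bytes) else "mp3"
```

Under these assumptions, a 10 MB file would be uploaded as the original MP3, while a 60 MB file would still go through OGG conversion. A real implementation might decide on audio duration or bitrate instead of file size.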