I’ve had a lot of requests for this over the years, and I’ve finally gotten around to adding it!
You can now upload mp3 files to Readlang.
For now this is an experimental feature in beta mode available to premium subscribers only. You’ll need to enable it at the bottom of the Settings page.
You can play/pause the audio using the button at the bottom of the reader page, just like for the Read Aloud feature.
In addition, there are 3 new keyboard shortcuts inspired by YouTube which work both with uploaded mp3 files and with the TTS (text-to-speech) based Read Aloud feature:
j: skip back 10s (mp3) or 8 words (TTS)
k: play/pause
l: skip forwards 10s (mp3) or 8 words (TTS)
Limits:
Only 20 MB per file (this is due to a limit in the OpenAI API; you could consider using a program like Audacity to split audio into multiple files, and/or convert to mono and reduce the bitrate slightly, to squeeze more out of this file size limit)
Only 200 MB of audio files in total per user
You can’t tweak the transcript yourself afterwards without potentially messing up the synchronization. (I can work on improving this, but right now I’d advise against editing these, apart from possibly adding paragraph breaks - that should be safe)
Very likely that for certain languages it’s not going to work so well due to the limitations of the OpenAI Whisper model
Very likely that there are bugs due to my code
In general, it seems to work pretty well in Spanish and French. Some issues I’ve noticed:
Some speech is missing from the transcript. In a podcast where a woman and a man were talking, the man’s voice was only transcribed some of the time.
There are no paragraph breaks. So it’s a big wall of text, and there’s no indication in the transcript when the speaker changes.
Please try it out and let me know how it goes. Particularly interested in bug reports.
Incredible! This is a big feature! I was wondering if there is also a time limit on the audio? I understand that it has to be 20 MB or under, but does the length of the audio matter? Thanks, Steve.
There’s no time limit per audio file at the moment, just whatever you can squeeze within 20MB.
If you encode your mp3 in mono, 44 kHz, 56 kbps, which still sounds pretty high quality, then you can fit 48 minutes of audio within a 20 MB file. If you go more aggressive and use mono, 32 kHz, 32 kbps, you could squeeze 85 minutes within a single file, and the audio quality should still be OK. I wouldn’t want to go lower than that though. In fact, when it comes to calculating storage, I’d be tempted to make files under 32 kbps count as if they were 32 kbps, both to avoid encouraging people to get too aggressive and make bad-sounding audio, and because OpenAI actually charges per minute of audio for transcribing, not per MB.
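To make the arithmetic above easier to reproduce, here is a minimal sketch in Python. It assumes constant-bitrate audio, treats 20 MB as 20,000,000 bytes, and ignores mp3 header and tag overhead, so its results land close to, but not exactly on, the 48 and 85 minute figures quoted above. The function names and the 32 kbps floor parameter are hypothetical and only illustrate the idea of counting low-bitrate files as if they were 32 kbps; this is not Readlang’s actual code.

```python
# Assumption: "20 MB" means 20,000,000 bytes (decimal megabytes).
MAX_FILE_BYTES = 20 * 1000 * 1000


def max_minutes(bitrate_kbps: float, size_bytes: int = MAX_FILE_BYTES) -> float:
    """Minutes of constant-bitrate audio that fit within size_bytes."""
    total_bits = size_bytes * 8
    seconds = total_bits / (bitrate_kbps * 1000)
    return seconds / 60


def counted_bytes(duration_seconds: float, bitrate_kbps: float,
                  floor_kbps: float = 32) -> int:
    """Hypothetical storage accounting: files encoded below floor_kbps
    are counted as if they had been encoded at floor_kbps."""
    effective_kbps = max(bitrate_kbps, floor_kbps)
    return int(duration_seconds * effective_kbps * 1000 / 8)


if __name__ == "__main__":
    for kbps in (56, 32):
        print(f"{kbps} kbps: {max_minutes(kbps):.1f} minutes in one file")
    # A one-hour file at 16 kbps counts the same as one at 32 kbps.
    print(counted_bytes(3600, 16), counted_bytes(3600, 32))
```

With a floor like this, going below 32 kbps would save nothing against the storage quota, which matches the stated aim of not rewarding very low quality encodes.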
Hey there Steve, awesome work on this feature, and many thanks for implementing it and in particular for replying to my post from last month about it. I likely wouldn’t have known if not for that notification.
This feature is really cool and the sync works great, but there are two things to perhaps consider in future (if the latter one is actually possible, of course):
It would be very nice if users who have their own native transcripts could simply upload audio and have it as a simple player that they listen to while reading themselves, without sync (since I assume it would be hard or impossible to auto-sync these two separate sources).
I don’t see a way to edit the transcript (even by one character) without the audio being automatically removed from the page. This is not a big deal for people with some experience in the language who can mentally replace wrong words in the transcription, but for newer learners it may be an issue. Fixing this, if possible, would be ideal; in my case with Arabic, for example, Whisper sometimes seems to struggle to tell the difference between ا and ع.
Just one last bit of appreciation: I’m sure you’ve worked very hard on delivering this and it’s amazing. It puts this platform ahead of the competition by far in my opinion, especially when it comes to the smooth user experience compared to others.
Yes, your vocab will remain untouched, and any context sentences you have attached to vocab will remain, but the source (including the title of the text) will be lost.
There isn’t right now, but this would make sense and I’ll probably add that before this feature graduates from beta mode.