Auto-highlight or rank top X words in target language based on frequency dictionary

Description

Currently, Readlang allows a user to highlight all of their saved words or their vocab that has Ready To Learn status.

I really love this feature and have it turned on all of the time. Still, for some of the words that I highlight, especially synonyms, it would be nice to know how common they are. If, say, a word didn’t show up in the top 10,000 most common words in a language, I might consider not saving it to a flashcard because I would be unlikely to encounter it again. The opposite case might be useful, too: if I see a word in the top 1,000 most frequent words in my target language that I don’t have a flashcard for, highlighting would make it easy to spot and save.

Several languages have frequency dictionaries published online (see this example for Spanish on Wikipedia). Still, I am not sure how difficult it would be to import frequency dictionaries for different languages in an automated way. This could be quite challenging from a technical perspective, and I imagine that many languages either don’t have a published frequency dictionary or don’t have one in a downloadable format. What’s more, it wouldn’t be great to highlight really high-frequency words (for English, think a, the, of, in, etc.).

I imagine that this could be at least partially solved if a user decided to use the CSV upload feature in their target language, but they would have to do it for every language they were learning. I am not sure if that feature allows you to have a blank context, either, but that would be great!

Potential feature requests

  • Would it be possible to include a way to toggle / highlight the top X most frequent words in your target language while reading a text?
  • When creating a flashcard, could the word frequency or “rank” be shown next to the definition?
  • Could a frequency dictionary be used to weight your flashcards during review practice?
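
To make the first two requests concrete, here is a minimal sketch in Python. It assumes the frequency dictionary is available as a plain list of words ordered from most to least frequent. The function names `build_rank` and `highlight_band` and the tiny sample list are hypothetical, made up for illustration; this is not how Readlang works internally. Using a rank band (a lower and an upper bound) rather than just "top X" is one way to skip the very high-frequency words mentioned above.

```python
import re


def build_rank(words_by_frequency):
    """Map each word to its 1-based position in the list (first occurrence wins)."""
    rank = {}
    for position, word in enumerate(words_by_frequency, start=1):
        rank.setdefault(word.lower(), position)
    return rank


def highlight_band(text, rank, min_rank, max_rank):
    """Wrap words whose rank is within [min_rank, max_rank] in ** markers."""
    def mark(match):
        word = match.group(0)
        r = rank.get(word.lower())
        if r is not None and min_rank <= r <= max_rank:
            return f"**{word}**"
        return word

    # [^\W\d_]+ matches runs of letters, including accented ones.
    return re.sub(r"[^\W\d_]+", mark, text)


# Hypothetical five-word list standing in for a real frequency dictionary.
sample_rank = build_rank(["el", "de", "que", "casa", "perro"])
print(highlight_band("El perro de la casa", sample_rank, 4, 5))
# prints: El **perro** de la **casa**
```

The same `rank` lookup could supply the number shown next to a definition when creating a flashcard; a word missing from the dictionary would simply have no rank.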

Related forum issues

It would be a great option! :smiley:

Readlang does in fact have word frequency tables for the top 50K words for some languages. Note that these are just strings of characters, so conjugations and other variations of the “same word” will be listed separately. You can see which languages have this in this table.

This is used to prioritize your studying of words in the practice sessions, so you practice the most common/useful words first. I’ll bear in mind your suggestion about using this for word highlighting, but no promises.
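
For readers curious what frequency-based prioritization could look like, here is a rough Python sketch. It is only an illustration under assumptions, not Readlang's actual scheduling logic: `order_by_frequency` and the sample ranks are hypothetical, and the lookup is by exact string, so a conjugated form like "hablo" would need its own entry separate from "hablar".

```python
def order_by_frequency(saved_words, rank):
    """Return saved words sorted most-common first; unranked words go last.

    `rank` maps a word string to its 1-based frequency rank (hypothetical data).
    sorted() is stable, so words without a rank keep their original order.
    """
    return sorted(saved_words, key=lambda w: rank.get(w.lower(), float("inf")))


# Hypothetical ranks; each surface form is a separate entry.
sample_rank = {"de": 2, "casa": 4, "perro": 5}
queue = order_by_frequency(["perro", "murciélago", "de", "casa"], sample_rank)
print(queue)
# prints: ['de', 'casa', 'perro', 'murciélago']
```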

Very cool, Steve! I did not know about that page or that word frequency was already being used for prioritizing flashcards. Thanks!

How can we view these frequency lists for each language?

The lists aren’t publicly available via Readlang right now. I originally got them from here: Frequency Word Lists | Invoke IT Limited
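
If you download one of those lists and want to explore it yourself, a small Python sketch like the one below may help. It assumes each line holds a word followed by a count, separated by whitespace, with the most frequent word first; check the actual file you download, since that layout is an assumption here. The function name `load_frequency_list` is hypothetical.

```python
def load_frequency_list(lines):
    """Build a word -> rank mapping from lines shaped like 'word count'.

    Lines that do not match the assumed two-field layout are skipped.
    Rank 1 is the first valid line, i.e. the most frequent word.
    """
    rank = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # blank or malformed line
        rank.setdefault(parts[0], len(rank) + 1)
    return rank


# Made-up sample lines; with a real file you could pass an open file object instead.
sample = ["de 100", "la 90", "", "not a valid line", "que 80"]
print(load_frequency_list(sample))
# prints: {'de': 1, 'la': 2, 'que': 3}
```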