Frequency vs. relevancy

From an e-mail exchange I’ve had with @Steve:

Usefulness ranks the words you’ve favorited first, followed by the highest frequency in the language.
Relevancy also ranks the words you’ve favorited first, followed by a ranking based on a combination of highest frequency in the language and the recency with which you last interacted with that word on Readlang.

What does “last interacted” mean? Does reading a text that includes the word, but not clicking on it, count?

I suggest that Relevancy should weigh the following components (from most to least important):

  • frequency in the texts I’ve started reading but haven’t yet finished
  • recency
  • overall frequency in my library
  • frequency in the language

That way, if someone uses Readlang mostly on song texts, practice sessions based on Relevancy would prioritize words like “love” and “sunshine”, while if someone uses Readlang mostly on articles about car models, words like “motor” and “seats” would get promoted to the top. But when a person who has so far only been listening to songs suddenly becomes interested in texts about cars, they would get a mix of both categories, leaning somewhat toward the content they are reading now.
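
A rough sketch of how such a weighting could look. Everything here is an assumption made for illustration: the weights, the `WordStats` fields, and the idea that each component is already scaled to the range 0..1. It is not how Readlang actually computes Relevancy.

```python
from dataclasses import dataclass

# Hypothetical weights, ordered from most to least important
# as in the list above. The exact values are made up.
WEIGHTS = {
    "in_progress": 0.4,  # frequency in texts started but not yet finished
    "recency": 0.3,      # how recently the word was interacted with
    "library": 0.2,      # overall frequency in the user's library
    "language": 0.1,     # frequency in the language
}


@dataclass
class WordStats:
    word: str
    in_progress: float  # each component is assumed to be scaled to 0..1
    recency: float
    library: float
    language: float


def relevancy(stats: WordStats) -> float:
    """Weighted sum of the four components."""
    return sum(weight * getattr(stats, name) for name, weight in WEIGHTS.items())


def rank(words: list[WordStats]) -> list[WordStats]:
    """Most relevant words first."""
    return sorted(words, key=relevancy, reverse=True)


# A user who mostly read song lyrics and has just started an article about cars.
words = [
    WordStats("sunshine", in_progress=0.0, recency=0.5, library=0.6, language=0.4),
    WordStats("seats", in_progress=0.7, recency=0.1, library=0.1, language=0.3),
    WordStats("love", in_progress=0.0, recency=0.9, library=0.9, language=0.8),
    WordStats("motor", in_progress=0.9, recency=0.2, library=0.1, language=0.3),
]

for stats in rank(words):
    print(stats.word, round(relevancy(stats), 2))
```

With these made-up numbers the result is a mix of both categories: “love” still comes first because of its history, but “motor” and “seats” already outrank “sunshine”.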

“Last interacted” means any time that you did an action to update the card’s details (e.g. editing or adding a translation) or the card’s spaced repetition data (e.g. you practiced with the flashcards or clicked “I remember” using the highlighting feature described in the Readlang Blog post “Auto Highlight Words As You Read”).

I really like the idea of using the frequency in the texts you’ve started reading or have in your library. A word would truly be the most useful for you if it’s coming up soon in something you’re reading. The downside is that keeping track of all these word frequencies separately for every user adds a lot of complexity, so it’s not trivial to implement.

More thoughts on selecting items for revision.

I still prefer revising already learnt items to learning new ones, unless the difference in frequency is very extreme. Thus, I suggest implementing one or more of the following:

  • let the user limit the number of new items introduced in each revision session or the total number of new items in a single day

  • let the user set a coefficient that will be used to discount the frequency of new items while selecting items for a session (if I choose coefficient q=2, then a new word with frequency 2000 in a corpus will be introduced only when I get to revising items with frequency 1000, whereas now it is introduced as soon as I get to revising items with frequency 2000)

  • introduce the concept of urgency: of two items that are 1 day past their optimum revision date, it’s more urgent to revise the word whose last revision interval was 2 days than the word whose last revision interval was 200 days. I suppose the urgency is roughly proportional to the ratio between the time passed and the optimal revision interval. The urgency of new words should be such that it is about as urgent to learn a word with frequency f as it is to revise a word with the same frequency which is, let’s say, overdue by the value of its last revision interval (it could have been revised after n days but 2n days have already passed)
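
The coefficient and urgency ideas could be sketched roughly like this. The `Card` fields and function names are hypothetical, and treating “time passed” as the number of days since the last revision is my reading of the suggestion. How urgency and frequency should be combined into one ordering is left open here, since the suggestion doesn’t pin that down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Card:
    word: str
    frequency: float                       # corpus frequency; higher means more common
    interval_days: Optional[float] = None  # last revision interval; None for a new word
    days_since_review: float = 0.0         # time passed since the last revision


def effective_frequency(card: Card, q: float = 2.0) -> float:
    """Divide the frequency of new words by the user-chosen coefficient q."""
    if card.interval_days is None:
        return card.frequency / q
    return card.frequency


def urgency(card: Card, new_word_urgency: float = 2.0) -> float:
    """Ratio of the time passed to the optimal revision interval.

    New words get a fixed value. The default of 2.0 matches a word that
    could have been revised after n days but has waited 2n days.
    """
    if card.interval_days is None:
        return new_word_urgency
    return card.days_since_review / card.interval_days


# Two words, each 1 day past its optimum revision date.
short_interval = Card("motor", frequency=1000, interval_days=2, days_since_review=3)
long_interval = Card("seats", frequency=1000, interval_days=200, days_since_review=201)
new_word = Card("sunshine", frequency=2000)

print(urgency(short_interval), urgency(long_interval), urgency(new_word))
print(effective_frequency(new_word))  # treated like a revised word of frequency 1000
```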

Maybe easier to implement and pointing in the same direction: order words by the number of contexts added to a flashcard, only then by frequency. Assuming users add multiple contexts for the same word as/when they encounter them, this should reasonably approximate the frequency in their completed reading.
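
As a minimal sketch, assuming each flashcard exposes a count of saved contexts and a language frequency (the dictionary keys below are hypothetical, not Readlang’s actual data model), that ordering is a two-level sort:

```python
# Hypothetical flashcards: "contexts" is the number of contexts saved on the
# card, "frequency" is the word's frequency in the language (higher = more common).
cards = [
    {"word": "sunshine", "contexts": 0, "frequency": 700},
    {"word": "motor", "contexts": 3, "frequency": 300},
    {"word": "seats", "contexts": 1, "frequency": 500},
    {"word": "love", "contexts": 3, "frequency": 900},
]

# Sort by number of contexts first (most contexts first), then by frequency
# (most frequent first) to break ties.
ordered = sorted(cards, key=lambda card: (-card["contexts"], -card["frequency"]))

for card in ordered:
    print(card["word"], card["contexts"], card["frequency"])
```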

I like that idea!

Regarding your other suggestions: I’ll bear these in mind when I next revisit the scheduling algorithm.