Czech Verb Separation

Thanks for Readlang! It mostly works well with Czech, but there is a strange bit with verbs that I thought I’d mention.

The main issue is that Czech, like German, has a habit of separating out parts of verbs. If a verb is reflexive, it adds a particle like “se” or “si” in what is called the “second position” of the phrase, similar to how German places verbal particles like “ab” at the end.

This separation of words means that clicking on a Czech verb often produces an incorrect translation, because verbs often have both reflexive and non-reflexive variants with completely unrelated meanings.

Occasionally, these components appear next to each other, and I can select them both to get the proper translation. Unfortunately for me, Readlang treats these as a “phrase” and therefore limits the number of phrases (and therefore verbs) I can translate without paying.

1 Like

Thanks for the feedback! This is a common complaint in German. Interesting to learn that it also applies to Czech. I’m still not completely sure how to deal with it since Readlang is built around translating and collecting words and short phrases within longer sentences. I think the ideal solution should be aware of the connection between the 2 separated parts without you needing to select the entire bit of text in-between. I hope to give this some proper attention one day, although it’s been hard for me to justify focussing on it since for most languages this is so rare as to be a non-issue.

1 Like

I think Czech has some fortunate circumstances that could make something work with basic logic:

  1. The “se” form of verbs exists in dictionaries as a separate entry.
  2. Czech has mandatory punctuation that causes particles to almost always appear near the verb, so it’s almost always valid to look backwards to the last comma or period.

So I would suggest, for Czech, when a verb is clicked:

  1. Scrub backwards in the sentence looking for the particles “se” or “si”, until the first piece of clausal punctuation (comma, period, exclamation mark, question mark, semicolon, etc.) is encountered.
  2. If a particle appears, attempt a dictionary lookup of “{word} {particle}” with a space in between. If a hit, use that entry. Both particles may appear, and should be tested separately (order not mattering).
  3. Otherwise, do a dictionary lookup of “{word}”.

That will work for the vast majority of automatic cases.
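
To make the three steps above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `lookup_verb`, the toy `dictionary`, and the tokenizer are hypothetical and say nothing about how Readlang actually works. It also looks up the clicked form as-is, so it ignores inflection and any particle that follows the verb.

```python
import re

PARTICLES = ("se", "si")
# Punctuation treated as a clause boundary (an assumption; extend as needed).
CLAUSE_BREAK = set(",.!?;:")


def tokenize(sentence):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def lookup_verb(sentence, index, dictionary):
    """Look up the token at `index`, preferring a '{word} {particle}' entry.

    Returns (dictionary key used, translation or None).
    """
    tokens = tokenize(sentence)
    word = tokens[index].lower()

    # Step 1: scan backwards for "se"/"si" until clausal punctuation.
    found = []
    for tok in reversed(tokens[:index]):
        if tok in CLAUSE_BREAK:
            break
        low = tok.lower()
        if low in PARTICLES and low not in found:
            found.append(low)

    # Step 2: try each particle that was found, one at a time.
    for particle in found:
        key = f"{word} {particle}"
        if key in dictionary:
            return key, dictionary[key]

    # Step 3: fall back to the bare word.
    return word, dictionary.get(word)


# Toy dictionary, for illustration only.
dictionary = {"učit": "to teach", "učit se": "to learn"}

# "se" is in the same clause as "učit", so the reflexive entry is used.
print(lookup_verb("Musím se v sobotu celý den učit.", 6, dictionary))
# The comma stops the backwards scan, so the bare verb is used.
print(lookup_verb("Bojím se, že učit nebude.", 4, dictionary))
```

The examples click on an infinitive on purpose; a real version would first need to map an inflected form to its dictionary entry.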

Thanks. Adding logic like that on a per-language basis does sound a bit complex from my point of view though. How does the Explain tab manage in these cases? In my experience its translations are much better than the ones shown above the word when you click on it, since they take the context sentence into account, which will include the particles you mention.

As a linguist who has studied Czech verbs for 10 years and as a software developer, I can tell you that such rules of thumb will fail so often that there’s no point trying to implement them.

However, I do think that it would be useful if the user could manually mark a discontinuous phrase. The constituents can also be swapped (“učit se (to learn)” appears in the sentence “musím se v sobotu celý den učit (I have to study the whole day on Saturday)”).

4 Likes

Hello,

I could add that this isn’t limited to Czech and German: Dutch also has this feature that’s called “separabele verba / scheidbare werkwoorden”. It’s somewhat similar to phrasal verbs in English, and indeed can change the meaning completely.
And as Dutch word order is quite peculiar, you find “pieces” all over the sentence. For example, “afstuderen” (to graduate): ik moet slagen voor een examen om af te studeren (I have to pass an exam (in order) to graduate); ik ben afgestudeerd (I’ve graduated); in 2024 studeerde ik cum laude af aan Harvard / … cum laude aan Harvard af (I graduated with honors from Harvard in 2024). At the same time, “studeren” (without “af”) == “to study”.

There are probably other languages that have this feature.

Giving users an option to at least stitch the pieces together themselves would be really useful (for example, by introducing some hotkey combination like “shift + left-click” to mark such words, maybe along with a checkbox in the settings that enables this behavior for those who need it).

4 Likes

Maybe the simplest solution would be this: if the user clicks on several words within one sentence, give them the ability (via an option they have to enable specifically) to pass all the selected parts to the translator as one, the same way you're already doing with sentences.
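
As a rough illustration of that idea, the sketch below joins whichever words were clicked into one phrase, in sentence order, and keeps the full sentence as context. The function name `stitch_selection` and the returned fields are hypothetical, not part of any Readlang API, and a plain whitespace split stands in for real tokenization.

```python
def stitch_selection(tokens, selected):
    """Join the clicked tokens into one phrase, in sentence order.

    `tokens` is the sentence split into words; `selected` holds the
    indices the user clicked, in any order.
    """
    ordered = sorted(set(selected))
    return {
        "phrase": " ".join(tokens[i] for i in ordered),
        "context": " ".join(tokens),
    }


tokens = "in 2024 studeerde ik cum laude af aan Harvard".split()
# The user clicks "af" first, then "studeerde".
result = stitch_selection(tokens, [6, 2])
print(result["phrase"])  # studeerde af
```

Sending the stitched phrase together with the context sentence would leave it to the translator to recognize the separable verb, so no per-language rules are needed.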

1 Like

Yes! This would be very useful and would justify me recommending Readlang to my Dutch students.

Want to +1 this as a Dutch student

1 Like