Seemingly Random AI Translations for Māori

I use Readlang for several languages, and for most of them, the AI translations are on point 99% of the time. The exception is Māori, which is unfortunately one of those very low-resource languages. I’ve worked around this by adding a custom dictionary.

The reason I’m posting this as a bug is that the translations are not just wrong, but often utterly baffling: in some cases, they’re just random words that don’t fit in context at all.

I am vaguely aware this could be what AI boffins call “collocation pressure” or somesuch?

A couple of examples:
“Kua kite ia i ngā kīngi me ngā pīnono e hīkoi ana i ngā one o te koraha”.

This sentence reads: “He had seen kings and beggars walking in the sands of the desert”.

But for some reason, Readlang interprets ‘pīnono’ as dancers, which (according to every dictionary I could find) is not a possible interpretation of that word at all.

Ditto for:

“Engari kei te kimi māua tahi i ō māua ara tātai, nā konā e kauanuanu atu nei au ki a ia”

This means “But we are both seeking our destiny / true path, and for that reason I respect him”, but Readlang renders ‘kauanuanu’ as “comforted”.

Out of curiosity, I fed both sentences directly into ChatGPT and Gemini and asked them for translations. ChatGPT botched the first one: instead of “dancers”, it misread pīnono as “nobles” (probably because of “kings”). It got the second one mostly right, though (i.e., it correctly translated kauanuanu as “respect”).

Gemini got both right, which was a pleasant surprise.

Yes, for many of the languages without such a large corpus of text it’s very likely that Readlang won’t do a great job. I added them so people could try them out but I do wonder if some should be removed or have a big warning until we know they work well.

I see from your account that you are on the normal Premium plan. For less common languages like Maori, the larger model which requires Premium Plus might provide better results. I’ve applied a free Premium Plus plan to your account for the next 7 days and enabled the better AI model to give you a chance to try this. If you do, please report back here. I’m curious to know if it works well for Maori or not. I’m guessing it still won’t be perfect since it uses ChatGPT but it’ll probably be better than before.

Mixed bag! I went back and re-tagged the same words from what I read today. Some of them (like pīnono) it now gets right. Others it still gets wrong in the same way as before: e.g., tipiwhenua gets rendered as ‘landscape’, and I’m pretty sure it should be ‘nomad’.

I’ll try properly with an entirely new block of text and take detailed notes before the seven days are up.

@Steve

I threw a couple of test stories up for Maori with translations interleaved (shared, if anyone wants to peek). I’m only on Premium level but I don’t think the context-sensitive translations for Maori are nearly bad enough to want to take the feature down. I think you’d only want to consider that if it seems worse than non-context-sensitive.

It’s fairly common when I read on this site that I get a word gloss that seems unlikely to me, so I click the Explain tab and end up editing the word based on that. That usually clears it up for me…

I definitely don’t want Māori to be taken down! I mean, I’m still using Readlang for it, just with Te Aka (maoridictionary.co.nz) added as a custom dictionary.

It’s possible part of the issue is the text I’m working through (Te Ruānuku). I would guesstimate it’s a B1 ~ B2 text, but some of the language is a little florid?

Did a proper deep dive into a longer, different, text (newspaper opinion piece), and tagged eighteen words. Short version, it’s light years better than the smaller model!

I don’t seem to be able to upgrade my account to Premium Plus (because it thinks it’s already on it, due to the trial), but I’m definitely keen on changing my subscription up.

  • 13 were good - they were the same as, or within a reasonable semantic range of, the best-matching dictionary entry
  • 3 were collocation errors
    • I loaded the parts of the compound pāpātanga / huamoni in as separate word list entries, and it translated them both as “interest rates” (where pāpātanga = rate and huamoni = interest)
    • I tagged tukupū, which means ‘widespread’, and it translated it as ‘inflation’. In context, it was part of a compound that has that meaning: pikiutu tukupū, but in isolation, it should just be rendered as ‘widespread’.
  • Only two were completely wrong:
    • It translated whakamaumau as ‘hedging’, where it means ‘fixing’, ‘attaching’. In context, the reserve bank was fixing a rate, so maybe some bleed-through from the financial subject matter?
    • It rendered mehamehatanga as ‘dissolution’ where it should have been something like ‘limp’, ‘weak’, or maybe ‘ineffectual’. In context, it was talking about government/parliament, so again maybe bleed-through.
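
For anyone who wants the tally above as percentages, here is a rough Python sketch. The category labels and the `shares` helper are just names made up for illustration; the counts are the ones from the eighteen tagged words, and the good/bad calls are my own judgment against the dictionary, not anything Readlang reports.

```python
# Counts from the eighteen words tagged in the newspaper opinion piece.
# The labels are illustrative; the classification is a manual judgment call.
results = {
    "good": 13,
    "collocation error": 3,   # pāpātanga, huamoni, tukupū
    "completely wrong": 2,    # whakamaumau, mehamehatanga
}


def shares(counts):
    """Return each category's share of the total, as a percentage to 1 d.p."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


total = sum(results.values())
for label, pct in shares(results).items():
    print(f"{label}: {results[label]}/{total} ({pct}%)")
```

Eighteen words is a tiny sample, so the percentages are only a rough indication.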