How is this different from what people already do? Just turn on the financial news on TV.
We should always be skeptical, and if the information is important and consequential, we should always check it.
These are statistical and probabilistic models. So if a model has a 95% probability of being right, that means it's wrong 5% of the time. Is that bullshit? If a medical diagnostic test is right 95% of the time, do we call the incorrect diagnoses bullshit and stop using the test?
Understanding that they are probabilistic means that we don’t fully trust them, but also that we don’t throw them out.
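To make the arithmetic concrete, here is a minimal sketch in Python. It is a toy simulation, not a claim about any real model or test; the 95% accuracy figure is just the hypothetical from above:

```python
import random

random.seed(0)

TRIALS = 10_000
ACCURACY = 0.95  # assumed per-query probability of a correct answer

# Simulate independent queries to a hypothetical 95%-accurate model.
errors = sum(1 for _ in range(TRIALS) if random.random() > ACCURACY)

print(f"{errors} errors out of {TRIALS} queries "
      f"({errors / TRIALS:.1%} error rate)")
# Roughly 500 errors: wrong about 5% of the time, as expected.
# That error rate is why consequential answers still get checked,
# not why the tool gets thrown out.
```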
I've been playing a lot with o1 pro, and hallucinations have come down significantly because it uses a lot of compute to verify the information.
If I ask it very hard questions it will occasionally make mistakes, but I would argue a human would make even more given those same questions. So what is the bar?
o1 pro, for example, cites its sources. Here's one regarding a question about the Eleusinian Mysteries. This is not random bullshit from the internet.
References:
– Burkert, W. Ancient Mystery Cults. Harvard University Press, 1987.
– Mylonas, G. E. Eleusis and the Eleusinian Mysteries. Princeton University Press, 1961.
– Plato, Republic, Phaedo, Symposium (Loeb Classical Library or other critical editions).
– Cicero, De Legibus II.36 (on praise of the Eleusinian rites).
The way to test these models is to ask them hard questions about what you already know. Test them in a language you know or on a text you know well. If given proper context, I would argue they are more often right than people are.
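If you want to make that kind of spot-checking repeatable rather than ad hoc, a tiny harness will do. This is a minimal sketch assuming the official OpenAI Python SDK and an OPENAI_API_KEY in the environment; the questions, expected strings, and model name are placeholders you would swap for topics you actually know well:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hard questions where you already know the answer, paired with a
# string the correct answer must contain. These are placeholders
# drawn from the references above.
SPOT_CHECKS = [
    ("Which press published Burkert's 'Ancient Mystery Cults', "
     "and in what year?", "1987"),
    ("At which sanctuary were the Eleusinian Mysteries celebrated?",
     "Eleusis"),
]

for question, expected in SPOT_CHECKS:
    response = client.chat.completions.create(
        model="o1",  # assumed model name; substitute whatever you are testing
        messages=[{"role": "user", "content": question}],
    )
    answer = response.choices[0].message.content
    verdict = "PASS" if expected.lower() in answer.lower() else "CHECK BY HAND"
    print(f"[{verdict}] {question}")
```

Substring matching is crude, of course; the point is only that "test it on what you already know" can be a routine, not a one-off.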
Again, o1 pro has come a long way. To cite Tyler Cowen in the post above: if you haven't played with o1 pro, your views about what these models can do are outdated.
I'm using Deep Research. I find it extremely useful for certain tasks. It saves me time. And the beauty is that I can actually check its work. And I do. I can spend hours googling around, or minutes checking its sources.
The point remains, though, that actors who want to spread disinformation will do it. But is that new? Disinformation and misinformation have been around for thousands of years. The internet has been full of it for a long time. Yet that didn't stop Wikipedia from being effective and about as accurate as Encyclopaedia Britannica.
Don’t ask DeepSeek questions about China if you want an accurate answer. That might be true of OpenAI for other topics. Yet, we have multiple competing models. And we’ll get even more.
I think we'll have competition to make these models more accurate. It will work like the press. A reputable newspaper with a discerning audience is scared of publishing false stories because it will lose subscribers. And its competitors will keep it honest.
We will still have quality resources: Oxford Handbooks, Routledge Encyclopedias, etc. How do those texts become authoritative? How does knowledge build? Through peer review, the scientific method, etc. Recursive models are applying the same concepts.
Take it with a grain of salt, always check the work, but keep your eyes open. The changes are real. Or you can discount it and miss out. There are plenty of people who still look up words in a paper dictionary. My local bookstore has a full shelf of them.