AI Just Isn't Right (Wired, 5/26/26 - a human fact-checker FTW over AI)
https://www.wired.com/story/fact-checking-ai/
-snip-
In any article that comes across WIRED's fact-checking desk, there's usually a decent amount of b-matter: statistics, news events, quotes, anything that helps contextualize the topic. Fact-checkers tend to Google this basic information, and that process, in the form of the search engine's dreaded AI Overviews, constitutes my main interaction with AI. In my professional opinion, it's unusable, wrong, about a third of the time.
This might be a generous assessment, though. A March 2025 study from the Tow Center for Digital Journalism found that more than 60 percent of responses from AI-powered search engines were inaccurate. A BBC study puts the wrongness of chatbots closer to 45 percent, the number I see cited more often. Because percentages are distancing, let me put this more plainly: AI could be wrong about half the time.
Does it matter which model? Elon Musk has said Grok is the smartest, but I haven't seen much research that agrees. Claude led the pack in RealFactBench, a fact-checking-focused benchmark test developed by computer scientists in China and the UK last year. It scored 73 percent accuracy across all metrics. (To be fair, Grok was not assessed.) Another benchmark, SimpleQA, developed by OpenAI in October 2024, posed more than 4,000 single-answer questions to models from OpenAI and Anthropic. None of the models exceeded 50 percent accuracy. Google updated the benchmark earlier this year, winnowing the question set to 1,000. Gemini 2.5 Pro came out on top, with 55.6 percent accuracy.
Then there's the models' own assessments. When I asked ChatGPT how accurate the major LLMs are, it told me that most models had 90 to 96 percent accuracy on some professional-style tests. It then offered a link, confusingly, to a paper on a sleep medicine certification exam. On general real-world questions, it simply offered me the rate at which models like it have been shown to hallucinate: 1 to 2 percent, apparently, though when I tried to click through to that referenced source, it didn't exist.
-snip-
8 replies
Original post by highplainsdem (8 hrs ago)
snot
(11,858 posts) 1. K&R'D
.
yellow dahlia
(6,546 posts) 2. Wired does some stellar reporting.
highplainsdem
(63,221 posts) 5. Including on political issues, despite whining from right-wing readers who want the magazine's editors and writers to stay out of politics.
yellow dahlia
(6,546 posts) 8. They report on truth. Truth is "left" leaning.
Truth has a liberal "bias".
Our reality has a liberal "bias".
durablend
(9,387 posts) 3. Techbros: "THAT'S WHY WE NEED MORE DATA CENTERS!!!!!!!"
highplainsdem
(63,221 posts) 6. Yes, with more stolen data, and then their flawed tech will FINALLY work.
lame54
(40,181 posts) 4. AI, who are The Beatles...
John, Paul, George and Reginald
highplainsdem
(63,221 posts) 7. I would not be surprised if chatbots have sometimes gotten their names wrong.