PsyPost on MSN: AI models struggle with expert-level global history knowledge. Researchers recently evaluated the ability of advanced artificial intelligence (AI) models to answer questions about global ...
According to a new study, many AI models fail to answer questions about world history accurately, which the researchers describe as a concerning finding. The ...
While artificial intelligence excels at tasks like coding and podcast generation, it struggles to accurately answer high-level history questions, according to a study. Researchers tested OpenAI’s ...
The study, which is the first of its kind, evaluates the historical knowledge of leading AI models such as ChatGPT-4, Llama, and Gemini.
Peter Turchin, from the Complexity Science Hub, and an international team of collaborators decided to evaluate the historical knowledge of advanced A.I. models such as ChatGPT-4, Llama, and ...
For the past decade, complexity scientist Peter Turchin has been working with collaborators to bring together the most current and structured body of knowledge about human history in one place: the ...
A new study has found that artificial intelligence (AI) systems are failing to answer complicated historical queries accurately. The research was conducted by a team from the Complexity Science Hub (CSH), an ...
The benchmark, Hist-LLM, tests the correctness of answers according to the Seshat Global History Databank, a vast database of historical knowledge named after the ancient Egyptian goddess of wisdom.
Recent findings show that while artificial intelligence (AI) performs quite well at certain tasks, such as coding or podcast generation, it struggles with history exams. A team of researchers designed a new benchmark based on the Seshat Global History Databank to test the historical question-answering ability of three leading large language models (LLMs): OpenAI's GPT-4, Meta's Llama, and Google's Gemini.
Solving hard problems comes easily to AI, yet answering history questions proves a struggle: on AI's report card, history has turned out to be a weak subject. According to new research from the Complexity Science Hub (CSH) in Austria, even the most advanced model tested, GPT-4 Turbo, scored only 46% on PhD-level history questions. That beats the 25% expected from random guessing, but still falls well short of a passing grade. The results were presented at the NeurIPS (Neural Information Processing Systems) conference in Vancouver, Canada.
The research team developed a benchmark called "Hist-LLM", which checks answers for correctness against the Seshat Global History Databank, a database named after the ancient Egyptian goddess of wisdom ...
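These reports describe Hist-LLM only at a high level: model answers are marked right or wrong against facts recorded in the Seshat Global History Databank, and overall accuracy is compared with the roughly 25% expected from guessing among four options. As a rough illustration only, the hypothetical Python sketch below shows how such a scoring loop could look; the `Question` structure, the four-option format, and the `answer_fn` callback are assumptions for illustration, not the study's actual code or data format.

```python
# Hypothetical sketch only: the Hist-LLM evaluation code is not shown in these
# articles. This illustrates, under assumed data structures, how a multiple-
# choice history benchmark might score model answers against a reference
# databank (a plain dataclass stands in for Seshat records here).

from dataclasses import dataclass


@dataclass
class Question:
    prompt: str          # e.g. "Did polity X maintain a professional bureaucracy?"
    choices: list[str]   # assumed four options, matching the ~25% chance baseline
    correct: str         # ground-truth label taken from the reference databank


def accuracy(questions: list[Question], answer_fn) -> float:
    """Fraction of questions where the model's chosen option matches the databank."""
    hits = sum(1 for q in questions if answer_fn(q.prompt, q.choices) == q.correct)
    return hits / len(questions)


if __name__ == "__main__":
    # Toy data and a trivial stand-in "model" that always picks the first option.
    qs = [
        Question("Did the Roman Empire mint coins?",
                 ["yes", "no", "unknown", "disputed"], "yes"),
        Question("Did ancient Egypt field iron weapons before 2000 BCE?",
                 ["yes", "no", "unknown", "disputed"], "no"),
    ]
    print(f"accuracy: {accuracy(qs, lambda prompt, choices: choices[0]):.0%}")
```

In this framing, the reported 46% for GPT-4 Turbo would simply be the `accuracy` value over the full question set, judged against the databank rather than by human graders.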