Seshat - Search News

PsyPost on MSN · 20 days ago

AI models struggle with expert-level global history knowledge

Researchers recently evaluated the ability of advanced artificial intelligence (AI) models to answer questions about global history using a benchmark derived from the Seshat Global History Databank.

EurekAlert! · 22 days ago

Can ChatGPT pass a Ph.D.-level history test?

Peter Turchin, from the Complexity Science Hub, and an international team of collaborators decided to evaluate the historical knowledge of advanced A.I. models such as ChatGPT-4, Llama, and ...

techxplore · 22 days ago

Can AI pass a Ph.D.-level history test? New study says 'not yet'

For the past decade, complexity scientist Peter Turchin has been working with collaborators to bring together the most current and structured body of knowledge about human history in one place: the ...

azoai · 22 days ago

AI Models Struggle to Master Expert-Level Historical Knowledge

For the past decade, complexity scientist Peter Turchin and his colleagues, including first author Jakob Hauser, have been working with collaborators to bring together the most current and structured ...

IT之家 (IT Home) · 22 days ago

AI's "weak spot" exposed: study finds GPT-4 Turbo answers advanced history questions with only 46% accuracy

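The research team developed a benchmark called "Hist-LLM", which checks the correctness of answers against the Seshat Global History Databank, a vast database of historical knowledge named after the ancient Egyptian goddess of wisdom. The results were presented last month at the prominent AI conference NeurIPS and show that even the best-performing GPT-4 ...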

凤凰网 (ifeng.com) · 22 days ago

AI's "weak spot" exposed: study finds GPT-4 Turbo answers advanced history questions with only 46% accuracy

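The research team developed a benchmark called "Hist-LLM", which checks the correctness of answers against the Seshat Global History Databank, a vast database of historical knowledge named after the ancient Egyptian ...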

On MSN · 22 days ago

AI chatbots still can’t accurately answer high-level history questions: study

While artificial intelligence excels at tasks like coding and podcast generation, it struggles to accurately answer high-level history questions, according to a study. Researchers tested OpenAI’s ...

Yahoo News · 22 days ago

Factbox-Who has Donald Trump threatened to prosecute as president?

(Reuters) - Donald Trump has vowed to investigate or prosecute political rivals, former intelligence officials, the country's former military chief, prosecutors and judges, tech moguls, members of ...
