Seshat - Search News

17 days ago on MSN

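IT之家, January 20: Although artificial intelligence (AI) excels at tasks such as coding, a new study finds that it still falls short on advanced history exams. The study, led by a team at the Complexity Science Hub (CSH) in Austria, set out to test three leading large language models (LLMs): OpenAI's GPT-4, Meta's Llama, and Google's Gemini ...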

earth · 15 days ago

AI struggles to understand human history and fails miserably when tested

The study, which is the first of its kind, evaluates the historical knowledge of leading AI models such as ChatGPT-4, Llama, and Gemini.

17 days ago

AI's predicament in history: what does the embarrassing evaluation of top models reveal?

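Recent research shows that although artificial intelligence (AI) performs quite well on some tasks, such as coding or podcast generation, it falls short on history exams. A group of researchers designed a new benchmark based on the Seshat Global History Databank to test the historical question-answering ability of three leading large language models (LLMs): OpenAI's GPT-4, Meta's Llama, and Google's Gemini.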

凤凰网 · 17 days ago

AI "weak spot" exposed: study finds GPT-4 Turbo answers advanced history questions with only 46% accuracy

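The research team developed a benchmark tool called "Hist-LLM", which checks the correctness of answers against the Seshat Global History Databank. The Seshat Global History Databank is one that takes the ancient Egyptian wisdom ...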

PsyPost on MSN · 15 days ago

AI models struggle with expert-level global history knowledge

Researchers recently evaluated the ability of advanced artificial intelligence (AI) models to answer questions about global ...

Digital information world · 13 days ago

AI Models Struggle with Historical Accuracy, GPT-4 Turbo Only Scores 46%

According to a new study, many AI models do not answer questions about world history accurately, which is a very concerning matter. The ...

EurekAlert! · 17 days ago

Can ChatGPT pass a Ph.D.-level history test?

Peter Turchin, from the Complexity Science Hub, and an international team of collaborators decided to evaluate the historical knowledge of advanced A.I. models such as ChatGPT-4, Llama, and ...
