PsyPost on MSN: AI models struggle with expert-level global history knowledge. Researchers recently evaluated the ability of advanced artificial intelligence (AI) models to answer questions about global ...
According to a new study, many AI models fail to answer questions about world history accurately, which the researchers describe as a concerning finding. The ...
While artificial intelligence excels at tasks like coding and podcast generation, it struggles to accurately answer high-level history questions, according to a study. Researchers tested OpenAI’s ...
The study, which is the first of its kind, evaluates the historical knowledge of leading AI models such as ChatGPT-4, Llama, and Gemini.
Peter Turchin, from the Complexity Science Hub, and an international team of collaborators decided to evaluate the historical knowledge of advanced A.I. models such as ChatGPT-4, Llama, and ...
For the past decade, complexity scientist Peter Turchin has been working with collaborators to bring together the most current and structured body of knowledge about human history in one place: the ...
A new study has found that artificial intelligence (AI) systems are failing to answer complicated historical queries accurately. The research was conducted by a team from the Complexity Science Hub (CSH), an ...
The benchmark, Hist-LLM, tests the correctness of answers according to the Seshat Global History Databank, a vast database of historical knowledge named after the ancient Egyptian goddess of wisdom.
Recent findings show that while artificial intelligence (AI) performs quite well at certain tasks, such as coding or podcast generation, it struggles with history exams. A team of researchers designed a new benchmark based on the Seshat Global History Databank to test the historical question-answering ability of three leading large language models (LLMs): OpenAI's GPT-4, Meta's Llama, and Google's Gemini.
Solving hard problems comes easily to AI, yet answering history questions proves a struggle: on AI's report card, history has turned out to be a weak subject. According to new research from the Complexity Science Hub (CSH) in Austria, even the most advanced model tested, GPT-4 Turbo, scored only 46% on PhD-level history questions. That beats the 25% expected from random guessing, but still falls well short of a passing grade. The results were presented at the NeurIPS (Neural Information Processing Systems) conference in Vancouver, Canada.
The research team developed a benchmark called "Hist-LLM", which checks answers for correctness against the Seshat Global History Databank, a database named after the ancient Egyptian goddess of wisdom ...
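These reports describe Hist-LLM only at a high level: model answers are marked right or wrong against facts recorded in the Seshat Global History Databank, and overall accuracy is compared with the roughly 25% expected from guessing among four options. As a rough illustration only, the hypothetical Python sketch below shows how such a scoring loop could look; the `Question` structure, the four-option format, and the `answer_fn` callback are assumptions for illustration, not the study's actual code or data format.

```python
# Hypothetical sketch only: the Hist-LLM evaluation code is not shown in these
# articles. This illustrates, under assumed data structures, how a multiple-
# choice history benchmark might score model answers against a reference
# databank (a plain dataclass stands in for Seshat records here).

from dataclasses import dataclass


@dataclass
class Question:
    prompt: str          # e.g. "Did polity X maintain a professional bureaucracy?"
    choices: list[str]   # assumed four options, matching the ~25% chance baseline
    correct: str         # ground-truth label taken from the reference databank


def accuracy(questions: list[Question], answer_fn) -> float:
    """Fraction of questions where the model's chosen option matches the databank."""
    hits = sum(1 for q in questions if answer_fn(q.prompt, q.choices) == q.correct)
    return hits / len(questions)


if __name__ == "__main__":
    # Toy data and a trivial stand-in "model" that always picks the first option.
    qs = [
        Question("Did the Roman Empire mint coins?",
                 ["yes", "no", "unknown", "disputed"], "yes"),
        Question("Did ancient Egypt field iron weapons before 2000 BCE?",
                 ["yes", "no", "unknown", "disputed"], "no"),
    ]
    print(f"accuracy: {accuracy(qs, lambda prompt, choices: choices[0]):.0%}")
```

In this framing, the reported 46% for GPT-4 Turbo would simply be the `accuracy` value over the full question set, judged against the databank rather than by human graders.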