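The findings were presented last month at NeurIPS, a prominent AI conference. They show that even the best-performing model, GPT-4 Turbo, reached an accuracy of only 46%, not much better than random guessing. Paper co-author Maria del Rio-Chanona, an associate professor of computer science at University College London ...
The research team developed a benchmark called "Hist-LLM", which checks the correctness of answers against the Seshat Global History Databank, a database named after the ancient Egyptian wisdom ...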
The study, which is the first of its kind, evaluates the historical knowledge of leading AI models such as ChatGPT-4, Llama, ...
AI might excel at certain tasks like coding or generating a podcast. But it struggles to pass a high-level history exam, a ...
Researchers recently evaluated the ability of advanced artificial intelligence (AI) models to answer questions about global ...
According to a new study, many AI models fail to answer questions about world history accurately, which is a serious concern. The ...
For the past decade, complexity scientist Peter Turchin has been working with collaborators to bring together the most ...
Peter Turchin, from the Complexity Science Hub, and an international team of collaborators decided to evaluate the historical ...