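IT之家, January 20: Although artificial intelligence (AI) excels at tasks such as coding, a new study has found that AI still struggles with advanced history exams. The study, led by a team at the Complexity Science Hub (CSH) in Austria, set out to test three leading large language models (LLMs): OpenAI's GPT-4, Meta's Llama, and Google's Gemini ...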
The study, which is the first of its kind, evaluates the historical knowledge of leading AI models such as ChatGPT-4, Llama, and Gemini.
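Recent research findings show that although artificial intelligence (AI) performs quite well on some tasks, such as coding or podcast generation, it struggles with history exams. A group of researchers designed a new benchmark based on the Seshat Global History Databank to test the historical question-answering ability of three leading large language models (LLMs): OpenAI's GPT-4, Meta's Llama, and Google's Gemini.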
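The research team developed a benchmark tool called "Hist-LLM", which checks the correctness of answers against the Seshat Global History Databank. The Seshat Global History Databank is a database that takes the ancient Egyptian wisdom ...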
Researchers recently evaluated the ability of advanced artificial intelligence (AI) models to answer questions about global ...
According to a new study, many AI models do not answer questions about world history accurately, which is a serious concern. The ...
Peter Turchin, from the Complexity Science Hub, and an international team of collaborators decided to evaluate the historical knowledge of advanced A.I. models such as ChatGPT-4, Llama, and ...