
Research Summaries Written by AI Fool Scientists


An artificial-intelligence (AI) chatbot can write such convincing fake research-paper abstracts that scientists are often unable to spot them, according to a preprint posted on the bioRxiv server in late December [1]. Researchers are divided over the implications for science.

“I am very worried,” says Sandra Wachter, who studies technology and regulation at the University of Oxford, UK, and was not involved in the research. “If we’re now in a situation where the experts are not able to determine what’s true or not, we lose the middleman that we desperately need to guide us through complicated topics,” she adds.

The chatbot, ChatGPT, creates realistic and intelligent-sounding text in response to user prompts. It is a ‘large language model’, a system based on neural networks that learn to perform a task by digesting huge amounts of existing human-generated text. Software company OpenAI, based in San Francisco, California, released the tool on 30 November, and it is free to use.

Since its release, researchers have been grappling with the ethical issues surrounding its use, because much of its output can be difficult to distinguish from human-written text. Scientists have published a preprint [2] and an editorial [3] written by ChatGPT. Now, a group led by Catherine Gao at Northwestern University in Chicago, Illinois, has used ChatGPT to generate artificial research-paper abstracts to test whether scientists can spot them.

The researchers asked the chatbot to write 50 medical-research abstracts based on a selection published in JAMA, The New England Journal of Medicine, The BMJ, The Lancet and Nature Medicine. They then compared these with the original abstracts by running them through a plagiarism detector and an AI-output detector, and they asked a group of medical researchers to spot the fabricated abstracts.

Under the radar

The ChatGPT-generated abstracts sailed through the plagiarism checker: the median originality score was 100%, which indicates that no plagiarism was detected. The AI-output detector spotted 66% of the generated abstracts. But the human reviewers didn't do much better: they correctly identified only 68% of the generated abstracts and 86% of the genuine abstracts. They incorrectly identified 32% of the generated abstracts as being real and 14% of the genuine abstracts as being generated.
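To make those rates concrete, the following minimal sketch simply restates the reported percentages as counts. It assumes, purely for illustration, that reviewers saw 50 generated and 50 genuine abstracts (the actual review-set split is not stated here), and the variable names are my own.

    # Illustrative only: the percentages come from the preprint's reported
    # detection rates; the 50/50 split of abstracts shown to reviewers is an
    # assumption made for this sketch, not a figure from the study.
    n_generated = 50   # assumed number of AI-generated abstracts reviewed
    n_genuine = 50     # assumed number of genuine abstracts reviewed

    fakes_caught = round(0.68 * n_generated)     # generated abstracts correctly flagged (68%)
    real_accepted = round(0.86 * n_genuine)      # genuine abstracts correctly accepted (86%)

    fakes_missed = n_generated - fakes_caught    # fakes mistaken for real (32%)
    real_misflagged = n_genuine - real_accepted  # genuine abstracts mistaken for fakes (14%)

    print(f"Fakes caught:    {fakes_caught}/{n_generated}")
    print(f"Fakes missed:    {fakes_missed}/{n_generated}")
    print(f"Real accepted:   {real_accepted}/{n_genuine}")
    print(f"Real misflagged: {real_misflagged}/{n_genuine}")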

“ChatGPT writes believable scientific abstracts,” say Gao and colleagues in the preprint. “The boundaries of ethical and acceptable use of large language models to help scientific writing remain to be determined.”

Wachter says that, if scientists can’t determine whether research is true, there could be “dire consequences”. As well as being problematic for researchers, who could be pulled down flawed routes of investigation because the research they are reading has been fabricated, there are “implications for society at large because scientific research plays such a huge role in our society”. For example, it could mean that research-informed policy decisions are incorrect, she adds.

But Arvind Narayanan, a computer scientist at Princeton University in New Jersey, says: “It is unlikely that any serious scientist will use ChatGPT to generate abstracts.” He adds that whether generated abstracts can be detected is “irrelevant”. “The question is whether the tool can generate an abstract that is accurate and compelling. It can’t, and so the upside of using ChatGPT is minuscule, and the downside is significant,” he says.

Irene Solaiman, who researches the social impact of AI at Hugging Face, an AI company with headquarters in New York and Paris, has fears about any reliance on large language models for scientific thinking. “These models are trained on past information and social and scientific progress can often come from thinking, or being open to thinking, differently from the past,” she adds.

The authors suggest that those evaluating scientific communications, such as research papers and conference proceedings, should put policies in place to stamp out the use of AI-generated texts. If institutions choose to allow use of the technology in certain cases, they should establish clear rules around disclosure. Earlier this month, the Fortieth International Conference on Machine Learning, a large AI conference that will be held in Honolulu, Hawaii, in July, announced that it has banned papers written by ChatGPT and other AI language tools.

Solaiman adds that in fields where fake information can endanger people’s safety, such as medicine, journals may have to take a more rigorous approach to verifying information as accurate.

Narayanan says that the solutions to these issues should not focus on the chatbot itself, “but rather the perverse incentives that lead to this behaviour, such as universities conducting hiring and promotion reviews by counting papers with no regard to their quality or impact”.

This article is reproduced with permission and was first published on January 12, 2023.
