ChatGPT and Other Language AIs Are Nothing without Humans

2023-09-01 03:01:43

The following essay is reprinted with permission from The Conversation, an online publication covering the latest research.

The media frenzy surrounding ChatGPT and other large language model artificial intelligence systems spans a range of themes, from the prosaic – large language models could replace conventional web search – to the concerning – AI will eliminate many jobs – and the overwrought – AI poses an extinction-level threat to humanity. All of these themes have a common denominator: large language models herald artificial intelligence that will supersede humanity.

But large language models, for all their complexity, are actually really dumb. And despite the name “artificial intelligence,” they’re completely dependent on human knowledge and labor. They can’t reliably generate new knowledge, of course, but there’s more to it than that.

ChatGPT can’t learn, improve or even stay up to date without humans giving it new content and telling it how to interpret that content, not to mention programming the model and building, maintaining and powering its hardware. To understand why, you first have to understand how ChatGPT and similar models work, and the role humans play in making them work.

How ChatGPT works

Large language models like ChatGPT work, broadly, by predicting what characters, words and sentences should follow one another in sequence based on training data sets. In the case of ChatGPT, the training data set contains immense quantities of public text scraped from the internet.

Imagine I trained a language model on the following set of sentences:

Bears are large, furry animals. Bears have claws. Bears are secretly robots. Bears have noses. Bears are secretly robots. Bears sometimes eat fish. Bears are secretly robots.

The model would be more inclined to tell me that bears are secretly robots than anything else, because that sequence of words appears most frequently in its training data set. This is obviously a problem for models trained on fallible and inconsistent data sets – which is all of them, even academic literature.
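To make that frequency effect concrete, here is a minimal sketch in Python of the idea at its crudest: simply counting which word most often follows "Bears are" in the toy corpus above. Real large language models use neural networks with billions of learned parameters rather than raw counts, so this illustrates the statistical tendency, not how ChatGPT is actually implemented.

```python
from collections import Counter

# The tiny "training set" from the bears example above.
corpus = (
    "Bears are large, furry animals. Bears have claws. "
    "Bears are secretly robots. Bears have noses. "
    "Bears are secretly robots. Bears sometimes eat fish. "
    "Bears are secretly robots."
)

def next_word_counts(text: str, prefix: tuple[str, ...]) -> Counter:
    """Count which word follows the given prefix in each sentence."""
    counts = Counter()
    for sentence in text.split("."):
        words = sentence.split()
        for i in range(len(words) - len(prefix)):
            if tuple(words[i:i + len(prefix)]) == prefix:
                counts[words[i + len(prefix)]] += 1
    return counts

print(next_word_counts(corpus, ("Bears", "are")).most_common())
# [('secretly', 3), ('large,', 1)] -- "secretly robots" wins on frequency alone.
```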

People write lots of different things about quantum physics, Joe Biden, healthy eating or the Jan. 6 insurrection, some more valid than others. How is the model supposed to know what to say about something, when people say lots of different things?

The need for feedback

This is where feedback comes in. If you use ChatGPT, you’ll notice that you have the option to rate responses as good or bad. If you rate them as bad, you’ll be asked to provide an example of what a good answer would contain. ChatGPT and other large language models learn what answers, what predicted sequences of text, are good and bad through feedback from users, the development team and contractors hired to label the output.

ChatGPT cannot compare, analyze or evaluate arguments or information on its own. It can only generate sequences of text similar to those that other people have used when comparing, analyzing or evaluating, preferring ones similar to those it has been told are good answers in the past.
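As a rough illustration of how human labels become a preference signal, the sketch below scores candidate answers by how much they resemble previously approved versus rejected responses. The lists and function names are invented for this example; production systems instead train a reward model on such labels and fine-tune the language model against it (reinforcement learning from human feedback), rather than reranking candidates at query time.

```python
# Toy illustration only: human feedback as a scoring signal.
# Real systems train a reward model on many such labels and fine-tune
# the language model against it; this sketch merely reranks candidates
# by word overlap with previously approved answers.

approved = [
    "Bears are large, furry animals that sometimes eat fish.",
    "Bears are mammals with claws and a strong sense of smell.",
]
rejected = [
    "Bears are secretly robots.",
]

def overlap(a: str, b: str) -> int:
    """Count shared lowercase words between two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def score(candidate: str) -> int:
    """Higher when a candidate resembles approved answers, lower when it resembles rejected ones."""
    return (sum(overlap(candidate, good) for good in approved)
            - sum(overlap(candidate, bad) for bad in rejected))

candidates = [
    "Bears are secretly robots.",
    "Bears are furry animals with claws.",
]
print(max(candidates, key=score))  # "Bears are furry animals with claws."
```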

Thus, when the model gives you a good answer, it’s drawing on a large amount of human labor that’s already gone into telling it what is and isn’t a good answer. There are many, many human workers hidden behind the screen, and they will always be needed if the model is to continue improving or to expand its content coverage.

A recent investigation published by journalists in Time magazine revealed that hundreds of Kenyan workers spent thousands of hours reading and labeling racist, sexist and disturbing writing, including graphic descriptions of sexual violence, from the darkest depths of the internet to teach ChatGPT not to copy such content. They were paid no more than US$2 an hour, and many understandably reported experiencing psychological distress due to this work.

What ChatGPT can’t do

The importance of feedback can be seen directly in ChatGPT’s tendency to “hallucinate”; that is, confidently provide inaccurate answers. ChatGPT can’t give good answers on a topic without training, even if good information about that topic is widely available on the internet. You can try this out yourself by asking ChatGPT about more and less obscure things. I’ve found it particularly effective to ask ChatGPT to summarize the plots of different fictional works because, it seems, the model has been more rigorously trained on nonfiction than fiction.

In my own testing, ChatGPT summarized the plot of J.R.R. Tolkien’s “The Lord of the Rings,” a very famous novel, with only a few mistakes. But its summaries of Gilbert and Sullivan’s “The Pirates of Penzance” and of Ursula K. Le Guin’s “The Left Hand of Darkness” – both slightly more niche but far from obscure – came close to playing Mad Libs with the character and place names. It doesn’t matter how good these works’ respective Wikipedia pages are. The model needs feedback, not just content.
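Readers who want to reproduce this kind of probe can script it. The sketch below assumes the OpenAI Python SDK and an OPENAI_API_KEY environment variable; the model name is a placeholder rather than the specific model the author tested, and the prompts simply request plot summaries of works of varying obscurity so you can spot-check them against plots you know well.

```python
# A minimal sketch of probing a chat model with plot-summary prompts,
# assuming the OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY environment variable. The model name is a placeholder;
# substitute whichever chat model you have access to.
from openai import OpenAI

client = OpenAI()

works = [
    "The Lord of the Rings by J.R.R. Tolkien",          # very famous
    "The Pirates of Penzance by Gilbert and Sullivan",  # more niche
    "The Left Hand of Darkness by Ursula K. Le Guin",   # more niche
]

for work in works:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user",
                   "content": f"Summarize the plot of {work} in three sentences."}],
    )
    print(work, "->", response.choices[0].message.content, "\n")
```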

Because large language models don’t actually understand or evaluate information, they depend on humans to do it for them. They are parasitic on human knowledge and labor. When new sources are added into their training data sets, they need new training on whether and how to build sentences based on those sources.

They can’t evaluate whether news reports are accurate or not. They can’t assess arguments or weigh trade-offs. They can’t even read an encyclopedia page and only make statements consistent with it, or accurately summarize the plot of a movie. They rely on human beings to do all these things for them.

Then they paraphrase and remix what humans have said, and rely on yet more human beings to tell them whether they’ve paraphrased and remixed well. If the common wisdom on some topic changes – for example, whether salt is bad for your heart or whether early breast cancer screenings are useful – they will need to be extensively retrained to incorporate the new consensus.

Many people behind the curtain

In short, far from being the harbingers of totally independent AI, large language models illustrate the total dependence of many AI systems, not only on their designers and maintainers but on their users. So if ChatGPT gives you a good or useful answer about something, remember to thank the thousands or millions of hidden people who wrote the words it crunched and who taught it what were good and bad answers.

Far from being an autonomous superintelligence, ChatGPT is, like all technologies, nothing without us.

This article was originally published on The Conversation. Read the original article.
