Google DeepMind: The Podcast - 通往通用人工智能之路

通往通用人工智能之路

The road to AGI

本集简介

汉娜对话DeepMind联合创始人兼首席科学家肖恩·莱格——这位提出"人工通用智能"概念的先驱,共同探讨其构建路径。肖恩为何认为AGI具有可能性?何时能实现?其形态又将如何?汉娜还将探究通过试错法实现AGI的简明理论,并深入解析MuZero系统:这个从国际象棋到围棋精通复杂棋类的人工智能,如今正将能力泛化至解决现实世界的一系列重要任务。

对本系列有任何疑问或反馈,请通过Twitter @DeepMind 或邮件podcast@deepmind.com联系我们。

受访嘉宾:DeepMind的肖恩·莱格、多伊娜·普雷库普、戴夫·西尔弗与杰克逊·布罗希尔

制作团队
主持人:汉娜·弗莱
系列监制:丹·哈杜恩
制作支持:吉尔·阿奇内库
音效设计:艾玛·巴纳比
音乐作曲:埃莱尼·肖
音响工程:奈杰尔·阿普尔顿
编辑:大卫·普雷斯特
DeepMind委托制作
感谢所有促成本季内容的同仁!

延伸阅读:
DeepMind《AGI面临的现实挑战》:https://deepmind.com/blog/article/real-world-challenges-for-agi
麦肯锡《人工通用智能高管读本》:https://www.mckinsey.com/business-functions/operations/our-insights/an-executive-primer-on-artificial-general-intelligence
DeepMind《MuZero:无需规则掌握围棋国际象棋将棋及雅达利游戏》:https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules
Medium《何为AGI》:https://medium.com/intuitionmachine/what-is-agi-99cdb671c88e
肖恩·莱格《机器智能定义》arXiv论文:https://arxiv.org/abs/0712.3329
戴夫·西尔弗《奖励机制足矣》ScienceDirect:https://www.sciencedirect.com/science/article/pii/S0004370221000862

若喜欢本期节目,请在Spotify或苹果播客留下评价。我们始终期待听众的反馈、新想法或嘉宾推荐!

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

欢迎回到DeepMind播客,我是汉娜·弗莱。在本系列节目中,我一直在拜访那些致力于未来科技研发的人们。每当我和这里的人交谈时,根据对象不同,对AGI的理解也略有差异。对某些人而言,AGI意味着人类水平的通用智能;而该领域的其他人则对AGI持更怀疑的态度。

Welcome back to DeepMind the podcast with me, Hannah Fry. In this series, I've been meeting the people who are hard at work on the technologies of the future. And whenever I talk to people here, AGI means something slightly different depending on who you ask. For some, AGI signifies a human level general intelligence. Others in the field are more dismissive of AGI.

Speaker 0

一位著名研究者将相信AGI比作相信魔法。在前几期节目中,我们探讨了一些具体能力——比如语言、协作和物理智能等研究者认为能引领我们接近AGI的要素。但在本期节目里,我想花些时间深入探讨AGI本身的定义。它究竟是什么?它的形态、感知或声音会是怎样的?

One notable researcher likened a belief in AGI to a belief in magic. In previous episodes, we've explored some of the tangible stuff, capabilities such as language, cooperation, and physical intelligence that researchers believe could help take us towards AGI. But in this episode, I want to spend some time digging into what's meant by AGI itself. What actually is it? What will it look or feel or sound like?

Speaker 0

它将实现什么?拥有AGI的未来真的值得期待吗?如果是的话,在假设只有唯一路径的前提下,最佳实现方式是什么?在第五期《通往AGI之路》中,有太多问题等待解答。如果说有人能定义AGI,那非DeepMind联合创始人兼首席科学家肖恩·莱格莫属。

What will it accomplish? Is a future with AGI even desirable? And if so, what is the best way of getting there, if indeed there is only one way? Lots and lots to answer in this episode five, The Road to AGI. If anyone can lay claim to defining AGI, it is DeepMind's cofounder and chief scientist, Shane Legg.

Speaker 0

回溯到互联网繁荣的黄金年代,肖恩曾在纽约初创公司WebMind工作,他们当时试图在互联网上创造人类水平的智能。WebMind的创始人是AI研究者本·戈策尔。近十年后,当戈策尔编纂关于能超越狭窄任务范畴的AI论文集时,肖恩提出了命名建议。

Back in the halcyon days of the .com boom, Shane worked at a New York startup called WebMind, where they were attempting to create human level intelligence on the Internet. WebMind's founder was the AI researcher Ben Goertzel. And when, almost a decade later, Goertzel was compiling a book of essays about AI that could excel beyond a few narrow tasks, Shane put forward a title.

Speaker 1

我向他提议:如果我们关注的是真正通用的强大AI,不妨直接称之为人工通用智能(AGI)。这个建议被他采纳为书名,之后这个概念就逐渐流行开了。

I suggested to him, well, if we're interested in powerful AIs that are really general, we should just call it artificial general intelligence, AGI. And so I proposed that to him, and he put that as the title of his book, and it sort of caught on after that.

Speaker 0

同年(2007年),肖恩在一篇著名论文中进一步阐述这个概念,将智能定义为智能体在广泛环境中实现目标的能力。如今你会对这个定义做出修正吗?

That same year, in 2007, Shane elaborated on the concept further. In a famous paper, he defined intelligence as an agent's ability to achieve goals in a wide range of environments. Would you make any modifications today?

Speaker 1

我不会修改定义,但会考虑补充和扩展。关键是如何将这个超级抽象的理论概念,转化为更贴近我们现实世界的智能形态——或许更类似于人类在这个世界所展现的智能。

I wouldn't modify it, but I would look at adding and extending. The question is how do you go from this super general theoretical notion to something that's more like intelligence in the world that we happen to live in, and maybe more like, say, human intelligence in that world.

Speaker 0

但对肖恩来说,AGI不仅仅是机器对人类智能的简单复制。

But for Shane, AGI won't simply be a replication of human intelligence inside a machine.

Speaker 1

我认为存在超越人类能力的通用性和能力层级,这并不令人意外。我是说,鸟类可能飞得快,但机器能飞得更快。大象能用鼻子举起重物,但起重机可以举起更重的东西。所以我确实预期会有比人类知道更多、记忆更强、推理更深的机器出现。

I think there are levels of generality and capabilities beyond what humans have, and that shouldn't be surprising to us. I mean, birds might fly fast, but machines can fly faster. Elephants can lift heavy things with their trunks, but a crane can lift something much heavier. So I do expect that there will be machines that will be able to know more, remember more, reason more deeply than humans.

Speaker 0

这是个诱人的前景——一个名为AGI的智能、多才多艺的问题解决系统,能做人类能做的事,而且做得更好。但如果这听起来像科幻小说,那你并非唯一这么想的人。

It's a tantalizing prospect. An intelligent, versatile, problem solving system called AGI that does the things humans can, only better. But if that sounds like science fiction to you, well, you wouldn't be the only one.

Speaker 1

如果你回溯十到十二年前,整个AGI的概念还属于疯狂边缘。人们真的会翻白眼然后直接走开。你遇到过这种情况吗?是的,多次。

If you go back ten, twelve years ago, the whole notion of AGI was lunatic fringe. People would literally roll their eyes and just walk away. You had that happen? Yes, multiple times.

Speaker 0

是啊,其中还包括你尊敬的人。

Yeah. People you respected.

Speaker 1

没错。这个领域的人真的会翻个白眼就走开。

Yeah. People in the field would just literally roll their eyes and just walk away.

Speaker 0

后来你有机会再见到他们吗?

Have you had the chance to meet them since?

Speaker 1

后来我又遇到了他们中的不少人。甚至有些人在多年后申请了DeepMind的工作。不过那时候这个领域确实,你知道的,虽然零零星星有些进展,但强大的通用人工智能和快速突破看起来还非常非常遥远。

I have met quite a few of them since. There have even been cases where some of these people applied for jobs at DeepMind years later. But yeah, it was a field where, you know, there were little bits of progress happening here and there, but powerful AGI and rapid progress seemed like it was very, very far away.

Speaker 0

但考虑到过去十年人工智能的迅猛发展,从生物学理解到围棋博弈等各个领域,至少对某些人来说,通用人工智能已不再显得那么天方夜谭。现在人们还会对此嗤之以鼻吗?

But given the rapid progress in AI over the past ten years, in everything from understanding biology to the game of Go, for some at least, AGI no longer seems such an outlandish idea. Do people still roll their eyes?

Speaker 1

这种情况每年都在减少。

Every year it becomes less.

Speaker 0

二十多年来,肖恩一直在默默预测他认为通用人工智能会何时出现。

For over twenty years, Shane has been quietly making predictions of when he expects to see AGI.

Speaker 1

我一直觉得2030年前后出现的概率大约是五五开。现在依然认为这个判断是合理的。如果你看到过去十年的惊人进步,并想象未来十年我们能取得类似的突破,或许十年内就有可能出现通用人工智能。如果十年内不行的话,嗯,我不知道,大概三十年内吧。

I always felt that somewhere around 2030 ish, it was about a fifty-fifty chance. I still feel that seems reasonable. If you look at the amazing progress over the last ten years, and you imagine in the next ten years we have something comparable, maybe there's some chance that we will have an AGI in a decade. And if not in a decade, well, I don't know, say three decades or so.

Speaker 0

那你认为它会是什么样子?

And what do you think it will look like?

Speaker 1

它可能呈现多种形态。因为根据定义,AGI中的'G'代表通用性。它能处理语言、进行推理、做些数学运算,能编写计算机程序,甚至还能创作诗歌。

It could take many forms. Because by definition, the G in AGI is about generality. It can deal with language, it can reason, it can do some mathematics. It can program computers. It could write some poetry.

Speaker 1

它可能呈现多种不同形态。可能是一种类似谷歌的服务平台,你可以向AI系统咨询问题。未来它可能以机器人形态存在,或者融入城市基础设施,成为许多人同时交互的对象。

And it could take multiple different forms. It could be a service that you go to, sort of like a Google or something, where you can consult the AI system about something. It could be embodied in a robot at some point in the future, or it could be, say, in the fabric of a city, and it could be something that many different people interact with at the same time.

Speaker 0

但如果AGI能以多种形式存在,肖恩第一次见到时该如何识别它呢?

But if AGI could take a variety of forms, how will Shane recognize it when he sees it for the first time?

Speaker 1

我想象中的场景是某种三维模拟环境。能够与智能体对话,它能回应你,真正展现出解决前所未见新问题的能力,且水平与人类相当。关键在于它能运用对世界的理解,结合过往处理其他问题的经验进行类比推理——当看到这种能力时,我就会认为这可能是个AGI。

What I imagine is some sort of simulated 3D environment or something or other. And being able to talk to the agent, the agent can talk back, and really seeing that the agent is able to solve novel problems that it hasn't seen before to a level that is comparable to that of a human. And so it's about that ability to use its understanding of its world and its previous experiences with other problems and draw parallels and analogies with other things. That, to me, will be the point at which I go, okay, maybe we have an AGI here.

Speaker 0

你会注意到肖恩描述的是模拟环境中的AGI,而非现实世界的。你可能想问:这算数吗?

You'll notice that what Shane is describing is an AGI in simulation as opposed to in real life. And you might be wondering, does that count?

Speaker 1

有些人非要看到它在现实世界活动才认可。我认为模拟环境完全可以构建得足够复杂——智能体在高度拟真的模拟环境中解决未知问题的能力,在我看来就足够了。之后我们自然能跨越到现实世界。况且现有部分算法已在现实世界应用,比如视觉算法和控制算法都能操控真实机器人。

Some people won't accept that until it's actually running around in the real world. I think simulated environments can be made sufficiently complex that the ability to solve novel problems that the agent hasn't seen before, in fairly rich simulated environments, I think that'll be enough. And then we'll be able to cross the barrier to the real world. And we know some of the algorithms that we use work in the real world. A lot of the vision algorithms, a lot of the control algorithms can control real robots and so on.

Speaker 0

你提到的AGI预期能力,如语言和具身智能,现在已有高度成熟的智能体分别实现了这些功能。你认为这是通向AGI的路径吗?

For some of those aspects that you described expecting an AGI to have, such as language and embodied intelligence, there are really sophisticated agents that can do each of those already. Is that the route to AGI, do you think?

Speaker 1

现有算法能完成非常特定的任务,但要协调运作这些能力就困难得多。当前算法在深度泛化能力——那种可称为概念化的能力——方面存在明显缺陷。比如算法能发现血液检测结果与疾病关联的数据模式,甚至能区分猫狗图像。但遇到更抽象、概念化的任务时就会失效。

There are algorithms that can do very particular things, but doing them together in a really coordinated way seems to be a lot harder. There seems to be something missing around the ability for algorithms to generalize in quite deep ways that you might regard as conceptual. So an algorithm can see a pattern in some data where people with certain results in their blood tend to have a disease versus not have a disease and so on. And they can even go further than that to things like recognizing dogs versus cats. Where they fall down though is when you have something that's more abstract and conceptual in nature.

Speaker 1

它们倾向于对海量数据进行相对简单的泛化处理,效果非常显著。但如果你试图让它们对从未见过的数据进行泛化,这往往是人类能做到而它们无法完成的。

They tend to do somewhat simple generalization over vast, vast amounts of data. And the result is very effective. But if you really try to push them to generalize in some way that's outside of any data they will have seen, often it's something that a human can do, but they can't do.

Speaker 0

我们将在最后一期节目中与DeepMind首席执行官Demis Hassabis对谈,进一步探讨通用人工智能(AGI),包括这项技术的最大机遇与风险。不过目前,一组研究人员认为他们找到了一条可能最终通向AGI的路径。请记住这一点。

We'll be exploring AGI further, including the biggest opportunities and dangers of this technology, in our final episode when I sit down with DeepMind CEO, Demis Hassabis. For now though, a group of researchers think they have found a pathway that could eventually lead all the way to AGI. Remember this.

Speaker 2

AlphaGo成为首个在围棋比赛中夺冠的计算机程序,这是人工智能领域的重大成果。

AlphaGo became the first computer champion at the game of Go, and it was the major result for artificial intelligence.

Speaker 0

所以你赢得了赌注?

And you won the sweepstake?

Speaker 2

是的,我赢得了赌注。

And I won the sweepstake.

Speaker 0

在第一季中,我们讲述了一个如今已成为人工智能传奇的故事。2016年,全球超过2亿观众见证了DeepMind的AlphaGo系统在极具挑战性的围棋比赛中击败人类世界冠军李世石。AlphaGo基于的机器学习技术被称为强化学习,本季已多次提及。这项技术同样被用于训练AI协作和机器人行走。但AlphaGo首席科学家David Silver(即刚才片段中的声音)认为,这项技术可能具有更强大的潜力。

In season one, we told a story that has now become something of an AI legend. In 2016, a worldwide audience of over 200,000,000 people watched as a DeepMind system called AlphaGo beat human world champion Lee Sedol at the notoriously difficult board game of Go. The machine learning technique on which AlphaGo was based is known as reinforcement learning, and we've already met it several times in this season. It's the same idea that's being used to train AIs to cooperate with each other and robots to walk around. But David Silver, the principal scientist behind AlphaGo, whose voice you heard in the clip just there, believes the technique could prove even more powerful still.

Speaker 2

近年来,强化学习已经成熟到真正开始在实际应用中大规模使用的阶段。因此人们开始认真考虑强化学习在应对人工智能重大课题方面的潜力。

In the last few years, reinforcement learning has come of age to something which is really starting to see at scale applications in the real world. As a result, people are ready to take seriously the potential of reinforcement learning to really grapple with some of the big questions of AI.

Speaker 0

概括来说,强化学习与现有的其他主要机器学习类型不同,具体包括

To recap, reinforcement learning is distinct from the other main types of machine learning out there, namely

Speaker 2

监督学习,即存在一个指导者告诉机器该做什么。它会指示在特定情境下应采取的正确行动,然后机器需要尝试复现这一决策。

Supervised learning, where there's a teacher that tells the machine what to do. It says, this is the right thing to do in this situation. And then the machine has to try and replicate that decision.

Speaker 0

那些略显烦人的'我不是机器人'在线验证测试就是监督学习的实例。每次你在图片中识别交通灯或摩托车时,都在帮助训练图像识别算法自动分类这类图像。

Those mildly annoying I am not a robot tests you have to complete online are an example of supervised learning. Every time you identify a traffic light or a motorbike in an image, you're helping to train an image recognition algorithm to classify those types of images automatically.

Speaker 2

其次是无监督学习,这种模式下系统完全不会获得人类反馈,它只能自行发现数据中的模式,但并不真正理解这些模式的用途。系统既没有既定目标,也得不到任何形式的反馈。

Then there's unsupervised learning, where there's no feedback at all from a human. What the system can do is figure out patterns in its data, but it doesn't really know what to do with those patterns. The system isn't given a goal or any kind of feedback.

Speaker 0

无监督学习的典型例子是聚类算法。比如将照片按类型分组的算法,例如自然风景、聚会或体育赛事等不同场景的照片分类。

A quintessential example of unsupervised learning is clustering. Like the algorithms that group photographs according to type. Photos taken in nature or at a party or at a sporting event, say.

Speaker 2

然后就是强化学习。人类会以奖励形式给予'好'或'坏'的反馈。因此强化学习的目标就是尝试最大化这个奖励信号。

And then there's reinforcement learning. The human gives feedback saying good or bad in the form of reward. So the goal of reinforcement learning is to try and maximise this reward signal.

Speaker 0

这个奖励信号只是一个正数(比如+1),用于告诉算法其刚执行的动作有利于实现整体目标。专精此技术的著名研究员Doina Precup曾向我解释她如何在日常生活中运用这一原理。

This reward signal is just a positive number, a plus one, which tells the algorithm that the action it has just performed is conducive to its overall goal. Doina Precup, a renowned researcher who specializes in this technique, explained to me how she made use of it in her everyday life.
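The reward-maximisation loop being described here can be sketched in a few lines of Python. This is a toy two-armed bandit, purely illustrative and not any DeepMind system; the payout probabilities and the 10% exploration rate are made-up numbers:

```python
import random

# Toy reward maximisation: a two-armed bandit agent that learns which
# arm yields the +1 reward signal most often (illustrative only).
def run_bandit(steps=10_000, seed=0):
    rng = random.Random(seed)
    arm_probs = [0.3, 0.7]   # hidden chance that each arm pays out +1
    values = [0.0, 0.0]      # the agent's running estimates of each arm
    counts = [0, 0]
    for _ in range(steps):
        # Explore 10% of the time, otherwise exploit the best estimate.
        if rng.random() < 0.1:
            arm = rng.randrange(2)
        else:
            arm = max(range(2), key=lambda a: values[a])
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: nudge the estimate towards the observed reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values

estimates = run_bandit()
```

After enough steps the estimates settle near the true payout rates and the agent spends most of its time on the better arm. Nothing here is ever told which arm is "right", only whether each pull was rewarded, which is the essence of the feedback David describes.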

Speaker 3

我记得孩子们还小的时候,我们试图让他们收拾玩具放进玩具箱。于是我们建立了一个奖励机制——如果他们收拾玩具,就能得到特别的小奖励,比如一小块巧克力。我认为通过这种奖励方式,我们实际上可以训练出非常复杂的行为模式。

I remember when our kids were young, we were trying to get them to pick up their toys and put them in the toy chest. And so we instituted a reward system by which there was a special treat, a small chocolate, if they picked up their toys. And so I think we can actually train very complex behaviors by using these rewards.

Speaker 0

稍后我们会回到多伊娜基于AI的生活技巧,以及奖励训练中的一些陷阱。现在先回到机器话题。迄今为止,DeepMind已创造出许多专注于特定领域的独立强化学习智能体,比如围棋领域的AlphaGo。但他们现在正开始开发更多通用型强化学习智能体。在与李世石的历史性对决后,团队的下个目标是打造AlphaZero——另一个通过自我对弈而非职业棋谱数据来学习的棋类AI。

We'll come back to Doina's AI-based life hacks and some of the pitfalls with reward training a little bit later on. But back to the machines. So far, DeepMind has created lots of individual reinforcement learning agents that specialize in particular domains, such as AlphaGo in the game of Go. But now they're starting to develop more multipurpose reinforcement learning agents. After the history making match against Lee Sedol, the next step for the team was to build AlphaZero, another board game player, but one that learnt by playing different versions of itself rather than being trained on data from professional matches.

Speaker 0

后来DeepMind又开发了MuZero。

And then DeepMind built MuZero.

Speaker 2

我们在MuZero中提出的核心问题是:如果我们的智能体遇到一个从未见过的新环境(比如新游戏),它必须自行摸索游戏规则,并在这个过程中深度理解环境,最终实现游戏胜利或在现实世界中达成目标。

What we did in MuZero was we asked, what if our agent is approaching some completely new environment, like a game it's never seen before, and it just has to figure out the rules of the game for itself, and in doing so, understand its environment in a sufficiently powerful way that it can actually succeed in winning the game or in achieving its rewards in the real world?

Speaker 0

理论上,能自主掌握游戏规则的AI可以胜任任何新游戏场景。但更重要的是——正如本期节目后续会揭示的——它还能被投入现实场景中自学成功之道。当在围棋、国际象棋和将棋上进行测试时,MuZero在从未接触游戏规则的情况下仍达到了超人类水平。

An AI that can pick up the rules of the game for itself could, in theory, excel at any new gaming scenarios. But crucially, as we'll find out later in this episode, could be thrown into a real world situation and teach itself how to be successful. When it was tested out on the games of Go, Chess, and Shogi, MuZero achieved superhuman performance despite never seeing the rules of the game.

Speaker 2

每当我们从系统中抽离部分先验知识,就为它创造了自主学习与探索的新机会。

Each time we take away a bit of knowledge from the system, we provide it with a new opportunity for it to actually learn and figure out something for itself.

Speaker 0

这太疯狂了——原来我们告诉它的所有信息反而成了障碍。

That's so mad that everything we tell it was just getting in the way.

Speaker 2

没错。

That's right.

Speaker 0

强化学习算法或智能体自学国际象棋或围棋这类棋盘游戏中相对固定的规则是一回事,但要让它理解我们生活的这个复杂现实世界的运行规律,则完全是另一回事。

Now it's one thing for a reinforcement learning algorithm, or agent, to teach itself the relatively fixed rules of a board game like chess or Go, but it's quite another for it to figure out the rules that govern the messy real world that we live in.

Speaker 2

举个简单的例子:假设有个智能体要出门淋雨,它想知道如何保持干燥。如果我们试图向它描述雨滴下落模式以及所处世界的所有复杂因素,很快就会陷入困境。

If you just think of a simple example, we've got an agent and it's going out into the rain and it wants to know how to keep dry. If we try to describe to our agent the pattern by which raindrops fall and all the other complexities of its world, we're quickly just going to become unstuck.

Speaker 0

如果你想构建一个以保持干燥为目标的智能体,要完整建模其环境(包括水循环、大气环流、历史降水数据等)将极其困难。而MuZero算法会聚焦于真正影响目标实现的关键因素。

If you want to build an agent that aims to stay dry, it would be extremely difficult to build an entire model of its environment: the water cycle, atmospheric circulation, historic precipitation data, and so on. Instead, MuZero zeroes in on the things that really matter to achieving its aim.

Speaker 2

也许智能体只需要明白撑起雨伞就能保持干燥,而不需要理解落在伞面上的雨滴分布模式。

Maybe the agent needs to understand that if it puts its umbrella up, that will keep it dry. But it doesn't need to understand the pattern of raindrops that fall on top of the umbrella.

Speaker 0

我想智能体还需要知道,如果遇到狂风天气还试图用伞,就无法避免淋湿——这时雨衣可能更合适。这是否意味着现实世界存在细微差别,而通过指令性教导智能体,反而会扼杀那些需要结合具体情境才能形成的微妙理解?

I guess the agent also needs to know that if it's really gale-force winds and it tries to use an umbrella, then it won't work to stop it from getting wet, and maybe a rain mac would be better. Is part of the idea here that there's nuance in the real world, and that by instructing agents on what to do, you're kind of trampling over any of that opportunity for subtle understandings that are context specific?

Speaker 2

正是如此。

Exactly.

Speaker 0

MuZero并不追求全面理解周围环境,而是专注于对规划真正重要的关键部分。它当前的处境如何?它上次采取的行动对实现目标效果怎样?以及下一步该采取什么最佳行动。

Rather than getting a total understanding of what's going on around it, MuZero just focuses on the bits that are really important for planning ahead: How good is its current position? How good was the last action it took at achieving its aim? And which action is best to take next?

Speaker 2

即便系统构建的世界模型完全不合常理也无所谓。比如它可能认为雨滴是凭空出现并打湿雨伞的。只要在我们关心的三个量度上表现正确,这就足够了。

It doesn't matter if the system builds a model of the world which is completely crazy. You know, maybe it thinks that raindrops magically appear and make the umbrella wet. That would be fine as long as it gets everything right in terms of the three quantities we care about.
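The three quantities David mentions (the value of the current position, the reward of the last action, and how promising each next action is) can be sketched as a minimal planning interface. This is only an illustration of the idea, with hypothetical names, not DeepMind's actual MuZero implementation:

```python
from dataclasses import dataclass
from typing import Callable, List

# The three quantities a MuZero-style learned model predicts
# (a schematic interface with hypothetical names, not the real system).
@dataclass
class Prediction:
    value: float    # how good is the resulting position?
    reward: float   # how good was the action just taken?
    prior: float    # how promising did this action look beforehand?

def plan_one_step(predict: Callable, state, actions: List[str]) -> str:
    """Pick the next action using only the model's own predictions.

    The model's internal picture of the world can be 'completely crazy',
    as long as these numbers come out right for planning purposes.
    """
    preds = [predict(state, a) for a in actions]
    scores = [p.reward + p.value for p in preds]  # one-step lookahead
    return actions[max(range(len(actions)), key=lambda i: scores[i])]

# The umbrella example: the model only needs to predict that putting
# the umbrella up keeps the agent dry, not how raindrops fall.
def toy_model(state, action):
    return Prediction(value=0.0,
                      reward=1.0 if action == "put_umbrella_up" else 0.0,
                      prior=0.5)

best = plan_one_step(toy_model, "raining", ["walk_on", "put_umbrella_up"])
```

In the real algorithm these predictions come from a learned neural network and feed a tree search over many steps; this sketch collapses that to a single step to show how planning can run entirely inside the model.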

Speaker 0

MuZero不仅仅是一个知道何时该带伞出门的算法。大卫和他的同事们认为,它还可能成为通往AGI道路上的一座里程碑。去年,大卫合著了一篇标题颇具争议的论文《奖励即足够》。他认为仅靠强化学习就能一路通向人工通用智能。

MuZero is more than just an algorithm that knows when to step outside with an umbrella. David and his colleagues believe it could also be a milestone on the way to AGI. Last year, David co-authored a provocatively titled paper called Reward is Enough. He believes reinforcement learning alone could lead all the way to artificial general intelligence.

Speaker 2

我们真正要论证的是:从感知到知识,从社交智能到语言,所有智能能力都可以被理解为追求增加智能体获得奖励的单一过程。如果这个假设成立,就意味着我们只需要解决智能领域的一个核心问题,而非针对每种能力解决千百个不同问题。

We're really arguing that all of the abilities of intelligence, from perception to knowledge to social intelligence to language, can be understood as a single process of trying to increase the rewards that the agent gets. If this hypothesis was true, it would mean that we only need to solve one problem in intelligence, rather than a thousand different problems for each of the separate abilities.

Speaker 0

以松鼠为例,在追求单一目标(收集坚果)的过程中,它发展出了多种不同的能力。

Take the example of a squirrel, which, in the pursuit of a single goal, collecting nuts, develops lots of different abilities in the process.

Speaker 3

即便是这么简单的事情,也需要松鼠具备一定的智力特征。

Even that simple thing requires the squirrel to acquire some traits of intelligence.

Speaker 0

多伊娜·普雷库普是《奖励即足够》论文的合著者之一。

Doina Precup is a co-author of the Reward Is Enough paper.

Speaker 3

显然,这包括爬树、获取坚果的体能,但也需要社交智慧——因为如果松鼠把坚果藏在某处,它会想对其他松鼠隐藏这些坚果。对吧?所以它必须推理其他松鼠可能怎么想,必须记住自己把坚果藏在哪里,还需要提前规划。

So obviously physical ability, to be able to climb trees, to access nuts, but also social intelligence, because if a squirrel is hiding nuts somewhere, it would want to camouflage them from other squirrels. Right? So it has to reason about what other squirrels might think. It has to remember where it's put the nuts. And it also has to plan ahead.

Speaker 3

冬天没有坚果,所以松鼠某种意义上必须在秋天提前储备坚果。所有这些智力特征,其实都源于松鼠对特定奖励函数的优化。

During the winter there are no nuts, and so the squirrel, in the fall, in some sense has to get the nuts ahead of time. So all of these traits of intelligence can actually come from the squirrel optimizing a particular reward function.

Speaker 0

可以看出这种‘坚果奖励’机制如何作用于体能发展——松鼠为获取最美味坚果而锻炼肌肉、变得更敏捷。但语言能力呢?正如本系列所述,当前语言模型并非基于强化学习算法,而是基于一种称为Transformer的神经网络。基于奖励的算法在这里能奏效吗?

You can see how this nutty reward might work for things like physical prowess. The squirrel developing its muscles and becoming more agile as it strives for the tastiest nuts. But what about something like language? As we've heard in this series, current language models are not based on reinforcement learning algorithms, but on a different type of neural network known as a transformer. Could a reward based algorithm work here?

Speaker 2

本质上是可以的。这个假说认为语言可能以截然不同的方式习得——就像学习‘躲避’这个词的意义:如果你不躲避,球或石头砸中头部就会得到负面奖励;或者你意识到求助是有益的,因为会有人来帮忙。进一步推想,更复杂的句子可能导致他人以更复杂的方式协助你,甚至合作耕种获取食物。

Essentially, yes. The hypothesis raises the possibility of actually learning language in a very different way. In the same way that it would be helpful to learn what the word duck means, because if you don't duck and a ball or a rock hits you on the head, you will experience a negative reward. Or you might learn that it's helpful to ask for help, and someone will come to your assistance. And now you might take that further and imagine that more sophisticated sentences might lead to someone actually assisting you in more complicated ways, and helping you to work collaboratively to farm for food.

Speaker 0

我很好奇大卫愿意将自己的假说推进到什么程度。你是否会把自己的日常生活视为目标优化过程,不断寻找你试图优化的奖励是什么?

I was intrigued as to how far David was willing to push his hypothesis. Do you sort of find yourself looking at your everyday life as goal-optimising processes, constantly looking out for what the reward is that you're trying to optimise?

Speaker 2

我得坦白承认。我确实会给自己设定目标,并认定那就是我要追求的回报。但同时我认为,关于人类的宏观图景其实非常混乱,我们很难用'此刻我正在优化这个回报'来解释日常行为。我不认为事情那么简单。我觉得更像是存在一个关于智能的终极目标,也许是进化赋予我们大脑去尝试实现的某种东西,比如我们不喜欢痛苦这类本能。

I've got to come clean. I do set myself goals and say, right, that's my reward, I'm going to go for it. At the same time, I think the big picture of people, you know, is a very messy one, and it's really hard for us to explain our day-to-day actions in terms of, oh, right now I'm optimizing this reward. I don't think it's quite like that. I think it's more like there's an overarching goal for intelligence, maybe something which evolution bestowed on our brains to try and achieve; you know, maybe we don't like pain, for example.

Speaker 2

而其他所有事情,那些我们时刻都在努力达成的目标,都像是子目标。我们选择这些子目标时——比如接下来要做什么工作?今晚晚餐吃什么?——这些在某种程度上都是为了服务于我们由进化驱动的总体目标。

And all the rest of it, all these things which from moment to moment we're trying to achieve, those are like sub-goals. And all of those sub-goals we pick, like what am I going to work on next? What am I going to eat for dinner tonight? These are all somehow in service of our overall evolutionarily driven goals.

Speaker 0

你认为人类的终极目标是什么?

What do you think is the overall goal for humans?

Speaker 2

这是个非常深刻的问题。哲学家们自古以来就在探讨这个问题,我无法给你一个令人满意的答案。

It's a really profound question. I mean, philosophers have been asking this since day dot, and I won't be able to give you a satisfying answer.

Speaker 0

某种程度上,我其实是在问你生命的意义是什么。

I suppose in a way, I'm sort of asking you what's the meaning to life.

Speaker 2

确实如此。所以这是个很难回答的问题。

I think you are. And that's why it's a difficult question.

Speaker 0

我能说什么呢?你来参加一个关于人工智能的播客,结果却在探讨生命的意义。不客气。不过我们跑题了,还是回到机器话题上吧。

Well, what can I say? You come for a podcast about artificial intelligence and end up speculating on the meaning of life. You're welcome. But we digress. Back to the machines again.

Speaker 0

这会改变游戏规则吗?

Is this a game changer?

Speaker 3

我想时间会证明一切。过去六十年来,人工智能的发展历程之一就是人们在特定领域取得了进展。对吧?计算机视觉大幅提升,语言处理也进步显著。

I guess time will tell. One of the stories of AI for the past sixty years or so has been that people have made progress in particular niches. Right? Computer vision has gotten a lot better. Language processing has gotten a lot better.

Speaker 3

要将所有这些能力整合到一个智能体中仍然非常非常困难。但如果我们通过最大化奖励函数的方式训练智能体,那么所有这些能力可能会在一个智能体中自然涌现,并且从一开始就相互关联。

It's still very, very hard to integrate all these things into one agent. But if we train an agent in one way by maximizing the reward function, then all of these things might emerge naturally in one agent and be connected to each other from the get go.

Speaker 0

但并非DeepMind所有人都相信仅靠强化学习就足以实现通用人工智能。以下是机器人技术总监Raia Hadsell的看法。

But not everyone at DeepMind is convinced that reinforcement learning on its own will be enough for AGI. Here's Raia Hadsell, Director of Robotics.

Speaker 4

我通常的问题是:我们该从哪里获得这种奖励?设计奖励机制很难,也很难想象存在一种能驱动学习所有其他事物的终极奖励。

The question I usually have is where do we get that reward from? It's hard to design rewards and it's hard to imagine a single reward that's so all consuming that it would drive learning everything else.

Speaker 0

我就这个关于设计全能奖励的难题向David Silver提出了疑问。

I put this question about the difficulty of designing an all powerful reward to David Silver.

Speaker 2

实际上我认为这个问题有点偏离重点——或许我们可以往系统里输入几乎任何奖励信号,只要环境足够复杂,在最大化奖励的过程中就会发生惊人的事情。也许我们不必纠结‘什么是智能最终涌现的正确条件’这类问题,而应该接受存在多种智能形式的事实,每种智能都在优化自己的目标。未来的AI可能有些在控制卫星,有些在驾驶船只,还有些在下国际象棋——它们都可以发展出各自的能力,以最有效的方式实现其目标。

I actually think this question is just slightly off the mark, in the sense that maybe we can put almost any reward into the system, and if the environment's complex enough, amazing things will happen just in maximizing that reward. Maybe we don't have to solve this "what's the right thing for intelligence to really emerge" kind of question, and instead embrace the fact that there are many forms of intelligence, each of which is optimizing for its own target. And it's okay if we have AIs in the future, some of which are trying to control satellites, some of which are trying to sail boats, and some of which are trying to win games of chess. And they may all come up with their own abilities in order to allow that intelligence to achieve its end as effectively as possible.

Speaker 0

《奖励即足够》这篇论文着眼于强化学习如何通向通用人工智能的长期愿景。但短期来看,当前算法远非完美。该技术最著名的问题之一被称为信用分配问题——算法有时难以判断究竟是哪个动作导致了特定奖励。让我们回到多伊娜和她的AI生活技巧。

The Reward Is Enough paper focuses on a long term vision of how reinforcement learning could lead to AGI. But in the shorter term, the current generation of algorithms is far from perfect. One of the notorious problems with this technique is known as the credit assignment problem: it's sometimes difficult for the algorithm to work out which of its actions led to a particular reward. Let's go back to Doina and her AI life hacks.

Speaker 0

由于多伊娜的孩子们整理玩具与获得奖励(餐后零食)之间存在时间差,他们花了很久才把整洁行为和吃巧克力联系起来。而当他们建立这种联系后,就找到了钻奖励机制空子的方法。

Because of the time lag between Doina's children tidying away their toys and receiving their reward, an after-dinner treat, it took a while for them to make a connection between being tidy and eating chocolate. And when they did make that connection, they figured out a way to hack the reward function.

Speaker 3

他们陷入了这样的循环:把玩具箱里的东西全拿出来又放回去,反复操作。这明显表明他们迅速领悟了如何优化信号获取。为此我们不得不重新设计信号机制,以防止这类行为。

They got into this loop where they would take everything out of the toy chest and put it back in, and then take it all out and put it back in. A pretty clear indication that they very fast figured out how to optimize for the signal. So we had to re-devise what the signal was in order to make sure that we didn't get this kind of behavior.
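The toy-chest loop is a miniature example of what researchers call reward hacking: the children maximised the literal signal rather than the intended behaviour. A hypothetical sketch of the two reward functions (the action names and rules here are invented purely for illustration):

```python
# Reward hacking in miniature: the naive signal pays for every item
# placed in the chest, so cycling items in and out scores highest.
def naive_reward(action):
    return 1 if action == "put_in_chest" else 0

def revised_reward(history):
    # Revised signal: one treat, and only if the room ends up tidy
    # (something went into the chest and nothing came back out).
    tidy = "put_in_chest" in history and "take_out" not in history
    return 1 if tidy else 0

looping = ["put_in_chest", "take_out", "put_in_chest", "take_out", "put_in_chest"]
tidying = ["put_in_chest"]

naive_loop_score = sum(naive_reward(a) for a in looping)  # 3
naive_tidy_score = sum(naive_reward(a) for a in tidying)  # 1
```

Under the naive signal, looping beats genuinely tidying; the revised signal closes the loophole. Much of real reward design is exactly this kind of iteration.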

Speaker 0

信用分配问题的部分原因在于,当前强化学习算法缺乏称为时间抽象的能力——即考虑行为在长时间尺度上潜在影响的能力。这种能力是人类所具备的。

Part of the reason for this credit assignment problem is that current reinforcement learning algorithms lack a skill known as temporal abstraction, the ability to consider the potential repercussions of actions over long time scales. This is a capability that humans do have.

Speaker 3

假设某人考虑读研究生,你需要在脑海中压缩这几年的时间跨度,预判最终可能的结果:比如可能会获得更好的工作等。这与规划下周购物清单属于不同层级的计划,但背后的规划算法可以是相同的。如果能实现更高层级的抽象,那么远期规划及长期信用分配问题就能迎刃而解。

Let's say somebody's contemplating going to graduate school. You really have to compress in your mind this time period of a few years and try to understand what the situation might be at the end of that, you know, what might be the rewards that are going to happen, maybe you'll have a better job and so on. That's a different level of planning than, let's say, grocery shopping for next week. But the algorithm for doing the planning actually can be the same algorithm. And if you can have it work at a higher level of abstraction, then actually this problem of forward planning and also credit assignment over that long period of time can be solved.

Speaker 0

目前尚无人知晓答案。没人知道如何实现通用人工智能,也不确定强化学习是否真能胜任。但这就是科学的过程:假设、猜想、实验与辩论。有请肖恩·莱格再次发言。

No one yet knows the answers here. No one knows how to get to AGI, or if reinforcement learning will indeed be enough. But this is the process of science: hypothesis, conjecture, experiment, and debate. Here's Shane Legg again.

Speaker 1

我认为在实际操作中,结合其他类型的学习算法更可能推动人工智能进步。比如同时运用强化学习、监督学习等多种方法。

I think in practical terms, you're more likely to make progress in artificial intelligence by having other kinds of learning algorithms in there. So you would have some reinforcement learning going on, you'd have some supervised learning and other things.

Speaker 4

毫无疑问,强化学习是解决方案的一部分。但我不能确定它是唯一的解决方案。人类和动物始终在进行多种类型的学习。我认为一个学习系统理应也能利用这些不同的反馈来源。

There's no doubt that reinforcement learning is part of the solution. I'm just not sure that it is the only solution. Human beings and animals as well have a lot of different types of learning that they are doing all the time. I think it just makes sense that a learning system would also take advantage of these different sources of feedback.

Speaker 2

确实,现有的强化学习算法还无法在复杂环境中最大化任意奖励。我们面临的许多问题虽然困难,但并非哲学上无解的难题。因此我相信,当全球各地的研究团体齐心协力攻克这一问题时,我们终将找到解决方案。当然这只是个假设——我无法保证存在足够强大的强化学习算法能完全实现这个目标。

It's certainly the case that reinforcement learning algorithms, as they stand, are not capable of maximizing arbitrary rewards in complex environments. Many of the issues that we face, they're hard issues and yet they're not philosophically intractable issues. And so I do believe that at some point when communities across the world put our minds to tackling this problem we will find the solutions. But of course this is a hypothesis. I cannot offer any guarantee that reinforcement learning algorithms do exist which are powerful enough to just get all the way there.

Speaker 2

然而事实是,如果我们能做到这一点,它将为我们提供一条通往通用人工智能的道路——仅此一点就值得我们付出极大努力。

And yet the fact that if we can do it, it would provide a path all the way to AGI should be enough for us to try really, really hard.

Speaker 0

无论通过何种途径实现,DeepMind内部普遍相信通用人工智能的到来会比预期更早。在这个过程中,各种新算法将不断涌现,其中许多现在就能应用于现实问题。还记得我们的MuZero吗?这个强化学习智能体的特别之处在于它通过赢得游戏和解决实际问题证明了自己的价值。应用团队的产品经理Jackson Brochier专门负责为AI寻找现实应用场景。

Whichever road it takes to get there, there is a palpable belief at DeepMind that artificial general intelligence is coming down the track sooner rather than later. And along the way, all sorts of new algorithms will emerge that can be usefully applied right now to problems in the real world. Remember our friend MuZero? You'll recall that what is particularly special about this reinforcement learning agent is that it's earned its stripes winning games and solving practical problems. Jackson Brochier is a product manager in the applied team, which specializes in finding real world uses for AI.

Speaker 5

现实问题往往复杂且难以明确定义,MuZero让我们能够将智能体置于环境中,给定明确目标后,它可以通过规划搜索找到实现该目标的最佳策略。

Real world problems are messy and hard to explicitly define, and so MuZero gives us the capability to put an agent into an environment, give it an explicit goal, and it can then plan and search through that environment to find optimal strategies to achieve that goal.

Speaker 0

当现实问题与其前身设计的棋盘游戏具有相似性时,MuZero表现最佳。这类问题需要存在正确答案或获胜方式,需要海量训练数据,并且最适合那些存在巨量可能行动、无法逐一穷举的场景。关键在于:哪些现实问题适合这类AI解决?

MuZero works best when a real world problem shares similarities with the board games its predecessors were designed for. There needs to be a right answer, a way to win, as it were. There needs to be lots and lots of data available for it to train on. And the algorithm is best suited to problems with a vast number of possible moves or actions that it would be impossible to search through one by one. The question is, what kinds of real world problems might be suitable for such an AI?

Speaker 2

我们正在研究的应用方向之一是视频压缩。互联网上大量流量实际上是视频数据。如果能更高效地压缩视频,将节省巨额流量成本,从而降低能耗和费用。

One of the applications we've been looking at is video compression. A vast amount of traffic on the Internet is actually taking the form of video. So if you can actually compress videos more effectively, you can save a huge amount of traffic and therefore energy and cost.

Speaker 0

从背景来看,据估计全球80%以上的互联网流量被视频流媒体和下载所消耗。这可能是一部让人欲罢不能的电视剧、与同事的视频通话、教育研讨会,或者我也不知道,也许是些不那么高深的内容。关键在于每当有人上传视频到互联网时,文件会被压缩或编码以提高传输效率。这需要权衡取舍——你希望在尽可能压缩视频文件大小的同时,最小化画质损失。

To put this in context, it's estimated that over 80% of global Internet traffic is consumed by streaming and downloading video. That might be a bingeable TV drama, a video call with a colleague, an educational seminar, or I don't know, maybe something a bit less highbrow. The point is that whenever someone uploads a video to the Internet, the file gets compressed or encoded to make the process of sending it more efficient. There's a balance to be struck. You want to compress the video file as much as you can while losing as little as possible in terms of quality.

Speaker 0

视频质量部分取决于其比特率。

The quality of the video is partially determined by its bit rate.

Speaker 5

那些在互联网上传输视频时使用的0和1数据。

The ones and zeros that you're using to send that video across the Internet.

Speaker 0

杰克逊在给孩子们上完家庭教育课后与我进行了视频通话,他是这样解释的。

Jackson joined me on a video call between homeschooling lessons with his children, and here is how he explained it.

Speaker 5

你拥有一定的数据配额,需要合理利用这些比特。当前最先进的技术已经能运用多种技巧来优化这些0和1数据的视频传输。例如,如果视频中有某些部分在多个片段中保持不变,系统就能学习识别这些部分并存储复用,在视频传输时直接调用。

There's an allowance that you have in how you can use those bits. And the state of the art has gotten really good at creating a lot of tricks for how to use those ones and zeros to send over the videos. So for example, if there's a part of the video that's constant throughout each section of the video, it can learn to take that section and save it and reuse it on the other side where it's sending the video.

Speaker 0

就像现在,我看着你身后的家庭教育教室背景。当画面传输过来时,算法会识别出这些背景帧与帧之间没有变化,从而采用这种机制来提高效率。

Like now, I'm looking at you and you've got the background of your homeschool classroom. As it's coming to me, the algorithm would recognize that that isn't changing frame to frame and just uses that as a way to make it more efficient.

Speaker 5

没错,所以背景里这些水母其实是之前画面的水母,并非当前时刻的。我们运用了许多类似技巧来提升视频编码效率。这正使其成为强化学习的绝佳应用场景——我们完全可以把视频编码当作一场游戏来对待。

Yeah, so these jellyfish in the background, they're the jellyfish of a little while ago, not the jellyfish of this moment. And so there's lots of tricks like this that we can use to make video encoding very efficient. And so this is where it's framed itself up as a great reinforcement learning problem, because we can go in and essentially treat video encoding as a game.
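The inter-frame reuse trick Jackson describes can be sketched in a few lines. This is a hypothetical toy, not actual codec logic: frames are simplified to lists of labeled blocks, and the encoder sends only the blocks that changed since the previous frame, so a static background (the jellyfish) costs nothing to retransmit.

```python
# Toy sketch of inter-frame redundancy: send only the blocks that changed
# since the previous frame; the receiver patches its copy of the last frame.
def encode_frame(prev_blocks, curr_blocks):
    """Return (index, block) pairs for blocks that differ from the previous frame."""
    return [(i, b) for i, (p, b) in enumerate(zip(prev_blocks, curr_blocks)) if p != b]

def decode_frame(prev_blocks, delta):
    """Rebuild the current frame by applying the delta to the previous frame."""
    frame = list(prev_blocks)
    for i, b in delta:
        frame[i] = b
    return frame

prev = ["sky", "jellyfish", "desk", "face"]
curr = ["sky", "jellyfish", "desk", "smile"]   # only one block changed

delta = encode_frame(prev, curr)               # one (index, block) pair instead of four blocks
assert decode_frame(prev, delta) == curr
```

Real codecs do something far more sophisticated (motion-compensated prediction over pixel blocks), but the saving comes from the same observation: most of a video-call frame is identical to the frame before it.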

Speaker 0

这是一场关于如何在保证视频质量的前提下尽可能节省比特的战略游戏。如果MuZero在视频开头过度分配比特给帧数,比如给那些静态水母画面超高分辨率,那么当视频中突然出现更多动作时,它可能会陷入困境。MuZero会与自己进行数千次这样的博弈,直到找到最佳的比特分配方案。

This is a strategic game of how to be as stingy as possible without compromising video quality. If MuZero splurges on bits for the frames at the start of the video, giving those static jellyfish a really high resolution, say, it might find itself in trouble later if suddenly there's more action in the video. MuZero plays this game against itself thousands of times until it arrives at the optimum spread of bits.
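The trade-off Hannah describes can be caricatured as a budgeted sequential decision problem. This hypothetical sketch contrasts a greedy allocator that splurges early with one that looks at the whole video and spreads bits in proportion to how much each frame is changing; the actual MuZero system learns this trade-off through planning and self-play rather than using a hard-coded rule.

```python
# Caricature of the bit-allocation game: a fixed budget of bits must be
# spread across frames whose complexity (how much is changing) varies.
def greedy_allocation(complexities, budget, per_frame_cap):
    """Spend up to the cap on each frame in order until the budget runs out."""
    alloc = []
    for _ in complexities:  # ignores upcoming complexity: the splurge-early strategy
        spend = min(per_frame_cap, budget)
        alloc.append(spend)
        budget -= spend
    return alloc

def proportional_allocation(complexities, budget):
    """Plan ahead: give each frame bits in proportion to its complexity."""
    total = sum(complexities)
    return [budget * c // total for c in complexities]

# Three static jellyfish frames (complexity 1) followed by an action scene (complexity 8).
complexities = [1, 1, 1, 8]
print(greedy_allocation(complexities, budget=100, per_frame_cap=40))  # [40, 40, 20, 0]
print(proportional_allocation(complexities, budget=100))              # [9, 9, 9, 72]
```

The greedy allocator leaves nothing for the action scene at the end, which is exactly the failure mode described above; the planner reserves most of the budget for the frames that need it.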

Speaker 5

我们看到比特率优化带来了略高于6%的提升。这意味着通过网络传输的视频体积直接缩小了6%。

We're seeing a little over 6% improvement in the bit rate optimization. So that directly correlates to videos that are six percent smaller being sent across the Internet.

Speaker 0

6%的节省听起来可能不多,但放大到整个互联网规模时,这就是相当可观的节约了。

A 6% saving might not sound like a lot, but scale that up to the whole of the Internet, and it's quite a significant saving.

Speaker 5

在视频传输的能源消耗方面存在碳减排潜力。而我认为最令人兴奋的是对用户的影响——我们实际上能让更多人获取内容。在许多地区,数据是按固定限额出售的。当你用尽当日数据限额时,网络就会被切断。

There's potential for carbon savings in the energy use for transmitting video. And then I think most exciting of all is the user impacts. So we're actually bringing content to more people. So there's lots of regions where data is sold at a fixed limit. So you use up your data limit for the day and your Internet shuts off.

Speaker 5

如果你观看的是教育内容,你支付的成本就更低。这在印度、印尼等新兴市场尤为明显,这里的优化成果直接关系到可获取内容的增加。

If you're watching educational content, that's less of a cost that you're paying. It's especially clear in places like India and Indonesia and emerging markets, where gains here directly relate to increased access to content.

Speaker 0

虽然通用人工智能的终极目标笼罩着DeepMind的几乎所有研究,但利用AI解决现实世界问题始终是这里的重点研究方向。接下来的节目中,我们将深入探讨AI技术如何被应用于科学实验室内外的多个领域。你们所描述的可能是整个医疗健康领域的一次阶跃性变革。

While the overarching goal of artificial general intelligence hangs over pretty much everything that DeepMind is working on, using AI to solve problems in the real world is a significant focus of researchers here. Over the next episodes, we'll be taking a closer look at how AI technology is being applied to several problems, both in the science lab. What you're describing here is a potential step change in all of healthcare and medicine.

Speaker 2

这正是真正理解生物学所蕴含的意义。

That is the implication of truly understanding biology.

Speaker 0

以及在我们的日常生活中。

And in our everyday lives.

Speaker 4

我说,这些是相同的图像吗?它们如此接近,真是令人惊叹。

I said, are these the same images? They were so close. It was remarkable.

Speaker 0

《DeepMind播客》由我汉娜·弗莱主持,由Whistledown Productions的丹·哈杜恩制作。如果你喜欢这个系列,我们会非常感激你能给播客评分并留下评论。下次见。

DeepMind the podcast is presented by me, Hannah Fry, and produced by Dan Hardoon at Whistledown Productions. If you're enjoying the series, we'd be grateful if you could rate the podcast and leave a review. Bye for now.
