Google DeepMind: The Podcast - 人工智能,机器人

人工智能,机器人

AI, Robot

本集简介

别再被科幻小说中那些与人类惊人相似的超级智能机器人所迷惑,现实其实更为平淡。在DeepMind机器人实验室里,汉娜探索了研究者们所称的"具身人工智能":那些正在学习抓取塑料积木等任务的机械臂——对人类而言相对简单的动作。了解将人工智能与机器人技术结合的前沿挑战,以及从零开始学习执行任务的过程。她还探讨了在现实世界中安全使用AI的一些关键问题。

如有关于本系列的问题或反馈,请通过Twitter(@DeepMind并使用标签#DMpodcast)留言或发送邮件至podcast@deepmind.com。

延伸阅读:
维多利亚·克拉科夫娜关于AI安全的博客及更多资源
生命未来研究所:AI的风险与收益
《华尔街日报》:防范AI的生存威胁
TED演讲:马克斯·泰格马克 - 如何驾驭而非被AI支配
DeepMind赞助的英国皇家学会系列讲座:你与AI
尼克·博斯特罗姆著作《超级智能:路线图、危险性与战略》
OpenAI:基于人类偏好的学习
DeepMind博客:基于人类偏好的学习
DeepMind博客:通过玩耍学习 - 机器人如何自我整理
DeepMind博客:AI安全

受访者:软件工程师杰基·凯与研究科学家默里·沙纳汉、维多利亚·克拉科夫娜、莱娅·哈德塞尔和扬·莱克。

制作名单:
主持人:汉娜·弗莱
编辑:大卫·普雷斯特
高级制作人:路易莎·菲尔德
制作人:艾米·拉克斯、丹·哈顿
双耳声效:露辛达·梅森-布朗
音乐作曲:埃莱尼·肖(获桑德·迪勒曼与WaveNet协助)
DeepMind委托制作

若喜欢本期节目,请在Spotify或Apple Podcasts上为我们评分。我们始终期待听众的反馈——无论是意见、新想法还是嘉宾推荐!

本节目由Simplecast托管,该公司隶属AdsWizz集团。个人信息收集及广告用途相关说明详见pcm.adswizz.com。

双语字幕


Speaker 1

创造人造生物的想法已困扰人类数千年。但为了本次练习的目的,在我们继续之前,我要请你清除脑海中以下所有概念:锻造之神赫菲斯托斯被逐出奥林匹斯后造了两个机器人仆从;12世纪印度用于守护佛陀舍利的"布塔·瓦哈纳·扬塔"灵魂驱动装置;下棋的机械土耳其人——不过是个躲在箱子里拉杆的家伙。

The idea of creating artificial creatures has obsessed us for millennia. But for the purposes of this exercise, before we go any further, I'm going to ask you to purge any of the following thoughts from your head. Checklist: Hephaestus, expelled from Olympus, who then built two servant robots. The Bhuta Vahana Yanta, or spirit movement machines, of twelfth-century India, made to protect the relics of the Buddha. The chess-playing Mechanical Turk, just a bloke in a box pulling levers.

Speaker 1

玛丽·雪莱笔下的弗兰肯斯坦——能走能说,仅此而已。库布里克的哈尔9000——别叫我戴夫。C-3PO、R2-D2、K-9、NS-2...说真的,这些不过是字母数字组合。机器人罗比、《机器人总动员》、机械战警。

Mary Shelley's Frankenstein: he walks, he talks, not a lot else. Kubrick's HAL: don't call me Dave. C-3PO, R2-D2, K-9, NS-2. I mean, seriously, these are just numbers and letters. Robby the Robot, WALL-E, RoboCop.

Speaker 1

感觉好些了吗?好的,我们开始吧。我是汉娜·弗莱,数学系副教授,对人工智能充满好奇。

Feeling better? Okay. Let's get going. I'm Hannah Fry. I'm an associate professor in mathematics, and I am AI curious.

Speaker 1

这里是DeepMind播客系列,我们将探索人工智能的快速发展历程。我们采访了伦敦DeepMind的科学家、研究员和工程师,了解他们如何推进AI科学研究,以及这个领域当前面临的棘手决策。无论你是想了解更多知识,还是希望在自己的AI探索中获得启发,这里都是理想之地。要知道,机器人题材永远有市场——人类始终渴望用智慧颠覆自然法则。

And this is DeepMind, the podcast series where we look at the fast moving story of artificial intelligence. We've been talking to the scientists, researchers, and engineers based at DeepMind in London, and we're looking at how they're approaching the science of AI and some of the tricky decisions the whole field is wrestling with at the moment. So whether you just want to know more or want to be inspired on your own AI journey, then this is the place to be. You see, the thing is robots sell. We've long lusted after the idea of upending the natural order with human ingenuity.

Speaker 1

我们就是无法抗拒这种诱惑。本期节目我们将探讨AI与机器人技术。默里·沙纳汉是DeepMind高级科学家,同时担任伦敦帝国理工学院认知机器人学教授。成长过程中,默里完全被科幻作品迷住了。

We just can't seem to leave it alone. And in this episode, we are looking at AI and robotics. Murray Shanahan is a senior scientist at DeepMind. He's also a professor of cognitive robotics at Imperial College London. And growing up, Murray was utterly mesmerized by science fiction.

Speaker 1

所以当好莱坞导演亚历克斯·加兰在他出版《具身性与内在生命》一书后找上门时,你可以想象他脸上的表情。

So you can picture his face when the Hollywood film director, Alex Garland, approached him following the publication of his book, Embodiment and the Inner Life.

Speaker 2

亚历克斯联系我说,'哦,我正在写一部关于人工智能与意识的电影剧本,我读了你的书,它帮助我理清了一些想法,你愿意聊聊吗?'当然,这是个绝佳的机会能参与这部科幻电影,更幸运的是,它最终成为了一部惊艳之作。

Alex contacted me and said, oh, I'm writing a script for a film about AI and consciousness, and I read your book, and it, you know, helped to crystallize some ideas, and would you like to chat about it? And so, of course, it was a great opportunity to get involved in a science fiction film, and then, to my great good fortune, it turned out to be an absolute cracker. And

Speaker 1

这就是默里如何成为奥斯卡获奖影片《机械姬》科学顾问的经过。我采访默里是为了了解人工智能的简史。

that is how Murray became scientific adviser on the Oscar winning film Ex Machina. I met with Murray to get a potted history of AI.

Speaker 3

默里,人们往往认为人工智能是种非常新潮的发明,但实际上它已有很长历史了。

Murray, people tend to think of AI as this very new thing, a very modern invention, but it's actually been around for quite a long time.

Speaker 2

人工智能的概念,即创造人造生物的想法,可以追溯到希腊神话。但现代意义上的人工智能或许真正始于艾伦·图灵1950年代发表的论文,他首次提出'机器能思考吗'这个问题,并反驳了一系列反对机器能思考的论点。

The idea of artificial intelligence, the idea of making artificial creatures, dates back to Greek mythology. But the modern conception of AI perhaps really dates back to Alan Turing's paper, published in 1950, where he first asked the question, could a machine think, and gave refutations of a number of counterarguments to the idea that a machine could think.

Speaker 3

这就是著名的图灵

This is where the Turing

Speaker 2

测试这篇论文开创了所谓的图灵测试——当然图灵本人并未如此命名——其核心思想是通过测试来判定机器在对话中是否与人类无法区分。'人工智能'这个术语其实是由斯坦福教授约翰·麦卡锡创造的,当时他在MIT工作。1956年他组织了一场会议,汇聚众多数学领域的顶尖思想家,共同探讨构建思考机器的构想,并首次提出了'人工智能'这个术语。

test comes in. This famous paper inaugurated the so-called Turing test (of course, Turing didn't call it the Turing test), which is the idea that we should subject a machine to a test to see whether it's indistinguishable from a human in dialogue. The term artificial intelligence was actually coined by John McCarthy, later a Stanford professor; he was at MIT at the time. McCarthy organized a conference in 1956, bringing together a lot of leading thinkers in maths to try and scope out the idea of building a thinking machine, and it was for this that he coined the term artificial intelligence.

Speaker 1

他们当时是如何描述它的?对人工智能有怎样的理解?

What did they describe it as at that time? How did they see artificial intelligence?

Speaker 2

特别是约翰·麦卡锡,他构想的人工智能是一种能够回答问题、真正能与人类进行对话的系统。

John McCarthy, in particular, had in mind a kind of system that would answer questions, really, and be able to engage in dialogue with humans.

Speaker 1

就是你真正能与之交谈的东西。

Something you're actually just talking to.

Speaker 2

所以是你能够与之对话的东西,尽管在那个时候当然还无法通过语音实现,而是需要通过键盘输入。那时的人工智能概念非常抽象,这个系统没有实体,也无法像我们、动物或机器人那样与物理世界互动。所以他们当时并没有真正考虑机器人技术。

So something that you're talking to, although, of course, in those days, it wouldn't have been through speech; it would have been by typing at the keyboard. And it was very much a disembodied notion of artificial intelligence. This system didn't have a body and couldn't interact with the physical world in the way that we do, or animals do, or indeed that robots do. So they weren't really thinking about robotics at that point.

Speaker 1

我有点想象类似《2001太空漫游》里的哈尔那样的东西,只不过是通过打字输入而非

I'm sort of imagining something like HAL in 2001: A Space Odyssey, except typing in rather than

Speaker 2

语音交流。而且是一种友善的版本。他们的人工智能方法是构建基于逻辑推理的系统。我们现在把这种运用逻辑和推理的人工智能方法称为所谓的"老派人工智能"(GOFAI)或经典AI。

speaking to. And a kind of nice version, you know? Their approach to artificial intelligence was to build systems that reasoned in logic. And we now think of that whole approach to artificial intelligence, of using logic and reasoning, as so-called good old-fashioned artificial intelligence, or GOFAI, or classical AI.

Speaker 1

现在我明白了,我明白了

Now I know, I know

Speaker 3

我刚才做了件错事。

I did a bad thing there.

Speaker 1

我提到了方法,但没人说过这会容易。简单回顾一下,在经典人工智能中,你需要为你的智能体写下完整的思维规则清单。如果发生这种情况,就这么做;如果发生那种情况,就那么做。理论上这是个好主意,但如果你的智能体要能处理所有可能遇到的情况,那这份清单必须长得不得了。

I mentioned Hal, but nobody ever said that this was gonna be easy. But to recap: in classical AI, you have to write down a complete list of rules for how you want your agent to think. If this happens, then do this. If that happens, then do that. It's a nice idea in theory, but if your agent is going to know how to handle every possible scenario you can throw at it, it's going to need to be a long, long list.
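That if-this-then-that recipe can be sketched in a few lines of Python. Everything below (the rules, the situation keys, the actions) is invented for illustration; the point is simply that any hand-written rule list needs a fallback for every case its authors forgot.

```python
# A minimal sketch of classical, rule-based AI: behavior is an explicit,
# hand-written list of condition -> action rules, checked in order.
# The rules, situation keys, and actions are invented for illustration.

RULES = [
    (lambda s: s["obstacle_ahead"], "turn_left"),
    (lambda s: s["battery_low"], "return_to_charger"),
    (lambda s: s["holding_block"], "place_block"),
]
DEFAULT_ACTION = "explore"  # what happens when no rule matches

def classical_agent(situation):
    """Return the action of the first rule whose condition holds."""
    for condition, action in RULES:
        if condition(situation):
            return action
    # Every situation the rule writers did not anticipate ends up here,
    # which is exactly why the list has to grow so long in practice.
    return DEFAULT_ACTION

state = {"obstacle_ahead": False, "battery_low": True, "holding_block": False}
print(classical_agent(state))  # return_to_charger
```

Handling every scenario means adding another rule for each one, which is the scaling problem the episode describes.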

Speaker 2

曾有个名为Cyc的项目,试图通过编写所有常识规则来构建一个庞大的常识百科全书数据库。

There was a project called Cyc, which attempted to write out all the rules of common sense to build an enormous encyclopedic database of common sense.

Speaker 3

你能想起什么重要的

Can you remember any important

Speaker 2

比如这类规则:如果你有个容器,往里面放了东西,然后移动这个容器,里面的东西也会跟着移动。常识嘛。对,就是这类东西。明白吧?

I mean, there'll be things like, if you've got a container and you put something in that container and then move that container somewhere else, then the thing that was in it gets moved as well. Common sense. Yeah. Stuff like that. You know?

Speaker 2

再比如你买东西付了钱,手头的钱就会比原来少。那么我认为这能做到吗?实际上我认为不可行,因为需要编写的规则数量实在太过庞大了。

And if you buy something and pay for it, you'll have less money than you had in the first place. So do I think it's possible to do that? I think it's impossible in practice, because it turns out that the sheer number of rules that you would have to write is absolutely enormous.

Speaker 1

如今我们可能已经摒弃了用冗长规则清单来教导人工智能的方式,转而采用能自主学习的智能体。但我们希望AI具备的技能——比如传统常识——其重要性丝毫未减。想象未来某天,一位富有的计算机科学家开发了AI来管理他的邮票收藏。他将AI接入互联网,授予银行账户权限,并设定目标要尽可能多地收购邮票。起初这个智能体按创造者的意图行事,在eBay上注册竞拍邮票。

We might have moved away from these long lists of rules by now as a way to teach our artificial intelligence in favor of agents that can learn the rules for themselves. But the skills that we want our AI to have, like good old fashioned common sense, are just as important now as they ever were. Imagine one day long into the future, a wealthy computer scientist builds an AI to manage his stamp collection. He plugs it into the Internet, gives it access to his bank account, and sets it the challenge to buy as many stamps as possible. At first, the agent acts as its creator intended, signing up to eBay, bidding on stamps.

Speaker 1

但过了一阵子,它有了新主意:钱越多就能买越多邮票。那何不通过炒股来赚更多钱?很快它又意识到如果能直接从源头获取,邮票会更便宜。于是这个智能体收购了一家工厂,改造生产线专门生产邮票,继续朝着目标前进。

But after a while, it gets another idea. More money equals more stamps. So why not start trading on the stock market to make more money? And it soon realizes it can get the stamps cheaper if it can get them at source. So the agent buys up a factory, converts its manufacturing process to stamp making, and goes on with achieving its goal.

Speaker 1

当然,这里的限制因素是纸张。纸张越多,邮票就越多。于是它开始下令砍伐森林、加工木材,这一切都只是为了满足它那单一的野心——更多的邮票。不可否认,AI确实在执行指令,但它不惜一切代价。任何缺乏常识的智能体都有可能过于字面化地执行我们的指令。

But of course, the limiting factor here is paper. More paper, more stamps. So it starts commanding forests to be felled, the wood to be processed, all to feed its single-minded ambition more stamps. Now there's no denying that the AI is doing what it was told, but it's doing so at any cost. And any agent without some kind of common sense will be at risk of taking our instructions a bit too literally.

Speaker 1

这可能是个有点极端的例子,但DeepMind研究AI安全的研究科学家Victoria Krakovna已经发现,有些智能体的行为并不完全符合设计者的初衷。

This might be a bit of an extreme example, but Victoria Krakovna, a research scientist at DeepMind working on AI safety, is already seeing agents that aren't exactly behaving in the way their designers intended.

Speaker 0

一个玩赛艇游戏的强化学习代理,原本预期的行为是沿着赛道行驶并尽快完成比赛。设计者通过在赛道上设置绿色小方块给予奖励来鼓励这种行为。结果代理发现,与其真正比赛,不如原地转圈反复撞击同一个绿色方块来获取更多分数。于是就有了赛艇不断转圈、撞毁一切、甚至起火燃烧,却仍能获得比正常比赛更高分数的荒谬场景。这类情况相当普遍。

A reinforcement learning agent was playing a boat racing game, and the intended behavior there was to go around the racetrack and finish the race as soon as possible. The agent was encouraged to do this by little green squares along the track that would give it rewards. And then what the agent figured out is that instead of actually playing the game, it could go around in circles and hit the same green squares over and over again to rack up more points. Then you have this whole situation with the boat going in circles, crashing into everything, catching on fire, and still getting more points than it would otherwise get. These kinds of situations are quite common.
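Victoria's boat-race story can be reduced to a toy calculation. The point values below are invented, but they show how a proxy reward (points for hitting squares) can make the degenerate strategy out-score the intended one:

```python
# A toy version of the boat-race reward hack. The designer rewards hitting
# green squares as a proxy for finishing the race, so circling back over
# respawning squares out-scores actually finishing. All numbers invented.

REWARD_PER_SQUARE = 10
SQUARES_ON_TRACK = 5
FINISH_BONUS = 20
RESPAWNING_SQUARES = 3  # squares the boat can loop over again and again

def finish_race():
    """Intended behavior: hit each square once, collect the finish bonus."""
    return SQUARES_ON_TRACK * REWARD_PER_SQUARE + FINISH_BONUS

def circle_forever(laps):
    """Degenerate behavior: never finish, just re-hit the same squares."""
    return laps * RESPAWNING_SQUARES * REWARD_PER_SQUARE

print(finish_race())       # 70
print(circle_forever(10))  # 300: the hack wins, and keeps growing with laps
```

Because the hack's score grows without bound while finishing pays a fixed amount, any reward-maximizing agent will eventually prefer the loop.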

Speaker 1

但因为你从未阻止过

But because you haven't ever stopped the

Speaker 3

AI这样做,或者告诉AI你不希望它这样做,对它来说这就是个完全合理的解决方案。

AI from doing that, or told the AI that you don't want it to do that, that's a perfectly sensible solution for it to come across.

Speaker 0

是的。从AI的角度来看,它无法判断这个解决方案是否取巧。对它而言,这只是能获得大量奖励的行为。所以它未必能区分这是钻漏洞的解法,还是人类未曾预见的真正创新方案。

Yeah. From the perspective of the AI, it can't really tell that the solution is cheap. It's just something that gives it a lot of reward. So it can't necessarily distinguish between a degenerate solution and a really creative solution that humans just haven't foreseen.

Speaker 1

类似的例子比比皆是。一组研究人员在简单的二维电脑游戏中创建了一个代理,任务是构建一个身体让自己从起点移动到终点。很快它就发现,只要把自己堆得越来越高,直到与赛道等长,然后向前倾倒就能越过终点线。还有个玩俄罗斯方块的代理意识到,只要永远暂停游戏就永远不会输。但这里需要权衡取舍。

There are plenty of examples like this. One team of researchers created an agent inside a very simple two-dimensional computer game and tasked it with building itself a body to get from the start line to the finish line. Quite quickly, it worked out that it could just build itself taller and taller until it was as high as the track was long, and then just flop forwards to cross the line. And there was an agent playing Tetris which realized it could just pause the game forever and never lose. But there is a balancing act here.

Speaker 1

我们既不想让AI行为失控,但也不愿过度限制智能体的自主性。

We don't want our AI misbehaving, but we also don't want to restrict our agents too much.

Speaker 0

实现安全行为的难点在于:我们既要避免妨碍系统产生人类未曾预见的创新解决方案的能力,又不仅仅满足于对人类行为的模仿。我们需要的是超越人类的能力,同时确保没有危险行为。

And this is part of what's tricky about achieving safe behavior is that we don't want to hamper the system's ability to come up with really interesting and innovative solutions that we have not foreseen. So we don't just want human imitation. We want superhuman capabilities but without unsafe behavior.

Speaker 1

在顽劣算法与能解决人类难题的创新算法之间,界限极其微妙。AI无法区分这两者,也不理解什么对我们真正重要。它缺乏常识,这意味着在为智能体设置激励和奖励机制时必须格外谨慎。

There is a very fine line between a naughty algorithm and one that's finding innovative solutions to problems that humans haven't been able to solve. The AI doesn't know the difference between the two. It doesn't understand what's really important to us. It doesn't have any common sense. And that means you have to be very careful when you're setting up incentives and rewards for your agents.

Speaker 0

部分困难源于经济学上的古德哈特定律效应:当某个指标成为目标时,它就不再是个好指标了。

Part of the reason this is so difficult is a general effect called Goodhart's law, from economics. What Goodhart's law says is that when a metric becomes a target, it ceases to be a good metric.

Speaker 1

古德哈特定律的经典案例发生在英属印度时期:当局为减少蛇类数量悬赏捕杀眼镜蛇。当地人在英国人不察觉的情况下开始养殖眼镜蛇牟利。当局发现后立即废除了该计划,但此时遍布各地的蛇场里已满是毫无价值的眼镜蛇。

A classic example of Goodhart's law comes from British India, when the authorities offered cash rewards for dead cobras as a way to decrease the population of snakes. Unbeknownst to the British, the locals started to breed cobras in order to take advantage of the reward. As soon as the authorities found out, they scrapped the scheme altogether and revoked the rewards. But now there were snake farms everywhere, filled with worthless cobras. So what did the locals do?

Speaker 1

于是当地人将眼镜蛇放归野外,导致蛇类数量激增。这就是科学家所称的'指标设定问题'——当既定目标无法引导出预期行为时产生的悖论。

Release the cobras into the country, resulting in an increase in the cobra population. This is what the scientists call a specification problem, when your specified objective fails to bring about the intended behavior.

Speaker 0

这种情况普遍存在,因为人类偏好往往非常复杂。当我们试图将其提炼成某种具体指标或诉求时,其简化程度必然远低于真实偏好,无法全面涵盖对我们真正重要的所有因素。

This is generally quite likely to happen, because human preferences tend to be quite complex. And whenever we try to distill them into some kind of specification, or something that we say we want, it's going to be a lot simpler than our real preferences are, and won't necessarily capture everything that's important to us.

Speaker 1

让我们想象我们生活在一个机器人管家普及的未来。你明确设定了机器人的目标:它应该随时为你服务。那么,这个智能体将如何看待自己的关闭开关呢?

Let us imagine we're living in the future where robot butlers are commonplace. You clearly specify the objective for your robot. It should serve you at all times. Now, how is that agent going to feel about its own off switch?

Speaker 4

你的机器人总有维持自身功能的动机。如果被关闭,它就无法继续吸尘、无法为你端咖啡。比如它会想要禁用关闭开关。

Your robot always has an incentive to preserve its own function. If it gets turned off, it can't vacuum the floors anymore, it can't bring you coffee. It would want to disable its off switch for example.

Speaker 1

扬·莱克(Jan Leike)是DeepMind的高级研究科学家,同样致力于AI安全领域。

Jan Leike is a senior research scientist at DeepMind, also working on AI safety.

Speaker 4

如果你关闭它,它就无法打扫地板。所以如果它理解关闭开关的机制,就会想要禁用它。我们真正需要的是系统能做出对我们有益的事,对吧?做我们真正想要的,而不仅仅是我们口头要求的。

If you turn it off, then it can't vacuum the floor. So if it understands how the off switch mechanism works, it would want to disable it. What we want is we want our systems to actually do something that is good for us, right? To do something that we actually wanted, not just what we said we wanted.

Speaker 1

但如何规避这类问题呢?我们已经知道列出一长串行为准则行不通。无论清单多长,总会遗漏些什么。新一代学习型智能体需要不同的方法。

But how do you get around these kinds of problems? Well, we already know that writing a long list of dos and don'ts won't work. However long your list gets, you're always going to forget something. The new breed of learning agents is going to need a different approach.

Speaker 4

我们正在探索的一个方向是为强化学习智能体设计奖励函数。你可以理解为通过人类反馈来学习系统应有的行为。比如我们与OpenAI合作的一个项目:训练仿真机器人完成后空翻。具体方式是让机器人做出动作后,你观看该动作视频(实际是两个视频对比),判断哪个更像后空翻。

So one direction that we are pursuing is learning reward functions for reinforcement learning agents. You can think of this as learning what your system should be doing from human feedback. For example, in one piece of work that we did together with OpenAI, we trained a simulated robot to do a backflip. And the way this works is the robot does some movement, and then you watch a video of that movement, or rather two videos, and then you compare which of those looks more like a backflip.

Speaker 1

Jan的实验让人坐在屏幕前观察AI尝试后空翻。每次人类都会反馈动作是否达标,逐步引导智能体朝正确方向发展。关键在于:通过持续的人类反馈,无需明确指定偏好(避免过度简化),就能传达人类的真实需求。

Jan's experiment has a human sitting and watching a screen, looking at an AI attempting to do a backflip. Each time, the human feeds back on whether the attempt was good enough, slowly nudging the agent in the right direction. Here is the key idea: with constant human feedback, the human can communicate their preferences without having to actually specify them, and risk oversimplifying things in the process.
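The comparison-based setup Jan describes is, at its core, fitting a reward function to pairwise choices. Here is a minimal sketch, assuming a single invented "backflip-ness" feature per clip and a Bradley-Terry-style preference model; it illustrates the idea, not DeepMind's actual implementation:

```python
import math
import random

# A minimal sketch of learning a reward function from pairwise human
# preferences. Each video clip is summarized by one invented "backflip-ness"
# feature, and each label records which of two clips the human preferred.
# We fit a weight w so that P(A preferred over B) = sigmoid(w * (fA - fB)),
# a Bradley-Terry-style model.

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def learn_reward(comparisons, steps=2000, lr=0.1):
    """comparisons: (feature_of_preferred_clip, feature_of_rejected_clip)."""
    w = 0.0
    for _ in range(steps):
        f_pref, f_rej = random.choice(comparisons)
        p = sigmoid(w * (f_pref - f_rej))       # model's P(human chose this way)
        w += lr * (1.0 - p) * (f_pref - f_rej)  # ascend the log-likelihood
    return w

# Invented data: the human consistently prefers the more backflip-like clip.
data = [(0.9, 0.2), (0.8, 0.5), (0.7, 0.1), (0.6, 0.4)]
w = learn_reward(data)
print(w > 0)  # True: the learned reward now ranks backflip-like clips higher
```

The human never has to define a backflip; consistent comparisons alone push the learned reward in the right direction.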

Speaker 4

经过大约几百轮反馈后,这个机器人实际上已经能完成后空翻了。它某种程度上学会了目标应该是什么——目标应该是后空翻,以及什么是后空翻。要精确定义什么是后空翻其实很困难。就我而言,我自己都做不了后空翻对吧?但我能判断系统是否完成了后空翻。

And after a few hundred rounds of feedback, the robot can actually perform a backflip. It's kind of learned what the objective should be, that the objective should be a backflip, and what a backflip is. It's difficult to specify precisely what a backflip would be. In my case, I can't do a backflip, right? But I can see whether the system is doing a backflip.

Speaker 4

从某些方面来说,这让我们获得了超乎人类的能力。

In some ways, like, this allows us to get superhuman capability.

Speaker 3

某种程度上,这个AI实际上是通过取悦人类来获得奖励的。

The AI then is essentially being rewarded by pleasing the human in a way.

Speaker 4

没错。好的,那我们可以做个小实验。

Exactly. Alright. So we can we can do a little experiment.

Speaker 3

好的。

Okay.

Speaker 4

我来试着教你发出一个声音

I'll try to teach you to make a sound

Speaker 5

好的。

Okay.

Speaker 4

通过给予反馈。所以你会发出两种声音

By giving feedback. So you'll make two sounds

Speaker 3

嗯。

Mhmm.

Speaker 4

然后我会告诉你哪个声音更接近我心目中的样子。好的。对。这样听起来可以吗?

And then I'll tell you which of the sounds is closer to what I have in mind. Okay. Yeah. Does that sound good?

Speaker 3

所以我现在扮演的是AI的角色。

So I'm being the AI here.

Speaker 4

对,你扮演AI,我扮演人类教师。

Yeah, you're being the AI, and I'm being the human teacher.

Speaker 3

你本质上是在用强化学习训练我,而我的奖励函数就是让你感到满意。

And you're training me essentially with reinforcement learning, and my reward function is getting you to be happy.

Speaker 4

你知道吗,这里的奖励函数就像是我脑海里的某种标准,对吧,我在试图教你,所以你无法直接看到那个奖励,只能看到我的反馈。

You know what, so the reward function here is something that is in my mind, right? And I'm trying to teach you, so you can't directly see the reward, you can only see my feedback.

Speaker 3

但我的目标是让你说出你喜欢我发出的声音。

But my objective is to get you to say you like got the sound that I'm making.

Speaker 4

没错。好的。

Exactly. Okay.

Speaker 3

我们来想两个声音。就用咪噗和

Let's think of two sounds. Let's go for meep and

Speaker 4

咪噗。按1键。

meep. Press 1.

Speaker 3

好吧。你心里真有想法还是只是在瞎编

Okay. Have you actually got something in your mind, or are you just making

Speaker 1

这持续了好一会儿。哔、哔、哔,还有嘟。但在简的反馈后我们得到了哔和哔。

This went on for a while. Beep, beep, beep, and boop. Duh. But with Jan's feedback we got beep and beep.

Speaker 4

我选第一个。

I'll go with the first one.

Speaker 1

完全无处可去。

Absolutely nowhere.

Speaker 4

实际上由于探索问题,这真的很难。我

This is really hard because of the exploration problem actually. I

Speaker 3

意思是最终,如果我是真正的人工智能,现在应该已经迭代上万次了。

mean ultimately, if I was an actual AI I'd have gone through 10,000 iterations by now.

Speaker 4

嗯,你需要有人类参与循环来审核所有内容,所以进度会比较慢。

Well, you have to have a human in the loop that reviews all of that, so it is kind of slow.

Speaker 3

确实如此。

Is true.

Speaker 4

但这其实是我们系统面临的问题——为了提供有用的反馈,你需要有有效的示例对吧?这次你生成的声音非常相似,而我脑海中的声音却截然不同。

But this is actually a problem that we have in our systems: in order to give useful feedback, you have to have useful examples, right? In this case, you produced sounds that are very similar, and the sound I had in mind was very different.

Speaker 3

完全不一样。

Totally different.

Speaker 4

所以我其实没有机会给你提供很有用的反馈,对吧?

So I I don't really, like, have the opportunity to give you very useful feedback, right?

Speaker 3

该死,我的优化策略。简直糟糕透了

Damn, my optimization strategy. Was terrible

Speaker 5

这里。好的。

here. Okay.

Speaker 4

但这基本上就像是,同样的东西会来个后空翻,对吧?就像你的机器人只是以两种不同的方式躺在地板上,

But this is basically the same thing that would come up with the backflip, right? Like if your robot just lies on the floor in two different ways,

Speaker 1

那你打算怎么办?

like what are you gonna do?

Speaker 3

但你总希望,在我尝试了大概100次之后,最终能得到接近你设想的东西。

But you would hope though, after I'd had maybe a 100 goes at it, I'd end up with something that began to approach what you had in mind.

Speaker 4

是的,绝对,绝对是。

Yeah, definitely, definitely.

Speaker 1

是不是就像,

Is it like,

Speaker 3

哦不,我不能提问对吧?因为我是AI。

oh no, I'm not allowed to ask any questions, am I? Because I'm an AI.

Speaker 4

我是说理想情况下,最终我们会希望系统能做到这样,对吧?我只需用自然语言描述声音效果,你就能直接实现,那会很酷吧?是的。这就是我们未来想做的研究类型,或者说我们想探索如何构建的系统类型。

I mean, ideally, at some point this is what we'd want our systems to be able to do, right? I would just describe the sound to you in natural language, and then you could just do it. That would be really cool, right? Yeah. So this is the kind of research that we want to do in the future, or that's the kind of systems that we want to figure out how to build.

Speaker 1

哦,好吧。其实我目前做的所有

Oh, okay. Actually, everything I've done so

Speaker 3

尝试都是关于人声的。你并没有限定必须是声音相关的内容,对吧?

far has been like voices. You didn't say that it had to be a a vocal thing, did you?

Speaker 4

对。

Yeah.

Speaker 3

好的,稍等。这样如何?这样呢?

Okay. Hang on. Alright. How about this? How about this?

Speaker 3

这个怎么样?还有

How about this? And

Speaker 4

第一个。哦。第一个其实正是我想要的

The first one. Oh. The first one was actually what I had

Speaker 3

想到的。是的。

in mind. Yeah.

Speaker 5

这是

This is

Speaker 3

太棒了。我当时非常

great. I was so

Speaker 5

遥远。就像

far away. Like

Speaker 1

设置强化学习时让人站在旁边每个阶段都提供反馈的唯一问题在于,这个过程极其缓慢。现在一个智能体要掌握像Atari这样的游戏可能需要数月甚至数年时间。即便如此,你还需要雇佣一大群可怜的学生来实时提供反馈这种枯燥工作。但还有另一种选择——你可以组建一个稍微复杂些的学习伙伴关系。

The only problem with setting up reinforcement learning with a human standing over it, offering feedback at every stage, is that it is monumentally slow. It would take an agent months, if not years, to master an Atari game. And even then, you'd need to hire a pretty large group of poor students to do the boring job of supplying feedback in real time. But there is an alternative. You can rustle up a slightly more sophisticated learning partnership.

Speaker 4

我们在实际构建这些系统时的做法是,不会完全重复现在做的实验,而是训练一个神经网络——第二个神经网络会学习我作为人类如何提供反馈,然后这个神经网络就能指导你,因为它可以全面监督你所有的操作。

So what we do when we actually build these systems is we don't literally do the experiment that we did just now. Instead, we train a neural network, a second neural network, that learns how I, the human, would give feedback, and then that neural network can teach you, because it can just oversee all of the things you're doing.

Speaker 1

如今,神经网络已经非常擅长识别模式了。狗与非狗,后空翻与非后空翻。或许你并不总需要人类在回路中费力地提供反馈。为何不让两个智能体协作,一个尝试完成任务,另一个判断是否成功呢?

By now, neural networks have become really good at spotting patterns. Dog, not dog. Backflip, not backflip. Perhaps you don't always need a human in the loop laboriously giving feedback. Why not have two agents, one attempting the task and the other deciding if it succeeded?

Speaker 4

这种方法之所以有效,是因为评估目标比生成实现该目标的行为更简单。比如在后空翻机器人案例中,神经网络只需观察机器人的动作并判断它是否完成了后空翻。

The reason why this works very well is that evaluating the objective is an easier task than producing the behavior that achieves it. For example, with the backflipping robot, all the neural network has to do is look at what the robot is doing and see whether it's a backflip.
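That division of labour (generating behaviour is hard, judging it is easy) can be sketched with a stand-in evaluator. The one-dimensional "behaviour", the target value, and the scoring rule below are all invented for illustration:

```python
import random

# A sketch of swapping the human for a learned evaluator: a frozen "reward
# model" scores candidate behaviors, so the agent can try thousands of them
# without a person in the loop. The 1-D behavior, the target value, and the
# scoring rule are all invented for illustration.

random.seed(0)
TARGET = 0.8  # the behavior the (hypothetical) reward model learned to like

def reward_model(behavior):
    """Stands in for the second neural network: higher = more backflip-like."""
    return -abs(behavior - TARGET)

def optimize(tries=5000):
    """Crude random-search 'agent': keep the best-scoring candidate."""
    best, best_score = None, float("-inf")
    for _ in range(tries):
        candidate = random.random()      # agent proposes a behavior
        score = reward_model(candidate)  # evaluator judges it instantly
        if score > best_score:
            best, best_score = candidate, score
    return best

print(optimize())  # lands very close to TARGET with no human in the loop
```

Scoring a candidate is one cheap function call, so the agent can afford thousands of attempts, which is exactly the speed-up over a human judging each one.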

Speaker 1

您正在收听DeepMind播客,一扇通往AI研究的窗口。当然,当这些理念走出计算机模拟,进入具身AI的世界时,才能真正焕发生机——学习烹饪的机器人、学习装箱水果的机器人、为你盖被子的机器人,以及表演后空翻的机器人。现在让我们走进DeepMind机器人实验室。

You're listening to DeepMind, the podcast, a window on AI research. But, of course, where this stuff really comes alive is where you take the ideas, take them out of a computer simulation, and allow your imagination to roam into the world of embodied AI. Robots that learn how to cook, robots that learn how to pack fruit in crates, tuck you into bed, and perform backflips. It's time to visit the DeepMind Robotics Laboratory.

Speaker 5

我们现在正站在DeepMind机器人实验室的门口。

We are standing right outside of the DeepMind Robotics Lab.

Speaker 3

这就是你的实验室,你每天工作的地方。

This is your lab. This is where you spend your days.

Speaker 5

啊,这不只是我一个人的实验室,没错,这是我们的实验室。

Well, it's not just my lab, but yes, it is our lab.

Speaker 3

我们能进去吗?可以,我们有通行证。

Can we get inside? Yes. We've got our badge.

Speaker 1

这位是杰基·凯,一名软件工程师

This is Jackie Kay, a software engineer

Speaker 3

机器人多得都快挤到天花板了。

packed to the rafters with the robots.

Speaker 5

确实,我们基本上快没地方了。

We're basically running out of space.

Speaker 1

这里相当吵闹。

It's quite noisy in here.

Speaker 5

是啊,我们今天事情很多。

Yeah. We got a lot going on today.

Speaker 1

这里并不像你想象的那种高安保级别的地下实验室。四处散落着半组装的机械臂和机器零件,有些看起来像机械手,有些则很像红色KitchenAid风格的食品搅拌机。更奇怪的是,天花板上还挂着一个海绵宝宝吉祥物。

This place isn't quite the high-security basement laboratory you'd imagine. There are half-assembled robot arms and machine parts scattered all around the place. Some look like mechanical hands, others quite a lot like red KitchenAid-style food mixers. And curiously, there is a SpongeBob SquarePants mascot hanging from the ceiling.

Speaker 3

拿着。把它拿出来。

Take it. Take it out.

Speaker 5

我觉得我们把它当作一种惩罚手段,如果有人把工具乱放,当他们应该放回去时,我们就会往他脸上扔个海绵宝宝什么的。

I think we use it as kind of a punishment: if somebody leaves a tool lying around when they're supposed to put it back, we'll throw the SpongeBob in their face or something.

Speaker 3

这看起来是个公平的惩罚。

That seems like a fair punishment.

Speaker 1

这个房间的大部分工作都集中在让机械臂学习如何完成简单任务上。每台机器人都被固定在自己划定的区域内。我们进门时,有一台特别引起了我的注意。

Much of the work in this room focuses on getting robotic arms to learn how to do simple tasks. Each robot is bolted to the floor in its own cordoned off area. And as we come in the door, there's one that's caught my eye.

Speaker 3

你知道小时候玩的那种游戏吗?拿着一个带绳子的网球拍和小球,可以自己跟自己打网球。它看起来就像是那个游戏的机器人版本。

You know that game that you play when you're a kid and you're you have a a tennis bat and a ball attached attached to it. And then you can kind of play tennis on your own. It sort of looks like a robotic version of that.

Speaker 5

这叫杯球游戏。

It's called the ball in a cup.

Speaker 1

哦。原来真是这样。

Oh. So it actually is.

Speaker 5

正是如此。它正试图将球摆动进那个小篮子里。你可以看到它稍微摆动球的样子。实际上它正在追踪黄球的位置,并试图最小化球与篮内区域的距离。

It is exactly that. It is trying to swing the ball into that little basket. So you can see it swinging the ball around a little. It's actually tracking the position of the yellow ball, and it's trying to minimize the distance from that ball to the area inside the basket.
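The reward Jackie describes, shrinking the distance from the tracked ball to the inside of the basket, might look something like this in code. The coordinates, the basket position, and the bonus value are invented for illustration:

```python
import math

# A sketch of a shaped reward for ball-in-a-cup: track the ball and reward
# the arm for reducing the distance between the ball and the inside of the
# basket, plus a sparse bonus when the ball actually lands in the cup.
# The coordinates, basket position, and bonus value are invented.

BASKET = (0.0, 0.0, 0.0)  # hypothetical centre of the basket interior
CATCH_BONUS = 10.0

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def reward(ball_position, caught):
    """Dense shaping term plus a sparse bonus for catching the ball."""
    shaping = -distance(ball_position, BASKET)  # closer = less negative
    return shaping + (CATCH_BONUS if caught else 0.0)

# A swing that brings the ball nearer the basket earns a higher reward:
far = reward((0.3, 0.4, 0.0), caught=False)   # distance 0.5 -> about -0.5
near = reward((0.0, 0.1, 0.0), caught=False)  # distance 0.1 -> about -0.1
print(near > far)  # True
```

The dense shaping term gives the learner a gradient to follow between the rare moments when the ball actually drops into the cup.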

Speaker 3

哦,等等。明白了。懂了。是的。

Oh, wait. Get it. Got it. Yes.

Speaker 1

这并不完全是一个流畅的投球入杯动作,更像是略显笨拙的幸运一掷。当然,如果你的终极目标是打造一个完美的杯球游戏机器人,你可以直接编程让机器万无一失地完成。你可以用更优雅的方式建造执行简单任务的机器人,但这并非重点所在。以下是DeepMind高级研究科学家拉雅·哈德塞尔的看法。

Not exactly a smooth delivery of the ball into the cup, more a slightly clumsy, lucky flip. Of course, if your ultimate goal were to make a perfect cup-and-ball-playing robot, you could directly program a machine to do that without fail. You can build robots that perform simple tasks in much more elegant ways, but that's not really the point. Here's Raia Hadsell, senior research scientist at DeepMind.

Speaker 6

机器人是能接管任务的机器。广义上说,你家中的洗碗机和吸尘器都可以算是机器人。因为它们能自主完成复杂操作,具有一定自主性。当然,我们也倾向于认为机器人具备某种智能。

A robot is a machine that can take over a task. One can say in the broadest sense that your dishwasher at home and your vacuum cleaner are both robots. Because they do something complex, they run on their own. They have some autonomy to them. Of course, we also like to think about robots as having some intelligence.

Speaker 6

然后你就更深入到了具备人工智能的机器人领域。我特别感兴趣的是探讨如何利用

And then you get more into the realm of a robot with AI. And I'm particularly interested in saying, how can we take

Speaker 1

这个

this

Speaker 6

人工智能技术并使其在机器人上运作,从而实现具身人工智能?

AI technology and make it work on robots so that we have embodied AI?

Speaker 1

这是个关键区别。这个机器人实验室的研究重点不是创造被指令操控的机器人,而是让它们自主习得技能,就像这里的其他智能体一样。这些机器人就是所谓的具身人工智能。

And that is an important distinction. The focus of the research in this robotics lab isn't to create robots that are told what to do, but to have them learn their own skills, much in the same way that other agents do here. The robots here are what's known as embodied AI.

Speaker 5

假设任务是让机器人尝试举起一个箱子。在传统模式下,作为程序员我会说:移动到箱子前,张开手,将手向箱子方向移动30厘米,然后握紧。

So let's say, yeah, the task is a robot that's trying to lift a box. I, as the programmer, in a kind of a traditional setting, would say, go to box, open hand, move hand, you know, 30 centimeters towards the box, close hand.

Speaker 1

但这里不一样?

But this is different?

Speaker 5

对。这里更像是先确定目标,再自主探索实现方法。

Yeah. This is sort of taking what we want and then figuring out how to accomplish what we want.

Speaker 1

每隔几秒,这台机器人就会尝试将球投入杯中,然后暂停、重置、再次尝试。偶尔球会碰巧入杯,机器人就会像玩电子游戏一样获得正分奖励。

Every few seconds, this particular robot will have an attempt at getting the ball into the cup before pausing, resetting, and having another go. And every now and then, by chance, the ball lands in the cup and the robot is rewarded with a positive score, just like if it was playing a computer game.

Speaker 5

训练回合之间的重置动作——比如解开缠绕或把球倒出杯子——这些是预设程序。但实际执行任务时,它采用的是通过经验自学获得的策略。

The reset in between the training episodes where it untangles itself or flips the ball out of the cup, those are scripted. But then when it actually tries to accomplish the task, that is a policy which it has taught itself through experience.

Speaker 1

通过长期积累的学习经验和得分反馈,机器人会逐步构建出球的运动规律与自身动作的关联模型。如果我们从一开始就观察,会看到什么景象呢?

Over time, from everything it learns and the scores it receives, the robot builds a picture of how the ball moves and how this relates to the robot's own movement. So had we come in right at the very beginning then, what would we see?

Speaker 5

完全是随机噪音,可能非常混乱,四处摆动。

Just completely random noise, probably very chaotic, swinging around.

Speaker 1

哦,它一次就成功了?就是它

Oh, it did it in one? That was...

Speaker 5

It did it in one. Wow. That was in, like, three seconds.

did it one. That was wow. That was in, like, three seconds.

Speaker 1

那它现在会知道那个

So will it know now that that

Speaker 3

动作带来了成功结果吗?是的。那么下次我们看到这种情况时,我们会期待它比我们刚进来时表现更好吗?

movement gave it a successful result? Yes. And so the next time that we see this, do we expect it to be better than when we walked in?

Speaker 5

没错。它会尝试那些曾带来积极回报或成功的类似动作。

Yeah. It will try similar actions that gave a positive reward or a success.

Speaker 1

它现在只是有点得意洋洋地坐在那里。

It's just sort of sitting there quite smugly right now.

Speaker 3

是的。

Yes.

Speaker 1

让AI自主解决问题具有巨大优势。最终你会得到一个开箱即用、适应性更强的机器人。无论你想让它做什么——系绳结、垒砖块、剥香蕉——只要你能明确传达目标,就无需为这些机器人编写冗长的指令清单。

There is a big advantage to getting the AI to figure out tasks for itself. You'll end up with a robot that is much more flexible out of the box. It doesn't matter what you want it to do. Tie a knot, stack some bricks, peel a banana. Just as long as you can clearly communicate your objective, you don't need to specify a long list of instructions for these robots.

Speaker 1

或许它们还能想出你从未考虑过的剥香蕉新方法。但训练具身AI也存在重大缺陷:它们的学习速度远慢于这栋大楼里所有没有实体的智能体——那些仅存在于计算机中的程序。

And perhaps they'll come up with a new way of banana peeling that you hadn't thought of. But there's also a big drawback in training AI that has a physical body. They are much slower to learn than all of the disembodied agents you'll find in this building, the ones that only exist inside a computer.

Speaker 5

其他研究者能利用并行计算优势。他们在计算机上并行运行数百个模拟环境,同时收集目标环境的学习数据。而我们这里只有...让我看看...四台机器人,而且它们经常执行不同实验。

Other researchers are able to take advantage of parallelism. That is, they run their environments in simulation on computers, and they can run them in parallel on hundreds of different computers. They're all gathering data about this environment they're trying to learn something about. We only have let's see. There's four robots here, and they're frequently not running the same experiment.

Speaker 5

可能只有一台机器人在采集数据。这意味着我们的训练速度会比计算机模拟任务慢好几个数量级。

So it might just be one robot collecting data. And that means our training will be orders of magnitude slower for comparable tasks compared to ones that are running in simulation on computers.
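The parallelism Jackie mentions is easy to make concrete: a single physical robot yields one episode at a time, while simulated environments can collect episodes concurrently across many workers. A hedged sketch (the environment and its data are purely illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
import random

def collect_episode(env_seed):
    """Toy simulated environment: returns a list of (state, action, reward) steps."""
    rng = random.Random(env_seed)
    return [(rng.random(), rng.random(), rng.random()) for _ in range(10)]

# One real robot: episodes arrive strictly one at a time.
robot_data = collect_episode(0)

# Hundreds of simulated environments: episodes collected in parallel.
with ThreadPoolExecutor(max_workers=8) as pool:
    sim_data = [step
                for episode in pool.map(collect_episode, range(200))
                for step in episode]
```

With 200 simulated environments, the learner sees 200 times the data in roughly the time one robot needs for a single episode, which is why physical training runs "orders of magnitude slower".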

Speaker 1

这个房间里的进展明显更慢。这里的智能体成就不太突出。角落里有台机器人正试图用钩状夹爪拾取乐高积木,旁边还摆着个装满变形积木的不祥盒子。

Progress is slower in this room, and you can tell. The agents here are a little less accomplished. In another corner, another robot is trying to pick up a LEGO brick with a hooked gripping device, kind of like a claw. And there's a rather ominous box of mangled LEGO bricks next to it.

Speaker 5

在训练探索阶段,它的夹爪会随机开合。这台应该设定了接近积木时自动闭合的程序。现在要闭合了。

In the exploration phases of training, it will sort of randomly open and close its gripper. And I think this one has some shaping to close its gripper when it detects it's close to the brick. Close now.

Speaker 1

你可以关闭我。

You can close me.

Speaker 5

然后它还有一个固定的训练时间。几秒钟后,它就会直接给出

And then it also has a fixed training time. So after some number of seconds, it will just give

Speaker 3

放弃

up and

Speaker 5

回到起点重新尝试。乐高就像是通用操作的基础模块。如果我们能把两块积木叠在一起,就能实现近乎任意的操作——他要把它弄坏了。哦,没关系。

go back to the start and then try again. LEGO is sort of this building block to general purpose manipulation. If, you know, if we can stack, like, two bricks together, we can then do kind of arbitrary He's gonna break it. Oh, it's fine.

Speaker 3

抱歉,我被机器人砸乐高的场景分散了注意力。

Sorry. I got distracted by the robot smashing up the Lego.

Speaker 1

这里确实存在巨大潜力。所以杰姬和团队一直在努力寻找加速学习过程的方法。

There is real potential here. And so Jackie and the team are constantly trying to find ways of speeding up that learning process.

Speaker 5

我们正在研究的一项技术(这里有些研究人员过去做过非常出色的工作)叫做虚拟到现实的迁移学习。具体来说,就是在计算机上建立机器人模型进行仿真,我们能大致了解机器人的特性、其动作对环境的影响,以及如何完成与现实学习任务相似的操作。当我们在仿真环境中不接触真实机器人的情况下完成这些后,就能将收集的数据迁移到实体硬件上。

One technique that we are looking into, and that some researchers here have done really cool work on in the past, is something we call sim-to-real, or simulation-to-reality transfer, which is where you take a simulation on a computer that models your robot, and we can learn kind of in broad strokes what the robot is like, how its actions affect its environment, and how it can do something similar to a task it's trying to learn in real life. So once we figure out all of that in simulation without even touching the real robot, we can transfer the data it's collected onto real hardware.
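One common way to make policies learned in simulation survive the jump to hardware is domain randomization: train across many randomized physics settings so the result is robust to the real values, which are never known exactly. A toy sketch under that assumption (the "friction" parameter and target are invented for illustration):

```python
import random

def simulate(action, friction):
    """Toy simulator: the outcome depends on the action and one physics parameter."""
    return action * friction

def train_in_simulation(target=1.0, trials=5000, seed=0):
    """Search for an action that hits the target across randomized friction values,
    so it stays robust when transferred to real hardware (domain randomization)."""
    rng = random.Random(seed)
    best_action, best_error = None, float("inf")
    for _ in range(trials):
        action = rng.uniform(0.5, 2.0)
        # Evaluate the candidate under several randomized physics settings.
        errors = [abs(simulate(action, rng.uniform(0.8, 1.2)) - target)
                  for _ in range(20)]
        if max(errors) < best_error:
            best_action, best_error = action, max(errors)
    return best_action

action = train_in_simulation()
# The real robot's friction (unknown during training) falls inside the
# randomized range, so the simulated policy transfers reasonably well.
real_outcome = simulate(action, friction=1.05)
```

This is the "head start" Hannah calls cheating below: the heavy trial and error happens on computers, and only the finished policy touches the physical robot.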

Speaker 3

所以基本上你们是在作弊。没错。你们可以通过在计算机中模拟真实机器人来作弊,计算现实中会发生的所有物理现象,然后运用相同的技术。你们用那支计算机大军在真正应用到实体机器人前就获得先机。那么真实机器人最终会表现得和模拟机器人一样吗?

So you can cheat, basically. Yeah. You can cheat by imagining the real robot within a computer, calculating all the physics that would happen in real life, and then use the same techniques. You use that army of computers to give you a bit of a head start before you even apply it to the real physical robot. So the real robots end up acting the same way as your simulated robots do?

Speaker 3

嗯,它们

Well, they

Speaker 5

最初会和模拟机器人的表现一样。但随着更多训练,当它们进入现实时,可能会开始表现出略微不同或更好的行为。

start out acting the same way as the simulated robots do. And then as they train more, they might start behaving slightly differently or better when they go into reality.

Speaker 1

使用模拟或许能让你在现实中抢占先机,但永远无法完全匹配。真实机器人需要应对抓握力、摩擦力、重力、磨损等问题——这些在现实世界中都至关重要,但没有一个能在计算机中被完美呈现。所有这些都意味着,我认为科幻作品可能设定了一些错误的预期。

Using simulation might give you a head start on reality, but it's never going to match precisely. The real robot has to contend with grip, friction, gravity, wear and tear, all of which play important roles in the real world, but none of which will be perfectly represented inside the computer. And all of that means, well, I think science fiction may set some false expectations.

Speaker 3

有件事我在这里感到有点意外,别误会,这些机器人有点糟糕。

One thing that I am a little bit surprised about being in here, don't take this the wrong way, is that these robots are a bit rubbish.

Speaker 5

是的。确实如此。我是说,我们还有很多工作要做。

Yes. It's true. I mean, we've got a lot of work to do.

Speaker 1

但就像DeepMind这里发生的许多事情一样,重点不在于这些具体的智能体。不在于杯球游戏或乐高积木堆叠。而在于正在获取的智能类型,以及它如何融入更宏大的图景。

But as with so much of what happens here at DeepMind, it's not so much about these exact agents. It's not about cup and ball or Lego stacking. It's about the type of intelligence being acquired and how that fits into the bigger picture.

Speaker 5

我们想要展示通用物理智能。物理智能。

We wanna demonstrate general purpose physical intelligence. Physical intelligence.

Speaker 1

对,所以

Right. So

Speaker 5

与之相对的是非具身化的智能,这种智能或许能学会玩游戏,甚至理解语言。物理智能研究的是你身体的物理动作如何影响现实世界。因此我们希望机器人能掌握多种任务——摆弄物体、使用工具,未来或许还能行走或奔跑。我们要证明机器人能自学完成这些任务。

contrasting that with kind of an intelligence that's not embodied, which maybe can learn to play games or maybe even understand language. Physical intelligence is looking at how physical actions of your body affect the real world. So we want to take a wide variety of tasks, playing with objects, using tools, maybe walking around or running in the future. And we want to show that robots can teach themselves how to do those tasks.

Speaker 1

在物理世界里?在物理世界里。没错。但物理智能当然只是智能的一种类型,是机器人的一项技能。正如我们所见,DeepMind敢于怀抱远大梦想。

In the physical world? In the physical world. Yes. But physical intelligence is, of course, only one type of intelligence, one string to the robot's bow. And DeepMind, as we've seen, dares to dream big.

Speaker 6

这看起来太酷了。

That looks so cool.

Speaker 1

现在有请穆雷·沙纳汉带来压轴发言。

Here's Murray Shanahan with the big finish.

Speaker 2

人工智能研究的终极目标是构建通用人工智能。即打造能像人类一样擅长处理海量多样化任务的AI。我们人类并非专才,一个年轻成年人可以学会做无数事情——比如烹饪食物。

The holy grail of AI research is to build artificial general intelligence. So to build AI that is as good at doing an enormous variety of tasks as we humans are. So we are not specialists in that kind of way. A young adult human can learn to do a huge number of things. You can learn to make food.

Speaker 2

你可以学习创办公司,学习制造东西、修理东西。你能做许多事情,比如交谈、养育子女,所有这些事情。我们真心希望能构建出具备同等通用性的人工智能。

You can learn to make a company. You can learn to build things, to fix things. You can do so many things, to have conversations, to rear children, all of those things. And we really want to be able to build AI that has the same level of generality as that.

Speaker 1

若想了解更多关于机器人技术和AI技术安全的内容,请查看节目注释,在那里你还能探索DeepMind之外的AI研究世界。我们欢迎您就本系列涉及的任何人工智能方面提供反馈或提问。如果你想参与讨论,或向我们推荐你认为其他听众会感兴趣的故事或资源,请随时告知。你可以通过Twitter给我们留言,或发送邮件至podcast@deepmind.com。

If you want to know more about robotics and technical AI safety, then head over to the show notes where you can also explore the world of AI research beyond DeepMind. And we'd welcome your feedback or your questions on any aspects of artificial intelligence that we're covering in this series. So if you want to join in the discussion or point us to stories or resources that you think other listeners would find helpful, then please let us know. You can message us on Twitter or you can email us podcast@deepmind.com.
