大家好,欢迎回到DeepMind播客。过去两期节目中,我们一直在探讨DeepMind解决智能问题的目标,思考这究竟意味着什么,并探索了可能实现这一目标的若干路径。而本期节目将聚焦机器人领域,我们将深入探讨物理智能的概念。为此,我将带大家探访位于伦敦国王十字区的机器人实验室幕后。
Hello, and welcome back to DeepMind: The Podcast. Over the last two episodes, we've been exploring DeepMind's goal of solving intelligence, asking what that actually means, and traveling along some of the roads that could take us there. This time, it's all about the robots. We'll be exploring the idea of physical intelligence. And to do that, I'll be taking you behind the scenes of the robotics lab in King's Cross, London.
这里有三个黑色人形机器人。它们身体呈立方体结构,但配有四肢和微型头部。不过体型相当小巧,大概只有大型鸡只的尺寸。
You've got three humanoid robots. They're black. They've got a very sort of cuboid body, but they have arms and legs and even little tiny heads. They're quite small, though. They're probably the size of a large chicken.
没错。比鹅小些,比鸡大些。
Yeah. Smaller than a goose, bigger than a chicken.
我是汉娜·弗莱,这里是第四期节目《让我们动起来》。在激活机器人实验室通行证之前,请允许我先介绍些背景知识:为何这家以让机器下棋和折叠蛋白质闻名的公司会对机器人技术如此着迷?六月里,我带着封城冬眠后明亮的双眼,造访了英格兰西南部的温泉小镇切尔滕纳姆。
I'm Hannah Fry, and this is episode four: Let's Get Physical. Now, before our robotics lab passes are activated, let me fill you in on a bit of background. Why would a company known for getting machines to play board games and fold proteins find robotics so alluring? In June, I emerged bright-eyed from my lockdown-induced hibernation to visit Cheltenham, a pretty spa town in South West England.
切尔滕纳姆科学节是汇聚全球顶尖科学家与思想家的年度盛会,这里成为我自新冠疫情封锁后首次面对面采访的场所。机缘巧合,我的采访对象正是DeepMind机器人技术总监拉娅·哈德泽尔。
The Cheltenham Science Festival, an annual event attracting the world's leading scientists and thinkers, was the setting for my first in-person interview since the COVID-19 lockdown. And as luck would have it, my interviewee was Raia Hadsell, DeepMind's director of robotics.
能再次置身于满屋子的英国人中间真是太好了。
It's very nice to be in a room full of people again, Britons.
拉娅堪称机器人技术与人工智能领域的百科全书,首先就从这两者的区别说起吧。
Raia knows pretty much all there is to know about robotics and artificial intelligence, starting with the difference between them.
当我们思考人工智能时,很多时候人们会立刻联想到机器人作为AI的具体化身。
When we think about artificial intelligence, a lot of the time people immediately go to a robot as being the instantiation of that AI.
想想电影里看到的机器人吧——C-3PO、瓦力、偏执狂机器人马文。它们都是拥有机械躯体的智能生命,都能感知环境并做出决策。同样地,我们对未来超级人工智能的想象也很少只是一个没有实体的声音,除
Just think about the robots you see in films. C-3PO, WALL-E, Marvin the Paranoid Android. They're all intelligent beings with robot bodies. They're all able to reason about their environment and make decisions. Likewise, our vision of superintelligent AI long into the future is rarely just a disembodied voice, apart
少数例外
from a couple
比如电影《她》中的场景。在许多人心中,机器人和人工智能就是同义词。
of exceptions, like in the film Her, for example. Robots and AI are synonymous with one another in many people's minds.
但实际上两者应该区分开来。AI是一种计算机程序,通常通过大量数据训练,能够以类人的方式回答问题。比如实现法语到英语再到中文的翻译——这类问题正是AI可能擅长的。而机器人则是通过行动改变世界的实体。
But really, the two should be distinguished. AI is a computer program that's usually trained on a lot of data to be able to give answers to questions in a similar way that a human might. So think about being able to translate from French to English to Mandarin. These are the types of problems that an AI might be able to do. A robot on the other hand takes actions and changes the world.
无论是通过触碰世界来操控物体(比如进行组装),还是自主移动的机器人。我们可以将两者结合看待:AI将成为推动机器人能力实现下一轮突破的自然途径。
Either manipulation through touching the world and moving things around, maybe doing assembly, or a robot that can move itself around. Then we can think about the two together as AI being a really natural way to bring us to the next set of breakthroughs for what robots can do.
如果你听过本播客第一季,就会知道机器人并不必然内置AI这个观点。从技术层面说,你的洗碗机、割草机、压力锅都属于机器人——它们是能自动执行系列动作的机器。但DeepMind感兴趣的并非这类机器人,它们并非通往通用人工智能的路径。他们的机器人运用机器学习技术,自主掌握执行不同任务的能力。
If you heard the first series of this podcast, you'll already be familiar with the idea that robots don't necessarily come with AI built into them. Your dishwasher, your lawnmower, your pressure cooker are all, in the technical sense, robots. They are machines that are capable of carrying out a series of actions automatically. But these aren't the sorts of robots that DeepMind is interested in as a route to artificial general intelligence. Instead, their robots use machine learning techniques to learn for themselves how to perform different tasks.
那么这一切看起来是怎样的呢?什么样的机器人正在研究设施中漫步,将算法经验应用于现实世界,探索物理智能的最前沿?为什么不进来看看呢?欢迎回到机器人实验室。
So what does all of this look like? What kind of robots are being trained to saunter around the research facilities, to ground algorithmic experience in the real world and explore the absolute cutting edge of physical intelligence? Well, why don't you come on in? Welcome back to the robotics lab.
认识一下
Meet
机器人团队的软件工程师Akhil Raju。即使他的脸被口罩遮住,你仍能从他带我参观实验室时眼中闪烁的兴奋看出他的热情。
Akhil Raju, a software engineer on the robotics team. You can see the excitement in his eyes as he shows me around the lab, even while the rest of his face is covered by a mask.
所以这次会比你们上次看到的规模要大一些。
So this is gonna be a little bigger than the last time you were here.
天啊,太壮观了。哇。是啊。就像你去贸易展时看到的那种在巨大空间里搭起的小摊位,感觉有点像那样。我们在这个大混凝土建筑里,一侧全是玻璃。
Oh gosh, massive. Woah. Yeah. You know if you ever go to a trade show and they have like little stalls up in a giant space, it sort of looks a little bit like that. So we're in this big concrete building with lots of glass along one side.
然后沿着这条通道都是小隔间,看起来像是隐私屏风,不过是给人用的。没错。至于机器人,没人在乎它们的隐私
And then you've got these little booths all the way along with I mean, they sort of look like privacy screens, but privacy for the humans. Exactly. The robots, no one cares about the privacy of
机器人。是啊。机器人想干什么都行。
the robots. Yeah. The robots can do whatever they want.
是的。这些小型工作站里配备了各种你能想象到的尺寸和形状的机械臂——有像起重机般的高耸长臂,有短粗的矮臂,还有末端带抓手的,就像游戏厅里看到的那种。所有这些机械臂都是DeepMind研究的一部分,旨在让机器人能灵巧地操作日常物品。
Yeah. Yeah. Inside these mini booths are robotic arms of every size and shape imaginable. Tall crane-like arms, short and stubby ones, and arms with grippers on the end like the kind you'll see in a games arcade. All of these arms are part of DeepMind's research into getting robots to dexterously manipulate everyday objects.
Akhil带我走进其中一个工作站近距离观察。
Akhil ushered me into one of the booths to take a closer look.
你看这张桌子伸出来的大机械臂,知道那些高档厨房里的立式搅拌机吗?想象一下那种东西,但放大版。它整体相当粗壮且关节繁多,还装有摄像头。而在最末端有个小小的钥匙,我猜它正试图把钥匙插入锁孔。
So this big arm that is extending out of a table, you know those stand mixers that you get in posh kitchens? Imagine one of those, but like a giant version. So it's kind of quite bulbous and curvaceous, with all of these joints and cameras attached to it. And then right on the end, there's a teeny tiny key and it's, I guess, trying to put a key in a lock.
没错。这个机器人配有这种附件,可以像插入USB接口那样操作,或者插钥匙等等。我们正在研究如何实现精细操作,把日常生活中可能遇到的任务作为挑战课题。
Yep. Exactly. This robot has kind of this attachment where it can insert, like, a USB into a USB hole, or maybe a key, and so on. And so we're trying to learn how to actually do very fine manipulation. We're taking tasks that you might do in everyday life and we're using that as a challenge.
假设你想在工厂里部署这种机器人执行精细插入任务,为什么不能直接预编程呢?为什么需要它具备自主学习能力?
If you wanted to have one of these robots in a factory, say, doing this really fine insertion task, why can't you just pre-program one? Why does it need to be something that has trained itself?
如果是固定场景,我们确切知道钥匙和锁孔的位置,那确实可以直接编程。但问题在于,并非所有工厂都符合这种理想条件。
If it was a case where it's very fixed settings, we know exactly where the key is, we know exactly where the hole is, then probably yeah, you can just program it. The thing is, that's not how all factories really are.
许多需要执行插入任务(比如钥匙开锁)的工厂都存在大量变量,每次锁和钥匙的起始位置并不完全相同。这就使得挑战从可预编程任务变成了更复杂的难题。
A lot of factories that might require some kind of an insertion task, like putting a key in a lock, will also have a lot of variables at play, so that the lock and key aren't at precisely the same start points each time. And that changes the challenge from being something pre programmable to something much harder.
实际上你会发现,当工厂需要进行这类插入操作时,目前现实中完成这些工作的并非机器人,而是人类。这也正是我们选择插入任务作为研究课题的另一个原因——因为这项技术在更广泛的机器人学界尚未得到充分解决。
And what you'll notice actually is when these types of insertions need to happen in a factory, it's not robots that do it in the real world now. It's humans. And that's another reason why we chose insertions as a task, because it's somewhat unsolved by the greater robotics community.
你可能会好奇这一切究竟如何实现。如何让一个没有生命的机械臂自学开锁?时至今日,答案可能不会让你太意外——训练物理智能的核心方法之一正是DeepMind最钟爱的强化学习。简而言之,就是通过积分奖励算法完成任务,比如正确将钥匙插入锁孔。机器人技术之所以适配基于强化学习的算法,自有其道理。
You might be wondering how on earth any of this is possible. How do you possibly set up an inanimate robot arm to teach itself to open a lock? Well, by now, it probably won't surprise you that one of the fundamental methods for training physical intelligence is that DeepMind favorite approach, reinforcement learning. In the simplest terms, this involves rewarding an algorithm with points for accomplishing a task, like correctly inserting a key into a lock. And there is a reason why robotics is geared up for algorithms based on reinforcement learning.
这位是DeepMind蒙特利尔办公室负责人多娜·普里库普,她是强化学习领域的全球权威专家。
Here's Doina Precup, head of DeepMind's Montreal office. She is a world expert in reinforcement learning.
用奖励机制来描述机器人任务非常直观,因为你能直接观察到机器人是否正确执行了动作,比如将物体放置在指定位置。因此很容易将问题转化为强化学习问题。自然界中通过奖励训练动物完成复杂动作的案例,也启发我们将这一理念应用于机器人领域。
It's very easy to imagine expressing robotics tasks in a reward language because you can observe when the robot is doing the correct thing, let's say putting an object in a particular place. And so it's very easy to phrase the problem as a reinforcement learning problem. And of course, we know from the natural world that animals are trained by reward to do complicated physical tasks. We would like to take that idea to robotics as well.
训练狗狗捡东西时,你不会详细指导它该如何调动每块肌肉去奔跑、拾取并送回物品。而是在它完成动作后给予零食奖励,让它自行摸索最佳动作方案。某种程度上,AI机器人内部的某些算法就像狗狗一样——只不过它们获得的奖励是数字而非美味饼干。这或许让强化学习看起来像万能钥匙,但实际上情况更为复杂。诸如钥匙开锁这类物理任务会面临'稀疏奖励'的难题。
If you want to get a dog to go fetch, you don't carefully explain how it should move each one of its muscles in order to run towards an object, retrieve it, and give it back to you. Instead, you reward it with a treat when it does what you want, and it learns by itself how best to calibrate its body in the performance of that task. In this way, some of the algorithms inside AI robots are much like dogs, except they're rewarded with numbers, not tasty biscuits. This might make it seem like reinforcement learning is a magic bullet, but in practice, things are a bit more complicated. Physical tasks like inserting a key into a lock are subject to a problem known as sparse reward.
如果非要等到机器人偶然成功将钥匙插入锁孔才给予奖励,那可能要等待非常漫长的时间。因此机器人团队一直在寻找其他方法为机器人指明正确方向。
If you waited to reward a robot until it successfully put a key into a lock just by chance, you would be waiting around for a long time. So the robotics team has been looking for other ways of putting their robots on the right track.
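To make the sparse reward problem concrete, here is a minimal sketch in Python. It is an illustration only, not DeepMind's setup: the one-dimensional workspace, the `HOLE_POSITION` and `TOLERANCE` values, and both function names are assumptions made up for this example. It shows how rarely purely random placements earn any reward at all.

```python
import random

HOLE_POSITION = 0.0  # hypothetical target position, in millimetres
TOLERANCE = 0.5      # hypothetical: how close counts as "inserted"


def sparse_reward(key_position: float) -> float:
    """All or nothing: 1.0 if the key is in the lock, otherwise 0.0."""
    return 1.0 if abs(key_position - HOLE_POSITION) <= TOLERANCE else 0.0


def random_attempts(n: int, workspace: float = 100.0, seed: int = 0) -> int:
    """Count how many of n uniformly random placements earn a reward."""
    rng = random.Random(seed)
    return sum(
        int(sparse_reward(rng.uniform(-workspace, workspace)))
        for _ in range(n)
    )
```

With these made-up numbers, only about one placement in two hundred lands inside the tolerance, so an agent exploring at random receives almost no learning signal.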
在机器人学习过程中,当它接近成功却功亏一篑时,人类可以介入指导:'这样调整,或许该往左移动一点'。虽然我们面对的是稀疏奖励机制——要么完全成功要么彻底失败,就像钥匙要么插进锁孔要么没插进——但机器人会同时利用这种稀疏性信息与人类指导信息,两者的结合正是它的学习方式。
While the robot is learning to do it, a human comes in and when it gets close but no cigar, a human can take over and just be like, adjust like this, maybe move to the left a little bit. And so while we might have a sparse reward, so it's kind of like it's all or nothing, you know, you're in the lock or you're not in it. What the robot will use is both that information of sparsity, but also maybe information from a human, and kind of the combination of those things is how it might learn.
尽管这类学习算法在某些领域已能成功完成任务,但你不该误以为这很简单,因为实验室里并非所有机器人都如此出色。
And while there are certain areas where learning algorithms like this one have been able to successfully accomplish tasks, you shouldn't be fooled into thinking this stuff is easy, because not all the robots in this lab are quite as accomplished.
我上次来的时候,看到一台机器人在叠乐高积木。无意冒犯,
When I was here last, I saw a robot that was stacking Lego bricks. Not to be rude,
但我不会说这是我有生以来见过
but I wouldn't say it was the most impressive thing I'd ever
最令人印象深刻的东西。它现在表现如何?
seen in my life. How's it doing now?
我们可以移到实验室另一侧,就能看到那些东西了。
We can actually move to the other side of the lab and we can start to see that stuff.
阿基尔带我去了另一个机器人工作站,里面有一台红黑相间的机械臂。末端装有一个带两个抓取部件的夹爪,有点像垃圾夹的抓取部分。它正悬停在一个装有三个三维形状的托盘上方,目标是学会如何将红色金字塔形状叠放在蓝色八棱柱上。
Akhil took me to another robot cell with a red and black robot arm inside. It had a gripper on the end with two appendages, a bit like the grabby bits of a litter picker. And it was hovering over a tray containing a trio of 3D shapes. Its goal was to learn how to stack the red pyramid shape on top of the blue octagonal prism.
所以它只有一种方式能抓稳这个红色物体并成功拿起。但它还没弄清正确方向。可惜每次它试图旋转抓取时——等等,我想它成功了。它做到了。
So there's only one way round that it can hold this red object and successfully pick it up. And it hasn't worked out which way. And unfortunately, every time it tries to rotate and pick it up oh, hang on. I think it's got it. It's got it.
幸好这些东西不会灰心丧气,我的天啊。我多少年没来这里了?不,整整这么久。它一直在这里尝试、尝试、再尝试。
It's a good job these things don't get disheartened because my goodness. It's been how many years since I've been here? No. All this time. It's been here trying and trying and trying.
所以我们目前看到的更像是一种训练状态,还不是最佳表现。
So we're seeing something that's kind of training right now. So we're not seeing our best
别找借口。
Don't make excuses.
为什么学习这些灵巧操作如此重要?
Why are these dexterous manipulations so important to learn?
我们在DeepMind设立机器人实验室的原因之一,就是要将通用人工智能的研究扎根于现实世界,确保我们研发的AGI是真正的AGI。比如如果我们实现了AGI,它至少应该能把一个物体叠放在另一个物体上。
So one of the reasons that we have a robotics lab at DeepMind is really to ground our search for AGI in the real world, to make sure that our progress towards AGI is true AGI. Like, if we find AGI, it probably should be able to stack an object on another object.
说到物体,在这排机械臂旁边,我注意到一个装满儿童玩具的篮子,里面有橡皮鸭、泡沫香蕉和一个备受喜爱的卡通角色。
And speaking of objects, next to this row of robot arms, I noticed a basket full of children's toys, rubber ducks, foam bananas, and a much loved cartoon character.
我发现海绵宝宝还在这里,这次坐在角落里。还有,等等,绿色的小橡皮鸭。这些东西的设计理念是什么?
I noticed SpongeBob is still here, sat in the corner this time. There's also, hang on, little green rubber ducks. What is the idea behind this stuff?
这类可玩性物品非常棒,因为操控那些能弯曲移动的物体,对我们的智能体来说是种需要学习的新型物理特性。
So these kind of play things are really nice because manipulating objects that can bend and move and stuff like that, that's a new type of physics that our agents need to learn.
在某个垃圾填埋场里,会不会有一堆被机器人压扁的泡沫香蕉?
Somewhere in a landfill, is there a pile of sort of crushed foam bananas that robots have
我们目前还没损坏任何香蕉。我可以
We haven't destroyed any bananas yet. I can
你们一个香蕉都没弄坏过?
You haven't destroyed any bananas?
我很高兴这么说。
I'm happy to say.
我才不信呢
I don't believe that
一秒钟都不信。
for a second.
虽然看着这些机械臂尝试将U盘插入电脑失败,或是抛掷泡沫香蕉很有趣,但值得记住的是机器人实验室展示的这些项目有着重要意义。构建能与物理世界互动的人工智能,被认为是实现智能本质这一终极目标的核心。以下是拉雅·哈德扎尔在切尔滕纳姆科学节上的发言。
As fun as it is to watch these robot arms try and fail to insert USB sticks into computers and sling foam bananas around, it's worth remembering that the projects on display in the robotics lab serve an important purpose. Building AI that can interact with the physical world is considered central to the overarching goal of solving intelligence itself. Here's Raia Hadsell again, speaking at the Cheltenham Science Festival.
当我们思考人类智能时,多数时候关注的是语言能力或认知技能,比如数学能力。但实际上,我们大脑的很大部分进化都是为了控制身体运动。因此我认为这种运动智能、动作智能是我们智能的核心组成部分,认知技能正是建立在这个基础之上的。
When we think about human intelligence, a lot of the time we focus on things like language or our cognitive skills, how good we are at math. But really a lot of our brain has been developed in order to just move our bodies. So I think that that level of intelligence, motor intelligence, movement intelligence, this is a core part of our intelligence, and that's what our cognitive skills are built on top of.
这种专注于创造能自主学习的智能机器人的理念,正是DeepMind的机器人看起来可能略显初级的原因之一。我相信大家都会联想到网上那些机器人后空翻、被推倒后站起、完成各种高难度动作的视频。为此我特意询问了拉雅·哈德泽尔的看法。
This focus on creating intelligent robots which can learn for themselves is part of the reason why DeepMind's robots might seem a little bit, well, rudimentary compared to what else is out there. Because I'm sure that all of you are thinking about those videos on the Internet of robots doing backflips, being pushed over, getting back up, performing all kinds of incredibly sophisticated movements. So I thought I'd ask Raia Hadsell about this.
汉娜,你不能轻信网上看到的一切。
You can't believe everything you see on the Internet, Hannah.
说得好。
Well put.
完全正确。确实有些机器人能完成翻跟头、跳跃等令人印象深刻的动作。但在DeepMind,我们更关注通用性这个维度——也就是AGI中的'G'。我们希望机器人能通过经验或观察人类,自主学会从未接触过的新技能,而不需要人为编程。
You're absolutely right. There are robots that can do some pretty impressive stuff that can flip, that can jump. At DeepMind, we've been focusing more on the generality aspect of it, the g in AGI. We want robots that can learn new things that they've never done before without needing somebody to program them just through experience or through watching a human.
所以那些令人惊叹的(非伪造的)机器人后空翻视频,本质上是在执行一套非常精确的指令对吗?是的,我们...
So those very impressive videos, the ones that aren't fake of robots doing backflips, they are essentially following a very precise set of instructions. Is that essentially what Yeah, we're
确实如此。这些演示往往展现了机器人实际能完成的任务。一个能后空翻的机器人之所以令人印象深刻,是因为完成这个动作所需的功率质量比。但这与要求机器人执行它首次观察到的新技能是截然不同的。
absolutely. And they tend to be a demonstration of what that actual robot can do. A robot that can do a backflip, that's very impressive because of the power and mass ratio that's required to do that. But it's very different from wanting that robot to do a new skill that it has just observed for the first time.
它无法走到桌前拿起咖啡杯,因为它确实做不到。
It couldn't walk over to a table and pick up a coffee cup, for... It could not.
好吧,你让我失望了,对吧?但理论上,你的机器人未来或许能做到。
Well, you've disappointed me, right? But yours could, in theory, in future.
我们的机器人能做到这点,还能给土豆除草和采摘番茄。
Ours could do that, and weed potatoes and pick tomatoes as well.
这才是关键所在。如果机器人能自主学习操控物体和移动,它们就能具备适应性,在大量关键任务中为人类提供协助,包括目前无法支持我们的场景。
This is the key point here. If robots can teach themselves to manipulate objects and move around, they can be adaptable and offer assistance to humans in a whole host of critical tasks, including situations where they can't currently support us.
这个问题在日本福岛核事故期间就凸显出来了。
So this came up when there was the Fukushima disaster in Japan.
日本一座核电站因昨日强震受损后发生爆炸。福岛核电站上空可见滚滚浓烟升起。
There's been an explosion at a Japanese nuclear power station damaged in yesterday's massive earthquake. Clouds of smoke could be seen rising above the Fukushima nuclear site.
人们意识到我们缺乏有效方法将机器人送入这片极度危险的放射性区域进行维修。因为我们现有的机器人要么需要易于进出的环境,要么缺乏必要的灵活性来执行诸如关闭阀门或开启门等操作。因此整个机器人项目都致力于两个目标:如何改进足式机器人在轮式机器人无法进入区域的移动能力,以及如何提升机器人的操作灵活性。
People realized we didn't have a good way to send robots into this extremely dangerous radioactive area and make repairs. Because all of our robots either required an area that was easily accessible or didn't have the necessary dexterity to, for instance, shut a valve or open a door. And so there was a whole robotics program aimed at how do we improve legged locomotion into areas where a wheeled robot can't go, and how do we improve the dexterity of robots as well.
当然,这里也存在另一面。如果未来这些人工智能机器人足够先进,能够部署到现实世界拯救人类生命,它们同样可能被制造出来做相反的事。
Of course, there is a flip side here. If in the future, these artificially intelligent robots are good enough to be deployed in the real world for saving human lives, they could also be built to do the opposite.
机器人曾被用于携带武器。因此如果你制造出能力更强的机器人,实际上可能是在制造更强大的武器载体。当然,DeepMind坚决反对包括机器人在内的自主武器系统。我认为机器人带来的益处及其对世界的贡献远超过这些风险,特别是当国际社会共同反对将机器人技术用于武器时。
Robots have been used to carry weapons. And so if you make a more capable robot, then potentially what you're making is a more capable vehicle for holding weapons. Of course, DeepMind is very much against autonomous weaponry, including on robots. And I think that the benefits of robots and what they can do in our world outweigh these risks, especially if the world stands strongly against the use of weaponry and robotics.
这并非机器人研究面临的唯一伦理问题。许多人担忧自动化对劳动力市场可能造成的负面影响。
And this is not the only ethical concern about robotics research. Lots of people are worried about the possible detrimental effects of automation on the workforce.
当前我们关注的是如何利用机器人增强人类能力。比如建筑工地上工人身旁的机器人可以协助完成重物搬运。重点不在于取代人类,而在于扩展人类的能力边界。
What we're looking at now with the use of robots would be to augment humans. Somebody working on a construction site that has a robot next to them that's able to do some of the heavy lifting, for instance. It's not about displacing humans or replacing them, it's about enhancing what a human can do.
任何用于马铃薯除草和番茄采摘的机器人当然都需要掌握移动能力。在DeepMind机器人实验室,近期重点研发的是双足行走机器人——这个课题本身带来了一系列独特的研究挑战。
Any robot that's going to help with weeding potatoes and picking tomatoes will, of course, need to have mastered locomotion. Back at the DeepMind Robotics Lab, a recent focus has been to develop a robot which can move around on two legs, a problem which comes with its own unique set of research challenges.
地板上铺着某种类似儿童游戏垫的东西。
On the floor, we've got what looks sort of like the play mat that you put down for kids.
阿基尔带我参观了一个机器人围栏,大约九平方米大小,四周设有围栏,大概是为了防止里面的机器人逃跑。在这个方形区域内,你会看到
Akhil showed me a sort of robot playpen, about nine meters squared with a barrier around it, presumably to stop the robots inside from escaping. So inside this square then, you've got
三个类人机器人。它们通体黑色,身体呈立方体状,但配有四肢和微小的头部。不过它们体型相当小巧,大概只有一只大鸡的尺寸。
three humanoid robots. They're black. They've got a very cuboid body, but they have arms and legs and even little tiny heads. They're quite small though. Should tell you that they're probably the size of a large chicken.
对,比鹅小些,比鸡大些。具体我也说不准。我们主要是在教它们学习行走。也就是说,机器人要实际学会运用它的腿,甚至手臂。
Yeah. Smaller than a goose, bigger than a chicken. I don't know. And basically, what we've been doing is learning to walk around. And so, like, the robot actually learns to kind of use its legs and even its arms.
头部装有摄像头,它能学会环顾四周观察环境。所以在某种程度上,这非常像是一个全身协调控制的问题。
The head has a camera, it learns to kinda look around and see what's going on. So it is very much kind of almost like a whole body control problem in some sense.
我能摸
Can I touch
它吗?你可以抱抱看。
it? You can hold it.
天啊。好吧,还挺沉的。背部有小型把手,几乎像背包一样,还有许多接口,比如微型USB端口和以太网端口之类的。至于脚部,它装有小型滑垫,就像要去滑雪似的,只不过滑雪板特别短。
Oh my gosh. Okay. Well, it's quite heavy. It's got these little handles on the back, almost like a rucksack and lots of ports, like little USB ports and an Ethernet cable port and stuff. And then for feet, it's got these little skid pads, almost like it's going skiing, but just with really short skis.
它非常漂亮。现在我正抬起它的手臂,它会自动回到中间位置,动作非常流畅。听听这个声音。我感觉有点难过,像是想说请让我一个人待着。好吧。
It's very pretty. So I'm lifting its arm up now and it kind of returns to center, but it's got this really nice smooth action. Have a listen to that. I feel really sad. It's sort of like, please leave me alone. Okay.
它正在四处走动。想象一下你在夜店里跳着蹩脚的机械舞,就是那种样子。看起来随时会摔倒。所以你们没有编程让它绕圈走吗?
It's walking around. Imagine if you were doing a really rubbish robot dance in a nightclub. That is exactly what it looks like. It looks like it should fall over. So you haven't programmed this to walk around in a circle?
没有。这是机器人通过几天的数据学习自主掌握的。
No. This was learned on the robot just by learning from the data over a couple of days.
这是DeepMind的研究科学家简·亨特利希,他跟踪研究这些人形机器人已经超过一年了。
That's Jan Humplik, a research scientist at DeepMind who's been following the progress of these humanoid robots for more than a year.
你们教过它像刚才那样直接仰面摔倒吗?
Did you teach it to fall flat on its back like it just did?
没有。它就是会自己摔倒。
No. It just falls.
不过它很擅长自己爬起来。
It's quite good at pushing itself up, though.
是的,这些都是预设程序。包括推地站起的行为也是程序设定的。
Yeah. So those things are programmed. The pushing behavior to stand up, that's programmed.
因为不然的话,你就得一辈子不停地去扶起
Because otherwise, you'd just spend your entire life picking up
机器人。要么我们得去扶它们,要么它们得学会自己站起来。
the robot. Well, either we'd need to pick them up or they would need to learn to stand up.
我们给它们起名字,某种程度上是在赋予它们人性。
We are kind of humanizing them by giving them names.
维吉里卡·普罗特罗坎是运动项目组的另一位研究科学家。这三个机器人叫什么名字?
Viorica Pătrăucean is another research scientist on the locomotion project. What are the names of these three?
我记得其中有个叫英格兰,还有个叫梅西——取自足球运动员梅西。我那个叫哈吉,源自罗马尼亚伟大足球运动员哈吉或Humanae d AGI。因为我原本是罗马尼亚人。所以
I think one of them is England and one is Messi, from Messi the footballer. And mine is called Hagi. That's from humanoid AGI, or from Hagi, the great Romanian footballer. Just because I'm from Romania originally. So
你看那个,它采用了完全不同的训练流程,步态也明显不同。它甚至会尝试倒着走,实际上
if you look at that one, this is a completely different training process, and you can see that the gait is very different. And it can try to walk backwards, and it's actually
看起来它
It looks like it's
一个喝醉的机器人。它正试图倒着走,但有点
a drunk robot. So it's trying to walk backwards, but it's sort of
是啊。想想看,好吧,我觉得它死了。
Yeah. Think, okay, I think it died.
它真的很凄凉。
It's really forlorn.
我必须说,我本可以整天看着这些可爱但基本完全无助的小人形机器人。但我想了解更多关于训练它们行走的过程。所以在向英格兰、梅西和其他机器人挥手告别后,我问Jan和Viarika在疫情期间如何在家里的客厅训练这些机器人。它是怎么运输的?就装在小行李箱里吗?
I must say, I could have stood watching these cute and mostly completely hopeless little humanoid robots all day. But I wanted to find out more about the process of training them to walk. So after waving goodbye to England, Messi, and co, I asked Jan and Viorica about their experience of training these robots in their living rooms at home when the pandemic hit. How does it travel? Does it just pop in a little suitcase?
实际上,如果你购买它,会附带一个行李箱。自带行李箱,方便运输。
Actually, if you buy it, you get it with a suitcase. It comes with it, so it can travel.
你们的客厅相当大吗?
Do you have quite a big living room?
不算大,不过是的,我正在调整它。那里有个带地垫和泡沫墙的围栏。
Not that big, but yeah, I'm adapting it. I have a pen there with floor mats and foam walls.
所以晚上看电视时,你会把脚翘起来,身边有个小机器人围栏。
So when you watch TV at nighttime you sort of put your feet up and around you is a little robot pen.
没错,是的。我们还做过机器人看电视的实验。
Exactly, yeah. We even had experiments where the robot was watching TV.
真的吗?
Did you really?
嗯,想做一些实验来测试视觉网络。电视是多样视觉数据的好来源,而且本来就在客厅里,对吧?何乐而不为呢?
Well, we wanted to run some experiments to test visual networks. TV is a good source of diverse visual data and it's already in the living room, right? So why not?
等等,你过去一年的工作就是坐在沙发上和你的机器人一起看电视?
So hang on, your job over the last year has been to sit on the sofa and watch TV with your robots.
不完全是那样,不过可能有几秒钟确实看起来像这样。是的。
Not quite that, but probably a few seconds of it, it does look like that. Yeah.
那么如何训练人形机器人行走呢?再次强调,其底层机制是强化学习。机器人会因前进速度和避免摔倒而获得奖励点数。如果完全不进行训练,它们会怎样表现?
So how do you train a humanoid robot to walk? Again, the underlying mechanism is reinforcement learning. The robots are rewarded with points for forward velocity and not falling over. When you haven't given them any training, what do they do?
噢,它们基本做不了什么。最多就是颤抖一两秒然后摔倒。经过几小时训练后,它们才开始真正行走——先迈出几步,之后会撞到墙,然后通过视觉学习如何避开墙壁。
Oh, they don't do much. They just start shaking for one second or two at most and then they fall. After training for a few hours then they start actually walking, taking a few steps and then later on they bump into walls and then using vision they learn how to avoid the walls.
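The walking reward described here, points for forward velocity and for not falling over, could be sketched roughly as follows. This is a hedged illustration rather than the team's actual reward function: the name `walking_reward`, the weights, and the bonus value are all assumptions chosen for the example.

```python
def walking_reward(
    forward_velocity: float,
    has_fallen: bool,
    velocity_weight: float = 1.0,  # hypothetical weighting
    upright_bonus: float = 0.5,    # hypothetical bonus for staying up
) -> float:
    """Reward forward progress, plus a bonus for each step spent upright."""
    reward = velocity_weight * forward_velocity
    if not has_fallen:
        reward += upright_bonus
    return reward
```

An untrained robot that shakes and falls collects almost nothing under a reward like this, while one that takes a few steady steps collects both terms on every step.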
我家里有个两岁小孩,对吧?听你描述的过程,和两岁孩子学走路的方式其实很相似
So I have a two year old at home, right? And the way you're describing it, it's not dissimilar from the way that the two year old has
学习行走的过程。这里有
learnt to walk. There's a
很多摔倒,倒不怎么
lot of falling, not that much
颤抖,更多是抽搐和乱摆手臂。也会经常撞到墙。你是否发现机器人学走路和幼儿学步的这些相似之处?
shaking, sort of twitching and flailing. There was also a lot of walking into walls. Do you see those similarities with the way that these robots learn to walk and the way that toddlers learn?
确实存在相似性。不过幼儿在爬行前就会探索自己的身体,学习移动四肢。而我们的机器人是直接被放置在站立姿势开始学走的。
There are some similarities, probably for toddlers, even before they crawl, they still discover their body, they still learn to move their limbs. Whereas our robots, we just put them in standing position and now walk.
它学会走路的速度有多快?
And how quickly did it manage to learn to walk?
我想大约24小时后它就已经会走路了。这让我印象深刻。
I think in about twenty four hours it was already walking. For me that's impressive.
不是实际时间的24小时,而是训练时间意义上的24小时。
Not twenty four hours in real time, but twenty four hours in sort of training time.
对,对。这大约跨越了一周的训练时间,但训练过程包括短暂的操作时段,直到某些部件损坏,或者把它带到实验室快速维修之类的。
Yeah, yeah. That spans about a week of training, but training like small sessions before something breaks or taking it to the lab for a quick repair or something like that.
Villarrica提出了关于这些机器人脆弱性的重要观点。现有硬件设计并不适用于需要机器人反复摔倒才能取得进展的机器学习技术。以下是Raya Hadsell的解释。
Viorica raises an important point about the fragility of these robots. The actual hardware is not designed for a machine learning technique which involves a robot falling down loads of times before any progress is made. Here's Raia Hadsell to explain.
现今制造的机器人并不适用于我们认为对发展通用人工智能至关重要的学习模式。想想孩子学走路时的情形——每次跌倒后他们会自我修复并继续尝试。但机器人只能承受有限次数的摔倒就会彻底损坏。
The robots that are built today are not built for the type of learning paradigms that we think are key to developing AGI. Think about when a child learns to walk. Every time they fall down, they then heal from that and they keep on going. There's only so many times that a robot can fall down before it simply breaks.
这种方法伴随着各种困难和障碍,而这些是预编程机器人完全无需担心的。以下是Jan Humplik的再次说明。
This approach comes with all kinds of difficulties and hurdles that the pre-programmed robots just don't have to worry about. Here's Jan Humplik again.
主要限制在于你确实需要从零开始。采用更传统的方法或许根本不需要数据,开箱即用。这些无疑是强化学习的劣势。
The main limitation is that you really start from scratch. With more classical approaches you perhaps don't need any data. It's just going to work out of the box. So these are certainly disadvantages of reinforcement learning.
难道不能作弊吗?一个机器人学到的经验不能传授给另一个吗?
Can't you cheat though? Can't what one robot has learned about the world be imparted onto another?
当然可以。分享知识的方式多种多样,特别是可以让多个机器人共同收集数据,这确实是扩大数据收集规模的有效途径。
Absolutely. And there are many different ways to share knowledge. In particular, you can just have multiple robots collecting data, and this is really the way to scale up this data collection process.
简提到的是一种称为'数据池化'的技术。哈吉和英格兰不是独立学习行走,而是定期将跌倒次数、跌倒时的传感器读数等数据上传至中央控制器,由控制器整合信息后反馈给每个机器人,让它们能基于集体学习经验更好地适应环境。
What Jan is talking about here is a technique called pooling. Instead of Hagi and England learning to walk independently of each other, their data (how many times they fell over, what their sensor readings were when they fell, etc.) is regularly uploaded to a central controller, which combines this information and feeds it back to each robot so that they can better navigate the world based on their combined learning experience.
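The pooling idea can be sketched as a shared store that every robot uploads to and reads back from. This is a minimal illustration under stated assumptions: the `Episode` and `CentralPool` classes, their fields, and the robot names are hypothetical, and a real system would feed the pooled data into a learning algorithm rather than simply returning it.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One robot's record of a training session (hypothetical fields)."""
    robot_name: str
    falls: int
    sensor_readings: list[float]


@dataclass
class CentralPool:
    """Hypothetical central controller that combines every robot's data."""
    episodes: list[Episode] = field(default_factory=list)

    def upload(self, episode: Episode) -> None:
        """Add one robot's episode to the shared pool."""
        self.episodes.append(episode)

    def total_falls(self) -> int:
        """Total falls recorded across all robots."""
        return sum(e.falls for e in self.episodes)

    def shared_batch(self) -> list[Episode]:
        """Return a copy of the combined data for any robot to learn from."""
        return list(self.episodes)


pool = CentralPool()
pool.upload(Episode("robot_a", falls=3, sensor_readings=[0.1, 0.4]))
pool.upload(Episode("robot_b", falls=2, sensor_readings=[0.2, 0.3]))
```

Each robot then trains on `pool.shared_batch()` rather than only on its own episodes, which is what lets data collection scale with the number of robots.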
我们可以追踪每个机器人的表现,经常讨论类似'我的机器人最近跌倒次数变多了,你的呢?这会不会变成竞赛?'这样的问题。
We can track how well each robot is doing, and we definitely discuss, like, oh, okay, my robot starts falling more often now, is yours the same? Did it get quite competitive?
我一直强调这不是比赛。但每当有人篡改学习曲线,两个机器人就会比较'维吉卡领先了''简妮要赢了',我总得纠正说:只有两个机器人表现同步提升时,我们才算真正成功。
I kept telling everybody that it's not a competition. But yes, every time somebody would check the learning curves and there would be the two robots, they would be like, Oh, Viorica is winning. Oh, Jani is winning. I'm like, No, no. We're only winning if the performances are the same on both robots.
说到团队协作,除了行走、插钥匙、叠积木等基础测试外,还有个重要项目能锤炼机器人未来所需的核心技能。为此,DeepMind延续一贯风格,将目光投向了游戏领域——特别是这项美丽的运动。
Speaking of teamwork, there are other environments beyond just walking around or inserting keys into locks or stacking bricks that serve as an important test project for the robots. A chance to hone in on a set of robot skills that would be useful to have in the long term. For that, in true DeepMind fashion, their focus has turned to games, and one in particular, the beautiful game.
要踢好足球,你必须能够控制自己的身体。你需要能跑能走。但除此之外,你还需要掌握运球和射门这些技巧。更高层次上,你还需要具备全场协调能力和战略意识。这确实是一项充满多层次挑战的运动。
In order to play football, you have to be able to control your body. You need to be able to run, to walk. But then you also need to have these skills of dribbling and shooting. And then at even a level above that, you need to have the coordination and the strategy over the whole game. So it's really a challenge that has a lot of layers to it.
So far, DeepMind has been teaching football not to real robots, but to simulated ones, computerized avatars in human form, a bit like a simplified version of the players in your favorite video game. The difference with these players is that their repertoire of movements is not preprogrammed. But like the real robots, they are effectively learning to move from scratch. The point is not to have robots playing at Wembley Stadium in the near future, however fun that might be.
We're really trying to study whether it's valuable to train these methods using reward and the competition of something like football, or whether there are other ways to train for this type of behaviour.
Underneath that big umbrella of reinforcement learning, it's helpful to use a series of other techniques to get the agents up and running. Here, they're also using something called imitation learning, which involves gathering video footage from real human football matches, using motion capture to translate the movements of each player's joints into a dataset, and then training a neural network so that these simulated humanoids begin to mimic the movements of real players.
So this is really layering these different types of learning algorithms together. And the exciting thing is that the result in the end is four agents that can race around this field. And they've really achieved this level of whole body control and also team coordination.
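The imitation learning step described above can be sketched as supervised learning on demonstrations, often called behaviour cloning. The example below is only a toy under stated assumptions: a made-up `expert_action` rule stands in for real motion-capture data, a single joint angle stands in for a whole body, and a two-parameter linear model fitted by gradient descent stands in for the neural network. None of these names or numbers come from DeepMind's work.

```python
def expert_action(angle):
    """Stand-in for what a motion-captured player did next (made-up rule)."""
    return -0.5 * angle + 0.2


def build_dataset(n=21):
    """Pairs of (observed joint angle, expert's action) from the 'footage'."""
    states = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [(s, expert_action(s)) for s in states]


def clone_behaviour(dataset, lr=0.1, steps=500):
    """Fit action = w * state + b by minimising mean squared error."""
    w, b = 0.0, 0.0
    n = len(dataset)
    for _ in range(steps):
        # Gradients of the mean squared imitation error.
        grad_w = sum(2 * (w * s + b - a) * s for s, a in dataset) / n
        grad_b = sum(2 * (w * s + b - a) for s, a in dataset) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


dataset = build_dataset()
w, b = clone_behaviour(dataset)
```

After training, `w` and `b` approach the demonstrator's rule, so the cloned policy reproduces the demonstrated movement. In the layered approach the transcript describes, a policy initialised this way would then be refined with reinforcement learning.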
And then Raya showed me my first ever video of a simulated humanoid football match, and I'll do my best to commentate on the highlights. So here we are for this season's title decider between these two titans of AI football, the Blues versus Humanoids United, playing in red. Well, the game's begun, and Drogbot has it for the Blues. Cuts inside and then chops back onto his right. But look, the ball has broken free and Robo Naldo is clear.
Goal. Wow. They say a week is a long time in politics, but five seconds really is an epoch in AI football.
As impressive as this video is, Raya, I think it's fair to say that at certain points they are quite hilariously bad at controlling their bodies.
Yeah, I mean, they are not trying to win any points on style or grace. So that really lets the agent optimize for just purely trying to achieve its goal. It doesn't matter if the arms are flailing around. So you can see the problem with putting this onto a real robot.
Yeah, I can see how that would be a problem with the robots.
It's in the game of football that we start to see the different flavors of intelligence converge. Training agents to play football gives them physical skills like dribbling and passing. But when combined with a reinforcement learning algorithm, which rewards them for team play, you start to see emerging the sort of cooperative AI we heard about in the previous episode. The question is, if physical and social intelligence can be developed in tandem in this way, could physical intelligence provide a path all the way to AGI?
It all depends on how you define AGI, doesn't it?
I've noticed this a lot, yes.
Maybe not immediately. You know, if you look at evolution, it's a very long path to go from initial creatures to human beings. And I think it could also be a very long path if we want to build an AGI starting from first principles of learning to move a body. But that is what we are looking at.
Jan Humplik also believes that it will be a long time before robotics takes us to a general form of intelligence.
If I ask somebody on the street what they would be impressed by the robot doing, they would say something like, "Well, you know, maybe cleaning my apartment." If you start thinking about this problem, you're like, okay, so it certainly needs to use vision. It certainly needs to understand human language, because you need to give it a command. It needs to understand what it means to clean the apartment.
And that's not trivial because cleaning doesn't mean destroying your furniture. Solving anything impressive like this is essentially getting very close to AGI.
But if embodied intelligence, social intelligence and linguistic intelligence don't necessarily lead to AGI on their own, is there a single path that does? Well, some DeepMind researchers are convinced that there is, and it's been staring us in the face this whole time.
When we say that reward is enough, we're really arguing that all of the abilities of intelligence, everything from perception to knowledge to social intelligence to language, can be understood as a single process of trying to increase the rewards that the agent gets. If this hypothesis was true, it would mean that we only need to solve one problem in intelligence rather than a thousand different problems for each of the separate abilities.
That's next time on the DeepMind Podcast, presented by me, Hannah Fry, and produced by Dan Hardoum at Whistledown Productions. If you like what you've heard, please do rate and review the podcast. It helps others who are also AI curious to find it. Same time next week.