本集简介
双语字幕
欢迎回到DeepMind播客。我是汉娜·弗莱,一位近年来密切关注人工智能领域显著进展的数学家。在本系列节目中,我将与科学家、研究人员和工程师们探讨他们的最新工作如何改变我们的世界。DeepMind的宏伟目标是破解智能之谜。在接下来的几集中,我们将探讨这可能会是什么样子,以及创造通用人工智能需要哪些能力。
Welcome back to DeepMind, the podcast. I'm Hannah Fry, a mathematician who's been following the remarkable progress in artificial intelligence in recent years. In this series, I'm talking to scientists, researchers, and engineers about how their latest work is changing our world. DeepMind's big goal is solving intelligence. And over the next few episodes, we're gonna be asking what that could look like and just what capabilities will be needed to create artificial general intelligence.
上期节目中,我们了解到为何大量研究投入大型语言模型,这些模型让机器具备了与人类交流的能力。但这里有些人认为,虽然交流本身是重要工具,但其真正价值在于促进另一种对人类智能至关重要的能力——协作能力。这是第三集《携手共进》。
Last time, we heard why lots of work is going into large language models that give machines the ability to communicate with humans. But there are some people here who believe that, while communication is an important tool in its own right, its real value lies in facilitating another skill that is integral to human intelligence, the ability to cooperate. This is episode three, Better Together.
交流是合作的催化剂。要想合作,你不仅需要了解自己的需求和愿望,还需要理解对方的需求和愿望。
Communication acts as a booster for cooperation. If you want to cooperate, you first need to understand not only what are your needs or desires, but also what are the other person's needs or desires.
这是托尔·格雷佩尔的声音。直到最近,他一直领导着DeepMind的多智能体团队。托尔和同事们将协作型AI定义为帮助人类与机器找到提升共同福祉的方法。他认为,实现这一目标的能力是人工智能进化过程中的关键里程碑。
That is the voice of Thore Graepel. Until recently, he led the multi-agent team at DeepMind. Thore and his colleagues have defined cooperative AI as helping humans and machines find ways to improve their joint welfare. And building something capable of that goal is, he believes, a crucial milestone in the evolution of artificial intelligence.
我们所知的最高级智能是人类智能,而人类是超级合作者。在所有动物中,人类以最擅长合作著称。正是这样我们才建立了文明。如果我们想让机器达到这种智能水平,就需要教会它们合作。
The highest level of intelligence that we know of is human intelligence, and humans are super cooperators. Among all the animals, humans stand out as the species that is best at cooperation. You know, that's how we've built our civilization. And if we want to reach that level of intelligence with machines, then we need to teach machines to cooperate.
诚然,每当看到新闻中世界各地关于冲突、贫困和歧视的报道时,人类似乎并非合作的典范。但同样真实的是,我们人类最引以为豪的成就——科学、艺术、工程或文学领域的重大突破,几乎从来不是单打独斗的结果。铁路的发明,甚至新冠疫苗的研发,都需要大批人齐心协力。这难道更像是社会层面的智能,而非个人层面的智能吗?
Admittedly, whenever you read in the news about conflicts and poverty and discrimination around the world, it doesn't exactly seem as though humans are paragons of cooperation. But it's also true that the endeavours of which we humans are most proud, the biggest achievements of science, art, engineering or literature, are almost never down to a single person. The invention of the railway, even the creation of the COVID-19 vaccine, they all required great swathes of people pulling in the same direction. Is it almost like intelligence at the level of a society rather than just at the level of an individual?
没错。你也可以认为群体本身——无论是家庭还是整个国家——就是一个智能实体,因为其中的人们会为了共同目标而协作。
Yes. You could also take the viewpoint that the group itself, maybe a family or maybe a whole country, is an intelligent entity because people within it work together towards common goals.
这不仅仅是一个能为学术会议增色添彩的理论构想。
This isn't just a theoretical idea that will make a nice paper for an academic conference.
对于自动驾驶汽车,我们不希望出现这样的情况:你试图并入左车道,我试图并入右车道,结果两车互相阻挡,因为谁也不愿暂时放弃自身利益。
With self-driving cars, we don't want to be in a situation where you're trying to merge into the left lane, I'm trying to merge into the right lane, and we block each other because neither of our cars is willing to cede its own self-interest for a moment.
在环境保护方面,我们每个人都会想:如果我不减排,能有多大影响?但如果大家都这么想,集体层面上就会出大问题。
In the protection of the environment, individually, each one of us thinks, what difference does it make if I don't save on carbon emissions? But if we all think that way, then collectively, we will have a problem.
人们常讨论开发超级智能机器的可能性,类比而言也可以有超级协作型机器。
People talk about the possibility of developing super intelligent machines. The analog would be a super cooperative machine.
我是说,如果有个存在体专为取悦你而存在,那会相当方便。
I mean, it'd be quite handy to have an entity that existed only to make you happy.
是啊。
Yeah.
我们稍后会详细讨论这些。但首先让我们思考:如何教会智能体协作?部分答案在于它们通过强化学习技术进行训练的方式。如果你听过本播客第一季,可能已经熟悉这种机器学习方法。毕竟它正是DeepMind近年来多项重大突破背后的技术,包括AlphaGo在围棋中战胜人类选手。
We'll come on to all of that in a moment. But let's start with the question, how do you teach an agent to cooperate? Part of the answer lies in the way they're trained using a technique called reinforcement learning. If you listen to series one of this podcast, you may well be familiar with this approach to machine learning already. After all, it's behind many of DeepMind's major breakthroughs in recent years, including AlphaGo's victory over a human player in the game of Go.
不过值得快速回顾一下。多伊娜·普雷库普是DeepMind蒙特利尔办公室的负责人,数十年来一直致力于完善强化学习技术。
But it's worth having just a quick refresher. Doina Precup is head of DeepMind's Montreal office and has spent decades refining the technique of reinforcement learning.
强化学习的理念确实源自心理学和动物学习理论,人们认为奖励是动物学习执行特定任务的有效方式。当然,斯金纳是这一研究领域的先驱之一。
The idea of reinforcement learning originates really from psychology and animal learning theory, where people thought that rewards are a good way for animals to learn how to perform certain tasks. And of course, Skinner was one of the pioneers in this line of work.
B·F·斯金纳是美国著名心理学家,其以在鸽子身上运用正向强化的研究闻名。通过特制的按钮投食装置,他发现训练鸽子完成特定动作(比如逆时针转圈)异常简单——只需在动物开始朝正确方向动作(比如左转)时投喂奖励。饥饿的鸽子很快会意识到自己的行为能带来食物,从而强化该行为。整个过程最短仅需四十秒。
B. F. Skinner was a famous American psychologist, known amongst other things for his work using positive reinforcement in pigeons. Using a specially designed box that would release a treat at the push of a button, Skinner found that training pigeons to perform a task, like spinning around anticlockwise in a circle, was fantastically simple. You only needed to wait until the moment the animal started behaving in the right direction, like turning to the left, before offering a treat. A hungry pigeon soon realizes that its own actions are delivering treats, and the behavior is reinforced. The whole thing can take as little as forty seconds.
使用奖励的优势在于能轻松向动物传达任务要求。同理,对于自动化智能体而言,当我们赋予数值化奖励时,也能轻松向其传达任务目标。
The advantage of using rewards is that you can communicate the task to the animal very easily. And similarly, if you have an automated agent, we might be able to communicate the task to that agent very easily if we give the agent these rewards, which are numerical in our case.
与投喂鸽子的面包屑不同,AI的奖励是数字形式。听起来或许寒酸,但可以类比电子游戏中的积分:完成任务得+1分,失败则-1分。这些智能体被设计成只会追求积分最大化,而通过这种方式优化合作策略,标志着AI发展史上一次微妙转变。托尔·格雷佩尔再次解释道:
Instead of giving AIs literal treats, the breadcrumbs that you'd fling to a pigeon, AIs are rewarded with numbers. Sounds a bit measly, but think of it as points in a computer game: a plus one for succeeding at a given task and a minus one for failing. The agents are built only to maximize the number of points they win, and optimizing for cooperation in this way marks a subtle shift in the history of AI. Here's Thore Graepel again.
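下面用一段极简的 Python 草图演示上述"数值积分"式的奖励学习;其中的动作名称、学习率和奖励值都是为说明而虚构的假设,并非 DeepMind 的实际实现。

A minimal Python sketch of the "numerical points" style of reward learning described above; the action names, learning rate, and reward values are illustrative assumptions, not DeepMind's actual code.

```python
import random

# Two hypothetical "pigeon" actions; turning left is the behaviour we
# want to reinforce, so it earns +1 while the other earns -1.
ACTIONS = ["turn_left", "turn_right"]
REWARD = {"turn_left": 1.0, "turn_right": -1.0}

def train(episodes=500, lr=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}   # estimated reward per action
    for _ in range(episodes):
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=value.get)
        # Nudge the estimate toward the observed numerical reward.
        value[action] += lr * (REWARD[action] - value[action])
    return value

values = train()
# The reinforced behaviour ends up with the higher estimated value.
assert values["turn_left"] > values["turn_right"]
```

Just as with Skinner's pigeons, the agent needs no description of the task: repeated numerical rewards alone steer it toward the reinforced behaviour.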
最初我们训练智能体解决单智能体任务,比如图像分类或迷宫导航。随后转向零和博弈场景——就像AlphaGo和AlphaZero这类围棋程序中,一方所得即另一方所失。而现在自然要研究混合动机场景,即智能体间的激励目标未必完全一致,却能找到绝佳合作方案。
At first, agents were trained to solve single-agent tasks, say, to classify images or to navigate through a maze. We then moved on to what we call zero-sum games, where one agent's gain is another agent's loss, like, for example, AlphaGo and AlphaZero, programs that play the game of Go. And now the natural next step is to consider mixed-motive scenarios, where it's not quite clear how aligned agents are with one another in their incentives, but they still find great cooperative solutions.
当你的得分不仅取决于自身表现,还来自周围人因你行为获益时,合作行为就产生了。心理学家称这种人类特质为'社会价值取向'。研究合作型AI的科学家凯文·麦基这样解释:
Cooperative behavior arises when you get points not just for how well you do, but when the people around you also benefit from your actions. In humans, this is what psychologists call our social value orientation. Here's Kevin McKee, another research scientist working on cooperative AI, to explain.
如果我在咖啡店,我是去拿当天的茶饮,但当我走到柜台时,我突然想到,汉娜其实很喜欢卡布奇诺。
If I'm at the coffee shop, I'm there to get my tea for the day, and then right as I get to the counter, I think, oh, you know, actually, I know Hannah really likes cappuccinos.
确实如此。
I do.
我会多买一杯卡布奇诺带给你。我并不指望你下次会给我买咖啡。但我还是愿意这么做,因为我知道这能给你带来快乐。你可以理解为,社会价值取向与随机善举高度契合。
I'm gonna grab an extra cappuccino and bring it to you. And there's no expectation that you're going to get the cappuccino for me next time. But it's still feasible that I buy the coffee, because I know that it'll bring you some happiness. You can think of social value orientation as mapping pretty well onto random acts of kindness.
但这些小事对吧?比如为别人扶门,或者公交车上让座之类的。
But they're little things, are they? Like holding the door open for somebody or I don't know, giving up your seat on the bus, that kind of thing.
这些小事属于社会价值增值行为。我个人认为这比表面更深层。当你与某人亲近时,你会重新定义自己的利益,把对方的利益也纳入考量。比如和伴侣或密友共进晚餐时尝试新菜品,如果独自用餐,我可能只关注自己是否喜欢这道菜。
Those small things would be social value orientation. Personally, I think it goes a little bit deeper than that. Anytime you get close with someone, you kind of redefine your own self-interest in terms of the other person's interest too. Say I'm going to dinner with a partner or a very close friend, and I decide to try a new dish. If I were by myself, then maybe the only thing I'd pay attention to is how much I liked that dish.
如果我很喜欢这道菜,那很好,下次来还会点。但如果是和伴侣、挚友或家人一起,假设他们非常讨厌我点的鱼料理气味,甚至感到难以忍受。那么我会参考这个反馈,决定下次外出就餐时直接避开这家餐厅。
And then if I really liked it, then okay, great, next time I'll make sure to order it again. But if I'm out with my partner or a very close friend or a family member, let's say they really don't like it; even the smell of the fish dish I just ordered is driving them crazy. Then I'll probably integrate that feedback and decide that the next time we go out, we won't even go to that restaurant.
这其实是在描述强化学习机制。我不仅考虑自身获得的奖励,还整合了他人的反馈,并据此调整未来行为的概率。
And that's me describing reinforcement learning. I'm taking what would normally be only my own reward, adding in the reward of someone else, and using it to modify the likelihood that I take that action in the future.
我对沙丁鱼就是这种感觉。我丈夫特别爱吃,但我实在不想在餐桌上看到它们。不过他会顾及我的感受,所以从不点这道菜。
I have exactly that with sardines. My husband loves eating them, but I'm like, I don't wanna look at that while I'm at a table. But he takes my feelings into account and doesn't order them.
这就是社会价值取向。
That's social value orientation.
明白了。那你们是如何将这种观念灌输给智能体的呢?
Okay. So how do you instill that in agents then?
环境本身就已经为每个智能体提供了利己主义的奖励机制。
So the environment provides selfish reward already to each agent.
当我丈夫享用一盘美味的沙丁鱼时,他会获得多巴胺刺激,这相当于他自身奖励函数的+1分。
When my husband eats a plate of delicious sardines, he gets a little dopamine hit, a plus one for his own reward function.
我们在智能体中构建社会价值取向的方式是:让它们也能感知其他智能体获得的奖励。
And so the way that we kind of build social value orientation into our agents is we also expose the reward that other agents are receiving.
让晚餐伴侣倒胃口就该扣1分。但问题在于:这两件事真的都只值1分吗?我们究竟该在多大程度上考虑他人的意见?
Minus one point for putting your extremely delightful dinner partner off her meal. But here's the question. Should they really both be worth one point each? How much should you factor in the opinions of others?
智能体的自私程度在很大程度上是由我们控制的。
The selfishness of agents is to a large degree under our control.
又是托尔·格雷佩尔。
Thore Graepel again.
举个例子,你可以设计一个智能体,当其他智能体或人类进入房间时,它唯一关心的就是那个实体的福祉。
So you can, for example, design an agent that when some other agent or human comes into the room, all they care about is the well-being of that other entity.
一个纯粹利他、近乎自我牺牲的智能体可能只会关注其他智能体的回报。
A purely altruistic, almost self sacrificial agent might just pay attention to the other agent's reward.
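这种对自私程度的控制,常见的一种形式化方式是用一个角度参数在"自身奖励"与"他人奖励"之间加权混合。下面的函数名与示例数值均为说明性假设,只是示意性草图,并非 DeepMind 的实际实现。

One common way to formalise this degree of selfishness is to blend an agent's own reward with the other agent's reward using an angle parameter. The function name and example numbers below are illustrative assumptions, a sketch rather than DeepMind's exact code.

```python
import math

def svo_reward(own_reward, other_reward, angle_deg):
    """Blend own and other's reward by a 'selfishness dial' angle.

    angle 0  -> purely selfish (only own reward counts)
    angle 45 -> prosocial (equal weight on both)
    angle 90 -> purely altruistic (only the other's reward counts)
    """
    theta = math.radians(angle_deg)
    return math.cos(theta) * own_reward + math.sin(theta) * other_reward

# Husband orders sardines: +1 for him, -1 for his dinner partner.
selfish   = svo_reward(1.0, -1.0, 0)    # 1.0: orders them anyway
prosocial = svo_reward(1.0, -1.0, 45)   # ~0.0: indifferent
altruist  = svo_reward(1.0, -1.0, 90)   # ~-1.0: skips the sardines
```

Training then proceeds exactly as before, except the agent maximises this blended score instead of its raw selfish reward.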
这就像你有一个自私程度调节旋钮。对吧?往一边转它们就只为服务他人而活,往另一边转它们就变得彻头彻尾地自私。我们稍后会再讨论这个旋钮,因为我想谈谈你们当中耳朵尖的人可能已经注意到的问题——基于其他智能体欲望来设定奖励函数存在一个根本问题。
It's almost like you've got a little selfishness dial. Right? Turn it one way and they only live to serve; turn it the other way and they're fundamentally, totally and completely selfish. We're gonna come back to that dial in just a moment, because I want to pick up on something that the eagle-eared among you might have already spotted. There's one problem with basing your reward function on another agent's desires.
你不可能总是准确判断他人的感受。我们不会顶着霓虹灯招牌到处走,宣告自己有多开心或多难过。虽然面部表情能提供些线索,但表情是可以伪装的。对吧?我可以假装觉得那盘意面美味绝伦,实际上却认为那是我吃过最恶心的东西。
You might not always be right about how they're feeling. It's not like we walk around with a neon sign declaring how happy or sad we are. I mean, I suppose we have our facial expressions which are a bit of a clue, but you can lie. Right? I could pretend that I thought the pasta dish was absolutely delicious when really I thought it was the most disgusting thing I've ever tasted.
没错。最终可能导致你为了不伤害我的感情而说'我爱死这盘意面了',结果我们不得不反复光顾那家餐厅——尽管这根本不是我们任何一方想要的。
Yeah. We could end up in a situation where you don't wanna hurt my feelings. And so you say, I love the pasta dish. And we end up going to that restaurant several more times even though it's not what either of us would want.
那么对于智能体来说,它们其实在很多方面都会互相窥探对方的答案。
So with agents then, they're sort of peeking at the answers for each other in a lot of ways.
没错。有些人会认为这是作弊行为。关键挑战在于如何构建一个能推断其他智能体奖励机制的系统。随之而来的挑战则涉及欺骗行为,这在谈判中经常发生。
Yep. Some people would think about that as cheating. And so a key challenge would be how to build a system that can try to infer those rewards for other agents. And then the concomitant challenges that kind of arise there would be around deception. This happens frequently in negotiations.
对吧?所以你要尽量不暴露自己的真实偏好,这样才能引导对方——另一个智能体——去做你想让它做的事。
Right? So you kind of don't give away what your actual preferences are so that you can try to nudge the other person, the other agent to doing what you would like them to do.
举个例子,其实我并不介意吃意大利面,但我会假装觉得它很恶心,因为我真正想去的餐厅离我家更近。我知道你会考虑我对意大利面的感受,这样我就能利用你对奖励函数的理解来欺骗你,操纵你去做你本不想做的事。
So, for example, actually I don't mind the pasta dish, but I'm gonna pretend to you that I think it's disgusting, because the restaurant that I really want to go to is much closer to my house. And knowing that you'll take into account my feelings about the pasta dish, I can manipulate you into doing something that you don't really wanna do, just because I know how your reward function works and I can deceive you about my internal state.
正是如此。
Exactly.
我现在描述的就是上周末和我丈夫的经历。既然有能力编写利他主义算法,他肯定很想把参数调到最大值看看效果。但事实证明,纯粹的利他主义可能并不是让整个智能体群体合作的最有效方式。
I'm just describing my weekend with my husband now. So with the power to code an algorithm for altruism, surely he must be tempted to turn the dial up to the maximum and see what happens. But it turns out that pure altruism might not be the most effective way for whole groups of agents to cooperate.
假设你和我都是完全利他主义的。我们站在门的两侧,都想进入对面的房间。如果我只关心让你先过门,而你只关心让我先过门,那我们就会僵持在原地很久。
You and I, let's say, we're both perfectly altruistic. We are on opposite sides of a door and trying to enter into the room across from us. If I just care about you walking through the door first, and you just care about me walking through the door first, then we'll just sit there for a while.
完美的利他主义让你寸步难行。在将这些算法应用于现实场景时,比如自动驾驶汽车,这可能是个大问题。
Perfect altruism gets you nowhere fast. And that can be quite a problem when it comes to using these algorithms in real world scenarios, like self driving cars.
想象一个十字路口,一方需要让行。如果车辆只是停在那里不知所措,对谁都没有好处。这种情况下两辆车必须通过协商决定谁先谁后。
Just think about an intersection where one needs to yield to the other. It's really not in anyone's interest that the cars would just stand there and not know what to do. So somehow two cars in that situation would have to negotiate and figure out who goes first and who goes second.
设想2050年,你身处繁华大都市,AI驾驶的汽车穿行于街道,与行人、自行车和其他车辆共享道路。如果这些汽车被设计成完全礼让其他道路使用者,那么只要有行人走到车前,车辆就必须停下。一辆无私的车会不惜一切代价避免碰撞。但接下来会发生什么?
Imagine the year is 2050. You're in a busy metropolis, and AI driven cars fill streets, snaking through traffic alongside pedestrians, bikes and other vehicles. If those cars are designed to be wholly considerate of other road users, it simply must be the case that a pedestrian stepping out in front of one will cause it to stop. A selfless car would avoid collision at all costs. But then what happens?
整个道路的平衡将被打破,因为行人会突然觉得自己随时可以横穿马路,而他们的行为将迫使自动驾驶汽车每次都按照行人预期的方式做出反应——紧急刹车。这里需要强调的是
The whole equilibrium of the road would change because suddenly the pedestrians would feel empowered to cross at any time, and the very fact that they are forces the self driving cars to react in exactly the way that is expected of them by those pedestrians and brake every single time. It's important to say here
DeepMind的研究人员实际上并非在研发完全无私的无人驾驶汽车,而是非常谨慎地权衡利己与利他的平衡。因为在现实世界中,大多数情况都需要两者的微妙结合。就像足球队里,每个球员既想帮助球队获胜,又渴望自己破门得分。再比如与同事安排会议——你们都希望会议举行,但各自希望时间对自己方便。研究人员称这类情况为混合动机场景,它们构成了日常生活中的大多数互动。
that the researchers at DeepMind aren't actually working on selfless driverless cars for the future, but they are thinking very carefully about this balance between selfishness and altruism, especially because in the real world, most situations involve a delicate combination of both. Just think of a football team, where each player has an incentive to help their team win the game, but also wants to be the one to score the goal. Or even arranging a meeting with a colleague: you both want the meeting to take place, but would like it to happen at a time that suits you. Researchers call these mixed-motive scenarios, and they're the ones we encounter most in everyday life.
但还存在更棘手的情况,它们会诱使人们表现出更自私的行为。
But there are other trickier situations, which encourage people to behave in more selfish ways.
社会困境会直接激励个体采取自私行为。但如果每个人都如此自私,集体就会遭殃。这种例子在人类活动中随处可见,最典型的就是环境保护。每个人都会想:我不减排又能怎样?但如果所有人都这么想,我们就会面临集体性的问题。
Social dilemmas directly incentivize the individual to behave in a selfish way. But if everyone behaves in that selfish way, then the collective will suffer. And you can see examples of that everywhere in human endeavor, most crucially in the protection of the environment. Individually, each one of us thinks, what difference does it make if I don't save on carbon emissions? But if we all think that way, then collectively we will have a problem.
里面有这么一句话:‘只是一个塑料瓶而已’,70亿人如是说。妙啊。这个特殊的困境被称为‘公地悲剧’,是经济学中一个被深入研究的现象。当个人利益与集体利益发生冲突时,就会产生这种情况。当然,我们都想保护环境,但拥有一辆新车或在九月份开着暖气确实很惬意。
There's that phrase, "It's just one plastic bottle," said seven billion people. Nice. This particular dilemma is known as the tragedy of the commons, and it's a well-studied phenomenon in economics. It arises when the incentives of the individual are in conflict with what's best for the group. Of course, we all want to protect the environment, but it's quite nice owning a new car or having the heating on in September.
当其他人似乎都没有拒绝这些诱惑时,要自己放弃确实很难。结果就是我们全都输了。托尔和他的同事可以在模拟中运行这个场景的简化版本——一种人工智能的培养皿——看看是否有办法鼓励更多合作。在人工智能版的‘公地悲剧’中,智能体以小圆点的形式在网格世界中移动,每次吃到苹果都会获得正向奖励。
And it's really hard to turn down all of those things when nobody else seems to be doing so. So as a result, we all lose. Thore and his colleagues can run a simplified version of the same scenario in a simulation, a kind of AI Petri dish, to see if there are ways to encourage more cooperation. In the AI version of the tragedy of the commons, agents move around as little dots in a grid world and receive a positive reward every time they eat an apple.
这些苹果生长在小片区域里。如果你只吃掉部分苹果,它们会重新生长。所以如果谨慎采摘,你就能永远拥有吃不完的苹果。但一旦你毁掉整片苹果地,那里就再也不会长出任何东西了。
These apples grow in little patches. And if you eat only some of the apples, they will regrow. So if you harvest carefully, you can have apples and apples into eternity. But once you destroy the whole patch of apples, nothing will ever grow there anymore.
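上面描述的苹果地动态可以写成一个几行的玩具模拟:只要还留有苹果,果树就会再生;一旦摘光,产出永远归零。其中地块大小和再生速率都是随意设定的示意值,并非原实验的参数。

The apple-patch dynamic described above can be written as a toy simulation of a few lines: apples regrow only while some remain, and a stripped patch never recovers. The patch size and regrowth rate are arbitrary illustrative values, not the original experiment's parameters.

```python
def harvest(steps, take_per_step, patch_size=5, regrow_per_step=1):
    """Total apples eaten over `steps` rounds under a fixed harvesting policy."""
    apples = patch_size
    eaten = 0
    for _ in range(steps):
        picked = min(take_per_step, apples)
        apples -= picked
        eaten += picked
        if apples > 0:  # regrowth requires surviving apples
            apples = min(patch_size, apples + regrow_per_step)
        # Once the patch hits zero, nothing ever grows back.
    return eaten

# Careful harvesting (always leaving apples behind) beats stripping the patch.
sustainable = harvest(steps=100, take_per_step=1)
greedy      = harvest(steps=100, take_per_step=5)
assert sustainable > greedy
```

A single reward-maximising agent can discover the sustainable policy alone; the dilemma only appears when several agents share the patch.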
如果在这个世界里放入一个追求奖励最大化的智能体,它很快就会明白:要确保未来的苹果供应,就必须在每个区域留下一两个苹果。但当两个或更多智能体生活在这个魔法果园时会发生什么?
If you were to put a single reward maximizing agent into this world, they would soon realize that if they want to ensure their future supply of apples, they will always have to leave one or two in each patch. But what happens when two or more agents live in this magical orchard?
现在情况变得复杂得多,因为它们都需要学会留下几个苹果是有益的。最理想的情况是它们能意识到摘走区域里最后一个苹果是被禁止的。现在的问题是:我们能帮助它们发现这些规范吗?一种方法是在环境中建造围墙,让它们各自生活在自己的小领地里——这样它们就能像最初单个智能体那样,重新在领地内可持续地行动。当然,我们在社会中对这种现象有个称呼。
It's much harder now, because they all need to learn that it's good to leave a few apples. The best thing would be if they could realize that it's forbidden to take the last apple from the patch. And now the question is, can we help them discover these norms? One way to do this is to build walls within the environment so that they each live in their own little territory; then they can act sustainably within that territory again, because that's like the first case, where there's just one agent. And, of course, we have a name for that in society.
这就是私有财产。一旦土地私有化,所有者就有动力以可持续的方式经营它。
It's private property. As soon as it is a private piece of land, then the owner has an incentive to work with it in a sustainable way.
但利他主义调节钮并不是唯一的杠杆。还有其他方法可以鼓励合作,比如自上而下地制定规范或规则。但如果本着强化学习的精神,你希望智能体自行摸索出合作方式呢?托尔和同事们在训练七个智能体玩策略桌游《外交》的某个版本时,就测试过这个想法。我对玩《外交》的记忆充满创伤,因为这游戏几乎总是以争吵收场。
But the altruism dial isn't the only lever. There are other ways to encourage cooperation, having norms or rules imposed from above being one. But what if, in the true spirit of reinforcement learning, you want agents to work out how to cooperate by themselves? This is an idea that Thore and his colleagues tested when they trained seven agents to play a version of the strategic board game Diplomacy. I'm very scarred by my memories of playing Diplomacy, because it almost always ends in an argument.
我并不感到意外。我十几岁时和朋友玩过这个游戏,至少在开始玩的时候我们还算是朋友。
I'm not surprised. I played it as a teenager with friends, or at least they were friends when we started playing.
外交游戏在一块绘有欧洲大地图的棋盘上进行,背景设定在第一次世界大战前夕。每位玩家扮演一个大国角色,如法国、奥匈帝国、英国或俄罗斯。目标是通过在棋盘上移动、结盟、占领领土,最终击败对手。
The game of Diplomacy is played on a board painted with a big map of Europe, set in the years leading up to the Great War. Each player takes on the role of one of the great powers: France, Austria-Hungary, England, Russia. The aim is to move across the board, form alliances, capture land, and ultimately beat your opponents.
这是测试人工智能的绝佳平台,因为它本质上是合作能力的竞赛。玩家需要在可靠盟友与最终必须独自获胜之间保持平衡——他们必须懂得何时该退出这些联盟。
It is a good test bed for AI because it's effectively a competition in your skill to cooperate. The players need to walk that line: they must be reliable alliance members, but because they can only win alone in the end, they also need to understand at what point to leave those alliances.
外交游戏对人工智能而言是出了名的难攻克。不仅每回合可能有七位玩家做出近乎无限种行动组合,游戏本身更是合作与竞争动态的复杂融合体。这些外交游戏智能体通过强化学习算法训练,每个玩家会评估游戏中每个局面的价值——本质上是他们获胜的概率,其目标是采取能提升这个价值并推进目标的行动。
Diplomacy is a notoriously challenging game for AI to play. Not only are there up to seven different players who could perform an almost infinite number of moves every turn, but the game is a complex fusion of cooperative and competitive dynamics. The Diplomacy-playing agents were trained using a reinforcement learning algorithm. Each player assigns a value to each situation in the game, which is essentially the probability of them winning. Their goal is to make moves that will increase this value and further their objectives.
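这里描述的"给每个后续局面估值、选择能提高获胜概率的行动",可以用下面几行示意代码表达;行动名称与概率数值纯属虚构举例,并非实际智能体的输出。

The value-driven selection described here, estimating a win probability for each move's resulting position and picking the move that maximises it, can be sketched in a few lines; the move names and probabilities are invented for illustration, not an actual agent's output.

```python
def best_move(moves, value_fn):
    """Pick the move whose resulting position has the highest estimated value."""
    return max(moves, key=value_fn)

# Hypothetical win-probability estimates for each candidate move's outcome.
estimated_win_prob = {
    "hold Berlin": 0.31,
    "move Berlin -> Munich": 0.40,
    "support ally into Munich": 0.36,  # supporting an ally can outvalue holding
}

choice = best_move(estimated_win_prob, estimated_win_prob.get)
```

Cooperation can emerge from this purely selfish rule whenever a supportive move leads to a position the agent values more highly than going it alone.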
值得注意的是,托里和他的同事们发现,这七位玩家开始在没有被明确教导的情况下自发相互合作。
Remarkably, Thore and his colleagues noticed that the seven players were starting to cooperate with each other without being explicitly taught to do so.
我们最初实验的是智能体间无法交流的游戏版本。但即便在这种设定下,我们仍观察到它们会相互支持对方的行动。
We experimented first with a version of the game where there is no communication between the agents. But even in that setting, we see that they support each other's moves.
在外交游戏中支持某个行动,其含义与二十世纪初欧洲的军事支援类似——即派遣部队支援另一名玩家的进攻行动。比如说
To support a move in diplomacy means pretty much what it might have meant in early twentieth century Europe, to lend some troops and back up another player's invasion. Say, for instance
有人在柏林有个单位想迁往慕尼黑,他们需要奥地利人的支持。攻击者(如果你愿意这么称呼)需要完成从柏林到慕尼黑的调动,并把这个计划写在他们的行动清单上。而奥地利方也需要在清单上注明:'我们在维也纳的单位支持从柏林到慕尼黑的调动。'你看这需要多少协调工作。这些事情绝非偶然发生的。
Someone has a unit in Berlin and wants to move into Munich, and they need the support of the Austrians. The attacker, if you like, needs to make the move from Berlin to Munich, and they need to write that on their little sheet of what they want to do. And the Austrian party needs to write on their sheet, my unit in Vienna supports the move from Berlin to Munich. You see how much coordination that requires. These things don't happen by chance.
它们能意识到合作会带来长期成功的最佳机会,这个想法很疯狂。因此即使当时对特定智能体没有直接利益,它们仍会参与其中。是的,正如托尔提到的,目前玩外交游戏的智能体都在处理一个简化版本——无通讯("no-press")模式,它们无法通过沟通来谈判或达成明确协议。这主要是技术原因,因为事实证明这确实很难实现。
It's a crazy idea that they can recognize that cooperating will give them the best chance at long-term success. And so even if, in the moment, it doesn't directly benefit that particular agent, they'll still engage in it. Yes. As Thore mentioned, so far the agents playing Diplomacy have been tackling a simpler version of the game known as no-press, where they are unable to communicate with each other to negotiate and make explicit agreements. This is mostly for technical reasons, because it turns out to be really hard.
但研究人员希望未来能加入某种形式的通讯功能。
But researchers would like to add in some form of communication in the future.
初步阶段的通讯可能不会采用完整自然语言,而是更简单的形式——比如直接发问'要结盟吗?',对方回答'是'。当然最终目标是让这些智能体能与人类对战。
And that communication probably in the first step wouldn't be full natural language. It would probably be simpler things, like maybe just a statement, do you want to form an alliance? And the other agent could say, yes. But the ultimate goal, of course, would be for these agents to play the game with humans.
当你开始引入稍微复杂的通讯形式时,你预计这些智能体会变得狡猾吗?
Once you start introducing slightly more sophisticated forms of communication, do you expect these agents to become devious?
我们预计它们会采取对自身长期最有利的行动,这可能包括欺骗。它们可能嘴上说一套,实际做另一套来获取优势。但或许它们也会明白,长期说谎会丧失信誉——如果谎言太多,其他智能体就不再理会其言论,甚至惩罚说谎行为。真正优秀的智能体最终可能会形成至少在多数时候保持诚实的策略。
We would expect them to do what's best for them long term, and that might include deception. They might say one thing, but then they would behave in a different way and try to get an advantage through that. But maybe they will also learn that in the long term, lying will cost them their credibility, and if they lie too much, other agents will not pay attention to what they say anymore or even punish them for lying. Maybe the really good agents will actually arrive at a strategy that would, at least most of the time, tell the truth.
我想人类社会的理念也是如此:说谎固然可能,但存在说真话的压力,因为长期来看诚实才是更好的策略。或许偶尔撒个小谎。
That is, I guess, the idea with humans: of course, lying is possible, but there is pressure to tell the truth, because in the long term it's a better strategy. Maybe just a little lie every now and then.
是的。
Yeah.
研究人员使用游戏的原因之一是为了理解智能体在安全环境中的行为。但欺骗的可能性确实引发了关于人工智能长期在现实世界中如何部署的问题。正如我们在上一集中听到的伦理研究员劳拉·魏丁格所言,当人工智能进入现实世界时,不应允许其欺骗他人。
One of the reasons that researchers use games is to understand how agents behave in a safe environment. But the possibility of deception does raise questions for how AI is deployed in the real world longer term. According to ethics researcher Laura Weidinger, who we heard from in the last episode, when AI reaches the real world, it should not be allowed to deceive others.
例如,如果你向这个AI询问另一个人的银行信息,它可以直接回答‘我不会提供这些信息’。更广泛地说,我认为能够欺骗的AI确实存在真实风险。这对人类自主权构成威胁——如果AI系统欺骗我,我可能会被操纵去做本不会做的事。当然在基础研究中,特别是游戏领域,我们可能需要开发类似欺骗的能力。
For example, if you were to ask this AI about the bank details of another person, it could just say, I will not give you this information. More generally, I see real risks associated with AIs that can deceive. It poses a risk to human autonomy: if the AI system deceives me, I could be manipulated into doing things I wouldn't otherwise do. Of course, in fundamental research, in particular games, we may want to develop something like deception.
这可能带来重要洞见。但对于公开发布的AI而言,我尚未发现任何需要AI具备欺骗能力的应用场景。
This could give us some important insights. But in terms of AI that is publicly released, I haven't yet seen an application where it would be desirable for an AI to deceive.
在本期节目中,我们主要探讨了AI智能体之间的互动。但正如开头提到的,这并非唯一值得关注的合作关系。未来很可能需要大量人工智能与人类之间的协作。
So far in this episode, we've mainly explored how AI agents interact with each other. But as we heard at the start, those aren't the only partnerships worth considering. The future is also likely to require quite a lot of cooperation between AI and humans.
在现实世界中,很少会出现AI智能体明显更擅长某项任务的情况,即便有也是高度专业化的领域。但人机协作团队往往能做得更好。以放射科医生为例,我们现在可以训练出非常擅长医学影像分类的AI。但你认为AI会取代放射科医生吗?
In the real world, it's rarely the case that there's a task where an AI agent is clearly better, or if so, it's a very specialized task. But often a team of humans and artificial intelligence can do it better together. Just think about a radiologist. We can now train AIs that are very good at classifying these medical images. But do you think that AIs will replace radiologists?
不会。它们会让医生做得更好。因为这份工作还有其他部分——与患者沟通、理解整体治疗方案。
No. They make them better. Right? Because there are other parts of their job, to talk to the patient, to understand the bigger picture of treatment.
斯坦福大学的放射科医生柯蒂斯·朗格洛茨曾说过一句名言:AI不会取代放射科医生,但使用AI的放射科医生会取代那些不使用AI的同行。但人类与AI合作的实际体验究竟如何?其实我们许多人日常生活中已经在这样做了,比如与智能音箱对话或用人脸识别系统整理照片。但如果让AI和人类一起下厨会怎样呢?
There's that famous quotation by Curtis Langlotz, a radiologist at Stanford: AI won't replace radiologists, he says, but radiologists who use AI will replace radiologists who don't. But what's it actually like for humans to cooperate with AI? Well, many of us already do this in our daily lives, when we talk to our smart speaker or use facial recognition systems to organize our photos. But what would happen if an AI and a human tried to cook a meal together?
你马上就能知道答案了。当我穿上厨师服,在一款名为《胡闹厨房》的合作烹饪游戏中与AI组队时,以下是凯文·麦基的解说。
Well, you are about to find out. Here's what happened when I donned my chef's whites and joined an AI in a collaborative cooking game called Overcooked. Here's Kevin McKee to explain.
两名玩家需要搭档合作,在共享的厨房空间里完成备餐上菜。首先要拿取食材,通过切配、入锅烹饪等步骤,最后装盘上菜。听起来很简单?但如果你曾与家人或伴侣一起下厨就会明白,尤其在时间压力下保持心平气和本身就是种挑战。或许我们未必会真的和AI系统一起做饭,但我们确实期待当AI投入现实世界后,能与之进行紧密协作。
So two players partner together, they have to prepare dishes to serve in a kitchen, and you are fully sharing the kitchen space. You first have to grab ingredients, you have to prepare them by, let's say, chopping them up, putting them in a pot, allowing them to cook, and then serving them on a dish. You might say this is relatively simple, but actually, if you've ever cooked with a family member or partner, you know that, especially if you're under time pressure, it can be a challenge to keep tempers cool. And so maybe we won't necessarily be cooking with our AI systems, but we certainly hope that we'll be collaborating with them in close proximity once we deploy them to the real world.
我向来不惧挑战,于是立即启动游戏准备开始。好吧,这就是我的角色——我的小厨师。说真的,这造型还挺有范儿。
Now I'm never one to shy away from a challenge, so I fired up the engine and let the game begin. Alright. Actually oh, here I am. That's my little chef. Not being funny, but my chef's got a lot of swag.
这顶软塌塌的帽子挺酷的。现在提示我的厨师即将进入厨房,这是个相当简陋的厨房,画面水准大概像是1998年的电脑游戏。
This pretty cool floppy hat. Now, it's telling me my chef is gonna be in a kitchen. It's quite a simple kitchen. We're talking maybe circa 1998 computer graphics.
你需要通过努力才能解锁更高级的厨房。
You have to kinda earn your way to the more advanced kitchen.
游戏里这个像素风格的矩形厨房,右侧堆着番茄,中间放着煮锅,左侧是出餐台。通过键盘操作,目标是拿起番茄放入锅中烹饪,待番茄汤完成后送至出餐台。每完成一道菜就能获得10美分奖励。我先自己玩了一轮练习模式熟悉操作。
The game has a pixelated rectangular kitchen with a stack of tomatoes on the right, a cooking pot in the middle, and a serving station on the left. Using keyboard keys to navigate, the objective is to pick up the tomatoes, put them in a pot to start cooking, and once they're ready, take the freshly made tomato soup to the serving station. Delicious. My reward, a bonus of 10¢ for every dish delivered. First, I played a practice round by myself to get the hang of it.
好的。我们去摘个番茄。哦,走过头了。好了。对。
Right. Let's pick up a tomato. Oh, gone too far. There we go. Right.
所以我得去拿我的汤。真不错。然后放在供应台上。太好了。很简单。
So I've gotta go get my soup. Lovely. And pop it on the serving station. Great. That was easy.
看来你一直在练习。——没有啦。表现还不错。
It seems like you've been practicing. This is... No. Pretty good.
然后凯文让我和两个不同的人工智能队友组队。其中一个长得和我一模一样。他们留着长长的红发,穿着橙色西装。所以我期待与他们对抗,或者说合作,因为我们是一个团队。虽然我是在比赛后才得知,第一个AI是通过与自己的复制品多次对战训练出来的。
Kevin then paired me up with two different AI co players. One of them looks exactly like me. They have long red hair, and they're wearing an orange suit. So I'm looking forward to going up against them or with them, I should say, because we're a team. Although I didn't know this until after playing, the first had been trained by playing the game with a replica of itself lots of times.
这种策略在围棋或国际象棋等竞技类游戏中效果很好,但并不总是有利于合作。好了。我们开始吧。来玩吧。哦,天哪。
This is a strategy which works well in competitive games like Go or chess, but it's not always conducive to cooperation. Okay. Here we go. Let's play. Oh, crikey.
他们动作真快。等等。冷静点。抱歉。稍等。
They're fast. Hang on. Chill out. I'm sorry. Hang on.
等一下。等一下。我想参与进来,但他们总挡着我。借过。谢谢。
Hang on. Hang on. I'm trying to get involved, but they keep blocking me. Excuse me. Thank you.
他们确实比我快多了。我是说我们表现还行,但我感觉自己贡献不大。哦等等,我拿到菜了。稍等。
They're definitely a lot faster than I am. I mean, we're doing well, but I wouldn't say I felt like I was contributing my fair share. Oh, hang on. I've got the dish. Hold on.
经过90秒的游戏,我们总共给饥饿的顾客送出了四道菜。成绩不差,但我的贡献有限。我的第二位队友接受过与不同风格玩家配合的训练,理论上应该更擅长合作。哦,你慢多了对吧?好了。
After ninety seconds of gameplay, we managed to deliver a total of four dishes to our hungry customers. Not a bad score, but I can't say I played much of a role in that. My second teammate was trained on a range of partners with different playing styles and was, in theory, the more cooperative one. Oh, you're much slower, aren't you? There we go.
继续。该你行动了。这个队友比上一个慢很多,但挡路次数少多了,这点我很满意。哦等等,我又拿太多番茄了。
Go on. You take your turn. This partner in comparison to the other one is a lot slower, but they're getting in my way a lot less, which I'm enjoying. Oh, hang on. I've got too many tomatoes again.
其实我觉得这次团队配合稍微好点了,可能因为我们俩水平半斤八两。
So actually, I sort of think we're working slightly better as a team this time, but maybe it's because we're just both equally rubbish.
上次你们完成了四道菜,这次只有三道。
So last time you got four dishes and this time you got three.
嗯,我是说,问题是这样的:你问我更喜欢哪个搭档,第一个还是第二个?实际上,虽然我们和第一个搭档赢了比赛,但我更享受与节奏更匹配的第二个搭档合作的体验。凯文,你觉得这是个不寻常的选择吗?
Well, I mean, here's the thing. You're asking me which partner I should prefer, the first or the second. The reality is, even though we won with the first partner, I enjoyed the experience more with the one who was better matched to my speed. Is that an unusual choice, do you think, Kevin?
不,我认为这是常规选择。你觉得搭档在游戏过程中更能响应你的行为方式。而正如你所说,第一个搭档就像在疯狂地冲向所有番茄。
No. I think that's the usual choice. You felt that your partner was more responsive to your behavior in the way that you were playing. The first one was, as you're saying, kind of making a mad dash for all of the tomatoes.
老实说,我玩着玩着就完全投入这个游戏了。当我完成番茄汤制作后,凯文向我解释了为什么我可能更喜欢与第二个搭档合作。传统训练AI玩《星际争霸》和围棋的方法是让它们通过无限自我对弈来提升水平。
It's fair to say I got pretty into this game after a while. When I was finished making tomato soup, Kevin told me more about why I might have preferred playing with the second partner. Whereas the traditional approach to training AI to play games like StarCraft and Go has relied on getting them to play endless games against themselves in order to get as good as possible.
在合作情境中,搭档如何调整行为与你保持一致可能非常重要,因此我们应该更关注这一点。
For cooperative contexts, the way that your partner manages to align their behavior with yours probably matters a lot, and so we should be paying more attention to it.
因为胜利不仅仅取决于团队中有最优秀的玩家,更在于团队协作。
Because winning is not just about having the best player on your team, it's about working together.
正是如此。
Exactly.
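The contrast Kevin draws, pure self-play versus training with a diverse population of partners, can be sketched in a few lines. This is a toy illustration under assumed dynamics (agents have a one-dimensional "pace", and joint reward is highest when paces match), not DeepMind's actual training setup.

```python
import random

def joint_reward(pace_a, pace_b):
    """Cooperation pays off most when the two players' paces are aligned."""
    return 1.0 - abs(pace_a - pace_b)

def train(sample_partner, steps=2000, lr=0.05, seed=0):
    """Nudge the learner's pace toward each sampled partner's pace."""
    rng = random.Random(seed)
    pace = rng.random()  # arbitrary starting pace
    for _ in range(steps):
        partner = sample_partner(pace, rng)
        pace += lr * (partner - pace)
    return pace

# Self-play: the partner is always a replica of the learner, so there is
# no pressure to adapt to anyone else. The pair settles into whatever
# arbitrary convention it started with.
self_play = train(lambda pace, rng: pace)

# Population training: partners with diverse paces (slow, medium, fast)
# push the learner toward a middle ground that suits many play styles.
population = train(lambda pace, rng: rng.choice([0.1, 0.5, 0.9]))

# A held-out "human" partner the agents never trained with.
human_pace = 0.4
print(joint_reward(self_play, human_pace), joint_reward(population, human_pace))
```

In this sketch the population-trained agent earns a higher joint reward with the unseen human-paced partner, mirroring why Hannah preferred the second teammate. Real self-play does produce very strong play in competitive games like Go; the point here is only that it gives no incentive to accommodate an unfamiliar partner.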
你可能已经注意到,目前很多关于合作的研究都是在模拟环境或游戏场景中进行的。
One thing you might have noticed by now is that a lot of this cooperation work is currently being done in simulation or in gaming scenarios.
在模拟环境中绝对安全,不会造成任何损坏。又是托尔·格拉普尔的声音。我们能完整读取模拟中发生的一切,但最终我们希望智能体能在现实世界中运作。而现实世界总会与模拟环境不同——除非你相信我们本就生活在模拟中。
In simulation, it's absolutely safe. Nothing gets broken. Tore Graple again. We have a complete readout of what's happening, but then we want our agents ultimately to work in the real world. And the real world will always differ from simulation unless you believe we're living in a simulation.
那么我们该如何克服这种模拟与现实的差距?有几种通用策略:其一当然是尽可能让模拟精确反映现实;另一种策略是尽可能增加模拟的多样性,这样训练出的智能体就不会因环境细节而改变行为模式。
So how do we overcome this simulation to reality gap? There are different general strategies. One, of course, is to make the simulation as accurate in reflecting reality as we can. Another strategy is to create as much diversity in simulation as we can, so that the resulting agents are not changing their behavior depending on details of the environment.
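The second strategy Tore mentions, adding as much diversity to the simulation as possible, is commonly known as domain randomization. A minimal sketch, with entirely hypothetical kitchen parameters:

```python
import random

def make_randomized_kitchen(rng):
    """Sample a fresh set of environment parameters for one episode.

    These parameter names are invented for illustration.
    """
    return {
        "cook_time": rng.uniform(3.0, 8.0),       # seconds for soup to cook
        "counter_friction": rng.uniform(0.5, 1.5),
        "partner_speed": rng.uniform(0.3, 1.2),   # tiles per second
    }

def train(episodes=5, seed=0):
    rng = random.Random(seed)
    seen = []
    for _ in range(episodes):
        env = make_randomized_kitchen(rng)
        seen.append(env)
        # ... run one training episode in `env` (omitted) ...
    return seen

envs = train()
```

Because every episode presents a slightly different kitchen, a policy that scores well across all of them cannot rely on simulator-specific details, which is exactly the robustness Tore is after when the agent moves to the real world.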
你认为智能体能在模拟中学会真正意义上的合作吗?
Do you think that an agent can learn to be really cooperative in simulation?
所谓'真正合作'的问题可能有些转移焦点。只要智能体的行为对合作双方都有利,我就会称之为合作。我不会深究它们是否真心实意。这让我想起让儿子修剪草坪时的情形,他总是开玩笑地说'好吧'。
The question of this true cooperation is maybe a bit of a red herring. If the agents behave in a way that is beneficial for their cooperation partner and for themselves, then I would call it cooperation. I wouldn't then drill down on the question if they meant it. It reminds me a little bit of my son when I ask him to mow the lawn. We always joke, he says, okay.
他会说'我...我会做的',然后我就说'而且你得乐在其中'。因为,确实,这要求太高了。
I'll... I'll do it. And then I say, and you have to enjoy doing it. Because, yeah, that's asking too much,
不是吗?但即使像托尔和凯文这样的研究者能成功让智能体在模拟中合作,他们仍需跨越托尔提到的'模拟与现实鸿沟'。毕竟现实世界混乱复杂,充满不可预测的因素和意外状况。有研究者认为,只有让AI具身化于现实世界,才能真正实现通用人工智能。下期节目我们将探访我最喜欢的地方之一——DeepMind机器人实验室。
isn't it? But even if researchers like Tore and Kevin are highly successful in getting agents to cooperate in simulation, they will still need to bridge the simulation-to-reality, or sim-to-real, gap that Tore alluded to. After all, the real world is messy, full of unpredictable actors and unforeseen circumstances. And there are researchers who believe that it's only by having embodied AI in the real world that true artificial general intelligence can be achieved. In the next episode, we're gonna be paying a visit to one of my favorite places, the DeepMind robotics lab.
看起来像是个醉酒的机器人。它试图倒着走,但现在已经完全放弃了。那种沮丧的样子真可怜。以上就是本期由汉娜·弗莱主持、Whistledown Productions丹·哈杜恩制作的DeepMind播客,我们下期再见。
Looks like it's a drunk robot. So it's trying to walk backwards, but it's sort of just given up. It's really forlorn. That's next time on the DeepMind Podcast, presented by me, Hannah Fry, and produced by Dan Hardoon at Whistledown Productions.
特别感谢Good Kill Music为《Overcooked Game》制作了这首朗朗上口的主题曲,我们在本期节目中使用了它。如果你喜欢这个系列,请尽可能给我们留下评分或评论。再见。
Special thanks to Good Kill Music, who produced the catchy tune for The Overcooked Game, which we used in this episode. If you've been enjoying this series, please do leave us a rating or review if you can. Goodbye.