Google DeepMind: The Podcast

生活就像一场游戏

Life is like a game

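Episode description

Video games have become a favourite tool for AI researchers to test the abilities of their systems. In this episode, Hannah sits down to play StarCraft II, a challenging game that requires players to control the on-screen action with as many as 800 clicks a minute. She is guided by Oriol Vinyals, an ex-professional StarCraft player and research scientist at DeepMind, who explains how the program AlphaStar learnt to play the game and beat a top professional player. Elsewhere, she explores systems that are learning to cooperate in a digital version of the playground favourite "Capture the Flag".

If you have a question or feedback on the series, message us on Twitter (@DeepMind, using the hashtag #DMpodcast) or email us at podcast@deepmind.com.

Further reading

The Economist: Why AI researchers like video games
DeepMind blogs: Capture the Flag and AlphaStar
Professional StarCraft II player MaNa gives his impressions of AlphaStar and DeepMind
OpenAI's work on Dota 2
The New York Times: DeepMind can now beat us at multiplayer games, too
Royal Society: Machine learning resources
DeepMind: The inside story of AlphaStar
Andrej Karpathy: Deep Reinforcement Learning: Pong from Pixels

Interviewees: research scientists Max Jaderberg and Raia Hadsell; lead researchers David Silver and Oriol Vinyals; and Director of Research Koray Kavukcuoglu.

Credits

Presenter: Hannah Fry
Editor: David Prest
Senior Producer: Louisa Field
Producers: Amy Racs, Dan Hardoon
Binaural Sound: Lucinda Mason-Brown
Music composition: Eleni Shaw (with help from Sander Dieleman and WaveNet)
Commissioned by DeepMind

If you enjoyed this episode, please leave a review on Spotify or Apple Podcasts. We always want to hear from our audience, whether that's in the form of feedback, new ideas or a guest recommendation!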

Bilingual transcript


Speaker 0

人工智能正逐渐渗透到现代生活的方方面面。它存在于我们的智能手机中、中央供暖系统里、餐具柜上,以及汽车内部。但通用人工智能又如何呢?这才是真正的追求目标——构建一个能够从零开始学习解决任何问题而无需被教导的智能体或算法。

Artificial intelligence is slowly appearing in every aspect of our modern lives. It's in our smartphones, our central heating, on our sideboards, and in our cars. But what about artificial general intelligence? That is the real quest. The aim to build an agent, an algorithm that can learn to solve any problem from scratch without being taught how.

Speaker 0

欢迎收听DeepMind播客,我是汉娜·弗莱。作为一名与算法打交道近十年的数学家,在本系列播客中,我们将追踪人工智能领域快速发展的故事。过去十二个月里,我们持续关注着伦敦DeepMind的科学家、研究员和工程师们的最新工作。

Welcome to DeepMind, the podcast. I'm Hannah Fry. I'm a mathematician who has worked with algorithms for almost a decade. In this series of podcasts, we're following the fast moving story of artificial intelligence. For the past twelve months, we've been tracking the latest work of scientists, researchers, and engineers at DeepMind in London.

Speaker 0

我们将探讨他们如何推进人工智能科学,以及整个领域当前面临的一些棘手决策。无论您是想了解技术发展趋势,还是希望在自己的AI探索中获得启发,这里都是理想之选。上期节目中我们谈到,让人工智能与国际顶尖选手对弈象棋和围棋,其意义远不止于展示计算机的能力——人类选手可以通过观察AI的策略来提升自身水平,而这背后还有更宏大的图景。

We're looking at how they're approaching the science of AI and some of the tricky decisions the whole field is wrestling with at the moment. So whether you want to know more about where technology is headed or want to be inspired on your own AI journey, then you've come to the right place. Now in the last episode, we were talking about how pitting artificial intelligence against world class players in the game of chess and the game of Go is about much more than just showing off what a computer can do. Human players can learn from how the AI plays and improve their own play as a result. And there's also a bigger picture.

Speaker 0

游戏世界为我们提供了测试人工智能各项能力的完美微观宇宙。但智能远不止于纯粹的逻辑运算,它还需要协作能力等其它技能。下面有请研究科学家马克斯·雅德伯格。马克斯和他的同事们正致力于研究如何训练智能体进行团队协作。

The world of games provides the perfect mini universe to try out everything we want our artificial intelligence to do. But intelligence is much more than just championing raw logic. Intelligence requires other skills, like the ability to collaborate. I want to introduce research scientist Max Jaderberg. Max and his colleagues are trying to work out how to train agents to work together as a team.

Speaker 2

试想几十年后的未来,世界上有无数AI系统各司其职,但它们可能从未彼此接触过。这些系统数量庞大,每个都有独立目标,却需要以合理方式即兴地协作竞争——即便它们素未谋面。

So imagine, say, a few decades in the future, we have all these AI systems out in the world doing different things, but they maybe have never seen each other before. There's thousands of these things, hundreds of thousands. Each have their own objectives, but somehow they have to cooperate and compete in a sensible way, and in a very ad hoc way, in a way that they've never seen each other before.

Speaker 0

人类在这方面确实很擅长,只要我们愿意。即使我们从未见过某个人,我们依然能理解他们的意图并与之互动。我们未来的智能体也需要具备彼此间这样的交互能力。

Humans are really good at this, when we want to be anyway. Even when we haven't encountered another person before, we still know how to understand their intentions and how to interact with them. Our agents of the future need to be able to do the same thing with each other.

Speaker 2

我们已经有了像Google Home这样的智能设备。未来这类设备会越来越多,你可以想象它们需要相互协作。可能两个设备从未见过面,但它们仍需要以某种方式互动,为你完成任务。

We already have things like Google Home and these sort of smart devices out there. We'll probably have more and more of those, and you can imagine them having to interact and work with each other. And one device may not have ever seen another device before, but they still somehow have to interact and get things done for you.

Speaker 0

我们是在讨论像你的Google Home和洗碗机这样的设备吗?这类东西?

Are we talking about, like, your Google Home and your dishwasher here? This kind of stuff?

Speaker 2

对,有可能。比如你的洗碗机可能想执行自己的清洁程序,而Google Home希望它把碗碟都洗干净。那么对你这个人来说什么才是最优解?我也不知道。

Yeah, potentially. You know, your dishwasher might want to go on its cleaning cycle, while Google Home wants it to, you know, clean all the dishes. And so what's best for you as a person? I don't know.

Speaker 0

那由谁来决定呢?

And who gets to decide?

Speaker 2

该由谁来决定?

Who gets to decide?

Speaker 0

谁才是主宰,你的洗碗机还是Google助手?

Who rules supreme, your dishwasher or your Google Assistant?

Speaker 2

是啊,我不知道。

Yeah. I don't know.

Speaker 0

这里有个重要区别。如果你有一个可以编程在傍晚6点自动亮起的智能灯泡,那是算法。如果你有一个能学习你的偏好、能理解你何时喜欢调暗灯光、阅读时喜欢什么氛围照明的灯泡,那就是人工智能。但当我们不再构建执行固定预设任务的东西时,我们要求技术能够解读情境并对其周围环境做出反应。长远来看,这将需要协作。

There's an important distinction here. If you've got a smart light bulb that you can program to come on at six o'clock in the evening, that is an algorithm. If you've got one that can learn your preferences, that can understand when you like the lights to be dimmed, what kind of mood lighting you like when you're reading, that is AI. But as we switch away from building things that do rigid, predecided tasks, we're asking our technology to read the situation and react to what's going on around it. And in the long term, that's going to require collaboration.

Speaker 0

本着在模拟世界中尝试的精神,DeepMind团队一直试图从另一种游戏中寻找灵感——直接取材于校园操场的游戏:夺旗战。规则很简单:率先偷走对方旗帜并带回己方基地的队伍获胜。如果被对方触碰到,你就出局了。

So in the spirit of trying things out in a toy universe, the team at DeepMind have been trying to find inspiration in another kind of game, one taken straight from the school playground. This is capture the flag. You know the deal. The first team to steal the flag of their opponent and bring it back to their own base wins. If you get tagged by the opposition, then you're out of the game.

Speaker 0

拜托,别哭。麦克斯把成群的AI智能体放进了这个游戏的数字版本里。

Oh, come on. Don't cry. Max dropped whole populations of AI agents into a digital version of the game.

Speaker 2

这是屏幕版本。你只能看到第一人称视角,所以必须通过自己的第一人称视角在这个3D世界中环顾移动,但同时要与这些同样拥有独立第一人称视角的其他事物互动。这里不存在一个能纵观全局的中央实体。

This is an on-screen version. You just see your first-person point of view, so you have to look around and move through this 3D world from your own first-person perspective, but interact with these other things which see their own first-person perspective. So here, there's no centralized entity or being that can see everything.

Speaker 0

指挥官。

A commander.

Speaker 2

没有军队指挥官。每个玩家独立行动,他们只能看到自己的观察范围。我们训练这些智能体的方式是:同时训练整个队友群体,比如30个智能体并行运作,让他们既相互配合又彼此对抗。

No army commander. Every player acts independently. They only see their own observation. And the way we train these things, we actually train whole populations of teammates, you know, let's say 30 agents in parallel, and they're all playing with and against each other.

Speaker 0

马克斯和他的团队没有仅仅创造一个智能体,而是构建了整整一个班级的30个智能体。在每一轮游戏中,他会随机从班级中挑选几个智能体组队。通过成千上万次这样的训练,每个智能体都能从自身经验中学习。但由于它们也在与同学互动,因此必须学会如何与不同的个体相处。

Rather than just creating a single agent, Max and his team build an entire classroom of them, 30 in total. And for each round of the game, he randomly selects a few of the agents from the class to play together on a team. By doing this thousands and thousands of times, each agent will learn from their own experience. But because they're playing with each other too, with their classmates, as it were, they have to learn to interact with someone who's different from themselves.
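As a rough illustration of the matchmaking described here, the sketch below draws two random, non-overlapping teams from a population of 30. The team size of two and the names `sample_match` and `agent_0` ... `agent_29` are assumptions made for the example; the episode only says that "a few" agents are picked for each round.

```python
import random

POPULATION_SIZE = 30  # the figure quoted in the episode
TEAM_SIZE = 2         # assumption: the episode only says "a few" agents per team


def sample_match(population, team_size=TEAM_SIZE, rng=random):
    """Pick two disjoint teams uniformly at random from the population."""
    players = rng.sample(population, 2 * team_size)
    return players[:team_size], players[team_size:]


# Hypothetical agent identifiers standing in for trained agents.
population = [f"agent_{i}" for i in range(POPULATION_SIZE)]
team_a, team_b = sample_match(population, rng=random.Random(0))
```

Because the draw is repeated for every round, each agent ends up playing with and against many different classmates rather than a fixed partner.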

Speaker 2

问题在于刚开始时,它们的行为完全随机。是的,它们只是漫无目的地四处乱撞。然后其中一个会有所发现,比如开始控制旗帜并得分。这时整个群体就会面临进化压力。

The problem is when we start, they're actually just all very random. Yeah. They're just bouncing about the place without a clue. And then one of them will discover something and will start actually, let's say, taking control of the flag and actually scoring points. And at that point, there's evolutionary pressure on this population.

Speaker 0

这里有个精妙之处:马克斯团队并非让教室里的智能体无限期地玩下去。他们还使用了一种称为遗传算法的技术,确保整个智能体群体的文化能够进化。

And here's the clever bit. Max and his team aren't just letting the agents in the classroom play on and on forever. They're also using something called a genetic algorithm, a way to make sure the whole culture of the population of agents evolves.

Speaker 2

这样实际上会淘汰群体中较弱的个体。

So that actually some of the weaker ones will be removed from this population.

Speaker 0

这几乎就像让这30个智能体繁衍后代。是的,让它们互相交配繁殖。

So it's almost like you're making that population of 30 have children. You're breeding them together.

Speaker 2

没错。

Yeah.

Speaker 0

最初的智能体班级会互相繁殖产生后代。随着世代更替,最强的特性得以保留下来。

The original classroom of agents breed together and have kids of their own. And as you go down the generations, the strongest traits survive.

Speaker 2

但与人类孩童不同,在这个设定中,智能体繁衍后代时会继承一切。它们会继承从父代获得的知识。

But unlike human children, when an agent has children in this setup, they inherit everything. They inherit the knowledge that's been gained from their parent.
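One way to picture a single evolutionary step of this kind is sketched below: the weakest agent is dropped and replaced by a full copy of the strongest one, with a few numeric characteristics slightly perturbed. This is only a guess at the general shape, not DeepMind's implementation; the dictionary layout, the `traits` field, the `evolve_step` name and the 20% perturbation range are all assumptions.

```python
import copy
import random


def evolve_step(population, win_rates, rng, mutation_scale=0.2):
    """Replace the weakest agent with a perturbed copy of the strongest."""
    ranked = sorted(range(len(population)), key=lambda i: win_rates[i])
    weakest, strongest = ranked[0], ranked[-1]
    # The child inherits everything the parent has learned so far.
    child = copy.deepcopy(population[strongest])
    # Characteristics are nudged so the child is not an exact clone.
    for name in child["traits"]:
        child["traits"][name] *= 1.0 + rng.uniform(-mutation_scale, mutation_scale)
    population[weakest] = child
    return weakest, strongest


# Hypothetical three-agent population with made-up numbers.
population = [
    {"knowledge": [0.1, 0.2], "traits": {"boldness": 1.0}},
    {"knowledge": [0.5, 0.9], "traits": {"boldness": 2.0}},
    {"knowledge": [0.3, 0.3], "traits": {"boldness": 0.5}},
]
win_rates = [0.2, 0.7, 0.4]
weakest, strongest = evolve_step(population, win_rates, random.Random(0))
```

After the call, the slot that held the weakest agent contains the stronger agent's knowledge unchanged, which is the "inherit everything" part of the description.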

Speaker 0

但你在代际传递过程中混淆了它们的特性。

But you're mixing up their characteristics as you go from one generation to the next.

Speaker 2

没错。这个智能体需要学会玩一场五分钟的夺旗游戏——实际上你进行五分钟游戏时会执行数千个动作,最终只得到一个胜负结果。而我们得设法让它学会如何处理这种情况。为了帮助解决这个问题,我们引入了内部奖励机制:游戏中会发生各种事件,比如拾取旗帜、丢弃旗帜、队友标记对手或被对手标记等。我们允许智能体自主演化出各自的内部奖励值,也就是它们为每个事件分配的奖励权重。

Yeah. So this agent has to learn to play a five minute game of capture the flag, which is really you play five minutes, you do thousands of actions, and you just get a win or a loss on whether you won or lost the game. And somehow, we have to learn what to do with that. And so to help bridge that problem, we have this idea of internal rewards where there are events in the game such as picking up a flag or dropping a flag or your teammate tagging an opponent or an opponent tagging you, all these sort of things. And we allow the agents to individually evolve their own internal rewards, which is the reward they assign to each one of these events.
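The internal-reward idea can be sketched as a weighted sum over game events, with each agent carrying its own weights. The event names, the two example agents and every number below are made up for illustration; only the list of event types comes from the conversation.

```python
def internal_reward(event_counts, reward_weights):
    """Weighted sum of game events; events without a weight count as zero."""
    return sum(reward_weights.get(event, 0.0) * count
               for event, count in event_counts.items())


# Two hypothetical agents that value the same events differently.
flag_runner = {"picked_up_flag": 2.0, "dropped_flag": -1.0,
               "teammate_tagged_opponent": 0.0, "tagged_by_opponent": -0.5}
defender = {"picked_up_flag": 0.5, "dropped_flag": -0.5,
            "teammate_tagged_opponent": 2.0, "tagged_by_opponent": -1.0}

# Events observed in one made-up stretch of play.
events_this_game = {"picked_up_flag": 1, "teammate_tagged_opponent": 3,
                    "tagged_by_opponent": 2}

runner_signal = internal_reward(events_this_game, flag_runner)    # 1.0
defender_signal = internal_reward(events_this_game, defender)     # 4.5
```

The same stretch of play produces a different learning signal for each agent, which is one way to see how agents with different weights could drift into different roles while the only external feedback is the final win or loss.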

Speaker 0

所以有些智能体会特别重视夺取旗帜(确实如此),而另一些则更关注队友标记对手的行为(没错)。这种群体进化训练意味着它们能承担不同角色,从而更高效地完成夺旗任务。

So some agents are gonna care a great deal about grabbing hold of the flag. Exactly. And other agents are gonna care a lot about teammate tagging someone. Yeah. This kind of evolutionary group training means that they can assume different roles, producing better results for stealing a flag.

Speaker 0

经过少量练习——比如几千轮训练后,智能体团队会变得相当擅长这个游戏。

And with a bit of practice, after a few thousand rounds, say, teams of agents become really rather good at this game.

Speaker 2

它们表现简直惊艳。这种训练方式最棒的地方在于智能体非常强韧——是的,它们既能相互对抗,也能与采用完全不同训练体系的智能体对战,甚至能击败游戏内置的硬编码机器人。

They absolutely smashed it. And the great thing about training an agent in this manner is that they're robust. Yes. They can play themselves, but they can play other agents that have been trained in completely different regimes. They can also play these in game bots, which are sort of these hard coded bots that ship with the game.

Speaker 2

但最有趣的是,它们还能与人类玩家互动。你可以把人类放进这些游戏里,让人工智能充当队友或对手。

But most interestingly, they can also play with people. So you can drop people into these games and have, you know, an AI teammate or AI opponents.

Speaker 0

当时与智能体一起玩耍的实际体验是怎样的?它们是否既在猜测你的行动,又在执行自己的策略?

What was it actually like to play with an agent then? Do they does it feel like they are guessing what you're going to do as well as doing their own thing?

Speaker 2

与其说它们在猜测你的行为,不如说它们完全无视你且非常冷酷。人类会高度关注其他人类,即使在游戏场景中,人类玩家也会紧盯对手。但这些智能体被训练得完全不带这类人类偏见。你的对手会径直从你身边跑过,甚至不试图触碰你,因为它们只专注于尽快夺取旗帜——这样才能最大化夺旗次数并赢得比赛。

It feels less like they're guessing what you're doing and more like they completely ignore you and they're very ruthless. Humans pay a lot of attention to other humans. Even in game scenarios, like humans will fixate on the other players of the game. But these agents have been trained completely unbiased without these sort of human biases. Your opponent will run right past you and not even try and tag you because they're so fixated on actually getting the flag as quickly as possible because that's what's going to maximize their number of flag captures and win them the game.

Speaker 2

那些真正令人讨厌的人类玩家会做的事。

Things that really annoying human players would do.

Speaker 0

这里有种奇妙的魔力。最初研究人员研究这些智能体时,试图通过模型寻找突破口。然后就会出现那个突破性时刻,智能体突然开窍,开始如你所预期般行动。让我为你介绍DeepMind研究总监科莱·卡武克乔格鲁。

There's a kind of magic going on here. Initially, researchers are working on these agents, trying to see a way through the model. Then there is that breakthrough moment when the agent gets it, when they start to behave like you think they should. Let me introduce you to Koray Kavukcuoglu, Director of Research at DeepMind.

Speaker 3

我记得早期训练智能体的日子。当它们第一次表现出对环境的本能反应——尝试导航、避开障碍物等——那一刻真的很美妙。看着它们自主做出决定是件相当有趣的事。创造出一个能自主决策的算法,这种成就感令人愉悦。

I remember training agents in the early days. The first time those agents actually started behaving like, it's in an environment, it's trying to navigate and it's trying to avoid certain obstacles and whatnot. The first time it starts doing that, it is nice. It's quite fun to see that, because you know that it makes a decision for itself. I think knowing that you have created an algorithm that can take decisions, that aspect is quite enjoyable.

Speaker 3

这非常令人满意。

That is very satisfactory.

Speaker 0

值得记住的是,这些游戏对DeepMind而言绝非消遣。他们投入严格训练是有原因的——他们想观察人工智能如何自主发展这类技能。

It's worth remembering that these games aren't just a trivial pursuit for DeepMind. They've invested in this rigorous training for a reason. They want to see how an AI develops these kinds of skills for itself.

Speaker 2

我们在这项夺旗任务中投入了大量时间,研究这些智能体的神经网络,试图理解它们关注什么以及如何表征游戏世界。最酷的是我们发现这些智能体实际上对游戏世界有着非常丰富的内在表征,而无需被告知任何关于游戏世界本身的信息。你知道吗?这些智能体只是看着屏幕像素,却不知怎地将内部激活聚类成了类似'我在己方基地'、'我在敌方基地'这样的概念。

We spent a lot of time in this capture the flag work looking into the neural networks of these agents to try and understand what they care about and how they represent the game world. And what was really cool is that we found that the agents actually had a really, really rich representation of this game world without being told anything about the game world itself. You know? These agents just look at the pixels of the screen, yet somehow they've clustered the, you know, internal activations into things like, oh, I'm in my home base. I'm in the opponent base.

Speaker 2

我拿到了旗帜,能看到队友在前方。我正盯着敌方旗手,而我的队友正拿着我们的旗帜。你甚至能找到特定神经元,比如当队友持旗时就会激活的那种。

I've got the flag, and I can see my teammates ahead of me. I'm looking at the opponent flag carrier while my teammate is holding our flag. And you can even find individual neurons which just activate if, for example, your teammate is holding the flag.

Speaker 0

随着深入分析,你完全可以理解智能体是如何感知这个游戏的。

You can totally understand how the agent is seeing the game as you go through.

Speaker 2

我不确定是否完全理解,但我们确实对哪些表征强烈、哪些表征薄弱有了概念。

I'm not sure about totally understand, but we're really getting an idea of what is being represented strongly and what isn't being represented strongly.

Speaker 0

马克斯开发的智能体使用了一种叫神经网络的技术。这是一种机器学习算法,其原理大致基于简化版的人脑模型。层层人工神经元通过庞大网络相互连接,彼此传递信息。通过观察智能体的电子大脑,马克斯能找出微观层面的连接如何对应宏观行为。随着AI日益融入日常生活,这项技术可能带来巨大益处。

Max's agents are using something called neural networks. It's a type of machine learning algorithm that is loosely based on a simplified version of the human brain. Layers on layers of artificial neurons are connected together in a vast network and fire information between themselves. By looking inside an agent's electronic brain, Max can work out which micro level connections are responsible for what macro level behavior. And this could be hugely beneficial as AI becomes more integrated in our everyday lives.
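For readers who want to see what "layers of artificial neurons" means mechanically, here is a toy forward pass in plain Python. The network shape, the weights and the `tanh` activation are arbitrary choices for the example and have nothing to do with the agents' real architecture; the point is only that every intermediate activation is kept, which is what makes it possible to inspect individual neurons afterwards.

```python
import math


def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of inputs, then tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Run the inputs through every layer, keeping all activations."""
    activations = [inputs]
    for weights, biases in layers:
        activations.append(layer(activations[-1], weights, biases))
    return activations


# Made-up weights for a tiny network.
toy_network = [
    ([[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]], [0.0, 0.0, 0.0]),  # 2 inputs -> 3 hidden
    ([[1.0, 1.0, 1.0]], [0.0]),                                # 3 hidden -> 1 output
]
activations = forward([1.0, 1.0], toy_network)
hidden_neurons = activations[1]  # the values one would look at when probing
```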

Speaker 2

我们希望在未来,能够真正开发出可以进入现实世界、与人类和其他智能体互动的智能体。

The hope is that well into the future, we can start actually having agents which can go out into the real world, interact with humans, with other agents.

Speaker 0

而且不会打架。

Without fighting.

Speaker 2

不争斗,保持理智,是的,别太多争吵。

Without fighting, being sensible, yeah, not squabbling too much.

Speaker 0

不像人类。

Unlike humans.

Speaker 2

没错,正是如此。

Yeah. Exactly.

Speaker 0

无国界的游戏,无泪水的团队合作。但从棋盘游戏或简单的夺旗游戏,到充满复杂性与混乱的真实世界,还有巨大的鸿沟。你会记得大卫·席尔弗——带给我们AlphaGo的人,这个智能体曾在古老的围棋比赛中击败世界冠军。如今他也正参与推动DeepMind的AI进入更令人困惑的环境。

Games without frontiers, teamwork without tears. But there is a big leap between board games or simple games like capture the flag and the big bad world with all of its complexity and messiness. You'll remember David Silver, the man who brought us AlphaGo, the agent that defeated the world champion at the ancient board game of Go. Well, he's also involved in pushing DeepMind's AI into ever more perplexing environments.

Speaker 4

在游戏领域,我认为还存在一个更进一步的挑战,这也是社区中许多人正在努力的方向——攻克最具挑战性的电脑游戏。当前这个游戏就是《星际争霸》。AI社区的许多人将其视为下一个重大挑战。我们该如何设计出能在如此丰富环境中游戏的智能体?这些挑战不仅不同,而且在某些方面比围棋快得多。

In the context of games, I think there is a further challenge, which many people in the community are moving towards, which is to take the most challenging computer game. In this case, it's the game of StarCraft. Many people in the AI community are viewing this as the next grand challenge. How can we actually devise agents which can play in this very rich environment, which has challenges which are not only different, but many times faster than Go in other ways?

Speaker 0

这里是DeepMind播客,带您走进当今科学最迷人的领域之一——人工智能。你可曾看过那些大型电竞赛事的画面?整个场馆坐满狂热粉丝,兴奋地观战支持那些高水准选手——他们坐在舞台上的电竞椅里,仅凭键盘、鼠标和屏幕作战。很可能他们玩的就是暴雪娱乐开发的《星际争霸2》。这是一款极其复杂的战术游戏,玩家可选择三个神秘命名的种族之一:虫族、神族或人族。玩家需要采集资源、发展经济、获取日益先进的科技,同时在这个未来感十足且荒凉的战场上击败外星对手。

This is DeepMind, the podcast, an introduction to AI, one of the most fascinating fields in science today. Have you ever seen footage of those vast esports tournaments where an entire arena of dedicated fans excitedly watches on in support of highly skilled players sat on stage in their gaming chairs, armed only with a keyboard, a mouse, and a computer screen? Well, chances are they are playing something like StarCraft II, created by the American video game developer, Blizzard Entertainment. It is a monumentally tricky tactical game where you play as one of three races, the enigmatically named Zerg, Protoss, or Terrans. Each player has to mine resources, build an economy, and acquire increasingly sophisticated technology, all the time trying to defeat your alien opponents in a futuristic, rather bleak-looking landscape.

Speaker 0

游戏中的视野受限于你需要操控的移动镜头,因此无法一览全局,甚至常常看不见对手。全球有数万人参与这款游戏,有时争夺丰厚奖金。人类玩家的操作速度快得惊人——世界顶级选手每分钟能操作高达800次点击。

Your field of view of the simulated game is limited by a moving camera that you have to operate, and so there's no way to see everything at once. Often, you can't see your opponent at all. And it is played by tens of thousands of people, sometimes for hefty cash prizes. And the human players are staggeringly fast. The best in the world can manage up to 800 clicks in a minute.

Speaker 0

觉得自己不够格吗?

Feeling inadequate?

Speaker 5

这绝对超酷——我能从事一项我青少年时期就充满热情的事业。

Definitely super cool that I can work on one thing that has certainly been a passion of mine in my teenage days.

Speaker 0

这位是DeepMind的研究科学家奥里奥尔·维尼亚莱斯,他曾是职业星际争霸选手,现在负责DeepMind的星际争霸项目。

Meet Oriol Vinyals, a research scientist at DeepMind. He is an ex-pro StarCraft player and co-leads the StarCraft effort at DeepMind.

Speaker 5

当你开发出新算法或新想法时,测试时会亲眼看到它在喜欢的游戏中表现更好。这种即时反馈既直观又充满成就感,对吧?当你尝试新方法时,会惊叹'天啊,它真的理解了这个单位的运作机制'。

As you develop a new algorithm or a new idea, when you test it, you actually see it play the game you like better. So that's very rewarding and very visual. Right? You try something new and you really see, oh my god, it really understands how this unit works.

Speaker 0

星际争霸是项严肃的事业,事实上它已发展成职业竞技。对奥里奥尔而言,这证明这是款能推动人类智力发展的游戏。

StarCraft is a serious business. So serious, in fact, that it has now been professionalized. And for Oriol, that proves it is a game that pushes human intelligence.

Speaker 5

人类觉得它有趣,正说明这是款能以我们喜爱的方式挑战智力和创造力的游戏,让我们愿意投入无数小时沉浸其中。

Humans found it interesting, so that means it's an interesting game that challenges intelligence and creativity in ways that we like, that we spend many hours playing.

Speaker 0

那么目前AI的水平如何?它能玩星际争霸到什么程度?

So how good is the AI at the moment then? How well can it play StarCraft?

Speaker 5

它超越了人类迄今建造的所有人工智能,显然是通过经验学习而非依赖游戏规则编码。这可以说是我们攻克过最复杂的游戏,对我们的理解和算法都提出了相当大的挑战。

It's better than any AI anyone has ever built, and it obviously has learned from experience, not from someone knowing the game and encoding some set of rules. This is, I mean, one of the most complicated games we've ever tackled. It's kind of challenging our understanding and our algorithms quite a bit.

Speaker 0

DeepMind团队决定邀请两位世界顶尖的《星际争霸2》选手来测试他们研发中的算法。请允许我介绍DeepMind的AlphaStar,首个与顶级职业选手对抗的人工智能。它通过深度神经网络直接学习原始游戏数据,结合监督学习与强化学习来完整运行《星际争霸2》。解说席上是被称为Artosis的Dan Stemkoski和绰号RotterdaM的Kevin van der Kooi。

The DeepMind team decided to see how good their work in progress really was by inviting two of the world's best StarCraft II players to take on their own algorithm. So let me introduce DeepMind's AlphaStar, the first artificial intelligence to ever take on top professional players. It plays the full game of StarCraft II by using a deep neural network trained directly from raw game data by supervised learning and reinforcement learning. Your commentators are Dan Stemkoski, aka Artosis, and Kevin van der Kooi, aka RotterdaM.

Speaker 6

首先,能和你一起参与这次活动真是太棒了。我想我们俩都对今晚的进展充满期待。

Well, first of all, it's really awesome to be here together with you, Dan. We're both, I think, incredibly excited to see how this evening unfolds.

Speaker 7

DeepMind做的这一切实在太令人兴奋了。

I mean, this is just so exciting that DeepMind is doing all this.

Speaker 0

本次基准测试中迎战AlphaStar的是德国冠军Dario Wünsch(更广为人知的名字是TLO)。他通常使用虫族,但本场对决将使用神族。Kevin和Dan已经兴奋不已,甚至有点过于激动了。

Taking on AlphaStar in this benchmarking match is German champion Dario Wünsch, better known as TLO. He's normally a Zerg player, but he's playing as Protoss for this match. Kevin and Dan are excited. Maybe even a tad overexcited.

Speaker 6

我简直兴奋到不行,天啊!

I'm so incredibly excited. Oh my god.

Speaker 7

这大概是我个人对一场赛事最期待的时刻了。等不及要分析这场神族内战了。这就是AlphaStar——我们尚不清楚它的实力,但已经出现了一些有趣的情况。

This is, like, the most excited I have personally ever been for an event. Can't wait to really break down some PvP. So this is AlphaStar. This is an AI that we don't know how good it is yet, but already we have some interesting things happening.

Speaker 0

我对《星际争霸》的战术术语并不完全熟悉。所以我就直说吧,AlphaStar的追踪者正在施展一些精妙的操作。

Now I'm not entirely conversant with the StarCraft playbook lingo here. So I will just say that AlphaStar's stalkers are laying down some sharp moves.

Speaker 7

在我看来,AlphaStar的这些攻击到目前为止都策划得非常周密。而且它们毫不留情。它们本应在攻击中落败的。

Feels to me like so far these attacks have been very well planned by AlphaStar. And they're relentless. They'd lost to attack.

Speaker 0

短短几分钟内,一切就结束了。

And in a matter of minutes, it's all over.

Speaker 7

好吧,就是这样。TLO打出了GG(Good Game)。AlphaStar与职业选手的第一场比赛以AlphaStar获胜告终。

Well, that is it. The GG is called the good game here from TLO. And the first game from AlphaStar against a pro gamer goes to AlphaStar.

Speaker 0

大卫·西尔弗当时就在场边。

David Silver was there at ringside.

Speaker 4

我们有一个团队在过去几个月里一直在推进这个项目,加速我们的开发进程。这标志着一个里程碑——我们首次见证了一个AI真正能够击败职业选手。

We have a team that's been working on this and ramping up our development over the last few months, and this represents, you know, a milestone where we actually, for the first time, saw an AI that was actually able to defeat a professional player.

Speaker 0

我们要不要快速采访一下我们落败的挑战者TLO?

Shall we have a quick word with our defeated challenger, TLO?

Speaker 8

在我训练时,大部分对战的人类玩家都采用非常标准的星际争霸打法。我再次以为,打完第一场比赛后我就能掌握对抗这个AI的要领。但我错了。

When I was practicing, most of the humans I played against played very standard StarCraft. Once again, I assumed that after the first match I'd probably have a good idea how to play against this agent. I did not.

Speaker 0

接下来是重头戏。AlphaStar将对战波兰顶尖选手Grzegorz Komincz,也就是人称MaNa的世界顶级职业星际选手之一。

Next up, the main event. AlphaStar versus Poland's finest, Grzegorz Komincz, better known as MaNa, one of the world's strongest professional StarCraft players.

Speaker 7

Mana,我需要听听你现在的想法,因为那看起来太可怕了。

MaNa, I need to hear what you're thinking here because that looks scary.

Speaker 6

没错。阿尔法星完全不惧怕斜坡。如果我现在对战的是人类玩家,没人敢走那个斜坡。

Yeah. AlphaStar, he's not scared about the ramp. So if I would be playing against a human player right there, nobody's going up that ramp.

Speaker 0

我要向玩星际的观众说明:这些比赛是在职业比赛条件下进行的,使用的是天梯竞技地图,且没有任何游戏限制。这个版本的阿尔法星可以随时查看整个游戏地图,但其他方面与人类玩家的操作方式相当。

I should point out for those of you that play StarCraft that these matches are taking place under professional match conditions on a competitive ladder map and without any game restrictions. This version of AlphaStar could see the whole of the game map at any one time, but otherwise played in a comparable way to humans.

Speaker 4

我们的目标不仅是击败这些选手,更要以正确的方式实现这一目标。

Our goal is not just to defeat these players. Our goal is to do it in the right way.

Speaker 5

好的,大家再等两秒钟。

Alright. Two seconds, guys.

Speaker 0

最终比分是AlphaStar五比零MaNa。我得告诉你,MaNa后来对战了更新版本的算法并取得了胜利。总而言之,五比一。为了理解AI如何学习玩《星际争霸》,奥里奥尔·维尼亚尔斯对我进行了测试。这是一场决战,数学家对战机器。

And the result, AlphaStar five, MaNa nil. I should tell you that MaNa played a later version of the algorithm in the end and won. So all in all, five-one. Now to understand how an AI could learn to play StarCraft, Oriol Vinyals put me to the test. A match to the end, mathematician versus machine.

Speaker 0

我面前有个造型古怪的鼠标和一个普通键盘。屏幕上有个面目狰狞的外星人,还有个...呃...兄弟?他长得像大象和拳击手的结合体,拳头特别大。我可不想在漆黑的夜晚遇见这家伙。

I've got quite a funky-looking mouse in front of me and a normal keyboard. And on the screen, there is a very mean-looking alien and... (Yeah. A brother.) He's sort of like an elephant meets... well, he's got fists. I wouldn't want to meet him on a dark night.

Speaker 0

等等,他到底是友军还是敌人?

No. Is he my friend or not?

Speaker 5

他就是你。你将担任这个种族的指挥官。

He is you. You're gonna be the commander of this particular race.

Speaker 0

我很快发现要掌握的内容太多了。《星际争霸》可能不适合新手。你有工蜂为你采集资源...不对,这些小家伙...它们看起来像蚂蚁之类的生物对吧?

I quickly found out that there is a lot to take in. StarCraft is perhaps not for beginners. You have your worker bees collecting resources for you. No, these little... I mean, they're almost like ant creatures.

Speaker 0

它们会跑出去采集水晶。

So running out and grabbing crystals.

Speaker 5

没错。

Exactly.

Speaker 0

你需要尝试理解你的行动将如何影响游戏未来走向。这对人类来说已不易掌握,更别提那些毫无背景知识、不具备物体识别能力、更不可能有前星际争霸冠军手把手指导的智能体了。

And you need to try and work out how your actions will affect the game in future. This is not easy for humans to learn, let alone agents that have absolutely no context, no object recognition, and definitely no former StarCraft champion to hold their hand.

Speaker 5

看,这就是敌人。

Look, this is the enemy.

Speaker 0

哦,不。

Oh, no.

Speaker 5

这会很愉快的。它只是来寻找你,现在正观察你的行动——而到目前为止你根本什么都没做。我们确实完全没有采取任何行动。

It's gonna be pleasant. It just came to kind of find you and now see what you're doing, which is absolutely nothing so far. We have done nothing at all.

Speaker 0

星际争霸的挑战之一在于不存在通吃的完美策略。这有点像石头剪刀布,制胜战术取决于对手的打法。但记住,你的视野非常有限。在镜头之外的区域,对手可能在进行任何操作。

Part of the challenge of StarCraft is that there isn't an ideal strategy that wins every time. It's a bit like rock, paper, scissors in that way. The winning tactic will depend on how your opponent plays. But remember, you only have a very narrow field of vision. Outside of where your camera is pointing, your opponent could be up to anything.
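The rock, paper, scissors comparison can be made concrete with a small payoff table: every fixed choice has a counter, so the best move depends entirely on what the opponent does. The code below models only the playground game, not StarCraft itself, and the function names are invented for the example.

```python
# What each move beats in rock, paper, scissors.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def payoff(mine, theirs):
    """Return +1 for a win, 0 for a draw and -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def best_response(theirs):
    """The move that scores highest against a known opponent move."""
    return max(BEATS, key=lambda mine: payoff(mine, theirs))


# Worst outcome of committing to a single move against every possible opponent.
worst_case = {move: min(payoff(move, other) for other in BEATS) for move in BEATS}
```

Every entry in `worst_case` is -1, meaning no single move is safe, while `best_response` always finds a winner once the opponent's choice is known. That is why scouting, as discussed next in the conversation, matters so much.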

Speaker 5

由于看不见对手,你必须判断何时去侦察。我是否已掌握局势?是否该去探查敌情?但若我这么做,对方就会知道我已察觉,如此循环。这种信息不对称的特性让星际争霸对玩家极具吸引力,也将把我们的智能体测试推向其他游戏从未达到的高度。

Because you don't see the other player, you must decide when am I gonna see it. Do I already know what is going on? And should I not go and scout what it's doing? But maybe if I do that, he knows that I know and so on and so forth. So this kind of imperfect information aspect of StarCraft is extremely interesting as a player, and it's gonna be testing our agents to levels that we haven't seen in any other game.

Speaker 5

当然,游戏中还会出现一些你必须长期牢记的细节信息。

And then, of course, there's sort of details that have happened in the game that you must remember for a long time.

Speaker 0

也许我本该更认真地听取那些建议。

Advice I should have listened to more carefully, perhaps.

Speaker 5

我们正遭受攻击,可能就要死了。天啊!不过没关系。

We're being attacked and we're probably gonna die. Oh my! So that's okay.

Speaker 0

持续时间并不长。

Didn't last very long.

Speaker 5

所以想想这个探索阶段,现在你基本上会输掉,得到负一分的奖励,然后重新开始。

So think of this discovery phase, right, where you would now basically lose, you get the reward of minus one, and you start again.

Speaker 0

如果我是算法,我不会因失败而沮丧。我会重置并重新开始,每次都带着更多知识。但为了能玩《星际争霸》,为了能操作控制,AI必须掌握许多可迁移的技能。

If I was an algorithm, I wouldn't be upset by losing. I would just reset and go again, each time armed with a little more knowledge. But to even be able to play StarCraft in the first place, to even be able to operate the controls, the AI had to master quite a few transferable skills.
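The "lose, get minus one, start again" loop can be sketched as a very small trial-and-error learner. This is a generic epsilon-greedy value estimate over two made-up opening strategies, not the method used for AlphaStar; the strategy names, win probabilities and the `train` function are assumptions for illustration.

```python
import random


def train(win_probability, episodes, rng, epsilon=0.1):
    """Play repeatedly, scoring +1 for a win and -1 for a loss, and keep a
    running average of the reward seen for each strategy."""
    values = {strategy: 0.0 for strategy in win_probability}
    counts = {strategy: 0 for strategy in win_probability}
    for _ in range(episodes):
        if rng.random() < epsilon:
            strategy = rng.choice(list(values))    # occasionally explore
        else:
            strategy = max(values, key=values.get)  # otherwise use the best so far
        reward = 1 if rng.random() < win_probability[strategy] else -1
        counts[strategy] += 1
        values[strategy] += (reward - values[strategy]) / counts[strategy]
    return values


# Hypothetical strategies: one always loses, one always wins.
values = train({"do_nothing": 0.0, "build_army": 1.0},
               episodes=50, rng=random.Random(0))
```

Each game is thrown away after it ends, but the running averages carry over, which is the "armed with a little more knowledge" part.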

Speaker 5

你玩的时候可能注意到,有些操作类似于浏览网页或使用笔记本电脑,比如点击、拖拽、选择矩形区域、移动鼠标、结合键盘操作等。我们尝试用完全相同的智能体、相同的架构、几乎相同的代码,只是改变了环境。不是让它玩《星际争霸》去获胜,而是让它使用微软画图工具作画,如果画得像人脸就会获得奖励。

You've noticed when you were playing that there were some movements that were resembling what it was like to maybe navigate the web or, like, operate your laptop, namely click, drag and click, drag and drop, select rectangles, moving the mouse, combining mouse with keyboard and so on. And we tried exactly the same agent, the same architecture, absolutely everything the same way, the same code almost. And we changed the environment. Instead of saying, now here is StarCraft, please play to win, we said, here is Paint, Microsoft Paint as an environment, interact with it, and I'll reward you if what you paint looks like a face.

Speaker 5

结果真的成功了。我认为这就是掌握了点击界面这类基础技能,这些技能在很多场景都适用。

And it actually worked. So, I think that's just learning these basic skills of point and click interfaces that apply in so many places.

Speaker 0

那个能玩《星际争霸》的智能体,同样可以在微软画图中绘制真实人脸。

The same agent that plays StarCraft can draw real faces in Microsoft Paint.

Speaker 5

没错。这里需要澄清的重点不是那个被训练来玩游戏的智能体,而是同一个算法既能训练玩《星际争霸》,也能训练绘画。

Right. And here, the point to be clarified is that it's not the same agent that was trained to play StarCraft. It's the same algorithm that can be trained to play StarCraft and also trained to paint.
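The distinction between "same agent" and "same algorithm" can be shown with one learning routine applied to two unrelated toy environments. Everything here is a stand-in: `ToyGame`, `ToyCanvas`, their actions and rewards are invented, and the try-every-action learner is far simpler than anything used in practice.

```python
class ToyGame:
    """Stand-in for a strategy game: +1 for a win, -1 for a loss."""
    actions = ("attack", "retreat")

    def reset(self):
        return "start"

    def step(self, action):
        reward = 1 if action == "attack" else -1
        return "end", reward, True


class ToyCanvas:
    """Stand-in for a paint program: reward for a face-like shape."""
    actions = ("scribble", "draw_oval")

    def reset(self):
        return "blank"

    def step(self, action):
        reward = 1.0 if action == "draw_oval" else 0.0
        return action, reward, True


def learn_best_action(env):
    """The same learning code for any environment exposing
    `actions`, `reset()` and `step(action)`."""
    best, best_reward = None, float("-inf")
    for action in env.actions:
        env.reset()
        _, reward, _ = env.step(action)
        if reward > best_reward:
            best, best_reward = action, reward
    return best


game_skill = learn_best_action(ToyGame())      # learned separately for the game
paint_skill = learn_best_action(ToyCanvas())   # learned separately for the canvas
```

The routine is shared, but each call starts from scratch and produces its own result, mirroring the point that two separately trained agents come out of one algorithm.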

Speaker 0

让同一个算法在画图中绘制名人肖像时,它能捕捉面部所有主要特征,通过点击拖动鼠标来重现轮廓、色调和发型,就像街头艺术家那样。

Put that same algorithm to work drawing celebrities in paint, and it can capture all the main traits of the face, clicking and dragging the mouse to recreate shape and tone and hairstyle, much like a street artist would.

Speaker 5

这是同一种技术,但可以说它像一张空白的大脑,然后这个大脑可以学习做这个或那个。我们通过反复在环境中行动并获得奖励,让大脑逐渐适应去完成这项或那项任务。虽然目前还做不到像人类大脑那样同时处理多种任务,但这显然是我们下一步非常感兴趣要攻克的课题之一。

It's the same technique, but if you will, it's kind of a brain that is blank, and then this brain can learn to do this or that or that. And then, by acting in the environment repeatedly and getting reward, the brain's weights get shaped to do this task or that task or that task. We are not yet at the point where the same brain does both like we do, but obviously that's one of the things we would be very interested in tackling next as well.

Speaker 0

因为这是在向通用人工智能迈进,我猜。

Because that's stepping towards artificial general intelligence, I guess.

Speaker 5

正是如此。而这正是我们每天都在做的事情。

Exactly. And that's what we do every day.

Speaker 0

这就是终极目标。在这栋大楼里无论你和谁交谈,这个话题永远不会远离讨论中心。因为让AI玩《星际争霸》或围棋这类游戏的意义,在于加深我们对智能本质的理解。以下是深度学习团队的莱娅·哈德塞尔的看法。

That is the ultimate goal. And it's a topic of conversation that's never far away whoever in this building you find yourself talking to. Because the point of getting AI to play games like StarCraft or Go is to enhance our understanding of what intelligence actually is. Here's Raia Hadsell from the deep learning team.

Speaker 1

我们编写程序。我们运行这些程序,进行实验——比如训练一个智能体玩游戏,或在模拟世界中解决谜题,然后观察实验结果。这本质上是在试图理解学习与表征、记忆、控制(如机器人采取行动)这一复杂谜题。关于什么是智能存在、什么是智能体,这个大谜题包含太多复杂的部分。

We write programs. We run those programs, those experiments where we might train an agent to play a game, for instance, or to solve a puzzle in a simulated world, and then we look at the results of that. It really is trying to understand this puzzle of learning and representation, memory, control in terms of actions that a robot would take. There's so many complex parts to this big puzzle of what is an intelligent being, what is an intelligent agent.

Speaker 0

但如果你问人们如何看待AI的未来,他们往往会联想到更具实体感的东西,比如配备可移动手臂等完整部件的形态。

But if you ask people what they think the future of AI looks like, it tends to be wrapped up in something a bit more physical, something that comes complete with moving arms and everything.

Speaker 4

我认为AI面临的一个自然挑战(也是许多人关注的焦点)是真正以机器人形式对现实世界产生影响——看到机器人能够移动、抓取、操控,甚至实现接近(不必达到人类水平,或许动物级别即可)的运动能力,这将代表重大进步。

I think one natural challenge for AI, which many people are centering upon would be to actually have an impact on the real world in the guise of robotics, to actually see a robot which is able to move, to grip, to manipulate, to even have locomotion in anything approaching not even what a human does, maybe even an animal, I think this would represent a major stride forwards.

Speaker 0

更多相关内容将在下期探讨。如果你想了解本集主题的更多信息,或探索DeepMind之外的AI研究世界,每期节目说明中都有大量实用链接。若你有认为对其他听众有帮助的故事或资源,请告诉我们。你可以通过Twitter留言或发送邮件至podcast@deepmind.com联系我们,该邮箱也接收对本系列节目的提问与反馈。

More on that next time. If you would like to find out more about the themes in this episode or explore the world of AI research beyond DeepMind, you'll find plenty of useful links in the show notes for each episode. And if there are stories or resources that you think other listeners would find helpful, then let us know. You can message us on Twitter or email the team at podcast@deepmind.com. You can also use that address to send us your questions or feedback on the series.

Speaker 0

不过现在,我们先出去透会儿气吧。

But for now, let's nip out for a bit of air.
