一位可能掌握我们未来命运的天才。
A genius who may hold the cards of our future.
谷歌DeepMind的首席执行官,该公司人工智能发展的核心引擎。
CEO of Google DeepMind, which is the engine of the company's artificial intelligence.
在获得诺贝尔奖和查尔斯国王授予的爵士头衔后,他成为了人工智能领域的先驱。
After his Nobel and a knighthood from King Charles, he became a pioneer of artificial intelligence.
我们是现代第一个认真开始做这件事的。我认为AlphaGo是一个重大的分水岭时刻,不仅对DeepMind和我的公司,对整个AI领域都是如此。这从我小时候起就一直是我对AI的目标,即用它来加速科学发现。
We were the first ones to start doing it seriously in the modern era. AlphaGo was the big watershed moment, I think, not just for DeepMind and my company, but for AI in general. Ever since I was a kid, my aim with AI has always been to use it to accelerate scientific discovery.
女士们先生们,请欢迎谷歌DeepMind的德米斯·哈萨比斯。欢迎。很高兴来到这里。感谢你接在塔克、马克·库班等人之后登台。首先,恭喜你获得诺贝尔奖。
Ladies and gentlemen, please welcome Google DeepMind's Demis Hassabis. Welcome. Great to be here. Thanks for following Tucker, Mark Cuban, et al. First off, congrats on winning the Nobel Prize.
谢谢。
Thank you.
以表彰AlphaFold的惊人突破。也许你之前已经讲过这个故事,但我知道这里的每个人都想听你讲讲获得诺贝尔奖时你在哪里。你是怎么
For the incredible breakthrough of AlphaFold. You may have told this story before, but I know everyone here would love to hear you recount where you were when you won the Nobel Prize. How did
得知的?嗯,那显然是一个非常超现实的时刻。你知道,关于它的一切都超现实。他们告诉你的方式,他们在一切公开前大约十分钟告诉你。就是,你知道,你真的不能…当你接到那个来自瑞典的电话时,你有点懵了。
you find out? Well, it's a very surreal moment, obviously. Everything about it is surreal. The way they tell you: they tell you about ten minutes before it all goes live. You're sort of shell-shocked when you get that call from Sweden.
那是每个科学家都梦想的电话。然后颁奖典礼是在瑞典与王室共度整整一周。太棒了。显然,这个传统已经持续了一百二十年。最令人惊叹的部分是他们从保险库的保险箱里拿出这本诺贝尔之书,你可以把你的名字签在,你知道,所有其他伟人的旁边。
It's the call every scientist dreams about. And then the ceremony is a whole week in Sweden with the royal family. It's amazing. Obviously, it's been going for a hundred and twenty years. And the most amazing bit is that they bring out this Nobel book from the vault, and you get to sign your name next to, you know, all the other greats.
所以这是一个相当不可思议的时刻,翻看其他页面,看到费曼、居里夫人、爱因斯坦、尼尔斯·玻尔,你继续往回翻,然后你就能把你的名字写进那本书里。这太不可思议了。
So it's quite an incredible moment, leafing back through the other pages and seeing Feynman and Marie Curie and Einstein and Niels Bohr, and you just carry on going backwards, and then you get to put your name in that book. It's incredible.
你之前有没有预感自己会被提名,并且这个奖项可能会落到你头上?
Did you have an inkling you had been nominated and that this might be coming your way?
嗯,你会听到一些传言。实际上在当今这个时代,他们能如此保密真是令人惊讶。但这有点像瑞典的国宝。正如你所听到的,也许AlphaFold这类成就是值得获得这种认可的。他们既看重科学突破,也看重在现实世界中的影响力。
Well, you hear rumors. It's amazing, actually, how locked down it is, how they keep it so quiet in today's age. But it's sort of a national treasure for Sweden. And you hear that maybe AlphaFold is the kind of thing that would be worthy of that recognition. They look for real-world impact as well as the scientific breakthrough.
而这可能需要二三十年的时间才能显现。所以你永远不知道它何时会到来,甚至是否会到来。这真是太神奇了。
And that can take twenty or thirty years to arrive. So you just never know how soon it's going to come, or whether it will come at all. It's amazing.
好吧,恭喜你。
Well, congrats.
是的,谢谢你。
Yeah. Thank you.
也谢谢你。几周前你还让我和它合了影,我们拍下来了。那是我会珍藏的记忆。DeepMind在Alphabet内部是什么角色?Alphabet是一个庞大的组织,拥有众多业务部门。
And thank you. You let me take a picture with it a few weeks ago. That's something I'll cherish. What is DeepMind within Alphabet? Alphabet is a sprawling organization with sprawling business units.
DeepMind是什么?你们负责什么?
What is DeepMind? What are you responsible for?
嗯,DeepMind如今已经成为Google DeepMind。几年前我们基本上合并了谷歌和Alphabet旗下所有的不同AI项目,包括DeepMind。把所有力量整合在一起,将各个团队的优势汇集到一个部门中。现在描述它的方式就是,我们是整个谷歌和整个Alphabet的引擎室。所以Gemini是我们正在构建的主要模型,但还有许多其他模型,比如视频模型和交互式世界模型,我们现在将它们接入到谷歌的各个产品中。
Well, we see DeepMind now as Google DeepMind, as it's become. A couple of years back we merged all of the different AI efforts across Google and Alphabet, including DeepMind, bringing the strengths of all the different groups together into one division. And really, the way I describe it now is that we're the engine room of the whole of Google and the whole of Alphabet. So Gemini is the main model we're building, but we also build many other models, video models and interactive world models, and we plug them in all across Google now.
所以几乎每个产品、每个界面都嵌入了我们的AI模型。现在有数十亿人通过AI概览、AI模式或Gemini应用与Gemini模型互动。而这仅仅是个开始。我们正将其整合到Workspace、Gmail等产品中。所以这对我们来说是一个绝佳的机会,既能进行前沿研究,又能立即将其推送给数十亿用户。
So pretty much every product, every surface, has one of our AI models in it. Billions of people now interact with Gemini models, whether through AI Overviews, AI Mode, or the Gemini app. And that's just the beginning: we're incorporating it into Workspace, into Gmail, and so on. So it's a fantastic opportunity for us to do cutting-edge research and then immediately ship it to billions of users.
那么有多少人,他们的背景如何?是科学家、工程师吗?你手下5000人的构成是怎样的?
And how many people, and what's the profile? Are these scientists, engineers? What's the makeup of your 5,000 people
在你们机构?在Google DeepMind。我想主要是80%以上是工程师和博士研究员。所以是的,大约有三四千人。
In your org? In Google DeepMind, it's predominantly, I guess, 80%-plus engineers and PhD researchers. So yeah, about 3,000 or 4,000.
所以模型正在不断演进,有很多新模型出现,还有新型的模型类别。前几天,你们发布了这个Genie世界模型。是的。那么Genie世界模型是什么?我想我们有一段视频。
So there's an evolution of models, with a lot of new models coming out, and also new classes of models. The other day, you released this Genie world model. Yes. So what is the Genie world model? And I think we have a video of it.
值得一看吗?我们可以现场讨论一下。
Is it worth looking at? And we can talk about it live.
是的。我们可以看一段演示。
Yeah. We can watch it.
我觉得你必须亲眼看到才能理解,因为它太非凡了。我们能播放视频吗?然后Demis可以简单解说一下我们看到的内容。
I think you have to see it to understand it, because it's so extraordinary. Can we pull up the video? Then Demis can narrate a little of what we're looking at.
你们看到的不是游戏或视频。它们是世界。每一个都是由Genie 3生成的可交互环境,这是世界模型的新前沿。通过Genie 3,你可以使用自然语言生成各种世界并交互探索,全部只需一个文本提示。
What you're seeing are not games or videos. They're worlds. Each one of these is an interactive environment generated by Genie 3, a new frontier for world models. With Genie 3, you can use natural language to generate a variety of worlds and explore them interactively, all with a single text prompt.
是的。所有这些视频,所有这些你看到的交互世界,实际上有人可以控制视频。它不是静态视频。它只是由一个文本提示生成,然后人们能够使用方向键和空格键控制这个3D环境。所以你在这里看到的一切,所有这些像素都是实时生成的。
Yeah. So in all of these videos, all these interactive worlds you're seeing, someone is actually controlling the video. It's not a static video. It's generated from a text prompt, and then people are able to control the 3D environment using the arrow keys and the space bar. So everything you're seeing here, all these pixels, is being generated on the fly.
在玩家(也就是与之交互的人)去到世界的那个部分之前,它们并不存在。所有这些丰富的细节都是如此。然后你马上会看到,这完全是生成的,不是真实视频。这是生成出来的画面:有人在粉刷房间,在墙上画了些东西,然后玩家会向右看,再回头看。
They don't exist until the player, the person interacting with it, goes to that part of the world. All of this richness is generated that way. And you'll see in a second that this is fully generated; it's not real video. This is a generation of someone painting their room: they paint some stuff on the wall, and then the player is going to look to the right and then look back.
所以现在世界的这部分之前不存在,现在它存在了。然后他们回头看,看到他们刚刚留下的相同涂鸦痕迹。再次强调,这完全是,你能看到的每一个像素都是完全生成的。然后你可以输入诸如“穿着鸡装的人”或“喷气滑雪板”之类的东西,它会实时将它们包含在场景中。所以,我认为,你知道,这真的相当令人震撼。
So this part of the world didn't exist before, and now it exists. And then they look back and see the same paint marks they left just earlier. And again, every pixel you can see is fully generated. You can also type things like "person in a chicken suit" or "jet ski", and it will include them in the scene in real time. It's quite mind-blowing, really.
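The behaviour described here, content that does not exist until the player turns toward it yet stays the same when they look back, can be illustrated with a toy lazily generated world. This is a minimal sketch under loud assumptions: it uses an explicit per-direction cache, and the class name `LazyWorld` and its methods are hypothetical. The interview does not say how Genie 3 keeps its world consistent, and a generative model would not store scenes as strings in a dictionary.

```python
import random


class LazyWorld:
    """Toy world whose regions are generated on first visit.

    Hypothetical illustration only; not how Genie 3 works internally.
    """

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)      # seeded so runs are reproducible
        self.regions: dict[str, str] = {}   # view direction -> generated content

    def look(self, direction: str) -> str:
        # A region only comes into existence the first time it is viewed.
        if direction not in self.regions:
            self.regions[direction] = f"scene-{self.rng.randint(0, 9999)}"
        return self.regions[direction]

    def paint(self, direction: str, mark: str) -> None:
        # Player edits are appended to whatever was generated there,
        # so they are still visible when the player looks back.
        self.regions[direction] = self.look(direction) + f"+{mark}"


world = LazyWorld()
world.paint("front", "stripe")  # paint the wall in front
print(len(world.regions))       # only one region exists so far
world.look("right")             # turning right generates a new region
print(world.look("front"))      # looking back shows the same paint mark
```

A real world model has no such lookup table; the sketch only makes the "generated on demand, remembered afterward" contract concrete.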
但我觉得看这个时难以理解的是,因为我们所有人都玩过有3D元素的视频游戏,当你身处沉浸式世界时。是的。但这里没有创建任何对象。没有渲染引擎。你没有使用Unity或Unreal这些3D渲染引擎。
But I think what's hard to grok when looking at this is that we've all played video games with a 3D element, where you're in an immersive world. Yeah. But here no objects have been created. There's no rendering engine. You're not using Unity or Unreal, which are 3D rendering engines.
是的。这实际上只是二维图像,由AI实时渲染生成。
Yeah. This is actually just 2D images. Yeah. They're being rendered, created on the fly, by the AI.
这个模型正在逆向工程直观物理学。它观看了数百万个关于世界的视频和YouTube视频等资料,仅凭这些就逆向推导出了世界的运作方式。虽然还不完美,但它能在无数不同世界中为用户生成持续一两分钟的一致性交互。后续还有一些视频,你可以控制海滩上的狗或水母——这不仅仅局限于人类事物。
This model is reverse engineering intuitive physics. It's watched many millions of videos, YouTube videos and other things about the world, and just from that, it has reverse engineered how a lot of the world works. It's not perfect yet, but it can generate a consistent minute or two of interaction for you as the user in many, many different worlds. There are some videos later on where you can control a dog on a beach or a jellyfish, so it's not limited to just human things.
因为3D渲染引擎的工作原理是:程序员编写所有物理定律的程序,比如光线如何从物体反射。你创建一个3D物体,光线进行反射,最终我看到的视觉画面是由软件渲染的,因为它包含了所有编程逻辑。
Because the way a 3D rendering engine works is that the programmer programs in all the laws of physics. How does light reflect off an object? You create a 3D object, light reflects off it, and what I see visually is rendered by the software because it's got all the programming. Yeah.
如何创建物理效果,如何实现物理模拟。但这个模型仅仅通过视频训练就自行领悟了这一切。
How to create physics, how to do physics. But this was just trained off of video and figured it all out.
没错。它通过视频和游戏引擎的合成数据进行训练,完全靠逆向工程实现。这个项目让我倍感亲切,同时也十分震撼——因为在90年代我职业生涯早期,我曾编写电子游戏、游戏AI和图形引擎。我记得手动编程所有多边形和物理引擎有多困难,现在看到它能毫不费力地完成,实在令人惊叹。
Yeah. It was trained on video and some synthetic data from game engines, and it just reverse engineered it. This project is very close to my heart, but it's also quite mind-blowing, because in the nineties, early in my career, I used to write video games, AI for video games, and graphics engines. I remember how hard it was to do this by hand, programming all the polygons and the physics engines. It's amazing to see this do it effortlessly.
水面的所有反射效果、材质的流动方式以及物体的行为表现——它全部都能开箱即用地实现。
All of the reflections on the water and the way materials flow and objects behave. And it's just doing that all out of the box.
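For contrast with the learned approach, here is roughly what "programming how light reflects off an object" looks like in a conventional renderer. The example uses Lambert's cosine law for diffuse surfaces, a textbook shading rule chosen purely for illustration; the interview does not name any specific lighting model, and the function names are hypothetical. A hand-built engine evaluates rules like this for every surface point, whereas the model discussed above is described as learning the visual result from video instead.

```python
import math


def normalize(v: tuple[float, float, float]) -> tuple[float, float, float]:
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return (v[0] / length, v[1] / length, v[2] / length)


def lambert_diffuse(normal, light_dir, albedo: float = 1.0) -> float:
    """Diffuse brightness = albedo * max(0, N . L) for unit vectors N and L.

    `light_dir` points from the surface toward the light; surfaces facing
    away from the light (negative dot product) receive no direct light.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    cos_theta = sum(a * b for a, b in zip(n, l))
    return albedo * max(0.0, cos_theta)


# A surface facing straight up, lit from directly above, at 45 degrees,
# and from below.
up = (0.0, 0.0, 1.0)
print(lambert_diffuse(up, (0.0, 0.0, 1.0)))   # full brightness
print(lambert_diffuse(up, (1.0, 0.0, 1.0)))   # about 0.707
print(lambert_diffuse(up, (0.0, 0.0, -1.0)))  # unlit
```

Real engines layer many such hand-written rules (specular highlights, shadows, reflections on water), which is the complexity the speakers contrast with learning from video.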
我觉得很难形容这个模型解决了多少复杂性。这真的非常非常令人震撼。这将引领我们走向何方?如果快进这个模型的发展
I think it's hard to describe how much complexity was solved for with that model. It's really, really mind-blowing. Where does this lead us? So fast-forward this model
到第五代。我们构建这类模型的原因在于——我们始终认为,虽然像Gemini这样的常规语言模型在进步,但从Gemini诞生之初我们就希望它是多模态的。我们希望它能接收任何类型的输入(图像、音频、视频)并输出任何内容。
to Gen 5. Yeah. So the reason we're building these kinds of models is this: we're obviously progressing on the normal language models, like our Gemini model, but from the beginning we wanted Gemini to be multimodal. We wanted it to take any kind of input, images, audio, video, and to be able to output anything.
因此我们对此非常关注,因为要构建真正通用的人工智能(AGI),AGI系统必须理解我们周围的现实物理世界,而不仅仅是语言或数学的抽象世界。这当然是机器人技术能工作的关键,也可能是当前所缺失的。还有像智能眼镜这样的设备,一个能在日常生活中协助你的智能眼镜助手,它必须理解你所处的物理环境以及世界的直观物理规律。
And so we've been very interested in this, because for an AI to be truly general, to build AGI, we feel the AGI system needs to understand the physical world around us, not just the abstract world of language or mathematics. And of course, that's what's critical for robotics to work; it's probably what's missing from robotics today. It also matters for things like smart glasses: a smart-glasses assistant that helps you in your everyday life has to understand the physical context you're in and how the world works, its intuitive physics.
所以我们认为,构建这类模型(如Genie模型和我们的顶级文生视频模型Veo),正是我们在构建理解世界动态和物理规律的世界模型的具体体现。如果你能生成它,那就说明你的系统理解了这些动态规律。
So we think that these types of models, these Genie models and also Veo, the best text-to-video model, are expressions of us building world models that understand the dynamics and the physics of the world. If you can generate it, then that's an expression of your system understanding those dynamics.
而这最终会引领我们进入机器人技术的世界,其中一个方面、一个应用。但也许我们可以谈谈这个。目前视觉、语言、动作模型的最先进水平是什么?就是一个通用系统,一个盒子,一台机器,是的。它可以通过摄像头观察世界,然后我可以用语言。
And that leads to a world of robotics, ultimately: one aspect, one application. But maybe we can talk about that. What is the state of the art with vision-language-action models today? So, a generalized system, a box, a machine (Yeah.) that can observe the world with a camera, and then I can use language.
我可以用文本或语音告诉它我想要你做什么,然后它就知道如何在物理世界中通过身体动作来执行某些任务
I can use text or speech to tell it what I want it to do, and then it knows how to act physically to do something in the physical world.
是的,没错。如果你看看我们的Gemini Live版本的Gemini,你可以举起手机对准周围的世界。我建议大家都试试。它已经对物理世界的理解有点神奇了。
Yeah. That's right. So if you look at the Gemini Live version of Gemini, where you can hold up your phone to the world around you (I recommend any of you try it), it's kind of magical what it already understands about the physical world.
你可以把下一步想象成将其整合到某种更便捷的设备中,比如眼镜。然后它就会成为一个日常助手。当你在街上行走时,它能够向你推荐东西,或者我们可以将其嵌入Google Maps。而在机器人技术方面,我们构建了名为Gemini机器人模型的东西,这是一种用额外机器人数据微调过的Gemini。最酷的是,我们在夏天发布了一些演示,你可以看到我们在桌面上设置了两个机械手与物体互动的场景。
You can think of the next step as incorporating that into some sort of more handy device, like glasses. Then it will be an everyday assistant. It'll be able to recommend things to you as you're walking the streets, or we can embed it into Google Maps. And then with robotics, we've built something called Gemini Robotics models, which are essentially Gemini fine-tuned with extra robotics data. What's really cool about that, and we released some demos of this over the summer, is that we've got these tabletop setups of two robotic hands interacting with objects on a table.
你可以直接和机器人对话。比如你可以说,把黄色物体放进红色桶里之类的,它会将那个指令、那个语言指令解释为运动动作。这就是多模态模型的力量,而不仅仅是特定于机器人的模型。对吧。是它能够将现实世界的理解带入你与它的互动方式中。
And you can just talk to the robot. So you can say, "Put the yellow object into the red bucket," or whatever it is, and it will interpret that language instruction into motor movements. And that's the power of a multimodal model rather than just a robotics-specific model. Right. It will be able to bring real-world understanding to the way you interact with it.
所以最终,它将成为你需要的用户界面和体验,同时也是机器人安全导航世界所需的理解。
So in the end, it will be the UI/UX that you need, as well as the understanding the robots need to navigate the world safely.
我问过Sundar这个问题。这是否意味着最终你可以构建相当于,称之为,要么是一个Unix,像一个操作系统层,或者像一个用于通用机器人技术的Android,到那时,如果它在足够多的设备上运行得足够好,机器人技术设备、公司和产品将会突然在世界上激增。是的。因为存在这种可以普遍做到这一点的软件。
I asked Sundar this. Does that mean that ultimately you could build the equivalent of, call it, either a Unix, an operating system layer, or an Android for generalized robotics, at which point, if it works well enough across enough devices, there will be a proliferation of robotics devices, companies, and products that suddenly take off in the world (Yep.) because this software exists to do this generally?
正是如此。这肯定是我们正在追求的一种策略,一种类似Android的策略,如果你愿意这么说的话,几乎是一个跨机器人平台的操作系统层。但将我们的最新模型与特定机器人类型和设计进行垂直整合,并进行某种端到端学习,也有一些相当有趣的地方。所以两者实际上都很有趣,我们正在同时推进这两种策略。
Exactly. That's certainly one strategy we're pursuing: a kind of Android play, if you like, almost an OS layer across robotics. But there are also some quite interesting things about vertically integrating our latest models with specific robot types and robot designs, with some kind of end-to-end learning of that too. So both are actually pretty interesting, and we're pursuing both strategies.
你认为人形机器人是一种好的形态吗?这在现实世界中有意义吗?是的。因为有些人批评说,人形适合人类,是因为我们本来就要做很多不同的事情。但如果我们只想解决某一个问题,比如折叠衣物、洗碗或打扫房子,可能有更合适的其他形态。
Do you think humanoid robots are a good form factor? Does that make sense in the world? Yeah. Because some folks have criticized it: the humanoid form is good for humans because we're meant to do lots of different things, but if we want to solve one problem, there may be a different form factor to fold laundry or do dishes or clean the house or whatever.
是的。我认为两者都会有其一席之地。实际上,我大约五到十年前曾持这种观点,认为我们会为特定任务设计形态特定的机器人。我认为在工业领域,工业机器人肯定会是这样,你可以为特定任务优化机器人,无论是在实验室还是生产线上。你会需要相当不同类型的机器人。
Yeah. I think there's going to be a place for both. Actually, five or ten years ago I used to be of the opinion that we'd have form-specific robots for certain tasks. I think industrial robots will definitely be like that, where you can optimize the robot for the specific task, whether it's in a laboratory or on a production line. You'd want quite different types of robots.
另一方面,对于通用或个人用途的机器人技术以及与日常世界的交互,人形形态可能相当重要,因为我们设计物理世界时本就是围绕人类需求。台阶、门道等所有为我们自身设计的事物,与其改变现实世界中的所有这些东西,不如设计出能与我们已构建世界无缝协作的形态。因此我认为有理由相信,人形形态对于这类任务可能至关重要。但我也认为专业机器人形态同样有其用武之地。
On the other hand, for general-use or personal-use robotics and just interacting with the ordinary world, the humanoid form factor could be pretty important, because of course we've designed the physical world around us for humans. Steps, doorways, all the things we've designed for ourselves: rather than changing all of those in the real world, it might be easier to design the form factor to work seamlessly with the way we've already designed the world. So I think there's an argument to be made that the humanoid form factor could be very important for those types of tasks. But I think there is also a place for specialized robotic forms.
你对未来五年、七年内的数量级有何看法?是数亿、数百万还是数千?我的意思是,你脑海中是否有一个具体的愿景?
Do you have a view on the numbers, whether hundreds of millions, millions, or thousands, over the next five or seven years? I mean, in your head, do you have a vision on that? Yeah.
我确实有。我在这方面投入了大量时间。我认为机器人技术仍处于早期阶段。未来几年内,机器人领域将会出现真正令人惊叹的突破时刻。但现阶段算法仍需进一步发展。
I do. And I spend quite a lot of time on this. I feel we're still a little bit early on robotics. I think in the next couple of years, there'll be a real wow moment with robotics. But I think the algorithms need a bit more development.
这些机器人模型所基于的通用模型需要变得更优秀、更可靠,并更好地理解周围世界。我认为这将在未来几年内实现。硬件方面,我认为最终社会中会有数百万台机器人帮助提高生产力。但当我们与硬件专家交流时,关键在于:何时具备适合规模化生产的硬件水平?因为当你开始建造工厂生产数万、数十万台特定类型机器人时,快速迭代更新机器人设计会变得更为困难。
The general-purpose models that these robotics models are built on still need to be better, more reliable, and better at understanding the world around them. I think that will come in the next couple of years. And then on the hardware side, I think eventually we will have millions of robots helping society and increasing productivity. But the key there, when you talk to hardware experts, is at what point do you have the right level of hardware to go for the scaling option? Because effectively, when you start building factories to make tens of thousands or hundreds of thousands of a particular robot type, it's harder for you to update and quickly iterate on the robot design.
这就好比一个时机选择问题:如果过早投入量产,六个月后可能就会出现更可靠、更优秀、更灵巧的新一代机器人。
So it's one of those questions where, if you call it too early, the next generation of robot might be invented in six months' time, one that's just more reliable, better, and more dexterous.
听起来用计算机来类比的话,我们现在相当于处于七十年代的PC DOS阶段
It sounds like, using a computing analogy, we're kind of in the seventies-era, PC DOS kind of
是的,有可能。但我觉得虽然所处阶段相似,不同的是现在十年内的进展可能压缩在一年内完成。对吧?
Yeah. Potentially. Maybe that's where we are, except that ten years probably happens in one year. Right. So
没错。所以也许"四"会是那样的年份之一。
Right. So maybe four might be one of those years.
对,正是如此。
Right. Exactly.
那么我们来谈谈其他应用领域,特别是科学方面。作为诺贝尔奖得主科学家,您始终心系科研。我一直认为AI最伟大的价值在于解决人类凭借现有技术、能力和大脑无法攻克的难题,从而释放无限潜力。您对哪些科学领域和突破最为期待?
Yeah. So let's talk about other applications, particularly in science, which is close to your heart as a scientist, as a Nobel Prize-winning scientist. I always felt that the greatest things we would be able to do with AI would be solving the problems that are intractable to humans with our current technology, capabilities, brains, and whatnot. And we can unlock all of this potential. What are the areas of science and breakthroughs in science that you're most excited about?
我们使用哪些类型的模型来实现这一目标?
And what kinds of models do we use to get there?
是的。我的意思是,用AI加速科学发现并助力人类健康等领域,正是我毕生致力于AI的原因。我认为这是AI能做的最重要的事情。如果以正确的方式构建AGI,它将成为科学的终极工具。我们在DeepMind已经展示了诸多成果,最著名的当属AlphaFold,但实际上我们的AI系统已应用于多个科学分支——无论是材料设计、协助控制等离子体和聚变反应堆、天气预报,还是解决数学奥林匹克竞赛难题。
Yeah. I mean, using AI to accelerate scientific discovery and help with things like human health is the reason I've spent my whole career on AI. I think it's the most important thing we can do with AI, and I feel that if we build AGI in the right way, it will be the ultimate tool for science. We've been showing the way on a lot of that at DeepMind, most famously with AlphaFold, but we've actually applied our AI systems to many branches of science, whether it's materials design, helping to control plasma in fusion reactors, predicting the weather, or solving Maths Olympiad problems.
同类型系统经过额外微调后,基本上能解决许多这类复杂问题。我认为我们只是触及了AI潜力的表面,目前仍存在一些缺失。如今的AI尚不具备真正的创造力——它还不能提出新的猜想或假设。或许能证明你给定的命题,但无法自主产生新思想或新理论。因此我认为这实际上将成为一项关键测试标准。
And the same types of systems, with some extra fine-tuning, can basically solve a lot of these complex problems. So I think we're just scratching the surface of what AI will be able to do, and there are some things that are missing. AI today, I would say, doesn't have true creativity, in the sense that it can't yet come up with a new conjecture or a new hypothesis. It can maybe prove something that you give it, but it's not able to come up with a new idea or a new theory itself. So I think that would actually be one of the tests. What is that?
像人类那样的创造力?是的。什么是创造力
Creativity as a human? Yeah. What is creativity?
我认为这是那种直觉飞跃,历史上最伟大的科学家和艺术家常因此备受赞誉。可能通过类比或类比推理实现。心理学和神经科学有多种理论探讨人类科学家如何做到这一点。一个很好的测试是:给现代AI系统设置1901年的知识截止点,看它能否像爱因斯坦在1905年那样提出狭义相对论。如果它能做到,那我们就接近了真正重要的突破,或许意味着我们即将实现AGI。
Well, I think it's the sort of intuitive leaps that we often celebrate in the best scientists in history, and in artists, of course. Maybe it's done through analogy or analogical reasoning; there are many theories in psychology and neuroscience as to how we as human scientists do it. But a good test would be something like giving one of these modern AI systems a knowledge cutoff of 1901 and seeing if it can come up with special relativity, as Einstein did in 1905. If it's able to do that, then I think we're onto something really important, where perhaps we're nearing AGI.
另一个例子是我们的AlphaGo程序击败围棋世界冠军。它不仅十年前获胜,还发明了围棋前所未有的新策略——著名的第二局第37手至今仍被研究。但AI系统能否设计出如围棋般优雅、令人满意、具有美学价值的全新游戏?而不仅仅是新策略。
Another example would be our AlphaGo program, which beat the world champion at Go. Not only did it win, back ten years ago, it invented new strategies for Go that had never been seen before, most famously Move 37 in game two, which is still studied. But can an AI system come up with a game as elegant, as satisfying, and as aesthetically beautiful as Go? Not just a new strategy.
目前这些问题的答案是否定的。因此我认为真正通用系统(AGI系统)缺失的一点,是它应该也能完成这类创造性工作。
And the answer to those questions at the moment is no. So that's one of the things I think is missing from a true general system, an AGI system: it should be able to do those kinds of things as well.
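The 1901 knowledge-cutoff test proposed above hinges on one practical step: building a training corpus that contains nothing written after the cutoff year. A minimal sketch of that filtering step follows. The `Doc` type, the `apply_cutoff` helper, and the three-item corpus are illustrative assumptions, not part of any setup described in the interview, and a real corpus would need far more careful dating than a single year field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    year: int  # publication year; assumed known and reliable


def apply_cutoff(corpus: list[Doc], cutoff_year: int) -> list[Doc]:
    """Keep only documents published in or before the cutoff year."""
    return [doc for doc in corpus if doc.year <= cutoff_year]


corpus = [
    Doc("Maxwell, A Treatise on Electricity and Magnetism", 1873),
    Doc("Michelson and Morley, aether-drift experiment", 1887),
    Doc("Einstein, On the Electrodynamics of Moving Bodies", 1905),
]

training_set = apply_cutoff(corpus, cutoff_year=1901)
for doc in training_set:
    print(doc.year, doc.title)  # the 1905 special relativity paper is excluded
```

The filter itself is the easy part; passing the test would mean a model trained only on such data proposes special relativity on its own.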
能否分析缺失的具体要素,并联系达里奥、萨姆等人关于AGI几年内将实现的观点?
Can you break down what's missing, and maybe relate it to the point of view shared by Dario, Sam, and others that AGI is a few years away?
是的。
Yeah.
您不认同这种观点吗?能否帮助我们理解,在您对系统架构的理解中,具体缺乏哪些关键结构要素?
Do you not subscribe to that belief? And maybe help us understand: in your understanding of the system architecture, what's lacking?
那么,我认为根本问题在于我们能否模仿这些直觉飞跃,而不是渐进式进步——最优秀的人类科学家似乎能做到这一点。我常说,伟大科学家与优秀科学家的区别在于,当然,两者技术能力都很强。但伟大科学家更具创造力。因此,他们可能会从另一个学科领域发现某种模式,能够与他们试图解决的领域进行类比或某种模式匹配。我认为有一天人工智能将能做到这一点,但它目前还不具备进行那种突破所需的推理能力和某些思维能力。
Well, I think the fundamental question is whether we can mimic the intuitive leaps, rather than incremental advances, that the best human scientists seem to be able to make. I always say that what separates a great scientist from a good scientist is this: they're both very technically capable, of course, but the great scientist is more creative. Maybe they'll spot a pattern from another subject area that has an analogy, or some sort of pattern match, to the area they're trying to solve. And I think one day AI will be able to do this, but it doesn't yet have the reasoning capabilities and some of the thinking capabilities that are going to be needed to make that kind of breakthrough.
我还认为我们缺乏一致性。你经常听到一些竞争对手谈论我们今天的现代系统是博士级智能。我认为这是无稽之谈。它们不是博士级智能。它们确实具备一些博士水平的能力。
I also think that we're lacking consistency. You often hear some of our competitors describe the modern systems we have today as PhD intelligences. I think that's nonsense. They're not PhD intelligences. They have some capabilities that are PhD level.
但它们并不具备普遍能力,而这正是通用智能应有的特质——在各个领域都能达到博士水平的表现。事实上,正如我们与当今聊天机器人互动时所知,如果你以某种方式提问,它们甚至会在高中数学和简单计数上犯低级错误。这对于真正的AGI系统来说是不应该发生的。因此我认为,我们可能还需要五到十年时间才能开发出具备这些能力的AGI系统。另一个缺失的能力是持续学习,即在线教授系统新知识或以某种方式调整其行为的能力。
But they're not generally capable of performing across the board at the PhD level, and that's exactly what general intelligence should be. In fact, as we all know from interacting with today's chatbots, if you pose a question in a certain way, they can make simple mistakes even with high school maths and simple counting. That shouldn't be possible for a true AGI system. So I would say we are maybe five to ten years away from having an AGI system that's capable of doing those things. Another thing that's missing is continual learning, the ability to teach the system something new online or adjust its behavior in some way.
所以我认为,许多这些核心能力仍然缺失。也许扩展规模会让我们达到目标,但如果要我打赌,我认为可能还需要一两个突破性进展,这些突破将在未来五年左右实现。
So a lot of these core capabilities, I think, are still missing. Maybe scaling will get us there, but if I were to bet, I think there are probably one or two missing breakthroughs that are still required and that will come over the next five or so years.
与此同时,一些报告和评分系统似乎显示了两个现象。第一,也许——如果我说错了请纠正——大型语言模型的性能正在趋同。第二,可能是每一代性能改进的速度正在放缓或趋于平稳。这两个说法大体上是正确的吗?还是不太准确?
In the meantime, some of the reports and scoring systems that are used seem to be demonstrating two things. One, perhaps (and tell me if we're wrong on this), a convergence in the performance of large language models. And two, perhaps, a slowing down or flatlining of performance improvements with each generation. Are those two statements generally true, or not so much?
不。我的意思是,我们在内部并没有看到这种情况,我们仍然看到巨大的进步速度。而且,我们是从更广泛的角度来看待事物的。你看到我们的Genie模型和Veo模型以及
No. I mean, we're not seeing that internally; we're still seeing a huge rate of progress. But also, we're looking at things more broadly. You see that with our Genie models and Veo models and
Nano Banana太疯狂了。简直不可思议。
Nano Banana is insane. It's bananas.
是的。确实很疯狂。真是
Yes. It's bananas.
有人用过吗?有人用过Nano Banana吗?太不可思议了。对吧?我是说,没错。
Can I see who's used it? Has anyone used Nano Banana? It's incredible. Right? I mean, yeah.
我是个书呆子,小时候就用Adobe Photoshop和Kai's Power Tools,我还跟你说过Bryce 3D。是的。所以看到图形系统以及识别其中发生的事情,真是令人震撼。
I'm a nerd who used Adobe Photoshop as a kid, and Kai's Power Tools, and, as I was telling you, Bryce 3D. Yes. So seeing graphics systems like this, and recognizing what's going on there, was just, yeah, mind-blowing.
嗯,我认为很多创意工具的未来就是你会与它们产生共鸣或直接对话。它的连贯性足够强,就像Nano Banana,它令人惊叹的地方在于这是一个图像生成器。它是最顶尖的,你知道,是行业领先的。但让它如此出色的原因之一就是其一致性。它能够根据指令理解你想要修改的部分,同时保持其他一切不变。
Well, I think the future of a lot of these creative tools is that you're just going to vibe with them or just talk to them, and they'll be consistent enough. With Nano Banana, what's amazing is that it's an image generator that's state of the art, best in class, but one of the things that makes it so great is its consistency. It's able to follow instructions about what you want changed and keep everything else the same.
因此你可以与它迭代,最终得到你想要的输出结果。我认为,这就是许多创意工具未来的发展方向,也指明了趋势。人们喜欢它,也喜欢用它进行创作。
And so you can iterate with it and eventually get the kind of output that you want. And that's, I think, what the future of a lot of these creative tools is going to be and signals the direction. And people love it and they love creating with it.
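The consistency praised here, changing only what the instruction asks for and leaving everything else identical, can be checked mechanically if you know which region an edit was supposed to touch. The sketch below is a hypothetical evaluation helper, not anything Nano Banana exposes: images are assumed to be small grids of RGB tuples, and `mask` marks with `True` the pixels the edit was allowed to change.

```python
Pixel = tuple[int, int, int]  # (R, G, B), each 0-255
Image = list[list[Pixel]]
Mask = list[list[bool]]       # True where the edit is allowed to change pixels


def fraction_preserved(before: Image, after: Image, mask: Mask) -> float:
    """Share of pixels outside the mask that are unchanged after the edit.

    1.0 means the edit left everything outside the requested region intact.
    """
    kept = total = 0
    for row_b, row_a, row_m in zip(before, after, mask):
        for pb, pa, editable in zip(row_b, row_a, row_m):
            if not editable:
                total += 1
                kept += pb == pa  # True counts as 1, False as 0
    return kept / total if total else 1.0


white, red, blue = (255, 255, 255), (255, 0, 0), (0, 0, 255)
before = [[white, white], [white, white]]
mask = [[True, False], [False, False]]       # only the top-left pixel may change
good_edit = [[red, white], [white, white]]   # changed only the masked pixel
sloppy_edit = [[red, blue], [white, white]]  # also changed an unmasked pixel

print(fraction_preserved(before, good_edit, mask))    # 1.0
print(fraction_preserved(before, sloppy_edit, mask))  # 2 of 3 unmasked pixels kept
```

Exact pixel equality is deliberately strict; a practical metric would likely tolerate small numerical differences, but the idea of scoring only the untouched region is the same.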
所以我认为创意的民主化真的很重要。我记得小时候不得不买Adobe Photoshop的书籍,然后阅读学习如何操作。对吧。从图片中移除某些内容,如何填充,羽化等等所有这些。现在,任何人都可以用Nano Banana做到,他们只需向软件解释他们想要什么,它就会执行。
So the democratization of creativity, I think, is really important. I remember having to buy books on Adobe Photoshop as a kid, and you'd read them to learn how to (Right.) remove something from an image, how to fill it in, feathering, and all this stuff. Now anyone can do it with Nano Banana. They can just explain to the software what they want it to do, and it does it.
是的。我认为你会看到两件事:一是这些工具的民主化,让每个人都能使用和创作,而无需学习极其复杂的用户体验和界面,就像我们过去必须做的那样。但另一方面,我认为我们也在与电影制作人、顶级创作者和艺术家合作。所以他们帮助我们设计这些新工具应该是什么样子,他们想要什么功能。比如我的朋友,了不起的导演达伦·阿罗诺夫斯基。
Yeah. I think you're going to see two things. One is the democratization of these tools, so everybody can just use them and create with them without having to learn incredibly complex UXs and UIs, like we had to in the past. But on the other hand, we're also collaborating with filmmakers, top creators, and artists, so they're helping us design what these new tools should be and what features they'd want. People like the director Darren Aronofsky, who's a good friend of mine and an amazing director.
他和他的团队一直在使用Veo和我们的一些其他工具制作电影。通过观察他们并与他们合作,我们学到了很多。我们发现,它也能极大地增强和加速顶级专业人士的能力。因为最好的创意者,那些专业创意人士,突然能够提高10倍、100倍的生产力。他们可以以极低的成本尝试他们心中的各种想法,然后得到他们想要的美丽成果。
He and his team are making films using Veo and some of our other tools, and we're learning a lot by observing and collaborating with them. What we find is that it also superpowers and turbocharges the best professionals. The best creatives, the professional creatives, are suddenly able to be 10x or 100x more productive. They can try out all sorts of ideas they have in mind at very low cost, and then get to the beautiful thing they wanted.
所以我实际上认为这两方面都是真的。我们正在为日常使用、YouTube创作者等民主化这些工具。但另一方面,在高端领域,那些理解这些工具的人——并不是每个人都能从这些工具中获得相同的输出,这其中也有技巧,以及顶级创意者的视野、讲故事和叙事风格。我认为这让他们真的很享受使用这些工具,并能够以更快的速度迭代。
So I actually think both things are true. We're democratizing it for everyday use, for YouTube creators and so on. But at the high end, not everyone can get the same output out of these tools; there's a skill in that, as well as the vision, the storytelling, and the narrative style of the top creatives. They really enjoy using these tools, and it allows them to iterate way faster.
我们会进入一个每个人都能描述他们感兴趣的内容类型的世界吗?比如‘播放像戴夫·马修斯那样的音乐’,然后它就会播放一些新曲目。
Do we get to a world where each individual describes what sort of content they're interested in? Play me music like Dave Matthews and it'll play some new track.
是的。
Yes.
或者我想玩一个设定在电影《勇敢的心》中的视频游戏,我想是的,沉浸其中体验。我们会达到那个境界吗?还是社会中仍然保持一对多的创作过程?从文化角度来说,这有多重要?我知道这有点哲学,但我觉得很有趣,那就是:我们是否还会拥有那种因为我们共享某人创作的一个故事而存在的讲故事方式?
Or, "I want to play a video game set in the movie Braveheart; I want to (Yes.) be in that," and I just have that experience. Do we end up there, or do we still have a one-to-many creative process in society? How important is that culturally? I know this is a little bit philosophical, but it's interesting to me: are we still going to have storytelling where we all share one story because someone made it?
是的。或者我们每个人都将开始发展并打造属于自己的那种虚拟……
Yeah. Or are we each going to start to develop and build our own kind of virtual...
我实际上预见到一个世界,我经常思考这个问题,因为我在九十年代作为游戏设计师和程序员从游戏行业起步。我认为,我们现在看到的是娱乐未来的开端,可能是一种新的流派或新的艺术形式,其中存在一定程度的共同创作。我仍然认为会有顶级的创意远见者,他们将创造这些引人入胜的体验和动态故事情节,即使使用相同的工具,他们的作品质量也会高于普通人所能达到的水平。因此,数百万人可能会沉浸在这些世界中,但也许他们也能共同创作这些世界的某些部分,或许可以说,主要的创意人员几乎就像是那个世界的编辑。
I actually foresee a world, and I think a lot about this, having started in the games industry as a game designer and programmer in the nineties, where what we're seeing is the beginning of the future of entertainment. Maybe some new genre or new art form, where there's a bit of co-creation. I still think you'll have the top creative visionaries. They will be creating these compelling experiences and dynamic storylines, and they'll be of higher quality than what the everyday person can do, even using the same tools. So millions of people will potentially dive into those worlds, but maybe they'll also be able to co-create certain parts of those worlds, and perhaps the main creative person is almost an editor of that world.
是的。这就是我在未来几年预见的情况,而且我实际上很想通过像Genie这样的技术来亲自探索。
Right. So that's the kind of thing I'm foreseeing in the next few years, and I'd actually like us to explore it ourselves with technologies like Genie.
是的,太不可思议了。那你的时间是怎么安排的?你在Isomorphic吗?也许你可以描述一下Isomorphic。是的。
Right. Incredible. And how are you spending your time? Are you at Isomorphic? Maybe you can describe Isomorphic. Yes.
当然。
Of course.
Isomorphic是什么?你花了很多时间在那里吗?
What is Isomorphic? Are you spending a lot of your time there?
是的。所以我还运营着Isomorphic,这是我们分拆出来的公司,旨在彻底改变药物发现,它建立在我们蛋白质折叠方面的AlphaFold突破之上。当然,了解蛋白质的结构只是药物发现过程的一个步骤。因此,你可以把Isomorphic看作是构建许多相邻的AlphaFold,以帮助设计那些没有副作用但能结合到蛋白质正确位置的化合物。我认为,在未来十年内,我们可以将药物发现从耗时数年、有时甚至十年的过程缩短到可能只需几周甚至几天。
I am. So I also run Isomorphic, which is our spin-out company to revolutionize drug discovery, building on our AlphaFold breakthrough in protein folding. Of course, knowing the structure of a protein is only one step in the drug discovery process. So you can think of Isomorphic as building many adjacent AlphaFolds to help with things like designing chemical compounds that bind to the right place on the protein but don't have any side effects. And I think we could reduce drug discovery from taking years, sometimes a decade, down to maybe weeks or even days over the next ten years.
这太不可思议了。你认为这很快就能进入临床阶段,还是仍处于发现阶段?
That's incredible. Do you think that's in clinic soon or is that still in the discovery phase?
我们目前正在构建这个平台。并且与礼来公司建立了很好的合作伙伴关系,我想你之前请他们的CEO讲过话。是的。还有诺华公司,这些合作都非常棒,我们还有自己的内部药物项目。
We're building up the platform right now, and we have great partnerships with Eli Lilly (I think you had their CEO speaking earlier) and Novartis, which are fantastic, as well as our own internal drug programs.
我认为我们将在明年某个时候进入临床前阶段。
And I think we'll be entering the preclinical phase sometime next year.
所以候选药物会交给制药公司,然后由他们继续推进?
So candidates get handed over to the pharma company, and they then take them forward?
是的,没错。我们正在研究癌症、免疫学和肿瘤学,并且与MD Anderson等机构合作。
That's right. And we're working on cancers, immunology, and oncology, and we're working with places like MD Anderson.
这需要多大程度...我想回到你刚才关于AGI的观点,这与您刚才说的相关。模型可以是概率性的或确定性的。告诉我我是否过于简化了这一点:模型接收输入,然后输出非常具体的内容。就像它有一个逻辑算法,每次输出相同的结果。而它也可以是概率性的,能够改变事物并做出选择。
How much of this requires... I just want to go back to your point about AGI as it relates to what you just said. Models can be probabilistic or deterministic. Tell me if I'm oversimplifying: a deterministic model takes an input and outputs something very specific. It's got a logical algorithm and it outputs the same thing every time. A probabilistic model, by contrast, can vary its output and make selections.
比如有80%的概率选择这个字母,90%的概率选择下一个字母,等等。我们需要在多大程度上开发确定性模型,使之与(例如)分子相互作用背后的物理或化学原理相吻合,以进行药物发现建模?你们在构建多少新颖的确定性模型,与基于数据训练的概率性模型协同工作?
For example, there's an 80% probability I'll select this letter, 90% I'll select this letter next, and so on. How much do we have to develop deterministic models that sync up with, for example, the physics or the chemistry underlying the molecular interactions as you do your drug discovery modeling? How much are you building novel deterministic models that work with the probabilistic models trained on data?
是的,这是一个很好的问题。实际上,目前以及我认为未来五年左右,我们正在构建的或许可以称为混合模型。AlphaFold本身就是一个混合模型,它包含学习组件,也就是您提到的概率性组件,基于神经网络和Transformer等技术。这部分从您提供的任何可用数据中学习。但在许多生物学和化学案例中,并没有足够的数据可供学习。
Yeah, it's a great question. Actually, for the moment, and I think probably for the next five years or so, we're building what you could call hybrid models. AlphaFold itself is a hybrid model: you have the learning component, the probabilistic component you're talking about, which is based on neural networks, transformers, and so on. That component learns from whatever data you have available. But in a lot of cases in biology and chemistry, there isn't enough data to learn from.
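The deterministic-versus-probabilistic distinction raised in the question can be shown with next-token selection. A minimal sketch, assuming a hypothetical four-letter vocabulary and made-up scores: greedy decoding always returns the same letter, while sampling draws letters in proportion to their softmax probabilities. This is a generic illustration, not how AlphaFold or any DeepMind model actually decodes.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Deterministic decoding: always return the index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, rng):
    """Probabilistic decoding: draw an index according to its softmax probability."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

vocab = ["A", "C", "G", "T"]      # hypothetical tiny vocabulary
logits = [2.0, 0.5, 0.1, -1.0]    # hypothetical model scores for the next letter

print([vocab[greedy_pick(logits)] for _ in range(5)])  # prints ['A', 'A', 'A', 'A', 'A']
rng = random.Random(0)            # seeded so a run is reproducible
print([vocab[sample_pick(logits, rng)] for _ in range(5)])  # mostly 'A', but varies
```

Seeding the generator makes a given run repeatable, but the sampled output still depends on the probabilities rather than on a fixed rule, which is the distinction the question draws.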
因此,您还必须融入一些已知的化学和物理规则。例如在AlphaFold中,原子间的键角,以及确保AlphaFold理解原子不能相互重叠等原则。理论上,它可以学习这些,但这会浪费大量学习能力。所以实际上,最好将其作为一种约束。
So you also have to build in some of the rules of chemistry and physics that you already know about: for example, with AlphaFold, the angles of bonds between atoms, and making sure AlphaFold understood that atoms can't overlap with each other, things like that. In theory it could learn that, but it would waste a lot of its learning capacity. So it's actually better to have that as a constraint.
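Building a known physical rule in as a constraint, rather than leaving the network to learn it, can be sketched as an extra penalty term. The snippet below is a simplified, hypothetical illustration, not AlphaFold's actual loss: it penalizes any pair of atoms closer than an assumed minimum distance, so a predicted structure with overlapping atoms costs more regardless of what the data says. The threshold `MIN_DIST` and the `weight` parameter are assumptions chosen for the example.

```python
import itertools
import math

# Hypothetical minimum allowed distance between two atom centres, in angstroms.
MIN_DIST = 3.0

def clash_penalty(coords, min_dist=MIN_DIST):
    """Soft constraint: sum of squared violations for atom pairs closer than min_dist."""
    penalty = 0.0
    for a, b in itertools.combinations(coords, 2):
        d = math.dist(a, b)  # Euclidean distance between the two atoms
        if d < min_dist:
            penalty += (min_dist - d) ** 2
    return penalty

def total_loss(data_loss, coords, weight=1.0):
    """Learned (data-driven) loss plus the hand-written physics penalty; weight is hypothetical."""
    return data_loss + weight * clash_penalty(coords)

atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(clash_penalty(atoms))  # prints 4.0: only the first two atoms are too close
```

Because the penalty is written by hand, the network spends none of its capacity rediscovering that atoms cannot overlap, which is the trade-off described in the answer.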
是的,作为一种约束融入其中。所有混合系统的关键都在于如何结合各部分。AlphaGo是另一个混合系统:神经网络学习围棋以及何种棋形有利,而顶层的蒙特卡洛树搜索负责规划。因此难点在于,如何将学习系统与更多手工定制的系统结合起来,并让它们良好协作?这确实相当棘手。
Yes, as a constraint built in there. Now, the trick with all hybrid systems is how you combine the parts. AlphaGo was another hybrid system: a neural network learned about the game of Go and what kinds of patterns are good, and Monte Carlo tree search on top did the planning. So the trick is, how do you marry up a learning system with a more handcrafted, bespoke system and actually have them work well together? That's pretty tricky to do.
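The division of labour described for AlphaGo, a network proposing moves and a tree search doing the planning, meets in the search's selection rule. A minimal sketch of the PUCT rule popularized by AlphaGo Zero and AlphaZero, assuming hard-coded, hypothetical priors in place of a policy network and omitting node expansion, the value network, and the game itself. The constant `c_puct=1.5` and the `+ 1` inside the square root are illustrative choices, not DeepMind's settings.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    prior: float            # P(s, a): probability a (hypothetical) policy network gives this move
    visits: int = 0         # N(s, a): how often the search has tried this move
    value_sum: float = 0.0  # sum of simulation results seen through this move

    @property
    def q(self) -> float:
        """Mean value Q(s, a); unvisited moves default to 0."""
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(children: dict, c_puct: float = 1.5) -> str:
    """Return the move maximising Q + U; U favours high-prior, rarely visited moves."""
    total = sum(ch.visits for ch in children.values())

    def score(ch: Child) -> float:
        # "+ 1" inside the root lets priors break ties before any visits (an illustrative choice)
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return ch.q + u

    return max(children, key=lambda move: score(children[move]))

def backup(child: Child, value: float) -> None:
    """Record one simulation result for the chosen move."""
    child.visits += 1
    child.value_sum += value

children = {"a": Child(0.6), "b": Child(0.3), "c": Child(0.1)}  # hypothetical priors
first = puct_select(children)    # "a": highest prior, nothing visited yet
backup(children[first], -1.0)    # pretend the simulation from "a" was lost
second = puct_select(children)   # "b": exploration shifts after the poor result
print(first, second)             # prints a b
```

The learned prior steers the first visits, and the search's own statistics take over as results come back, which is one concrete form of the "marrying up" the answer describes.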
您认为这种架构最终会带来AGI所需的突破吗?是否存在需要解决的确定性组件?
Does that sort of architecture ultimately lead to the breakthroughs needed for AGI, do you think? Are there deterministic components that need to be solved?
最终,当您通过混合系统发现某些规律时,目标是将这些知识上游整合到学习组件中。因此,如果能够进行端到端学习,直接从给定数据预测目标结果,总是更好的。所以,一旦利用混合系统有所发现,您会尝试回溯并逆向工程已实现的内容,看看能否将所学信息融入学习系统。这正是我们在AlphaZero(AlphaGo的更通用版本)中所做的。AlphaGo包含了一些围棋特定知识。
Ultimately, once you've figured something out with one of these hybrid systems, what you want to do is upstream it into the learning component. It's always better if you can do end-to-end learning and directly predict the thing you're after from the data you're given. So once you've figured out something using one of these hybrid systems, you go back, reverse engineer what you've done, and see if you can incorporate that information into the learning system. This is sort of what we did with AlphaZero, the more general form of AlphaGo. AlphaGo had some Go-specific knowledge in it.
但在AlphaZero中,我们去除了这些,包括学习的人类数据和人类棋局,完全从零开始进行自我学习。当然,随后它就能够学习任何游戏,而不仅仅是围棋。
But with AlphaZero, we got rid of that, including the human data, the human games we had learned from, and just did self-learning from scratch. And of course, it was then able to learn any game, not just Go.
关于AI带来的能源需求,已经有很多炒作和喧嚣。这是我们几周前在华盛顿特区举办的AI峰会的重要议题。这似乎是当今科技界每个人都在谈论的头号话题。所有这些电力将从何而来?但我想问你们,模型的架构、硬件或模型与硬件之间的关系是否有变化,能够降低每个输出token的能源消耗或成本,最终或许会抑制我们面前的能源需求曲线?
There has been a lot of hype and hoopla about the demand for energy arising from AI. It was a big part of the AI summit we held in Washington DC a few weeks ago, and it seems to be the number one topic everyone talks about in tech nowadays: where's all this power going to come from? But my question for you is: are there changes in the architecture of the models, in the hardware, or in the relationship between the models and the hardware that bring down the energy per token of output, or the cost per token of output, and that might ultimately mute the energy demand curve in front of us?
或者你们不认为情况如此,我们仍然会面临一个近乎几何级增长的能源需求曲线?
Or do you not think that's the case, and we're still going to have a pretty geometric energy demand curve?
嗯,有趣的是,我认为两种情况都成立。特别是在谷歌和DeepMind,我们非常注重开发高效且强大的模型。因为我们有自己的内部用例,比如需要每天为数十亿用户提供AI概览服务。这必须极其高效、极低延迟且服务成本非常低廉。因此我们开创了许多技术来实现这一点,比如蒸馏技术,即用内部的大型模型训练小型模型,让小型模型模仿大型模型的行为。
Well, look, interestingly, again I think both cases are true, in the sense that at Google and DeepMind especially, we focus a lot on very efficient models that are powerful. That's because we have our own internal use cases, of course, where we need to serve, say, AI Overviews to billions of users every day, and that has to be extremely efficient, extremely low latency, and very cheap to serve. So we've pioneered many techniques that allow us to do that, like distillation, where a bigger internal model trains a smaller one: you train the smaller model to mimic the bigger model.
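Distillation, as described here, trains a small model to mimic a bigger one. A minimal sketch of the classic soft-target loss from Hinton, Vinyals and Dean (2015), assuming the teacher and student produce logits over the same classes. The temperature of 2.0 and all example logits are hypothetical, and this is not a description of Google's internal pipeline.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 as in the classic recipe."""
    p = softmax(teacher_logits, temperature)  # soft targets from the big model
    q = softmax(student_logits, temperature)  # small model's current predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [4.0, 1.0, 0.5]         # hypothetical logits from the large model
close_student = [3.5, 1.2, 0.4]   # hypothetical student that already mimics it well
far_student = [0.5, 1.0, 4.0]     # hypothetical student that disagrees
print(distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student))  # prints True
```

In practice this soft-target term is usually mixed with an ordinary loss on the true labels; the mixing weight is another tunable choice.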
如果你回顾过去两年的进展,模型效率在相同性能下提升了10倍甚至100倍。但需求没有减少的原因是我们尚未实现通用人工智能(AGI)。因此前沿模型仍在不断以更大规模训练和实验新想法,同时服务端变得越来越高效。这两方面都在同步发展。最终,从能源角度来看,我认为AI系统在电网系统效率、电气系统、材料设计、新型特性及新能源等方面对能源和气候变化的贡献,将远超过其消耗。
And over time, if you look at the progress of the last two years, model efficiency is something like 10x, even 100x better for the same performance. The reason that isn't reducing demand is that we still haven't got to AGI. So with the frontier models, you keep wanting to train and experiment with new ideas at larger and larger scale, while at the same time, on the serving side, things are getting more and more efficient. So both things are true. And in the end, from the energy perspective, I think AI systems will give back a lot more to energy and climate change than they take, in terms of the efficiency of grid systems and electrical systems, material design, new types of properties, and new energy sources.
我认为AI在未来十年内对这些领域的帮助,将远远超过它今日所消耗的能源。
I think AI will help with all of that over the next ten years that will far outweigh the energy that it uses today.
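The "both things are true" argument can be made concrete with a back-of-the-envelope identity. In this sketch the symbols are assumptions, not figures from the interview: let N be the number of tokens served, e the energy per token, k the factor by which efficiency improves, and m the factor by which token volume grows. Serving energy falls only if volume grows by less than the efficiency gain, and training energy for frontier models comes on top of this.

```latex
E_{\text{serve}} = N \cdot e, \qquad
\frac{E'_{\text{serve}}}{E_{\text{serve}}} = \frac{(mN)\,(e/k)}{N\,e} = \frac{m}{k}
```

For example, with a hypothetical k = 10 and m = 30, serving energy still triples despite the tenfold efficiency gain.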
作为最后一个问题,请描述十年后的世界。
As the last question, describe the world ten years from now.
哇。好吧。我的意思是,在AI领域,十年甚至十周都像是一辈子。所以十年的布朗运动场……
Wow. Okay. Well, I mean, you know, ten years, even ten weeks, is a lifetime in AI. So the Brownian field of ten years...
对。
Right.
但我确实觉得,我们将在未来十年内实现通用人工智能(AGI),完整的AGI,我认为这将开启一个科学的新黄金时代,一种新的文艺复兴。我们将看到从能源到人类健康等各个领域受益。
But I do feel like we will have AGI in the next ten years, you know, full AGI, and I think that will usher in a new golden era of science, a kind of new renaissance. And I think we'll see the benefits of that right across the board, from energy to human health.
太棒了。请大家和我一起感谢诺贝尔奖得主德米斯。
Amazing. Please join me in thanking Nobel laureate Demis.
谢谢。太棒了。干得漂亮。谢谢。
Thank you. Fantastic. That was a great job. Thank you.