Episode overview
So I went to a biology conference. After I gave my talk, a biologist approached me and said, "Pushmeet, I have been working on this protein for the last ten years. I had collected so much lab data to characterize this protein, to figure out its structure, but somehow it has evaded every kind of investigation, and we still didn't know the structure. But we had all this data. If we knew the structure, we could validate it very quickly. I ran AlphaFold 2. It gave me the structure. It perfectly fit the answer. I've been working on this for ten years."
Wow.
"What do I do next?"
What happens when AI stops just answering questions and starts asking them? In this episode, Pushmeet Kohli discusses DeepMind's AlphaEvolve, a breakthrough evolutionary AI system that discovers entirely new algorithms. Pushmeet reveals how coupling language models with evaluators creates something unprecedented: AI that can tackle decades-old math problems and generate human-interpretable code that outperforms expert-designed solutions. Pushmeet shares stunning examples of AI uncovering hidden mathematical truths and explains why we're witnessing the emergence of a new scientific method, one where AI doesn't just accelerate discovery but transforms which problems we can even attempt to solve. Enjoy the show.
Pushmeet, thank you so much for joining us today. We've all been eagerly awaiting the moment when AI is capable of making novel scientific discoveries. Do you think AlphaEvolve is that watershed moment?
Yeah, it's certainly a key milestone. What we have shown is that an AI model, a large language model, when coupled with a harness, is able to discover new algorithms. And not only that, it is able to find new mathematical results on problems that have been studied for many, many years.
You used the words "when coupled with a harness." Can you tell us more about that harness?
Yeah. If you go back, the history of AI for science is very long. We have a number of different models that have tried to do scientific discovery. One of the key models in this category is AlphaFold, which is the prototypical example of what AI can achieve in science. We released AlphaFold 2 in 2021, and it won the Nobel Prize last year.
So the impact of AI in science is very well understood. Now the question is how LLMs and foundation models can impact science. Around two years back, we had an agent called FunSearch, in which we took an LLM and coupled it with an evaluator. The evaluator allowed the LLM to figure out, when it was making new conjectures or coming up with new ideas to solve problems, whether they were hallucinations or brilliant insights. In this particular case, hallucinations were great, because some of those hallucinations were in fact brilliant new insights that nobody had thought about.
This is where the harness comes in: you have an evaluation function and a search protocol associated with the LLM, and together they are able to come up with completely new discoveries that are really impactful.
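The generate-evaluate-select loop of such a harness can be sketched in a few lines. This is a toy illustration, not DeepMind's code: `propose` is a stand-in for the LLM proposal step (here just a random mutation), and the evaluator is a hand-coded objective, as in the real harness.

```python
import random

def evaluate(candidate):
    # Hand-coded evaluator: higher is better. A toy objective standing in
    # for "does this proposed algorithm solve the task well?"
    x, y = candidate
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)

def propose(parent, rng):
    # Stand-in for the LLM proposal step (hypothetical): in a real harness
    # this would be a model editing actual program text.
    return tuple(v + rng.gauss(0.0, 0.5) for v in parent)

def harness(generations=200, seed=0):
    rng = random.Random(seed)
    best = (0.0, 0.0)
    best_score = evaluate(best)
    for _ in range(generations):
        child = propose(best, rng)
        score = evaluate(child)   # the evaluator filters out "hallucinations"
        if score > best_score:    # evolutionary selection: keep the best
            best, best_score = child, score
    return best, best_score

best, score = harness()
```

The two ingredients he names, an evaluation function and a search protocol, are exactly the `evaluate` call and the keep-the-best loop.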
You mentioned FunSearch. Could you say a word on the difference between the results you accomplished with FunSearch versus AlphaEvolve?
Yeah. FunSearch was our first instantiation of taking a large language model and trying to see if it could discover new algorithms. The models at that time were weaker, and we had not explored the type of search we were doing much further.
What we asked the LLM to do was essentially complete a small function and see if it could do that much better. Surprisingly, it was able to discover completely new algorithms for problems that mathematicians had been studying for a long time. But the limitation was that the researcher had to give a template within which the algorithm should be found. With AlphaEvolve, we have removed that restriction. AlphaEvolve is not just searching over a few lines of code.
It is looking at whole algorithms themselves: very large pieces of code, optimized over a long period of time. And secondly, FunSearch, our original model, used a lot of function evaluations to make these new discoveries. AlphaEvolve can work with many fewer function calls; by looking at fewer proposals, it can discover new algorithms much more quickly.
Can you tell us about the role that the evolving Gemini models play in the capabilities of AlphaEvolve? I think I saw in your blog post that you have both Gemini Flash and Pro involved in the harness. What is each responsible for?
Yeah. As Gemini improves across generations, it is becoming much better at understanding code. If you have a proposal generator that can understand code more effectively, it generates proposals that are not only syntactically correct but are also semantically trying to solve the task, and you are sampling the different ways in which the task can be solved. So as the base model Gemini's coding abilities improve, our sample efficiency in searching for the right solution to these very hard math and computational problems becomes much better.
If you want to search in a large space, there are two elements: the speed at which you can generate proposals, and the speed at which you can evaluate them. First, how quickly can you produce a new candidate algorithm? And second, how quickly can you evaluate whether that algorithm is any good?
Both things are really important, and the fact that you have these variants of Gemini, like Gemini Flash, which can do that very efficiently and very quickly, is really important.
I know AlphaEvolve is more of a broad-domain model than some of its predecessors. How broad is it? What's in scope? What's out of scope?
Yeah. AlphaEvolve expands not only the size of what you can search over, so that you can now discover whole new algorithms, but it is also extremely general in its ability to think about algorithms in different languages. It can search in C++, and it can also do it in Python.
It can also do it in Verilog, which is the language for describing chips in chip design. So the generality of AlphaEvolve is in its ability to search over these large algorithmic spaces, but also across different syntactic and semantic representations. It is not restricted to a particular language like Python; it can do that search across many different types of languages and many different types of tasks. The only expectation it has is that you have a function evaluator: that you can quickly evaluate whatever proposal there is and say how good it is.
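That "only expectation" is worth making concrete. Below is a minimal, hypothetical function evaluator of the kind such a harness needs: candidates arrive as program text, get executed in a scratch namespace, and are scored against test cases, with any crash scored as unusable. The entry-point name `solve` is an assumption for this sketch.

```python
def score_candidate(source, tests):
    """Evaluate a candidate program given as text. Returns the number of
    test cases it passes; any crash counts as the worst possible score."""
    namespace = {}
    try:
        exec(source, namespace)        # compile and load the proposal
        fn = namespace["solve"]        # expected entry point (assumed name)
    except Exception:
        return -1                      # syntactically or structurally broken
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass                       # runtime failure on this case
    return passed

tests = [((2, 3), 5), ((10, -4), 6)]
good = "def solve(a, b):\n    return a + b\n"
bad = "def solve(a, b):\n    return a * b\n"
broken = "def solve(a, b) return a + b"   # syntax error
```

A real harness would of course sandbox the execution and add time limits, but the contract is the same: program text in, a trusted score out.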
It seems like the rough cognitive architecture, so to speak, of generating a bunch of algorithm candidates, evaluating them, and then evolutionarily deciding which ones to keep and going forward from there, roughly mirrors the scientific method. Is that intentional?
Yeah. If you think about it, there is another agent that we released earlier this year called co-scientist. In co-scientist, Gemini played the roles of the whole scientific academic process: Gemini as the hypothesis generator, Gemini as the critic, Gemini reviewing the ideas, ranking them, and then editing them. It was Gemini playing all these roles in a multi-agent setup.
These were all Gemini models prompted differently to play different roles. And very interestingly, this combined multi-agent system produced behavior that went much beyond a single Gemini model's answer. It was able to give much, much better proposals and new ideas than a single model.
What's the intuition behind why that works?
Yeah, it is something that is still being studied, but it is fascinating. One thing I noticed, especially with co-scientist, is that when you run it on a particular problem, the very first answer you get might not be very different from the baseline Gemini model. But things change as you increase the amount of computation, and here we are talking not about a few minutes or a few hours, but days.
As the whole multi-agent system looks at the solutions, refines them, and ranks them, it just becomes much, much better. Why might that be happening? It might be that deep insights or intuitions are buried in the tail of the distribution, and somehow Gemini's ability to evaluate which proposal or idea is better is much stronger than its ability to come up with a new idea.
It's the same sort of thing in computer science: sometimes we can check whether a particular solution is correct, but it is very difficult to come up with the solution.
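That asymmetry between checking and finding is easy to demonstrate. A toy subset-sum sketch: verifying a proposed certificate is a single pass over it, while finding one may mean searching through exponentially many subsets.

```python
from itertools import combinations

def verify(numbers, subset, target):
    # Checking a proposed solution: fast and easy, O(len(subset)).
    return all(x in numbers for x in subset) and sum(subset) == target

def find(numbers, target):
    # Finding a solution: brute force over all 2^n subsets.
    for r in range(len(numbers) + 1):
        for combo in combinations(numbers, r):
            if sum(combo) == target:
                return list(combo)
    return None

nums = [11, 7, 19, 3, 42]
cert = find(nums, 21)   # 11 + 7 + 3
```

The multi-agent result he describes is the same shape: Gemini-as-critic plays the cheap `verify` role over proposals that are expensive to generate.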
So the same thing is appearing again in this multi-agent setup: somehow the agents working together are able to extract many more impactful results.
It seems like the architecture of generators and verifiers is a paradigm being echoed across the broad AI space, whether in very general models or in AI systems built for very specific applications. Is it fair to say that's the consensus architecture right now, and do you think that's what people will continue to push and scale?
Yeah. I think there is going to be more work on agents; what we are seeing is the very start of research on agents. In AlphaEvolve, you had a generator coupled with an evaluator. The generator was a neural network, a foundation model, an LLM, and the evaluator was even hand-coded.
But together with an evolutionary search scheme, you were able to get these much more effective results. In co-scientist, you didn't have just one agent; you had multiple agents working in a shared memory. So what is the optimal agent configuration?
This is still an open research problem.
Super interesting. Are the results you're getting different from the ways humans would derive them? I'm thinking of AlphaGo's Move 37. Are the methods different? How do the results compare to the ways humans would think about them?
Let's go back to the original motivation for why we even started working on the first iteration of using LLMs for algorithmic discovery, which was FunSearch. A few years back, as you know, DeepMind had done a lot of work on using AI systems to search over large spaces. We had done a lot of work building agents trained with reinforcement learning that could deal with many complex challenges, from the game of Go to playing StarCraft. We set ourselves a challenge: could we take the same kinds of models, like the AlphaZero family of models, which were extensions of what we had done in Go and the development of AlphaGo, and use them to discover new algorithms?
We came up with a new agent called AlphaTensor, which was particularly focused on finding solutions to the matrix multiplication problem. We found that this agent was able to improve on past known results that had stood for fifty years. But the key questions remained: can you do something better, and can you come up with a solution that is more interpretable? At the same time, we were looking at practical problems at Google, like how do you schedule jobs in a data center?
There has been a lot of work on coming up with new algorithms, and these heuristics have been designed by some of the best researchers and engineers at Google, because they have a huge impact on computer utilization. If you use a typical reinforcement learning agent on this kind of problem, you might get better results, but it might come at the cost of interpretability, because now you have a neural network deciding which workloads go to which computers, and if something breaks, you don't know how to debug it. So what engineers would really prefer, instead of being given a neural network, is a piece of code that they can interpret and run. That was essentially the motivation.
Instead of searching in the space of specific algorithms, as we had done for matrix multiplication algorithms in AlphaTensor, or coming up with a neural network policy to directly solve the problem, can we come up with an agent that searches in the space of programs and produces a program that solves this hard problem? The benefit, of course, would be interpretability: you can look at the code, see what its properties are, and so on. And that is what happened. We found programs that were not only effective, but when the experts actually looked at them, they could recover insights. For instance, one of the math problems we looked at for FunSearch was called the cap set problem.
This is a problem that Terence Tao, one of the most famous mathematicians, is very interested in. We collaborated with the mathematician Jordan Ellenberg, and when he looked at the program that FunSearch had produced, he found that there were certain symmetries in the problem that had not been recognized before. Somehow the agent had discovered those and was utilizing them to get a better solution.
You mentioned working with Terence Tao and other famous mathematicians. Is math considered the gold standard for testing and benchmarking whether these models are generating novel scientific results?
Yeah. Math certainly has some properties that are very interesting, like the fact that it is very precise. You know whether or not you have found the property you are looking for. Take matrix multiplication.
For matrix multiplication, the question is how many scalar multiplications you require. For a 4x4 matrix, the best known result was 49 multiplications, from Strassen, and we showed that you can do it with 48. That is a very precise result.
There is no arguing about it. It gives you a very crisp way of evaluating how well you have done. There is no RLHF needed, no human feedback on whether this was a nice result or a nice output, and you don't need to rely on an LMSYS score. You just know that you're better.
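That crispness can be shown even at toy scale. The snippet below checks Strassen's classic 2x2 scheme, seven scalar multiplications instead of the naive eight, by exact comparison with the schoolbook product. The 48-versus-49 result for 4x4 matrices rests on the same kind of exact count, just for a much larger scheme.

```python
def naive_2x2(A, B):
    # Schoolbook product: 8 scalar multiplications.
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return [[a * e + b * g, a * f + b * h],
            [c * e + d * g, c * f + d * h]]

def strassen_2x2(A, B):
    # Strassen's scheme: only 7 scalar multiplications.
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
```

The evaluator here is exact equality plus a multiplication count, so "better" is a binary fact, which is exactly why math problems make such clean benchmarks.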
Okay. So when you go from the beautiful, pristine environment that is math to the real world, it seems like you all have found a lot of real-world applications, in data centers and in the Verilog world. Could you say a little about which applications you expect AlphaEvolve to be most impactful for?
Yeah. Wherever you can find a good function evaluator, wherever you can say, "I really trust this evaluation scheme; if you give me a program, I can tell you very concretely how good it is," if your problem satisfies that setup, then you can use AlphaEvolve. Because unlike a human programmer, who can try 10 things or 100 things or a thousand things, AlphaEvolve can go on and on and on.
It can come up with very counterintuitive strategies to solve the problem, things that you might never have imagined.
Can you have humans be the function evaluators, or does that not work?
Humans can be the function evaluators. It's a question of scale: how many proposals can you evaluate, and can you evaluate the properties of the program effectively, at scale and with the right level of accuracy?
How do you do that? Do you build it into the application itself, so that there is a human in the loop evaluating as it goes? Do you do it offline, separately, before the application is produced? How do you do that, or how do you imagine people doing it?
Yeah. We haven't used a human in the loop for AlphaEvolve; most of our evaluators were programmatic.
But imagine a hypothetical scenario where AlphaEvolve was told to solve a math problem and come up with a new algorithm, and suppose it came up with many different kinds of solutions, all equivalent in performance. Which one is the best? The best is the one that is not only very effective on the problem but is also the most elegant according to a mathematician, or the simplest to understand.
And that is a very subjective human thing. We don't have a crisp definition of simplicity or interpretability; it is grounded in the human observer.
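When scores tie like this, one pragmatic trick (purely illustrative, not how AlphaEvolve resolves it) is to fall back on a measurable proxy for the subjective criterion, for example preferring the shortest program as a crude stand-in for elegance. Every name below is hypothetical.

```python
def pick_best(candidates, evaluate):
    """Primary key: evaluator score (higher is better).
    Tie-break: shorter source text as a crude proxy for simplicity."""
    return max(candidates, key=lambda src: (evaluate(src), -len(src)))

# Two behaviorally equivalent solutions; the evaluator cannot separate them.
verbose = "def area(w, h):\n    result = w * h\n    return result\n"
concise = "def area(w, h):\n    return w * h\n"

def score(src):
    return 1.0   # toy evaluator: both candidates pass all tests

best = pick_best([verbose, concise], score)
```

A proxy like length is exactly the kind of crisp-but-imperfect substitute he is pointing at: it is scalable, but it only approximates what a mathematician means by "elegant."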
At what point do you need to pair what is happening in the digital world with the physical world? In your blog post, you mentioned that AlphaEvolve could be useful for, for example, materials science. Do you need to connect to a real-world laboratory to get that feedback, or do you think all of this can happen in the algorithmic domain?
That's a very good question, and it goes back to how much you trust the evaluator. If your evaluation is based on a computational method, and the computational method is perfect and you completely trust it, then you don't have to. You think, "I believe the computational model; it says that the solution AlphaEvolve came up with satisfies these properties, so the job is done."
But if you don't believe that the computational model is a perfect characterization of reality, then you want to make sure you validate the result in the real world and see whether the evaluator's assessment was indeed correct.
As AlphaEvolve becomes more and more successful, and as Gemini becomes more and more powerful, what do you think happens to these domains, and how will the human scientists and engineers working in them adapt? For example, take chip design: you mentioned these models are getting very good at generating Verilog and creating new chip designs. Does that mean the role of a chip designer goes away, or changes? How do you think this changes the world?
Yeah, that is again a very interesting question. I'll give you the example of what happened with AlphaFold. We started working on the problem of protein structure prediction. For those of you who don't know, proteins are the building blocks of life.
They are the Lego blocks of life. For many, many decades, scientists have been trying to figure out the shapes of proteins, because if we understand the shape of a protein, we understand how it functions, and we can use that to develop new drugs to treat the most challenging diseases on the planet, to develop better enzymes, and so on. In 2021, as I mentioned, we released AlphaFold 2.
Before that, it could take one to five years, and perhaps a million dollars, to find the structure of a single protein. Some proteins were so notoriously hard that people had been trying to study them for one or two decades and had not found the solution. That is why the structures of only roughly 37% of human proteins were known. After we released AlphaFold 2, I went to a biology conference, because with AlphaFold 2 we could find the structure of all proteins, not just human proteins: all proteins on the planet.
We made the structures available to everyone on the planet. After I gave my talk at that conference, a biologist approached me and said, "Pushmeet, I have been working on this protein for the last ten years, and I had collected so much lab data to characterize this protein, to figure out its structure. But somehow it has evaded every kind of investigation, and we still didn't know the structure. But we had all this data. If we knew the structure, we could validate it very quickly.
I ran AlphaFold 2. It gave me the structure. It perfectly fit the answer. I've been working on this for ten years. What do I do next?" So what has happened after AlphaFold 2?
What happened is that it suddenly did three things. First, it advanced structural biology. What was not possible earlier, what would take a synchrotron, six months, and a million dollars, is now done in a second. It really advanced what was possible.
Second, it accelerated it. And third, it democratized it. A scientist working in Latin America, South Asia, or Africa on some neglected tropical disease had no chance of figuring out the structure of their protein; they did not have the funds or access to the instruments that could find the structure. Now they have access to those things, for whatever parasite they are working on.
So what do they do? They are now working in this new mode, where protein structures are not hard to get; they are everywhere. So they are working on the next set of things, like how do you now use that knowledge to treat diseases and design better drugs? And I think the same thing will happen with AlphaEvolve.
Once you have these agents that can go beyond human abilities in solving these problems, the question becomes: which problems do we solve? What are the important characteristics of a chip that we need to improve? We want to make it much more efficient, so that it requires less cooling and a less expensive construction process.
We want it to be more fault-tolerant, and many other things. You can make the problem more and more sophisticated, because you now have more sophisticated systems to optimize them.
While I have you, something I've always wondered: the AlphaFold results are phenomenal, and the story you shared with us is really impactful. Do you think it has caused an inflection point in the availability of new drugs, or are there other bottlenecks now? That is, we are faster at one part, but unfortunately everything else is still hard, so we are still slow overall?
It has sped things up, but one has to understand that drug discovery is a long process. What are the roadblocks for drug discovery? First, you have to understand the target: here is the protein in the body that I need to bind, because this protein is somehow involved in the disease.
If I can somehow bind something to this protein and change its function, it will have an effect that can treat the disease. First, you have to come up with that conjecture. Then you say, okay, I have a target protein; how do I develop a drug?
How do I develop a small molecule, or another protein, that binds to it? For that, you needed to understand the structure of the protein, which other proteins it interacted with, and how it interacted with this molecule. That could take a significant amount of time, sometimes two years. Now that process has been dramatically accelerated: you can do in a few weeks or a few months what sometimes took multiple years.
But that is not the end of the story. After that, you need to clinically validate it: you have to go through phase one, phase two, and phase three trials, and you have to think about toxicity and all these other things. So what AlphaFold did was take one blocker away and make the overall timeline faster, but there are other blockers, which our new generation of AI-for-biology models are hoping to accelerate and make much faster.
So we have taken a big step, but we need to take a few more big steps.
你认为这类模型在哪些领域最具商业价值?
What domains do you think will be most lucrative for this family of models?
我认为问题的本质是:哪些领域对社会最重要?因为AI将加速所有领域的发展,从医疗保健到材料科学,我们将能开发更智能的系统。纵观人类文明史,我们曾经历穴居时代、石器时代、青铜时代和铁器时代。
I think the answer to your question is basically: what domains do you think are important for society? Because AI is going to accelerate everything. It's going to accelerate healthcare. It's going to accelerate our ability to develop smarter systems, from healthcare to materials science. If you think about the history of our civilization, we even describe it in those terms: first we were cave dwellers, then we went into the Stone Age, and then the Bronze Age and the Iron Age.
现在根据不同观点,取决于你是乐观还是有点沮丧,有人称我们处于硅时代或塑料时代。但退一步看,人类相比其他物种的成就是掌握了能量转化与利用的能力。我们得以驾驭能量,用这种力量成就大事。如果能发明出室温超导体,将彻底改变我们驾驭能量的能力。
And now, depending on who you talk to, you're either in the Silicon Age or in the Plastic Age, whether you're optimistic or feeling a bit sad. But if you take a step back and think about what humanity has achieved, what we have achieved compared to any other species is the ability to transform energy, to leverage energy. Right? We have been able to leverage energy and do big things with that power. Now, if you can come up with, say, a new room-temperature superconductor, that completely transforms your ability to handle energy.
对吧?
Right?
这会为社会带来什么变化?如果能够以这种方式处理能源问题,这些变化将难以预测。对吧?如果我们能实现核聚变。对吧?
What changes will it bring about in society? They're hard to predict, if you can deal with energy in that way. Right? If we can unlock fusion. Right?
而且能源会变得极其廉价。比如,想想地缘政治,想想经济,很多方面都关乎能源。对吧?突然间,如果能源成本趋近于零,对整个经济体系会产生什么影响?同样地,想想编程,如果有了能编程的智能体,这意味着什么?
And energy becomes so cheap. If you think about geopolitics, if you think about the economy, a lot of it is about energy. Right? And suddenly, if the cost of energy goes down to zero, what will be the impact on the economics of the whole thing? Similarly, if you think about coding, and you have these agents which can code, what does that mean?
如果人人都能编程,智能将无处不在,所有人都能接触到这些不同的事物。因此将发生剧变,所有领域都会受到影响——从材料到能源,从编程到医疗保健。
If everyone can code, intelligence is completely ubiquitous, and everyone has access to all these different things. So there will be dramatic changes, and everything will be impacted: from materials to energy to coding to healthcare.
真酷。你认为科学发现会迎来一个快速爆发的时刻吗?你觉得我们正处于这个临界点吗?还是说已经身处其中了?
Really cool. Do you think we're going to have a fast-takeoff moment for scientific discoveries? Do you think we're at the ramp of one? Or are we already there?
我认为我们正身处这个加速过程之中。只是身处其中时往往难以察觉。但我认为我们已经处于AI加速科学发现的时代了。
I think we are living through the middle of it. When you're in the middle, you don't really see it. But I think we are already in that era of AI-accelerated scientific discovery.
你认为未来最大的瓶颈是什么?
What do you see as the biggest bottlenecks going forward?
我认为主要有两个因素。一是验证环节,如何弥合数字世界与现实世界之间的鸿沟。对吧?如何验证某些成果?这是关键点之一。
I think two elements. One is validation: bridging the gap between the digital and the real world. Right? How do you validate some of that? That is one key idea.
对吧?而且,真正要抓住问题的关键所在。对吧?第二个瓶颈是如何让这项技术变得易于获取?你可以开发最尖端的技术,但如果人们不知道如何使用,那就无法产生你期望的影响力。
Right? And really capturing what is important for the problem. Right? The second bottleneck is: how do you make this technology accessible? You can build the most sophisticated technology, but if people don't know how to use it, then you will not have the impact that you want.
对吧?AlphaFold 2之所以具有变革性影响力,不仅因为它准确度极高——即便它相当准确,但并非完美无缺。假设它对99%的预测都是准确的(实际上肯定不到99%,可能在90%或95%左右)。
Right? AlphaFold 2 was not impactful and transformative just because it had very high accuracy. Even though it was quite accurate, it was not perfect. Suppose it was accurate on 99% of the things it predicted; it's definitely not at 99%, probably at the 90% or 95% mark.
但即便假设它有99%的准确率,那个不幸得到错误预测结果的人,可能会花一两年时间追逐这个错误预测,然后说我不该用它,不该相信这些预测。那为什么大家还在用AlphaFold?因为AlphaFold不仅擅长做出准确预测,更擅长明确自身预测的局限性——当它犯错时,它会主动举手承认错误。
But suppose it was accurate at 99%: the one person who got unlucky with their prediction, and then spent the next one or two years chasing a wrong prediction, would say, I should not use it, I should not trust the predictions. So why is everyone using AlphaFold? Not only is AlphaFold good at making predictions which are accurate, it is also very good at understanding the limits of its predictions. When it makes a mistake, it essentially holds up its hand and says, I've made a mistake.
所以当它做出预测并说'我非常有把握'时,大多数情况下都是正确的。这很棒。而当今的大语言模型(LLM)恰恰缺乏这种能力——它们没有经过校准的不确定性评估。
So if it is making a prediction and saying, I'm very confident, most of the time it's correct. And that's great. This is something that the LLMs of today don't have. They don't have calibrated uncertainty.
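The calibrated uncertainty described here has a standard check: bin predictions by stated confidence and compare each bin's average confidence with its empirical accuracy. Below is a minimal sketch of that check; the function name and toy numbers are illustrative only, not taken from AlphaFold or any real evaluation.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Gap between stated confidence and empirical accuracy,
    averaged over confidence bins and weighted by bin size."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Predictions whose confidence falls in this bin.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if idx:
            avg_conf = sum(confidences[i] for i in idx) / len(idx)
            accuracy = sum(correct[i] for i in idx) / len(idx)
            ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece

# Well calibrated: says 90% confident, right 9 times out of 10.
print(expected_calibration_error([0.9] * 10, [1] * 9 + [0]))      # ~0.0
# Overconfident: says 90% confident, right only half the time.
print(expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5))  # ~0.4
```

A model that "holds up its hand" when it is unsure, as described above, would score near zero on this measure.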
我们最后来几个快问快答环节如何?
Should we close out with some rapid fire questions?
好啊,没问题。
Yeah. Sure.
年度必读论文是?
Must read paper of the year.
年度必读论文。哦,我可能会说是《AlphaEvolve》或《Coscientist》。我喜欢'co'这个前缀,就是那种感觉。
Must-read paper of the year. Oh, I would say AlphaEvolve or Coscientist. I like the 'co' in that one, yeah.
有什么鲜为人知但值得推荐的算法吗?
Favorite algorithm nobody talks about?
哦,是wake-sleep算法。知道的人很少,但本质上这是MIT的Kevin Ellis和Josh Tenenbaum提出的方法——通过探索性训练来构建核心框架。可以类比图书馆建设的过程,明白吗?
Oh, the wake-sleep algorithm. Very few people know about it, but essentially it's a paper from MIT, from Kevin Ellis and Josh Tenenbaum, about a way of doing training where you do some exploration and then somehow distill the gist of it. Right? So think about library construction; the analogy is library construction. Right?
我们不仅要编写程序,更要创建包含通用模块的库,这样未来编写程序会容易得多。
You don't just want to write programs; you also want to create libraries with common modules that will make all your future programs much easier to write.
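The library-construction idea can be sketched as a toy wake-sleep-style compression step, loosely in the spirit of the Ellis and Tenenbaum line of work: programs solved during the "wake" phase are scanned during "sleep" for their most common shared fragment, which is promoted to a named library routine. Everything below, including the op names and the `sleep_compress` function, is an illustrative sketch, not code from any real system.

```python
from collections import Counter

def sleep_compress(programs, library):
    """Find the most frequent adjacent pair of ops across solved
    programs and promote it to a new library abstraction."""
    pairs = Counter()
    for prog in programs:
        for a, b in zip(prog, prog[1:]):
            pairs[(a, b)] += 1
    if not pairs:
        return programs, library
    (a, b), _ = pairs.most_common(1)[0]
    name = f"f{len(library)}"  # fresh name for the new abstraction
    library[name] = (a, b)
    # Rewrite every program to call the new abstraction.
    rewritten = []
    for prog in programs:
        out, i = [], 0
        while i < len(prog):
            if i + 1 < len(prog) and (prog[i], prog[i + 1]) == (a, b):
                out.append(name)
                i += 2
            else:
                out.append(prog[i])
                i += 1
        rewritten.append(out)
    return rewritten, library

# Three "solved tasks" that share the subroutine ["map", "sum"].
programs = [["map", "sum", "sort"], ["map", "sum"], ["rev", "map", "sum"]]
programs, library = sleep_compress(programs, {})
print(library)    # {'f0': ('map', 'sum')}
print(programs)   # [['f0', 'sort'], ['f0'], ['rev', 'f0']]
```

Repeating the loop grows a library of reusable modules, which is exactly the payoff described above: future programs get shorter and easier to write.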
非常酷。
Very cool.
赞成还是反对:推理时计算将成为计算规模扩展的下一个主要阶段?
Agree or disagree: inference-time compute will be the next major leg of compute scaling?
基本同意。
Somewhat agree.
好的,继续说。
Okay. Say more.
我认为推理时的计算会非常非常重要。测试时的计算,或者说训练时的计算,也同样重要。对吧?再看看蒸馏技术,它的威力已经显现出来了。对吧?
I think inference-time compute will be very, very important. I think test-time, or rather training-time, compute will be equally important. Right? Also, if you look at distillation, look at how powerful distillation has been. Right?
所以如果这些模型具备理解和概念化的能力,能够理解这些模型能做什么,并形成更好的内在表征,那么它们在预测方面就会高效得多。也许它们的不确定性会有所改善等等。它们甚至会更高效。
So if these models have an ability to understand and conceptualize what they are able to do, and come up with better internal representations, then they just become much more effective at making predictions. Maybe their uncertainty improves, and so on. They even become more efficient.
机器人技术。看涨还是看跌?
Robotics. Bullish or bearish?
我对所有事情都看涨,所以必须说看涨。我认为每件事都会产生影响。问题基本在于短期还是长期。短期内让机器人技术发挥作用确实具有挑战性。但中长期来看,我是看涨的。
I'm bullish about everything, so I have to say bullish. I think everything will have an impact. The question is basically near term or longer term. In the near term, getting robotics to work is challenging. But in the medium to long term, I'm bullish.
人形机器人,看涨还是看跌?
Humanoid robots, bullish or bearish?
我们的世界是为人类建造的。对吧?我们喜欢人形。我们周围的许多人造环境都是为人类设计、为人类打造的,比如从建筑角度来看。对吧?
We have constructed our world for humans. Right? We like the human form. A lot of the non-natural world around us is made for humans, has been designed for humans, from an architecture perspective. Right?
如今人形机器人已拥有与人类相同的形态。因此它们能完美适应我们建造的各种建筑结构。虽然目前尚不确定这是否是最优解,但它们确实具有一个优势——我们所有的设计原本就是为人类体型打造的,而现在人形机器人拥有了相同的形态。
Now humanoids have the same form as humans. So they will fit in all these different architectures that we have built. Now, whether they are the most optimal thing is not clear, but they certainly have an advantage: we designed everything for the human form, and now humanoids have that same form.
未来的诺贝尔科学奖,是否都将由与AI合作的团队获得?
Future Nobel Prizes in the sciences: will all of them be won by teams working with AI?
不会。我认为我们会逐渐接近那个阶段,但目前科学领域的诺贝尔奖仍由人类获得。但我相信终将迎来AI不可或缺的时代,届时将由人类与AI团队协作实现这些惊人突破。很好。
No. I think we will get there, but humans are still winning Nobel Prizes in the sciences. I think there will come a point where AI will be indispensable, and it will be humans and AI teams working together to achieve these amazing breakthroughs. Good.
普什米特,非常感谢你今天参与我们的对话。你在DeepMind推进的这些成果确实具有根本性和普适性意义,我们很感激你能来分享你是如何取得这些成就的,以及未来的发展方向。谢谢。
Pushmeet, thank you so much for joining us today. These are really fundamental, really general results that you're pushing forward at DeepMind, and we appreciate you joining us to share more about how you managed to do all this so far, and what's ahead. Thank you.
谢谢。
Thank you.
关于 Bayt 播客
Bayt 提供中文+原文双语音频和字幕,帮助你打破语言障碍,轻松听懂全球优质播客。