From Vibe Coding to Vibe Researching: OpenAI's Mark Chen and Jakub Pachocki

Episode description

What comes after vibe coding? Maybe vibe researching. OpenAI's Chief Scientist, Jakub Pachocki, and Chief Research Officer, Mark Chen, join a16z general partners Anjney Midha and Sarah Wang to go deep on GPT-5: how they fused fast replies with long-horizon reasoning, how they measure progress once benchmarks saturate, and why reinforcement learning keeps surprising skeptics. They explore agentic systems (and their stability tradeoffs), coding models that change how software gets made, and the bigger bet: an automated researcher that can generate new ideas with real economic impact. Plus: how they prioritize compute, hire "cave-dweller" talent, protect fundamental research inside a product company, and keep pace without chasing every shiny demo.

Timecodes:
0:00 Introduction & Goals of Automated Researcher
0:43 The Evolution of Reasoning in AI
1:46 Evaluations: From Benchmarks to Real-World Impact
5:15 Surprising Capabilities of GPT-5
6:56 The Research Roadmap: Next 1, 2, 5 Years
7:46 Long-Horizon Agency & Model Memory
9:44 Reasoning in Open-Ended Domains
11:18 The Role and Progress of Reinforcement Learning
13:14 Reward Modeling & Best Practices
14:21 The New Codex: Real-World Coding
16:20 AI vs. Human Coding: The New Default
20:07 What Makes a Great Researcher?
21:14 Persistence, Conviction, and Problem Selection
26:00 Building and Sustaining a Winning Research Culture
31:45 Balancing Product and Fundamental Research
39:00 The Importance of Compute and Physical Constraints
45:50 Maintaining Speed and Learning at Scale
47:18 Trust and Collaboration at OpenAI

Resources:
Find Jakub on X: https://x.com/merettm
Find Mark on X: https://x.com/markchen90
Find Sarah on X: https://x.com/sarahdingwang
Find Anjney on X: https://x.com/AnjneyMidha

Stay Updated:
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Transcript

Speaker 0

The big thing that we are targeting is producing an automated researcher: automating the discovery of new ideas. The next set of evals and milestones that we're looking at will involve actual movement on things that are economically relevant.

Speaker 1

And I was talking to some high schoolers, and they were saying, oh, you know, actually, the default way to code is vibe coding. I do think the future, hopefully, will be vibe researching.

Speaker 2

What does it take to build an automated researcher, and can AI discover new ideas on its own? OpenAI's Chief Scientist, Jakub Pachocki, and Chief Research Officer, Mark Chen, join a16z general partners Anjney Midha and Sarah Wang to unpack GPT-5's reasoning push, why evals must shift to economically meaningful benchmarks, and the march towards an automated researcher. We get into long-horizon agency, why RL keeps working, the new Codex for real-world coding, research culture versus product, and why, for now, compute is destiny. Let's get into it.

Speaker 3

Thanks for coming, Jakub and Mark. Jakub, you are the chief scientist at OpenAI. Mark, you are the chief research officer at OpenAI, and you guys have both the privilege and the stress of running probably one of the most high-profile research teams in AI. So we're just really stoked to talk with you about a whole bunch of things we've been curious about, including GPT-5, which was one of the most exciting updates to come out of OpenAI in recent times. And then, stepping back, how you build a research team that can do not just GPT-5, but Codex and ChatGPT and an API business, and can weave all of the many different bets you guys have across modalities and product form factors into one coherent research culture and story.

Speaker 3

And so to kick things off, why don't we start with GPT-5? Just tell us a little bit about the GPT-5 launch from your perspective. How did it go?

Speaker 1

So I think GPT-5 was really our attempt to bring reasoning into the mainstream. Prior to GPT-5, we had two different series of models. You had the GPT-2, 3, 4 series, which were these instant-response models. And then we had an o-series, which essentially thought for a very long time and then gave you the best answer that it could give. So tactically, we don't want our users to be puzzled by, you know, which mode should I use?

Speaker 1

And that involves a lot of research in identifying what the right amount of thinking for any particular prompt looks like, and taking that pain away from the user. So we think the future is about reasoning, more and more about reasoning, more and more about agents. And we think GPT-5 is a step towards delivering reasoning and more agentic behavior by default.

Speaker 0

There are also a number of improvements across the board in this model relative to o3 and our previous models. But our primary focus for this launch was indeed bringing reasoning out to more people.

Speaker 4

Can you say more about how you guys think about evals? I noticed even in that launch video, there are a number of evals where you're inching up from, you know, 98 to 99%, and that's kind of how you know you've saturated the eval. What approach do you guys take to measuring progress, and how do you think about it?

Speaker 0

One thing is that, indeed, the evals that we've been using for the last few years are pretty close to saturated. So for a lot of them, inching from 96 to 98% is not necessarily the most important thing in the world. I think another thing is maybe even more important, but a little bit subtler. When we were in this GPT-2, GPT-3, GPT-4 era, there was kind of one recipe. You just pre-train a model on a lot of data, and you use these evals as a yardstick of how this generalizes to different tasks.

Speaker 0

Now we have these different ways of training, in particular reinforcement learning on serious reasoning, where we can take a domain and really train a model to become an expert in that domain, to reason very hard about it. That lets us target particular kinds of tasks, which means that we can get extremely good performance on some evals, but it doesn't indicate as great generalization to other things. So the way we think about it in this world, we definitely think we are in a little bit of a deficit of great evaluations. And I think the big things that we look at are actual marks of the model being able to discover new things. For me, the most exciting trend and actual sign of progress this year has been our models' performance in math and programming competitions. Although I think they are also becoming saturated in a sense.

Speaker 0

And the next set of evals and milestones that we're looking at will involve actual discovery and actual movement on things that are economically relevant.

Speaker 4

Totally. You guys already got number two in the AtCoder competition, so there's only number one left.

Speaker 1

Yeah. I mean, I think it is important to note that these evals, like IOI, AtCoder, and IMO, are actually real-world markers for success in future research. I think a lot of the best researchers in the world have gone through these competitions and have gotten very good results. And, yeah, I think we are kind of preparing for this frontier where we're trying to get our models to discover new things.

Speaker 4

Yep. Very exciting.

Speaker 3

Which capability from GPT-5 before the release surprised you the most when you were working through the eval bench or using it internally? Were there any moments where you felt like this was starting to get good enough to release because it was useful in your daily usage?

Speaker 1

I think one big thing for me was just how much it moved the frontier in very hard sciences. We would try the models with some of our friends who are professional physicists or professional mathematicians. And you already saw some instances of this on Twitter, where you can take a problem and have it discover, maybe not very complicated new mathematics, but some nontrivial new mathematics. And we see physicists and mathematicians repeating this experience over and over, where they're trying GPT-5 Pro and saying, wow, this is something that previous versions of the models couldn't do.

Speaker 1

And it is a little bit of a light bulb moment for them. It's able to automate maybe what could take one of their students months of time.

Speaker 0

Well, GPT-5 is a definite improvement on o3. For me, o3 was definitely that moment where the reasoning models became actually very useful on a daily basis. I think especially for working through a math formula or a derivation, it actually got to a level where it is fairly trustworthy. I can actually use it as a tool for my work. And, yeah, I think it is very exciting to get to that moment, but now, as we're seeing these models actually able to automate, like we're saying, solving contest problems over longer time horizons, I expect that that was quite small compared to what's coming over the next year.

Speaker 3

What is coming in the next one, two, five years? At whatever level you're comfortable sharing, what does the research roadmap look like?

Speaker 0

So the big thing that we are targeting with our research is producing an automated researcher: automating the discovery of new ideas. Of course, a particular thing we think about a lot is automating our own work, automating ML research, but that can get a little bit self-referential. So we're also thinking about automating progress in other sciences. And I think one good way to measure progress there is looking at the time horizon on which these models actually can reason and make progress. And so now, as we get to a level of near mastery of, let's say, these high school competitions, I would say we get to maybe on the order of one to five hours of reasoning.

Speaker 0

And so we are focused on extending that horizon, both in terms of the model's capability to plan over very long horizons and its ability to retain memory.

Speaker 1

And back to the evals question, that's why I think evals of the form "how long does this model autonomously operate for?" are of particular interest to us.

Speaker 4

And actually, maybe on that topic, there's been this huge move toward agency in model development. But I think, at least in the state that it's in currently, users have observed a trade-off: too many tools or planning hops can result in quality regressions, whereas with something that has a little bit less agency, the quality is, at least as observed today, a bit higher. How do you guys think about the trade-off between stability and depth? The more steps the model undertakes, maybe the less likely the tenth step is to be accurate, whereas if you ask it to do one thing, it can do it very, very well.

Speaker 0

I think, actually, a lot of the ability to maintain depth is being consistent over long horizons. So I think these are very related problems. And in fact, with the reasoning models, we have seen the models greatly extend the length over which they are able to reason and work reliably without going off track.

Speaker 0

Yeah. I think this has remained a big area of focus for us.

Speaker 1

Yeah. And I think reasoning is core to this ability to operate over a long horizon. Imagine yourself solving a math problem. You try an approach, and it doesn't work. And you have to think about, you know, what's the next approach I'm going to take?

Speaker 1

What are the mistakes in the first approach? And then you try another thing, and the world gives you some hard feedback. Right? And then you keep trying different approaches. And the ability to do that over a long period of time is reasoning, and it gives agents that robustness.

Speaker 4

We talked a lot about math and science. I was curious to get your take: do you think some of the progress that we've made can actually extend similarly to domains that are less verifiable, where there's less of an explicit right or wrong?

Speaker 0

Right? And there's kind of a finite number of ideas you need to look through, and that might feel extremely different from solving something very open-ended. But even if you want to solve a very well-defined problem that is on a much longer scale, like, you know, prove this Millennium Prize problem, well, that suddenly requires you to think about, okay, what are the fields of mathematics or other sciences that might possibly be relevant? Are there inspirations from physics that I must take? What is the entire program that I want to develop around this?

Speaker 3

Right. Right. Let's talk about RL. Because it seems like since o1 came out, RL has been the gift that keeps on giving. You know, every couple of months, OpenAI puts out a release, and everyone goes, oh, that's great.

Speaker 3

Why is RL working so well? And what, if anything, has surprised you about how well it works?

Speaker 0

And then, of course, came the language modeling breakthrough. Right? And we saw that, oh, yeah, if we scale deep learning on modeling natural language, we can create models with this incredible new understanding of human language. And so since then, we've been seeking how to combine these paradigms and how to get RL to work on natural language.

Speaker 3

One of the hardest things about RL for folks who are not practitioners of RL is the idea of crafting the right reward model. Especially if you're a business or an enterprise that wants to harness all this amazing progress you guys are putting out, but doesn't even know where to start. What do the next few years look like for a company like that? What is the right mindset for somebody who's trying to make sense of RL to craft the right reward model? Is there anything you've learned about best practices, or an approach to thinking about using this latest family of reasoning techniques?

Speaker 3

What is the right way I should think about even approaching reward modeling as a biologist or a physicist?

Speaker 0

I expect this will evolve quite rapidly. I expect it will become simpler. Right? I think maybe two years ago, we would have been talking about, what is the right way to craft my fine-tuning dataset? And I don't think we are at the end of that evolution yet.

Speaker 0

And I think we will be inching towards more and more human-like learning, which RL is still not quite. So I think maybe the most important part of the mindset is to not assume that what is now will be forever.

Speaker 1

Yeah. So I think one of the big focuses of the Codex team is to take the raw intelligence that we have from our reasoning models and make it very useful for real-world coding. So a lot of the work they've done is consistent with this. They are working on having the model be able to handle more difficult environments. We know that real-world coding is very messy, so they're trying to handle all the intricacies there.

Speaker 1

There's a lot of coding that has to do with style, with softer things, like how proactive the model is, how lazy it is, and just being able to define, in some sense, a spec for how a coding model should behave. They do a lot of very strong work there. And they're also working on much better presets. Coders have some notion of, this is how long I'm willing to wait for a particular solution.

Speaker 1

I think we've done a lot of work to dial in on, you know, for easy problems, being a lot lower latency. For harder problems, actually, the right thing is to be even higher latency and get you the really best solution. And just being able to find that preset is...

Speaker 4

Is there a sweet spot, if you were to say, for easier problems versus harder ones?

Speaker 1

What we've found is that the previous generation of the Codex models were spending too little time solving the hardest problems and too much time solving the easy problems. And I think that is actually just probably what you might get out of the box from o3.

Speaker 4

Maybe just on the topic of coding, since you guys were both competitive coders in prior lives. I know you've been at OpenAI for almost a decade now, but I was struck by the story of Lee Sedol, the Go player, who famously quit Go after he lost to AlphaGo multiple times. And I think in a recent interview, you guys were both saying that now the coding models are better than your capabilities, and that gets you excited. But say more about that. And how much would you say you code now?

Speaker 4

Well, if you're hands-on-keyboard. You can talk about OpenAI generally, but how much code is written by AI now?

Speaker 0

In terms of coding models being better, I think it is extremely exciting to see this progress. I think the programming competitions are a nice encapsulated test of the ability to come up with some new ideas in this boxed environment and time frame. I do think, if you look at things like, I guess, IMO problem six, or maybe some of the very hardest programming competition problems, there's still a little bit of headway to go for the models, but I wouldn't expect that to last very long. I do code a little bit.

Speaker 0

Historically, I've been... ("He's being humble.") Historically, I've actually been extremely reluctant to use any sort of tools. I just used Vim, pretty much.

Speaker 4

Old school.

Speaker 0

Yeah. Eventually, especially with these latest coding tools, like GPT-5, I really kind of felt like, okay, this is no longer the way. You can do a 30-file refactor pretty much perfectly in fifteen minutes, so you kind of have to use it.

Speaker 0

Yeah. And so I've been learning this new way of coding, which definitely feels a little bit different. I think it is a little bit of an uncanny valley still right now, where you kind of have to use it, because it is just accelerating so many things, but it's still not quite as good as a coworker. So I think our priority is getting out of that uncanny valley.

Speaker 0

But, yeah, it's definitely an interesting time.

Speaker 4

Yeah. Definitely.

Speaker 1

To speak to the Lee Sedol moment, I think AlphaGo, for both of us, was a very formative milestone in AI development. At least for me, it was the reason I started working on this in the first place. And maybe partly because of our backgrounds in competitive programming, I had this affinity for building models that could do very, very well in these forms of contests. Going from solving eighth-grade math problems to, a year later, hitting our level of performance in these coding contests, it's crazy to see that progression. And you kind of imagine, or like to think, that you feel some of the feelings that Lee Sedol felt too, right?

Speaker 1

It's like, wow, this is really crazy, right? And what are the possibilities? This is something that I took decades to do, and it took a lot of hard work to get to the forefront of. So you really do feel the implication of that, which is, these models, what can't they do? Right?

Speaker 1

And I do feel like it has already transformed the default for coding. This past weekend I was talking to some high schoolers, and they were saying, oh, you know, actually the default way to code is vibe coding. I think they would consider that maybe sometimes, for completeness, you would go and actually do all of the mechanics of coding it from scratch yourself, but that's just a strange concept to them. Like, why would you do that?

Speaker 1

Right. Right. You just vibe code. By default. Yeah.

Speaker 1

Yeah. And so, I mean, I do think the future, hopefully, will be vibe researching.

Speaker 3

I have a question about that, which is: what makes a great researcher? Right? When you say vibe researching, a big part of vibe coding is just having good taste in wanting to build something useful and interesting for the world. And I think what's so awesome about tools like Codex is that if you've got a good intuition for what people want, it helps you articulate that and then basically actualize a prototype very fast. With research, what's the analog?

Speaker 3

What makes a great researcher?

Speaker 0

Persistence is a very key trait. Right? I think the special thing about research is that you're trying to create something or learn something that is just not known. Right? It's not known to work.

Speaker 0

You don't know whether it will work. And so you're always trying something that will most likely fail, and I think it's about getting to a place where you are in a mindset of being ready to fail and being ready to learn from these failures. And of course, with that comes creating clear hypotheses and being extremely honest with yourself about how you're doing on them. Right? I think a trap many people fall into is going out of their way to prove that it works. Right?

Speaker 0

Which is quite different from, you know, believing in your idea and thinking it is extremely important. Right? And you want to persist in that, but you have to be honest with yourself about when it's working and when it's not, so that you can...

Speaker 1

...learn and adjust. Yeah. I think there are just very few shortcuts for experience. Through experience you learn what the right horizon is for thinking about a problem: you can't pick something that's too hard, and it's not satisfying to do something that's too easy. And I think a lot of research is managing your own emotions over a long period of time, too.

Speaker 1

You know, there are just going to be a lot of things you try that are not going to work. Sometimes you need to know when to persevere through that, and sometimes when to switch to a different problem. And I think interestingness is something you try to pick up through reading good papers and talking to your colleagues, and you maybe distill their experience into your own process.

Speaker 3

When I was in grad school (I'm a failed machine learning researcher; I was in grad school for bioinformatics), a big part of my research advisor's thrust was about picking the right problems to work on, such that you could then sustain and persist through the hard times. And you said something interesting, which was that there's a difference between having conviction in an idea and being maximally truth-seeking about when it's not working. And those two things might sometimes be in tension.

Speaker 3

因为当你对某个课题产生深度认同时，可能会陷入思维定式。在选题阶段，你是否发现某些启发式方法能帮助找到信念与求真不那么对立的课题类型？

Because you kind of go native on a topic or a problem sometimes that you have deep conviction in. Are there any heuristics you've found useful at the taste step, at the problem-picking step, that help you arrive at the right set of problems, where conviction and truth seeking are not as much in zero-sum tension as with other kinds of problems?

Speaker 0

明确地说，我不认为信念与求真是零和博弈。你可以坚信某个想法并坚持探索，关键是要诚实地评估进展，保持从失败中学习的心态。最重要的是选择那些你真正关心且认为重要的问题，对吧？

Yeah. To be clear, I don't think conviction and truth seeking are really in a zero-sum tension. I think you can be convinced, or, you know, you can have a lot of belief in an idea, and you can be very persistent in it while it's not working. I think it's just important that you're honest with yourself about how much progress you're making, and that you're in a mindset where you're able to learn from the failures along the way. I think it's important to look for problems that you really care about and you really believe are important. Right?

Speaker 0

我观察到许多令我敬佩的研究者都在攻坚难题——他们盯着那些公认无解的问题追问：为什么不可行？某种方法失败的根本原因是什么？始终在思考突破下一个障碍的关键。

And so I think one thing I've observed in many researchers that inspired me has been really going after the hard problems: looking at the questions that are, you know, widely known but not really considered tractable, and just asking, why are they not tractable? Or, what about this approach? Why does this approach fail? Right? You're always thinking about what is really the barrier for the next step.

Speaker 0

如果你追逐的是自己真心认为重要的问题，那么持续多年保持研究动力就会容易得多。

If you're going after problems that you really, truly believe are important, right, then that makes it so much easier to find the motivation to persist with them over years.

Speaker 3

比如在GPT-5的再训练阶段开发过程中，是否遇到过棘手难题——最初尝试解决问题的方法行不通，但有人坚持攻克了难关？有哪些令你印象深刻的故事案例？你希望其他研究者在哪些方面能效仿这些成功经验？

And in the development of GPT-5, during the pretraining phase, for example, were there any moments where there was a hard problem, the initial attempts to crack that problem weren't working, and yet somebody persisted through that? And what was it about any of those stories that come to mind that worked well, that you wish other people and other researchers did more of?

Speaker 0

我认为在整个模型演进过程中，无论是预训练模型还是推理模型，一个永恒主题就是'漏洞'。嗯。既包括那些潜伏数月却足以暗中破坏所有实验数据的愚蠢软件漏洞，发现它们往往能带来研究突破；也包括认知层面的'思维漏洞'——当固有思维方式存在偏差导致错误假设时，需要彻底推翻重来的情形。

I think on the path there, right, along the sequence of models, both the pretrained models and the reasoning models, I think one very common theme is bugs. Mhmm. And, you know, both, like, silly bugs in software that can stay in your software for months and invalidate all your experiments a little bit in a way that you don't know. And, you know, identifying them can be a very meaningful breakthrough for your research program. But also bugs in the sense of, well, you have a particular way of thinking about something, and that way is a little bit skewed, which causes you to make the wrong assumptions. So: identifying those wrong assumptions, rethinking things from scratch.

Speaker 0

无论是实现首批推理模型还是构建更大规模的预训练模型，我想我们都多次遭遇并最终克服了这类问题。

I think, you know, both for getting the first reasoning models working and for getting the larger pretrained models working, we've had multiple issues like that that we've had to work through.

Speaker 4

作为研究团队的领导者，你们如何思考留住顶尖人才的关键要素？另一方面，如何打造不会因关键人员离职而崩溃的韧性组织？

As leaders of the research org, how do you think about what it takes to keep the best talent on your team? And on the flip side, creating a very resilient org that doesn't crumble if a key person leaves?

Speaker 1

OpenAI在保持人才激励方面最大的优势在于：我们从事的是基础研究。我们不会四处张望说'X公司做了什么模型？Y公司又做了什么？'，而是对要构建什么有着清晰定义。我们热爱在前沿领域创新。

The biggest thing, I think, that OpenAI has going for it in terms of keeping the best people motivated and excited is that we are in the business of doing fundamental research, right? We aren't the type of company that looks around and says, oh, what model did company X build? Or what model did company Y build? You know, we have a fairly clear and crisp definition of what it is we're out to build. We like innovating at the frontier.

Speaker 1

我们极其厌恶模仿。这个使命本身就能激励人心——你真正投身于深度学习栈的新发现，我们正在共同构建非凡的事物。除此之外，营造优质文化至关重要：我们要建立培养优秀研究者的完善管道。

We really don't like copying. And I think people are inspired by that mission, right? You are really in the business of discovering new things about the deep learning stack, and I think we're kind of building something very exciting together. I think beyond that, a lot of it's creating very good culture. So we want a good pipeline for training up people to become very good researchers.

Speaker 1

从历史经验看，我们始终招募最具创新力的顶尖人才。因此我认为...

We, I think, historically have hired, you know, the best talent and the most innovative talent. So I just think,

Speaker 0

你知道，我们有一个

you know, we have a

Speaker 1

非常深厚的团队储备。是的，我认为我们大多数领导者都深受使命感的激励，这也是他们一直留在这里的原因。比如，当我审视我的直接下属时，他们都没有受到泰隆战争的影响。

very deep bench as well. And, yeah, I think most of our leaders are very inspired by the mission, and that's what's kept all of them there. Like, when I look at my direct reports, they haven't been affected by the talent wars.

Speaker 4

最近我和一位研究员聊天，他提到想找到那些‘洞穴居住者’。这些人通常不会在社交媒体上发布自己的工作内容，出于各种原因，他们甚至可能不发表论文。他们更像是默默在幕后工作的人。不知道你是否认同这个概念，但你们是如何招聘研究员的？

I was chatting with a researcher recently, and he was talking about wanting to find the cave dwellers. And these are often the people who are not posting on social media about their work. For whatever reason, they may not even be publishing. They're sort of in the background doing the work. I don't know if you would agree with this concept, but how do you guys hire for researchers?

Speaker 4

你们在寻找人才时，有没有什么不太明显的方法或特质是你们特别关注的？

And are there any non-obvious ways that you look for talent or, you know, non-obvious attributes that you look for?

Speaker 0

我认为我们看重的一点是，候选人是否在任何领域解决过难题。我们许多最成功的研究员最初是在OpenAI开始深度学习之旅的，他们之前从事过物理学、计算机科学、理论计算机科学或金融等领域。过去，扎实的技术基础加上致力于解决雄心勃勃的问题并坚持到底的意愿，是我们看重的。我们不会单纯寻找那些在社交媒体上最显眼或作品曝光度最高的人。

So I think one thing that we look for is having solved hard problems in any field. A lot of our most successful researchers have started their journey with deep learning at OpenAI and have worked in other fields like physics, computer science, theoretical computer science, or finance in the past. Strong technical fundamentals, coupled with the intent to work on very ambitious problems and actually stick with them. We don't purely look for, oh, you know, who did the most visible work or is the most visible on social media.

Speaker 3

或者，是的。听你这么说，我不禁回想起我自己创业经营公司时，我们在招募优秀工程师时，心里想的许多特质和你描述的如出一辙。埃隆最近发推文说，他认为研究员和工程师之间的区分很愚蠢。这只是语义上的挑剔，还是你觉得这两者实际上比表面看起来更相似？

Yeah. As you were talking, I was thinking back to when I was a founder running my own company and we would recruit for great engineering talent; many of the attributes you described were ones that were on my mind then. And Elon recently tweeted that he thinks this whole researcher versus engineer distinction is silly. Is he just being, you know, semantically nitpicky, or do you think these two things are more similar than they look?

Speaker 1

是的。我是说，我认为研究员并不只有一种类型。在OpenAI，我们有些研究员非常擅长创意生成，他们并不需要通过实现所有想法来展现巨大影响力。他们产生的价值更多在于不断提出‘我们试试这个’或‘也许可以考虑那个’的点子。

Yeah. I mean, I do think researchers don't just fit one shape. You know, we have certain researchers who are very productive at OpenAI who are just so good at idea generation, and they don't necessarily need to show great impact through implementing all of their ideas. Right? I think there's so much alpha they generate in just coming up with, oh, let's try this, or let's try that, or maybe we should think about that.

Speaker 1

还有一些研究者，他们非常擅长围绕一个核心想法，严谨地探索相关实验空间。我认为研究者类型各异，第一种类型未必能直接归类为优秀工程师。但我们确实努力保持研究品味与风格的多样性。

And there's other researchers who, you know, they are just very, very efficient at taking one idea, rigorously exploring, you know, the space of experiments around that idea. So I think, you know, researchers come in very different forms. I think maybe that first type wouldn't necessarily map into the same bucket as a great engineer. But we do kind of try to have a fairly diverse set of research tastes and styles. Yeah.

Speaker 3

能否谈谈如何打造一个能吸引各类研究者、促进其成长并实现规模化共赢的前沿文化？你认为制胜文化最关键的要素是什么？

Say a little bit about what it takes to create a frontier, winning culture that can attract all shapes and kinds of researchers and then actually grow them, help them thrive, and make them win together at scale. What do you think are the most critical ingredients of a winning culture?

Speaker 1

我认为最关键的是确保基础研究得到保护。当今许多公司都陷入这样的思维：如何在某款聊天产品或产品界面上竞争。但必须为研究保留空间，认清其本质价值，并给予相应的发展自由度。

So I think actually the most important thing is just to make sure you protect fundamental research. Right? I think you can get into this world with so many different companies these days where you're just thinking about, oh, how do I compete on, you know, a chat product or some other kind of product surface? And you need to make sure that you leave space and recognize the research for what it is, and also give researchers the space to do it. Right?

Speaker 1

不能让研究者被各种产品需求分散精力。这是我们文化中特别注重的一点。

Like, you can't have them being pulled in all of these different product directions. So I think that's one thing that we pay attention to within our culture.

Speaker 0

尤其在OpenAI和整个AI领域备受瞩目的当下，实验室间的竞争很容易让人陷入'追赶最新发布'的心态。确实有些领域会让人开始左顾右盼。我们的重要职责就是确保团队有足够空间思考：一两年后真正的技术图景是什么？哪些是值得攻关的重大课题？如何突破现有范式实现质的飞跃，而非渐进式改良。

Especially now that there's so much spotlight on OpenAI, so much spotlight on AI in general, and the competition between different labs, it would be easy to fall into a mindset of, like, oh, we're racing to beat this latest release or something. And, you know, there's definitely areas where people start looking over their shoulder and start thinking about, oh, you know, what are these other things? And I see it as a large part of our job to make sure that people have this comfort and space to think about what things are actually going to look like in a year or two. Like, what are the actually big research questions that we want to answer, and how do we actually get to models that vastly outperform what we see currently, rather than just iteratively improving in the current paradigm?

Speaker 4

关于基础研究保护的问题。你们既是顶尖研究机构，又是顶级产品公司——还引进了世界级产品高管。在保护基础研究的同时，如何平衡两者关系并推动产品发展？

Just to pull on that thread more on protecting fundamental research. You guys are obviously one of the best research organizations in the world, but you're also one of the best product companies in the world. How do you balance that focus between the two, especially since you've brought on some of the best product execs in the world as well? And how do you, while protecting fundamental research, continue to move forward the great products that you have out?

Speaker 1

关键在于明确区分两类研究者：真正关心产品并对产品成功负责的群体。他们当然需要与整体研究工作紧密协作，但明确各自职责范围与激励标准至关重要。

Yeah. I mean, I think it's about delineating a set of researchers who really do care about product and who really want to be accountable for the success of the product. And they should, of course, very closely coordinate with the research work at large. But I think people understanding their mandates and what they are rewarded for, that's a very important thing.

Speaker 0

我认为还有一点很重要的是，我们的产品团队和更广泛的公司领导层都认同这一愿景，即我们在研究领域的发展方向。所以，你知道，没有人认为我们现在拥有的产品会一成不变，只需等待研究部门推出新版本。相反，我们能够共同思考未来的模样。

One thing that I think is also helpful is that our product team and broader company leadership are bought into this vision of where we are going with research. And so, you know, nobody is assuming that, oh, the product we have now is the product we'll have forever, and we'll just wait for new versions from research. We are able to think jointly about what the future looks like.

Speaker 3

你们所做的一件事是让如此多样化的想法和尝试在OpenAI内部蓬勃发展，然后作为研究领导者，你们需要找到某种方式，使所有这些在路线图中形成连贯的整体。这边有人在研究扩散模型和视觉媒体的未来，那边还有人在探索代码推理的未来。你如何描绘这一切的连贯图景？当给予研究者进行基础研究的独立性，与将这些研究整合成一个连贯的研究计划之间可能存在张力时，这一切又如何协调？

One of the things that you guys have done is let such a diversity of different ideas and bets flourish inside of OpenAI that you then have to figure out some way, as research leaders, to make it all make coherent sense as part of one roadmap. You've got people over here investigating the future of diffusion models and visual media. And over here, you've got folks investigating the future of reasoning when it comes to code. How do you paint a coherent picture of all that? How does that all come together when there might be, at least naively, some tension between giving researchers the independence to go do fundamental research and then somehow making that all fit into one coherent research program?

Speaker 0

我们研究计划的公开目标一直是致力于实现自动化研究

Our stated goal for our research program has been getting to an automated researcher for

Speaker 0

已有几年时间。因此，我们大多数项目都是围绕这一目标构建的。这仍然为自下而上的创意产生、各领域的基础研究留出了大量空间，但我们始终在思考这些想法最终如何汇聚。比如，我们相信推理模型会走得更远，虽然我们有很多探索并不直接涉及推理模型，但我们一直在思考它们最终如何结合。当面对一个非常棘手的问题时，这种创新会是什么样子。

a couple years now. And so we've been building most of our projects with this goal in mind. This still leaves a lot of room for bottom-up idea generation, for fundamental research on various domains, but we are, you know, always thinking about how these ideas come together eventually. We believe, for example, that reasoning models go much further, and we have a lot of explorations on things that are not directly reasoning models, but we are thinking a lot about how they eventually combine. And, you know, what will this kind of innovation look like once you have something that is out there thinking for months about a very hard problem?

Speaker 0

因此，我认为明确我们的长期目标很重要。但这并不意味着我们对所有细节都采取规定性态度。我们确实将此视为探索和学习这些技术的过程。嗯。

And so I think this clarity of our long-term objectives is important. But, yeah, it doesn't mean that we are, you know, prescriptive about, like, oh, here are all the little pieces. Right? We definitely view this as a question of exploration and learning about these technologies. Mhmm.

Speaker 1

是的。我认为你需要在宏观层面上持有明确观点和指导性，但在更细致的层面上，许多想法可以自然涌现。

Yeah. I think you wanna be opinionated and prescriptive at a very coarse level, but, you know, a lot of ideas can bubble up at a finer level.

Speaker 3

最近是否有过这些方面产生冲突的时刻？一个可能引发争议的例子是谷歌最近推出的新图像模型Nano Banana，它展示了当模型擅长理解编辑时，普通大众能释放巨大创造力。这可能会对一个未直接优先考虑该方向的研究计划造成压力。如果你的团队中有才华的成员过来说，这东西在现实世界中价值如此明显，我们应该投入更多精力和资源。

And have there been any moments where those things have been in tension at all recently? One provocative example could be this new image model that came out recently, Nano Banana, right, from Google. It has shown extraordinary value: lots of everyday people can unlock a lot of creativity when these models are good at understanding editing. And I could see how that would create some tension for a research program that may not be prioritizing that as directly. If, you know, somebody talented on your team came and said, guys, this thing is so clearly valuable in the world out there, we should be spending, you know, more effort, more energy on this.

Speaker 3

你是如何思考那个问题的？

How do you reason about that question?

Speaker 0

我认为这绝对是OpenAI团队长期思考的问题。你看GPT-3时就会发现，当我们意识到语言模型将朝这个方向发展后，我们就不断在讨论：显然AI能实现无数神奇功能。这些极其智能的模型不仅能推动科学各领域发展，还将带来革命性的媒体生成和娱乐应用。因此如何在这些方向中确定优先级，确实是我们长期思考的课题。

I think that's definitely a question that we've been thinking about for quite a while at OpenAI. I mean, if you look at GPT-3, right, once we saw, like, oh, this is where language models are going, we definitely had a lot of discussions about, well, clearly, there are going to be so many magical things you can do with AI. Right? You will be able to get to these extremely smart models that are, you know, out there pushing the frontiers of science, but you will also have this incredible media generation and these incredibly, you know, transformative entertainment applications. And so how we prioritize among all these directions has definitely been something we've been thinking about for quite a while.

Speaker 1

确实如此。真正的答案是：我们不会打击任何人的热情。只要我们在优先级和产品战略上保持一致性，这些自然会水到渠成。所以我们鼓励大家积极开发这类产品，无论是智能代理产品还是他们热衷的任何产品。

Yeah. Absolutely. And the real answer is, like, we don't discourage someone from being really excited by that. It's just that if we're consistent in the prioritization and our product strategy, then it will naturally fall in. And so for us, we do encourage a lot of people to be excited about building this, you know, or building agentic products, whatever kind of products they're excited by.

Speaker 1

但我认为同样重要的是，需要专门设立一个受保护的团队，他们的核心使命是推动算法突破。

But I think it's important for us to also have a separate group of people who you protect, whose goal is to create the algorithmic advances.

Speaker 4

接着安德烈的问题，这如何转化为具体的资源分配框架？比如是否考虑将X%算力投入长期探索——虽然重要但可能更天马行空，同时兼顾产品推理需求，以及那些中短期可实现的项目？

How does that translate, just to build on Anjney's question, into a concrete framework around resourcing? Like, do you think about, okay, x percent of compute resources will go to longer-term, you know, very important, but maybe a bit more pie-in-the-sky exploration, versus, you know, obviously, product inference, and then this thing in the middle that's achievable in the short to medium term?

Speaker 1

是的，我认为这正是我们双方工作的核心部分。

Yeah. So I think that's a big part of both of our jobs.

Speaker 0

你明白吗？

You know?

Speaker 1

就是这个关于如何分配计算资源给不同项目的组合管理问题。我认为历史上，我们在核心算法进步上投入了比产品研究稍多的资源，但这需要随时间调整。对吧？这是动态的。我认为每个月都可能有不同的需求。

Just this portfolio management question of how much compute you give to which project. And I think historically, we've put a little bit more on just the core algorithmic advances versus the product research, but it's something that you have to feel out over time. Right? It's dynamic. I think month to month, there could be different needs.

Speaker 1

因此我认为在这方面保持相当的灵活性很重要。

And so I think it's important to stay fairly flexible on that.

Speaker 4

如果你有额外10%的资源，你会将其投入计算资源、数据整理还是人力？从边际效益来看，你会优先考虑哪个方面？

And if you had 10% more resources, would you put it toward compute, or is it data curation, people? Where would you stick that from, like, a marginal

Speaker 1

好问题。说实话，是的，我认为

Good question. Honestly, yeah, I think

Speaker 3

计算资源

compute's

Speaker 0

目前是计算资源。相当合理的回答。是的。没错。

a Compute today. Fairly reasonable answer here. Yeah. Yeah.

Speaker 1

说实话，我认为关于优先级的这个问题，对吧？就像在真空环境下，你会希望在所有方面都表现出色并取得成功。但危险在于你可能在每件事上都屈居第二，没有在任何领域明确领先。所以我认为明确优先级很重要，对吧？你需要确保在某些事情上保持清醒，这就是我们必须赢得的领域。

I mean, honestly, I do think, to your question of prioritization, right, in a vacuum, any of these things you would love to go and excel and win at. I think the danger is you end up second place at everything and, you know, not clearly leading at anything. So I think prioritization is important, right? And you need to make sure there are some things you're clear-eyed on: this is the thing that we need to win.

Speaker 1

是啊。

Yeah.

Speaker 3

但我觉得有必要再稍微讨论一下算力问题——某种程度上，算力几乎决定了命运，对吧？在OpenAI这样的研究机构里。几年前，人们开始流行说‘我们短期内不会受算力限制，因为芯片制造技术在进步，算法效率在提升，最终我们只会受限于数据’。但几年过去了，现实似乎仍是算力严重受限的环境。

But I think it makes sense to talk about compute for just a little bit more, because so much of compute is destiny in a way, right, at a research organization like OpenAI. And so a couple of years ago, I think it became very fashionable to say, oh, okay, we're not gonna be compute constrained anytime soon because there's a bunch of CMs that are, you know, people are discovering and we're gonna get more efficient and all the algorithms are gonna get better. And then eventually, really, we'll just be in a data-constrained regime. And it seems like, you know, a couple of years have come and gone, and we're still in this very compute-constrained environment.

Speaker 3

你觉得这种情况短期内会改变吗？还是说...

Does that change anytime soon, you think? Or

Speaker 0

我的意思是...我们见证算力潜力已经够久了。我始终不太相信‘我们将受限于数据’的说法，也不认为这种状况会改变。

I mean, I think we've seen for long enough how much we can do with compute. Yeah. I haven't really bought that much into the "we'll be data constrained" claim. And I don't expect that to change.

Speaker 1

没错。说这种话的人真该来替我工作一周。历史上从没有人说过‘我的算力完全够用’。

Yeah. Anyone who says that should just step into my job for a week. There's no one who's like, I have all the compute that I need. Historically,

Speaker 3

推动基础研究的使命传统上主要由大学承担，部分原因正是你刚才提到的算力限制。但前沿AI领域打破了这种模式——你们出色地引领了AI进步曲线来助力科学发展。我好奇当这两个世界碰撞时——当今大学的基础研究与前AI研究——会产生什么火花？

The job of advancing fundamental research has historically been largely a mandate that universities have had. Partly for the compute reasons you just described, that hasn't been the case for frontier AI. You guys have done such an incredible job channeling the arc of frontier AI progress to help the sciences out. And I'm wondering, when those worlds collide, the world of fundamental university research today and the world of frontier AI, what comes out?

Speaker 1

我个人最初是OpenAI的驻场研究员，这个项目旨在让不同领域的人快速掌握AI知识并成为研究者。这个项目有很多强大之处，其核心理念是：能否用最短时间完成类似博士的培养过程？我认为关键就在于大量实践核心研究成果——当然过程中难免会犯错。

So I guess I personally started as a resident at OpenAI, and it's a program that we had for people in different fields to come in, learn quickly about AI, and become productive as researchers. And I think there are a lot of really powerful elements in that program. The idea is just, you know, could we accelerate something that looks like a PhD into as little time as possible? And I think a lot of that just looks like implementing a lot of very core results. And through doing that, you're gonna make mistakes.

Speaker 1

你会觉得，哇，这太神奇了。就像培养一种直觉，如果我设置错了，你知道，那可能会以某种方式毁掉我的网络。所以你需要大量的实践经验。我认为随着时间的推移，所有这些大型实验室可能都已经开发出了关于优化、架构和强化学习的课程。确实，没有什么比尝试实现很多东西、阅读相关资料并批判性思考更好的方法了。

You're gonna be like, oh, wow. Like, build intuition for if I, you know, set this wrong, like, that's gonna blow up my network in this way. And so you just need a lot of that hands on experience. I think over time, there have been curriculums developed at probably all of these large labs in optimization and architecture and RL. And yeah, probably no better way than to just kind of try to implement a lot of those things and read about them and think critically about them.

Speaker 0

是的，是的。我想在学术界你还能体验到的一点是那种坚持，对吧？比如，你有几年的时间去尝试解决一个问题，而且这是个难题，你以前从未处理过这么难的问题。而且，我确实觉得现在进步的速度非常快。也许现在的想法比过去更容易成功，因为深度学习就是想要学习。

Yeah. Yeah. I think maybe one other nice thing that you get to experience in academia is this persistence, right? Like, you have a few years and you're trying to solve a problem, and it's a hard problem, and you've never dealt with such a hard problem before. And, yeah, I do feel like this is a thing that's, well, currently, the pace of progress is very fast. Maybe also ideas tend to work out a little bit more often than they did in the past because, yeah, deep learning just wants to learn.

Speaker 0

并且，稍微接触一些更具挑战性的问题，也许成为团队的一部分去攻克一个雄心勃勃的挑战，体验那种卡住的感觉以及最终取得进展的感觉，我认为这也是非常值得学习的东西。

And getting your hands on a more challenging problem for a little bit, maybe, you know, being part of a team attacking an ambitious challenge and, you know, getting that feeling of what it feels like to be stuck and what it feels like to finally be making progress, I think is also something that's very useful to learn.

Speaker 4

外部对某个产品发布的看法如何影响你的优先级设定？在看法和使用紧密结合的情况下，显然会有明确的指导方向。但如果它们有些脱节，这是否会影响你对路线图的思考或资源分配的重点？

How does external perception or reception of a particular product launch impact how you prioritize something? To the extent that, you know, perception and usage are married, obviously, there's probably a clear directive there. But in a case where maybe they're divorced a bit, does that impact how you think about the roadmap or where you emphasize resources?

Speaker 0

所以我们通常对未来有相当坚定的信念，因此我们不会把它们与产品的短期反响紧密挂钩。当然，我们会根据实际情况学习，阅读其他论文，关注其他实验室的工作，但总体上，我们行动的基础是对我们所构建的东西有很强的信念。当然，这是针对我们的长期研究计划，至于产品方面...

So we generally have some pretty strong convictions about the future, and so we don't tie them that closely to the short-term reception of our products. Right? Of course, we, you know, learn based on what is going on. We read other papers, and we look at what other labs are working on, but generally, we act from a place of fairly strong belief in what we're building. And, of course, that is for our long-term research program. When it comes to product,

Speaker 0

对吧？我认为迭代周期要快得多。

right, I think the cycle of iteration is much faster.

Speaker 4

嗯，是的。

Mhmm. Yep.

Speaker 1

是的。我认为，每次产品发布，我们都力求在功能层面取得巨大成功。从基础研究的角度看，我们致力于打造具备构建丰富体验和产品所需核心能力的模型。总有人会对某个特定应用有所构想，而我们推出的每款产品，都真心希望它能大获成功，并从中获得反馈。

Yeah. I think, you know, with every launch, we are aiming for it to be something that's wildly successful on the product side. And I think from a fundamental research perspective, we're trying to create models with all of the core capabilities needed to build a very rich set of experiences and products. And there are gonna be people who have some vision of one particular thing they could build, and we'll launch it, and with everything we launch, we really hope it is wildly successful. And we get that feedback.

Speaker 1

若未达预期，我们会适当调整产品策略。但毫无疑问，我们始终专注于推出极其实用且大获成功的产品。

If it's not, we'll kind of shape our product strategy a little bit. But yeah, we are definitely also in the business of launching very useful, wildly successful products.

Speaker 3

由于我们刚才讨论的这种毫无约束的进步速度，感觉未来几年将发生翻天覆地的变化，对吧？预测变得极其困难——想象十年后的情形都难，更别说十个月后了。所以我的问题是：在AI前沿带来的所有这些变革中，您认为哪些基本原则应该保持不变？显然算力不足是其一。

It feels like, because of the completely unbridled pace of progress that we've just spent a lot of time talking about, a lot is gonna change over the next few years, right? It gets really hard to predict, I imagine, ten years out, let alone ten months out. And so my question, I guess, is: through all that change that the frontier of AI is going to bring, what are some priors that you actually think should stay constant? Well, one clearly is that we don't have enough compute.

Speaker 3

您认为还有哪些不会改变、值得坚定秉持的基本原则吗？

Is there anything else that you think doesn't change that you think would be strong, reasonably held priors as constant?

Speaker 0

我认为比算力更广泛的是物理限制，比如能源问题。而且不久之后，机器人技术将成为重点领域。所以考量物理限制将持续重要。但在智能发展方面，我不会做太多假设。

I think more broadly than compute, there's physical constraints of, well, energy, but also, you know, at some point not too far off, robotics will become a major focus. And so I think thinking about the physical constraints is going to remain important. But, yeah, I do think on the intelligence front, I would not make too many assumptions.

Speaker 4

很少有初创企业能像你们这样，在员工规模和营收体量上都达到如此高度，同时保持你们七八年前刚加入时那种惊人的发展速度。实现这一点的秘诀是什么？如今已处于行业顶端，又如何持续保持这种快速迭代的压力？

Very few startups can get to the scale that you have, both from an employee-count perspective and a revenue perspective, and maintain the breakneck speed that you probably had, I mean, seven, eight years ago when you both joined. What's the secret sauce to doing that? And how do you continue to maintain this pressure to ship as quickly as possible even though, you know, you're kind of on top now?

Speaker 1

我认为良好研究文化最显著的标志之一（至少在我看来）是：我曾任职多家公司，确实存在学习瓶颈——入职头一两年学到很多，之后就会陷入'我已熟悉这套工作模式'的停滞期。但在OpenAI从未有此感受。正如你描述的这些不断涌现的突破性成果，每周都能学到新东西，要跟上节奏本身就是份全职工作。

I think one of the clearest markers that we have a really good research culture, at least in my mind, is, you know, I've worked at different companies before, and there is a real thing which is a learning plateau, right? You go to a company, you learn a lot for the first one or two years, and then you just find, you know, I know how to be fairly efficient in this framework, and learning kind of stops. And I've really never felt that at OpenAI. Just like that experience you described of all these really cool results bubbling up, you're just learning so much week over week. It is a full-time job to stay on top of it all.

Speaker 1

这一切都让人感到非常充实。是的，我认为这个描述非常准确。我们只是希望产出大量真正高质量的研究，如果研究成果多到几乎应接不暇，这反而是件好事。

And that's just been very fulfilling. So yeah, no, I think that's a very accurate description. We just want to generate a lot of really high quality research, and it's almost a good thing if you're generating enough that you're barely able to keep on top of it.

Speaker 4

没错，正是如此。

Yeah, exactly. Yeah.

Speaker 0

我认为技术的发展绝对是主要驱动力。要知道，或许在某个研究范式上工作几年后我们会感到安逸，但我们始终处于突破的边缘，不断尝试围绕新约束和新可能性重新构建思维框架。因此这种持续变化的状态造就了永远在学习新事物的心态。

I think definitely the development of the technology is a driving force here, where, you know, maybe we would become comfortable after a few years working on a given paradigm, but we are always on the cusp of something new and trying to reconfigure our thinking around the new constraints and new possibilities that we're gonna be faced with. And so I think that creates this feeling of constant change and the mindset of always learning the new thing.

Speaker 3

有件事我们在研究中发现，在OpenAI诸多变化中始终未变的是你们两位之间的信任。最近《麻省理工科技评论》对你们的专题报道也重点提到，你们之间的默契与信任已被许多OpenAI成员视为恒量。能说说这段信任是如何建立的吗？

Well, you know, one thing that came up in our research about things at OpenAI that have not changed through a lot of the change is the trust that the two of you have in each other. I think there was an article or profile of you guys recently in MIT Technology Review, and that was also one of the highlight themes: your chemistry, your trust with each other, your rapport is something a lot of the people at OpenAI have come to treat as a constant. So what's the backstory? How'd you guys build trust there?

Speaker 3

这是怎么发生的？就像《当哈利遇到莎莉》里的情节——感觉你们现在应该坐在沙发上回忆往事了

How did that happen? It's like, have you ever seen When Harry Met Sally? I feel like you're on the couch, and now you gotta

Speaker 4

你们的浪漫初遇是怎样的？

What's your meet cute?

Speaker 1

确实如此。我们开始更密切合作是在最初研究推理能力的阶段。当时这并非热门研究方向，但我们都在其中看到了希望曙光，于是共同探索如何实现强化学习的突破。

Yeah. Exactly. Well, I do think, you know, we started working together a little bit more closely when we had the first seeds of working on reasoning. I think at the time, you know, that wasn't a very popular research direction to work on, and I think both of us saw glimmers of hope there. And, you know, we were pushing in this direction, figuring out how to make RL work.

Speaker 1

是的，我认为随着时间的推移，将非常微小的努力逐渐发展成越来越大的成果。正是在这个过程中，我得以与雅各布深入合作。他真是一位杰出的研究者，在任何排名榜单上都应该名列前茅。他那种将任何技术难题视为个人挑战，思考两周后就能完美解决的能力令人叹服。

And yeah, I think over time it was growing a very small effort into an increasingly larger effort. And I think that's where I, yeah, really got to work with Jakub in depth. I think he's just really a phenomenal researcher. I think, you know, on any of these ranked lists, he should be number one. Just his ability to take any very difficult technical challenge, almost personally, think about it for two weeks, and just crush it.

Speaker 1

It's incredible that he has the wide range that he does in terms of understanding, as well as the kind of depth where he can go and just personally solve a lot of these second world challenges.

Speaker 3

Now you get to say some nice stuff.

Speaker 1

About you. I can say anything nice about you.

Speaker 0

Yeah. Thanks, Mark. I think the first big thing that we did together was when we started seeing, okay, we think this algorithm is going to work. And so I was thinking, okay, how do we direct people at this?

Speaker 0

And we were talking with Mark, like, oh, we should establish a team that's actually going to make this work. And then Mark went and actually did this: he got a group of people who were working on very different things, got them all together, and created a team with incredible chemistry out of this whole disparate group. That was such an impressive thing to me. I'm really grateful that I get to work with Mark and experience that. I think it's this incredible capacity to understand, engage with, and think about the technical matter of the research itself, coupled with this great ability to lead and inspire teams and to create an organizational structure that, in this whole mess of chaotic directions, is actually coherent and able to gel together. Very, very inspiring.

Speaker 3

It's awesome. Well, on that note...

Speaker 4

Great note to end on. Yeah.

Speaker 3

Some of the greatest discoveries in science, especially in physics, have often come from a pair of collaborators, often across universities, across fields, and it seems like you guys have now added to that tradition. So we're just super grateful that you made the time to chat today. Thanks for coming by. Thank you. Thanks for being

Speaker 0

with us.

Speaker 2

Thanks for listening to this episode of the a16z podcast. If you liked this episode, be sure to like, comment, subscribe, leave us a rating or a review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at @a16z, and subscribe to our Substack at a16z.substack.com. Thanks again for listening, and I'll see you in the next episode.

Speaker 2

As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.