Inside OpenAI Enterprise: Forward Deployed Engineering, GPT-5, and More | BG2 Guest Interview

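Episode description

A biweekly conversation between Bill Gurley and Brad Gerstner on all things tech, markets, investing, and capitalism. This episode is guest-hosted by Altimeter's Apoorv Agarwal, joined by Sherwin Wu, head of engineering for the OpenAI platform, and Olivier Godement, head of product, to explore how OpenAI is reshaping the enterprise. From T-Mobile's AI voice support to Amgen's drug breakthroughs to Los Alamos National Laboratory's air-gapped supercomputer, this episode goes deep on what AI at scale looks like in practice. Enjoy another episode of BG2!

Timestamps:
(00:00) Intro
(01:50) OpenAI's enterprise mission: beyond ChatGPT
(06:00) Case study: T-Mobile <> voice and customer support
(11:30) Case study: Amgen <> accelerating drug development
(13:45) Case study: Los Alamos National Laboratory
(17:00) Why do 95% of AI deployments fail?
(20:30) Physical vs. digital autonomy: scaffolding and infrastructure
(26:00) GPT-5: launch timing, benchmarks, and behavior
(30:00) GPT-5 feedback: instruction following, hallucinations, code quality
(33:00) Multimodality: text, voice, and video
(35:30) Audio: Realtime API and stitched audio
(38:00) Model customization and reinforcement fine-tuning (RFT)
(43:00) Rapid fire: long/short picks
(1:03:00) Highs and lows at OpenAI

Show notes:
T-Mobile partnership details: https://www.t-mobile.com/news/business/t-mobile-launches-intentcx-with-openai
Amgen partnership details: https://openai.com/index/gpt-5-amgen/
Los Alamos partnership details: https://www.lanl.gov/media/news/0130-open-ai
MIT AI report: https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf

Produced by Dan Shevchuk. Music by Yung Spielburg.
Available on Apple, Spotify, www.bg2pod.com

Follow:
Brad Gerstner @altcap https://x.com/altcap
Bill Gurley @bgurley https://x.com/bgurley
BG2 Pod @bg2pod https://x.com/BG2Pod
Apoorv Agarwal @apoorv03 https://x.com/apoorv03
Sherwin Wu @sherwinwu https://x.com/sherwinwu
Olivier Godement @oliviergodement https://x.com/oliviergodement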

Transcript

Speaker 0

We literally had to bring the weights of the model physically into their supercomputer.

Speaker 1

In San Francisco, you could take a car from one part of SF to the other fully autonomously. As opposed to the digital world, where I can't book a ticket online right now. Physical autonomy is ahead of digital autonomy in 2025.

Speaker 0

I think AI agents are really at day one here. ChatGPT only came out in 2022. The slope, I think, is incredibly steep. I actually do think self-driving cars have a good amount of scaffolding in the world. You have roads.

Speaker 0

Roads exist. They're pretty standardized. You have stoplights. AI agents are just kind of dropped in the middle of nowhere.

Speaker 1

We'll start with the long/short game.

Speaker 0

I'm short on the entire category of, like, tooling and evals products.

Speaker 2

Healthcare is probably the industry that will benefit the most from AI. I think I'm AGI-pilled.

Speaker 0

You're definitely AGI-pilled.

Speaker 2

The first one was the realization in 2023 that I would never need to code manually, like, ever, ever again.

Speaker 1

Hey folks, I'm Apoorv Agarwal, and today at the OpenAI office we had a wide-ranging conversation about OpenAI's work in enterprise. I have with me the head of engineering and the head of product of the OpenAI platform, Sherwin Wu and Olivier Godement. OpenAI is well known as the creator of ChatGPT, a product that billions across the world have come to love and enjoy. But today we dive into the other side of the business, which is OpenAI's work in enterprise. We go deep into their work with specific customers and how OpenAI is transforming large and important industries like healthcare, telecommunications, and national security research.

Speaker 1

We also talk about Sherwin and Olivier's outlook on what's next in AI, what's next in technology, and their picks on both the long and the short side. This was a lot of fun to do. I hope you really enjoy it. Well, two world-class builders, two people who make building look easy. Sherwin, my Palantir 2013 classmate, tennis buddy, with two stops at Quora and Opendoor, through the IPO, before joining OpenAI, before ChatGPT.

Speaker 1

You've now been here for three years and lead engineering for all of the OpenAI platform. Olivier, former entrepreneur, winner of the Golden Llama at Stripe, where you were for just under a decade, and now you lead all of product for the OpenAI platform. That's right. Thanks for doing it.

Speaker 0

Thank you. Thanks for having us.

Speaker 1

You know, as a shareholder, as a thought partner kicking ideas back and forth, I always learn a lot from you guys. And so it's a treat. It's a real treat to do this for everybody. You know, I'll open with this: people know OpenAI as the firm that built ChatGPT. Mhmm.

Speaker 1

The product that they have in their pocket, that comes with them every day to work and into their personal lives. But the focus for today is OpenAI for enterprise. You guys lead the OpenAI platform. Tell us about it. What's underneath the OpenAI platform for B2B, for enterprise?

Speaker 0

Yeah. So this is actually a really interesting question, because, as I said, when I joined OpenAI around three years ago to work on the API, it was actually the only product that we had. I think a lot of people forget that the original product from OpenAI actually was not ChatGPT. It was a B2B product. It was the API, and we were catering to developers.

Speaker 0

And so I've actually seen, you know, the launch of ChatGPT and everything downstream from that. But at its core, I actually think the reason why we have a platform and why we started with an API kind of comes back to the OpenAI mission. Our mission, obviously, is to build AGI, which is pretty hard in and of itself, but also to distribute the benefits of it to everyone in the world, to all of humanity. And it's pretty clear right now to see ChatGPT doing that because, you know, my mom, maybe even your parents...

Speaker 2

Mhmm.

Speaker 0

...are using ChatGPT. But we actually view our platform, and especially our API and how we work with our customers, our enterprise customers, as our way of getting the benefits of AGI, of AI, to as many people as possible, to everyone in every corner of the world. ChatGPT obviously is really, really big now. It's, I think, like the fifth largest website in the world. But by working through developers using our API, we're actually able to reach even more people, in every corner of the world and every different use case that you might have.

Speaker 0

And especially with some of our enterprise customers, we're able to reach even use cases within businesses and reach end users of those businesses as well. And so we actually view the platform as kind of our way of fully expressing our mission, mhmm, of getting the benefits of AGI to everyone. Concretely, though, what the platform actually includes today, mhmm, the biggest product that we have is obviously our developer platform, which is our API.

Speaker 0

Mhmm. You know, many developers, the majority of the startup ecosystem, build on top of this, as well as a lot of digital natives and Fortune 500 enterprises at this point. Mhmm. We also have a product that we sell to governments, in the public sector. So that's all part of this as well.

Speaker 0

And an emerging product line for us in the platform is our enterprise products: what we might sell directly to enterprises beyond just a core API offering.

Speaker 1

Fascinating.

Speaker 2

And maybe to double down: I think B2B is actually quite core to the OpenAI mission. What we mean by distributing the benefits of AGI is, you know, I want to live in a world where there are 10x more medicines going out every year. I want to live in a world where education, public service, civil service are increasingly optimized for everyone. And there's a large category of use cases that, frankly, only go through B2B, by enabling the enterprises. And, you know, we talked about Palantir.

Speaker 2

I think that's probably the same thesis at Palantir. It's like, hey, those are the businesses who are actually making stuff happen in the real world. And so if you enable them, if you accelerate them, that's essentially how you distribute the benefits of AGI.

Speaker 1

Yeah. Well, maybe we can double-click into that, Olivier. The reach for chat is obviously wide: billions of users. But for enterprise, maybe tell us about it. Maybe we go deep into a customer example or two.

Speaker 1

What is an organization that we have helped transform, maybe? And at what layers?

Speaker 2

So, if I were to step back, we started our B2B efforts with the API a few years ago. Mhmm. Initially, the customers were startups, developers, indie hackers, extremely technically sophisticated people who are building cool new stuff, essentially, and taking massive market risk. We still have a bunch of customers in that category, and we love them, and we keep building with them. On top of that, over the past couple of years, we've been working more and more with traditional enterprises and also digital natives.

Speaker 2

Essentially, I think basically everyone woke up with ChatGPT: those models are working, there is a ton of value, and they could see many use cases in the enterprise. A couple of examples that I like the most: one that is both very fresh and quite cool is that we've been working a lot with T-Mobile. T-Mobile? So, T-Mobile, a leading US telco operator. Right.

Speaker 2

T-Mobile has a massive customer support load. You know, people asking, hey, I was charged that amount of money, what's going on? Or, my cell phone isn't working anymore.

Speaker 2

A massive share of that load is voice calls. People want to talk to someone. And so for them, being able to automate more and more, and to help people self-serve, in a way, to debug their subscription, was pretty big. We've been working with T-Mobile for pretty much the past year at this point to automate not only text support but also voice support. And so today there are features in the T-Mobile app where, if you call, it's actually handled by OpenAI models behind the scenes.

Speaker 2

And it does sound super natural, human-sounding, both latency-wise and quality-wise. So that one was really fun. A second one, which is...

Speaker 1

Just really quickly on that, can I ask you a follow-up question? So we've got text models, we've got voice models, maybe even video models someday, that are deployed at T-Mobile. Yeah. But what above the models, or adjacent to the models, might we have helped T-Mobile with, for example?

Speaker 2

Yeah. There is a ton we're doing. The first thing is, you have to put yourself in the shoes of an enterprise buyer. Their goal is to automate, reduce, optimize customer support. And going from a model, tokens in, tokens out... Right.

Speaker 2

...to the use case is hard. Yeah. And so, first, there's a lot of design, you know, system design. We actually now have forward deployed engineers who are helping us quite a bit.

Speaker 1

Forward deployed engineers? Yeah. I mean, familiar to the...

Speaker 0

We borrowed the term from Palantir. Yeah. It's a great term.

Speaker 2

Were you FDEs at Palantir?

Speaker 0

I was not an FDE. I was on, I think they called it, the dev side. Right? It's like software engineering. I was also only an intern at Palantir.

Speaker 0

But, yeah, it's a great term. I think it accurately describes what we're asking folks to do, which is embed very deeply with customers and, honestly, build things specific to their systems. They're deployed onto these customers. And, yeah, we are obviously growing and hiring that team quite a bit because they've been very effective, like with T-Mobile.

Speaker 1

Four years of my life.

Speaker 0

Yeah.

Speaker 1

Yeah. Yeah. Forward deployed. Yeah. But go ahead.

Speaker 1

So, forward deployed engineering.

Speaker 2

So, our FDEs and the sort of systems and integrations work they're doing: first, you have to orchestrate those models. Those models know nothing about the CRM and what's going on. So you have to plug the model into many, many tools. Many of those tools in the enterprise do not even have APIs or clean interfaces. It's the first time they're being exposed to a third-party system.

Speaker 2

So there is a lot of standing up API gateways, tools, connecting things. Then you have to essentially define what good looks like. Mhmm. You know? Again, it's a pretty new exercise for everyone.

Speaker 2

Defining a golden set of evals is harder than it sounds. Yeah. And so we have been spending a bunch of time with them.

Speaker 1

Evals are important. Evals are super important.

Speaker 0

Especially audio evals. I know audio evals are extra hard to grade and get right. But the bulk of the use case here is actually audio. Right. Say we have, I don't know, a five-minute call transcript. Do you actually know that the right thing happened?

Speaker 0

It's a pretty tough problem.

Speaker 2

Yeah. It's pretty tough. And then there's actually nailing down the quality of the customer experience until it feels natural. And here, latency and interruptions play a really important part. We shipped an API in GA, the Realtime API.

Speaker 2

I think it was last week?

Speaker 1

A couple of weeks ago.

Speaker 2

Yeah. It was just last week, I think. Which is like a beautiful work of engineering. You know, there was a really cracked team behind it. Yeah. Which basically allows us to get the most natural-sounding voice experience, without these weird interruptions or lag where you can feel that, essentially, the thing is off. Mhmm.

Speaker 2

So, yeah. Couple all of that together and you get a really good experience.

Speaker 1

Yeah. That's a lot more than just models. Yep.

Speaker 0

Yeah. I was going to say, one really great thing that I think we've gotten from the T-Mobile experience is actually working with them to improve our models themselves. So, for example, with the Realtime GA last week, we obviously released a new snapshot, the GA snapshot. And a lot of the improvements that we got into the model came out of the learnings that we had from T-Mobile. It brings in a lot of changes from other customers too, but because we were so deeply embedded with T-Mobile and we were able to understand what good looks like for them, we were able to bring that to some of our models.

Speaker 1

That makes sense. So this is a large customer with tens of millions of users, if not hundreds of millions, and the before and after is on the support side, both tech support internally and then their customer support. Yeah. Makes sense. Yeah.

Speaker 1

Is there another one that you guys can share?

Speaker 2

I like Amgen a lot. Amgen, the healthcare business. Amgen. Yeah. So we are working quite a bit with healthcare companies.

Speaker 2

Amgen is one of the leading healthcare companies. They specialize in drugs for cancer and inflammatory diseases. They're based out of LA. And we've been working with Amgen to essentially speed up drug development and commercialization. So the north star is pretty bold. And it's really interesting.

Speaker 2

Similarly, we embedded pretty deeply with Amgen to understand what their needs are. And it's really interesting. When I look at those healthcare companies, I feel like there are two big buckets of needs. One is pure R&D. You're looking at a massive amount of data, and you have super smart scientists who are trying to combine things, test things out, you know.

Speaker 2

So that's one bucket. A second bucket is much more common across other industries. It's pure admin: document authoring, document reviewing work. By the time your R&D team has essentially locked the recipe of a medication, getting that medication to market is a ton of work. You have to submit to various regulatory bodies and get a ton of reviews. And when we looked at those problems, given what we knew models were capable of, we saw a ton of benefits, a ton of opportunities to automate and augment, essentially, the work of those teams.

Speaker 2

And so, yeah, Amgen has been a top customer of GPT-5, for instance. Wow.

Speaker 1

I mean, this could be hundreds of millions of lives if a new drug is developed faster.

Speaker 2

Yeah. Exactly. Huge impact. So that's, I think, one good example of the kind of impact for which you need to enable enterprises. Right.

Speaker 2

You know? And so I think we're going to do more and more of those. And, frankly, on a personal level, it's a delight. If I can play a tiny role in essentially doubling the number of medications that people get in the real world, that feels like a pretty good achievement.

Speaker 1

Huge. Huge. Huge. I know you had one as well.

Speaker 0

So one of my favorite deployments that we've done more recently is actually with Los Alamos National Labs. This is the national research lab that the US government runs in Los Alamos, New Mexico. Mhmm. It's also where the Manhattan Project happened back in the forties and fifties. Yeah. Back when it was a top-secret project.

Speaker 0

So, you know, after that, they ended up formalizing it as a city and a program, and now it's a pretty sizable national laboratory. This one is very interesting because, one, the depth of impact here is just unimaginable to me. It's on the scale of Amgen and some of these other larger companies. But, obviously, they're doing a lot of actual new research there, so a lot of new science. They're doing a lot of work with our defense departments and on defense use cases as well.

Speaker 0

So, very intense stuff. But the other thing that's actually very interesting about this one is that it's also a story of a very bespoke, new type of deployment that we've done. Yeah. Because they're a government lab, and they're so restrictive, high-security, and high-clearance with a lot of their things, we couldn't just do a normal deployment with them. You can't have people doing national security research just hitting our APIs.

Speaker 0

And so we actually did a custom on-prem deployment with them onto one of their supercomputers, called Venado. This involved a bunch of very bespoke work with some FDEs, and also with a lot of our developer team...

Speaker 1

Nice.

Speaker 0

...to actually bring one of our reasoning models, o3, into their laboratory, into an air-gapped supercomputer, Venado. Yeah. And actually deploy it and get it installed to work on their hardware, on their networking stack, and actually run it in this particular environment. And so it was very interesting, because we literally had to bring the weights of the model physically into their supercomputer, in an environment that, by the way, is very locked down, for good reason. You're not allowed to have cell phones or any electronics with you. And so I think that was a very, very unique challenge.

Speaker 0

And then the other interesting thing about this deployment is just how it's being used. Right? Because it's so locked down and on-prem, we actually do not have much visibility into exactly what they're doing with it. But we do have, you know, what they give us. Yeah. They actually do have some telemetry, but it's within their own systems. We do know that it's being used for a bunch of different things. It's being used to help them speed up their experiments. They have a lot of data analysis use cases, a lot of notebooks that they're running with reams of data that they're trying to process.

Speaker 0

They're actually using it as a thought partner, which is something that's pretty interesting to me. o3 is pretty smart as a model, and a lot of these people are tackling really tough, novel research problems. A lot of times, they're using o3 and going back and forth with it on their experiment design, on what they actually should be using it for, which is something that we couldn't really say about our, yeah, our older models.

Speaker 0

And so, yeah, it's being used for a lot of different use cases at the national lab. And the other cool thing is that it's actually being shared between Los Alamos and some of the other labs, Lawrence Livermore and Sandia as well, because it's a supercomputer setup where they can all kind of connect with it.

Speaker 1

Fascinating. I mean, we've just gone through three pretty large-scale enterprise deployments, right, which might touch tens if not hundreds of millions of people. But on the other side of this is the MIT report that came out a couple of weeks ago: 95% of AI deployments don't work. A bunch of scary headlines that even shook the markets for a couple of days.

Speaker 1

To put this in perspective, for every deployment that works, there are presumably a bunch that don't. So maybe we can talk about that. What does it take to build a successful enterprise deployment, a successful customer deployment, and what's the counterfactual, based on all your experience serving all these large enterprises?

Speaker 2

I think at this point I may have worked with a couple hundred enterprises. Yeah. So, okay, I'm going to pattern match on what I've seen being a clear leading indicator of success.

Speaker 2

Number one is the interesting combination of top-down buy-in and enabling a very clear group, a tiger team essentially, at the enterprise, which is sometimes a mix of OpenAI and enterprise employees. So, typically, you take T-Mobile: the top leadership buy-in was extremely important; it's a priority. Right. But then letting the team organize and be like, okay, if you want to start small, start small, and then we can scale it up, essentially.

Speaker 2

So that would be number one.

Speaker 1

So, top-down buy-in and a bottom-up, call it, tiger team.

Speaker 2

A tiger team: a mix of people with technical skills and people who just have the organizational knowledge, the institutional knowledge. It's really funny: in the enterprise, and customer support is a good example, what we found is that the vast majority of the knowledge is in people's heads. Right. Right? Which is probably a thing you see with FDEs in general. But take customer support: you would think that everything is perfectly documented, in Jira, etcetera.

Speaker 2

The reality is that the standard operating procedures, the SOPs, are largely in people's heads. And so unless you have that tiger team, a mix of technical people and subject matter experts, it's really hard to get something off the ground. That would be one. Two would be evals first. Whatever we define as good evals gives people a clear common goal to hit.

Speaker 2

Whenever the customer fails to come up with good evals, it's a moving target. You never know, essentially, if you've made it or not. And evals are much harder to get done than they look.

Speaker 0

And evals also need to come bottom-up. Right? Because all of these things are kind of in people's heads, in the actual operators' heads. It's actually very hard to have a top-down mandate of, like, this is how the evals should look. A lot of it needs bottom-up adoption.

Speaker 2

Right. Yeah. Yeah. So we've been building quite a bit of tooling around evals. We have an evals product, and we are working on more to essentially solve that problem, or make it as easy as we can.

Speaker 2

最后一点是,你本质上想要逐步提升。你有自己的评估标准,目标是达到99%。你从比如46%开始。那么如何达到目标呢?说实话,我认为很多时候需要混合使用,我会说,几乎是来自有经验人士的智慧。

The last thing is you know you wanna hill climb essentially. You have your evals, the goal is to get to 99%. You start at like, you know, 46. You know, how do you get there? And here frankly, I think, oftentimes, like, know, a mix of like like, I will say, like, almost wisdom from people who've done it before.

Speaker 2

就像,你知道,很多时候这更像是艺术而非科学。

Like, you know, a lot of that is, like, you know, like art sometimes more than science.

Speaker 1

是的。是的。

Yeah. Yeah.

Speaker 2

而且,你知道,了解模型的特性、行为特点。嗯。有时候当存在明显限制时,我们甚至需要自己微调模型。要有耐心,逐步推进,最终完成交付。

And, like, you know, knowing, like, the quirks of the model, the behavior. Mhmm. Sometimes we even need to fine tune ourselves the models, you know, when there are some clear limitation. And, you know, being patient, getting your way, you know, up there, and then, you know, ship.
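As a rough illustration of the "evals first, then hill climb" loop described above, the sketch below scores a toy system against a small eval set and keeps a candidate configuration only when the pass rate improves. Everything in it is hypothetical: the eval cases, the `keyword_router` stand-in for a model call, and the candidate configurations are assumptions made for illustration, not anything described in the conversation.

```python
from typing import Callable

# Hypothetical eval set: (input, expected label) pairs, ideally written
# bottom-up with the operators who hold the procedures in their heads.
EVAL_SET = [
    ("I was charged twice this month", "billing"),
    ("My phone has no signal at home", "network"),
    ("How do I upgrade my plan?", "sales"),
    ("Refund the duplicate charge please", "billing"),
]

def keyword_router(keywords: dict[str, list[str]]) -> Callable[[str], str]:
    """Build a toy 'system under test' that stands in for a model call."""
    def route(text: str) -> str:
        lowered = text.lower()
        for label, words in keywords.items():
            if any(w in lowered for w in words):
                return label
        return "unknown"
    return route

def pass_rate(system: Callable[[str], str]) -> float:
    """Fraction of eval cases the system gets exactly right."""
    hits = sum(system(text) == expected for text, expected in EVAL_SET)
    return hits / len(EVAL_SET)

def hill_climb(candidates: list[dict[str, list[str]]]) -> tuple[float, dict[str, list[str]]]:
    """Score each candidate in order and keep it only if it beats the best so far."""
    best_score, best_cfg = -1.0, {}
    for cfg in candidates:
        score = pass_rate(keyword_router(cfg))
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg

# Hypothetical successive iterations of the system.
CANDIDATES = [
    {"billing": ["charged"]},
    {"billing": ["charged", "refund"], "network": ["signal"]},
    {"billing": ["charged", "refund"], "network": ["signal"], "sales": ["upgrade"]},
]

best_score, best_cfg = hill_climb(CANDIDATES)
print(f"best pass rate: {best_score:.0%}")
```

The point of the sketch is only that a fixed eval set turns "is it good yet?" into a number the whole team can push on; in a real deployment the system under test would be a model call and the eval set would be far larger.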

Speaker 1

我们能稍微深入探讨一下吗?你知道,我们经常思考的一个更广泛的问题是自主性。对吧?自主性的构成是什么?一方面,在旧金山,你可以让一辆车完全自主地从城市的一边开到另一边。

Can we go under the hood a little bit? You know, one of the things that we think about a lot is autonomy more broadly. Right? What is the makeup of autonomy? On one side, you know, in San Francisco, you could take a car from one part of SF to the other fully autonomously.

Speaker 1

完全不需要人类参与。没错。你只需按下

No humans involved. Nope. You press the

Speaker 0

Waymo的。

the Waymo's.

Speaker 1

对吧?他们已经完成了数十亿次的出行。

Right? They've done billions of rides.

Speaker 2

我认为

I think

Speaker 1

大概是,多少,特斯拉FSD完成了大约35亿次出行。Waymo可能完成了数百万甚至数千万次出行。这是大量的自主性。在物理世界中,与数字世界相反,我现在甚至无法在线订票。如果让我的操作员尝试订票,会出现各种问题。

it was like, what, three and a half billion rides on on the this is on the Tesla FSD. Think Waymo's done like millions tens of millions of rides. That's a lot of autonomy. In the physical world, as opposed to the digital world, I can't book a ticket online right now. There's all sorts of problems that happen if I have my operator try to book a ticket.

Speaker 1

这非常违反直觉,因为物理安全的标准要高得多。物理安全的标准甚至超过了人类的能力,因为涉及生命危险。是的。数字安全的标准没那么高,因为最多只是损失金钱。不涉及人命,但到了2025年,物理自主性却领先于数字自主性。

And it's very counterintuitive because the bar for physical safety is so much higher. The bar for physical safety is higher than the human's capability because lives are at stake. Yeah. The bar for digital safety, not that high because all you're going to lose is money. Nobody's life is at stake, but yet physical autonomy is ahead of digital autonomy in 2025.

Speaker 1

这似乎有些反直觉,比如,从技术层面来说,为什么会这样呢?为什么听起来应该更容易的事情实际上却困难得多?

What seems counterintuitive, like, why is that the case at, you know, at a technical level? Why is it that what should sound easier is actually a lot harder?

Speaker 0

是的。我认为这里有两方面因素在起作用,我真的很喜欢用自动驾驶汽车来类比,因为你知道,它们实际上一直是人工智能最好的应用之一,我觉得这是我最近

Yeah. So I I think there are there are kind of two things at play here, I really like the analogy with with self driving cars because, you know, they they've been they've actually been like one of the best applications of AI, I think, that I've

Speaker 2

是的。

Yeah.

Speaker 0

最近使用过的。但我认为有两个因素在起作用。其中一个说实话就是时间线问题。你看,我们研究自动驾驶汽车已经这么久了。

I've used recently. But I think there are two things in play. One of them is honestly just the timelines. Like, we've been working on self driving cars for so long.

Speaker 1

没错。

That's right.

Speaker 0

比如,我记得在2014年左右,这还算是新兴事物,每个人都觉得五年内就能实现。结果却花了大概十年、十五年左右。所以这项技术真正成熟花了很长时间,而且我认为可能在2015年或2018年左右还有过一段黑暗时期,当时感觉好像永远实现不了。

Like, I remember when I, you know, back in like 2014, it was kinda like the advent of this, and everyone was like, oh, this is happening in like five years. Turns out it took like, I don't know, ten, fifteen years or so. So there's been a long time for this technology to really mature, and I think there's probably like dark ages, you know, back in like 2015 or 2018 or something where it felt like it wasn't gonna happen.

Speaker 1

幻灭的低谷期。

Trough of disillusionment.

Speaker 0

是的。是的。没错。然后现在,我终于看到它被部署了,这真的很令人兴奋。但这大概已经花了,我不知道,从研究最初开始算起,可能有十年,甚至二十年了。

Yes. Yes. Yeah. And then now we're, you know, I'm finally seeing it get deployed, which is really exciting. But it has been like, I don't know, ten years, maybe even twenty years from the very beginning of the research.

Speaker 0

嗯。而我认为,AI智能体现在还处于第一天。比如,ChatGPT直到2022年才问世。是的,大概在'23年左右,不到三年前。

Mhmm. Whereas, I think AI agents are like really in day one here. Like, ChatGPT only came out in 2022. Yeah. Like, around '3 like, less than three years ago.

Speaker 0

是的。但我实际上认为,我们关于AI智能体等等的思考,我认为,是从推理范式开始的,那时我们在去年年底发布了O1预览模型。所以,我实际上认为,这种带有AI智能体的推理范式及其带来的鲁棒性,真正展开才大约一年时间。真的不到一年。所以,我知道你在博客文章中有一张图表,我真的很喜欢,你知道,现在的斜率已经非常明显地不同了。

Yeah. But I actually think what we think about with AI agents and all that really, I think, started with the reasoning paradigm, when we released the o1-preview model back in late last year, I think. And so I actually think this whole reasoning paradigm with AI agents and the robustness that those bring has only really unfolded for like a year. Less than a year, really. And so I know you had a chart in your blog post which I really like, where, you know, the slope is very meaningfully different now.

Speaker 0

是的。自动驾驶开始得非常非常早。是的。斜率似乎有点慢,现在它正在到达应许之地。但是,天啊,我们超级最近才开始搞AI智能体,而斜率,我认为,是极其陡峭的。

Yeah. Self driving started very, very early. Yeah. Slope seems to be a little bit slower, now it's reaching the promised land. But man, like, we we started super recently with AI agents, and the slope, I think, is incredibly steep.

Speaker 0

我们可能会在某个时间点看到它 crossover(超越)。是的。但我们真的只有大约一年的时间来探索这些。

And we'll probably see it cross over at some point. Yeah. But we really have only had, like, a year really to explore these

Speaker 2

你认为我们还没有 crossover(超越)吗?特别是当你看看编码工作方面?

Do you think we haven't crossed over already when you look at, like, the coding work in particular?

Speaker 0

是的。这是个好观点。就像,你知道,你的图表实际上显示AI智能体低于自动驾驶,但是,你知道,y轴是什么?就像,在某些衡量标准下,我实际上不会惊讶,如果AI产品或AI智能体产品现在的收入已经超过Waymo了。Waymo赚得很多,但看看所有涌现的初创公司。

Yes. It's a it's a good point. It's like, you know, your chart actually shows AI agents as below self driving, but, you know, it was like, what is the y axis? Like, by some measures, like, I would not be surprised actually if, you know, AI products or AI agent products are making more revenue than Waymo at this point. Like, Waymo's making a lot, but just look at all the startups coming up.

Speaker 0

看看ChatGPT以及那里有多少订阅量等等。所以也许我们实际上已经跨越了某个临界点,几年后情况会变得非常非常不同。

Look at ChatGPT and how many subscriptions are happening there and all of that. And so maybe we have actually crossed and a couple years from now it's going to look very, very different.

Speaker 1

是的。Y轴是可感知的自主性。完全是的。客观可见。

Yeah. The y axis is tangible, felt autonomy. Perfectly Yeah. Objective. Can see.

Speaker 1

我感觉更多是关于氛围而非收入。但收入是个好指标。我们可能应该用收入重新做那个图表。

I feel it's about, yeah, vibes more than revenue. But revenue is a good one. We should probably redo that with revenue.

Speaker 0

关于这一点我还想提到的第二件事是,这些事物运行的支撑框架和环境。我记得在自动驾驶早期,很多研究人员说道路本身必须改变来适应自动驾驶。对吧?可能到处都需要传感器让自动驾驶汽车与之交互,回想起来我觉得这有点过度设计了。是的。

The there's a second thing I wanted to mention on on this as well, which is the scaffolding and the environment in which these things operate in. So I actually remember in the early days of self driving, a lot of the, like, researchers around self driving were saying that the roads themselves will have to, like, change to accommodate self driving. Right? There might be, like, sensors everywhere so that the self driving cars can interact with it, which I think is, like, you know, retrospect overkill. Yeah.

Speaker 0

但我确实认为自动驾驶汽车在世界上有相当完善的运行框架。它不是完全无限制的。是的。你有道路。道路是存在的。

But I actually do think self driving cars have a good amount of scaffolding in the world for them to operate in. It's, like, not completely, like, unlimited. Yeah. You have roads. Roads exist.

Speaker 0

它们相当标准化。你有交通信号灯。人们通常以相当正常的方式操作,还有所有这些可以学习的交通法规。而AI智能体就像是被扔到了荒郊野外,它们只能自己摸索着前进。实际上我认为,顺着Olivier刚才说的,我的直觉是,一些未能成功的企业部署很可能就是因为缺乏让这些智能体交互的支撑框架或基础设施。

They're pretty standardized. You have stoplights. People generally operate in, like, pretty normal ways, and there are all these traffic laws that that you can learn. Whereas AI agents are just kind of dropped in the middle of nowhere, and they kind of have to feel feel around for them. And I actually think, you know, going off of what Olivier just said too, my hunch is some of the enterprise deployments that don't actually work out likely don't have the scaffolding or infrastructure for these agents to interact with as well.

Speaker 0

我们很多非常成功的部署案例中,我们的现场工程师最终要做的很大一部分工作就是为客户创建几乎像一个平台或某种支撑框架连接器,整理数据,让模型能够以更标准化的方式进行交互。所以我觉得自动驾驶汽车在某种程度上已经通过道路拥有了这种框架,在其部署过程中。但我认为在AI智能体领域这还非常早期。如果很多企业、很多公司还没有准备好支撑框架,我一点都不会感到惊讶。所以如果你把一个AI智能体放进去,它可能不知道该怎么办,其影响也会很有限。

A lot of the, like, really successful deployments that we've made, a lot of what our FDEs end up doing with some of these customers is to create almost like a platform or some type of scaffolding, connectors, organizing the data, so that the models have something that they can interact with in a more standardized way. And so my sense is self-driving cars actually have had this to some degree with roads over the course of their deployment. But I actually think it's still very early in the AI agents space. And I would not be surprised if a lot of enterprises, a lot of companies just don't really have the scaffolding ready. So if you drop an AI agent in there, it kind of doesn't really know what to do and its impact will be limited.

Speaker 0

所以,我认为一旦这种框架结构在这些公司中建立起来,部署速度也会加快。

And so, I think once this scaffolding gets built out across some of these companies, think

Speaker 2

部署速度也会加快。

the deployment will also speed up.

Speaker 0

但再次强调我们之前的观点,我认为并没有放缓。没有,是的。你知道,事情仍然进展得非常快。

But again, to our point earlier, think there's no slowdown. There's no Yeah. Know, things are still moving very fast.

Speaker 1

那太棒了。嗯,你知道,我把自主性看作是一个三部分结构:感知、推理(大脑部分)、以及所谓的框架结构——让一切运作起来的最后一英里。也许我们可以深入探讨第二部分,也就是推理,这是你们最近用GPT-5构建的核心精华。巨大的努力,恭喜你们。

That's great. Well, you know, I thought about autonomy as a three part structure. You've got perception, you've got the reasoning, the brain, and then you've got the, call it, the scaffolding, the last mile of making things work. Maybe we can dive into the second part, is the reasoning, which is the juice that you guys are building with GPT-five most recently. Huge endeavor, congrats.

Speaker 1

这是你们第一次推出一个完整的系统,而不仅仅是一个模型或一组模型。谈谈这个吧。我的意思是,整个开发过程的重点是什么?说实话,所有基准测试似乎都已经饱和了。显然,你们关注的远不止是基准测试。

The first time you guys have launched a full system, not a model or a set of models, but a full system. Talk about that. I mean, the full arc of that development, what was your focus? I mean, honestly, the benchmarks all seem so saturated. Like, clearly, it was more than just benchmarks that you were focused on.

Speaker 1

那么什么是北极星目标呢?给我们从头到尾讲讲GPT-5吧。

And so what is a North Star? Like, tell us about GPT-five, soup to nuts.

Speaker 2

这是许多人长期倾注心血的成果。正如你所说,我认为GPT-5非常智能。你看那些基准测试,比如SWE bench之类的,它的分数已经相当高了。但对我来说同样重要且有影响力的是模型的工艺——风格、语气和行为。所以,就是能力、智能以及模型的行为表现。

It's been the work of love of many people for a long time. And to your point, I think GPT-five is amazingly intelligent. You look at the benchmark, like, you know, the SWE bench and the likes, you know, it is going pretty high. But I think to me equally important and impactful was, I would say, the craft, like the style, the tone, the behaviour of the model. So, you know, capabilities, intelligence and, you know, behaviour of the model.

Speaker 2

关于模型的行为,我认为这是首个大型模型发布,我们与众多客户密切合作了数月之久,以更好地理解模型的具体限制和阻碍因素。通常,关键并不在于拥有一个智能程度高得多的模型,而在于一个更快、更善于遵循指令、在不知道某事时更倾向于说'不'的模型。因此,GPT-5上那种超级紧密的客户反馈循环确实令人印象深刻。我认为,过去几周GPT-5获得的所有喜爱,表明构建者们开始感受到这一点。一旦你体验过,就很难再回到一个极其智能但更偏向学术化的模型。

On the behaviour of the model, I think it's the first model, like large model release, for which we have worked so closely with a bunch of customers for, like, months and months essentially to better understand what are the concrete locks, what are the concrete blockers of the model. And often, it's not about having a model which is way more intelligent; it's a model which is faster, a model that better follows instructions, a model that's more likely to say no, you know, when it doesn't know about something. And so that, like, super close customer feedback loop on GPT-5 was pretty impressive to see. And I think, like, all the love that GPT-5 has been getting in the past couple of weeks, I think people are starting to feel that, essentially, the builders. And once you see it, it's really hard, essentially, to come back to a model which is extremely intelligent, but in a more, you know, exclusively academic way.

Speaker 1

是的。在这个过程中你们做出了哪些权衡?比如,在构建GPT五时最艰难的权衡是什么?

Yeah. Were there trade-offs that you made as you were going through it? Like, maybe what are the hardest trade-offs you made as you were building GPT-5?

Speaker 0

我认为一个非常明确的权衡——老实说我们仍在迭代中——是推理令牌数量与思考时长之间的权衡与性能的关系。是的。因为这确实是我们自推理模型推出以来一直与客户合作解决的问题,这些模型非常非常聪明,尤其是当你给予它们充足的思考时间时。嗯。

I actually think a very clear trade off, which I honestly think we are still iterating on, is the trade off between the reasoning tokens and how long it thinks versus performance. Yeah. Because and I honestly, this is something that I think we've been working on with our customers since the launch of the reasoning models, which is these models are so so smart. Especially if you give it all this, like, thinking time. Mhmm.

Speaker 0

我认为关于GPT-5 Pro的反馈也相当疯狂。就像,你知道,昨晚安德烈发了一条很棒的推文。

I think the feedback I've been seeing around GPT-5 Pro has been pretty crazy too. It's just like, you know, Andre had a great tweet last night.

Speaker 2

是的。是的。

Yeah. Yeah.

Speaker 1

我看到了

I saw

Speaker 0

山姆转发了它。是的。但是,就像那些其他模型都无法解决的未解问题,你扔给GPT五Pro,它就能一次性搞定。这相当疯狂。

that Sam retweeted it. Yeah. But, like, these, like, unsolved problems that none of the other models could handle. You throw it to GPT five Pro, and it just, like, one shots it. It's it's pretty crazy.

Speaker 0

但这里的权衡是,你需要等待十分钟。这是相当长的时间。所以这些模型通过更多的推理时间会变得非常智能。但对于产品构建者来说,在API方面处理一些商业用例时,我认为管理这种权衡相当困难。对我们来说,确定在这个频谱上应该处于什么位置一直很困难。

But the trade off here is you're you're you're waiting for ten minutes. It's quite a long time. And so these things just get, so smart with more inference time. But on the product builder, on the API side for some of these business use cases, I think it's pretty tough to manage that trade off. And for us, it's been difficult to figure out where we wanna fall on that spectrum.

Speaker 0

因此我们必须在模型思考的程度与其智能水平之间做出一些权衡。作为产品构建者,你必须处理真实的延迟权衡问题,因为用户可能不愿意为了世界上最好的答案等待十分钟。他们可能更愿意接受一个次优但无需等待的答案。

So we've gotta make some trade-offs on how much the model should think versus how intelligent it should get. Because as a product builder, there's a real latency trade-off that you have to deal with where, you know, your user might not be happy waiting ten minutes for, like, the best answer in the world. They might be more okay with a substandard answer and, like, no wait at all.
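One way a product builder might reason about this latency-versus-intelligence trade-off is sketched below: given a latency budget for a use case, pick the highest-quality reasoning setting that still fits. The profile names, latencies, and quality scores are made-up placeholders for illustration, not measurements of any model or parameters of any real API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EffortProfile:
    name: str
    expected_latency_s: float  # illustrative placeholder, not a measurement
    expected_quality: float    # illustrative eval pass rate between 0 and 1

# Hypothetical profiles; real numbers would come from your own evals.
PROFILES = [
    EffortProfile("minimal", 2.0, 0.70),
    EffortProfile("medium", 20.0, 0.85),
    EffortProfile("high", 600.0, 0.95),
]

def pick_profile(latency_budget_s: float) -> EffortProfile:
    """Return the best-quality profile that fits the budget.

    If nothing fits, fall back to the fastest profile rather than failing.
    """
    affordable = [p for p in PROFILES if p.expected_latency_s <= latency_budget_s]
    if not affordable:
        return min(PROFILES, key=lambda p: p.expected_latency_s)
    return max(affordable, key=lambda p: p.expected_quality)

# A live chat widget tolerates a few seconds; an offline research task can wait.
print(pick_profile(5.0).name)     # minimal
print(pick_profile(1000.0).name)  # high
```

The design choice here is to treat latency as a hard constraint and quality as the thing to maximize; a product with more tolerant users could flip that and set a minimum quality bar instead.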

Speaker 1

是的。我的意思是,即使在GPT-5和GPT-5思考模式之间,我现在也得来回切换,因为有时候我太没耐心了,只想要尽快得到答案。

Yeah. I mean, even between GPT-5 and GPT-5 Thinking, I have to toggle it now because sometimes I'm so impatient, I just want it ASAP.

Speaker 0

是的。我认为有一个跳过功能,对吧?对。就像我没什么耐心,只想要更简单的答案。

Yeah. I think there's an ability to skip, right, in Yeah. Right. It's like I'm I'm impatient, just want the more simple answer.

Speaker 1

没错。没错。那么,GPT-5上线四周了,反馈怎么样?

That's right. That's right. Well, four weeks in, GPT-5, how's the feedback?

Speaker 0

是的。我认为反馈非常积极,尤其是在平台方面,这真的很令人欣喜。我认为Olivier提到的很多特点都出现在了客户的反馈中。这个模型在编码方面非常出色,在各种任务推理方面也表现得很好。特别是对于编码用例,当它思考一段时间后,通常能解决其他模型无法解决的问题。

Yeah. I I think feedback has been very positive, especially on the platform side, which has been really great to see. I think a lot of the the things that Olivier mentioned have been you know, come up in feedback from customers. The model is extremely good at coding, extremely good at kind of like reasoning through through different tasks. But especially for, like, coding use cases, especially at the, you know, at the when it thinks for a while, it'll usually solve problems that no other models can solve.

Speaker 0

所以我认为这是反馈中的一个重要积极点。嗯。模型的稳健性和幻觉减少也是非常重要的积极反馈。是的。是的。

So I think that's been a big positive point of feedback. Mhmm. The kind of robustness and the reduction in hallucinations has been a really big positive feedback. Yeah. Yeah.

Speaker 0

是的。我认为有一项评估显示,在很多情况下幻觉基本降为零了。虽然还不完美。

Yeah. I I think there's an eval that showed that the hallucinations basically went to zero for a lot of this. It's not perfect.

Speaker 1

仍然存在

There's still

Speaker 0

很多工作要做。

a lot of work to be done.

Speaker 1

这是个很大的问题。

That's a big one.

Speaker 0

我认为因为其中也有推理能力,这让模型更倾向于说'不',减少幻觉答案的可能性。所以这也是用户非常喜欢的一点。另一个反馈是关于指令遵循。它在指令遵循方面表现得非常好。这几乎与我们正在处理的建设性反馈相重叠——因为它太擅长遵循指令了,以至于用户需要调整他们的提示,或者说它有点过于字面理解了

I think because of the reasoning in there too, it just makes the model more likely to say no, less likely to hallucinate answers. So that's been something that people have really liked as well. Other bit of feedback has been around instruction following. So it's really good at instruction following. This almost bleeds into the constructive feedback that we're working on where for that it's so good at instruction following that people need to tweak their prompts or it's almost like too literal in in

Speaker 2

这实际上是一个有趣的权衡。因为,你知道,当你问开发者时,他们当然都希望模型能遵循指令。对吧?

That one is an interesting trade-off, actually. Because, you know, when you ask people, developers, like, what do you want? Like, you want the model to follow instructions. Of course. You know?

Speaker 2

是的。但一旦你有了一个本质上极其字面理解的模型,它实际上就迫使你必须非常清晰地表达你想要什么。否则,模型可能会偏离方向。所以这个反馈很有意思。

Yeah. But once you have a model that is, like, you know, extremely literal, essentially, then it essentially forces you to express extremely clearly what you want. Otherwise, the model, you know, may go sideways. And so that one was interesting feedback.

Speaker 0

这几乎就像猴爪效应,开发者和平台客户要求更好的指令跟随能力。所以,是的,我们会给你非常非常好的指令跟随,但它几乎是完全照搬字面意思。显然这是团队正在努力解决的问题。顺便说个很好的例子,有些客户会使用这些提示词。我记得我们在测试GPT-5时,收到的一个负面反馈就是模型太简洁了。

It's almost like the monkey paw where it's like developers and and platform customers ask for better instruction following. So, yes, we'll give you really really good instruction following, but it was like, you know, follows it almost to a to a t. And so it's obviously something that the the the team is actually working working through. I I think a good example of this, by the way, is some customers would have these prompts. I remember when we were testing GPT five, one of the negative feedback that we got was the model was too concise.

Speaker 0

我们当时很困惑,怎么回事?为什么模型这么简洁?有趣的是,后来我们发现是因为他们重复使用了其他模型的旧提示词。而对于其他模型,你必须像乞求一样让模型变得精确简洁。

We were like, what's going on? Why is the model so concise? Interesting. And then we realized it was because they were reusing their old prompts from other models. And with the other models, you have to, like, really beg the model to be concise.

Speaker 0

所以他们会有大约10行的提示,比如要简洁,真的要简洁,还要保持回答简短。结果当你把这些给GPT-5时,它就会想:天啊,这个人真的想要简洁。

So there are like 10 lines of, like, be concise. Really be concise. Also, keep your answer short. And it turns out when you give that to GPT-5, it's like, oh my gosh. This person really wants it to be concise.

Speaker 0

于是回应就会变得只有一句话,太过简略。只要去掉那些关于简洁的多余提示,模型的表现就会好得多,更接近他们实际想要的效果。

And so the response would be, like, one sentence, which is too terse. And so just by removing the extra prompts around being concise, the model behaved in a much better way and much closer to what they actually end up wanting.
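The fix described here, removing the piled-up brevity instructions from a prompt written for an older model, can be sketched as a small cleanup helper. The prompt text, the keyword list, and the `dedupe_brevity_lines` helper are all hypothetical, and a real migration would still need a human to review the result against evals.

```python
import re

# Hypothetical list of words that signal a "keep it short" instruction.
BREVITY_PATTERN = re.compile(r"\b(concise|brief|short|terse)\b", re.IGNORECASE)

def dedupe_brevity_lines(prompt: str) -> str:
    """Keep the first line that asks for brevity and drop later repeats."""
    kept = []
    seen_brevity = False
    for line in prompt.splitlines():
        if BREVITY_PATTERN.search(line):
            if seen_brevity:
                continue  # a later, redundant request for brevity
            seen_brevity = True
        kept.append(line)
    return "\n".join(kept)

# Hypothetical legacy prompt tuned for a model that ignored single instructions.
LEGACY_PROMPT = """You are a support assistant.
Be concise.
Really be concise.
Also, keep your answer short.
Cite the relevant policy when you can."""

cleaned = dedupe_brevity_lines(LEGACY_PROMPT)
print(cleaned)
```

With a model that follows instructions literally, one clear request is the safer default; repeating it is what pushed the responses toward a single terse sentence in the anecdote above.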

Speaker 1

事实证明,写出正确的提示词仍然很重要。

Turns out writing the right prompt is still important.

Speaker 0

是的,是的。没错。提示词工程仍然非常非常重要。

Yes. Yes. Yeah. Prompt engineering is still very, very important.

Speaker 2

是啊。

Yeah.

Speaker 0

关于对GPT-5的建设性反馈,实际上也有很多,我们都在努力处理。其中有一点让我对下一个快照版本的发布感到非常兴奋

On constructive feedback for GPT-5, there's actually been a good amount as well, which we're all working through. One of them that I think is, I'm really excited for the next, you know, snapshot to come out

Speaker 2

来修复一些

to to fix some of

Speaker 0

这是代码质量方面的问题。比如一些小的代码范式或习惯用法。我记得有一些关于代码类型和使用模式的反馈,我们也在处理中。另外一点关于推理令牌与智能延迟之间权衡的反馈,我认为我们在内部已经取得了良好进展。特别是对于这些较简单的问题,通常不需要太多思考

this is code quality. And, like, small, like, code, like, paradigms or, like, idioms that they might use. I think there were, like, feedback around the types of code and and and the patterns in which it was using, which I think we're working through as well. And then the the other bit of feedback, which I think we've already made good progress on internally, is around the trade off of the reasoning tokens and thinking and latency around intelligence. I think especially for these simpler problems, you don't usually need a lot of thinking.

Speaker 0

理想情况下,思考应该更加动态一些。当然,我们一直在努力用尽可能少的推理令牌来获得最大的推理能力和性能,所以我预计这个曲线也会逐渐下降

The thinking should ideally be a little bit more dynamic. And, of course, we're always trying to squeeze as much reasoning and performance into as little reasoning tokens as possible, so I'd imagine that curve kinda going down as well.

Speaker 1

是的。祝贺你们取得了巨大成功。我知道这对我们很多公司来说都是一个持续进行的工作。他们在使用GPT-5方面取得了惊人的成果,其中一个是Expo网络安全业务

Yeah. Well, huge congrats. I mean, it's been I know it's a work in motion for a bunch of our companies. They've had incredible outcomes with GPT-five. One of them is Expo Cybersecurity business.

Speaker 1

就像

Just like

Speaker 0

是的,看到了巨大的变化,这真的很疯狂

Yeah, saw huge charge from that, it was pretty crazy.

Speaker 1

相比之前使用的任何系统,这都是一次巨大的升级。

Huge upgrade from whatever they were using prior to that.

Speaker 0

我觉得他们很快需要新的评估了。

I think they're gonna need a new eval soon.

Speaker 1

没错。他们需要新的评估。一切都关乎评估。在多模态方面,显然你们上次宣布了实时API,我看到T-Mobile是其中的重点客户之一。谈谈这个吧,显然文本模型是核心,处于领先地位。

That's right. They're gonna need a new eval. It's all about evals. On on the multi modality side of it, obviously you guys announced the real time API last I saw T Mobile was one of the featured customers on there. Talk about that, like how obviously the text models are core leading the pack.

Speaker 1

是的。但我们还有音频和视频。是的。谈谈多模态模型的进展吧。我们什么时候能期待下一个重大突破?那会是什么样子?

Yeah. But then we got audio and we got video. Yeah. Talk about the progress on the multimodal models. When should we expect to have, like, the next big unlock and what would that look like?

Speaker 2

这是个好问题。团队在多模态方面取得了惊人进展。在语音、图像、视频方面,坦白说,上一代模型已经解锁了不少很酷的用例。我们收到的一个反馈是,因为文本在智能方面遥遥领先,人们觉得语音模型感觉有点不够智能。直到你真正看到它,确实会感觉有点奇怪,

It's a good question. The teams have been making amazing progress on multimodality. On voice, image, video, frankly, the last generation models have been unlocking, like, quite a few cool use cases. One of the feedback that we've received is, you know, because, like, text was so much leading the pack on intelligence, people felt that the voice model was somewhat a little less intelligent. And, you know, until you actually see it, like, it does feel weird, like, you know, to have,

Speaker 0

就像你

like, you

Speaker 2

知道,在文本和语音之间获得更好的答案。所以这目前是我们重点关注的方向。我认为我们已经填补了部分差距,但肯定还没有完全填补。所以我觉得,追上文本的水平,可以说是其中一个目标。是的。

know, to have a better answer, like, on text versus voice. And so that's pretty much a focus that we have at the moment. I think we like filled like part of that gap, but not the full gap for sure. So I think, you know, catching up, I would say, you know, with the text, like, know, would be one. Yeah.

Speaker 2

第二个非常吸引人的点是,目前这个模型在轻松随意的对话场景中表现非常出色,比如与你的教练或治疗师交谈。但我们基本上需要教会模型如何在真正具有经济价值的工作场景中更好地表达。举个例子,模型需要能够理解什么是SSN(社会保障号码),知道拼写SSN意味着什么。如果某个数字模糊不清,是应该要求重复还是

A second one, you know, which is absolutely fascinating, is the model is, like, excellent at the moment on, like, you know, easy, casual conversation, like talk to your coach, your therapist. And we basically had to teach the model to speak, essentially, better in actual, like, you know, economically valuable work setups. To give an example, the model has to be able to understand what an SSN is and, you know, what does it mean to spell out an SSN? And if one digit is actually, like, you know, fuzzy, it should actually ask to repeat versus,

Speaker 0

你知道,猜测。

you know, guess.

Speaker 2

你知道,有很多类似的直觉性判断,这些正是我们目前正在教给模型的语音特性。这实际上是我们与客户持续进行的工作。直到我们真正让模型面对实际的客户支持电话、实际的销售电话,我们很难真正感受到

You know, there are lots of, like, you know, intuitions like that around voice that we are currently, like, teaching the model. And that's, like, an ongoing work, actually, with our customers. You know, until we actually confront the model with, like, actual customer support calls, actual sales calls, it's really hard, like, to get a feel

Speaker 1

没错。

Right.

Speaker 2

你知道,那些差距所在。所以这也是一个顶级优先事项。

You know, for those gaps. So that's a top, like, you know, priority as well.

Speaker 1

这完全不在剧本上,但语音模型中会出现一个有趣的问题,特别是实时API。你知道,以前人们的做法是:接收语音输入,转换成文本(是的),然后加入某种智能层(是的),最后通过文本转语音模型播放出来(是的)。

This is completely off script, but an interesting question that comes up in voice models, particularly the real time API is, you know, previously people were doing they were taking a speech input, convert that to text Yep. Then have some layer of intelligence. Yep. Then you would have a text to speech model that would sort of play it back. Yeah.

Speaker 1

这就像是把这三个部分拼接在一起。但实时API,你们已经将所有功能整合在一起了。是的。没错。那么,这是如何实现的呢?

And this would be it would be a stitch of these three parts. But the real time API, you guys have integrated all of that. Yes. Yep. And, know, how does it happen?

Speaker 1

因为很多逻辑都是用文本编写的。很多布尔逻辑或者任何函数调用都是用文本编写的。它与实时API是如何配合工作的?是

Because a lot of the logic is written in in text. A lot of the boolean logic or any call it any function calling is written in text. How how does it work with the real time API? Is that

Speaker 2

这是个很好的问题。我们推出实时API的原因是我们看到stitch模型存在这个问题。stitch模型?是的。

That's an excellent question. So the reason why we ship the real time API is that we saw that for the stitch model. The stitch model? Yeah.

Speaker 1

哦,没有stitch。

Oh, there's no stitch.

Speaker 2

stitch

The stitch

Speaker 0

就像是拼接在一起。

Like a stitch together.

Speaker 2

好的。是的。是的。语音转文本、思考、文本转语音。是的。

Okay. Yeah. Yeah. Speech to text, thinking, text to speech. Yeah.

Speaker 2

我们发现主要有几个问题。一是速度慢,你知道,基本上就是多了几次跳转。是的。是的。二是信号丢失,你知道,在各个模型之间。

Like, we saw essentially a couple of issues. One, slowness, like, you know, more hops, essentially. Yeah. Yeah. Two, loss of signal, like, you know, across each model.

Speaker 2

比如,语音转文本模型就没那么智能。你会失去情感表达。你会

Like, the speech to text model is less intelligent. You'd lose emotion. You'd

Speaker 0

失去口音语调。

lose accent tone.

Speaker 2

没错。

Exactly.

Speaker 1

对吧?

Right?

Speaker 2

停顿。是的。而且你知道,当你进行实际语音通话时,这些信号对于听取主干内容非常重要。我们面临的挑战之一就是你提到的,这意味着文本和语音需要稍微不同的架构。

Pauses. Yeah. And, you know, when you are doing, like, actual voice, like, phone calls, essentially, those signals are so important, like, you know, for listening to the stem. Yeah. One of the challenges that we have is what you mentioned, which is, you know, it means, like, a slightly different architecture, essentially, for text versus voice.

Speaker 2

所以这是我们正在积极努力的方向。但我认为从打造自然语音体验开始是正确的选择,达到让你放心投入生产的程度,然后再回过头来统一跨模态的编排逻辑。

And so that's something that we are actively working on. But I think it was the right call to start essentially with let's make, like, the voice experience, like, natural sounding to a point where essentially you're feeling comfortable, like, putting in production and then working backward, like, to unify, like, the sort of orchestration logic essentially across modalities.
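The "stitched" approach discussed above (speech to text, then a text model, then text to speech) can be sketched as three chained stages, which makes the two issues raised here visible: latency adds up across hops, and anything not captured in the transcript, such as tone or pauses, is lost at the first boundary. The stage functions and latency figures below are hypothetical stand-ins, not real API calls or measurements.

```python
from dataclasses import dataclass

@dataclass
class AudioTurn:
    words: str
    tone: str      # e.g. "frustrated"; carried by the audio, not by the words
    pause_ms: int  # hesitation before speaking

@dataclass
class StageResult:
    payload: object
    latency_ms: int  # placeholder latency for illustration only

def speech_to_text(turn: AudioTurn) -> StageResult:
    # Only the words survive transcription; tone and pauses are dropped here.
    return StageResult(turn.words, latency_ms=300)

def think(text: str) -> StageResult:
    # Stand-in for the text model; it never sees how the words were spoken.
    return StageResult(f"Reply to: {text}", latency_ms=800)

def text_to_speech(text: str) -> StageResult:
    # The synthesized voice has no access to the caller's original tone.
    return StageResult(AudioTurn(text, tone="neutral", pause_ms=0), latency_ms=400)

def stitched_pipeline(turn: AudioTurn) -> tuple[AudioTurn, int]:
    """Run the three stages in sequence and sum their latencies."""
    stt = speech_to_text(turn)
    llm = think(stt.payload)
    tts = text_to_speech(llm.payload)
    total_ms = stt.latency_ms + llm.latency_ms + tts.latency_ms
    return tts.payload, total_ms

caller = AudioTurn("my bill is wrong", tone="frustrated", pause_ms=900)
reply, total_latency_ms = stitched_pipeline(caller)
print(reply.words, reply.tone, total_latency_ms)
```

A speech-to-speech model collapses these hops into one call that takes audio in and produces audio out, which is why it can respond faster and react to how something was said; the trade-off noted in the conversation is that text-based orchestration logic no longer slots in between the stages as easily.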

Speaker 0

需要明确的是,很多客户仍然将这些拼接在一起。这是上一代的做法。但我们越来越多地看到,更多客户转向实时方法,因为它听起来更自然,延迟更低,特别是随着我们不断提升智能水平

And and to be clear, like, a lot of customers still stitch these together. It's, kind of what worked in the last generation. But what we're increasingly seeing is more and more customers moving towards the real time approach because of how natural it sounds, how much lower latency it is, especially as we up level the intelligence

Speaker 1

是的。

Yeah.

Speaker 0

关于模型。但退一步说,我觉得它居然能运行这件事本身就非常令人震惊。我的意思是,这些语言模型仅仅通过大量文本训练,就能自动预测下一个词元,听起来超级智能,这本身就很不可思议。

Of the model. But also, even, like, taking a step back, I will say it's pretty mind-blowing to me that it works. Like, I think it's mind-blowing that these LLMs work at all, where you just train it on a bunch of text and it's just, you know, autoregressively coming up with the next token and it sounds super intelligent. Yeah. That's, like, mind-blowing in and of itself.

Speaker 0

但我认为更令人震惊的是,这种语音到语音的设置居然真的能正确工作。因为你实际上是在获取某人说话的音频比特流,实时输入模型,然后它再生成音频比特输出。对我来说,它居然能工作就已经很疯狂了,更不用说它还能理解口音、语调、停顿之类的东西,并且足够智能地处理客服电话等任务。

But I think it's actually even more mind-blowing that this speech-to-speech setup actually works correctly, because you're literally taking the audio bits, yeah, from someone speaking, streaming or putting it into the model, and then it's generating audio bits back. And so to me, it's actually crazy that it works at all. It's pretty crazy. Let alone the fact that it can understand accents and tone and pauses and things like that, and then also be intelligent enough to handle a support call or something

Speaker 1

我的意思是,你们已经从文本输入文本输出,发展到语音输入语音输出,这相当疯狂。是的。我们投资组合中有很多公司都在使用这些模型,比如客服端的Parloa,基础设施端的LiveKit,并且我们开始看到语音到语音模型能解决的大量用例。当然,很多更复杂的任务仍然运行在你所说的拼接模型上。但我希望全部转向实时API的那一天不会太远。

I mean, you've gone from text in, text out to voice in, voice out, that's pretty crazy. Yeah. We have a bunch of companies in our portfolio that are using these models, you know, Parloa on the customer support side, LiveKit on the infra side, and there's a bunch of use cases we are starting to see that a speech-to-speech model could address. Obviously, a lot of the harder ones are still running on what you're calling the stitched model. But I hope the day is not far when it's all on the real-time API.

Speaker 2

总会在某个时刻发生的。对,对,对,对。

Gonna happen at some point. Right, right, right, right.

Speaker 1

实际上,这也许是个很好的过渡,来谈谈模型定制化。因为我猜你们有非常多样化的企业客户,我记得你提到过有几百家甚至更多客户吧?每个客户都有不同的用例、不同的问题集、不同的参数范围——可能是延迟、功耗或其他。你们是如何处理的?聊聊OpenAI为那些需要定制化优秀模型以使其更适合自身需求的企业提供了什么。

And actually maybe that's a good segue into talking about model customization, because I suspect that you have such a wide variety of enterprise customers. I think you mentioned what, hundreds of customers, or maybe more. Each of them has a different use case, different problem set, a different, call it envelope of parameters that they're working in, maybe latency, maybe power, maybe others. How do you handle that? Talk about what OpenAI offers enterprises who need a customized version of of of a great model to make it great for them.

Speaker 0

是的。模型定制化实际上是我们从最初就在API平台上深度投入的领域。甚至在ChatGPT时代之前,我们就提供了监督微调API,并且用户使用效果非常好。嗯。关于模型定制化,最令人兴奋的显然是它非常契合客户需求,因为他们希望能够带入自己的定制数据,创建适合自身需求的、定制化的o3、o4 mini甚至GPT-5版本。

Yeah. So model customization has actually been something that we've invested very deeply in on the API platform since the very beginning. So even, you know, pre-ChatGPT days, we actually had a supervised fine-tuning API available, and people were actually using it to great effect. Mhmm. The most exciting thing actually, I'd say, around model customization, it obviously resonates quite well with customers because they want to be able to bring in their own custom data and create their own custom version of, you know, o3 or o4-mini or something, or GPT-5 even, suited to their own needs.

Speaker 0

这非常吸引人,但我认为最近最令人兴奋的发展是强化微调的引入。嗯。我们在去年年底宣布的,我想是在圣诞十二天期间。之后我们已经将其正式发布,并且正在持续迭代改进。

It's very attractive, but the most recent development that I think is very exciting has been the introduction of reinforcement fine-tuning. Mhmm. Something we announced late last year, I think in the twelve days of Christmas. We've GA'd it since, and we're continuing to iterate on it.

Speaker 1

这是什么?请详细解释一下

What what is it? Break it down for

Speaker 0

是的。所以它被称为——其实挺有趣的。我...我觉得我们创造了'强化微调'这个术语。就像在我们宣布之前,这还不是一个真正存在的概念

Yeah. So it's called it's actually funny. I I I think we we made up the term reinforcement fine tuning. It's like not a real thing until until we

Speaker 2

现在这个术语已经固定下来了。我在推特上经常看到它。

announced it. It's stuck now. I see it on Twitter all the time.

Speaker 0

我记得我们在讨论时,我当时说,我...我不太确定

I remember we were discussing it and I was like, I I don't know about

Speaker 2

你不是在开玩笑吧。你在开玩笑。是的。是的。

our You're not kidding. You're kidding. Yeah. Yeah.

Speaker 0

所以强化微调。实际上它是在微调过程中引入了强化学习。原来的微调API做的是所谓的监督微调,称为SFT。它不使用强化学习。它是,你知道,它使用的是监督学习

So reinforcement fine tuning. So it really it it's introducing reinforcement learning into the fine tuning process. So the original fine tuning API does something called supervised fine tuning, called SFT. It is not using reinforcement learning. It is, you know, it's it's it's using supervised

Speaker 1

监督学习。

Supervised learning.

Speaker 0

是的。这通常意味着你需要大量数据,大量提示-完成对。你需要真正监督并明确告诉模型应该如何行动。然后当你在我们的微调API上进行训练时,它会朝着那个方向靠拢。强化微调引入了类似RL或强化学习到这个循环中。

Yeah. And so what that usually means is you need a bunch of data, a bunch of prompt completion pairs. You need to really supervise and tell exactly the model how to how it should be acting. And then when you train it on on our fine tuning API, it moves it closer in that direction. Reinforcement fine tuning introduces, like, RL or reinforcement learning to this loop.

Speaker 0

方式更复杂,更挑剔,但效果却强大一个数量级。这实际上引起了很多客户的共鸣。如果你使用RFT,讨论的重点不再是创建一个专门针对你特定用例的定制模型,而是你可以使用自己的数据,真正启动RL机制,为你的特定用例打造一个顶尖水平的模型。

Way more complex, way more finicky, but an order of magnitude more powerful. And so that's actually what's really resonated with a lot of our customers. If you use RFT, the discussion is less about creating a custom model that's specific to your own use case. It is that you can actually use your own data and turn the crank on RL to create a best-in-class model for your own particular use case.

Speaker 0

这就是这里的主要区别。使用RFT时,数据集看起来有点不同。不是简单的提示-完成对,你真正需要的是一组非常可评分(gradable)的任务。你需要一个非常客观的评分器(grader)来使用。这是我们过去一年投入大量精力的方向。

And so that's kind of the main difference here. With RFT, the dataset looks a little bit different. Instead of, you know, prompt-completion pairs, you really need a set of tasks that are very gradable. You need a grader that is very objective that you can use here as well. And so that's actually been something that we've invested a lot in over the last year.
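
As a rough illustration of the difference described here, the sketch below contrasts a supervised fine-tuning example (a prompt paired with the exact completion to imitate) with a gradable task plus an objective grader of the kind reinforcement fine-tuning relies on. It is a minimal sketch under stated assumptions, not OpenAI's API or data format: the `GradableTask` class, the `grade` function, the field names, and the tax question are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# SFT-style example: the model is shown the exact completion it should imitate.
# (Hypothetical field names, not an official dataset schema.)
sft_example = {
    "prompt": "What is the sales tax on $100.00 at a 7% rate?",
    "completion": "7.00",
}


@dataclass
class GradableTask:
    """RFT-style task: a prompt plus a reference that only the grader sees."""
    prompt: str
    reference: float


def grade(task: GradableTask, model_output: str, tolerance: float = 0.01) -> float:
    """Objective grader: 1.0 if the output parses to the reference value, else 0.0."""
    cleaned = model_output.strip().replace("$", "").replace(",", "")
    try:
        value = float(cleaned)
    except ValueError:
        # Output that is not a number earns no reward.
        return 0.0
    return 1.0 if abs(value - task.reference) <= tolerance else 0.0


task = GradableTask(
    prompt="What is the sales tax on $100.00 at a 7% rate?",
    reference=7.0,
)

# Several candidate outputs for the same task, each scored by the grader.
candidates = ["7.00", "$7", "seven dollars", "7.5"]
scores = [grade(task, output) for output in candidates]
```

The design point is that the SFT record fixes one target string, while the grader accepts any output that is objectively correct and rejects the rest, which is what makes a task "gradable" in the sense used above.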

Speaker 0

我们实际上已经看到相当多的客户在这方面取得了非常好的成果。我们讨论过几个不同行业的案例。比如Rogo,一家金融服务领域的初创公司。他们拥有非常成熟的AI团队,我认为他们聘请了一些DeepMind的人员来运营他们的AI项目。

And we've actually seen a good number of customers get really good results on this. We've talked about a couple of them across different verticals. So, Rogo, which is a startup in the financial services space. They have a very, you know, sophisticated AI team. I think they hired some folks from DeepMind to run their AI program.

Speaker 0

他们一直在使用RFT,在解析金融文档、回答相关问题以及执行相关任务方面取得了顶尖成果。还有一家名为Accordance的初创公司在税务领域做类似的事情。我想他们一直瞄准一个名为Tax Bench的评估基准,该基准也考察CPA(注册会计师)风格的任务。因为他们能够将其转化为一个非常可评分的设置,他们实际上能够转动RFT的手柄(turn the crank),并且仅使用我们的RFT就在Tax Bench上取得了,我认为,最先进的(SOTA)结果。

And they've been using RFT to get best-in-class results on, you know, parsing through financial documents, answering questions around them, and doing tasks around that as well. There's another startup called Accordance that's doing this in the tax space. I think they've been targeting an eval called Tax Bench, which looks at CPA-style tasks as well. And because, you know, they're able to turn it into a very gradable setup, they're actually able to turn the RFT crank and also get, I think, like, SOTA results on Tax Bench just using our RFT

Speaker 1

不错。

Nice.

Speaker 0

产品也是如此。因此,讨论已经从仅仅为你的用例定制某些东西,转向真正利用你自己的数据来创建一个一流的、也许是世界顶级的模型,用于你业务中关心的领域。

Product as well. And so it has kinda shifted the the discussion away from just customizing something for your own use case to, like, really leveraging your own data to create a best in class, maybe best in the world model for something that you care about for your business.

Speaker 2

我觉得基础模型在指令遵循方面已经变得如此出色,以至于在行为控制方面,你不需要进行微调。你可以描述你想要什么,模型就能很好地完成。但在推动实际能力的前沿方面,我的直觉是RFT(强化微调)基本上会成为常态。如果你真的在你的领域推动智能达到相当高的水平,在某个时间点,你基本上需要通过自定义环境进行强化学习。

I feel like the the base models are getting so good at instruction following that for, you know, behavior, like steering, like, you know, you don't need to fine tune that point. Like, you know, you can describe what you want, and the model is pretty good at it. But pushing the frontier on, like, actual capabilities, my hunch is that RFT will pretty much become the norm. Like, you know, if you are actually pushing in your field, like, you know, intelligence, like, you know, to a pretty high point, like, at some point, like, you know, you need to RL RL essentially with custom environments.

Speaker 0

是的,很吸引人。再回到之前关于这些企业的自上而下与自下而上的观点,你最终为RFT所需的很多数据都需要对你正在执行的确切任务有非常细致的了解,并知道如何对其进行评分。因此,很多数据实际上来自自下而上。我知道很多初创公司会与所在领域的专家合作,试图获得正确的任务和正确的反馈来构建这些数据集。

Yeah. Fascinating. And even going back to the point earlier around, like, top down versus bottoms up for some of these enterprises, a lot of the data that you end up needing for RFT require like very intricate knowledge about the exact task that you're doing and understanding how to grade it. And so a lot of that actually comes from bottoms up. Like, lot I know a lot of these startups will work with experts in their field to try and get the right tasks and get the right feedback to craft some of these datasets.

Speaker 1

事不宜迟,我们将进入我最喜欢的环节:快速问答。我们有很多好朋友为你们发送了一些问题。好的,我们将从Altimeter最喜欢的游戏开始,这是一个多空游戏。

Without further ado, we're gonna jump into my favorite section, which is the rapid-fire questions. We had a lot of great friends of ours send in some questions for you guys. Okay. We'll start with Altimeter's favorite game, which is a long-short game.

Speaker 0

嗯。

Mhmm.

Speaker 1

选择一个你看好的企业、想法或初创公司(做多),以及一个你认为被高估、会下注做空的对象。谁准备好先来?多空选择?

Pick a business, an idea, a startup that you're long, and the same short that you would bet against, that there's more hype than there's reality. Whoever's ready to go first, long short?

Speaker 0

我看好的实际上不在AI领域,所以这会有点不同。有

My long is actually not in the AI space, so this is gonna be slightly different. There

Speaker 1

我们开始吧。

we go.

Speaker 0

虽然我在AI领域做空,但实际上我非常看好电子竞技。我所说的电子竞技是指围绕视频游戏兴起的整个专业游戏产业。嗯。这对我来说非常亲切和重要。

My short is though in the in the AI space. So I'm actually extremely long esports. And so what I mean by esports is the entire, like, professional gaming industry that's emerging around video games. Mhmm. It's very near and dear to my heart.

Speaker 0

我玩很多视频游戏,也经常观看这些内容。所以很明显,我对这方面非常了解。但我认为电子竞技有着巨大的未开发潜力和增长空间。具体来说,比如《英雄联盟》就是一个很大的例子。Riot Games推出的所有游戏实际上都有他们自己的职业联赛。

I play a lot of video games, and so I watch a lot of this. So, obviously, I'm I'm pretty in the weeds on this. But I actually think there's incredible untapped potential in esports and incredible growth to be had in this area. So, concretely, I mean are like, you know, a really big one is League of Legends. All of the games that Riot Games puts out, they actually have their own professional leagues.

Speaker 0

信不信由你,他们实际上还举办职业锦标赛。现在他们甚至会租用体育场。但我认为,如果你看看年轻人和孩子们关注什么、他们的时间花在哪里,主要都投入到了这些事情上。他们在视频游戏上花费大量时间。

They actually have professional tournaments, believe it or not. They rent out stadiums actually now. But I just think it's like if you look at kinda what the youth and, like, what younger kids are looking and where their time is going, it's predominantly going towards these things. They spend a lot of time on video games.

Speaker 2

观看电子竞技比赛比看足球、篮球等更多。

Watch more esports than, like, soccer, basketball, etcetera.

Speaker 0

是的。是的。是的。这类人群的数量也在不断增长。我实际上参加过一些这样的活动,非常有趣。

Yeah. Yeah. Yeah. A growing number of these too. I've actually been to some of these events, and it's very interesting.

Speaker 2

他非常投入

He's very committed

Speaker 0

对他的做多观点。是的。是的。我极其看好这些。

to his long. Yeah. Yeah. I'm extremely long on this stuff.

Speaker 1

所以他们正在预订体育场让人们去看电子竞技?

And So they're booking out stadiums for people to go watch electronic sports?

Speaker 0

是的。是的。是的。我真的去过甲骨文球馆,就是勇士队的老球场,去看过一场这样的比赛,我想是在疫情之前。哇。

Yeah. Yeah. Yeah. It's I literally went to Oracle Arena, the old Warriors Stadium, to watch one of these, I think, before COVID. Wow.

Speaker 0

然后所以这这只是在疫情之前。

And then the so it it's just Before COVID.

Speaker 1

哇。那是五年前了。

Wow. That's five years ago.

Speaker 0

Was a

Speaker 1

有一阵子了。

while ago.

Speaker 0

其实我关注这个领域有一段时间了,我觉得在疫情期间它确实迎来了一个高峰。那时候所有人都在玩电子游戏。

So I actually I've been following this for a while, and I actually think it had a really big moment in COVID. Like, everyone was playing video games.

Speaker 1

当然。

Of course.

Speaker 0

当然。不过我觉得现在热度有点回落了,所以我认为它被低估了。你知道吗?好像现在没什么人真正重视它。

Of course. I think it's kind of, like, come back down. So I think it's, like, undervalued. You know? It's, like, I think no one's really appreciating it now.

Speaker 0

但它具备了所有真正爆发的要素。年轻人正在参与其中。另外我想说的是,它在亚洲非常火爆,绝对超级火爆。在韩国和中国也绝对是大市场。

But it has all the elements to, like, really, really take off. And so the youth are are doing it. The other thing I'd say is it is huge in Asia. Like absolutely massive in Asia. It is absolutely big in Korea and China as well.

Speaker 0

比如我们租用了甲骨文体育馆,我去的那场活动就是在甲骨文体育馆。我感觉在亚洲,他们直接租下整个体育场,就像足球场那样。选手们已经像名人一样了。总之,你知道韩国文化正在进入美国市场,我觉得这也是推动整个行业发展的另一股顺风。

Like, you know, we rented out Oracle Arena I think or like the the event I went to was an Oracle Arena. My sense is in Asia, they rent out like the entire stadiums, like the soccer stadiums. And the players are already like celebrities. So anyways, you know, as like, you know, the I know like Korean culture is really making its way into The US as well. Think that's another tailwind for this whole thing.

Speaker 0

总之,我认为电子竞技是值得关注的领域,因为它还有很大的成长空间。

But anyways, esports I think is something you should keep an eye out on because there's a lot of room for growth.

Speaker 1

挺出乎意料的。嗯,听起来不错。简短有力。

Very unexpected. Yeah. Good to hear. Short.

Speaker 0

我的做空观点有点激进,就是我在整个AI产品工具类别上都持看空态度。这涵盖了很多不同的东西。某种程度上是在作弊,因为其中一些我认为已经开始显现了。但我想,大概两年前,可能是像评估(evals)产品、框架或向量数据库之类的。我对这些相当看空。

My short's a little spicy, which is I'm short on the entire category of, like, tooling around AI products. And so this encapsulates a lot of different things. Kinda cheating because some of these, you know, I think are starting to play out already. But I think, like, two years ago, it was maybe like evals products or like frameworks or vector stores. I'm pretty short on those.

Speaker 0

我认为如今,围绕AI模型的其他工具也引发了很多额外兴奋。所以强化学习环境现在也很热门。不幸的是,我对这些非常看空。并不是真的看空,我只是不太看到那里有很多潜力。

I think nowadays, there's a lot of additional excitement around other tooling around AI models. So RL environments, I think, are really big right now as well. Unfortunately, I'm very short on those. Not really. I don't really see a lot of potential there.

Speaker 0

我看到强化学习及其应用有很大潜力,但我认为围绕强化学习环境的创业空间确实很艰难。主要原因是,第一,这是一个竞争非常激烈的领域,有很多人在其中运作。第二,如果说过去两年告诉我们什么,那就是这个领域发展如此之快,很难尝试适应并理解到底什么样的技术栈能够真正延续到下一代模型。我认为这让工具领域变得非常困难,因为今天非常热门的框架或工具可能根本不会被下一代模型使用。

I see a lot of potential in reinforcement learning and applying it, but I think the startup space around RL environments is really tough. Main thing is one, it's just a very competitive space. There's just a lot of people kind of operating in it. And then two, if the last two years have shown us anything, the space is evolving so quickly, and it's so difficult to try and, like, adapt and understand what the exact stack is that will really, you know, carry through to the next generation of models. I think that just makes it very difficult when you're in the tooling space because, you know, today's really hot framework or really hot tool might just not get used in the next generation of models.

Speaker 2

所以我一直在注意到同样的模式,那就是在AI领域打造突破性初创公司的团队都非常务实。务实。他们不是特别,你知道,学术化或追求完美世界等等。这很有趣,因为我觉得我们这一代基本上是在一个非常稳定的技术时刻开始进入科技行业的。技术已经通过SaaS、云计算等积累了很多年。

So I I've been noticing, like, the same pattern, which is the the teams that build like breakout like startups in AI are extremely pragmatic. Pragmatic. They're not super, like, you know, intellectual, but like the perfect world, etcetera. And it's funny because I feel like, you know, our generation has basically started in tech in a very, like, stable, like, you know, moment. Well, you know, technology had been building up, like, you know, for years and years with, like, SaaS, like, cloud, etcetera.

Speaker 2

所以在某种程度上,我们是在那个非常稳定的时刻成长起来的,那时设计很好的抽象和工具是有意义的,因为你能感觉到发展方向。但今天完全不同了。我们不太知道未来一两年会发生什么,所以几乎不可能定义完美的工具平台。

And so we were, in a way, like, raised, like, you know, in that very stable moment where it makes sense at that point to design very good abstractions and toolings because you have a sense of where it's going, but it's so different today. Don't quite know what's going to happen next year or two, so it's almost impossible to define the perfect tooling platform.

Speaker 1

对,对。现在确实有很多这样的情况。很激进,有很多功课要做。Olivier,轮到你了,做多还是做空?

Right, right. Well, there's a lot of that going around right now. Spicy, a lot of homework there. Olivier, over to you, Long short?

Speaker 2

做空,过去一个月我一直在思考教育问题,特别是在孩子的背景下。我对教育相当看空,主要是因为它在那时强调人类记忆。我这么说是因为我自己基本上也经历了教育体系。但你知道,我学到了很多历史事实、法律知识等等。是的,其中一些确实塑造了你的思维方式。

Long short, I've been thinking a lot about education for the past month in the context of kids. I'm pretty short on education, which basically emphasizes human memorization at that point. And I say that having mostly been through education myself. But, you know, like, I learned so much on, like, you know, history facts, like, you know, legal things that are Yeah. Know, some of it, like, does shape your way of thinking.

Speaker 2

坦白说,其中很大一部分就像是知识标记。而这些知识标记,你知道,事实证明,像LLMs(大语言模型)在这方面相当擅长。所以我在这方面相当短缺。没错。你不会

A lot of it, frankly, is just, like, you know, knowledge tokens, And those knowledge tokens, you know, it turns out, like, you know, LLMs are pretty good at it. So I'm quite short on that. That's right. You won't

Speaker 1

需要记忆,但策略是仿生的。你可以直接思考它。

need memory when strategy but it is bionic. You can just think about it.

Speaker 0

那直接进入你的脑海。

That's straight into your head.

Speaker 2

正是。我看好什么?坦白说,我认为医疗保健可能是未来一两年内从AI中受益最多的行业。我会说更多。我认为所有要素都已具备,将形成一场完美风暴。

Exactly. What am I long? Frankly, I think healthcare is probably the industry that will benefit the most from AI in the next, like, year or two. I would say more. I think, like, all the ingredients are here for a perfect storm.

Speaker 2

大量的结构化和非结构化数据,你知道,这基本上是制药公司的核心。模型非常擅长消化和处理这类数据。大量的行政工作,你知道,文件繁重,文化厚重。但同时,像那些技术性很强、非常注重研发的公司,你知道,这些公司在某种程度上技术是其核心。所以,是的,我对医疗健康领域相当看好。

A huge amount of, like, structured and unstructured data, you know, it's basically the heart of, you know, like, the pharma companies. The models are excellent at digesting and processing that kind of data. A huge amount of, like, admin-heavy, like, documents-heavy, like, you know, culture. But at the same time, like, companies which are very technical, very R&D friendly, like, you know, companies where, like, technology in a way is at the heart of what they do. And so, yeah, I'm pretty bullish on healthcare.

Speaker 1

这就像是生命科学。所以你指的是进行研究的生命科学组织。正是。明白了。没错。

This is like life sciences. So you mean life sciences research organizations that are producing Exactly. Gotcha. Exactly.

Speaker 0

是的,是的。这几乎就像,你知道,在过去二三十年里,像制药或生物技术公司,如果你看他们做的工作,只有一小部分是实际的研究。而其中很大一部分最终变成了行政、文件之类的事情。而这个领域正是AI大有可为的成熟时机。

Yeah. Yeah. It's almost like, you know, over the last twenty, thirty years, like pharma or like biotech companies have basically if you look at the work that they're doing, only a small amount of it is actual research. And so much of it ends up being admin and documents and things like that. And that area is just so ripe for something to happen with AI.

Speaker 0

我认为这正是我们在安进公司和一些客户身上看到的情况。确实如此。而且这也不是他们想做的事情。我认为这是好事

And I think that's what we're seeing with Amgen and some of these customers. Exactly. And it's also not what they want to do. I think it's good

Speaker 1

我们有

that we have

Speaker 0

一些监管规定,这很明显,但这意味着他们有大量的事情需要处理。所以,当有一种技术能够真正帮助降低这类成本时,我认为它会迅速突破这些障碍。

some regulations there, obviously, but it just means that they have reams and reams of things to kind of go through. And so, when you have a technology that's able to really help bring down the cost of something like that, I think it'll just tear right through it.

Speaker 2

我认为一旦政府和机构意识到,如果你退一步看,这可能是人类进步的最大瓶颈之一。对吧?回顾过去十年,真正突破性的药物有多少?并不多。就像,你知道,如果把这个速度提高一倍,生活会有多大不同?

And I think once governments and institutions are going to realise that like if you step back, it is probably one of the biggest bottleneck to human progress. Right? You step back in the past decade, how many true breakthrough drugs have there been? Not that many. Like, you know, how life would be different, like, if you doubled that rate, essentially?

Speaker 2

所以一旦你意识到利害关系,是的,我的直觉是,我们将在那个领域看到相当大的势头。

So once you realize what is at stake, yeah, my hunch is that we're going to see quite a bit of momentum in that space.

Speaker 1

哇。好吧。那里也有很多功课要做。是的。下一个问题。

Wow. Alright. Lots of homework there as well. Yep. Next one.

Speaker 1

除了ChatGPT之外,你最喜欢的被低估的AI工具是什么?

Favorite underrated AI tool, other than ChatGPT maybe?

Speaker 2

我爱Granola。哦,天哪。一个无意识的噩梦。我太想要Granola的答案了。

I love Granola. Oh, man. A mindless nightmare. I want answer so much Granola.

Speaker 1

两票支持Granola。

Two votes for Granola.

Speaker 2

有某种东西,比如

There is something, like

Speaker 1

嘿。那ChatGPT记录怎么样?

Hey. What about ChatGPT record?

Speaker 2

我也喜欢ChatGPT记录,但Granola有一些功能我觉得做得非常好。比如,整个,你知道的,与Google日历的集成非常出色。是的。而且,你知道的,转录的质量和摘要都相当不错。

I like ChatGPT record as well, but there are some features in Granola which I think are really done well. Like, the whole, like, you know, integration with your Google Calendar is excellent. Yeah. And just, you know, the quality of, like, the transcription and, like, the summary is pretty good.

Speaker 1

你是一直开着它吗?因为我知道你的日程排得满满的。你只是一直开着Granola。

Do you just have it on? Because I know your calendar is back to back. You just have Granola on.

Speaker 2

所以有趣的是,我内部并不使用Granola。

So the funny thing is that I don't use Granola internally.

Speaker 1

我使用Granola主要是为了

I use Granola for

Speaker 2

我的个人生活。我明白了。是的。

my personal life mostly. I see. Yeah.

Speaker 1

是的。我明白了。约会时用。我开玩笑的。

Yeah. I see. On dates. I'm joking.

Speaker 0

我会说,是的,Granola实际上会是我的选择。所以Granola有两票。我本来想说对我来说最简单的答案是Codex,作为一名软件工程师。它最近变得如此好用,特别是Codex CLI配合GPT-5。

I'll say, yeah, Granola is actually gonna be mine. So two votes for Granola. I was gonna say the easy answer for me is Codex, as a software engineer. It's just, like, gotten so good recently. Codex CLI, especially with GPT-5.

Speaker 0

特别是对我来说,我对编码的迭代循环时间敏感度较低,所以倾向于在Codex上使用GPT-5,我认为这真的非常

Especially for me, I tend to be less time sensitive about, like, you know, the iteration loop with coding, and so leaning on GPT-5 in Codex, I think, has been really, really

Speaker 1

Codex有什么变化?因为你知道,Codex也经历了一个发展过程。Codex已经存在一段时间了。我记得它大概是一年多前推出的。那么Codex发生了什么变化?

What about Codex has changed? Because, you know, Codex has also been through a journey. Codex has been around for a bit. I remember, like, it was launched more than a year ago. So, like, what's changed about Codex?

Speaker 1

我,是的。刚才在说它已经存在一段时间了。

I yeah. Was talking about has been around for a bit.

Speaker 0

所以我觉得Codex出现还不到一年。

So I feel like it's been less than a year for Codex.

Speaker 2

我想说,也就几个月吧。

A few months, I would say.

Speaker 0

时间膨胀效应太疯狂了,在这个领域里。感觉它好像已经存在很久了。

It's the time dilation is so crazy, and this field. It feels like it's been around forever.

Speaker 2

一年前还是GPT-4o。哦,你还记得那个演示吗?感觉那像是很久以前的事了。

A year ago was GPT-4o. Oh, like, you remember that demo? Like, that feels like ages ago.

Speaker 0

o1甚至还没出来呢。圣诞十二天都还没发生。

o1 hadn't even come out yet. The twelve days of Christmas hadn't happened yet.

Speaker 2

语音演示是我

The voice demo is I

Speaker 1

觉得当时有个命名的问题。好吧。不过总之是的。

think there was a naming thing. Okay. But anyway yeah.

Speaker 2

哦,有一个Codex模型。

Oh, there was a Codex model.

Speaker 1

这正是我在想的。

What That's I'm thinking.

Speaker 0

有一个Codex模型。

There's there's a Codex model.

Speaker 2

哦,是的。我们是。是的。我们是。我们是。

Oh, yeah. We are. Yeah. We are. We are.

Speaker 2

你不应该为那个困惑负责。

You're not to blame for that confusion.

Speaker 0

另外,我

Also, I

Speaker 2

认为

think the

Speaker 1

我们是。

We are.

Speaker 0

GitHub的那个项目叫Codex,没错。

The GitHub thing was called Codex That's right.

Speaker 2

嗯。是的。是的。

Well. Yes. Yes.

Speaker 1

那是

That's

Speaker 0

对的。但我说的是我们在ChatGPT内部的编程产品

right. But I'm talking about our coding product within ChatGPT

Speaker 1

是的。

Yeah.

Speaker 0

也就是Codex Cloud产品,还有Codex CLI。所以实际上,如果我把答案再缩小一点到Codex CLI,我真的很喜欢它。我喜欢它的本地环境设置。在过去一个月左右的时间里,让它真正变得有用的原因,一是团队做得非常好,消除了所有的小问题,比如产品细节的打磨和小毛病。现在用它感觉就像是一种享受。

Which is the Codex Cloud offering, and then also Codex CLI. So actually, maybe if I were to narrow my answer a little bit more, it's Codex CLI, which I've really, really liked. I like the local environment setup. The thing that's actually made it really useful in the last, I'd say, like, month or so is, one, I think the team has done a really good job of just, like, getting rid of all the paper cuts, like the small product polish and, like, paper-cut things. It kinda feels like a joy to use now.

Speaker 0

不错。我感觉反应更灵敏了。然后第二点,说实话,是GPT-5。我觉得GPT-5真的能让产品大放异彩。是的。

Nice. It feels more reactive. And then the second thing, honestly, is GPT-5. Like, I just think GPT-5 really allows the product to shine. Yeah.

Speaker 0

你知道,归根结底,这个产品真的依赖于底层模型。嗯。当你需要来回迭代模型四五次才能让它正确执行你想要的更改,对比让它多思考一会儿就能一次性准确完成你想要的操作。是的。你会获得一种奇怪的仿生感觉,就像我现在和模型心灵相通,它完全理解我在做什么。

It's you know, at the end of the day, this is kind of a this is a product that really is dependent on the model underlying model. Mhmm. When you have to, you know, like, iterate and go back and forth with model like four or five times to get it right to get it to, like, you know, do the change that you want versus having it think a little bit longer and it just like one shots and does exactly what you want to do. Yeah. You get this like weird like bionic feeling where you're like, I feel so mind melded with the model right now and like perfectly understands what I'm doing.

Speaker 0

是的。所以通过Codex不断获得那种多巴胺冲击和反馈循环,让它变成了我真正非常喜欢且不可或缺的东西。很好。另外我想说Codex对我特别有用的地方是:我把它用于个人项目,也用它帮助我理解代码库,作为工程经理。

Yeah. And so getting like that kind of like dopamine hit and like feedback loop constantly with Codex has made it kind of like an indispensable thing that I really, really like. Nice. And the other thing I'd say Codex is just really good for me is so I use it in my for like personal projects. I also use it to like help me understand code bases like as a engineering manager.

Speaker 0

现在我不再深入参与实际代码工作。所以你实际上可以用Codex真正理解代码库的情况,向它提问并让它回答问题,快速跟上进度。所以,即使是非编码的使用场景,Codex CLI也非常有用。

Now I'm not as in the weeds on the actual code. And so you're actually able to use Codex to really understand what's happening with the code base, ask it questions and have it answer them, and really get up to speed as well. So, like, even the non-coding use cases are really useful with Codex CLI.

Speaker 1

真有趣。Sam昨天发了一条关于Codex使用量激增的推文。我想知道发生了什么,但你并不孤单。

Fascinating. Sam had this tweet about Codex usage ripping, I think, like like, yesterday. So I wonder what's going on there, but you're you're not alone.

Speaker 0

是的。从Twitter反馈来看,我觉得我并不孤单。人们真的开始意识到Codex CLI和GPT-5的组合有多棒。是的。我知道那个团队面临很多扩展挑战,但系统对我来说一直没宕机,所以要为他们点赞。

Yeah. I think I'm not alone, just judging from the Twitter feedback. I think people are really realizing how great of a combination Codex CLI and GPT-5 are. Yeah. I know that team is undergoing a lot of scaling challenges, but I mean, the system hasn't gone down for me, so props to them.

Speaker 0

但我们正处于GPU紧缺时期,所以我们会看看这种情况能持续多久。

But we are in a GPU crunch, so we'll see how, you know, how long that goes.

Speaker 1

太棒了。太棒了。下一个问题。十年内软件工程师会更多还是更少?大概有,多少,四千万,

Awesome. Awesome. The next one. Will there be more software engineers in ten years or less? There's about, what, forty,

Speaker 2

五千万。专业软件工程师?你是这个意思吗?全职的。全职。是的。

fifty million. Professional software software engineers? That's what you mean? Like full time Full time. Yeah.

Speaker 0

是的。因为这很难说,因为,我认为毫无疑问,未来会有更多的软件工程工作在进行。

Yeah. Because it's a hard one because, like, I think without a doubt, there's gonna be a lot more software engineering going on.

Speaker 1

是的。当然。

Yes. Of course.

Speaker 0

其实有一篇很棒的文章,我想是在我们内部的Slack上分享的。是最近的一个Reddit帖子。我认为这正好说明了这一点。那是一个非常感人的故事。是一个关于某人有一个非言语兄弟的Reddit帖子。

There's actually a really great post that was shared, I think, in our internal Slack. It was like a Reddit post recently. I actually think that highlights this. It was a really touching story. It was a Reddit post about someone who has a brother who's nonverbal.

Speaker 0

我不知道你是否看到了这个。是刚发布的。Reddit上有个人发帖说他们有一个需要照顾的非言语兄弟。他们尝试了各种方法来帮助兄弟与世界互动、使用电脑,但是,比如,视觉追踪不起作用,因为我觉得他的视力不好。所有工具都不起作用。

I actually don't know if you saw this. It was just posted. It's a person on Reddit posted they have a nonverbal brother who they have to take care of. The brother, like, they tried all these types of things to help the brother interact with the world, use computers, but, like, vision tracking didn't work because I think his vision wasn't good. All the tools didn't work.

Speaker 0

然后这个兄弟最终使用了ChatGPT。我不认为他用了Codex,但他用了ChatGPT,并基本上自学了如何创建一套为他非言语兄弟量身定制的工具。基本上,就是一个专门为他们定制的软件应用。正因为如此,他现在拥有了一套由他兄弟编写的定制设置。哇。这让他能够浏览互联网。

And then this brother ended up using ChatGPT. I don't think he used Codex, but he used ChatGPT and basically taught himself how to create a set of tools that were tailor made to his nonverbal brother. Basically, a custom software application just for them. And because of that, he he now has, like, custom setup that was written by his brother Wow. And allows him to, like, browse the Internet.

Speaker 0

我觉得那个视频大概是他在看《辛普森一家》之类的,真的很感人。但我认为这实际上是我们会越来越多看到的场景。这个人不是专业软件工程师,他的头衔也不是软件工程师,但他做了很多软件工程的工作,而且可能做得相当不错。

I think the video was, like, him watching The Simpsons or something like that, which is really touching. But I think that's actually what we'll see a lot more of. Like, this guy's not a professional software engineer. His title is not software engineer. But he, like, did a lot of software engineering, but probably pretty good.

Speaker 0

绝对足够好让他兄弟使用了。所以代码的数量、构建的数量,我认为将会经历一场不可思议的变革。

Good enough definitely for his brother to use. So the amount of code, the amount of, like, building that'll that'll happen, I think, is just gonna go through an incredible transformation.

Speaker 2

没错。

Right.

Speaker 0

我不确定这对像我这样的软件工程师意味着什么。也许会有等效的,或者也许有

I'm not sure what that means for software engineers like myself. Maybe there's, you know, equivalent or maybe there's

Speaker 2

当然,展示出来。

Of course, showing.

Speaker 0

是的。更多像我这样的人。更针对我这样的。我们需要更多

Yeah. More of me. More me specific. We need way

Speaker 2

更多你这样的人。没错。但是

more of you. That's right. But

Speaker 0

绝对有更多的软件工程在计算领域。

definitely a lot more software engineering in a lot computing.

Speaker 2

我完全同意。就像,我完全认同这个论点,即全世界存在巨大的软件短缺。是的。过去二十年里,我们某种程度上一直在接受这一点。但你知道,软件的目标从来就不是成为那种超级僵化、超级难构建的产物。

I buy that completely. Like, I buy completely the thesis that there is a massive software shortage in the world. Yeah. Like, we've been sort of accepting it, you know, for the past twenty years. But, like, the goal of software was never to be that super rigid, super hard to build, you know, artifact.

Speaker 2

它应该是可定制的,像可塑的一样。是的。所以我预计我们会看到,这更像是一种人们工作和技能的重组,会有更多人编程。比如,预计产品经理会越来越多地编码,举个例子。

It was to be, like, you know, customized, like malleable. Yeah. And so I expect that we'll see, like, way more it's sort of a a reconfiguration of, like, people's, like, you know, job and skill set where way more people code. Like, you know, expect that product managers are going to code more and more, for instance.

Speaker 0

你最近让你的产品经理们编程了,如果

You made your PMs code recently if

Speaker 2

你还记得的话。

you remember.

Speaker 0

哦,是的。

Oh, yeah.

Speaker 2

我们确实这么做了。那非常有趣。我们开始基本上不再写PRD,也就是产品需求文档。你知道,经典的产品经理工作,写个五页纸,说我的产品要做这个做那个等等。而现在产品经理基本上是通过编写原型来工作。

We did that. That was very fun. We started, like, essentially not doing PRDs, like product requirements documents. You know, classic PM thing, you write like five pages, like my product does that, etc. And, you know, PMs have basically been vibe coding prototypes.

Speaker 2

首先,用GPT-5相当快,是的,凭感觉,而且

And one, it's pretty fast with GPT-5. Yeah, vibes, and

Speaker 0

我觉得只需要几个小时

just a couple hours I think.

Speaker 2

速度很快。其次,它传达的信息量比文档要多得多

Breaking fast. And second, it sort of conveys, like, so much more information than a document.

Speaker 1

是的。没错

Yeah. Yeah.

Speaker 2

就像,你基本上能真切感受到这个功能,比如,它对不对?所以,是的,我预计我们会越来越多地看到这种行为

Like, you get a feel, essentially, for the feature, like, is it right or not? So, yeah, I expect that sort of, you know, behavior we're going to see more and more.

Speaker 1

是的。现在你不需要写英文,可以直接写出你想要的实际内容。是的。没错。就是这样

Yeah. Instead of writing English, you can actually now write the actual thing you want. Yeah. Yeah. And that yeah.

Speaker 1

太棒了。给刚开始职业生涯的高中生的建议

That's amazing. Advice for high school students who are just starting out their career.

Speaker 2

我的建议是,我也不确定。也许是保持长青。比如,把批判性思维放在任何其他事情之上。如果你进入一个需要极高批判性思维的领域,比如,数学、物理,或者哲学这类领域,那么无论如何你都会没事的。嗯。

My advice is I don't know. Maybe it's Evergreen. Like, prioritize critical thinking above anything else. If you go in a field which requires, like, extremely high critical thinking, like, you know, I don't know, math, physics, you know, maybe philosophies in that bucket, you will be fine regardless. Mhmm.

Speaker 2

如果你进入的领域在这方面有所削弱,我认为。而且,又回到那种,比如,死记硬背,模式匹配,我觉得你可能就不那么面向未来了。是的。作为一个,你知道

If you go in the field, that sort of turns down, I think. And again, gets back to, like, memorization, like, you know, pattern matching, I think you will probably be less future proof. Yeah. As a, you know

Speaker 1

有什么好方法可以锻炼批判性思维?

What's a good way to sharpen critical thinking?

Speaker 0

用ChatGPT,让它来测试你。

Use ChatGPT and have it test you.

Speaker 2

那实际上是在测试你。拥有一个,你知道,世界级的导师,他基本上知道如何把标准设定在你能力范围的20%左右,这实际上可能是一个非常好的方法。不错。

That's actually a test you. Having like, you know, a world class tutor who essentially knows how to put the bar like 20% of what you can do all the time, you know, is actually probably like a really good way to do it. Nice.

Speaker 1

先生,您有什么要补充的吗?

Anything from you sir?

Speaker 0

我的想法是,我认为我们实际上正处在一个非常有趣且独特的时期,对于,比如,更年轻的一代,所以,这也许是一个更普遍的建议,不仅仅针对高中生,也包括更年轻的一代,甚至是大学生。嗯。就像是,我认为建议会是,不要低估你相对于世界上其他人现在拥有的优势,因为你可能是多么地AI原生,或者多么地…有趣。比如,你知道,在工具的使用细节上你是多么深入。我的直觉是,高中生、大学生进入职场时,他们在如何使用AI工具、如何真正改变工作场所方面,实际上会拥有巨大的优势。

Mine is I think it's just I think we're actually in such an interesting like unique time period where the, like, younger so, like, maybe this is a more general advice for not just, high school students, but just, the younger generation, even, college students. Mhmm. It's like I I think the advice would be don't underestimate how much of an advantage you have relative to the rest of the world right now because of how AI native you might be or how Interesting. Like, you know, in the in the weeds of the tools you are. My hunch is like high schoolers, college students, when they come into the workplace, they're gonna have actually a huge leg up on how to use AI tools, how to actually transform the workplace.

Speaker 0

对于年轻一些的高中生,我的建议是:第一,真正沉浸在这个领域里。第二,充分利用你们所处的独特时期——目前职场中可能没有人比你们更深入理解这些工具。一个很好的例子是,我们OpenAI最近迎来了第一批实习生,很多是软件方向的。其中一些人是我见过的最令人惊叹的Cursor高级用户。

And my push for some of the younger, I guess, high school students is one, just really immerse yourself in this thing. And then two, just really take advantage of the fact that you're in a unique time where, like, no one else in the workforce really understands these tools as deeply, probably, as you do. A good example of this is actually we had our first intern class recently at OpenAI. A lot of software interns. And some of them were just, like, the most incredible Cursor power users I've, like, ever seen.

Speaker 2

他们的效率太高了。是的,我真是又惊又喜。

They were so productive. Yeah. I was I was shocked in a good way.

Speaker 0

没错,没错。我当时就想,虽然知道能找到优秀的实习生,但没想到他们能优秀到这个程度。是的。

Yep. Yep. I was like, yeah. I know I know we can get good interns, but like, I don't know if they'd be like this good. Yeah.

Speaker 0

是的。我觉得部分原因在于他们在大学期间就已经开始使用这些工具了,无论好坏。

Yeah. And I think part of it is just like they've grown up using these tools, for better or worse, in college.

Speaker 2

没错。

Yeah.

Speaker 0

但更深层的点是,他们简直就是AI原生代。甚至像我和Olivier也算AI原生代,我们早期就加入了OpenAI。但我们没有像他们那样完全沉浸其中、伴随着AI成长。所以我的建议是:好好利用这个优势,不要害怕在工作中传播这些知识并加以运用,因为这确实是他们相当大的优势。

But I think the meta level point is they're so, like, AI native. And even, like, I don't know, me and Olivier were, like, kind of AI native. We were early at OpenAI. But, like, we haven't, like, been steeped in this and kind of grown up in this, and so the advice here would just be like, yeah, leverage that, don't be afraid to kind of go in and spread this knowledge and take advantage of that in the workplace because it is a pretty big advantage for them.

Speaker 1

是的。记不清是谁在Ballenger说过:每一届实习生都变得更快、更聪明,就像笔记本电脑一样一代比一代强。

Yeah. I can't remember who said this to us at Ballenger, but every intern class was just getting faster, smarter, like laptops, like smarter every generation.

Speaker 0

你确定不是在2013年我参加那个锦标赛时达到顶峰吗?没错。对。没错。对。

You sure it didn't peak in 2013 when, you know, when I was in that tournament? That's right. Right. That's right. Right.

Speaker 1

没错。没错。是的。嗯,发生了很多事,你知道,自从你们加入OpenAI以来发生了很多事,对吧?差不多三年,快三年了?

That's right. That's right. Yeah. Well, lots happened, you know, lots happened since you guys joined OpenAI, right? What, three years, almost three years?

Speaker 1

在你们的OpenAI旅程中,什么是玫瑰时刻(最美好的时刻),什么是花蕾时刻(对某事最兴奋但仍有发展机会),什么是荆棘时刻(三年旅程中最艰难的时刻)?

In your OpenAI journey, what has been the rose moment, your favorite moment, the bud moment where you're, like, most excited about something, but but still opportunity ahead, and the thorn, toughest moment of your of your of your three year journey?

Speaker 2

荆棘时刻对我来说很容易回答。就是我们所说的"小插曲",也就是董事会的政变。那确实是一个非常艰难的时刻。是的。有趣的是,事后看来,这件事实际上让公司更加团结了。

The thorn is easy for me. What we call the blip, which is, you know, the coup of the board. Like, that was a really tough moment. Yeah. It's funny because, you know, after the fact, it actually reunited quite a bit the company.

Speaker 2

是的。虽然OpenAI之前就有很强的文化,但你知道,之后更有一种同志情谊的感觉,实际上变得更加强烈了。不过你知道,Cher在休息日泄露了消息。

Yeah. Like, there was a feeling OpenAI had a pretty strong culture before, but, you know, there was a feeling of, like, camaraderie, essentially, that was even stronger. But you know, Cher leaked up on the day off.

Speaker 1

这种反脆弱性很少见。大多数组织经历这样的事情后会分裂、瓦解。但我觉得OpenAI变得更强大,OpenAI回来了。

It's very rare to see that anti fragility. Most orgs after something like that break, break apart. But I feel like OpenAI got stronger, OpenAI came back.

Speaker 2

说得好。我觉得这确实让OpenAI变得更加强大了。是的。基本上,当我看到其他事情,是的,比如人员离职之类的新闻,或者说任何坏消息时,我觉得公司已经练就了更强的抗压能力,以及恢复的能力,就像,是的。

It's a good point. I feel it made OpenAI stronger for real now. Yeah. Essentially, when I look at other facts. Yeah. When I look at, you know, other, like, you know, news, like departures or, you know, whatever, like, you know, bad news essentially, I feel the company has built, like, you know, a thicker skin and, you know, an ability to, like, recover, like Yeah.

Speaker 2

快得多。

Way quicker.

Speaker 0

实际上我认为部分原因确实是对的。另一部分我认为也是文化因素。我也觉得这就是为什么这对很多人来说是一个低谷。OpenAI有太多人如此深切地关心我们正在做的事情,这也是他们如此努力工作的原因。你就是对工作非常在乎。

I actually think part I think it's definitely right. Part of it too, I think, is also just the culture. I also think this is why it was such a a low point for a lot of people. So many people just at OpenAI care so deeply about what we're doing, which is why they work so hard. You just care a lot about the work.

Speaker 0

这几乎感觉像是你一生的事业。就像,这是一个非常大胆的使命和事业,这就是为什么我认为这次波折对很多人来说如此艰难,但也是我认为帮助大家重新团结起来的原因,也是我们能够凝聚在一起并变得坚韧的原因。是的。我还有一个不同的最糟糕时刻,就是我们12月那次大宕机,如果你还记得的话。

It almost feels like your life's work. Like, it's a very audacious mission and and thing that you're doing, which is why I think the blip was like so tough on a lot of people, but also is what I think helped bring people back together and why we were able to hold together and and get that that thick skin as well. Yep. Yeah. I have I have a separate worst moment, which was the big outage that we had in December if you yeah.

Speaker 0

你还记得。我记得。那是一次持续数小时的宕机。真的让我们意识到API几乎像公用事业一样至关重要。背景是这样的,我想我们在11月或12月的某个时候经历了大约三四个小时的宕机。

You remember. I do. It was like a multi hour outage. Really highlights to us how essential of almost like a utility the API was. So so the background is I think we had like a like a three, four hour outage sometime in November or December.

Speaker 0

真的非常残酷。纯粹的Sev 0级事故。没人能访问ChatGPT,也没人能调用API。那真的很艰难。从客户信任的角度来看,那真的非常困难。

Really brutal. Pure Sev 0. No one could hit ChatGPT and no one could hit the APIs. It was, it was really rough. That was just really tough just from a, like, you know, customer trust perspective.

Speaker 0

我记得我们和很多客户进行了沟通,基本上是对发生的事情进行事后分析,并讨论我们未来的计划。幸运的是,从那以后我们再没有遇到过接近那种程度的问题。实际上我对过去六个月我们在可靠性方面所做的所有投资感到非常满意。但在那一刻,我认为真的非常艰难。

I remember we we, like, talked to a lot of our customers to kinda, like, postmortem them on what happened and kind of our plan moving forward. Thankfully, we haven't had anything close to that Yeah. Since then. And I've been actually really happy with all the investments we've made in reliability over the last six months. But in that moment, I think it was really it was really tough.

Speaker 2

是的。在积极的一面,比如好的方面,我觉得有两个。第一个是GPT-5真的很棒。就像,冲刺开发GPT-5的过程,我认为真的展现了OpenAI最好的一面。就像,拥有尖端科学研究,极度以客户为中心,极其优秀的基础设施和推理人才。

Yeah. On the happy side, like on the roses, I think I have two of them. The first one would be GPT-5 was really good. Like, the sprint up to GPT-5, I think really, like, you know, showed, like, the best of OpenAI. Like, you know, having, like, cutting edge, like, science research, like, you know, extremely customer focused, like, you know, extreme, like, you know, infrastructure and inference, like, you know, talent.

Speaker 2

事实上我们能够发布如此庞大的模型,并且几乎立即将其扩展到每分钟处理海量token的能力,我认为这本身就说明了问题。所以这一点,我真的很

And the fact that we were able, like, to ship, like, such a big model and, like, scale it to, like, you know, many, many, many tokens, you know, per minute, like, almost, like, immediately, I think speaks to it. So that one, I really

Speaker 0

没有任何中断。

With no with no outages.

Speaker 2

而且没有任何中断。

And With no outages.

Speaker 0

是的。可靠性真的很好。没错。

Yeah. Really good reliability. Yep.

Speaker 2

我记得大约一年前、一年半前我们发布GPT-4 Turbo时,我们对流量规模感到非常紧张。是的。但我觉得我们在发布这些大规模更新方面真的进步了很多。对我来说第二个开心的时刻就是首届开发者日,真的很有趣。是的。

I can remember when we shipped, like, GPT-4 Turbo, like, a year ago, a year and a half ago, we were terrified by, you know, like, the, the scale of traffic. Yeah. And I feel we've really gotten, like, much better at, you know, shipping, like, those, you know, massive updates. The second rose, like, you know, happy moment for me would be the first dev day was really fun. Yeah.

Speaker 2

那感觉就像是一个成人礼,OpenAI正在拥抱我们拥有庞大开发者社区的现实。我们将发布新模型和新产品。我记得基本上看到所有我喜欢的人,无论是OpenAI的还是外部的,都在热烈讨论'你们在构建什么?''接下来会有什么新东西?'那感觉真是一个特殊的时刻。

It felt like a coming of age, like OpenAI, like, you know, we are embracing that, you know, we have, like, a huge community of developers. You know, we are going to ship models, new products. And I remember basically seeing, like, you know, all my favorite people, OpenAI or not, you know, like, you know, essentially, nerding out on, you know, what are you building? Like, you know, what's coming up next? It felt really like, you know, a special moment in time.

Speaker 0

不。那其实也是我想说的。所以我就接着你的话说,就是2023年11月的首届开发者日。是的。我确实还记得。

No. That was actually gonna be mine as well. So I'll just piggyback off of that, which is the very first Dev Day, 2023. Yeah. November. I actually, I remember it.

Speaker 0

所以,我的意思是,显然,自那以后发生了很多好事。但不知为何,对我来说,那是一个非常难忘的时刻。一方面,实际上在开发者日之前非常紧张。我们发布了很多东西,所以我们的团队真的在全力冲刺。

So, I mean, obviously, a lot of good things have happened since then. There's just a very, I don't know why. For me, it was a very memorable moment, which was one, it was actually quite a rush up to dev day. We were, we shipped a lot. So our team was just really, really sprinting.

Speaker 0

所以就像是一种高压环境,逐渐升温。更不用说,当然,因为我们是OpenAI,我们在Sam的主题演讲中现场演示了所有我们发布的东西。嗯。我只记得坐在观众席后排,和团队一起等待演示开始。一旦演示结束,我们都如释重负地松了一口气。

So it was like this high, you know, high stress environment kinda going up. To add to that, you know, of course, because we're OpenAI, we did a live demo in Sam's keynote of all the stuff that we shipped. Mhmm. I just remember being in the back of the audience sitting with, like, the team and, like, waiting for the demo to happen. Once it finished happening, we all just, like, let out a huge sigh of relief.

Speaker 0

我们都像是,天啊。

We were like, oh my god.

Speaker 2

谢谢。

Thank you.

Speaker 0

所以我觉得有很多,比如,你知道,之前的积累。对我来说,最难忘的是我记得开发者日刚结束,所有演示都很顺利,所有演讲都很成功。我们举办了after party,然后我晚上坐着Waymo回家,一路放着音乐。那真是开发者日的一个完美结尾。

And so there's, I think there's just, like, a lot of, like, you know, build up to it. For me, the most memorable thing is I remember right after dev day, all the demos worked well. All the talks worked well. We had the after party, and then I was just in a Waymo driving home at night with the music playing. It was just like such a great end to the dev day.

Speaker 0

那就是我记得的。那就是我昨晚的亮点。是的。

That was that was what I remember. That was my rose for the last last night. Yeah.

Speaker 1

太棒了。我猜你们是,但请告诉我你们是否相信AGI(人工通用智能)。是或不是?如果是的话,是什么时刻让你相信的?你的那个时刻是什么?

That's awesome. I assume you guys are, but please tell me if you're AGI pilled. Yes or no? And if so, what was the moment that got you there? What was your moment?

Speaker 1

你什么时候感受到AGI的?

When did you feel the AGI?

Speaker 2

我觉得我已经被AGI迷住了。我觉得我已经被AGI迷住了。

I think I'm AGI pilled. I think I'm AGI pilled.

Speaker 0

你绝对是被AGI迷住了。

You're definitely AGI pilled.

Speaker 2

是吗?好吧。我有过几次这样的体验。第一次是在2023年,我意识到我再也不需要手动编码了,永远都不需要了。说实话,我不是最擅长编码的人。

I am? Okay. I've had a couple of them. The first one was the realization in 2023 that I would never need to code manually, like, ever ever again. Like, I'm not I'm not the best coder, frankly.

Speaker 2

你知道,我选择我的工作是有原因的。但意识到,我们人类必须永远编写类似机器语言的东西,这个我以为理所当然的事情,其实并不是必然的。是的,而且进展的速度惊人,这就是感受到AGI。

You know, I chose my job, like, you know, for a reason. Yeah. But realizing that, you know, what I thought was a given that we humans would have, like, to write, like, basically machine language, like, forever is actually not a given. Yeah. And that, you know, the pace of progress is huge, feeling the AGI.

Speaker 2

对我来说,第二个感受到AGI的时刻可能是语音和多模态方面的进展。你知道,文本,到了一定程度你就习惯了。就像,好吧,机器能写出相当不错的文本。

The second, like, feel the AGI moment for me was maybe the progress on voice and multimodality. Like, you know, text, like, at some point you get used to it. Like, okay, you know, the machine can write pretty good text.

Speaker 1

是的。语音让它变得真实。

Yeah. Voice makes it real.

Speaker 2

但一旦你真正开始对话,比如,你知道,和一个真正理解你语气的东西交流,比如能懂我的口音,就像用法语那样,感觉就像是一个真正的新时刻。就像,好吧,机器正在超越那种冰冷、机械、确定性的逻辑,转向某种更情感化、更触手可及的东西。是的。那真是太棒了。

But once you start actually talking, like, you know, to something that really understands your tone, like, you know, understands my accent, like in French, it felt like sort of a true new moment. Like, okay, machines are going beyond, like, cold, mechanical, deterministic, like, you know, like logic to something, like, much more, like, emotional and, like, you know, tangible. Yeah. That's a great one.

Speaker 0

是的。我的经历是,我确实认为我是一个AGI信徒。是的。可能是在过去几年里逐渐变成AGI信徒的。我认为有两个时刻。对我来说,是的。

Yeah. Mine, mine are so, I do think I am AGI pilled. Yes. Probably gradually became AGI pilled over the last couple of years. I think there are two. And for me, yeah.

Speaker 0

我我觉得我实际上对文本模型更感到震惊。我知道多模态模型也很棒。对我来说,它们实际上对应着两个一般的突破。所以,第一个突破是在2022年9月我加入公司的时候。那时还是ChatGPT之前。

I think I actually get more shocked from the text models. I know the multimodal ones are really great as well. For me, I think they actually line up with two, like, general breakthroughs. So, the first one was right when I joined the company in September 2022. It was pre-ChatGPT.

Speaker 1

是的。两个月前。

Yeah. Two months ago.

Speaker 0

是的。但当时,GPT-4已经在内部存在了。我想我们当时在想办法如何部署它。我想Nick Turley已经多次谈到过ChatGPT的早期日子。但那是我第一次和GPT-4对话。

Yeah. But at the time, GPT-4 already existed internally. And I think we were trying to figure out how to deploy it. I think Nick Turley has talked about this a lot, the early days of ChatGPT. But it was the first time I talked to GPT-4.

Speaker 0

那感觉就像是从一无所有直接跳到GPT-4,对我来说是最令人震撼的体验。我想对世界上的其他人来说,可能从一无所有到ChatGPT的3.5版本是更大的突破,然后从3.5到4。但对我来说,以及可能其他在那段时间加入的人,从一无所有——或者不是一无所有,而是当时公开可用的东西——嗯——从那个跳到GPT-4简直不可思议。

And it was, it was like going from nothing to GPT-4 was just the most mind blowing experience for me. I think for the rest of the world, maybe going from nothing to GPT-3.5 in ChatGPT was maybe the big one, and then going from 3.5 to 4. But for me, and I think for a lot of maybe some other people who joined around that time, going from nothing to, or not nothing, but, like, what was publicly available at the time. Mhmm. Going from that to GPT-4 was just incredible.

Speaker 0

就像,我只记得向它抛出了那么多问题。就像,这东西怎么可能给出一个清晰的答案,但它却每次都打得漂亮。那绝对是不可思议的。

Like, I just remember asking, throwing so many things at it. Like, there's no way this thing is gonna be able to give an intelligible answer, and it just, like, knocks it out of the park. It was, it was absolutely incredible.

Speaker 2

GPT-4简直太疯狂了。我记得GPT-4发布时我正在OpenAI面试,当时还在电话中纠结:我该不该加入?当我看到那个东西时,我就觉得,好吧。我的意思是,朋友们,你们懂的,那一刻我根本不可能再去研究其他任何东西了。

GPT-4 was insane. I remember GPT-4 came out when I was interviewing with OpenAI, and I was still, on the phone. Should I, should I join? And I saw that thing, and I was like, okay. I mean, I mean, guys, you know, there is no way I can work on anything else at that point.

Speaker 0

没错。没错。没错。是的。所以GPT,是的。

Yep. Yep. Yep. Yeah. So GPT, yeah.

Speaker 0

GPT-4简直疯狂。另一个突破则是推理范式的突破。我个人认为最纯粹的体现是深度研究功能——当我提出一些本以为它不可能知道的问题,看着它持续搜索、深入思考、撰写详细报告的全过程,实在令人震撼。虽然不记得具体查询内容,但那种「通用人工智能时刻」就是:我抛出认为绝对无法解决的问题,而它却完美攻克。

GPT-4 was just crazy. And then the other one was, like, the other breakthrough, which is the reasoning paradigm. I actually think the purest representation of that for me was deep research and throwing, like, asking it to really look up things that I didn't think it would be able to know and seeing it think through all of it, be really persistent with the search, get really detailed with the write up and all of that, that was pretty crazy. I don't remember the exact query that I threw it, but I just remember, like, I feel like the feel-the-AGI moments for me are, like, I'll throw something at the model that I was like, there's no way this thing will be able to get. And then it just, like, knocks it out of the park.

Speaker 0

这就是所谓的通用人工智能时刻。我在使用深度研究功能处理某些问题时,确实亲身经历了这种震撼。

Like, that is kind of the feel-the-AGI moment. I definitely had that with deep research with some of the things I was asking.

Speaker 2

是的。

Yeah.

Speaker 1

这次对话非常精彩。非常感谢各位。你们正在构建未来,每天都在激励着我们,很感谢这次交流。

Well, this has been great. Thank you so much folks. You guys are building the future. You guys are inspiring us every day, and appreciate the conversation.

Speaker 2

是的。非常感谢。

Yeah. Thank you so much.

Speaker 1

感谢

Thanks for

Speaker 2

邀请我们。

having us.

Speaker 1

提醒大家,这只是我们的观点,并非投资建议。

As a reminder to everybody, just our opinions, not investment advice.
