本集简介
双语字幕
AI的强大程度完全取决于它所构建的平台。这就是为什么超过85%的财富500强企业使用ServiceNow AI平台并不令人意外。当其他平台将工具勉强拼凑时,ServiceNow无缝整合人员、数据、工作流程和AI,连接您业务的每一个角落。借助AI代理自主协同工作,任何部门的任何人都能专注于最重要的工作。了解ServiceNow如何让人工智能为人类服务,请访问servicenow.com。
AI is only as powerful as the platform it's built into. That's why it's no surprise that more than 85% of the Fortune 500 use the ServiceNow AI platform. While other platforms duct tape tools together, ServiceNow seamlessly unifies people, data, workflows, and AI, connecting every corner of your business. And with AI agents working together autonomously, anyone in any department can focus on the work that matters most. Learn how ServiceNow puts AI to work for people at servicenow.com.
Adobe Acrobat Studio。全新登场。向我展示PDF的所有潜能。轻松快捷完成工作。PDF空间就是您所需的一切。
Adobe Acrobat Studio. So brand new. Show me all the things PDFs can do. Do your work with ease and speed. PDF Spaces is all you need.
借助AI助手的关键洞察,瞬间完成数小时的研究。一键获取模板。现在您的演示文稿看起来超级炫酷。搞定那笔交易?没错。
Do hours of research in an instant with key insights from an AI assistant. Take a template with a click. Now your prezo looks super slick. Close that deal? Yeah.
你赢了。做那个。正在做。做到了。完成。
You won. Do that. Doing that. Did that. Done.
现在你可以做到。用Acrobat实现。现在你可以做到。用全新Acrobat实现。是时候用全新的Adobe Acrobat Studio做出你最出色的工作了。
Now you can do that. Do that with Acrobat. Now you can do that. Do that with the all new Acrobat. It's time to do your best work with the all new Adobe Acrobat Studio.
嘿。我是Tuffy,我正在主持一档来自The Cut的新播客《Tuffy Talks》。
Hey. I'm Tuffy, and I'm hosting a new podcast from The Cut called Tuffy Talks.
把我当作你的工作好闺蜜,来这里为你提供最劲爆的流行文化深度解析,解读名人八卦,畅聊现代生活。新剧集每周三在YouTube或您喜爱的播客应用上线。我们一起逃避实际工作会非常有趣。
Think of me as your work bestie who's here to give you all the juiciest pop culture deep dives, read celebrity tea leaves, and yap about modern life. New episodes drop every Wednesday on YouTube or in your favorite podcast app. It's going to be so fun avoiding actual work together.
欢迎来到Decoder。我是Alex Heath,本周四的客座主持人,也是The Verge的副主编。当今AI领域最热门的话题之一是智能体(agents),即AI将从聊天机器人转向在现实世界中可靠地为我们做事。目前智能体的问题在于它们实际上还不太可靠。有很多工作正在进行以解决这个问题,这也引出了今天的嘉宾——亚马逊AGI研究实验室负责人David Luan。
Welcome to Decoder. This is Alex Heath, your Thursday episode guest host and deputy editor at The Verge. One of the biggest topics in AI these days is agents, or the idea that AI is going to move from chatbots to reliably doing things for us in the real world. The problem with agents right now is that they aren't really that reliable at all. There's a lot of work happening to fix that, which brings me to today's guest, David Luan, the head of Amazon's AGI Research Lab.
我一直想和David聊聊。他是OpenAI的早期研究领导者,曾帮助推动GPT-2、3和DALL E的开发。离开OpenAI后,他共同创立了专注于智能体研究的Adept实验室。去年夏天,他离开Adept加入亚马逊,现在负责公司在旧金山的AGI实验室。我们在GPT-5发布后立即录制了这期节目,这让我们有机会探讨为什么他认为AI模型的进展正在放缓。
I've been wanting to chat with David for a long time. He was an early research leader at OpenAI, where he helped drive the development of GPT-2, GPT-3, and DALL-E. After OpenAI, he co-founded Adept, a research lab focused on agents. And last summer, he left Adept to join Amazon, where he now leads the company's AGI lab in San Francisco. We recorded this episode right after the release of GPT-5, which gave us an opportunity to talk about why he thinks progress on AI models is slowing down.
David团队的工作对亚马逊来说是重中之重,这是我第一次听他详细阐述他的工作内容。我还得问问他当初是如何加入亚马逊的。他离开Adept是我称之为'反向人才收购'的众多交易中的首批案例之一,即大型科技公司几乎是通过收购热门AI初创公司来规避反垄断审查。我不想剧透太多,但可以说David去年从初创公司转投大厂,是因为他清楚AI竞赛的走向。我认为这使得他对未来发展的预测值得一听。
The work that David's team is doing is a big priority for Amazon, and this was the first time I've heard him really lay out what he's up to. I also had to ask him about how he joined Amazon. His leaving Adept was one of the first of many deals that I call the reverse acqui-hire, where a big tech company all but actually buys a buzzy AI startup to avoid antitrust scrutiny. I don't wanna spoil too much, but let's just say that David left the startup world for Big Tech last year because he knew where the AI race was headed. I think that makes his predictions for what's coming next worth listening to.
David,欢迎来到节目。
David, welcome to the show.
非常感谢邀请我上节目。能来到这里我真的很兴奋。
Thanks so much for having me on the show. I'm really excited to be here.
很高兴你能来。我们有很多话题要聊。我特别好奇你和团队最近在亚马逊的工作内容。但首先,我觉得观众会很乐意了解你的背景故事——你在这个领域深耕多年,职业生涯非常精彩才走到今天。能简单介绍一下你在AI领域的经历以及如何加入亚马逊的吗?
It's great to have you. We have a lot to talk about. I'm super interested in what you and your team are up to at Amazon these days. But first, I think the audience could really benefit from hearing a little bit about you and your history and how you got to Amazon, because you've been in this space for a long time and you have a pretty interesting career leading up to this. Could you walk us through a little bit of your background in AI and how you got to Amazon?
首先,听说我在这个领域待了很久真的很有意思,因为相对而言确实如此。这个领域太新了,其实我从事AI相关工作也才十五年左右。相比其他许多领域,它简直年轻得不像话。
First off, I find it absolutely hilarious that I've been around the field for a long time because it's true in relative terms. This field is so new, and yet nonetheless, like, you know, I've only been doing AI stuff for about the last fifteen years. So compared to many other fields, it's just so new.
但在AI领域这简直相当于永恒了。
Well, that's an eternity in AI years.
十五年啊,AI界的永恒。记得刚入行时纯粹是觉得有趣——能构建像人类一样思考甚至实现超人类性能的系统实在太酷了,完全没想到这个领域会爆发成这样。2017到2020年中期间,我带领OpenAI的研究与工程团队开发了GPT-2、GPT-3、CLIP和DALL-E,每天和最好的朋友们尝试各种有趣的研究想法,根本没有现在的这种压力。
Fifteen years. Eternity in AI years. I remember when I first started working in the field, I worked in it just because I thought it was interesting. I thought having the opportunity to be able to build systems that could think like humans and hopefully deliver superhuman performance was such a cool thing to do, and I had no idea that it was gonna blow up the way that it did. But my personal background: I led the research and engineering team at OpenAI from 2017 to mid-2020, where the teams did GPT-2 and 3 and CLIP and DALL-E, and every day was just so much fun because you would show up to work and it's just your best friends, all trying a bunch of really interesting research ideas, and there was none of the pressure that exists right now.
之后我在谷歌主导大语言模型项目,训练了当时非常强大的PaLM模型。但不久后我们团队陆续加入不同初创公司,我和团队成员最终创立了Adept——这是首家AI智能体初创公司,我们实质上发明了计算机使用智能体(当然前期已有一些优秀研究基础)。
Then after that, I led the LLM effort at Google, where we trained a model called PaLM, which was a quite strong model for its time. But a bunch of us shortly after that decamped to various startups, and my team and I ended up starting Adept. It was the first AI agent startup. We ended up inventing the computer use agent, effectively. There was some good research beforehand.
我们推出了首个量产版本,大约一年前亚马逊收购我们团队来为其运营智能体业务。
We had the first production one, and Amazon brought us in to go run agents for Amazon about a year ago at this point.
很棒。稍后我们会深入探讨你在亚马逊的工作。但首先,鉴于你有OpenAI的经历,而现在距离GPT-5发布不到一周,我很想听听你对这次发布的看法——它意味着什么?看到发布时作何感想?我相信你仍有前同事参与其中。这次发布究竟预示着什么呢?
Great. And we'll get into that and what you're doing at Amazon. But first, given your OpenAI experience, and we're talking now less than a week from the rollout of GPT-5, I'd love to hear you reflect on the release of GPT-5 and what it says about the industry, what you thought when you saw it. I'm sure you still have colleagues at OpenAI who worked on it. But, yeah, what does that release signify?
我认为这确实标志着当前达到了一个高度成熟的阶段。各大实验室都已掌握了如何可靠地推出越来越好的模型。我经常强调的一点是,前沿模型实验室的真正职责并非训练模型,而是建造一个能够持续产出更优质模型的工厂。这实际上是一种完全不同的进步哲学。
I think it really signifies a high level of maturity at this point. The labs have all figured out how to reliably tape out increasingly better models. One of the things that I always harp on is that your job as a frontier model lab is not actually to train models. Your job as a frontier model lab is to build a factory that repeatedly churns out increasingly better models. That's actually a very different philosophy for how to make progress.
在'我打造更好模型'的路径中,你只需要考虑:让我调整这个参数,让我改进那个部分,尝试吸引人才来推出更好的版本。但若从模型工厂的角度出发,你真正要做的是构建系统、流程和基础设施,使这些模型变得更智能。关于GPT-5的发布,我觉得最有趣的是,如今许多前沿模型的能力正在趋同。
In the "I build a better model" path, all you do is you think about, you know, let me make this tweak, let me make that tweak, let me try to glom on people to get a better release. If you care about it from the perspective of a model factory, what you're actually trying to do is figure out how you can build all the systems and processes and infrastructure to make these things smarter. But with the GPT-5 release, I think the part that I found most interesting about it is that a lot of the frontier models these days are converging in capabilities.
我认为部分原因可以归咎于我在OpenAI的前同事(现MIT教授)提出的'柏拉图表征假说'。您听说过这个假说吗?没有?那么这个假说类似于柏拉图的洞穴寓言(其命名正源于此),认为存在一个唯一现实,而我们人类只看到这个现实的特定呈现——就像洞穴中看到的墙壁投影。
I think in part, there's an explanation that one of my old colleagues at OpenAI, who's now a professor at MIT, came up with, called the Platonic Representation Hypothesis. Have you heard of this hypothesis? No? So the Platonic Representation Hypothesis is this idea, similar to Plato's cave, which is really what it's named after, that there is one reality, but we as humans, for example, only see a particular rendering of that reality. Like in Plato's cave, it is the shadows that you see on the wall of the cave.
对吧?对大语言模型(LLM)而言也是如此。LLM通过训练数据看到这个现实的片段。比如,每一段新增的YouTube视频(如某人在森林中漫步)本质上都是由我们生活的真实现实生成的。随着用越来越多数据训练这些LLM,它们会变得越来越聪明,最终都会趋同于表征这个我们共享的唯一现实。
Right? And so that's the same thing for LLMs. LLMs see slices of this reality by the training data that it sees. So every incremental YouTube video of, for example, someone going for a nature walk in the woods somewhere is all ultimately generated by the actual reality that we live in. And as you train these LLMs on more and more and more data, the LLMs become smarter and smarter, they all converge to representing this one shared reality that we all have.
因此,如果你相信这个假说,那么你也应该相信所有LLM都将趋同于同一个世界模型。从各前沿实验室交付的模型来看,我认为这正在实践中发生。
And so if you believe this hypothesis, what you should also believe then is that all LLMs will converge to the same model of the world. And I think that's actually happening in practice from seeing Frontier Labs deliver these models.
这个观点涉及面很广。或许需要指出的是,业内许多人并不必然相信我们生活在单一现实中。上次参加Google I/O时,谢尔盖·布林和德米斯都在台上,他们似乎都认为我们可能存在于多重现实中。不知道您这些年在社交或工作圈是否遇到过这种观点,但并非所有AI从业者都认同单一现实论。对吧?
Well, there's a lot to that. I would maybe suggest that a lot of people in the industry don't necessarily believe we live in one reality. When I was at the last Google I/O, Sergey Brin and Demis were on stage, and they both seemed to maybe believe that we were existing in multiple realities. So I don't know if that's a thing that you've encountered in your social circles or work circles over the years, but not everyone in AI necessarily believes that. Right?
这种深度哲学讨论超出我的薪资等级了。我个人确实认为我们只有一个现实。
I think that hot take is above my pay grade. I do think that we only have one.
确实。我们要讨论的内容太多,没法深入多重现实的话题。不过关于万物趋同的观点,确实感觉基准测试开始不再那么重要,而模型的实际改进——如你所说——正在变得商品化。
Yeah. We have too much to cover. We can't get into multiple realities. Yeah. But to your point about everything converging, it does feel like benchmarks are starting to not matter as much anymore and that the actual improvements in the models, like you said, are commodifying.
大家都在达到相同水平。GPT-5将在LM竞技场上保持几个月最佳,直到Gemini 3或其他什么模型发布……如果真是这样,这次发布还表明:真正重要的或许是人们如何使用这些模型,以及对其产生的情感和依恋。没错,OpenAI重新推出4o就是因为用户对其产生了真实的情感依赖——Reddit上有人说'就像夺走了我最好的朋友'。所以它编码能力是否更强、写作是否更好反而不重要了,重要的是它现在是你朋友。
Everyone's getting to the same point, and GPT-5 will be the best on LM Arena for a few months until, you know, Gemini 3 comes out or whatever, and so on and so on. And if that's the case, I think what this release has also shown is that maybe what is really starting to matter is how people actually use these things and the feelings and the attachments that they have to them. So Yep. OpenAI bringing back 4o because people had a literal attachment to it as a thing that they felt, and people on Reddit saying it's like my best friend's been taken away. And so it really doesn't matter that it's better at coding or that it's better at writing, it's your friend now.
这确实有些诡异。我好奇的是,当你看到这种反应和GPT-5引发的反响时,是否预见到了这种趋势?你早料到我们会朝这个方向发展,还是这对所有人来说都是新现象?
And that's freaky. But I'm curious, when you saw that and you saw the reaction to GPT-5, did you predict that? Did you see that we were moving that way, or is this something new for everyone?
2020年谷歌内部有一个名为LaMDA(或Meena)的项目,基本上就是ChatGPT之前的ChatGPT,但只对谷歌员工开放。即便在那时,我们就开始看到员工对这些AI系统产生个人情感依附。人类太擅长拟人化任何事物了,对吧?所以看到人们与某些模型检查点建立情感联系,我并不感到惊讶。
There was a project called LaMDA, or Meena, at Google in 2020 that basically was ChatGPT before ChatGPT, but only available to Google employees. Even back then, we started seeing employees start developing personal attachments to these AI systems. Like, humans are so good at anthropomorphizing anything. Right? And so I wasn't surprised to see that people form bonds with certain model checkpoints.
但谈到基准测试时,让我印象深刻的是,现阶段基准测试本质上就像大家都在为考试而学习。我们知道基准测试的内容是提前设定的,每个人都想发布更高的分数。这就像早期数码相机时代的像素大战。
But I think that when you talk about benchmarking, the thing that stands out to me is benchmarking is really all about at this point, people are just studying for the exam. Right? We know what the benchmarks are in advance. Everybody wants to post higher numbers. It's like the megapixel wars, right, from the early digital camera era.
这些指标显然已经不再重要了,它们与实际拍照质量的相关性非常弱。我认为当前领域缺乏创造力的问题归根结底在于:AGI远不止是聊天,也远不止是代码。这些只是我们已知的两个应用得非常好的用例而已。
Like, they just clearly don't matter anymore. They have very loose correlation with how good of a photo does this thing actually take. And I think the question and the lack of creativity in the field that I'm seeing boils down to, AGI is way more than just chat. It's way more than just code. Those just happen to be the first two use cases that we all know work really well for these models.
还有更多有用的应用和实际有用的基础模型能力,人们甚至还没开始研究如何有效衡量。如果你想在领域内做些有趣的事情,现在更应该问的是:我真正应该追求什么目标?为什么我要花更多时间让这个系统在创意写作上稍微好一点?为什么我要花时间让这个模型在国际数学奥林匹克竞赛上提高百分之X,明明还有这么多未开发的领域?让我和那些真正专注于智能体愿景的人们持续前进的动力,正是寻求解决比现有范围更广阔的问题。
There's so many more useful applications and actually useful base model capabilities that people haven't even started figuring out how to measure well yet. And I think the better question to ask now, if you wanna do something interesting in the field, is: what should I actually run at? Why am I trying to spend more time making this thing slightly better at creative writing? Why am I spending my time trying to make this model x percent better at the International Math Olympiad when there's so much more left to do? What keeps me, and the people that really are focused on this agents vision that we have, going is looking to solve a way bigger breadth of problems than what we've done so far.
好的,这正好引出了这个话题。我本来打算稍后再问,但关于AGI——你正在运营亚马逊的AGI研究实验室。我对AGI对亚马逊的具体意义有很多疑问,但我首先好奇的是,当你在OpenAI帮助GPT起步时,AGI对你意味着什么?现在又意味着什么?这个定义对你来说有变化吗?
Okay, that brings me to this topic. I was gonna ask it later, but yeah, AGI. You're running the AGI Research Lab at Amazon. I have a lot of questions about what AGI means to Amazon specifically, but I'm curious first, for you, what did AGI mean to you when you were at OpenAI helping get GPT off the ground? And what does it mean to you now? Has that definition changed at all for you?
OpenAI对AGI的定义是:在经济价值任务上超越人类的系统。虽然我认为在2018年这是一个有趣且近乎悲观主义的北极星目标,但作为领域我们已经远远超越了它。每天让我兴奋的不是如何取代人类完成经济价值任务,而是如何最终为每个知识工作者构建一个通用队友。让我持续前进的动力是:如果我们能拥有可以最终委托执行日常工作中大部分任务的AI系统,就能为人类时间带来巨大的杠杆效应。所以我对AGI的定义——我认为这个定义既可实现又非常专注于帮助人类——第一个最重要的里程碑是:有一个模型能帮助人类在计算机上完成任何他们想做的事情。
The OpenAI definition for AGI we had was a system that outperforms humans at economically valuable tasks. And while I think that was an interesting, almost doomer north star back in 2018, I think we've gone so much past that as a field. What gets me excited every day is not how do I replace humans at economically valuable tasks, it's how do I ultimately build towards, like, a universal teammate for every knowledge worker. What keeps me going is the sheer amount of leverage we can give to humans on their time if we had AI systems that you could ultimately end up delegating a large chunk of the execution of what you do every day to. And so my definition for AGI, which I think is very tractable and is very much focused on helping people, is the first most important milestone that would lead me to say we're basically there: a model that can help a human do anything they wanna do on a computer.
我喜欢这个定义。这实际上比听到的很多说法都更具体和接地气。这也显示出每个人对AGI含义的理解有多么不同。我刚刚参加了Sam Altman关于GPT-5发布的媒体电话会议,嗯...他现在认为AGI是能够自我改进的模型。
I like that. That's actually more concrete and grounded than a lot of the stuff you hear. It also shows how different everyone feels about what AGI means. I was just on a press call with Sam Altman for the GPT-5 launch, and Mhmm. He was saying now he thinks of AGI as a model that can self-improve.
我猜这可能与你的说法有关,但听起来你更侧重于实际应用场景。
And I guess maybe that's related to what you're saying, but you're grounding it more on the actual use case, it sounds like.
嗯,在我看来,自我改进确实有趣,但目的是什么?对吧?作为人类,我们为什么要在意AGI是否自我改进?就个人而言,我并不真的在乎。从科学家的角度来看这很酷。
Well, the way that I look at it is: self-improvement is interesting, but to what end? Right? Like, why do we as humans care if the AGI is self-improving itself? Like, I don't really care, personally. I think it's cool from a scientist perspective.
我认为对我来说更有趣的是,如何构建这种超级通用技术最有用的形式,然后将其交到每个人手中?我认为,如果我能教会我们正在训练的这个智能体来处理我在电脑上需要完成的任何有用任务,那将给人们带来巨大的杠杆效应,因为我们如今的生活很大程度上都在数字世界中。我认为这非常可行,回到我们之前关于基准测试的讨论。对吧?这个领域如此重视MMLU、MMLU-Pro、Humanity's Last Exam、AMC 12等等。
I think what's more interesting to me is how do I go build the most useful form of this super generalist technology and then be able to put that in everybody's hands? And I think the thing that gives people tremendous leverage is if I can teach this agent that we're training to handle any useful task that I need to get done on my computer, because so much of our lives these days is in the digital world. I think that's very tractable, going back to our discussion about benchmarking. Right? The fact that the field cares so much about, you know, MMLU, MMLU-Pro, Humanity's Last Exam, AMC 12, etcetera.
就像,我们不必局限于'AGI对我来说就是做这些'的框框。我认为更有趣的是审视所有有用知识工作者任务的空间,其中有多少可以在你的机器上完成,以及这些智能体如何为你完成它们。
Like, we don't have to live in that box of that's what AGI does for me. I think it's way more interesting to look at the box of the space of all useful knowledge worker tasks, how many of them are doable on your machine, and how can these agents do them for you.
所以可以安全地说,对亚马逊而言,AGI的意义远不止为我购物——这本来是我要开的一个关于AGI对亚马逊意味着什么的讽刺玩笑。嗯。我很好奇,当你加入时与管理团队和安迪·贾西交谈,以及直到今天,你们是如何广泛地为亚马逊定义AGI的战略价值的。因为亚马逊涉足很多领域,它实际上是一个由众多从事不同业务的公司组成的星座,但这个理念却贯穿了所有业务。
So it's safe to say that for Amazon, AGI means more than shopping for me, which is the cynical joke I was gonna make about what AGI means for Amazon. Mhmm. I'd be curious to know, when you joined and you were talking to the management team and Andy Jassy, and, you know, still to this day, how you guys talk about the strategic value of AGI as you define it for Amazon broadly. Because Amazon is a lot of things. It's really a constellation of companies that do a lot of different things, but this idea kind of cuts across all of that.
对吧?
Right?
我认为,如果从计算的角度来看,迄今为止,计算的构建模块一直是:我能在云端的某个地方租用服务器吗?我能租用一些存储空间吗?我能编写一些代码来连接所有这些并为人提供有用的东西吗?计算的构建模块正在改变,对吧?到了这个时候,代码是由AI编写的。
I think that if you look at it from the perspective of computing, right, so far the building blocks of computing have been, can I rent a server somewhere in the cloud? Can I rent some storage? Can I write some code to go hook all these things up and deliver something useful to a person? The building block of computing is changing, right? At this point, the code's written by an AI.
将来,实际的智能和决策将由AI来完成。那么你的构建模块会变成什么样呢?对吧?因此,在那个世界里,亚马逊特别擅长解决智能体问题变得至关重要,因为智能体将成为计算中的原子级构建模块。当那一天到来时,我认为将会由此释放出巨大的经济价值。
Down the line, the actual intelligence and decision making is gonna be done by an AI. And so then what happens to your building blocks? Right? So in that world, it's super important for Amazon to specifically be good at the agent's problem because agents are going to be the atomic building block in computing. And when that is true, I think so much economic value will be unlocked as a result of that.
而且这确实与亚马逊在云服务方面已有的优势非常契合,包括构建庞大的基础设施等等。
And it really aligns up well with the strengths that Amazon already has on the cloud side and putting together ridiculous amounts of infrastructure and all that.
我明白你的意思。我想很多听这个节目的人,即使是科技行业的从业者,在概念上理解智能体是行业的发展方向,但我敢说,绝大多数听众要么从未使用过智能体,要么尝试过但发现它根本不行。我认为这基本就是目前的现状。我其实不太确定,你会举出什么作为智能体的最佳范例?什么是未来发展方向和你所能期待的最佳示例?
I see what you're saying. I think a lot of people listening to this, even people who work in tech, understand conceptually that agents are where the industry is headed, but I would venture to guess that the vast majority of the listeners to this conversation have either never used an agent or have tried one and it doesn't work. I would say that's pretty much the lay of the land right now. I'm not actually sure, like, what would you hold out as the best example of an agent? The best example of where things are headed and what you can expect?
有什么你可以指出的吗?
Is there something you can point to?
所以我非常理解那些被反复告知智能体是未来,然后去尝试却发现完全不行的人们。让我试着举一个例子,说明智能体的真正潜力相对于它们今天被宣传的样子。目前,它们被推销给我们的方式,在很大程度上,只是一个多了几步的聊天机器人。对吧?就像,你知道,X公司不想安排人工客服代表接待我,所以现在我不得不去和聊天机器人交谈。
So I feel for all the people who have been told over and over again that agents are the future, and then they go try the thing and it just doesn't work at all. So let me try to give an example of what the actual promise of agents is relative to how they're pitched to us today. Right now, the way that they're pitched to us today is, for the most part, it's just a chatbot with extra steps. Right? It's like, you know, company x doesn't wanna put a human customer service rep in front of me, so now I have to go talk to a chatbot.
也许在幕后,它只是点击一个按钮。或者,你知道,你可能玩过一些声称能帮助我在浏览器上做某些事情的产品,比如计算机使用之类的,但实际上它花费的时间是原来的四倍,而且三次中有一次会搞砸。这就是当前智能体的现状。让我们举一个具体的例子:我想完成一个特定的药物发现任务,我知道有一个受体,我需要找到能与之结合的东西。如果你现在打开ChatGPT并和它讨论这个问题,它会去查找所有的科学研究,并为你写出一份格式完美的markdown文档,说明这个受体的作用以及你可能想尝试的一些事情。
Maybe behind the scenes, it clicks a button. Or, you know, you've played with a product that does computer use or something like that that is supposed to help me with something on my browser, but in reality, it takes four times as long and one out of three times it screws up. This is kind of the current landscape of agents. Let's take a concrete example of I wanna do a particular drug discovery task where I know there's a receptor that I need to be able to find something that ends up binding to this receptor. If you pull up ChatGPT today and you talk to it about this problem, it's gonna go and find all the scientific research and write you a perfectly formatted piece of markdown of what the receptor does and maybe some things you wanna try.
但那不是智能体。在我看来,智能体是一个模型和系统,你可以实际将其连接到你的湿实验室,它会去使用实验室里的每一台科学仪器,阅读所有文献,提出下一个最优实验方案,运行实验,查看结果,根据结果做出反应,再次尝试,等等,直到真正为你达成目标。这种方式给你带来的杠杆效应,远远超过当前领域所能做到的程度。
But that's not an agent. An agent in my book is a model and a system that actually literally you can hook up to your wet lab and it's gonna go and use every piece of scientific machinery you have in that lab, read all the literature, propose the right optimal next experiment, run that experiment, see the results, react to that, try again, etcetera until it's actually achieved the goal for you. And the degree to which, like, that gives you leverage is so so so much higher than what the field is currently able to do right now.
本节目由.tech域名赞助。名字里有什么?实际上很多,尤其是当你创业的时候。你可能花了很多时间构思一个能清晰传达商业理念的完美名字。但当你去查.com域名时,可能会发现名字已经被占用,或者至少价格高得像帕洛阿尔托的房租。
Support for this show comes from Dot Tech Domains. What's in a name? Quite a lot actually, especially when you're starting a business. And you probably took the time to craft the perfect name that communicates your business idea clearly. But when it comes to checking the .com, you might find the names already taken or at the very least priced like rent in Palo Alto.
当然,你可以妥协选择奇怪的拼写或额外数字,但使用.tech域名,你无需妥协。在.tech上获得你真正想要的创业名字。绝对没有妥协。更重要的是,当你使用.tech时,你通过域名向客户和投资者表明你正在构建科技。所以如果你心中有一个名字,现在就在像GoDaddy这样的可信平台上用.tech搜索,或访问get.tech/decoder获取。
And sure, you could settle for an odd spelling or extra numbers, but with a .tech domain, you don't have to compromise. Get the startup name you actually want on .tech. Absolutely no compromises. What's more, when you use .tech, you signal to your customers and investors that you're building tech with just your domain name. So if you've got a name in mind, search for it now with .tech on a trusted platform like GoDaddy or visit get.tech/decoder to grab it.
那是get.tech/decoder。
That's get.tech/decoder.
作为创始人,你正快速迈向产品市场契合、下一轮融资或第一笔大企业交易。但随着AI加速初创公司的构建和交付速度,安全期望也来得更快,而且这些期望比以往任何时候都高。正确处理好安全和合规可以解锁增长,但如果等待太久,则可能阻碍增长。Vanta是一个信任管理平台,帮助企业自动化安全和合规,覆盖超过35个框架,如SOC 2、ISO 27001、HIPAA等。凭借为快速移动团队构建的深度集成和自动化工作流,Vanta让你快速做好审计准备,并在你的模型、基础设施和客户演变时通过持续监控保持安全。
As a founder, you're moving fast towards product market fit, your next round, or your first big enterprise deal. But with AI accelerating how quickly startups build and ship, security expectations are also coming in faster, and those expectations are higher than ever. Getting security and compliance right can unlock growth, or stall it if you wait too long. Vanta is a trust management platform that helps businesses automate security and compliance across more than 35 frameworks like SOC 2, ISO 27001, HIPAA, and more. With deep integrations and automated workflows built for fast moving teams, Vanta gets you audit ready fast and keeps you secure with continuous monitoring as your models, infrastructure, and customers evolve.
这就是为什么像LangChain、Writer和Cursor这样快速增长的初创公司都信任Vanta,从一开始就构建可扩展的合规基础。立即访问vanta.com/vox,通过Vanta for Startups计划节省1000美元,并加入已经与Vanta一起扩展的10,000多家雄心勃勃的公司。
That's why fast growing startups like LangChain, Writer, and Cursor have all trusted Vanta to build a scalable compliance foundation from the start. Go to vanta.com/vox to save $1,000 today through the Vanta for Startups program and join over 10,000 ambitious companies already scaling with Vanta.
那是vanta.com/vox,限时节省1000美元。
That's vanta.com/vox to save $1,000 for a limited time.
不过,你是否同意大型语言模型在决策和执行方面存在固有的局限性?当我看到LLMs,即使是前沿模型,仍然会产生幻觉、编造内容、自信地撒谎时,想到将这种技术置于一个结构中,让它去现实世界中做事,与我的银行账户互动,部署代码,在科学实验室工作,这令人恐惧。就像当ChatGPT连拼写都搞不定时,这感觉不像我们即将迎来的未来。所以我在想,LLMs就是终点吗?还是这里还有更多工作要做?
Do you agree, though, that there's an inherent limitation in large language models when it comes to decision making and executing things? When I see how LLMs, even the frontier ones, still hallucinate, still make things up, confidently lie, it's terrifying to think of putting that technology in a construct where now I'm asking it to go do something in the real world, interact with my bank account, ship code, work in a science lab. Like, when ChatGPT can't spell right, that doesn't feel like the future that we're going to get. And so I'm wondering, are LLMs it, or is there more to be done here?
所以我们从这些模型能力日益趋同的话题开始。虽然这对大语言模型(LLM)成立,但我认为迄今为止对智能体而言并非如此。这是因为训练智能体的方式与训练LLM的方式实际上存在很大差异。众所周知,LLM的主要训练是通过下一个词预测完成的。对吧?
So we started with a topic of how these models are increasingly converging in capability. So while that's true for LLMs, I don't think that's been true to date for agents. And it's because the way that you should train an agent and the way that you train an LLM are actually quite different from each other. So LLMs, as we all know, the bulk of their training happens from doing next token prediction. Right?
我拥有互联网上所有文章的庞大语料库。让我尝试预测下一个词。如果我预测正确,就获得正向奖励;如果预测错误,就会受到惩罚。对吧?
I've got a giant corpus of every article on the Internet. Let me try to predict the next word. And if I get the next word right, then I get a positive reward. And if I get it wrong, then I'm penalized. Right?
但实际上,这本质上是我们领域称为行为克隆或模仿学习的方法,与货物崇拜现象如出一辙。LLM从未学会为什么下一个词是正确答案,它只学会当看到与先前词汇相似的内容时,就应该说出这个特定的下一个词。问题在于这种方式非常适合聊天场景。
But in reality, what's actually happening is what we in the field call behavioral cloning or imitation learning. It's the same thing as cargo culting. Right? The LLM never learns why the next word is the right answer. All it learns is that when I see something that is similar to the previous set of words, I should go say this particular next word. The issue with this is that this is great for chat.
这对创意用例非常有利——当你需要一些幻觉带来的混乱和随机性时。但若想要它成为真正成功的决策智能体,这些模型需要学习真实的因果机制。这不仅是对人类行为的简单模仿,而是真正学习'如果我做X,其后果就是Y'。
This is great for creative use cases, right, where you want some of the chaos and randomness from hallucinations. But if you want it to be an actual successful decision making agent, these models need to learn the true causal mechanism. Right? It's not, you know, just cloning human behavior. It's actually learning if I do x, the consequence of it is y.
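The imitation-learning objective David describes can be sketched in a few lines. This is a deliberately toy illustration (word-bigram counts over a made-up corpus rather than a neural network), but it captures the flavor he's pointing at: the model only reproduces the continuation most often seen in training, without ever learning why that continuation is right.

```python
from collections import Counter, defaultdict

def train_next_token(corpus):
    """Imitation learning in miniature: count which word follows which.

    A toy stand-in for next-token prediction. Like the objective described
    above, it never learns *why* a word follows; it only copies the
    statistics of its training text.
    """
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict(model, token):
    """Greedy decoding: return the most frequent continuation seen in training."""
    if token not in model:
        return None  # never seen this context before
    return model[token].most_common(1)[0][0]

corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_next_token(corpus)
print(predict(model, "the"))  # "cat": the most common word after "the" here
```

The point of the toy is the failure mode: nothing in those counts encodes consequences, which is exactly the gap the agent-training discussion below is about.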
因此问题在于:如何训练智能体使其能够学习自身行动的后果?答案显然不能只是进行更多行为克隆和文本复制,而必须是通过现实世界中的实际试错来实现。这基本上就是我们在亚马逊团队的研究路线图。
And so the question is how do we train agents to be able to learn the consequences of its actions? And the answer obviously cannot be just doing more behavioral cloning and copying text. Right? It has to be something that looks like actual trial and error in the real world. And so that's basically the research roadmap for what we're doing on my in my group at Amazon.
我的朋友安德烈·卡帕西有个很好的类比:想象你要训练一个智能体打网球。你不会让它花99%的时间观看网球视频,只用1%的时间实际打球。你会让这两者保持更均衡的比例。我们在亚马逊实验室正在进行的正是大规模自我对弈。
My friend Andrej Karpathy has a really good analogy here, which is, you know, imagine you have to train an agent to go play tennis. Right? You wouldn't have it spend 99% of its time watching YouTube videos of tennis and then 1% of its time actually playing tennis. You would have something that's far more balanced between these two things. So what we're doing in our lab here at Amazon is we're actually doing large scale self play.
如果您记得自我对弈的概念——这是DeepMind在2010年代中期通过围棋击败人类而推广的技术。他们创建了海量的模拟围棋环境,让模型不断与自身对弈。每当发现能击败之前版本的策略时,就通过强化学习获得正向奖励,未来更多采用该策略。
And so if you remember the concept of self play, it was a technique that DeepMind made popular in the mid-2010s when they beat humans at playing Go. For playing Go, what they did was they spun up a bajillion simulated Go environments. Right? And then they had the model play itself over and over and over again. Every time they found a strategy that was better at beating a previous version of itself, it would effectively get positive reward via reinforcement learning to go do more of that strategy in the future.
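As a rough sketch of that loop (not DeepMind's actual Go recipe, which used deep networks and tree search; the game and the update rule here are invented for illustration), the shape of self play in a trivially small game looks like this: the current policy plays a frozen snapshot of itself, and moves that win get reinforced.

```python
import random

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
MOVES = list(BEATS)

def reward(mine, theirs):
    """+1 if my move beats theirs, -1 if it loses, 0 on a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def self_play(rounds=2000, seed=0):
    """Toy self-play loop: the current policy (move weights) plays a
    frozen earlier snapshot of itself; winning moves get their weight
    bumped, mirroring the positive-reward loop described above."""
    rng = random.Random(seed)
    weights = {m: 1.0 for m in MOVES}   # current policy
    snapshot = dict(weights)            # frozen previous version of itself
    for t in range(rounds):
        if t % 200 == 0:
            snapshot = dict(weights)    # periodically promote current -> opponent
        mine = rng.choices(MOVES, [weights[m] for m in MOVES])[0]
        theirs = rng.choices(MOVES, [snapshot[m] for m in MOVES])[0]
        # Reinforce (or penalize) the move that was just played.
        weights[mine] = max(0.1, weights[mine] + 0.1 * reward(mine, theirs))
    return weights

policy = self_play()
```

The key structural idea carries over from this toy to the real thing: the opponent improves as the policy improves, so the training signal never goes stale.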
通过在围棋模拟器中投入大量算力,它最终发现了超人类的围棋策略,在与世界冠军对弈时走出了人类从未见过的招式,推动了整个领域的发展。我们现在做的是创建大型强化学习训练场,每个训练场都模拟知识工作者可能使用的环境:类似SalesForce的系统、ERP系统等。
And if you spent a lot of compute on this in the Go simulator, it actually discovered superhuman strategies for how to play Go and then ended up, you know, when they played the world champion, making moves that no human had ever seen before and contributed to, like, the state of the art of that whole field. What we're doing is rather than doing more behavioral cloning or watching YouTube videos, what we're doing is we're creating a giant set of RL gyms. And each one of these gyms, for example, is an environment that a knowledge worker might be working in to get something useful done. So here's a version of something that's like Salesforce. Here's a version of something that's like an ERP.
这是CAD程序,这是电子病历系统,这是会计软件。每个可能的知识工作领域现在都是一个模拟器。我们不再仅仅训练LLM处理技术问题,而是让模型在每个模拟器中设定目标,尝试解决问题,判断是否成功解决,然后根据'我是否正确计算了折旧'或'是否正确设计了CAD零件'获得奖励反馈。
Here's a CAD program. Here's an electronic medical record system. Here's accounting software. Every interesting domain of possible knowledge work is now a simulator. And now, instead of training an LLM just to do tech stuff, we have the model actually propose a goal in every single one of these different simulators, try solving that problem, figure out if it's successfully solved it or not, and then get reward and feedback based on, you know, oh, did I do the depreciation correctly, or did I correctly make this part in CAD?
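As a hedged illustration only (the class, numbers, and reward rule here are hypothetical, not Amazon's actual system), one such gym with a programmatically verifiable reward might look like this, using straight-line depreciation as the checker:

```python
def straight_line_depreciation(cost, salvage, years):
    """Verifiable ground truth the environment can compute on its own."""
    return (cost - salvage) / years

class DepreciationGym:
    """One hypothetical task instance: compute annual depreciation."""
    def __init__(self, cost, salvage, years):
        self.cost, self.salvage, self.years = cost, salvage, years

    def step(self, answer):
        """Reward the agent's answer against the programmatic check,
        rather than against imitation of human text."""
        target = straight_line_depreciation(self.cost, self.salvage, self.years)
        return 1.0 if abs(answer - target) < 1e-6 else 0.0

env = DepreciationGym(cost=10_000, salvage=1_000, years=9)
print(env.step(1_000))  # (10000 - 1000) / 9 = 1000, so reward is 1.0
```

The key point is that the reward comes from an executable check of the outcome, not from cloning human behavior.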
或是'是否成功预订了航班'——举个消费领域的例子。每次这样做时,它都能真正学习自身行动的后果。我们相信这是实现真正AGI缺失的关键环节,目前正在亚马逊大规模推进这项计划。
Or did I successfully book the flight, to choose a consumer analogy? Every time it does this, it actually learns the consequences of its actions. And we believe that this is one of the big missing pieces left for actual AGI, and we're really scaling this recipe up at Amazon right now.
这种方法目前在行业中有多独特?你认为其他实验室也在研究这个吗?既然你们在讨论它,我猜应该是的。
How unique is this approach in the industry right now? Do you think the other labs are onto this as well? If you're talking about it, I would assume so.
我认为有趣的是,在我看来,这个领域最终必须能够做到类似的事情,才能超越互联网上可用于训练模型的免费数据有限这一事实。我们在亚马逊所做的,是因为这源于我们在Adept的工作,而Adept从事智能体研究已久,我们比任何人都更关心这个问题,并且我认为已经在这个目标上取得了很大进展。
I think what's interesting is that ultimately, in my opinion, this field has to be able to do something like this to get beyond the fact that there's a limited amount of free-floating data on the Internet that you can train your models on. The thing we're doing at Amazon came from what we did at Adept, and Adept has been doing agents for so long that we just care about this problem way more than everybody else, and I think we've made a lot of progress toward this goal.
你称这些为'gems',我一时想到了物理宝石。这会变成物理宝石吗?你有机器人技术的背景吗?
You called these gems, and I was thinking physical gems for a second. Does this become physical? You have a background in robotics.
对吧?我以前也做过机器人工作。我们这里还有Pieter Abbeel,他来自Covariant,是伯克利教授,基本上是他或他的学生创建了今天大部分有效的强化学习算法。你说'gems'很有趣,因为我们当时正试图为这个项目找一个内部代号。我们考虑过Equinox、Barry's Bootcamp等等。
Right? I've also done robotics work before. Here we also have Pieter Abbeel, who came from Covariant and is a Berkeley professor who basically created, or whose students ended up creating, the majority of the RL algorithms that work well today. It's funny that you say gems because we were trying to find an internal code name for the effort. We kicked around Equinox and Barry's Bootcamp and all this stuff.
我不确定每个人都有同样的幽默感,但我们实际上称它们为'gyms',因为在OpenAI,我们有一个非常有用的早期项目叫OpenAI Gym。那是在大语言模型成为主流之前很久的事了。OpenAI Gym是一个视频游戏任务和机器人任务的集合。比如,你能平衡一个放在小车上的杆子吗?你能训练一个强化学习算法让它保持完美居中吗?等等。我们受到启发的是,既然这些模型现在足够聪明了,为什么还要用那种玩具任务呢?
And I'm not sure everybody had the same sense of humor, but we call them gyms, actually, because at OpenAI, we had a very useful early project called OpenAI Gym. This was way before LLMs were a thing. OpenAI Gym was a collection of video game tasks and robotics tasks. Like, can you balance a pole that's on a cart, and can you train an RL algorithm that can keep that thing perfectly centered, etcetera? What we were inspired to do is, now that these models are smart enough, why have toy tasks like that?
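For context, the classic Gym task he mentions can be mimicked with a tiny stand-in. The reset/step interface below loosely mirrors OpenAI Gym's, but the dynamics are invented for illustration, not the real CartPole physics:

```python
class TinyCartPole:
    """Minimal stand-in for OpenAI Gym's CartPole-style interface
    (reset/step). The dynamics are drastically simplified and invented."""
    def reset(self):
        self.angle, self.velocity, self.t = 0.05, 0.0, 0
        return self.angle

    def step(self, action):          # action: 0 = push left, 1 = push right
        self.velocity += 0.1 if action == 1 else -0.1
        self.angle += self.velocity
        self.t += 1
        done = abs(self.angle) > 0.2 or self.t >= 200
        reward = 1.0                 # +1 for every step the pole stays up
        return self.angle, reward, done

env = TinyCartPole()
angle, total = env.reset(), 0.0
done = False
while not done:
    action = 0 if angle > 0 else 1   # naive policy: push against the lean
    angle, reward, done = env.step(action)
    total += reward
print(total)
```

The point of his remark is that the same reset/act/reward loop, applied to real knowledge-work environments instead of a pole on a cart, is what turns these gyms into training data for agents.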
为什么不把人类在电脑上实际执行的有用任务放入这些'gyms'中,让模型从这些环境中学习呢?而且我看不出这为什么不能推广到机器人技术。
Why not put in the actual useful tasks that humans do on their computer into these gyms and have the models learn from these environments? And I don't see why this wouldn't also generalize to robotics.
这个的最终状态是一个通过AWS部署的智能体框架系统吗?
And is the end state of this an agent framework system that gets deployed through AWS?
所有这一切的最终状态是一个模型加上一个系统,它像岩石一样坚固可靠,对于在计算机上完成的各种有价值的脑力工作任务,其可靠性达到99%。我们认为这将作为一项服务在AWS上提供,未来将为众多有用的应用提供支撑。
The end state of all this is a model plus a system that is like rock solid reliable, like 99% reliable at all sorts of valuable knowledge work tasks that are done on a computer. And this is going to be something that we think is gonna be a service on AWS that's gonna underpin effectively so many useful applications in the future.
我最近和Perplexity的CEO Aravind做了一期节目,谈到了他的Comet浏览器。很多消费者方面的人认为,浏览器界面实际上将是在消费者端大规模实现智能体的方式。我很好奇你对此怎么看,这种观点认为仅仅有一个聊天机器人是不够的。你真的需要让ChatGPT或任何模型坐在你的浏览器旁边,查看网页,为你操作,并从中学习。消费者端的这一切是朝着这个方向发展吗?
I did a recent episode with Aravind, the CEO of Perplexity, about his Comet browser. A lot of people on the consumer side think that the browser interface is actually going to be the way to get to agents at scale on the consumer side. I'm curious what you think of that, this idea that it's not enough to just have a chatbot. You really need to have ChatGPT or whatever model sit next to your browser, look at the web page, act on it for you, learn from that. Is that where all this is headed on the consumer side?
我认为聊天机器人绝对不是长期的答案,或者至少不是我们今天所理解的那种聊天机器人,如果你想构建为你执行操作的系统的话。我对此最好的类比是:我父亲是一个意图良好、聪明的人,他的大部分职业生涯都在工厂工作,他经常打电话给我寻求技术支持。他会说,'大卫,我的iPad出了点问题。你得帮我解决这个。'而我们只是在电话里沟通,我看不到他屏幕上的内容。
I think chatbots are definitely not the long term answer, or at least not chatbots in the way we think about them today, if you wanna build systems that take actions for you. The best analogy I have for this: my dad is a very well-intentioned, smart guy who spent a lot of his career working in a factory, and he calls me all the time for tech support help. He's like, David, something's wrong with my iPad. You gotta help me with this. And we're just doing this over the phone, and I can't see what's on the screen for him.
所以我正在努力搞清楚,比如,你知道,你打开设置菜单了吗?你点击过这个东西了吗?哦,比如这个开关是怎么回事?聊天是一种带宽极低的交互方式。就像这样,试图通过聊天来完成操作,而另一边是一个非常能干的人在努力为你处理事情。
And so I'm trying to figure out, oh, like, you know, do you have the settings menu open? Have you clicked on this thing yet? What's going on with this toggle? Chat is such a low bandwidth interface. That is the chat experience for trying to get actions done with a very competent human on the other side trying to handle things for you.
所以在我看来,目前AI领域一个重大的缺失就是我们缺乏对产品形态的创造力,坦率地说。对吧?我们太习惯于认为人与AI之间的正确界面就是这种垂直的一对一互动,我委派任务,你或许给我一些反馈,或者我问你问题等等。我们一直真正缺失的是这种并行互动,用户和AI实际上有一个共享的画布,他们共同在上面协作。我认为如果你真的考虑为知识工作者打造一个队友,或者只是世界上最聪明的个人助理,你会希望生活在一个你们俩实际上有一个共享协作画布的世界里。
So one of the big missing pieces, in my opinion, right now in AI is our lack of creativity with product form factors, frankly. Right? We're so used to thinking that the right interface between humans and AIs is this, like, perpendicular one on one interaction where I'm delegating something, you're maybe giving me some news back or I'm asking you a question, etcetera. One of the real things we've always missed is this parallel interaction where both the user and the AI actually have a shared canvas that they're jointly collaborating on. I think if you really think about building, you know, a teammate for knowledge workers or even just the world's smartest personal assistant, you would want to live in a world where there's actually a shared collaborative canvas for the two of you.
说到协作,我很好奇你的团队是如何与亚马逊其他部门合作的。你们是相当独立于一切吗?你们参与Nova(亚马逊的基础模型)的工作吗?你们如何与亚马逊的其他部分互动?
Speaking of collaboration, I'm really curious how your team works with the rest of Amazon. Are you pretty walled off from everything? Do you work on Nova, Amazon's foundation model? Like, how do you interact with the rest of Amazon?
亚马逊在我们所做的工作上做得非常好的一点是,我们被允许相当独立地运营。我认为他们认识到,目前一些初创公司的DNA对于实现最大速度是非常有价值的。如果你相信AGI(人工通用智能)还有两到五年,有些人变得更加乐观,有些人变得更加悲观,这并不重要。从大局来看,时间并不充裕。你需要行动得非常、非常快。
What Amazon has done a great job with for what we're doing here is we're allowed to run pretty independently. And I think there's a recognition that some of the startup DNA right now is really valuable for maximum speed. If you believe AGI is two to five years away (some people are getting more bullish, some people are getting more bearish, it doesn't matter), that's not a lot of time in the grand scheme of things. You need to move really, really fast.
所以我们被给予了很大的独立性。我们也把我们构建的技术栈贡献了很多给上游的Nova基础模型。
So we've been given a lot of independence. We've also taken the tech stack that we've built and contributed a lot of that upstream to the Nova foundation model as well.
那么你的工作,例如,是否已经影响到Alexa Plus,或者这在某种程度上与你无关?
So does your work, for example, is it already impacting Alexa Plus, or is that not something that you're part of in any way?
这是个好问题。Alexa Plus有能力做到,例如,如果你的马桶坏了,就像,哦,天哪,我真的需要一个水管工。Alexa,你能给我找个水管工吗?然后发生的情况是,Alexa Plus会启动一个由我们技术驱动的远程浏览器,然后像人一样使用Thumbtack这样的平台去为你找一个水管工上门,我认为这真的很酷。如果我没记错的话,这是第一个发布的生产级网络代理。
That's a good question. So Alexa Plus has the ability to, for example... say your toilet breaks. It's like, oh, man, I really need a plumber. Alexa, can you get me a plumber? Then what happens is Alexa Plus spins up a remote browser powered by our technology that then goes and uses Thumbtack like a human to go get you a plumber to your house, which I think is really cool. It's the first production web agent that's been shipped, if I remember correctly.
是的。而且,你知道,早期对Alexa Plus的反馈是,对于Alexa来说它是巨大的进步,但仍然很脆弱。仍然有它不可靠的时刻。我在想,这是真正的训练场(gym)吗?这是大规模的训练场吗?Alexa Plus是否是让你的系统更快变得更可靠的途径?
Yeah. And, you know, the early reception to Alexa Plus has been that it's a dramatic improvement for Alexa, but still brittle. There are still moments where it's not reliable. And I'm wondering, is this the real gym? Is this the at-scale gym, where Alexa Plus is how your system gets more reliable much faster?
你必须将其投入生产并部署到……我的意思是,Alexa有数百万甚至数千万的设备在运行。这就是策略吗?还是因为……因为我肯定你已经看到了对Alexa Plus的早期反应是它更好,但仍然没有人们希望的那么可靠。
You have to have this in production and deployed to... I mean, Alexa has millions and millions of devices that it's on. Is that the strategy? Because I'm sure you've seen the early reactions to Alexa Plus are that it's better, but still not as reliable as people would like it to be.
Alexa Plus只是我们众多客户之一。在亚马逊内部真正有趣的是,回到我们之前讨论的,网络数据实际上正在耗尽,并且对于训练代理来说用处不大。真正用于训练代理的是大量环境和大量人员执行可靠的多步骤工作流。所以在亚马逊有趣的是,除了Alexa Plus,基本上每个财富500强企业的运营都以某种方式由亚马逊内部的某个团队代表。对吧?
Alexa Plus is just one of many customers that we have. And what's really interesting about being within Amazon is, going back to what we were talking about earlier, web data is effectively running out, and it's not useful for training agents. What's actually used for training agents is lots and lots of environments and lots and lots of people doing reliable multi step workflows. And so the interesting thing at Amazon is that in addition to Alexa Plus, basically every Fortune 500 business's operations are represented in some way by some internal Amazon team. Right?
比如有One Medical,零售侧有供应链和采购的一切,AWS上有所有这些面向开发者的内容,而智能体训练需要大量私有数据和私有环境。由于我们在亚马逊内部,这些现在都成了一方(1P)业务。这只是我们获取可靠工作流数据来训练更智能的智能体的众多方式之一。
Like, there's One Medical, there's everything happening on supply chain and procurement on the retail side, there's all this developer-facing stuff on AWS, and agents are gonna require a lot of private data and private environments to be trained. And because we're in Amazon, that's all now 1P. So that's just one of many different ways in which we can get reliable workflow data to train smarter agents.
你们是否已经通过亚马逊的物流运营在做这件事?比如在仓库里操作,或者亚马逊正在研究的机器人技术?这些与你们的工作已经有交集了吗?
Are you doing this already through Amazon's logistics operations where you can do stuff in warehouses or, you know, the robotic stuff that Amazon is working on? Does that intersect with your work already?
我们与机器人团队的Pieter Abbeel小组合作非常紧密,这很棒。在其他领域,我们正大力推动智能体在亚马逊内部的采用,因此很多相关对话和合作都在进行中。
Well, we're really close to Pieter Abbeel's group on the robotics side, which is awesome. In some of the other areas, we have a big push for internal adoption of agents within Amazon, and so a lot of those conversations and engagements are happening.
很高兴你提到这个。我正想问,目前智能体在亚马逊内部是如何被使用的?
I'm glad you brought that up. I was gonna ask, how are agents being used inside Amazon today?
正如我们之前所说,由于亚马逊几乎对每个有用的知识工作领域都有内部推进计划,大家对采用这些系统热情很高。我们有一个内部渠道——具体名称就不透露了,但代号是'兴趣连接',与我们正在开发的产品相关。看到全球各地的亚马逊团队踊跃参与令人惊叹,因为我们之前的主要瓶颈是相当长一段时间仅限美国可用,没想到有这么多国际团队渴望启用并将其用于各种运营任务。
Again, as we were saying earlier, because Amazon sort of has an internal effort for almost every useful domain of knowledge work, there's been, you know, a lot of enthusiasm to pick up a lot of these systems. And we have this internal channel called, actually, I won't tell you what it's called, but, you know, it's the code name, dash, interest, which is related to the product that we've been building. And it's just been crazy to see teams from all over the world within Amazon pick it up, because one of the main bottlenecks we've had is we didn't actually have availability outside of the US for quite a while. It was crazy just how many international Amazon teams wanted to start picking this up and then using it themselves on various operations tasks that they had.
你说的是你们的智能体框架吗?这是尚未公开推出的产品?
This is just your agent framework that you're talking about? This is something you haven't released publicly yet?
我们在三月份发布了研究预览版Nova Act。但可想而知,之后我们增加了更多功能,进展非常顺利。我们一贯的做法是先通过内部团队进行自用测试。
We released Nova Act, which was a research preview that came out in March. But as you can imagine, we've added way more capabilities since then, and it's been really cool. The thing we always do is we first dogfood with internal teams.
是的。你们同事在发布Nova Act时称其是构建能可靠使用浏览器的智能体最省力的方式。发布之后,人们如何使用它?虽然我日常工作中不太听说,但想必有公司在使用。我很好奇你们收到哪些反馈。
Yeah. Your colleague, when you guys released Nova Act, said it was the most effortless way to build agents that can reliably use browsers. Since you've put that out, how are people using Nova Act? It's not something that, you know, in my day to day I hear about, but I assume companies are using it. And I'd be curious to hear what the feedback is that you guys have gotten since you put it out.
各类企业和开发者都在使用Nova Act。您不太听说是因为我们并非面向消费者的产品。实际上,包括我先前在Adept所做的,整个亚马逊智能体策略都专注于基础型智能体——不是那些三次仅成功一次的花哨功能,而是可靠性达99%以上的底层工作流,这才是目标。
Yeah. So a wide range of enterprises and developers are using Nova Act. And the reason why it's not something that you hear about is because we're not a consumer product. If anything, the whole Amazon agent strategy, including what I did before at Adept, is sort of doing normcore agents: not the super sexy stuff that, you know, works one out of three times, but super reliable, low-level workflows that work 99-plus percent of the time. So that's the target.
Nova Act发布后,多家企业部署使用并实现了95%以上的可靠性——相比其他智能体产品平均60%的可靠度(相关报道中可见),这是实质性提升。我认为可靠性瓶颈正是整个领域智能体采用率不高的原因。而我们通过极致专注可靠性取得了显著成效,目前已应用于医生护士注册等场景。
Since Nova Act came out, we've actually had a bunch of different enterprises end up deploying with us where they're seeing 95-plus percent reliability, which, as I'm sure you've seen from the coverage of other agent products out there, is a material step up from the average 60% level of reliability that folks see with those systems. And I think that reliability bottleneck is why you don't see as much agent adoption overall in the field. And we've been having a lot of really good luck specifically by focusing extreme amounts of effort on reliability. So we're now used for things like, for example, doctor and nurse registrations. Right?
或者我们有另一个客户叫Navan,以前叫TripActions,他们基本上使用我们来为客户自动化处理大量后端旅行预订。我们有公司通过单一脚本自动化了多达93步的质量保证工作流程等等。所以我认为早期的进展真的很酷。现在面临的是如何在海量训练场中进行超大规模自我对弈,以实现类似RL智能体的GPT时刻,我们正在全力朝着这个方向努力。
Or we have another customer called Navan, formerly TripActions, which uses us basically to automate a lot of back-end travel bookings for their customers. We've got companies that have, like, 93-step QA workflows that they've automated with a single Act script, etcetera. So I think the early progress has been really cool. Now what's up ahead is how do we do this extreme large-scale self play on a bajillion gyms to get to something like a GPT moment for RL agents, and we're running as fast as we can toward that right now.
你们对这个目标有清晰的时间预期吗?你觉得距离实现还有两年?还是一年?
Do you have a line of sight to that? Do you think we're two years from that? One year?
说实话,我认为不到一年。我们有明确的时间规划。我们已经为这个特定问题的每个步骤组建了团队,各项工作开始初见成效。每天上班都很有趣,你会发现某个团队当天取得了小而实用的突破,我们正在进行的这个训练循环似乎每天都在加速。回到GPT-5的话题,有人问这是否预示着AI进展会放缓?
Honestly, I think we're sub one year. We have line of sight. We've built out teams for every step of that particular problem, and things are just starting to work. It's just really fun to go to work every day and realize that one of the teams has made a little, very useful breakthrough that particular day, and the whole cycle that we're doing for this training loop seems to be going a little bit faster every day. Going back to GPT-5, people have said, you know, does this portend a slowdown in AI progress?
百分之百,我认为答案是否定的,因为一条S曲线衰退时(顺便说一句,我不认为预训练这条曲线已经衰退,但确实现在获得提升比之前更难了),还有可验证奖励的强化学习。每当其中一条S曲线似乎放缓时,总会有另一条曲线接踵而至。
And 100%, I think the answer is no, because when one S curve peters out (the first one being pre-training, which I don't think has petered out, by the way, but at this point it's definitely less easy to get gains than before), then you've got RL with verifiable rewards. And every time one of these S curves seems to slow down a little bit, there's another one coming up.
我认为智能体是下一条S曲线,而我们之前讨论的特定训练方法正是获得下一波巨大加速的主要途径之一。
And I think agents are the next S curve, and the specific training recipe we were talking about earlier is one of the main ways of getting that next giant amount of acceleration.
大家好,Decoder的听众们。我是The Verge的高级记者Tina Nguyen。我来向大家介绍Regulator,我的全新时事通讯,涵盖特朗普政府以及科技巨头与政府之间最新较量。科技与政治一直存在分歧,而现在它们正陷入一场影响所有领域的生存之战——从加密货币到人工智能,从汽车到企业,从你购买的商品到我们在互联网上的存在方式。
Hi, Decoder listeners. This is Verge senior reporter Tina Nguyen. I'm here to tell you about Regulator, my brand new newsletter covering the Trump administration and the latest battles between big tech and big government. Tech and politics have always been at odds, and now they're locked in an existential fight that affects everything. From crypto to AI, from cars to corporations, from the stuff you buy to the very way we exist on the Internet.
Regulator是您了解这种微妙权力平衡及其如何影响您和企业的必备信息来源。订阅The Verge即可免费获取Regulator。立即前往theverge.com/subscribe注册。网址是theverge.com/subscribe。
Regulator is your essential source covering that delicate balance of power and how it could impact you and your business. Regulator is included with every Verge subscription. Sign up today at theverge.com/subscribe. That's theverge.com/subscribe.
嘿,Vox Media的听众们。我是Mike Murphy。当两个长期相互竞选的政坛老手遇上一位世界级记者会发生什么?你会得到一笔可观的酒吧账单。这就是结果。
Hey, Vox Media listeners. It's Mike Murphy. What happens when you get two political hacks who've been running campaigns against each other for forever and add a world class journalist? You get a big bar tab. That's what you get.
我是David Axelrod,我还要告诉你——你还会收获一档很棒的播客《Hacks on Tap》。
This is David Axelrod, telling you you also get a great podcast called Hacks on Tap.
我是John Heilemann,我来告诉你我们会带给你什么:一档每周播客,涵盖新闻头条以及推动我们政治格局的长期趋势,由三位见识过从竞选活动到空军一号前舱所有场面的资深人士主持。
This is John Heilemann, and I'll tell you what we'll give you: a weekly podcast that covers the news, the headlines, and also the longer-term trends driving our politics, with three guys who've seen it all, from the campaign trail to the forward cabin of Air Force One.
每周请加入我们在Vox Media播客网络上的《Hacks on Tap》节目。
Join us every week on Hacks on Tap on the Vox Media Podcast Network.
我是诺埃尔·金。今天,在Vox的《今日解说》节目中,我将与保守派活动家、作家兼挑衅者克里斯托弗·鲁福对话。为什么?因为克里斯·鲁福从大学、企业乃至特朗普总统那里都能得到他想要的东西。他想要终结DEI。
I'm Noel King. And today, on Today Explained from Vox, I'm talking to conservative activist, writer, and provocateur Christopher Rufo. Why? Because Chris Rufo gets what he wants from universities, from corporations, from President Trump. He wanted an end to DEI.
他做到了。
He got it.
我们已经在整个联邦政府范围内终结了所谓的多元化、公平与包容政策的专制统治
We've ended the tyranny of so called diversity, equity, and inclusion policies all across the entire federal government
而且他要求政府停止向大学提供联邦资金,除非它们屈服于他的要求。这个他也得到了。他想要让一个晦涩的学术法律理论变成全国性的恐怖象征。也完成了。
And he wanted the government to yank federal funding from universities unless they submitted to his demands. He got that too. He wanted an obscure academic legal theory to become a national boogeyman. Done.
我们已经从公立学校清除了批判性种族理论的毒害。
We have removed the poison of critical race theory from our public schools.
他还想让Cracker Barrel把标志改回去。
He wanted Cracker Barrel to change its logo back.
事实上,我们只需稍加努力就能打破这个桶。
We could, in fact, break the barrel, with just a small amount of effort.
既然他总能如愿以偿,我们认为值得一问:他现在想要什么?克里斯·鲁福的文化革命。《今日解说》每个工作日都在您的订阅源中更新。
Since he's getting what he wants, we thought it was worth asking, what does he want now? Chris Rufo's cultural revolution. Today explained is in your feeds every weekday.
听起来你和你的同事们已经确定了行业下一步的转向方向。这让我对如今Nova的定位有了更清晰的认识——作为大型语言模型,Nova并非行业领先的LLM。我的意思是,它无法与Claude、GPT-5或Gemini等相提并论。是因为Nova本身不够重要,还是说真正重要的是你们讨论的智能体技术会让Nova变得更相关?亦或是Nova也必须成为世界顶尖LLM才重要?或许这种思考方式本身就不对?
It sounds like you and your colleagues have identified the next turn that the industry is going to have. And that starts to put Nova as it exists today into more context for me, because Nova as an LLM is not an industry-leading LLM. I mean, it's not in the same conversation as Claude or GPT-5 or what have you, or Gemini. Is Nova just not as important because what's really coming is what you're talking about with agents, and that will make Nova more relevant? Or is it important that Nova is the best LLM in the world as well? Or is that not the right way to think about it?
正确的思考方式是,每当有一个新崛起的实验室试图加入AI领域的前沿竞争时,你需要押注那些能够真正实现跨越式发展的技术。对吧?有趣的是,每当这些模型的训练方法发生变革时,就会为采用新方法入局的新参与者创造巨大的机会窗口——因为他们不需要追赶所有旧方法,而这些旧方法对现有企业来说反而是负担。举例来说,OpenAI基本上开创了巨型模型的先河,大型语言模型(LLM)的概念源自GPT-2和后来的GPT-3。但最初的LLM仅采用纯文本训练方法,后来我们发现了RLHF(人类反馈强化学习),开始通过RLHF获取大量人类数据。而在转向多模态输入时,你不得不抛弃许多在纯文本领域所做的优化,这为其他人提供了追赶的时间。
The right way to think about it is that every time you have a new upstart lab trying to join the frontier of the AI game, you need to bet on something that can really leapfrog. Right? What's interesting is every time there's a recipe change in how these models are trained, it creates a giant window of opportunity for someone new who's coming to the table with that new recipe instead of trying to catch up on all the old recipes, because the old recipes are actually baggage for the incumbents. So to give some examples of this: at OpenAI, of course, we basically pioneered giant models. The whole LLM thing came out of GPT-2 and then GPT-3, of course, but those LLMs initially were text-only training recipes. Then we discovered RLHF, and they started getting a lot of human data via RLHF. But then in the switch to multimodal input, you kind of have to throw away a lot of the optimizations you did in the text-only world, and that gives time for other people to catch up.
我认为这正是Gemini能够迎头赶上的部分原因——他们押注了原生多模态等有趣的想法,最终取得了成功。随后,推理模型又为人们提供了另一个追赶的机会。这就是为什么DeepSeek能够震惊世界,因为他们直接量子隧穿式地跃迁到那一阶段,而非逐步推进。我认为下一轮变革将是智能体(agents),尤其是那些没有可验证奖励机制的智能体。如果亚马逊能凭借公司规模优势,更早、更快、更好地掌握这一方法,那将使我们直接站到前沿阵地。
I think that was actually part of how Gemini was able to catch up: they bet on certain interesting ideas on native multimodal that turned out well for them. Right? But then after that, reasoning models gave another opportunity for people to catch up. That's why DeepSeek was able to surprise the world, because they straight-up quantum tunneled to that instead of doing every stop along the way. And I think with the next turn being agents, especially agents without verifiable rewards, if at Amazon we can figure that recipe out earlier, faster, and better than everybody else, with all the scale that we have as a company, it basically brings us to the frontier at that point.
这是我第一次听亚马逊这样阐述,非常有意思,也很有道理。最后我们来谈谈人才市场和初创企业的现状,以及你加入亚马逊的经过——我想回到Adept的话题。你们创立Adept时,它是否是最早专注于智能体的初创公司?
I haven't heard that articulated from Amazon before. That's really interesting. It makes a lot of sense. Let's end on the state of the talent market and startups, and how you came to Amazon. Actually, I want to go back to Adept. So Adept, when you started it, was it the first startup really focusing on agents at the time?
在看到Adept之前,我甚至没听说过智能体这个概念。
I don't think I had heard of agents until I saw Adept.
是的。我们确实是首家专注于智能体的初创公司,因为创立Adept时,我们发现LLM擅长对话却无法执行操作。我无法想象一个不需要解决这个关键问题的世界,因此我们全力聚焦于此。但起步时,'智能体'这个产品类别甚至还没有被命名。
Yeah. Actually, we were the first startup focusing on agents because when we were starting Adept, we saw that LLMs were really good at talking but could not take action. And I could not imagine a world in which that was not a crucial problem to be solved. So we got everybody focused on solving that. But when we got started, the word agent as a product category wasn't even coined yet.
我们当时尝试寻找合适的术语,曾考虑过'大型行动模型'、'行动变换器'等名称。我们的首款产品就叫做行动变换器(Action Transformer),之后'智能体'这个术语才逐渐流行起来。
So we were trying to find a good term, and we played with things like large action models and action transformers. So our first product was called Action Transformer. And only after that did agents really start picking up as the term.
请详细说说你决定离开Adept并带着大部分技术团队加入亚马逊的决策过程。我有个说法来形容这种现象:这是一种如今在科技巨头与AI初创公司之间常见的交易结构——反向收购式人才引进(reverse acqui-hire)。核心团队(如你和联合创始人)加入大公司,初创公司依然存在但技术团队撤离,收购方(我知道这不是严格意义上的收购)支付许可费之类,股东获利,但初创公司通常需要在没有创始团队的情况下自行发展。最近的例子有Windsurf与Google,之前还有Scale AI与Meta。我们在Decoder节目中经常讨论这个话题。
Walk me through the decision to leave it behind and join Amazon with most of the technical team. Is that right? I have a phrase for this. It's a deal structure that has become common now with big tech and startups in AI: the reverse acqui-hire, where basically the core team, like yourself and your co-founders, joins; the rest of the company still exists, but, you know, the technical team goes away. And the acquirer, quote unquote (I know it's not an acquisition), pays a licensing fee or something, and shareholders make money, but the startup is then kind of left to figure things out without its founding team in most cases. And, you know, the most recent example is Windsurf and Google, and there was Scale AI and Meta before that. This is a topic we've been talking about on Decoder a lot.
听众们对此很熟悉,但你们是最早一批这种反向收购式人才引进的案例之一。请详细说说你决定加入亚马逊的时间和原因。
The listeners are familiar with it. But you were one of the first of these such reverse acqui hires. Walk me through when you decided to join Amazon and why.
我希望五十年后,人们记住我更主要是作为AI研究创新者,而非交易结构创新者。首先,人类对智能的需求远远超过当前供给。因此,整个领域投入巨额资金建设全球最大计算集群,并汇聚顶尖人才来驱动这些集群,是完全合理的。如果你能额外投入X美元构建一个智商提高10点、能为人类解决全新范围实用任务的模型,这绝对是值得的交易,任何时候都应该去做。
So I hope, you know, in fifty years, I'm remembered more as being an AI research innovator rather than a deal structure innovator. Well, first off, humanity's demand for intelligence, right, is way, way, way higher than the amount of supply. And so therefore, for us as a field to invest ridiculous amounts of money in building the world's biggest clusters and bringing the best talent together to drive those clusters is actually perfectly rational. Right? Because if you can spend, you know, an extra x dollars to build a model that has plus 10 IQ points and can solve, like, a giant new concentric circle of useful tasks for humanity, that is a worthwhile trade that you should do any day of the week.
所以我认为所有这些公司当前都在努力汇聚人才和算力的临界质量是很有道理的。从我个人的角度,选择加入亚马逊是因为他们深知在智能体领域获胜的重要性,智能体是亚马逊构建顶级前沿实验室并达到规模效应的关键赌注。你听到各大超大规模厂商公布的资本支出数字,简直令人瞠目结舌,而且这一切都是真实的。
And so I think it makes a lot of sense that all these companies are trying to put together critical mass on both talent and compute right now. And from my perspective, for why join Amazon, it's because Amazon knows how important it is to win on the agent side in particular, and agents are a crucial bet for Amazon to really build one of the best frontier labs possible and to get to the level of scale. Right? You're hearing all these CapEx numbers from the various hyperscalers. It's just completely mind-boggling, and it's all real.
对吧?
Right?
光是今年,仅顶级超大规模企业的资本支出就超过3400亿美元,我觉得。是的,这是个天文数字。
It's over $340 billion in CapEx this year alone, I think, from just the top hyperscalers. Yeah. It's an obscene number.
是的,听起来差不多。而Adept,你知道,我们融资了4.5亿美元,这在当时是个很大的数字,但如今
Yeah. That sounds about right. And Adept, you know, we raised $450 million, which at the time was a very large number, and then today is
现在就是小钱了。简直是零花钱。那只是一个研究员的成本。得了吧,David。
It's chump change now. It's chump change. That's one researcher. Come on, David.
是啊,只是一个研究员。对吧?那只是一个员工的成本。所以如果你生活在这样的世界里,我认为与一个愿意战斗到底的伙伴合作至关重要,这就是我们选择亚马逊的原因。
Yeah. It's one researcher. Right? That's one employee. So if that's the world that you live in, it's really important, I think, for us to partner with someone who's gonna go fight all the way to the end, and that's why we came to Amazon.
当你与亚马逊达成协议时,预见到这种整合和数字的上升了吗?你知道不仅计算成本,人才成本也会持续上涨?是的。为什么?你当时看到了什么而其他人并不明显?
Did you foresee that consolidation and those numbers going up when you did the deal with Amazon? You knew that it was gonna just keep getting more expensive, not only on compute but on talent? Yes. And why? What did you see coming that at the time was not obvious to everyone?
我预见到了两件事。第一,如果你想处于智能的前沿,你必须处于计算的前沿。如果你不在计算的前沿,那么你必须转向去做完全不同的事情。我的整个职业生涯,我只想构建最智能、最有用的AI系统。所以把Adept变成只卖小模型的企业公司,或者变成一个做前置部署工程来帮你在别人模型上部署代理的地方,这些都不吸引我。
Two things I saw coming. One, if you want to be at the frontier of intelligence, you have to be at the frontier of compute. And if you're not on the frontier of compute, then you have to pivot and go do something that is totally different. And my whole career, all I wanna do is build the smartest and most useful AI systems. So the idea of turning Adept into, you know, an enterprise company that only sells small models, or a place that does forward-deployed engineering to go help you deploy an agent on top of someone else's model, none of those things appealed to me.
比如,我想弄清楚通往AGI剩下的四个关键研究问题。我们如何攻克它们?每一个都可能需要上百亿美元级的计算集群来运行。那么,我和我组建的这个团队,大家都被同一件事激励,我们怎么有机会去做这些?
Like, I wanna figure out: here are the four crucial remaining research problems left on the way to AGI. How do we nail them? Every single one of them is gonna require, like, two-digit-billion-dollar clusters to go run. So how else am I, and this whole team that I've put together, who are all motivated by the same thing, gonna have the opportunity to go do that?
如果对大科技公司的反垄断审查不像现在这样严格,亚马逊会直接完全收购这家公司吗?
If antitrust scrutiny did not exist for big tech like it does, would Amazon have just acquired the company completely?
我无法再谈论一般的动机和交易结构。我是一个AI研究创新者,不是一个
I can't speak to general motivations and deal structuring, again. I'm an AI research innovator, not a
我知道我必须问一下。你知道,好吧。那么也许你可以回答这个问题。这些正在发生且我认为会持续发生的交易,它们的二阶效应是什么?对研究界、对初创企业界会产生哪些二阶影响?
I know, I have to ask. Okay, well then, maybe you can answer this. What are the second order effects of these deals that are happening, and that I think will continue to happen? What are the second order effects on the research community, on the startup community?
我认为这改变了如今人们加入初创公司的考量,因为知道这类交易可能发生,并且会带走你决定加入并押注职业生涯的创始人或创始团队。这是一个转变。这是硅谷过去几年出现的新现象。
I think it changes the calculus for someone joining a startup these days, knowing that these kinds of deals happen and can take away the founder or the founding team that you decided to join and bet your career on. And that is a shift. That is a new thing for Silicon Valley in the last couple of years.
我想谈两点。一是老实说,创始人扮演着非常重要的角色。创始人必须真正想要照顾好团队,确保每个人都按比例平等地得到对待,对吧?第二点是,这在当前的人工智能领域非常反直觉,因为只有少数人拥有丰富经验。而且因为未来几年发展会非常迅速,许多价值、市场定位等等都将在未来几年内决定。如果你负责其中一个实验室,并希望确保拥有最好的人工智能系统,你就需要雇佣知道自己在做什么的人。所以市场需求、对这些人的定价实际上是完全理性的,仅仅因为他们数量太少。但反直觉的是,如果你是一个初级人员,实际上并不需要很多年就能发现自己处于前沿领域。
There's two things I wanna talk about. One is, honestly, the founder plays a really important role. The founder has to want to really take care of the team and make sure that everybody is treated pro rata and equally. Right? The second thing is, it's very counterintuitive in AI right now, because there's only a small number of people with a lot of experience. And because the next couple of years is going to move so fast, and a lot of the value, the market positioning, etcetera, is going to be decided in the next couple of years, if you're sitting there responsible for one of these labs and you want to make sure that you have the best possible AI systems, you need to hire people who know what they're doing. And so the market demand, the pricing for these people, is actually totally rational, just solely because of how few of them there are. But the counterintuitive thing is that it doesn't take that many years actually to find yourself at the frontier if you're a junior person.
该领域一些最优秀的人只是三四年前才开始接触,通过与合适的人合作,专注于正确的问题,只是非常、非常、非常努力地工作,他们就发现自己站到了前沿。就像人工智能研究的某些领域,如果你提出四五个问题,就已经发现了一个无人能解答的问题。然后你就可以专注于这个领域,专注于如何成为这个特定子领域的全球专家。所以我发现这真的很反直觉:真正知道自己在做什么的人非常少,然而从年限上来说,成为知道自己在做什么的人却非常容易。
Some of the best people in the field were people who just started three or four years ago, and by working with the right people, focusing on the right problems, just working really, really, really hard, they found themselves at the frontier. Like, AI research has those areas where, if you ask four or five questions, you've already discovered a problem that nobody has the answer to. And then you can just focus on that, and focus on how do I become the world expert in this particular subdomain. And so I find that really counterintuitive: there's only very few people who really know what they're doing, and yet it's very easy, in terms of number of years, to become someone who knows what they're doing.
按照你的定义,世界上到底有多少人真正知道自己在做什么?这是我经常被问到的问题。今天早上我还在电视上被问到这个问题。到底有多少人?
How many people actually know what they're doing in the world from your definition? This is a question I get asked a lot. I was literally just asked this on TV this morning. How many people are there?
我认为这取决于你想放宽还是收紧标准。我会说
I think it depends on how generous or tight you wanna be. I would say
有谁能够真正构建并概念化地全面训练一个前沿模型?
Who can actually build and conceptualize training a frontier model holistically?
我信任并能交给他们巨额计算资源去做这件事的人数,可能不到一百五十人。
The number of people who I would trust with a giant dollar amount of compute to go do that is probably sub one fifty.
一百五十人。
One fifty.
是的。但还有更多人,比如另外大约500人,只要某项工作中已经聚集了那150位真正懂行的人中足够的临界规模,这些人也会成为极其宝贵的贡献者。
Yes. But there are many more people, let's say another 500 people or so, that would be extremely valuable contributors to an effort that was populated by a certain critical mass of that one fifty that really know what they're doing.
但整个市场,这仍然不到一千人。
But the total market, that's still less than a thousand people.
我认为可能不到一千人。但是,再次强调,我不想轻视这个问题。我认为初级人才非常重要。那些来自其他领域的人,比如物理学、量化金融,或者刚做完本科研究的人,他们真的能非常非常快地带来巨大改变。但你希望让他们身边有几个已经从过去的训练尝试中吸取了所有教训的人。
I'd say it's probably less than a thousand people. But, again, I don't wanna trivialize this. I think junior talent is extremely important. And people who come from other domains, like physics or quant finance, or, you know, have just been doing undergrad research, these people make a massive difference really, really, really fast. But you want to surround them with a couple of folks who have already learned all the lessons from previous training attempts in the past.
这个本就非常小的精英群体,他们正在构建的东西本质上是为了取代他们自己——也许你不同意这一点,但我认为超级智能在概念上会让其中一些工作变得多余。这是否意味着未来他们中更少的人会赚更多钱,因为你只需要一些其他模型的协调者来构建更多模型?还是这个领域会扩大?你认为它会发展到成千上万人吗?
This already very small group of elite people is building something that is inherently designed to replace them. Maybe you disagree with that, but I think superintelligence conceptually would make some of this work redundant. Does it mean there's actually fewer of them in the future making more money, because you only need, you know, some orchestrators of other models to build more models? Or does the field expand? Do you think it's going to become thousands and thousands of people?
这个领域肯定会扩大。会有越来越多的人真正学会该领域目前发展出的技巧,并发现下一批技巧和突破。但我认为,这个领域将比其他领域(如软件)规模更小的一个动态因素是,与常规软件工程不同,基础模型训练打破了许多我们认为应该遵守的规则。对吧?比如在软件领域,假设我们的工作是构建Microsoft Word。
The field's definitely going to expand. There's gonna be more and more people who really learn the tricks that the field has developed so far and discover the next set of tricks and breakthroughs. But I think one of the dynamics that's going to keep the field smaller than other fields like software is that, unlike regular software engineering, foundation model training breaks so many of the rules that we think we should have. Right? Like, in software, let's say our job here is to build Microsoft Word.
对吧?我可以说,嘿,Alex,你的工作是让保存功能正常。David的工作是确保云存储正常,另一个人的工作是确保UI看起来不错。你可以相当独立地分解这些问题。基础模型训练的问题在于,你做出的每一个决定都会干扰其他每一个决定,因为最终只有一个交付成果。
Right? I can say, hey, Alex, it's your job to make the save feature work. It's David's job to make sure that cloud storage works, and someone else's job to make sure the UI looks good. You can factorize these problems pretty independently from each other. The issue with foundation model training is that every decision you take interferes with every other decision, because there's only one deliverable at the end.
最终的交付成果是你的前沿模型。它就像一个巨大的权重包。对吧?所以我在预训练中做的事情,另一个人在监督微调中做的事情,另一个人在强化学习中做的事情,另一个人为了让模型运行更快所做的事情,它们都会以有时相当不可预测的方式相互影响。所以,这是我见过的除了体育团队之外,人员规模不经济性最严重的领域之一。
The deliverable at the end is your frontier model. It's like one giant bag of weights. Right? So what I do in pre-training, what this other person does in supervised fine tuning, this other person does in RL, this other person does to make the model run fast, they all interact with each other in sometimes pretty unpredictable ways. So it has one of the worst diseconomies of scale with number of people of anything I've ever seen, except maybe sports teams.
对吧?也许这是另一个你不想拥有100个中级人员,而是想要10个最优秀的人的情况。对吧?对。
Right? Maybe that's the one other case where you don't wanna have, like, a 100 mid level people. You wanna have 10 of the best. Right? Right.
正因为如此,世界上一些资金最充足的项目的核心参与者的数量,实际上将会受到一定限制。
And because of that, the number of people who are gonna have a seat at the table at some of the best funded efforts in the world, I think, is actually gonna be somewhat capped.
哦,所以你认为精英阶层基本保持稳定,但围绕它的支持者和那些非常有意义的贡献者群体在不断扩大?
Oh, so you think the elite stays relatively where it is, but the field around it, the people that support, the people that are very meaningful contributors, expands?
我认为懂得如何做超级有意义工作的人肯定会增加,但仍然会受到一些限制,因为你不可能让太多人同时参与任何一个项目。
I think the people who know how to do super meaningful work will definitely expand, but it will be still a little constrained by the fact that you cannot have too many people on any one of these projects at once.
对于那些正在评估加入AI初创公司、实验室,甚至是像你们这样的大科技公司AI部门的人,关于他们的职业道路,面对我们讨论的所有这些变化,他们应该如何规划未来几年的发展,你有什么建议?
What advice would you give someone who's evaluating joining an AI startup, or a lab, or even an operation like yours in big tech? How should they be thinking about navigating their career path over the next couple of years, with all this change that we've been talking about?
首先,小团队搭配大量计算资源是建立前沿实验室的正确配方。这就是我们在亚马逊与Saffin和我的团队正在做的事情。关键是要有机会在特定环境中实践你的研究想法。如果你去一个已经有3000人的地方,你基本上不会有这样的机会——前面有太多资深人士都迫不及待想尝试他们自己的想法。
First off, tiny teams with lots of compute is the correct recipe for building a frontier lab. That's what we're doing at Amazon with Saffin and my team. It's really important that you have the opportunity to run your research ideas in a particular environment. If you go somewhere that already has 3,000 people, you're not really going to have a chance. There's so many senior people ahead that are all too ready to try their particular ideas.
第二点,我认为人们低估了产品、用户界面和模型的协同设计。这将是未来几年最重要的竞争领域。所以,选择一个真正具备强大产品感和用户深度整合愿景的地方非常重要。一个很好的判断标准是:你们只是在打造另一个聊天机器人吗?还是在编程助手领域又多了一个竞争者?
The second thing is, I think people underestimate the co-design of, like, the product and the user interface and the model. I think that's gonna be the most important game that people are gonna play in the next couple years. And so going somewhere that actually has very strong product sense and a vision for how users are actually going to deeply embed this into their own lives is gonna be really important. And one of the best ways to tell is: are you just building another chatbot? Are you just trying to be, you know, one more entrant in the coding assistant space?
对吧?这两个恰好是最早获得产品市场契合度且疯狂增长的产品形态。我敢打赌,五年后回顾这个时期,会出现六到七个这样关键的产品形态——事后看会觉得很明显,但如今还没人真正解决。如果你真想进行不对称的上行押注,我会花时间现在就找出这些机会。
Right? Those just happen to be two of the earliest product form factors that have product market fit and are growing like crazy. I bet when we fast forward five years and we look back on this period, there will be six to seven more of these crucial product form factors that will look obvious in hindsight, but no one's really solved today. And if you really wanna take an asymmetrical upside bet, I would try to spend some time and figure out what those are now.
多在健身房花点时间。谢谢大卫,我就不耽误你回健身房了。
Spend some time in the gym. Thanks, David. I'll let you get back to the gym.
酷。很好。谢谢大家,真的很开心。
Cool. Nice. Thanks, guys. This is really fun.
再次感谢David Luan参加节目,也感谢各位收听。如果想告诉我们您对这期节目的看法或其他希望我们涵盖的内容,请给我们留言。您可以发送邮件至decoder@theverge.com。我们还有TikTok和Instagram账号,请关注decoder pod。
Thanks again to David Luan for joining the show, and thank you for tuning in. If you'd like to let us know what you thought about this episode or what else you'd like us to cover, drop us a line. You can email us at decoder@theverge.com. We also have a TikTok and an Instagram. Check those out at decoder pod.
如果您喜欢《Decoder》,请分享给您的朋友,并在您获取播客的任何平台推荐我们。如果还没有订阅,别忘了订阅The Verge,这样您就可以阅读我们所有的报道和新闻通讯,包括我撰写的《Command Line》。《Decoder》是The Verge出品,属于Vox Media播客网络。我们的制片人是Kate Cox和Nick Statt,编辑是Ursa Wright。
If you like Decoder, please share it with your friends and subscribe wherever you get your podcasts. And if you haven't already, don't forget to subscribe to The Verge, which gets you access to all of our stories and newsletters, including the one I author called Command Line. Decoder is a production of The Verge and is part of the Vox Media Podcast Network. Our producers are Kate Cox and Nick Statt. Our editor is Ursa Wright.
《解码器》的音乐由Breakmaster Cylinder创作。下次再见。
The Decoder music is by Breakmaster Cylinder. See you next time.
关于 Bayt 播客
Bayt 提供中文+原文双语音频和字幕,帮助你打破语言障碍,轻松听懂全球优质播客。