
埃隆的制胜AI竞赛配方:Grok5与巨像

Elon's Recipe for Winning the AI Race: Grok5 and Colossus

Episode Summary

In this episode we explore the latest developments in AI through Elon Musk's xAI, with a focus on Grok 4 Fast. We discuss Musk's claim that Grok 5 could reach artificial general intelligence (AGI), as well as Grok 4's striking gains on benchmarks. We spotlight Grok 4 Fast's two-million-token context window, which delivers more efficiency at lower cost, and unpack the competitive AI landscape being shaped by Big Tech's massive investments.

------

🌌 Limitless HQ: where to listen and follow ⬇️
https://limitless.bankless.com/
https://x.com/LimitlessFT

------

Timestamps
0:00 The rise of Grok 5
1:41 The roadmap to AGI
3:32 Breakthroughs in AI training techniques
5:06 The power of natural language
9:49 Grok 4 Fast, the disruptor
14:21 The future of democratized AI
18:14 The reinforcement learning revolution
21:58 Colossus 2 and the energy race
26:00 Global AI infrastructure investment
28:15 Closing remarks and next episode preview

------

Resources
Josh: https://x.com/Josh_Kale
Ejaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See disclosures at: https://www.bankless.com/disclosures

Bilingual Subtitles

Text subtitles only; no Chinese audio. To listen while reading, use the Bayt podcast app.

Speaker 0

对埃隆来说,这几周真是大事不断。上周我们做了几期热门节目,讨论了星链、AI5芯片,而这周又迎来另一个重大突破——Ejaz。本周我们将发布大量关于Grok和xAI的新消息,相当令人兴奋。在头条新闻中他表示,我现在认为xAI的Grok 5有机会实现通用人工智能(AGI)。

It's been a big couple of weeks for Elon. We had a few pretty big hit episodes last week talking about Starlink and the AI5 chip, and this week brings another big breakthrough, Ejaz. This week we're covering a lot of new Grok and xAI news, which is pretty exciting. One of the leading headlines: he said, "I now think xAI has a chance of reaching AGI with Grok 5."

Speaker 0

以前从未这么想过。现在有两件事促成了这个观点,稍后我们会详细讨论,其一是Grok 4 Fast模型。它非常出色,在同等规模模型中性能比其他任何模型都高出一个数量级,确实令人印象深刻。但我们要从这张图表说起,Ejaz,就是现在屏幕上显示的这张,正是它让埃隆突然意识到——或许,只是或许,Grok 5真有可能引领AGI。因为我们在图表上发现了一个异常现象:Grok 4原本只是略微领先,但在没有重大版本更新的情况下,现在却遥遥领先。

Never thought that before. And now there are two things that spawned this, which we'll get into a little later. One is the Grok 4 Fast model. It is remarkable, a full order of magnitude better than anything else for its size, and really, really impressive. But the thing we're gonna start with, Ejaz, is this chart we're showing on screen right here, the single thing that convinced Elon: wait a second, maybe, just maybe, Grok 5 could actually lead to AGI. Because we're seeing this crazy anomaly on the chart where Grok 4 was kind of ahead, but somehow, without any major new release, Grok 4 is now way ahead.

Speaker 0

那么Ejaz,你能解释下这张图表的情况吗?他们是如何在没有发布新模型的情况下这么快取得突破的?这甚至不是xAI官方发布的成果吧?

So, Ejaz, can you explain to us what's going on in this chart? How did they get so good so fast without a major new model release? I mean, this didn't even come from xAI, did it?

Speaker 1

这是个好问题,确实不是xAI直接发布的。实际上是两位独立AI研究员的成果——Jeremy Berman和Eric Pang,他们通过微调(fine-tuning)Grok 4模型,显著提升了其智能水平。Josh,他们用终极测试来验证这个模型,就是所谓的ARC-AGI基准测试。对于不常研究基准测试的观众来说,ARC-AGI测试衡量的是AI模型在类人智能方面的表现。

It's a good question, and no, it didn't come directly from xAI. It actually came from two independent AI researchers, Jeremy Berman and Eric Pang, who tweaked Grok 4's model, also known as fine-tuning, to make it a hell of a lot smarter. And they put it to the ultimate test, Josh: this thing called the ARC-AGI benchmark. For those of you who haven't been spending all your time researching benchmarks, the ARC-AGI benchmark tests how good your AI model is at human-like intelligence.

Speaker 1

具体来说,它会给AI模型提供从未见过、也不可能受过训练解决的谜题,然后评估其表现。Josh,我问你,在Grok四发布之前,你认为这个基准测试的最高分是多少?

What I mean by that is it presents the AI model with puzzles that it's never seen before that it can't possibly have been trained to solve and sees how good it does. Now, Josh, let me ask you this question. Before Grok four itself was released, what do you think the highest score was on this benchmark?

Speaker 0

更低些,但不确定低多少。我不清楚具体数字,或许我猜比现在的最佳成绩低5%到10%,像是渐进式改进。

Lower, but I'm not sure how much lower. I don't know the particular number, but maybe I'll guess five to 10% lower than what the best is now, kind of like an incremental improvement.

Speaker 1

不对。完全不对。实际上低得多。顶尖模型(来自OpenAI、谷歌等)的得分仅在5%到8%之间。

Nope. Nope. Nope. It was way, way lower. In fact, the top models from OpenAI, Google, and the like only scored between 5% and 8%.

Speaker 1

然后巨大的差异出现了。Grok 4问世并突破了这一界限,得分达到22%。猜猜这两位随机AI研究者的两个模型得分是多少?

And then the big difference came. Grok 4 came along, broke that frontier, and scored 22%. Guess how much the two models from these two random AI researchers scored?

Speaker 0

等等。你是说我正在看屏幕,显示79.6%,对吗?这几乎是基础版Grok 4的四倍吗?

Wait. So you're telling me, I'm looking at the screen, I'm seeing 79.6%. Is that right? Is that almost a 4x multiple of base Grok 4?

Speaker 1

80%。这与XAI团队毫无关系。我希望你专注于我现在展示的这张图表,看我的光标围绕远处那两个橙色点打转。看到那边的Grok 4了吗?嗯,基本上就是埃隆和xAI团队发布Grok 4时最重量级、最昂贵的模型。

80%. And this had nothing to do with the xAI team at all. I want you to focus on this chart that I'm showing you right now and look at my cursor circling these two orange dots off in the distance. You see Grok 4 Thinking over here, basically the heaviest, most expensive model that Elon and the xAI team released when they launched Grok 4.

Speaker 1

它们完全被这两个模型击败了。但我猜你肯定在想,乔希,这两个研究者到底是怎么做到的?还有,为什么埃隆没有立刻雇佣他们?

And they were just completely beaten by these two models. But I'm sure you're probably thinking, Josh, how the hell did these two researchers do that? And, you know, why aren't they being hired by Elon immediately?

Speaker 0

他们没有大型实验室的资源。我是说,他们在对抗——如果你记得的话,这些人正收到数十亿美元的邀约为单一雇主工作,而且是一群这样的人。那为什么一个个体能击败这群人呢?

They don't have the resources of a giant lab. I mean, if you remember, these people are getting billion-dollar offers to come work for a single employer, and there's a whole collection of them. So how is it that one individual is beating a collection of these people?

Speaker 1

这两位研究者引入了两种新颖的模型训练方法。一种叫开源程序合成,另一种叫测试时适应。在解释它们的工作原理前,我想提醒观众,模型真正智能的关键很大程度上取决于其训练数据。人们花费巨额资金——我说的是数亿到数十亿美元——来获取最佳训练数据。

So these two researchers introduced two novel ways of training their models. One is called open-source program synthesis, and the other is called test-time adaptation. Before I get into an explanation of how these work, I want to remind the audience that what really makes a model intelligent is largely the data it's trained on. People spend enormous money, I'm talking hundreds of millions to billions of dollars, to acquire the best data to train their models.

Speaker 1

之所以如此重要,是因为模型在试图回答问题时,会借鉴其训练数据。对吧?所以它希望能回顾训练数据,在所有这些标记和字符中找到正确答案。对吧,乔希?

And the reason why this is so important is the model, when it's trying to answer a question, draws on the data that it's been trained on. Right? So it's hoping that it can look back on the data that it's been trained on and find the right answer somewhere in all of these tokens and characters. Right, Josh?

Speaker 0

嗯。

Mhmm.

Speaker 1

这些研究人员决定彻底颠覆传统方法,采用了一种称为开源程序合成的技术,让模型实时设计自己的定制解决方案。它甚至不会查看训练数据,而是直接分析面对的难题,尝试将其分解为更小的组成部分。比如,假设解决这个难题需要10个步骤才能达到最终目标(即正确答案)。

These researchers decided to flip that completely on its head. It's this thing called open-source program synthesis, where the model designs its own bespoke solutions in real time. So it doesn't even look at the data it was trained on. It just looks at the puzzle it's presented with and tries to break it down into smaller components. Let's say the puzzle has 10 different steps to reach the end goal, the correct answer.

Speaker 1

它会将问题分解为10个不同的小步骤,而通常模型只会查看完整的10个步骤并思考如何从第一步到第十步。它只是一次解决一个步骤。这就是他们取得的重大突破。如果这听起来耳熟,你可能想到了强化学习技术——基本上是让模型反复尝试解决同一个问题。这个方法非常类似,但它是该领域的进阶版。

It'll break it down into 10 little steps, whereas normally a model would look at the complete set of 10 steps and think, how do I get from step one to step 10? Instead it solves each step one at a time. That was the massive breakthrough they made. And if this sounds familiar, you're probably thinking of the technique known as reinforcement learning, which basically has the model repeatedly go at a problem over and over again. This is pretty similar, but it's the next step up in that field.
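As a rough illustration of that decompose-then-solve idea (this is only a sketch, not the researchers' actual code; `propose_steps` and `apply_step` are hypothetical stand-ins for LLM calls):

```python
def solve_stepwise(puzzle, propose_steps, apply_step):
    """Solve a task by breaking it into small steps and solving them
    one at a time, feeding each intermediate result forward, instead
    of asking for the whole solution in one shot."""
    state = puzzle
    for step in propose_steps(puzzle):   # e.g. an LLM proposing sub-goals
        state = apply_step(state, step)  # e.g. an LLM executing one sub-goal
    return state

# Toy demonstration with deterministic stand-ins for the model calls:
plan = lambda p: ["double", "add one"]
do = lambda s, step: s * 2 if step == "double" else s + 1
print(solve_stepwise(5, plan, do))  # 5 -> 10 -> 11, prints 11
```

The point of the shape is that each call only has to get one small transformation right, rather than the whole chain at once.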

Speaker 0

好的,明白了。说实话,这个新闻让我有点恼火,因为方法看起来太简单了。以杰里米·伯曼为例,我研究了他具体的实现方式——他最初用Python编写代码,后来改用纯英文指令。

Okay. Got it. Yeah, this news kind of really annoyed me because of how seemingly simple it was. I mean, Jeremy Berman, in this case, I got some examples of specifically how he did it, and he was originally writing Python code, but then he switched to just writing instructions in plain English.

Speaker 0

我认为很多人(包括我自己)都忽略了一个关键点:与大型语言模型交互时,最具挑战性的工作其实只需要用简单的英语完成。你只是在向模型输入句子,希望它能产生更好的输出。虽然底层存在复杂的代码,但实现这一目标的方式其实就是写普通英文。我深入研究后发现:他的系统首先让Grok-4(他选择的模型)生成30条将输入转化为输出的英文规则描述。嗯。

And I think this is such an important thing that a lot of people forget, myself included, I'm speaking for myself here: a lot of this really challenging, difficult work with LLMs is really just done in plain English. You're just writing sentences to a model in hopes that it produces a better output for you. It's not some crazy complex codebase, although that exists deep down; the way they achieve this is actually just by writing plain English. I did a little bit of digging, and I have a few notes on how it works. His system basically starts by having Grok 4, his model of choice, produce 30 English descriptions of rules that transform inputs into outputs.

Speaker 0

接着系统会将这些描述在训练样本上进行测试——把每条描述当作独立测试,评估它们与正确答案的匹配程度。然后前五名的描述会根据错误反馈(比如标出错误单元格等)单独修订,最终合并成最优描述。整个过程形成迭代循环:测试自身、生成更多样本、获取更优质数据、验证输出正确性。这也解释了为什么该模型的输出成本略高,但质量惊人——因为它持续进行这种自我迭代优化。最妙的是:全程只用普通英文。所以正在听英文播客的你完全具备这个能力,因为这不需要什么神奇技巧,只是向模型输入精心设计的提示语,就能产生世界顶尖的输出结果。对我来说这才是最酷的部分,Ejaz。

It takes those and tests the descriptions on training examples, treating each as a test and scoring how well they match the correct outputs. Then the top five descriptions get revised individually with feedback on their mistakes, like highlighting the wrong cells, and the best elements get combined into pooled descriptions. It has this iterative loop where it tests itself, creates more examples, gets better data, and confirms the right output. That's generally why the outputs of this model are a little more expensive, but the quality is amazing, because it just keeps running this self-iterative loop and getting better and better. Again, all in plain English. So if you are listening to this podcast in English, you are fully capable of doing this, because you speak the language. This isn't anything crazy; it's just very refined prompts fed to a model that result in these unbelievable outputs that are now best in the world. That's the coolest part to me, Ejaz.
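Put together, the loop described here looks roughly like the following. This is a sketch of the procedure as described in the episode, not Berman's published code; the `llm` object and all of its method names are hypothetical:

```python
def evolve_descriptions(llm, train_pairs, n_candidates=30, top_k=5, rounds=3):
    """Evolve plain-English rule descriptions: generate candidates,
    score them on the training examples, revise the best with feedback
    on their mistakes, and pool their strongest elements into new
    combined descriptions."""
    rules = [llm.generate_rule(train_pairs) for _ in range(n_candidates)]
    for _ in range(rounds):
        # Score each description by how well it reproduces the outputs.
        ranked = sorted(rules, key=lambda r: llm.score(r, train_pairs),
                        reverse=True)
        best = ranked[:top_k]
        # Revise each top description individually, with feedback such
        # as which output cells it got wrong.
        revised = [llm.revise(r, llm.feedback(r, train_pairs)) for r in best]
        # Combine the strongest elements into a pooled description.
        rules = revised + [llm.pool(best)]
    return max(rules, key=lambda r: llm.score(r, train_pairs))
```

Any object exposing `generate_rule`, `score`, `feedback`, `revise`, and `pool` can drive the loop, which is why the whole thing works in plain English: every one of those calls is just a prompt.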

Speaker 0

不知道你怎么想。

I don't know about you.

Speaker 1

不,不。我同意。这让我想起三个月前安德烈·卡帕西(Andrej Karpathy)那条爆红的推文,他说,'结果新的头号编程语言竟然是'

No, no. I agree. And it reminds me of Andrej Karpathy's viral tweet from three months ago where he goes, the new number one programming language turned out to

Speaker 0

'英语'。确实如此,对吧?

be English. It is. Right?

Speaker 1

我想再次强调这件事的重要性。这不仅仅是又一项基准测试的突破——我说的是那个刚被两个无名研究者将成绩提升三倍的最难测试基准。明白吗?这些题目是AI模型从未见过的全新难题集。

And to emphasize again how important this is: this isn't just another frontier breakthrough on another benchmark. I'm talking about the hardest benchmark, which has just been 3x'ed by two random researchers. Right? These are, again, puzzles from problem sets that an AI model has never seen before.

Speaker 1

通常AI模型参加基准测试时都有上下文参考。就像你在学校考试时可以看历年真题和教材,知道会考哪些主题。

Typically when you put an AI model up against a benchmark, it has some kind of context. It's kind of think of yourself taking an exam at school or at university. You can look at past papers. You can look at books. You kind of know what topics they're gonna talk about.

Speaker 1

但对AI模型来说这完全陌生,因此是最严苛的考验。能有系统达成这点,乔什...虽然我不愿这么说,但不得不承认——这近乎通用人工智能(AGI)。连埃隆本人都对此震惊不已,他在推文里说:'我现在认为xAI有望通过Grok 5实现AGI,此前从未这么想过'。

This is completely foreign to an AI model, and therefore it is the hardest test. So to have something achieve this almost feels like, and Josh, I hate to say it, but I have to say it, like AGI. And none other than Elon himself was taken completely aback by this. To reiterate the tweet: "I now think xAI has a chance of achieving AGI with Grok 5. Never thought that before."

Speaker 1

而他最新表态更耐人寻味:'顺便说,Grok5几周后开始训练,年底前就会面世'。这恰恰说明了本次突破的重大意义。

And the fact that he is now saying, hey, by the way, Grok 5 starts training in a few weeks, and, you know what, I think it's gonna be out by the end of this year, just speaks to the importance of this development.

Speaker 0

是啊,最让我震惊的是实现如此高水平所需的资源竟如此之少。虽然他们用了独特训练框架,但技术上并非革命性创新。我最终得出的结论就是规模效应——那些最新顶级模型单次查询/单token成本极高,企业资源有限难以普惠。这让我不禁思考:当资源雄厚的公司全力投入这类强化学习(比如Grok5),并以足够压缩高效的方式实现规模化部署,既不导致系统崩溃,又无需收取每月上千美元会费时...我认为Grok5很可能就是这样——强化学习升级版,但更高效,专为规模化而生。

Yeah. I think one of the things that was really startling for me was realizing how few resources it takes to get this good. And then I was wondering why, because clearly this isn't anything super novel, although they did use some unique training frameworks. The conclusion I came to was scale. The cost per query, the cost per token of these new super high-end models is very high, and you can't really scale that to a lot of people because the companies are resource-constrained. So it leads me to wonder: what happens when a company with a lot of resources dedicates all of its brainpower to this specific type of reinforcement learning, like we're going to see with Grok 5, and does so in a way that's compressed and efficient enough to actually run at scale on the servers without melting everything down, and without charging a thousand dollars a month per membership? I think that's probably what we'll see with Grok 5: this new juiced-up reinforcement learning, but efficient, and actually built for scale.

Speaker 0

而且,我的意思是,即便它只是以这两位独立研究者的规格发布,那已经是巨大的成功了,因为这确实非常了不起。

And, I mean, even if it just launches at the specs of these two individual researchers, that's a huge win, because that's incredible.

Speaker 1

而且它是开源的。开源且对所有人开放。

And it's open source. It's open source and available for everyone.

Speaker 0

嗯哼。这确实相当了不起。是的。所以我认为接下来会有非常有趣的事情发生。如果我是个赌徒,我会在Grok五上押重注。

Uh-huh. It's it's pretty remarkable. Yeah. So I think very interesting things coming. If I was a betting man, I would be betting big on Grok five.

Speaker 0

我认为他们非常清楚地看到了人们真正想要的解决方案。

I think they very much see a solution that people really want.

Speaker 1

我刚才在想,为什么我们俩都觉得这个进展既惊人又令人恼火。我想某种程度上是因为我们都认为,要让今天的AI模型达到AGI(通用人工智能),必须彻底重新设计它们的架构。你知道,Transformer是重大突破,这就是为什么我们现在熟知和使用的模型如此聪明,但它们还没有我们预期的那么聪明。而且之前有种改进停滞的感觉。

I was just thinking about why both of us find this development amazing but also really annoying. And I think it's because, to some degree, we both believed that for today's AI models to get to AGI, we would need to completely rearchitect how they're designed. You know, transformers were the big breakthrough; that's why the models we know and use today are so smart. But they're not as smart as we expected, and there was this kind of lag in improvement.

Speaker 1

现在我们突然看到了三倍的提升,这个模型正在打破领先的基准测试。所以我现在开始相信,也许如果我们把数千亿美元投资在后训练阶段——传统上我们一直投资于预训练和算力——但如果我们转而投资后训练,或许能在不彻底重构整个系统的情况下就实现AGI。Josh,你觉得有道理吗?还是我听起来疯了?

And now we suddenly see a 3x improvement, where this model is breaking the leading benchmark. So I'm starting to believe that maybe, if we invest hundreds of billions of dollars in the post-training part, where typically we've been investing in pre-training and compute, we may actually reach AGI without redesigning the entire thing up front. Does that resonate with you, Josh, or do I sound crazy?

Speaker 0

确实有道理。有趣的是我们经常录制节目时期待被惊喜,然后当事情发生时你会想,天啊,我没想到会以这种方式被震惊。这就是其中之一,我确实没预料到会在主流模型发布间隙看到来自独立研究者的新领跑者。

It does. It's funny, because we frequently record the show and you expect to be surprised, and then something happens and you're like, oh, I wasn't expecting to be surprised in that way. This is one of those things: I wasn't expecting to see a new leader, in between major model releases, from an independent researcher.

Speaker 0

所以这件事居然真的可行,完全颠覆了我的许多预期。而且这还不是XAI团队本周唯一的重磅消息,因为他们刚刚发布了新模型Grok-4 Fast的警报。

So the fact that this is even possible really blows the doors off a lot of expectations I had. And this isn't even the only interesting news this week from the xAI team, because they released a new model alert: Grok 4 Fast.

Speaker 1

让我告诉你,Ejaz,

Let me tell you, Ejaz,

Speaker 0

当我看到这个模型的运作方式时,我的反应是:哇哦。再次被彻底震撼,印象超级深刻。

when I saw how this model worked, I was like, woah. This is, again, blown away, super impressed.

Speaker 1

能给我们快速梳理下亮点吗?

Can you run us through some highlights, please?

Speaker 0

参数表是吧。首先最重磅的是,200万token的上下文窗口简直离谱。目前领先的应该是谷歌的Gemini模型,他们有2.5 Pro和Flash两个版本。

The spec sheet, yeah. So first of all, the leading headline: a 2,000,000-token context window is outrageous. I think the current leader is Google with the Gemini models; they have 2.5 Pro and Flash.

Speaker 0

我记得这两个版本都是100万token。而这个模型能达到200万token上下文。对于不了解的人来说,上下文本质上就是语言模型的实时记忆体,能收集的上下文越多,模型对当前讨论数据的理解就越清晰。想要这个数字更大?目前这个直接翻倍,是迄今为止最大的。

Both of them, I believe, have a million tokens. This is 2,000,000 tokens of context. For those who aren't aware, context is basically the active memory of a language model: the more context you can collect, the more clarity it has into the data it's actually talking about. You want that number to be bigger, and this is the biggest by far, double the previous best.

Speaker 0

这是第一个重要亮点。第二个可能更夸张:成本比Grok-4低47倍。这太疯狂了,因为你看下面的对比图表,稍微往下滚动就会发现,Grok-4原本和其他顶级模型处于同一水平线。而这个Grok-4 Fast仅次于o3模型,高于DeepSeek、高于Claude-4 Sonnet、高于Claude-4 Opus——这个卓越的模型性能超越了许多领先产品,但成本却只有基础版的1/47。我觉得当我们讨论模型扩展和编程应用时,这会非常有意思。上周我们刚讨论过Grok模型在编程方面的优势,就是因为它既便宜又高效。

So that's a really important headliner. The second one is probably even more outrageous: 47 times cheaper than Grok 4. Which is crazy, because when you look at the scale below, if you scroll down just a little, Grok 4 is right in line with every other great model. Grok 4 Fast is just beneath o3, above DeepSeek, above Claude 4 Sonnet, above Claude 4 Opus; it's this remarkable model that is better than a lot of the leading models but 47x cheaper than the base model. And I think that's gonna be pretty interesting when we get into scaling these models and using them for code. We talked last week about how good the Grok model was for coding, because it was so cheap and so effective.

Speaker 0

这又是一个类似的案例,他们采用的方式让我非常着迷——他们竟然能想出这种‘秘密配方’。本质上,他们教会模型只在需要时消耗算力使用工具。他们通过大规模强化学习快速训练Grok4,使其学会何时深入思考,何时快速回答问题。最终效果如屏幕所示,平均比前代模型减少了40%的思考标记消耗,这是重大突破。哦对了,它现在高居LM Arena榜首,简直疯狂。

This is another case of that. The way they did it, I was so interested in how they came up with the secret sauce: basically, they taught the model to spend its brainpower on tools only when it helps. They used large-scale reinforcement learning to train Grok 4 Fast to choose when to think in depth and when to answer questions quickly. What that resulted in, as we're seeing here on screen, was about 40% fewer thinking tokens on average than the previous model, which is a significant difference. Oh, and by the way, it's number one on LM Arena. So this was crazy.
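One way to picture "think only when it helps" is a reward that trades answer quality against reasoning length. This is a toy caricature under our own assumptions, not xAI's actual training objective; the function name and penalty value are made up:

```python
def reward(correct: bool, thinking_tokens: int, penalty: float = 1e-4) -> float:
    """Toy RL reward: full credit for a correct answer minus a small
    cost per thinking token. Maximizing this pushes a policy to reason
    at length only when the extra tokens flip a wrong answer to a
    right one."""
    return (1.0 if correct else 0.0) - penalty * thinking_tokens

# Long reasoning pays off only when it changes the outcome:
print(reward(True, 200) > reward(True, 4000))   # True: shorter correct wins
print(reward(True, 4000) > reward(False, 200))  # True: correctness dominates
```

Under an objective shaped like this, spending thinking tokens on an easy question only lowers the reward, which is one intuition for where the reported 40% reduction could come from.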

Speaker 0

Ejaz,看到团队发布这个成果时你是什么反应?

Ejaz, what were your reactions when you saw the team drop this?

Speaker 1

我原本以为这些标记已经够便宜了。记得OpenAI发布GPT-5时吗?他们一直炫耀GPT-5 mini说:看,你现在能用上代顶级模型的性能,实际上更智能,而且我记得便宜了五倍左右。当时我震惊到爆粗口。

I already thought these tokens were cheap. I thought these models were cheap enough. Do you remember when OpenAI released GPT-5? They kept flexing GPT-5 mini, saying, hey, you now have the power of our previous best model, but it's actually more intelligent, and I think it was something like five times cheaper. I was like, holy shit.

Speaker 1

天啊!我当时觉得这简直是数量级的突破。现在居然比Grok四便宜47倍?要知道Grok四相比某些前沿模型本来就不贵。我...我真的不知道极限在哪里,但宏观来看,我从未像现在这样确信:尖端超智能将普及到每个人手中。

Holy crap. I was like, that is a crazy magnitude. And now we've got 47x cheaper than Grok 4. Grok 4, by the way, was already cheap compared to some of the frontier models. So I don't know how far this can go, but zooming out, I have never been more confident than now that cutting-edge superintelligence will be available for anyone and everyone.

Speaker 1

这不会成为只有富人能买的封闭技术。想象一下变革性的影响——哪怕在荒郊野外,某人用手机连着马斯克的新5G星链卫星,借助这个廉价却超级智能的模型,都可能创造出全世界使用的发明。这种可能性太疯狂了。

This isn't gonna be some kind of closeted technology where only the rich can buy devices and run it. I think anyone and everyone will have fair access to this, and think about the dynamics that changes. You could have someone in the complete middle of nowhere with a cell phone attached to Elon Musk's new 5G Starlink satellite beaming down to him, and he could produce something the world ends up using, because he has access to this cheap model that is actually superintelligent and can be used to create whatever crazy invention he or she dreams up. I just think this is insane.

Speaker 0

没错。效率提升总是最让我兴奋的,随着标记成本降低、模型更轻量化,手机不联网也能承载世界知识。这种浓缩高效的进化令人难以置信。我特别想对比其他模型,谷歌其实也在做类似研究——看看Gavin Baker这篇帖子,图表清晰显示谷歌如何主导帕累托前沿,Gemini Pro在某些方面确实出色。

Yeah. The efficiency improvements are always the most exciting thing to me, because as tokens get cheaper and models become more portable and lightweight, you could have the world's knowledge on your phone, even without an Internet connection, because these models are getting so lightweight, so condensed, so effective. It's unbelievably impressive. And what I was really interested in is comparing this to the other models, because I know Google was doing a similar thing; they were leading along the frontier. And here, here's this post from Gavin Baker that I love, because it shows how Google has kind of dominated this thing called the Pareto frontier. On the chart, you can very clearly see this arc that hugs the outer bounds of all of the models, and it shows that Gemini Pro has been really good on a few things.

Speaker 0

我想简单聊聊帕累托前沿概念,它能解释为什么Grok 4 Fast如此突出。有趣的是,我查资料发现这个理论来自意大利经济学家维尔弗雷多·帕累托——顺便分享这个小彩蛋。

So I briefly want to talk about the Pareto frontier concept, because it's really interesting, and it will explain exactly why Grok 4 Fast is way out there; it totally shattered the frontier. It's funny, I was doing a little research on this: the Pareto frontier is named after an Italian economist, Vilfredo Pareto. I just thought that's a fun fact.

Speaker 0

这个名字很棒。本质上,它源自经济学家和决策理论,是一种在同时追求多个目标时决定最佳权衡的方法。想象一下,你试图优化两个可能略有冲突的目标,比如你想让产品尽可能强大,同时又要尽可能便宜,就像这些模型。在这个场景中,存在一组最佳解决方案,你无法在不牺牲另一个方面(如成本)的情况下改进一个方面(如性能)。我们在这张图表中看到的是,谷歌做出了一系列这样的决策和权衡,最终实现了沿着这条外沿的绝对帕累托最优结果。

Great name. Basically, it comes from economics and decision theory, and it's a way to decide optimal trade-offs when you have multiple objectives you're trying to achieve at the same time. Imagine you're trying to optimize two things that might conflict a little, like making a product as powerful as possible but also as inexpensive as possible, like these models. In this scenario, there's a set of best possible solutions where you can't improve one aspect, like the power, without making the other aspect, like the cost, worse. What we're seeing in this chart is that Google has made a series of those decisions, those trade-offs, that have led to the Pareto-optimal outcome along this outer band.
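The idea is easy to make concrete: treat each model as a (cost, score) point; a model sits on the Pareto frontier if no other model is at least as cheap and at least as good, with at least one strict improvement. A minimal sketch with made-up numbers (the model names and figures are illustrative, not real benchmark data):

```python
def pareto_frontier(models):
    """Return the names of models not dominated on (cost, score).
    A model is dominated if some other model costs no more AND scores
    no less, with at least one strict improvement."""
    frontier = []
    for name, cost, score in models:
        dominated = any(
            c <= cost and s >= score and (c < cost or s > score)
            for _, c, s in models
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical (cost per 1M tokens, benchmark score) points:
models = [("A", 10.0, 60), ("B", 2.0, 55), ("C", 0.5, 50), ("D", 3.0, 40)]
print(pareto_frontier(models))  # ['A', 'B', 'C']; D is dominated by B
```

"Shattering the frontier" in the chart's terms just means adding a point, like Grok 4 Fast, that dominates several models previously on that arc.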

Speaker 0

Grok所做的实际上是提出了一种新的权衡,这不一定是一种权衡,更像是一种创新,使他们能够突破这个看似极限的外沿限制,彻底打破它,并利用这些最佳因素创造出一个新的帕累托最优权衡。他们通过大量‘魔法’实现了这一点,但本质上他们现在拥有一个非常聪明的模型,其性能实际上高于Gemini 2.5 Flash,且与Pro模型相差不远,但成本却低了一个数量级。我认为,在广泛分发这些代币时,这种成本效益的异常值确实令人难以置信。因此,如果你在编写代码、开发应用程序,或者只是为代币付费,这显然是你想要使用的模型。

What Grok has done is make a new trade-off that isn't really a trade-off; it's more of an innovation that let them unlock this perceived frontier, this limiting factor on the outer band, shatter it, and create a new Pareto-optimal curve. They did that with a lot of magic, but basically they now have a really smart model that sits above Gemini 2.5 Flash, and not too far below the Pro model, while being an order of magnitude cheaper. That's where this outlier, this cost-effectiveness, is really unbelievable when it comes to distributing these tokens widely. So now, if you're writing code, if you're creating an application, if you're paying for tokens, this is very clearly the model you want to use.

Speaker 1

你刚才描述的就是埃隆和XAI真正开辟了一条新路,这非常符合埃隆一贯的行事风格。

What you just described is Elon and XAI literally charting a new path, which is kind of like, very behavioral of Elon in general.

Speaker 0

嗯。

Mhmm.

Speaker 1

另一个我觉得非常酷的地方是,强化学习基础设施团队在使这个模型变得如此快速、廉价和高效方面起到了关键作用。对吧,Josh?他们使用了一种类似代理框架的方法,这与他们最初用于训练和迭代该模型的基础设施高度兼容。我想指出的是,Josh,我们本期讨论的两个主题之间存在一个共同点。第一,当我们描述研究人员创建的那两个打破Arc AGI基准的模型时,他们特别使用了一种新的强化学习技术。

And another thing I thought was really cool about this is that the reinforcement learning infrastructure team was key to getting this model as fast, cheap, and efficient as we're describing. Right, Josh? They used this kind of agent framework, which was extremely compatible with the infrastructure they used to train and iterate on the model in the first place. And what I wanted to point out is that there's a theme between the two topics we've discussed so far on this episode, Josh. Number one, when we described the two models the researchers created that broke the ARC-AGI benchmark, they specifically used a new reinforcement learning technique.

Speaker 1

如果你还记得,杰里米·伯曼之所以选择Grok 4,是因为他说这是最好的推理模型,因为它通过强化学习的方式进行了训练。现在我们再次看到这个Grok快速模型因其强化学习而取得的成就。因此,我发现或注意到一个主题,即XAI和埃隆基本上是强化学习领域的领导者,我认为这可能会对他们有利。也许这是一个暗示,即最接近AGI、最快、最便宜的模型都植根于那些完全突破性的强化学习技术中。

And the reason, if you remember, why Jeremy Berman picked Grok 4 specifically was that he said it was the best reasoning model because of the way it had been trained via reinforcement learning. And now we're seeing, yet again, this Grok 4 Fast model achieving what it can because of reinforcement learning. So I'm noticing a theme here: xAI and Elon are basically the leaders in reinforcement learning, which I think is going to play in their favor. Maybe it's a hint that the models closest to AGI, the quickest and the cheapest, are rooted in reinforcement learning techniques that are completely breakthrough.

Speaker 0

是的。看起来这个团队确实从第一性原理出发思考问题——这是埃隆的核心观念之一——他们真正关注什么是重要的、什么是有意义的。确实如此。确实如此。而且你在整个产品的发展过程中都能看到这一点。

Yeah. It seems like the team really does reason from first principles, a core Elon notion, about what's important and what matters. That's true. That's true. And you're seeing that throughout the entire product as they advance.

Speaker 0

我认为最令人兴奋、也是我对这期节目最期待的是比较下一轮模型的表现。比如Gemini三和Grok五,它们将如何相互竞争?因为这两者都将是非凡的模型。在我看来,它们目前是顶尖的选手。至于GPT五,某种程度上算是有点令人失望。

And I think what's really exciting, what I'm most stoked about for this show in general, is comparing this next round of models. Like Gemini 3 and Grok 5: how are they going to compete with each other? Because those are both going to be remarkable models, and it seems to me like those are currently the top dogs. GPT-5, by comparison, was a bit of a miss.

Speaker 0

Anthropic最近比较低调。Gemini和XAI则势头正猛。但在今天结束前还有最后一条新闻要分享。

Anthropic's been a little bit quiet. Gemini and xAI are on fire. But there was one last piece of news before we sign off today.

Speaker 1

我正想强调这句话给正在收听的听众——上面写着'我们建立了这个强化学习基础设施团队,采用全新智能体框架来加速训练Grok四,但特别目的是为了充分利用Colossus二的算力'。如果我没记错的话,Josh,关于Colossus二有些突发新闻,埃隆还卷入了一些争执。你能详细说说吗?

Well, I was gonna say, I'm highlighting this sentence here for those who are just listening. It says: we built this reinforcement learning infrastructure team with a new agent framework to help train Grok 4 Fast, but specifically so that we can harness the power of Colossus 2. And if I remember correctly, Josh, there was some breaking news around Colossus 2. Elon was getting into some fights. Can you walk us through it?

Speaker 0

是的,这很有趣。SemiAnalysis发布了一份关于XAI数据中心建设的报告,他们工作非常出色,我强烈推荐大家关注。看到这份报告很有意思,因为通常你只能看到卫星图片或新闻标题,并不清楚实际情况。

Yeah, it's funny. There was this whole report from SemiAnalysis, which does a really great job; I highly recommend checking them out. They released this report on the xAI data center build-out, and it was so funny to see, because a lot of the time you just see satellite pictures or read headlines, and you're not really sure what's going on.

Speaker 0

SemiAnalysis的独特价值在于实地考察,结合卫星图像,以科学工程视角解读现状。他们在文章中分享了发现,其中一个故事特别有趣——它生动展现了XAI团队的行事风格:在田纳西州孟菲斯市,他们因能源问题遭遇居民投诉和许可审批困难,而能源正是所有大型AI数据中心的核心。于是他们决定:既然田纳西不欢迎,我们就去密西西比。

The whole point of SemiAnalysis is to actually have boots on the ground, check the satellite images, and look at it from a scientific engineering point of view, where they actually understand what is going on. They shared their findings in one of these articles, and I found one of the stories so funny, because it's such a testament to how the xAI team works. They were having problems with their energy generation in Memphis, Tennessee, because people were complaining and they were having a tough time getting permits, and the core crux of every large AI data center is energy. So they were like, this is unacceptable, we need energy immediately. So what did they do?

Speaker 0

他们直接跨越州界,在几英里外的密西西比州建造了新发电机,顺利获得许可。报道中有张照片显示他们把输电线拉回田纳西州给数据中心供电。文章还提到正在建设的Colossus二——其规模将达到超1吉瓦的能耗(相当于数十万家庭用电量),配备海量GPU集群,所有设备将进行协同计算。据我所知,这些资源将专门用于训练新一代Grok五模型。

Well, they jumped over the state line into Mississippi, a couple of miles down the road, and built these new generators right across the border, where they got the permits they needed. They said, you don't want us, Tennessee? We'll just go over to Mississippi. You can see here, they took the power lines and ran them back into Tennessee, and now they're powering the data center. So part of the article was this funny story, but part of it was also about Colossus 2 being built and the sheer scale it's going to be: over a gigawatt of energy, which, I don't know how many hundreds of thousands of homes that would power, but it's a remarkable amount of power and a tremendous number of GPUs. They're planning to make these all coherent, and they're using them, I believe, exclusively to train this new Grok 5 model.

Speaker 0

随着这个新训练中心投入使用,他们将用这台尖端的世界最大超级计算机来训练可能是全球最强的模型。但最有趣的是,这篇文章发布当天,另一家巨头公司的CEO发文说:'等等,我们现有的Colossus一其实比你们的更大些'——这话来自微软CEO萨提亚·纳德拉。

So as this new training center comes online, they will be using this cutting-edge, world's-largest supercomputer to train perceivably the world's best model. But I found this funny: the day this article came out, there was a post from the CEO of another very prominent company saying, hey, wait a second, we currently have something a little bigger than Colossus 1. And that was from Microsoft's CEO, Satya Nadella.

Speaker 0

是啊。他发了个帖子说他们刚刚新增了超过两千兆瓦的新能源产能。所以,Ejaz,这简直是那些建造越来越大规模AI数据中心的人之间的一场疯狂混战,最终导致了今天早些时候爆出的大新闻。但在我们谈到那个天文数字之前,你有什么评论吗?

Yeah. He had this post where he said they just added over two gigawatts of new energy capacity. So, Ejaz, this is just a really crazy brawl between these people who are building larger and larger AI data centers, and it eventually leads to the big news that dropped a little earlier today. But do you have any commentary before we get to the huge number?

Speaker 1

有的。其实有件事我想指出,当埃隆首次宣布建造这个Colossus二代数据中心时,它因耗资200亿美元登上头条,所有人都觉得疯了。人们高喊这是AI资本支出泡沫,没有产品能证明这些投资合理。而现在微软CEO萨提亚·纳德拉宣布,他可能要投入两倍资金来新建两千兆瓦产能——再次印证了训练新模型对能源和算力的需求确实存在。

Yeah. There's actually one thing I wanted to point out, which is that when Elon first announced he was building out this Colossus 2 data center, it made headlines that it cost $20,000,000,000, and everyone thought it was crazy. Everyone was yelling, this is an AI CapEx bubble; there are no products that prove all this investment makes sense. And now you have Satya Nadella, CEO of Microsoft, announcing that he's probably going to be investing twice as much as that to build two gigawatts of new capacity, again validating that there is a need for energy and compute to train these new models.

Speaker 1

别忘了微软上周收购了一家欧洲数据中心,我记得大概花了100亿美元,导致其股价翻了三倍——因为按当时报道,那家数据中心本身不值这个价。这进而引出了今早更重磅的公告:英伟达未来几年将向OpenAI投资不是1亿、不是10亿、不是20亿,而是1000亿美元。你可能会问为什么?因为OpenAI要投资建设的数据中心将产生巨大能量——具体多少千兆瓦我说不准。

Don't forget that Microsoft last week acquired a random European data center for, I think it was about $10,000,000,000, which caused its stock price to 3x, because the data center itself wasn't worth that much at the time of the reporting. And then it leads us to the even bigger announcement, which dropped this morning: NVIDIA will be investing not one, not 10, not 20, but $100,000,000,000 in OpenAI over the next couple of years. And you might be asking why. Well, it's because OpenAI is going to be investing in so many data centers that are going to draw so much power. I don't know how many gigawatts.

Speaker 1

我想实际上是10千兆瓦,是Colossus二代的10倍,萨提亚项目的5倍——给在场的数学爱好者们。这太疯狂了。Josh,我们是在泡沫中,还是确实需要这一切?

I think it's actually 10 gigawatts, which is 10x Colossus 2 and 5x Fairwater, which is Satya Nadella's thing, for all my mathematician fans out there. It is just crazy. Josh, are we in a bubble, or is there a need for all of this?
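The ratios quoted here can be sanity-checked with a quick back-of-the-envelope script. The capacity figures are just the rough numbers mentioned in the conversation (about 1 GW for Colossus 2, about 2 GW for Microsoft's Fairwater addition, 10 GW for the OpenAI/NVIDIA buildout), not official specs:

```python
# Back-of-the-envelope check of the capacity ratios quoted in the episode.
# All figures are the speakers' rough numbers, not official specifications.
colossus_2_gw = 1.0   # xAI's Colossus 2: "over a gigawatt"
fairwater_gw = 2.0    # Microsoft's ~2 GW of newly added capacity
openai_gw = 10.0      # the OpenAI/NVIDIA buildout

print(openai_gw / colossus_2_gw)  # → 10.0, i.e. "10x Colossus 2"
print(openai_gw / fairwater_gw)   # → 5.0, i.e. "5x Fairwater"
```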

Speaker 0

问题在于,关于泡沫的讨论我一直摇摆不定。因为1000亿美元实在离谱,就为了让本已卓越的语言模型更卓越?产品确实很棒,但至少对我个人用户而言,使用场景已接近饱和——模型略微变聪明,体验提升有限。这是一派观点;另一派则认为,这可能是未来唯一值得投资的领域。

So here's the thing. I keep going back and forth on the bubble conversation, because $100,000,000,000 is such an outrageous amount of money to spend on making what is already a remarkable language model even more remarkable. The product is great, and at least for me personally, as a user of these products, I'm definitely getting closer to a wall in terms of what I use them for: if a model is marginally smarter, my experience doesn't get that much better. So that's one school of thought. The other is thinking, well, this is probably the only thing we'll ever need to spend money on going forward, ever.

Speaker 0

所以现在倾注全部资源是合理的。因为一旦实现AGI(人工通用智能),获得超级智能,它就能解决所有问题,并教会你提出更好的问题来解决更高级的问题。按当前改进轨迹来看,似乎确实应该把所有可用资金都投入提升算力。随着我们从太阳能、核能获取更多能源,这些新增的能源和算力将不断用于打造更好的AI,进而深刻改变社会运作方式。

So it makes sense to throw all of it at it now. Because in the case that you do solve AGI, you get superintelligence, it solves all of your problems, and it gives you the better questions to ask in order to solve better problems. So, assuming we continue on this trajectory of improvement, it would appear to make sense to take every disposable dollar you can and put it toward better and better compute. And this will probably just extend forever: as we are able to harness more energy from the sun and from nuclear, a lot of that new energy and compute will just go to making better AI, which will then serve better downstream effects for how society works.

Speaker 0

所以长期来看是泡沫吗?我认为绝对不是。短期呢?不好说。收入从哪来?我也不知道。

So is it a bubble on the long term? I think absolutely not. On the short term, I don't know. Where do you get the revenue from? I don't know.

Speaker 0

我是说,这确实是一大笔钱,但是

I mean, it's a ton of money, but

Speaker 1

你知道吗?我认为你我之所以对这些基础设施投资的巨额数字与实际所见之间产生脱节感,是因为在通用人工智能(AGI)到来之前,其他领域或职业会先见证它的出现。比如编程就是最明显的例子。编程领域的AI进步速度呈指数级增长,远超其他任何AI功能。

You know what? I think the reason you and I feel this dissociation between how large these infrastructure-investment numbers are and what we're actually seeing is that we're not going to see AGI before some other fields or professions see it first. Right? The clear example is coding. Coding has been on an absolutely exponential improvement rate that has beaten out any other AI feature ever.

Speaker 1

现在已经有AI模型的编程能力堪比资深工程师——这些工程师年薪可达30万到50万美元。所以我猜这笔投资是值得的。我预计这些投资将在我们看不见的行业、应用场景和职业中结出果实,但或许我们能讨论或观察到其影响。比如科学领域可能诞生治愈癌症的新药,诸如此类。

You now have AI models that can code as well as a senior staff engineer, who is getting paid like 300 to 500k. So my guess is this investment is worth it. My guess is the investment is going to come to fruition in professions, in use cases, in jobs that we won't see directly, but that we'll maybe talk about, or see the effects of. Maybe it's in science, where we create a new drug that cures cancer, or whatever that might be.

Speaker 1

我认为不同类型的专业人士会比普通消费者更早接触AGI,并率先从这些投资中获益。

I think different types of professionals will see AGI, and reap the rewards of these investments, before average consumers see it.

Speaker 0

嗯。

Mhmm.

Speaker 1

另外我想补充的是,乔什,这不仅限于美国或西方国家的投入。实际上,我们的竞争对手中国和亚洲其他国家过去五年一直在推进这项工作。他们建设了大规模数据中心,未来五年总容量预计将达300吉瓦。他们对此投入如此巨大,所以这绝非西方独有的现象。

And then the other thing I want to mention, Josh, is that this isn't specific to US or Western spending. In fact, our foes overseas, in China and elsewhere in Asia, have been working on this for, like, the last five years. They've been building out massive data centers, which I think will add up, in aggregate, to something like 300 gigawatts over the next five years, at least. They've been investing in this so heavily. So it's not just a Western thing.

Speaker 1

亚洲同样如此。中国正大力投资这个领域。如果这是泡沫,如果我们完全错了,这将成为全球有史以来最惨痛的失败案例。这不只是美国的问题,也不仅是微软的问题。

It's an Asian thing as well. China's investing so heavily in this. If this is a bubble, if we are completely wrong, this will be the biggest, highest-profile L the world has ever taken. It's not just going to be a US thing. It's not just going to be a Microsoft thing.

Speaker 1

这不仅仅是Sam Altman个人的事。这将是一场全民参与的事件,有点像世界末日那种级别。

It's not just going to be a Sam Altman thing. It's going to be an everyone's-involved type of thing, kind of like a world-ending event.

Speaker 0

没错,大到不能倒。所以我特别喜欢这种激励机制——所有人都被激励着让它成功,因为每个人在技术暴露风险面前都是平等的。这样我晚上才能安心入睡,至少中美在追求AGI(通用人工智能)这点上是一致的,他们都想要最聪明的模型。

Yeah. Too big to fail. I do love this incentive structure, where everyone is incentivized to make it work because everyone is equally at risk in terms of their exposure to the technology. So I think I can sleep at night knowing that the US and China are at least aligned on one thing: they want to achieve AGI. They want the smartest models.

Speaker 0

他们会竭尽所能让投资获得回报。所以,嘿,祝他们好运。不过Ejaz,今天是不是该收尾了?还有其他内容吗?

They're going to make their money pay off the best they can. So, hey, all the power to them. But is that a wrap for today, Ejaz? Have we got anything else?

Speaker 1

可以收工了。

That is a wrap.

Speaker 0

好,我们这期关于x AI的小节目就到这里。其实有个数据我想趣味验证下——1吉瓦(gigawatt)的电力。根据Grok的说法,1吉瓦能为75万到85万户美国家庭供电。我的天...

That's it. That's a wrap on our little xAI mini-episode. There was one fact I wanted to do a fun little fact-check on, which is a gigawatt. According to Grok, one gigawatt powers approximately 750,000 to 850,000 average US homes. So, oh my god.

Speaker 0

我们讨论的规模是极其庞大的吉瓦量级。英伟达这个项目就用了10吉瓦,意味着高端估算下,一个数据中心就能供应850万户美国家庭用电。希望这计划能成功吧。目前看来Grok势头正猛,XAI团队火力全开,正处于模型迭代期。

The scale we're talking about is a tremendous number of gigawatts. I mean, this NVIDIA project is 10 of those, which means that, on the high end, about eight and a half million US homes could be powered by a single data center. So we're going to hope this works out. Right now, it seems like Grok is cooking. The xAI team is on fire, and they are in between models.
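The "eight and a half million homes" figure follows directly from Grok's homes-per-gigawatt range quoted above; a minimal sketch of that conversion, using only the numbers mentioned in the episode:

```python
# Rough conversion from data-center capacity to US households powered,
# using the 750k-850k homes-per-gigawatt range quoted from Grok.
homes_per_gw_low, homes_per_gw_high = 750_000, 850_000
project_gw = 10  # the 10 GW NVIDIA/OpenAI buildout

low = project_gw * homes_per_gw_low
high = project_gw * homes_per_gw_high
print(f"{low:,} to {high:,} homes")  # → 7,500,000 to 8,500,000 homes
```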

Speaker 0

我已经等不及想看到他们新的Colossus训练集群上线了,微软那个巨型集群也是。微软啊,你养着那么大的集群在干嘛呢?快亮出你的数据,让我们看看成绩单!

I cannot wait until they get this new Colossus training cluster up, or even Microsoft's. I mean, Microsoft's got a huge cluster. What are you doing with it, dog? Let's see your stats. Let's see your numbers.

Speaker 0

在阿卡迪亚排行榜上挂个数字。不过,是的,我想这就是关于XAI所有有趣、激动人心的新内容的总结。评论区都在说买能源股。对,买买能源股。

Put a number up on the Arena leaderboard. But, yeah, I think that's a wrap on all the fun, exciting new things about xAI. The comment section is saying: buy energy stocks. Yeah, buy energy stocks.

Speaker 0

我们阅读了所有评论。我逐条读过每条评论,也尽量回复。所以希望你能分享对节目的看法,或者你认为目前AI竞赛中谁领先?嗯哼。

We read all the comments. I read every single comment, and I try to reply to them too. So I would love for you to share either what you think about the show, or who you think is winning this AI race currently.

Speaker 0

比如,我们是不是有点像是患了'埃隆失调症'?对他建造的一切都过分着迷?还是说这种感觉其实挺靠谱的?我觉得我们有充分证据表明他们做得不错。所以想听听你是否同意,评论区可以热闹一下。

Like, do we have Elon derangement syndrome? Are we just obsessed with everything he builds, or is this actually pretty grounded? I feel like we have some good examples of how well they're doing. So I'd love to hear whether you agree or disagree. That would be a fun little thing for the comments.

Speaker 0

总之,今天的节目就到这里。本周还有几期精彩内容,系好安全带。下一期...我想埃贾兹和我,可能还会请位嘉宾,到时候我们可能会激烈辩论。幸好是远程录制,因为下期说不定会'见血'。

But, anyway, that's a wrap on today's episode. We have a couple more exciting ones coming this week, so buckle up. For the next one, I think it's Ejaz, myself, and we might even have a guest for that episode. We'll probably be in an all-out brawl. It's good that we're recording remotely, because blood could possibly be drawn next episode.

Speaker 0

所以准备好迎接那期吧,这周有很多值得期待的。但本期就到这里,一如既往感谢观看。别忘了订阅、点赞、评论,这些常规操作。

So, yeah, buckle up for that one. There's lots to look forward to this week. But that's it for this episode. Thank you so much for watching, as always. Please don't forget to subscribe, like, comment, all the fun things.

Speaker 0

分享给你的朋友,我们下期见。

Share it with your friends, and we will see you guys on the next one.

关于 Bayt 播客

Bayt 提供中文+原文双语音频和字幕,帮助你打破语言障碍,轻松听懂全球优质播客。
