本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
今天在AI每日简报中,为您呈现2025年十大AI新闻。
Today on the AI Daily Brief, the 10 biggest AI stories of 2025.
AI每日简报是一档每日播出的播客和视频节目,聚焦AI领域最重要的新闻与讨论。
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
好了,朋友们,在开始之前先做个快速公告。
Alright, friends, quick announcements before we dive in.
首先,感谢今天的支持方:毕马威、Super Intelligent、Robots and Pencils 以及 Blitzy。
First of all, thank you to today's sponsors: KPMG, Super Intelligent, Robots and Pencils, and Blitzy.
要获取无广告版本的节目,请前往patreon.com/aidailybrief,或在Apple播客上订阅。
To get an ad free version of the show, go to patreon.com/aidailybrief, or you could subscribe on Apple Podcasts.
如需了解有关节目、赞助、演讲等所有相关信息,请访问aidailybrief.ai。
And for all of the information that you could possibly be looking for about the show, sponsorship, speaking, etcetera, go to aidailybrief.ai.
现在,我们正处于年终报道的初期阶段。
Now we are in the early stages of our end of year coverage.
从今往后,我们的每一期节目都将回顾过去或展望未来。
From here on out, all of our episodes will be either looking back or looking forward.
今天,我们从2025年十大AI新闻开始。
And today we're starting with the 10 biggest AI stories of 2025.
这些新闻并没有按排名顺序排列。
Now these are not in ranked order.
相反,我将它们按线性与叙事相结合的方式排列,但当我提到我心目中的年度最大新闻时,我会特别指出。
Instead, I've put them in a combination of a linear and narrative sequence, but I will call out when I hit my vote for the biggest story of the year.
我们首先来谈谈今年的第一大新闻,那就是DeepSeek R1发布引发的轩然大波。
And we're gonna kick off with the very first big story of the year, which was the absolute hullabaloo around the release of DeepSeek R1.
早在2024年,DeepSeek的模型就开始引起人们的关注。
Now DeepSeek started to have models that people were paying attention to back in 2024.
但在今年一月,当他们发布首个推理模型R1时,所有人都为之震惊。
But in January, when they released their first reasoning model, R1, everyone stood up and took notice.
这背后有几个原因。
There were a couple of reasons for that.
首先,当所有美国实验室都在花费数亿甚至数十亿美元训练模型时,DeepSeek却声称R1的训练成本仅需几百万美元。
First of all, while all the American labs were spending hundreds of millions, if not billions of dollars to train their models, DeepSeek was saying that R1 was trained for just a few million dollars.
此外,DeepSeek 还在发布模型的同时推出了自己的聊天机器人应用。
On top of that, however, alongside the model, DeepSeek also released their very own chatbot app.
这款应用迅速登顶应用商店排行榜,甚至一度取代了 ChatGPT。
And it rocketed to the top of the app store charts, even displacing ChatGPT for a while.
当市场试图消化这一消息时,AI 股票出现了大幅抛售。
As markets tried to digest the news, there was a deep sell off of AI stocks.
英伟达在单日内损失了 5930 亿美元的市值。
Nvidia lost $593,000,000,000 in market cap in a single day.
这是股票历史上单日最大的市值损失。
It was the single biggest one-day market cap loss in stock market history.
当然,市场后来恢复了,但 DeepSeek 的这个故事为全年其余时间的诸多趋势奠定了基础。
Now, of course, markets recovered, but this DeepSeek story set up so many of the themes that would shape the rest of the year.
我们稍后会讨论的一个趋势是推理能力的崛起。
One that we'll discuss in a few minutes is the rise of reasoning.
DeepSeek 应用如此受欢迎的部分原因在于,当时 OpenAI 虽然已经发布了他们的 o1 推理模型,且 o1 在性能上仍领先于 DeepSeek R1,但 o1 完全被设为付费墙后的内容。
Part of what made the DeepSeek application so popular was that while OpenAI had released their o1 reasoning model at that point, and while o1 remained ahead of what you could get with DeepSeek R1, o1 was at the time entirely behind a paywall.
因此,绝大多数人从未见过推理模型。
So the vast majority of people had never seen a reasoning model.
他们对DeepSeek应用中展示的推理过程以及结果的差异化质量都感到欣喜。
They were delighted both with the reasoning traces that DeepSeek exposed in their app as well as just the differentiated quality of the results.
当然,那次市场动荡预示了我们过去五个月围绕AI泡沫争论所经历的一切。
Of course, that market squirm would portend everything that we've been dealing with for the past five months around the AI bubble debate.
从长远影响的角度来看,DeepSeek确实证明了一件事:中国模型在性能上已非常接近西方闭源模型,远超大多数人年初的预期。
And from a lasting legacy perspective, one thing that was absolutely true about DeepSeek was that Chinese models were much closer and nipping on the heels of western closed source models than the vast majority of people had thought coming into the year.
这一趋势贯穿了全年,像Kimi、Qwen以及后续的DeepSeek模型都跻身于最优秀模型之列。
That has played out throughout the year with models like Kimi and Qwen as well as later DeepSeek models being right up in the thick of things as some of the best models available.
你可以看到Kimi K2和DeepSeek 3.2紧随Gemini 3、GPT-5.2和Opus 4.5之后,但领先于其他几乎所有模型。
You can see Kimi K2 and DeepSeek 3.2 behind Gemini 3, GPT-5.2, and Opus 4.5, but ahead of pretty much everything else.
它还引发了一场关于美国对华政策的持续辩论,这一辩论在全年中不断演变。
It would also kick off a back and forth debate around the appropriate US policy vis a vis China that has continued to be dynamic throughout the year.
而最新的一项重大变化是,特朗普政府决定允许英伟达向中国出售H200芯片——这是我们多年来允许出口到中国的最先进的芯片。
With the latest big change, of course, being the Trump White House deciding to allow Nvidia to sell H200 chips into China, the most advanced chip we've allowed to be sold to China in a number of years.
总的来说,DeepSeek 的故事在2025年伊始就一鸣惊人,此后势头从未减弱。
All in all, the DeepSeek story started 2025 off with a bang, and it has not let up ever since.
我们今年的第二个重大AI故事也始于一月,那就是大规模的AI基础设施建设。
Our second big AI story for the year also kicked off in January, which was the massive AI infrastructure build out.
这一切起初看似平淡无奇,只是OpenAI和几个合作伙伴如软银、MGX和甲骨文宣布,计划在未来四年投资五千亿美元在美国建设AI基础设施。
It started oh so innocently, just OpenAI and a couple of friends like SoftBank, MGX and Oracle announcing their intention to invest a half trillion over the next four years to build AI infrastructure in The United States.
该计划名为‘Project Stargate’,于1月21日星期二在白宫公布,时任总统特朗普出席了发布会,现场还有甲骨文创始人拉里·埃里森、OpenAI首席执行官萨姆·阿尔特曼以及软银首席执行官孙正义。
The initiative was called Project Stargate, and it was announced at the White House on Tuesday, January 21, with President Trump in attendance alongside Oracle founder Larry Ellison, OpenAI CEO Sam Altman, and SoftBank CEO Masayoshi Son.
当然,自那以后,AI基础设施相关的投资合作在整个一年中持续增加。
Now, of course, since then, the AI infrastructure deals have done nothing but increase throughout the year.
我们看到各大云服务商的资本支出和扩张规模大幅增长,微软、谷歌、亚马逊、Meta等几乎所有主要公司都上调了2025年和2026年的资本支出预期。
We have seen a massive amount of hyperscaler CapEx and expansion with basically every major company, Microsoft, Google, Amazon, Meta, all increasing their guidance around their CapEx for '25 and '26.
我们还见证了黑石、微软、MGX等公司共同发起的全球AI基础设施投资伙伴关系,这是一个规模达一千亿美元的投资平台,专注于数据中心及其电力供应。
We saw initiatives like the Global AI Infrastructure Investment Partnership between BlackRock, Microsoft, MGX, and others, which was a $100,000,000,000 investment vehicle focused on data centers and the electricity to power them.
我们还看到了埃隆·马斯克的XAI公司启动的Colossus扩建计划,该公司试图将其现有算力从十万块GPU扩展到一百万块甚至更多。
We had Elon Musk's XAI Colossus expansion, which sees that company attempting to scale from their current 100,000 GPUs to a million GPUs or more.
当然,随着所有这些数据中心的建设,能源需求也随之增加,催生了如谷歌与NextEra能源公司的合作——该协议旨在建设配备自备发电设施的吉瓦级数据中心园区,其电力来源得益于对核能的投资。
And of course, with all this data center build out, there are also going to be energy requirements, leading to announcements like the Google and NextEra Energy partnership, an agreement to develop gigawatt-scale data center campuses with on-site power generation, thanks to an investment in nuclear.
正如我们之前讨论的,这一主题贯穿了全年,直到夏季末期,它一直是推动股价上涨的主要因素。
Now, as we discussed, this was a theme throughout the year, and right up until the end of the summer, it was a major theme driving up stock prices.
但随后,甲骨文与OpenAI的交易出现了。
But then came the Oracle and OpenAI deal.
在九月,甲骨文宣布,在截至8月31日的财季中,其新增了3170亿美元的未来合同收入。
In September, Oracle revealed that it had added $317,000,000,000 in future contract revenue during its quarter that ended August 31.
这导致该公司股价飙升高达43%,一度使拉里·埃里森的净资产甚至超过了埃隆·马斯克。
That led the company's stock price to surge by as much as 43%, temporarily pushing Larry Ellison's net worth above even Elon Musk's.
几天后,当人们发现OpenAI是这约3000亿美元收入的主要客户时,市场开始变得更为谨慎。
Then, a couple days later, when it was revealed that OpenAI was the customer driving about $300,000,000,000 of that, markets started to get a little bit more nervous.
而这自然引出了我们今年的下一个重大话题——人工智能泡沫之争。
And this of course brings us to our next big story of the year which is the AI bubble debate.
如果我们仅从讨论热度,尤其是主流媒体的关注度来看,这无疑是今年最大的人工智能话题。
Now if we were just looking for what theme or topic was most discussed, particularly in mainstream media, for sure, this is the biggest AI story of the year.
正如我所说,至少从为此耗费的笔墨数量来看。
Like I said, at least in terms of the amount of sheer ink spilled on it.
每周甚至到现在,都源源不断出现与AI泡沫辩论相关的文章。
Even now, every week sees an endless stream of AI bubble debate related articles.
有趣的是,许多关注点都集中在甲骨文与OpenAI的这笔大交易,以及他们为建设所承担的债务上。
And interestingly, a lot of the focus is on Oracle, that big deal with OpenAI, and the debt that they're taking on to finance the build out.
泡沫讨论中的一个关键主题是收入的循环性。
One of the key themes of the bubble conversation is the circularity of revenue.
我相信你一定见过这张图表,它展示了包括微软、OpenAI、英特尔、甲骨文、英伟达、XAI和AMD在内的主要公司之间错综复杂的投资与客户关系网络。
I'm sure you've seen some version of this chart which shows the dense web of investment and customer relationships between major companies including Microsoft, OpenAI, Intel, Oracle, NVIDIA, XAI, and AMD.
对一些人来说,这看起来像一座纸牌屋。
Now to some, this screams house of cards.
对另一些人来说,这展现了推动经济大规模AI化的密集关系网络。
To others, it shows the dense web of relationships that is driving the mass AIification of the economy writ large.
AI泡沫的讨论如此普遍,以至于现在已有专门的维基百科词条,其中还包含关于这种循环融资的章节。
AI Bubble Talk is so ubiquitous that it now has its very own Wikipedia entry, complete with a section on that circular financing.
这部分之所以如此引人入胜且富有共鸣,是因为它在短期内无法被证实或证伪。
Now part of what makes this such a juicy and resonant theme is that it's one that's impossible to prove or disprove in the short term.
换句话说,即使我们正处于人工智能泡沫之中,这种泡沫以OpenAI未能履行这些大额交易财务义务等方式显现并产生问题,也不会在短期内发生。
In other words, even if we are in the midst of an AI bubble, the way that that would be manifest and problematic in terms of, for example, OpenAI missing financial obligations with these big deals is not coming to bear in the short term.
这意味着,市场参与者试图将他人拉向自己对世界的看法时,叙事之争就有了肥沃的土壤。
That means that it's ripe territory for narrative debates as market actors try to drag participants to their view of the world.
之前我提到过一个不错的资源,如果你对这个故事感兴趣,可以参考《Exponential View》发布的繁荣与泡沫监测工具。
Now one good resource that I pointed to before if you are interested in this story comes from Exponential View who put together a boom and bubble monitor.
这个工具源于一篇博文,他们分析了五个历史性的金融泡沫指标:经济压力、行业压力、收入增长势头、估值热度和融资质量,并将它们转化为实时追踪器。
This came out of a blog post where they looked at five historic indicators for financial bubbles: economic strain, industry strain, revenue momentum, valuation heat, and funding quality, and then turned them into a live tracker.
目前,他们认为我们仍牢牢处于繁荣阶段,五个指标中只有一个(即行业压力)处于红色区域。
Now at this stage, they argue we are still firmly in boom territory, with only one of the five gauges in the red, which is industry strain.
尽管如此,这里仍有许多值得关注的地方,这是一个绝佳的资源。
That said, there is a lot to watch here and it's a great resource.
你可以在 boomerbubble.ai 上找到它。
You can find it at boomerbubble.ai.
现在我们转向下一个故事,这个故事我不得不勉强提及。
Now moving on to our next story, one that I have to begrudgingly include.
如果AI泡沫辩论是今年最受争议的话题,那么今年被引用最多的媒体内容,让我非常不情愿地说,就是那份声称95%的生成式AI试点项目都失败了的麻省理工学院报告。
If the AI bubble debate was the most debated topic of the year, the most referenced media of the year, to my great chagrin, was the MIT report that argued that 95% of generative AI pilots are failing.
在我的‘十大事件’笔记中,我把这个称为企业采用与麻省理工的谎言。
Now, in my notes about the 10 biggest stories, I called this enterprise adoption and the MIT lie.
虽然我已经多次谈论过这份麻省理工报告,但我想再次,为了记录在案,在这个回顾节目中,彻底拆穿它那彻头彻尾的荒谬之处。
And while I've talked about the MIT report a lot, I do want to one more time and for posterity as part of this recap episode, rip it to shreds for the utter garbage that it is.
这样做的主要原因有两个。
Two big reasons for that.
首先是方法论问题。
First of all, the methodology.
其次是分析中蕴含的惊人且错误的逻辑跳跃。
And second of all, the incredible and incorrect leaps of logic that are embedded in the analysis.
首先,从方法论角度来看,这项研究——我用尽全力打上最大的、最强烈的引号——考察了几个方面。
So first of all, from a methodology perspective, this study, which I say in the biggest, most aggressive air quotes I can manage, looked at a couple of things.
首先,它查看了近期提及人工智能的上市公司的财报,以确定是否有公司提到收入增长。
First, it looked at recent earnings reports of public companies that mentioned AI to see if any of them talked about revenue acceleration.
然后,它将此与大约50次随意的高管访谈配对,这些高管似乎是他们能够接触到的。
It then paired that with around 50 convenience-sample interviews with random executives they apparently had access to.
这就是这项研究的全部方法论。
This is the entire methodology for this thing.
这不仅是一个极其薄弱的数据来源,而且认为一家公司若在财报中未提及AI带来的收入增长就意味着其试点项目失败,这种想法简直荒谬至极。
Not only is that a radically underwhelming data source, but the idea that an organization not mentioning revenue gains from AI in a report means that their pilots are failing is absolutely ludicrous.
再者,人们可能会认为,既然有麻省理工学院这样声望卓著的机构背书,声称95%的AI试点项目失败,那么他们一定是询问了一大批企业AI负责人,他们的试点是否成功,结果95%的人表示失败。
Again, one would think that with a headline backed by someone as prestigious as MIT that says that ninety five percent of pilots are failing, you would assume that they asked a bunch of enterprise AI leaders if their pilots were succeeding or failing and 95% of them said that they were failing.
对吧?
Right?
但事实并非如此,这仅仅是根据财报中未明确提及收入增长而做出的推断,别无其他。
But no, this is an inference from a missing articulation of revenue gains in earnings reports and nothing more.
如果我们暂时对研究作者保持一点善意,他们显然没有预料到这项研究会产生如此巨大的影响。
Now if we can be charitable to the study authors for a moment, they obviously didn't know that it was going to have the impact that it had.
它被卷入了一件比这份报告本身大得多的事情中。
And it became caught up in something that was much bigger than just the one report.
然而,坦率地说,这有损麻省理工学院的声誉。
However, frankly, it still didn't befit the MIT name.
我认为他们应该为自身思维的质量感到羞愧。
And I do think they should be embarrassed at the quality of their thinking.
现在,我把我的讲台收起来,不再继续下去了,但我们必须承认,这份报告之所以引起如此强烈的共鸣,是有原因的。
Now packing my soapbox away for the rest of the episode, we do have to acknowledge that there was a reason that this report was so resonant.
它出现时,正值一个环境已经成熟,多种因素共同作用,使这份报告带有了一种确认偏误的色彩。
It came into a ready and waiting environment where a combination of factors made this report have an element of confirmation bias.
首先是市场开始转向,这似乎完美地证明了原因所在。
The first was that markets were starting to turn, and this seemed like perfect evidence of why.
在报告发布后的头几周里,大量传播来自华尔街分析师和投资者,他们将其作为评估AI市场的一部分。
A huge part of the amplification that happened in the first couple weeks after this was announced came from Wall Street analysts and investors who were talking about it as part of their assessment of AI markets.
但第二点是,暂且搁置AI泡沫的争论。
But the second thing was hold aside the AI bubble debate.
从企业角度来看,2025年获得的许多经验都围绕着这样一个主题:要真正掌握人工智能并从中获得价值,仅仅在员工面前简单地部署一个聊天机器人是远远不够的。
A lot of the learnings of 2025 from an enterprise perspective were around the theme that to be good at AI and to really get the value out of this technology, it was going to take more than just dropping a chatbot on top of your people.
显然,成熟的企业从未认为这会如此简单。
Obviously, sophisticated organizations never thought it was going to be that simple.
但当这项研究发布时,人们开始普遍认识到:要充分释放人工智能的价值,我们必须以更宏大、更全面、更系统的方式思考问题。
But at the time this study came out, there was the beginning of a broad recognition that, okay, to really get the full value out of AI, we're going to have to think in bigger, more comprehensive and systemic terms.
我们必须重新设计系统。
We're going to have to redesign systems.
我们必须解决数据准备问题。
We're going to have to address data readiness.
我们必须思考我们赋予智能体的上下文环境。
We're going to have to think about the context that we give agents.
而这正是它真正产生实质影响的部分。
And that is the real substantive piece that it interacted with.
然而,如果你想知道2025年企业采用人工智能的真实故事,那就是:尽管我刚才提到的那些认知和领悟都在发生,人们意识到要真正充分释放人工智能和智能体的价值仍需更多努力,但实际的采用率依然在稳步攀升。
Still, if you want to know the actual story of enterprise adoption over the course of 2025, it was that even as all that learning and realization I was just mentioning was happening, that to really get the full value out of AI and agents it was going to take more, adoption still just kept steadily climbing.
而且采用率不仅在稳步攀升,这些人工智能实施还带来了实际价值。
And not only was adoption steadily climbing, the AI implementations that were happening were leading to value.
在我们的AI投资回报率基准研究中,我们发现约44%的用例报告了适度的投资回报率,约38%的用例报告了显著或变革性影响的高投资回报率。
In our AI ROI benchmarking study, we found that around forty four percent of use cases were reporting modest ROI and about thirty eight percent were reporting high ROI of either significant or transformational impact.
仅有5%的用例报告了负投资回报率。
Only five percent were reporting negative ROI.
请记住,负投资回报率并不意味着失败。
And keep in mind, negative ROI does not mean failure.
负投资回报率意味着尚未实现投资回报,即在短期内,资源投入仍高于由此带来的收益。
Negative ROI means not having reached positive ROI yet, where the outlay of resources is still higher than the gain from that outlay in the short term.
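The arithmetic behind that distinction is worth making concrete. Here is a minimal sketch with made-up numbers for a hypothetical pilot; neither figure comes from the benchmarking study:

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment as a fraction of cost: (gain - cost) / cost."""
    return (gain - cost) / cost

# Hypothetical pilot: $80k of measured gains on a $100k outlay.
# ROI is negative, meaning the gains haven't yet caught up with
# the outlay, which is not the same thing as the pilot failing.
early = roi(80_000, 100_000)
assert early < 0

# The same pilot later, once cumulative gains reach $150k.
later = roi(150_000, 100_000)
assert later > 0
```

The point of the sketch is that a negative number early on can simply be a pilot that has not yet amortized its setup cost.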
但如果你观察那些积极使用人工智能的领导者,2025年他们对这项技术价值的乐观情绪持续上升。
But if you look at leaders who are interacting with AI, 2025 saw their optimism about the value of this technology go nothing but up.
对比毕马威2024年与2025年的全球首席执行官研究:2024年,大多数首席执行官(63%)表示,他们预计将在三到五年内看到人工智能的投资回报。
Comparing KPMG's global CEO study from 2024 to 2025: in 2024, the majority of CEOs (63%) said that they expected to see ROI from AI in three to five years.
20%的人认为将在一到三年内实现,16%的悲观者认为将超过五年。
20% thought it would be one to three years, and 16% of pessimists thought it was going to be more than five years.
到2025年,这一时间点已大幅提前。
By 2025, that had pulled forward massively.
2025年受访的首席执行官中,有三分之二认为他们将在一到三年内看到回报。
Two thirds of CEOs surveyed in 2025 thought that they would see ROI within one-three years instead.
19%的人表示,回报仅需六个月到一年即可实现。
19% said that it was just six months to a year away.
如今,认为需要五年以上才能实现回报的人不到2%。
And now less than 2% thought it was going to take more than five years.
听好了,我认为理解这份麻省理工学院的报告为何尽管存在缺陷,却引发如此强烈反响,是很有价值的。
Look, I do think it is worth understanding why this MIT report, as bad as it was, struck such a nerve.
但当你层层剥离后,2025年企业采用AI的故事其实是:采用率更高、回报开始显现,并真正认识到要获得下一阶段的价值,还需要付出更多努力。
But when you peel the layers away, the story of enterprise adoption in 2025 is more adoption, starting ROI, and a real recognition that to get the next set of value, it's going to take more work.
当然,AI领域存在炒作,但毕马威正在将AI的潜力转化为商业价值。
Sure, there's hype about AI, but KPMG is turning AI potential into business value.
他们已将AI嵌入企业各个部门的智能代理中,以提升效率、改善质量,并为客户和员工创造更好的体验。
They've embedded AI in agents across their entire enterprise to boost efficiency, improve quality, and create better experiences for clients and employees.
KPMG 已经亲身体验过,现在他们可以帮你做到同样的事。
KPMG has done it themselves, now they can help you do the same.
了解他们的旅程如何加速你的进程,请访问 www.kpmg.us/agents。
Discover how their journey can accelerate yours at www.kpmg.us/agents.
本期节目由我的公司 Super Intelligent 赞助。
Today's episode is brought to you by my company, Super Intelligent.
Super Intelligent 是一个 AI 规划平台。
Super Intelligent is an AI planning platform.
而现在,随着我们步入 2026 年,我们所合作的企业普遍展现出一个明确的主题:要让 2026 年成为大规模部署 AI 的一年,而不仅仅是更多试点和实验。
And right now, as we head into 2026, the big theme that we're seeing among the enterprises that we work with is a real determination to make 2026 a year of scaled AI deployments, not just more pilots and experiments.
然而,许多合作伙伴正卡在某个 AI 平台期。
However, many of our partners are stuck on some AI plateau.
可能是治理方面的问题。
It might be issues of governance.
可能是数据准备方面的问题。
It might be issues of data readiness.
也可能是流程映射的问题。
It might be issues of process mapping.
无论哪种情况,我们都在推出一种名为‘突破瓶颈’的新评估方式,正如你从这个名字中猜到的那样,它的目标是突破AI的瓶颈。
Whatever the case, we're launching a new type of assessment called Plateau Breaker that, as you probably guessed from that name, is about breaking through AI plateaus.
我们将部署语音代理来收集信息,诊断阻碍你突破瓶颈的真实瓶颈所在。
We'll deploy voice agents to collect information and diagnose what the real bottlenecks are that are keeping you on that plateau.
在此基础上,我们会制定一份蓝图和行动计划,帮助你顺利跨越瓶颈,实现规模化部署和真正的投资回报。
From there, we put together a blueprint and an action plan that helps you move right through that plateau into full scale deployment and real ROI.
如果你希望了解更多关于'突破瓶颈'(Plateau Breaker)的信息,请发送邮件至 contact@bsuper.ai,邮件主题请注明'plateau'。
If you're interested in learning more about Plateau Breaker, shoot us a note at contact@bsuper.ai with plateau in the subject line.
AI不是一个一次性项目。
AI isn't a one off project.
它是一种需要随着技术发展不断演进的伙伴关系。
It's a partnership that has to evolve as the technology does.
Robots and Pencils 与客户并肩合作,在自动化、个性化、决策支持和优化的各个阶段推动实用型AI的落地。
Robots and Pencils work side by side with clients to bring practical AI into every phase: automation, personalization, decision support, and optimization.
他们通过实际实验验证有效的方法,并构建能够放大人类潜力的系统。
They prove what works through applied experimentation, and build systems that amplify human potential.
作为AWS认证合作伙伴并拥有全球交付中心,Robots and Pencils 将广泛覆盖与高度定制化服务相结合。
As an AWS certified partner with global delivery centers, Robots and Pencils combines reach with high touch service.
当其他人交接后离开时,他们始终持续参与。
Where others hand off, they stay engaged.
因为伙伴关系不是一份项目计划,而是一种承诺。
Because partnership isn't a project plan; it's a commitment.
随着人工智能的发展,他们的解决方案也将不断进步。
As AI advances, so will their solutions.
这才是长期价值。
That's long term value.
进步始于找到正确的合作伙伴。
Progress starts with the right partner.
从 robotsandpencils.com/aidailybrief 开始,与 Robots and Pencils 一同前行。本集由 Blitzy 赞助播出,Blitzy 是一个拥有无限代码上下文的企业级自主软件开发平台。
Start with Robots and Pencils at robotsandpencils.com/aidailybrief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context.
Blitzy 使用数千个专用AI代理,持续思考数小时,以理解包含数百万行代码的企业级代码库。
Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise scale code bases with millions of lines of code.
企业工程领导者在每个开发冲刺开始时都会使用Blitzy平台,提交他们的开发需求。
Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements.
Blitzy平台会提供一份计划,然后为每个任务生成并预编译代码。
The Blitzy platform provides a plan, then generates and pre compiles code for each task.
Blitzy 自主完成超过80%的开发工作,同时为完成冲刺所需的剩余20%人工开发工作提供指导。
Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint.
公开公司采用Blitzy作为预IDE开发工具,并结合其首选的编码助手后,工程效率提升了五倍,从而将AI原生的软件开发生命周期引入组织。
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI native SDLC into their org.
访问 blitzy.com 并点击获取演示,了解Blitzy如何将您的SDLC从AI辅助转变为AI原生。
Visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI assisted to AI native.
今年AI领域的下一个重大故事,必然是AI人才大战。
Our next major story of the AI year has to be the AI talent wars.
一直以来,AI领域的人才都备受重视。
Now talent was always valued in AI.
这从来就不是一个问题。
That was never a question.
多年来,大型实验室的顶尖研究人员一直拿着非常非常高的薪水,这会让大多数人感到极其满意。
Top researchers inside the big labs have for a number of years been making very, very hardy salaries that would make most people extremely happy.
然而,今年年中,随着各实验室之间对人才的竞争加剧,薪资水平开始攀升至新的极端高度。
However, around the middle of this year, that started to get to new extreme levels as competition between the labs for talent started to ratchet up.
其中一部分原因是实验室的衍生公司带着人才一起离开。
Now, a little bit of that was spinouts from the labs who were bringing people along with them.
OpenAI 前首席技术官米拉·穆拉蒂成立了自己的 Thinking Machines 实验室,并带走了大批人才。
OpenAI's former CTO, Mira Murati, started her own Thinking Machines lab, bringing a bunch of talent with her.
另一位前 OpenAI 领导者伊利亚·苏茨克维尔创办了 Safe Superintelligence,同样从其他实验室挖走了大量人才。
Another former OpenAI leader, Ilya Sutskever, started his Safe Superintelligence, once again recruiting a bunch of talent away from the other labs.
但真正升温的是今年夏天中期,马克·扎克伯格开始为他的超级智能实验室招募人才。
But where things really heated up was the middle of the summer when Mark Zuckerberg started recruiting for his superintelligence lab.
开始有报道称,开出的薪酬条件简直疯狂至极。
Reports started coming in of just absolutely crazy offers.
六月,萨姆·阿尔特曼表示,Meta曾向一些OpenAI员工提供高达一亿美元的薪酬,当时他吹嘘说没人接受这个报价,但这种情况没持续多久。
In June, Sam Altman said that Meta had offered some OpenAI staff up to $100,000,000, bragging at the time that no one had taken Meta up on the offer, although that wouldn't last for long.
从那以后,这些数字变得越来越离谱。
And the numbers just got crazier from there.
我们开始看到越来越多的九位数报价,人们开始将这些薪酬与职业运动员的收入相比较。
We started to see more and more of those 9 figure offers, and people started making the comparison to professional athletes.
红杉资本甚至发表了一篇文章,题为《为什么AI实验室正变得越来越像体育球队》。
Sequoia even wrote a piece called Why AI Labs Are Starting to Look Like Sports Teams.
如今,这在很大程度上以Meta收购Scale AI达到高潮,这笔交易耗资150亿美元,似乎主要是为了将Scale的首席执行官亚历山大·王招至麾下,领导他们的超级智能实验室。
Now in many ways, this culminated with the sort-of-but-not-exactly acquisition of Scale AI by Meta, which cost Meta $15,000,000,000 and seemed like mostly a way for them to get their hands on Scale CEO Alexandr Wang to lead that superintelligence lab.
尽管秋季关于九位数交易的惊人头条新闻有所减少,但AI人才争夺战仍在激烈进行。
And while the insane headlines about 9 figure deals may have died down over the course of the fall, the AI talent wars continue apace.
最近,我们看到一些现有公司的人才被大量挖走,尤其是苹果公司,由于其AI战略陷入困境,现在很难留住人才。
More recently, what we've been seeing is the gutting of some incumbents, particularly Apple, who are having an extremely hard time keeping talent right now as their AI strategy flounders.
我们将看到这一切如何在2026年之前尘埃落定,但我的猜测是,随着明年临近,人才将继续成为所有实验室之间争夺的关键战场。
Now we'll see how this all shakes out heading into 2026, but my guess is that talent is going to continue to be a key battleground for all these labs as we head into next year.
从这里开始,我们将转向一些更关注人工智能实质内容的故事,而非其市场和生态系统。
From here we move into some stories that are a little bit more about the substance of AI, rather than the market and the ecosystem around it.
下一个故事是如此普遍且无处不在,以至于它可能根本不像一个故事,因为它就是我们一整年所处的现实。
The next story is one that is so ubiquitous and surrounding us that it might not even seem like a story as it was just our reality throughout the year.
这就是我所说的‘推理能力的崛起’。
And that's what I'm calling the rise of reasoning.
我在讲述DeepSeek的故事时提到,其应用之所以迅速登上应用排行榜榜首,是因为这是大多数免费AI用户——显然他们占了绝大多数——首次使用推理模型。
I mentioned back in the DeepSeek story that a big part of why their app rocketed to the top of the app charts was that it was the first time that most free AI users, which obviously represents the vast majority of them, had used a reasoning model.
当然,一旦你使用过推理模型,就很难再回去了。
And of course once you use a reasoning model, it is very hard to go back.
在年底时,我们从OpenRouter获得了一些相关数据。
Towards the end of the year we got some numbers around this from OpenRouter.
OpenRouter是一个平台,允许开发者将应用程序连接到多种大语言模型,这意味着他们不必被锁定在某个特定生态系统中,而是可以根据不同需求、模型宕机或其他原因自由切换模型。
OpenRouter is a platform that allows developers to connect their applications to a variety of LLMs, meaning that they don't necessarily have to be locked into one ecosystem, but there can be model switching based on different needs, or based on the models going down, or whatever the reason is.
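The routing-with-fallback pattern described here can be sketched in a few lines. This is a generic illustration of the idea, not OpenRouter's actual API; the function and provider names are invented for the example:

```python
from typing import Callable

def call_with_fallback(prompt: str,
                       providers: list[tuple[str, Callable[[str], str]]]) -> tuple[str, str]:
    """Try each (model_name, call_fn) in order; return the first success.

    A provider call is expected to raise on outage or quota errors,
    which triggers the switch to the next model in the list.
    """
    last_error: Exception | None = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as err:  # model down, rate limited, etc.
            last_error = err
    raise RuntimeError(f"all models failed: {last_error}")

# Demo with fake providers: the first model is "down", so the
# router transparently falls back to the second one.
def flaky_model(prompt: str) -> str:
    raise ConnectionError("model unavailable")

def backup_model(prompt: str) -> str:
    return f"answer to: {prompt}"

used, reply = call_with_fallback("hello", [("model-a", flaky_model),
                                           ("model-b", backup_model)])
```

The design point is that the application keeps working through a single interface even when an individual model is unavailable, which is what avoids ecosystem lock-in.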
在这一年中,经过100万亿(100,000,000,000,000)个token的消耗,从年初几乎为零的起点开始,推理类token如今已占总消耗量的50%以上。
And over the course of the year and 100,000,000,000,000 tokens, from a starting point of basically zero at the beginning of the year, reasoning tokens now represent over 50% of the total consumed.
如果你在今年下半年使用过 Gemini 2.5 Pro、Claude 3.7 之后的版本、Gemini 3 或 GPT-5,基本上任何模型,你很可能默认使用的就是推理模型。
If you used Gemini 2.5 Pro, or Claude after 3.7, or Gemini 3, or GPT-5, or basically any model in the second half of this year, chances are that by default you were using reasoning models.
尽管如此,我之所以要特别提出这一点,是因为虽然对我们这些在AI领域的人来说这显而易见,但推理模型与非推理模型之间的区别,对外行用户来说未必广为人知。
Now that said, and the reason that I wanted to call this out as an explicit story, is that while this may be obvious to us in the space, the difference between reasoning and non reasoning models is not necessarily widely known outside of AI users.
伊桑·莫里克教授引用了一项最近的研究,该研究发现临床领域的大型语言模型虽然能轻松通过医学考试,但在真实的临床任务中表现却很差。
Professor Ethan Mollick referenced a recent study that found that clinical LLMs could ace medical exams but at the same time perform weakly on realistic clinical tasks.
问题是,这项研究使用的是 GPT-4 和 Claude 3 Opus。
The problem is that the study was using GPT-4 and Claude 3 Opus.
伊桑写道:我不愿反复提起这一点,但研究在评估AI能力时,不能把推理模型和早期模型混为一谈。
Ethan wrote: I hate to keep bringing this up, but studies cannot lump reasoners with earlier models when considering AI abilities.
尽管研究并不总是必须使用最新模型,但它们应当测试模型能力随规模扩大的趋势,以预测未来的发展。
And while studies don't always need to use the latest models, they should test to see if there are trends in ability as model sizes scale, to anticipate the future.
当然,推理模型所开启的正是今年的另一个重大故事,如果让我选一个今年最重要的故事,那毫无疑问就是‘氛围编程’的兴起与普及。
Now of course, part of what the reasoning models opened up is our next big story of the year, and the one that if I had to commit to a single biggest story of the year would absolutely be my number one, is the emergence and growing ubiquity of vibe coding.
天啊,关于氛围编程,还有什么没被说过的呢?
Man, what to say about vibe coding that hasn't already been said?
它最初起源于如此朴素的形态。
It started with such humble origins.
它始于二月份安德烈·卡帕西的这条推文。
It started with this tweet back in February from Andrej Karpathy.
他说,出现了一种新的编程方式,我称之为‘氛围编程’,你完全顺应氛围,拥抱指数级进步,甚至忘记代码的存在。
He said, there's a new kind of coding I call vibe coding, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.
这是因为大语言模型变得太强大了。
It's possible because the LLMs are getting too good.
当我遇到错误信息时,我会直接复制粘贴,不做任何评论。
When I get error messages, I just copy paste them in with no comment.
通常这样就能修复问题。
Usually that fixes it.
代码已经超出了我通常的理解范围。
The code grows beyond my usual comprehension.
我得花很长时间仔细阅读它。
I'd have to really read through it for a while.
有时大模型无法修复bug,我就绕过它,或者随机要求一些修改,直到问题消失。
Sometimes the LLMs can't fix a bug, I just work around it or ask for random changes until it goes away.
我正在构建一个项目或网页应用,但这其实不算编程。
I'm building a project or a web app, but it's not really coding.
我只是看看东西、说说内容、运行一下,再复制粘贴一些代码,大部分时候都能正常工作。
I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
当然,‘氛围编程’只是对更广泛的AI和智能体驱动编程的一种简略说法。
Now, of course, Vibe coding was shorthand for a much broader array of AI and agentic enabled coding.
我们看到了Lovable和Replit等消费级应用的迅猛增长,但同时也见证了Cursor和Cognition等工具的崛起——这些工具专为专业开发者设计,支持AI驱动的智能体编程。
We saw massive growth in consumer apps like Lovable and Replit, but then we also saw the rise of Cursor and Cognition, tools for AI-enabled and agentic coding aimed at professional developers.
普遍公认的是,编程成为生成式AI最重要的应用场景,这一点在数据中得到了充分体现。
Pretty much universally, it's acknowledged that coding became the single most important use case of GenAI, which was expressed in the numbers.
Menlo Ventures在其年度企业AI研究报告中发现,55%的部门级AI支出——约40亿美元——可归因于编程相关用途。
Menlo Ventures in their annual study of enterprise AI found that 55% of departmental AI spend, about $4,000,000,000, could be attributed to coding.
排名第二的是IT领域,支出为7亿美元;Replit和Lovable的年经常性收入(ARR)均突破了1亿美元,并持续增长。
The next highest category was IT at $700,000,000. Replit and Lovable both surged past $100,000,000 in ARR and have continued to grow.
与此同时,Cursor 的收入正逼近8亿美元,使这些公司成为历史上增长最快的营收公司之一。
Meanwhile, Cursor is closing in on $800,000,000 in ARR, making these companies some of the fastest growing revenue companies in history.
事实上,Vibe Coding 变得如此普遍,以至于到年底,专业开发者和软件工程师内部及周边的讨论已经发生了一些转变。
Indeed, Vibe Coding became so ubiquitous that by the end of the year, the conversation had shifted a little bit inside and around professional developers and software engineers.
这一群体现在在许多情况下正面临 Vibe Coding 的负面影响,比如需要大量的代码审查、产生的技术债务,以及关键编程技能的退化。
That group is now in many cases wrestling with the downsides of vibe coding, whether it's the amount of review that's required or technical debt that gets created or the atrophy of key coding skills.
除了这些问题之外,还有如何设计现代 AI 编程栈的疑问。
On top of those issues, there's also just questions of how to design the modern AI coding stack.
人们希望在多大程度上、在什么情境下获得超快速的 AI 辅助,而不是完全自动化地替他们完成大块的编码工作?
How much and in what context do people want super fast AI assistance versus full automation that does big chunks of the coding work for them?
无论如何,今年发布的模型都表明,对于大型模型实验室而言,编程用例是最重要的,几乎所有公司都认为它不仅对解锁编程市场至关重要,也是让 AI 能力拓展到其他通用场景的关键。
Whatever the case, model releases throughout the year have shown that for the big model labs, there is nothing more important than the coding use case, with basically all of them seeing it as key not only to unlocking the coding market, but as key to making AI capable for other general use cases.
我认为,随着我们进入明年,Vibe Coding 的讨论将开始出现分化。
I think as we head into next year, we're going to start to see a fork in the vibe coding conversation.
目前,我们仍在谈论专业软件工程师和软件工程团队所使用的 AI 驱动和智能体编码,使用的术语与非技术人员首次接触代码时所用的完全相同。
Right now, we're still talking about AI enabled and agentic coding that professional software engineers and software engineering organizations are doing with the same set of terminology and in the same breath as what nontechnical people are doing with code for the first time.
我不认为这些真的是同一件事,我认为这些讨论将会逐渐分化。
I don't think those are really the same thing, and I think those conversations are going to break apart a bit.
而且说实话,尽管今年Vibe Coding已经无处不在,但它在2026年将要产生的影响甚至更大。
I also think, frankly, that as ubiquitous as vibe coding was this year, the impact that it is poised to have in 2026 is even greater.
换句话说,我不认为这只是一个会消融在其他所有AI应用中的趋势。
In other words, I don't see this as a trend that will dissipate into all the other things that you can do with AI.
相反,我认为这是一种根本性的能力转变,将永久改变大量知识工作者的工作方式。
Instead, I think this is a fundamental capability shift that will change how a huge portion of knowledge workers do their work forever going forward.
我觉得我们对此才刚刚触及皮毛,这正是为什么我会在这些年终特辑的多个访谈中持续探索这一主题。
I think we've barely scratched the surface on that, which is why, of course, I'm exploring it through a couple of different interviews throughout the course of these end of year episodes.
现在继续围绕编码和智能体这个主题,很多人曾将2025年视为智能体之年。
Now staying on the coding and agent theme a little bit, a lot of people had 2025 pegged as the year of agents.
我实际上认为确实如此,尽管它所指的含义与我们年初预想的有所不同。
I actually tend to think that was true, although it meant something different than we thought going into the year.
其中一部分原因在于,这一年是编码智能体之年。
Part of that was that it was the year of coding agents.
但另一方面,今年许多关键事件都围绕着代理基础设施、上下文的崛起,以及各竞争对手模型实验室决定采用同一套标准,以共同加速发展。
But another part of that was that a lot of the key events of this year were about agent infrastructure, the rise of context, and the decisions that all of the competing model labs made to go in on the same set of standards in order to all move further faster.
Anthropic在2024年推出了模型上下文协议(MCP),并获得了一些初步关注。
Anthropic introduced the Model Context Protocol at the end of 2024, and it got some initial attention.
然而,真正引起人们广泛关注并成为全球AI构建者主要话题的,是从二月到三月这段时间。
However, toward the end of February and into March is when it really started to capture people's attention and became a major theme for AI builders everywhere.
MCP为代理连接外部服务和数据源提供了途径,极大扩展了这些代理的能力。
MCP, of course, was a way for agents to connect to external services and data sources, greatly expanding what those agents can do.
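The pattern MCP standardizes can be sketched in miniature: a server exposes named, described tools, and an agent first discovers what is available and then invokes it through one uniform interface. The toy below is illustrative only — the real protocol runs over JSON-RPC via official SDKs, and every name here is hypothetical.

```python
# Toy illustration of the capability pattern MCP standardizes:
# a "server" registers named tools with descriptions, and an agent
# can list them and call them through a single uniform interface.
# Not the real protocol -- just the shape of the idea.
from typing import Callable


class ToyToolServer:
    def __init__(self) -> None:
        self._tools: dict[str, Callable] = {}
        self._descriptions: dict[str, str] = {}

    def tool(self, description: str):
        """Decorator that registers a function as a callable tool."""
        def register(fn: Callable) -> Callable:
            self._tools[fn.__name__] = fn
            self._descriptions[fn.__name__] = description
            return fn
        return register

    def list_tools(self) -> dict[str, str]:
        """What an agent sees when it asks the server for capabilities."""
        return dict(self._descriptions)

    def call_tool(self, name: str, **kwargs):
        """How an agent invokes a tool it discovered."""
        return self._tools[name](**kwargs)


server = ToyToolServer()


@server.tool("Look up the current open ticket count for a project.")
def open_tickets(project: str) -> int:
    # A real MCP server would query an external service or data source here.
    fake_db = {"backend": 12, "frontend": 4}
    return fake_db.get(project, 0)


# An agent first discovers capabilities, then calls one by name.
print(server.list_tools())
print(server.call_tool("open_tickets", project="backend"))  # -> 12
```

The point of the standard is exactly this separation: the agent never needs to know how `open_tickets` works internally, only that the server advertises it.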
一个特别有趣的现象是,回顾计算历史,常常会出现持续数年的标准之争,不同阵营各自支持不同的标准,最终拖慢了整个领域的发展步伐。
And one of the things that was really interesting is that if you look back at the history of computing, there have often been standards wars that lasted years at a time, where groups who wanted one set of standards fought against groups who wanted another set of standards, all of which ultimately served to slow down overall development in whatever field they were in.
但今年这种情况并没有发生。
That did not happen this year.
一旦MCP达到转折点,其他竞争对手实验室很快便意识到,与其对抗,不如加入其中。
You could tell as soon as MCP hit that inflection point that the other labs considered competing and then ultimately decided to just get on board.
3月26日,萨姆·阿尔特曼发推称:‘人们喜爱MCP,我们很高兴将在我们的所有产品中支持它。’
On March 26, Sam Altman tweeted, People love MCP and we are excited to add support across our products.
3月30日,谷歌母公司Alphabet首席执行官桑达尔·皮查伊写道:‘选择MCP还是不选择MCP?’
On March 30, Alphabet CEO Sundar Pichai wrote, To MCP or not to MCP?
这就是问题所在。
That's the question.
请在评论区告诉我你的看法。
Let me know in the comments.
4月9日,他进一步回应:‘感谢大家的反馈。’
He followed up on April 9 with, Love the feedback.
那就选MCP吧。
To MCP it is.
但这不仅仅是MCP的问题。
And it wasn't just MCP.
代理基础设施的其他部分也在各大实验室中获得了类似的采纳。
Other parts of agent infrastructure also saw similar uptake across the labs.
同样在4月9日,谷歌宣布了代理间协议。
Also on April 9, Google announced the agent to agent protocol.
代理间通信,顾名思义,是一种代理通信协议。
Agent to agent, like it sounds, is an agent communication protocol.
在发布时,它被明确表述为MCP的补充,而一个月内,就连谷歌的竞争对手微软也采纳了A2A。
It was explicitly framed when it was announced as a complement to MCP, and within a month, you even had Google competitor Microsoft embracing A2A.
最近,我们看到了类似的现象,即Anthropic Skills。
More recently, we've seen a similar phenomenon with Anthropic Skills.
Skills是一种通过文件和文件夹系统,使通用代理能够访问特定上下文知识或指令的方法。
Skills are a way to give generalized agents access to specialized context knowledge or instructions using a file and folder system.
在12月,OpenAI也开始支持这一框架。
And in December, OpenAI started supporting the framework as well.
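The file-and-folder idea behind Skills can be sketched as follows: each skill lives in its own directory with a `SKILL.md` whose metadata says what the skill is for, and the agent scans the tree and loads only what it needs. This is a deliberately simplified sketch — the real format uses richer frontmatter, and the parsing here is hypothetical.

```python
# Sketch of the file-and-folder system behind Skills: every skill is a
# directory containing a SKILL.md; the agent discovers skills by walking
# the tree and reading lightweight "key: value" metadata lines.
# Simplified on purpose -- the real frontmatter format is richer.
import tempfile
from pathlib import Path


def discover_skills(root: Path) -> dict[str, str]:
    """Map skill name -> description for every SKILL.md under root."""
    skills: dict[str, str] = {}
    for skill_file in root.rglob("SKILL.md"):
        meta: dict[str, str] = {}
        for line in skill_file.read_text().splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        if "name" in meta:
            skills[meta["name"]] = meta.get("description", "")
    return skills


# Demo: build a tiny skill tree in a temp directory, then discover it.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "pdf-report"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "name: pdf-report\n"
        "description: How to assemble quarterly PDF reports.\n"
    )
    found = discover_skills(Path(tmp))

print(found)  # -> {'pdf-report': 'How to assemble quarterly PDF reports.'}
```

The design choice worth noticing is that the descriptions are cheap to scan, so a generalized agent can keep thousands of skills on disk and pull the full instructions into context only when one is relevant.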
在所有这些代理基础设施之上,我们还出现了新兴的领域——上下文工程。
Now on top of all this agent infrastructure, we also had the emergent discipline of context engineering.
提示工程关注的是找到向LLM发出正确提示以获得期望结果的方法,而上下文工程则专注于确保LLM能够获取到完成你期望任务所需的正确信息或上下文。
Whereas prompt engineering was all about figuring out the right way to prompt an LLM to get the results that you wanted, context engineering is all about making sure that the LLM has access to the right information or context to do the work that you're hoping to have it do.
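The core context-engineering move described above can be sketched in a few lines: rather than tuning the wording of the prompt, select the right supporting material and place it in the model's context window. The scoring and function names below are purely illustrative, not any particular framework.

```python
# Hypothetical sketch of context engineering: pick the most relevant
# documents for a query under a context budget, then assemble them
# into the prompt, rather than tweaking the prompt wording itself.
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, doc: str) -> int:
    """Crude relevance score: count of shared words with the query."""
    return len(tokens(query) & tokens(doc))


def build_context(query: str, docs: list[str], budget: int = 2) -> str:
    """Select the `budget` most relevant docs and assemble the final prompt."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    context = "\n---\n".join(ranked[:budget])
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "Refund policy: customers may return items within 30 days.",
    "Shipping times: standard delivery takes 5 business days.",
    "Office hours: support is available 9am to 5pm weekdays.",
]
prompt = build_context("What is the refund policy for returns?", docs, budget=1)
print(prompt)
```

In production systems the crude word-overlap score would be replaced by embeddings or retrieval pipelines, but the discipline is the same: the model's answer quality is bounded by what made it into the context.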
综合来看,所有这些使得2025年成为代理基础设施之年,为2026年代理在实际应用中的影响力爆发奠定了基础。
Taken together, all of this kind of makes 2025 the year of agent infrastructure, which sets up 2026 to be the year of agent impact in practice.
当然,我确信还有很多基础设施有待建设,而且这些界限最终都相当模糊。
Now, of course, I'm sure there's a lot more infrastructure still to be built, and these lines are ultimately pretty blurry.
但我认为,这种对上下文的关注,以及关键代理基础设施的出现和聚集,是今年的关键AI故事之一。
But I think that this focus on context, and the emergence and rallying around of key agent infrastructure is a key AI story of the year.
最后,今天在我们的年度十大AI故事中,我要统称为模型的下一个飞跃。
Lastly today in our 10 biggest AI stories of the year is what I'm calling, collectively, the next leap of models.
我指的是Gemini 3、Opus 4.5和GPT 5.2。
With this I'm referring to Gemini three, Opus 4.5, and GPT 5.2.
我原本计划在本集中列出今年最具影响力的模型发布排名,但显然,现在你们也能看出这期节目已经很长了,这一点我本该能预料到。
Now I had initially planned to include in this episode my countdown of the most impactful model releases of the year, but obviously at this point you can tell the show is getting pretty long, which I probably could have predicted.
因此,我会把这部分内容单独做成一期节目。
And so I'm going to move that into its own episode.
但就我们这里讨论的最后一个故事而言,当GPT-5发布时,对许多人来说是个巨大的失望。
But for our purposes here, for this last story, when GPT-5 came out, it was for many a big disappointment.
事实上,它进一步加剧了关于AI泡沫的争论。
In fact, it helped really fuel the AI bubble debate.
越来越多的人声称人工智能已经遇到瓶颈,并将GPT-5作为证据。
The chorus of people saying that AI had hit a wall got louder and louder, pointing to GPT-5 as evidence.
所有这些意味着,谷歌在Gemini 3发布前承受了巨大压力,而他们无疑成功应对了这一挑战。
All of this meant that there was huge pressure on Google leading into the release of Gemini three, and it was a challenge that they undeniably met.
事实上,谷歌在11月发布了两个极其重要的模型:Gemini 3以及他们的图像模型Nano Banana 2。
Google, in fact, released two incredibly important models in November, Gemini three as well as their image model, Nano Banana two.
其中一个影响是针对谷歌自身。
One impact was for Google itself.
自ChatGPT发布以来,谷歌首次真正成为整个行业的主导者。
For the first time really since ChatGPT launched, Google appeared to be in the driver's seat across the industry as a whole.
但不仅如此,市场也受到了影响。
But even beyond that, there were impacts on the market as well.
Gemini 3反驳了人工智能已遇瓶颈的说法,带来了更多乐观情绪,让人们相信我们仍会看到持续的增长与采用,而这无疑有助于为未来五年市场试图定价的重大交易提供合理性。
Gemini three served to counteract the narrative that AI had hit a wall, giving more optimism that we'd see continued growth and adoption, which of course could help justify these big deals over the next five years that markets were trying to figure out how they should price in.
就在AI社区还在消化Gemini 3之际,Anthropic发布了其最先进的模型Opus 4.5。
Just as the AI community was digesting Gemini three, Anthropic dropped its most advanced model, Opus 4.5.
自从这款模型发布以来已经过去几周了,我从未见过哪款模型在刚推出时就获得如此高的评价,并且持续提升其声誉。
Now it's been a few weeks since this came out, and I don't know that I've ever seen a model that started on such a high note in terms of people's perceptions and just continued to grow in esteem.
许多人认为,Opus 4.5 在人工智能的编程能力方面实现了一个根本性的飞跃。
Many people have argued that Opus 4.5 represents a fundamental level up when it comes to the coding capabilities of AI.
由于 Opus 4.5,我看到有人重新调整了他们对软件工程职业未来的预期和时间表。
I've seen people reset their timelines and how they think about the future of software engineering jobs because of Opus 4.5.
即使是非软件工程师,Opus 4.5 的能力也已渗透到像 Replit 和 Lovable 这样的'氛围编码'应用中。
Even for non-software engineers, Opus 4.5's capabilities have found their way into the vibe coding apps like Replit and Lovable.
同时也提升了这些平台原本能胜任的功能。
This has transformed what those platforms can do competently as well.
当然,所有这些都引发了 OpenAI 的一些担忧。
Now, of course, all of this prompted some concern from OpenAI.
在 Gemini 3 发布前的几周里,一份后来被泄露的内部备忘录显示,阿尔特曼曾预判谷歌的强势回归将带来一些不利影响。
In the weeks leading up to Gemini three, an internal memo that was later leaked saw Altman forecast some rough vibes due to a resurgent Google.
这份‘不利影响’备忘录后来升级为全面的‘代码红色’警报,促使公司调整优先级,暂停大量长期和短期项目,集中全力推进 ChatGPT 及新模型的发布。
The rough vibes memo was upgraded to a full on Code Red and a shift in priorities away from a bunch of long term and short term efforts to just focus on ChatGPT and new model releases.
正是这次红色警报行动让我们提前获得了GPT 5.2的高级版本,虽然它不像Gemini 3和Opus 4.5那样获得普遍赞誉,但确实拥有大量支持者,包括我自己在内,我认为GPT 5.2 Pro在商业战略相关的使用场景中,是目前最出色的模型。
It was that Code Red effort that got us an advanced release of GPT 5.2, which, while not necessarily seeing the same universal praise as Gemini three and Opus 4.5, certainly has a lot of proponents, including myself, who think that GPT 5.2 Pro is, for my use cases around business strategy, the best model out there.
但综合来看,这一系列'下一个飞跃'模型不仅证明了AI发展并未真正进入平台期,更让我们在迈向2026年时,拥有了远超2025年起步时的真正超级能力。
But you take it all together, and this set of next-leap models has not only demonstrated that AI development hasn't really hit a plateau, but also leaves us heading into 2026 with veritable superpowers compared to where we were heading into 2025.
当然,目前看来,这种势头短期内不会放缓。
And of course, it doesn't appear like that's going to slow down anytime soon.
我们预计OpenAI将在一月份发布另一款模型,而其他实验室肯定也不会落后太久。
We are anticipating another OpenAI model in January, and you gotta think that the other labs won't be far behind.
因此,朋友们,这就是我列出的2025年十大AI大事。
And so friends, that is my list of the 10 biggest AI stories of 2025.
就像我在另一期节目中说的,我会倒数并真正盘点出本年度最具影响力的模型发布。
Like I said, in another episode I will count down, in an actual ranked countdown, the most impactful model releases of the year.
目前,今天的AI每日简报就到这里。
For now, that is going to do it for today's AI Daily Brief.
一如既往,感谢你们的收听或观看,我们下次再见,保重。
Appreciate you listening or watching as always, and until next time, peace.