你好。
Hi.
我是《纽约时报烹饪》的埃里克·金。
This is Eric Kim with New York Times Cooking.
作为一名食谱开发者,我花费大量时间尝试创作既快捷简单又独具特色的菜肴。
As a recipe developer, I spend a lot of my time trying to come up with dishes that are quick, easy, but also very special.
对我来说,这意味着像韩式辣酱三文鱼这样的菜肴。
For me, that means dishes like gochujang salmon.
这是一道外皮酥脆的三文鱼排,裹着会形成焦糖般气泡的咸甜酱汁。
It's a crispy salmon fillet with a salty-sweet glaze that bubbles up and candies.
我喜爱烹饪这道菜,因为它只需要二十分钟。
I love cooking this because it only takes twenty minutes.
你可以在《纽约时报烹饪》上获取这个食谱和更多灵感。
You can get this recipe and so many more ideas on New York Times Cooking.
访问nytcooking.com获取创意灵感。
Visit nytcooking.com to get inspired.
凯西,今天我们有个关于Gemini 3发布的特别紧急播客节目。
Casey, we have a special emergency podcast episode today about the launch of Gemini three.
是的,凯文。
Yes, Kevin.
这是硅谷AI极客们热议已久、翘首以盼的。
Hotly awaited, much discussed among AI nerds here in Silicon Valley.
我们终于要接触到真家伙了。
We are finally about to get our hands on the genuine article.
是的。
Yeah.
通常来说,我们不会打破周五的发布计划,专门为一家人工智能巨头发布的新模型做一期特别节目。
So normally, we wouldn't break our Friday publication schedule to publish a special episode just about a new model coming out from one of the big AI companies.
他们一直在发布新模型。
They're releasing models all the time.
但我们认为本周值得讨论这个模型——特别是Gemini 3,有几个原因。
But there are a couple reasons that we thought it was worth doing this this week to talk about this model, Gemini three, in particular.
首先是我们有机会采访到谷歌两位人工智能高管:Demis Hassabis和Josh Woodward。
The first is that we got some time with Demis Hassabis and Josh Woodward, two of the leading AI executives at Google.
Demis是谷歌DeepMind的首席执行官,这是他们内部的人工智能实验室。
Demis, of course, is the CEO of Google DeepMind, which is their in-house AI lab.
Josh Woodward则是Gemini团队的副总裁,同时也负责谷歌其他相关事务。
And Josh Woodward is the VP of the Gemini team and some other stuff there at Google.
我们很期待与他们交流,探讨这次重大新模型发布。
So we are excited to talk to them and ask them about this big new model release.
不过我认为还有其他几个原因让我们对此特别感兴趣。
But I think there are a couple other reasons we were interested in doing this as well.
确实。
Yeah.
凯文,一个重要原因是——相比其他模型发布,这次似乎更引起了谷歌竞争对手的关注。
I mean, one big thing, Kevin, is just that maybe more than other model releases, this one seems to have the attention of Google's competitors.
我们从其他人工智能实验室的工作人员那里听到很多风声,似乎Gemini 3在某些方面的突破可能会对他们的业务造成威胁。
We're hearing a lot of whispers from folks who work at other AI labs that it seems like Gemini three has managed to figure some things out in a way that may be bad for their businesses.
在AI行业里,大家普遍有种感觉,谷歌这几年在人工智能领域确实经历了一些挣扎。
And I think around the AI industry, there's sort of this feeling that Google kind of struggled in AI for a couple years there.
他们推出了Bard和Gemini的初期版本,但这些产品存在一些问题。
They had the launch of Bard and the first versions of Gemini, which had some issues.
我认为他们当时被视为在追赶行业前沿水平。
And I think they were seen as sort of catching up to the state of the art.
而现在的问题是:这次是否意味着他们要重新夺回王座?
And now I think the question is, like, is this kind of them taking their crown back?
稍后我们会和Demis、Josh深入探讨这些,不过Casey,我们先聊聊已知的Gemini 3吧。
So we'll get into all that with Demis and Josh, but let's just talk, Casey, about what we know about Gemini three.
本周早些时候他们开了简报会,向我们透露了些关于新模型及其功能的信息。
They held a briefing early this week and told us a little bit about the new model and what it can do.
那么关于Gemini 3我们了解到哪些内容?
So what did we learn about Gemini three?
嗯。
Yeah.
就最让我感兴趣的功能而言,谷歌分享了几点不同之处。
Well, so in terms of what it can do, which is always the most interesting to me, Google shared a few different things.
除了常规升级——比如编程能力更强、氛围编程更出色之外,
One, in addition to saying all the things you would expect, like, it's better at coding, and it's better at vibe coding.
它还能在你提问时为你生成交互界面。
It also is gonna do some new things around generating interfaces for you when you ask it a question.
现在你向大多数聊天机器人提问,它们只会用文本回复答案。
So nowadays, you ask most chatbots a question, it'll spit back an answer in text.
也许它会显示一张图片给你。
Maybe it shows you an image.
根据谷歌团队的说法,Gemini 3将开始为你定制界面。
According to the Google folks, Gemini three is just gonna start building custom interfaces for you.
他们展示了一个例子:有人想了解画家梵高,Gemini 3就直接编写了一个包含各种图片和交互元素的互动教程。
So they showed an example where somebody wanted to learn about Vincent van Gogh, the painter, and Gemini three just sort of, like, coded up an interactive tutorial that had all sorts of, like, images and interactive elements.
他们还展示了另一个例子:构建一个百万美元房产的按揭计算器——这大概是谷歌员工能想象的最低购房预算了。
They showed another example that involved building a mortgage calculator for buying a home over a million dollars, which is the lowest amount of money that anyone at Google can imagine spending on a home.
凯文,这些就是你能在Gemini 3中看到的功能类型。
So these are the kinds of things that you can expect to find in Gemini three, Kevin.
是啊。
Yeah.
我认为谷歌在Gemini 3发布会前简报和资料中贯穿的主题就是:新模型在所有方面都优于前代Gemini 2.5 Pro。
So I would say the theme of the briefing and of the materials that Google shared ahead of the Gemini three launch was this is just kind of better than their last model, Gemini 2.5 Pro, in basically all respects.
有几个让我印象深刻的基准测试,其中一个叫'人类终极考试'——这是门非常难的跨学科测试,题目基本达到研究生或博士水平。
Some of the benchmarks caught my attention. One was this benchmark test called Humanity's Last Exam, which is sort of a very hard interdisciplinary exam that consists of a bunch of questions at, like, basically a graduate student or PhD level.
他们之前的Gemini 2.5 Pro模型在该测试中得分约21.6%,而Gemini 3 Pro达到了37.5%。
And their previous model, Gemini 2.5 Pro, got about a 21.6% on that test, and Gemini three Pro gets a 37.5% on that test.
这基本上就是所有基准测试的总体情况。
That's basically the story of all of these benchmarks.
他们列举了十多个不同基准测试案例,新模型都轻松超越了旧版本。
They gave more than a dozen examples of various benchmarks where the new model just beats the old one handily.
不过说实话,对很多人来说这些提升可能无关紧要。
And, you know, to a lot of people, I think that may not matter.
大多数使用谷歌AI产品的人,可能并不是为了解决物理学上的新颖问题。
Most people who are using Google's AI products are probably not out there trying to solve, like, novel problems in physics.
但他们对此的基本宣传就是,这是最先进的模型。
But their basic pitch for this is just like, this is a state of the art model.
任何你能用ChatGPT或Claude甚至旧版Gemini完成的任务,用Gemini三Pro都能做得更好。
Anything that you could do with ChatGPT or Claude or even the older versions of Gemini, you can do better with Gemini three Pro.
他们还提到了测试所谓的Gemini代理功能,这个功能终于实现了我期待已久的一件事——它能浏览你的收件箱、理解内容、建议回复,还能归类整理邮件,真正帮你掌控收件箱,这是我自己一直做不到的。
They also talked about testing what they're calling the Gemini agent, which is going to be able to do one thing in particular that I've been waiting for somebody to do forever, which is look through your inbox, understand its contents, propose replies, kind of, you know, organize like emails together and really sort of help you get your inbox under control in a way that I personally have never been able to.
所以我们基本上只看到了一些动态GIF演示,但这绝对是我拿到Gemini三后要尝试的第一件事。
So we basically only saw a few animated GIFs about that, but that will definitely be one of the first things that I try when I get my hands on Gemini three.
是啊。
Yeah.
需要说明的是,他们不会立即向所有人推出这个功能。
And they are not, we should say, rolling this out to everyone right away.
本周它将面向Gemini应用用户和AI模式用户开放,后者是谷歌主搜索引擎侧边栏的一个标签页。
It's going to be available this week for users in the Gemini app and also in the AI mode, which is sort of the tab off to the side of the main Google search engine.
开发者也能在各种产品中使用它,但他们没有明确说明何时会整合到Google文档或Gmail这些日活数十亿的热门服务中。
It will also be available for developers in various products, but they're not sort of saying when this will come to things like the Gemini integrations in Google Docs or Gmail, these very popular things that, you know, are used by billions of people a day.
不过有意思的是,他们已经把这个模型引入了谷歌搜索,虽然是以非主搜索栏的AI模式存在。
But I thought it was interesting that they have brought this model to Google Search, albeit in this AI mode that's not sort of the main search bar.
这在我看来意味着,他们认为能以足够低的成本提供这个模型,让数十亿人使用而不至于拖垮服务器或产生巨额费用。
That to me suggests that they feel like they can serve this model cheaply enough to make it potentially something that billions of people could use, and that that would not melt their servers and incur, you know, billions of dollars of costs.
没错。
Yeah.
目前数据显示,AI概览功能的使用率持续攀升,他们每个季度的收入都在增长。
So far, they say that the usage keeps going up for AI overviews, and every quarter, they continue to make more money.
看来这对他们很有效。
So it seems to be working out for them.
虽然对互联网其他部分不利,但谷歌确实从中获益良多。
Not working out for the rest of the web, but it's working out well for Google.
是啊。
Yeah.
但我觉得,谷歌相比竞争对手的明显优势在于,他们拥有日活数十亿用户的产品线,可以逐步将Gemini三代整合进去,从而获得更多使用数据,并借此优化模型。
But I think, like, obviously, Google's big advantage here over their competitors is that, you know, they have products that are used by billions of people a day, and they can kind of shove Gemini three into those products over time and just get more and more usage and get more data and use that to improve their models.
没错。
Yeah.
所以我们总是建议学生:第一步,先建立非法垄断。
Which is why we always tell students when they ask us for advice, step one, build an illegal monopoly.
对。
Yes.
说到学生,谷歌本周另一项重要公告是向全美大学生免费提供一年Gemini付费版使用权,我认为这步棋很聪明。
And speaking of students, the other notable announcement that Google is making this week is that they are giving all US college students a year of free access to a paid version of Gemini, which is, I think, a smart move.
虽然感觉有点别扭,这相当于告诉学生们:
I feel a little gross about it, like, essentially telling students, hey.
何不用它来完成部分作业,或者辅助考试呢?
Why don't you use this to maybe do some of your homework, maybe help you with your exams?
第一剂我们免费提供。
We'll give you the first hit for free.
是的。
Yeah.
你知道吗,今天早上的简报会上我也注意到,我记得有三个人用了'学到什么'这个说法。
You know, I was also struck during the briefing that we had this morning that I believe three different people used the phrase learn anything.
这似乎已成为谷歌宣传中非常突出的一点——他们将Gemini包装成学习工具,或许这只是'帮你做作业工具'的委婉说法。
This seems like it has become a very prominent plank of Google's messaging: they're presenting Gemini as a learning tool, which maybe is just sort of a euphemism for a do-your-homework tool.
我不知道。
I don't know.
对。
Yes.
好的。
Okay.
以上就是我们目前了解的Gemini 3相关信息。
So that is what we know about Gemini three.
我们将在周二Gemini 3全面发布后自行进行测试和评测。
We will be doing our own testing and reviewing of Gemini three once it is fully out on Tuesday.
但现在,我们想先向大家介绍基本情况,并带来我们对谷歌DeepMind的Demis Hassabis和Josh Woodward的专访。
But for now, we wanted to just give you the basics and also bring you our interview with Demis Hassabis and Josh Woodward of Google DeepMind.
在进入正题前,我们显然需要先做AI相关声明。
And before we get to that, we should obviously make our AI disclosures.
我任职于纽约时报公司,该公司正在就大型语言模型训练问题起诉OpenAI和微软。
I work for the New York Times Company, which is suing OpenAI and Microsoft over the training of large language models.
而我男朋友在Anthropic工作。
And my boyfriend works at Anthropic.
Demis和Josh,欢迎来到Hard Fork。
Demis and Josh, welcome to Hard Fork.
很高兴来到这里。
Great to be here.
谢谢。
Thank you.
两年前,Sundar Pichai告诉我们,已故的Bard就像一辆改装的本田思域,在与更强大的赛车竞争。
So two years ago, Sundar Pichai told us that Bard, rest in peace, was a souped up Civic that was in a race with more powerful cars.
Gemini三号是什么类型的车?
What kind of car is Gemini three?
这个问题问得好。
That's a good one.
Demis,你想回答吗?
Demis, do you wanna take it?
嗯,我希望它比本田思域快一点。
Well, I hope it's a bit faster than a Honda Civic.
其实我不太喜欢用汽车来类比。
You know, I don't really think of it in terms of cars.
也许可以比作那些很酷的直线加速赛车。
Maybe it's one of those cool drag racers.
是啊。
Yeah.
看来大家对这款模型真的很期待。
So people are really excited about this model.
我们一直在听取那些早期测试用户的反馈。
We have been hearing from folks that have been sort of early testing it.
显然,你们已经展示了很多基准测试结果。
Obviously, you guys have shown off a lot of the benchmarks.
非常令人印象深刻。
Very impressive.
Gemini在具体层面上能做哪些之前AI模型做不到的事?
What can Gemini do on a concrete level that previous AI models couldn't?
好的,我来回答。
Well, I'll jump in.
可能有几个突出的特点。
Maybe a couple of things that stand out.
首先,我们看到这个模型在推理和同时多步思考方面表现优异。
One, we're starting to see this model really excel on reasoning and being able to think many steps at the same time.
过去的模型有时会思路中断或迷失方向。
Sometimes models in the past would lose their train of thought, lose track.
而这个模型在这方面要好得多。
This one's way better at that.
明天你们还将看到各种新的生成式交互界面。
The other thing you'll see tomorrow as well is all kinds of new generative interfaces.
这是我们迄今为止最擅长创建新型交互界面的模型。
This is our best model yet at being able to create new types of interfaces.
它能真正为用户提供定制化设计和问题解答。
It gives people really a custom design and answer to their questions.
那么第三点我想说的是,我们在编程本身投入了大量资金。
Then maybe the third thing I would say is we've put a lot of investment in coding itself.
通过许多编程示例,你会看到一些新产品问世,比如谷歌反重力项目也会某种程度上展示这一点。
A lot of the coding examples, you'll see some new products coming out, like Google Antigravity will also kinda showcase that.
有种讨论认为,对于普通用户而言,聊天用例似乎已经解决——像Gemini这类产品的普通用户几乎想不出什么问题能生成与上个模型有显著差异的内容。
There's been some discussion that for average users, the chat use case can feel solved that sort of average users of products like Gemini kind of almost can't even think of a question to ask it that will generate something that feels meaningfully different from what they were able to get in the last model.
你觉得Gemini三代在多大程度上符合这种感受?你认为普通用户真的能注意到差异吗?
To what extent does that feel true to you in Gemini three, and to what extent do you think average folks are really gonna notice a difference?
是的。
Yeah.
我们在测试中观察到的一个现象是——Demis你也可以补充——对我们而言,这个模型更简洁、更具表现力。
One of the things, I guess, we're seeing in some of the testing, and Demis, feel free to chime in too, is that, for us, this is a model that's more concise and more expressive.
它开始以更易理解的方式呈现信息。
It starts to present information in a way that's much easier to understand.
我认为对大多数人来说,这将产生立竿见影的重大影响。
And I think for most people, that's going to be a big immediate effect.
然后有趣的是这些模型如何开始与其他类型信息交互。
And then I think what starts to get interesting is how these models start to interact with other types of information.
我们经常讨论学生将如何利用这个模型学习,甚至它如何经你授权连接你在其他谷歌产品中的数据。
So we talk a lot about how students are going to be able to learn with this model, or even how this model can connect to other types of data you might have in other Google products with your permission.
这些方式正开始展现它超越了标准问答的文本交互模式。
These are the ways I think we're starting to show kind of it's going beyond just the standard text kind of q and a back and forth.
嗯。
Yeah.
我想补充一点,你知道,它在各方面的可靠性简直令人难以置信,当你使用时会注意到这一点。
I think I'd add to that just, like, you know, its general reliability on things is incredible. You'll notice that when you use it.
我认为我们在角色设定上也下了很大功夫,内部我们称之为它的风格。
I think also we work quite hard on the persona, which we call it internally, like the style of it.
我觉得它更简洁。
I think it's more succinct.
我认为它更切中要点。
I think it's more to the point.
这很有帮助。
It's helpful.
我感觉它的风格更好了。
I feel like it's got a better style about it.
我发现与它进行头脑风暴和使用时更令人愉快。
I find it more pleasant to brainstorm with and use.
然后我认为,你知道,在很多方面几乎都有质的飞跃。
And then, you know, I think there are various things where there's almost a step change.
我感觉它在氛围编程这类事情上跨越了一个实用性的门槛。
I feel like it's crossed a sort of threshold of usefulness on things like vibe coding.
我最近重新开始研究游戏编程。
I've been getting back into my games programming.
我打算在圣诞节期间给自己安排几个相关项目,因为我觉得它在前端等领域已经达到了极其有用且强大的程度,而之前的版本可能不太擅长这些。
I'm gonna set myself some projects over Christmas on that because I feel like it's actually got to a point where it's incredibly useful and capable on front end and things like this that perhaps previous versions weren't so good at.
Demis,上次五月份你来节目时说过,你认为距离通用人工智能还有五到十年,期间可能需要几项重大突破。
Demis, the last time we had you on the show in May, you said that you think we're five to ten years away from AGI, and that there might be a few significant breakthroughs needed between here and there.
Gemini三号及其表现是否改变了那些时间线,或者它是否包含了你认为必要的那些突破?
Has Gemini three and observing how good it is changed any of those timelines, or does it incorporate any of those breakthroughs that you thought would be necessary?
没有。
No.
我想它...我想它某种程度上完全按计划进行,如果你明白我的意思的话。
I think it's sort of dead on track, if you see what I mean.
我们真的对这个进展感到非常高兴。
We're really happy with this progress.
我认为这是一个绝对惊人的模型,完全符合我的预期,实际上也是我们自Gemini开始以来过去几年一直遵循的发展轨迹,我认为这是行业内最快的进步速度。
I think it's an absolutely amazing model and is right on track with what I was expecting and the trajectory we've been on actually for the last couple of years since the beginning of Gemini, which I think has been the fastest progress of anybody in the industry.
而且我认为我们将继续保持这种发展轨迹,我们预计这种情况会持续下去。
And I think we're gonna continue on that trajectory, and we expect that to continue.
但除此之外,我仍然认为还需要一两样东西才能真正实现你所期望的通用智能应有的全面一致性,以及在推理、记忆方面的持续改进,可能还包括你们也知道我们正在与SIMA和Genie合作的世界模型等概念。
But on top of that, I still think there'll be one or two more things that are required to really get the consistency across the board that you'd expect from a general intelligence and also improvements still on reasoning, on memory, and perhaps things like world model ideas that you also know we're working on with SIMA and Genie.
它们将在Gemini基础上构建,但会以各种方式扩展它。
They will build on top of Gemini but extend it in various ways.
我认为其中一些想法也将是实现物理智能等目标所必需的。
And I think some of those ideas are gonna be required as well to fully solve physical intelligence and things like that.
所以...我...我是说两者都成立。
So both are true.
我对Gemini三号的进展真的非常满意。
I'm really happy with the progress of Gemini three.
我想人们会感到相当...相当惊喜,但这正是我们预期中的发展进度。
I think people are gonna be pretty pleasantly surprised, but it's on track with what we were expecting the progress to be.
我认为这意味着还需要五到十年时间,可能还需要一两次突破。
And I think that means still five to ten years, with perhaps one or two more breakthroughs required.
你提到了Gemini三号的风格。
You mentioned Gemini three's style.
最近有很多关于AI伴侣的讨论,人们与它们建立的关系。
There's been a lot of discussion recently about AI companions, the relationships people are developing with them.
你如何看待Gemini的个性?你希望用户与它建立什么样的关系?
How do you think about Gemini's personality, and what kind of relationship do you want users to have with it?
在应用内部,我们团队更多将其视为一种工具,帮助你处理日常事务的工具。
I would say in the app itself, we see it on the team a lot as almost like a tool, or it's something you're using to kinda work through and kinda cut through your day.
无论是解答各类问题还是协助创作,这正是我们期待它大放异彩的方向。
And so whether it's helping on different types of questions you have or helping you create things, that's really where we see it kind of excelling and kind of the direction we wanna see it go.
从宏观角度看,无论是Gemini还是其他项目如NotebookLM或Flow,我们都在思考如何让AI真正成为你工具箱里的超级工具,无论是写作、研究还是电影创作。
I think if you zoom out, if you look at Gemini or some of our other projects like NotebookLM or Flow, we're really kind of trying to think through how AI can really be this superpower, kind of super tool in your toolbox that you can use, whether it's for writing or researching or creating films or whatnot.
这才是我们真正的关注重点。
That's really more where we're focused.
长远来看,我们团队非常关注能追踪诸如'今天我们帮你完成了多少任务'这类指标。
I think over time, we're really interested on the team to be able to track things like how many tasks did we help you complete in your day.
这是一种让我们兴奋的新型衡量标准,也是最初谷歌搜索的工作方式。
That's a new type of metric that I think we get excited about and a way that the original sort of Google search worked.
你会来使用它。
You would come to it.
你会试图获取答案或跳转到某个页面,然后继续下一步。
You would sort of try to get an answer or sent to a page and sort of move on from there.
嗯,所有这些听起来都非常好且负责任,但我很好奇你们没有把这东西做成一个情色伴侣,会错过多少病毒式的互动机会。
Well, all that all sounds very good and responsible, but I'm wondering about all the viral engagement you're leaving on the table by not making this thing an erotic companion.
重大疏忽。
Big oversight.
不予置评。
No comment.
是啊。
Yeah.
在Gemini 3发布前的几天和几周里,你们的一些竞争对手一直非常紧张。
Some of your competitors have been very nervous in the days and weeks leading up to Gemini three.
我想他们也开始听到和我们一样的传言,说这个模型相当出色。
I think they've started hearing the same rumblings that we have about this model being quite good.
或许叙事正在从谷歌在AI领域追赶的角色,转变为现在处于竞赛领先地位,或者至少处于领导地位。
And maybe the narrative is shifting from sort of Google playing catch-up in AI to now sort of being on top of the race or at least in a leadership position there.
你觉得谷歌目前在AI竞赛中处于领先地位吗?
Do you feel like Google is ahead in the AI race right now?
听着。
Look.
正如你们非常清楚的,这是一个激烈的竞争环境,可能是有史以来竞争最激烈的。
As you guys know very well, it's a ferocious, you know, competitive environment, probably the most competitive there's ever been.
所以一个人永远不能,你知道,真正重要的是你从当前位置的进步速度。
So one can never... you know, really the only important thing is your rate of progress from where you are.
这就是我们专注的,我们对此感到非常高兴。
And that's what we're focusing on, and we're very happy about that.
我的意思是,我并不真的认为这是一种我们重新领先或类似的情况。
I mean, I don't really see it as a sort of, like, you know, we're back in the lead or something like that.
我们一直在这一领域的研究部分处于领先地位。
We've always pioneered the research part of this.
我认为这更像是进入我们的节奏,确保下游体现在我们所有产品中。
I think it's like getting into our groove in making sure that's reflected downstream in all of our products.
而且我认为我们在这方面确实渐入佳境。
And I think we're really getting into our stride there.
我想你实际上在上次IO大会上已经看到了这一点。
I think you saw that actually at the last I/O, I would say.
我们在这方面越来越得心应手,比如GDM就像是谷歌的引擎室。
And we're getting better and better at that, like with GDM being sort of the engine room of Google.
当然,我们有Gemini应用、NotebookLM这些AI优先产品,同时也在为所有现有的优秀谷歌产品赋能,无论是地图、YouTube、安卓,当然还有搜索,都加入了AI优先功能,在某些情况下甚至从AI优先的角度重新构想产品,通常底层都采用了Gemini技术。
And of course, there's the Gemini app, there's NotebookLM, these AI-first products, but there's also powering up all these amazing existing Google products, whether that's Maps, YouTube, Android, you know, search, of course, with AI-first features and actually, in some cases, reimagining things from an AI-first perspective with, you know, often Gemini under the hood.
这方面进展非常顺利。
And that's going amazingly well.
我认为我们只处于这个演进过程的中期,但看到用户对这些新功能获得的价值和兴奋感,比如Workspace和Gmail等,确实非常令人振奋。
I think we're only midway through that evolution, but it's very exciting to see how, you know, much value and excitement our users are getting when they see each of those new features and, you know, for example, Workspace and Gmail and so on.
那里几乎有无限的可能性。
There's almost endless possibilities there.
因此我们对此感到非常兴奋,同时也对我们正在构想和原型开发的所有这些AI优先产品充满期待。
So we're really excited about that as well as all of these AI-first products that we're also imagining and prototyping.
上周我们节目请了一位历史学家,他正在AI Studio中使用谷歌未发布的模型,这个模型能够转录这些非常古老的文档,并正确推理出比如19世纪加拿大毛皮贸易中糖的计量单位等问题,让他感到非常震撼。
We had a historian on the show last week who was using an unreleased Google model in AI Studio, and it had sort of blown his mind with how it was able to transcribe these very old documents and reason correctly about, you know, what the measurements of the sugar were in this sort of eighteen-hundreds fur trade in Canada.
你认为你能明确告诉我们,这个人是否使用了Gemini三号吗?
Do you think you can tell us once and for all, was this man using Gemini three?
那个问题我不太确定。
Not sure about that one.
好的。
Okay.
不过我要说,这个模型在建立这些联系方面确实非常出色。
I will say the model is, though, quite amazing at making these connections.
而且我不知道那位历史学家是否使用了旧文件或日记之类的照片。
And I don't know if the historian was using kind of photos of old documents or diaries or whatnot.
他当时就是在做这个。
That's what he was doing.
是的。
Yeah.
那确实是。
That most certainly was.
好的。
Okay.
它在这方面非常出色。
It's very good at this.
而且,你知道,像我这样字迹潦草的人,你可以拿一页笔记,它能轻松处理并继续完善。
And, you know, someone like me who has pretty poor handwriting, you could take a page of notes, and it'll kind of take that and run with it with no problem.
小菜一碟。
No sweat.
你在这次通话中提到,将把这项功能整合到AI模式的搜索中,作为谷歌主搜索引擎的一个侧边栏。
You mentioned on this call that you're gonna be integrating this into search in the AI mode that sort of is a side tab on the main Google search engine.
这是否意味着你们找到了比以往模型更高效、更低成本的运行方式?
Does that mean that you found a way to serve this model more efficiently and cheaply than previous models?
我认为我们始终处于技术前沿。除了模型整体性能的持续提升外,我们真正擅长的还包括模型效率、蒸馏技术以及我们首创并正在应用的诸多技术手段。
I think we're always on the cutting edge. I feel like the thing we do really well, apart from the overall performance of our models and getting better and better at that, is the efficiency of our models and the distillation techniques and many, many other techniques that we sort of created and pioneered that we're now putting to use.
显然这对我们至关重要,因为我们需要应对诸如AI概览等极端使用场景,服务数十亿用户。
Obviously, it's necessary for us because we have extreme use cases, things like AI Overviews and others, where we have to serve billions of users.
当然,我们的云服务企业客户也非常看重这种效率优势,包括成本效益。
And then, of course, some of our cloud enterprise customers really appreciate that efficiency, cost efficiency too.
因此我们始终致力于保持在成本与性能的帕累托前沿。
So we've always tried to be on this Pareto frontier of cost to performance.
无论你更重视性能还是成本,都能在我们的模型家族中找到适合的版本。
And wherever you want to be on that frontier, if you value performance most or if you value cost the most, then there'll be one of the models in the model family for you.
虽然今天我们只发布了Pro版,但我们也正在为3.0时代开发其他系列模型。
So, of course, we're only announcing Pro today, but we are also working on the other family of models for the three point o era.
相关消息很快就会公布。
So you'll see a lot more about that pretty soon.
确实。
Yeah.
似乎每次前沿模型发布时,我们都会重新讨论规模法则的问题。
It seems like every time we see the release of a new frontier model, we get to revisit the discussion about scaling laws.
我们是否开始看到收益递减的现象了?
And are we beginning to see diminishing returns?
我可以预测未来几天会有几个推特账号对此事发表看法。
And I can predict a few Twitter accounts that will probably have something to say about this over the next few days.
所以我想在展开讨论之前先问问你们,你们如何看待这与Gemini三号的关系?
So I thought I would just sort of ask you before we have that discourse, how are you guys thinking about that in relation to Gemini three?
是的。
Yeah.
我们对Gemini三号相比2.5版本的进展感到非常满意。
We're very happy with the progress Gemini three represents over 2.5.
就像我们之前讨论的,进展基本符合预期且按计划进行,我们对此非常满意。
So I would say, sort of actually referring to what we discussed earlier, the progress is basically what we were expecting and on track, and we're really pleased with it.
但这并不意味着存在某种收益递减的情况。
But that's not to say that it's like there is some kind of diminishing returns.
人们听到收益递减时,会想到是零还是指数级增长?
People, when they hear diminishing returns, think: is it zero or exponential?
对吧?
Right?
但其实还有中间状态。
But there's also in between.
所以收益可能递减,但不会每个时代都呈指数级翻倍,不过仍然非常值得去做。
So they can be diminishing, and it's not like it's gonna, like, exponentially double with every era, but it's still well worth doing.
对吧?
Right?
而且而且而且这项投资的回报率极高。
And an extremely good return on that investment.
所以我认为我们正处于那个时代。
So I think we're in that era.
然后,如我所说,我怀疑——虽然还需观察——我们仍需要一两个突破性进展。
And then, you know, as I said, my suspicion is, although we'll see, that still one or two more breakthroughs are required.
要实现AGI,必须取得研究上的突破。
Research breakthroughs are required to get all the way to AGI.
但与此同时,显然需要尽可能大规模地发展这些基础模型,即我们今天正在构建并持续取得重大进展的多模态基础模型。
But in the meantime, you're gonna obviously need as scaled-up as possible versions of these foundation models, the multimodal foundation models that we're building today and still seeing great progress on.
对。
Right.
在今天展示的众多基准测试中,您认为哪个对普通用户来说最为重要?
Which of the many benchmarks that you showed off today do you feel like is going to matter most to the average user?
哦,这是个好问题。
Oh, that's a good question.
我认为大多数人不会像我们这样密切关注基准测试,但这些测试始终是个代理指标。
I think most people don't look at the benchmarks as closely as we do, but the benchmarks are always a proxy.
对吧?
Right?
比如看到在LMArena上突破1500 Elo分数固然很棒。
So you look at something like cracking the 1,500 Elo on LMArena, that's great.
但真正重要的是产品中的用户满意度。
But what really matters is kind of the user satisfaction in the products too.
令我们鼓舞的是,这些指标仍在朝着同一方向提升。
And I think what's been encouraging to us is these are still moving in the same direction.
它们是彼此很好的替代指标。
They're good proxies for each other.
最终,我认为我们会发布所有基准测试结果,我们为此感到非常自豪,它们代表了惊人的进步。
Ultimately, I think we'll put out all the benchmarks, and we're very proud of them, and they represent amazing progress.
但你也需要能够将这些转化为真正重要的产品体验。
But you also have to be able to translate that into product experiences that matter.
所以我们尝试在每次发布时兼顾这两方面。
So we try to do both with every one of these releases.
随着模型能力的提升,是否出现了任何新的危险能力或安全隐患?
Any new dangerous capabilities or safety concerns that come with the increased power of the model?
我认为,嗯,我们在这个模型上花了相当长的时间,因为它是前沿的,具有一些新能力,而且正如基准测试所示,它非常强大。
I think, well, we've taken quite a long time on this model because it's frontier and, you know, has some new capabilities, and it's very capable, as you can see from the benchmarks.
而且正如乔希所说,我们不会过度依赖这些内部基准测试。
And as Josh said, you know, we make sure to not over-index internally on those benchmarks.
它们只是整体表现的替代指标,这就是为什么我们全面关注它们,并最终关注用户体验。
They're just a proxy for overall performance, and that's why we care about them across the board and then ultimately how our users experience them.
但我们花了大量时间进行测试,安全测试,与安全机构以及我们合作的外部测试人员一起检查各个维度,当然也进行了大量内部测试。
But we spend a lot of time on testing, safety testing across all the different dimensions, with the safety institutes and also external testers that we work with, as well as, of course, doing a ton of internal testing.
所以我想说,这是我们迄今为止测试最彻底的模型。
So I would say this is our most thoroughly tested model so far.
你想提一下那些新出现的能力吗?无论是否与安全相关?
Do you wanna mention any of those sort of new capabilities that popped up, whether or not it was for a safety thing?
那里有什么让你觉得‘好吧’的东西吗?
Was there something in there where you thought, okay.
是的。
Yeah.
我们确实需要确保将这些发送给一批外部研究人员吗?
We definitely need to make sure we're sending this to a bunch of external researchers?
对。
Yeah.
你看,关键是要确保我们在工具调用使用、函数调用这类功能上投入了大量精力。
Well, look, it's just making sure... we've worked really hard on things like tool call usage and function calling and these kinds of things.
显然,这些对编码能力至关重要,开发者们非常需要这些功能,而且它们对整体推理能力也非常重要。
Obviously, they're super important for coding capabilities, and developers want that and so on, and it's very important in general for reasoning.
但这也让它们在风险更高的领域(比如网络安全)更具潜力。
But it also makes them more capable for riskier things too, like cyber.
所以我们必须,你知道,在提升这些维度的同时要加倍谨慎,既要满足所有良性用例的需求,又要持续监控这些功能不被滥用。
So we have to be, you know, sort of doubly cautious as we improve those dimensions for all the good use cases, so we're continually checking on all those kinds of measures to make sure they can't be misused.
我们正处于AI泡沫中吗?
Are we in an AI bubble?
我觉得这个问题太过非黑即白了。
I think it's too binary a question, I would say.
我个人认为——这完全是我自己的观点——AI行业的某些领域可能确实存在泡沫。
I mean, my view on this, and this is just strictly my own opinion, is that there are some parts of the AI industry that are probably in a bubble.
比如你看那些种子轮融资就能达到上百亿美元规模,却几乎没有任何实质内容的情况。
You know, if you look at, like, seed investment rounds being multi-$10 billion rounds with basically nothing.
虽然团队很优秀,但这可能就是某种泡沫的初期迹象。
I mean, there's talented teams, but it seems like that might be the first signs of some kind of bubble.
另一方面,从我们的视角来看,我认为有许多令人惊叹的工作和价值,不仅体现在所有新产品领域:比如Gemini应用、NotebookLM,还有更具前瞻性的机器人技术和游戏领域。
On the other hand, you know, I think there's a lot of amazing work and value, at least from our perspective. Not only are there all the new product areas, so the Gemini app, NotebookLM, and, thinking further forward, robotics and gaming.
我是说,这些领域有惊人的应用潜力,不仅限于Gemini,还包括我们的其他模型如Genie。
I mean, there are incredible uses of not just Gemini but some of our other models, like Genie.
你可以想象我过去在游戏行业的背景。
You can imagine, with my old games background.
你知道,我迫不及待想思考在那里能实现什么。
You know, I'm itching to think about what could be done there.
还有药物发现领域,我们正与Isomorphic和Waymo合作开展的工作。
And drug discovery, which we're doing with Isomorphic, and Waymo.
这些都是全新的待开发领域。
And so there's all these new greenfield areas.
它们需要时间成长为价值数百亿美元的巨大业务,但我认为实际上有潜力发展出五到十个Alphabet将参与其中的项目,这让我非常兴奋。
They're gonna take a while to mature into massive multi-$100 billion businesses, but I think that there's actually potential for half a dozen to a dozen there that I think Alphabet will be involved with, which I'm really excited about.
但眼前就有回报,我们当然还有核心引擎部门。
But there are also immediate returns. We've got, of course, the engine room.
这是谷歌的核心引擎部分,我们正将这些技术推向所有这些令人惊叹的、拥有数十亿用户的日常产品中。
You know, this is the engine room part of Google, where we're pushing this into all of these incredible, you know, multi-billion-user products that people use every day.
我们几乎有数不清的想法。
And there's just, we have almost too many ideas.
关键在于执行。
It's just about execution.
比如,你会如何围绕这个重组Workspace?
Like, how would you reorganize Workspace around that?
安卓,YouTube。
Android, YouTube.
那里潜力巨大。
There's just so much potential there.
我认为其中很多方面还会带来短期收入与直接回报,同时我们也在投资未来,更不用说云计算收入和TPU等业务,我认为这些也将非常庞大。
And I think a lot of that will also bring in near-term revenue and direct returns while we're also investing in the future, not to speak of, you know, cloud revenue and TPUs and all of that, which I think is also gonna be huge.
所以我对Alphabet的现状感到非常满意,无论是否存在泡沫。
So I feel really good about where we are as Alphabet, whether or not there's a bubble.
我认为我们的职责是在两种情况下都保持优势。
I think our job is to be winning in both cases.
对吧?
Right?
如果没有泡沫且形势持续,我们将抓住这一机遇。
If there's no bubble and things carry on, then we're gonna take advantage of that opportunity.
但如果出现某种泡沫并导致收缩,我认为我们也将处于最佳位置来利用这种局面。
But if there is some sort of bubble and there's a retrenchment, I think we'll also be best placed to take advantage of that scenario as well.
好的。
Alright.
假设即将迎来感恩节,地点在湾区。
Let's imagine it's Thanksgiving coming up, and it's the Bay Area.
我们的一位听众把话题从令人沮丧的政治转向人工智能,给大家带来些兴奋点。
And one of our listeners, you know, changes the subject from politics, which is upsetting everyone, to AI, to give people something to be excited about.
然后有人说,嘿。
And someone says, hey.
我听说Gemini 3刚刚发布。
I heard Gemini 3 just came out.
比如,它到底能做什么呢?
Like, what can it actually do?
你会让我们的听众向朋友展示什么例子呢?无论是在手机还是笔记本电脑上,让他们见识一下这个,还能拯救感恩节?
What's the example that you would have our listeners show their friends, whether it's on their phone or their laptop, to be like, get a load of this, and save Thanksgiving?
是啊。
Yeah.
我不确定这能否拯救感恩节,但应该能带来一些欢笑。
I don't know if it'll save Thanksgiving, but it could probably provide some laughs.
要知道,我们的Gemini图像模型仍然是世界顶尖的。
You know, our imagery models in Gemini are still best in the world.
所以我想说的是,拿起你的手机。
So what I would say is, grab your phone.
可以是iPhone或安卓,都无所谓。
It can be, you know, iPhone, Android, doesn't matter.
把它拿出来。
Pull it out.
你可以自拍,把自己放进去并编辑。
You can take a selfie, put yourself in it and edit it.
现在还有大量人在这么做,这非常有趣。
People are still doing that in huge numbers, and it's great fun.
然后我认为你可以同时展示新版Gemini 3的其他各种功能。
And then I think you can show off any other capabilities in the new Gemini 3 alongside it.
但我们看到很多人因为这些有趣的用例而来,然后也开始尝试应用的其他部分。
But this is what we're seeing: people coming for a lot of these interesting use cases and then starting to try other parts of the app too.
你在这里听到了。
You heard it here.
Nano Banana将拯救感恩节晚餐。
Nano Banana will save Thanksgiving dinner.
先生们,谢谢你们。
Gentlemen, thank you.
很高兴能交谈,感谢你们抽时间。
It's great to talk, and thanks for making the time.
非常感谢。
Appreciate it.
感谢邀请我们。
Thanks for having us.
谢谢大家。
Thank you all.
谢谢,伙计们。
Thanks, guys.
《硬分叉》由惠特尼·琼斯和雷切尔·科恩制作。
Hard Fork is produced by Whitney Jones and Rachel Cohn.
我们的编辑是珍·福扬特。
We're edited by Jen Poyant.
本期节目由克里斯·伍德负责技术制作。
Today's show is engineered by Chris Wood.
原创音乐由黛安·王、罗温·尼米斯托和丹·鲍威尔创作。
Original music by Diane Wong, Rowan Niemisto, and Dan Powell.
视频制作由苏利亚·罗克、帕特·冈瑟和克里斯·肖特完成。
Video production by Sawyer Roque, Pat Gunther, and Chris Schott.
您可以在YouTube上观看完整剧集,网址是youtube.com/hardfork。
You can watch this full episode on YouTube at youtube.com/hardfork.
特别感谢保拉·舒曼、谭沛荣、达莉亚·哈达德和杰弗里·米兰达。
Special thanks to Paula Szuchman, Pui-Wing Tam, Dalia Haddad, and Jeffrey Miranda.
您仍可像往常一样通过hardfork@nytimes.com给我们发送邮件。
You can email us as always at hardfork@nytimes.com.