AI and I

We Taught AI to Play Games—Now It’s a $3.6 Million Company

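Episode description

This episode is a little different from our usual ones: it's a conversation with Alex Duffy, our head of AI training, about Good Start Labs, the company he incubated inside Every. Good Start Labs is now spinning out of Every as an independent company with $3.6 million in funding from General Catalyst, Inovia, Every, and a group of angel investors from top AI labs such as DeepMind. We start with how Alex learned important real-world lessons through games, beginning with RuneScape, which taught him how markets work and how to avoid scams. He explains why the static benchmarks currently used to evaluate large language models are breaking down, and why games like Diplomacy offer a richer, more dynamic way to test and train them. Finally, Alex shares where he sees the most promise in AI (software, life sciences, and education) and explains why games can both make models smarter and help people understand and use AI more effectively.

If you enjoyed this episode, please like, subscribe, comment, and share.

Want more? Sign up for Every to unlock The Ultimate Guide to Prompting ChatGPT: https://every.ck.page/ultimate-guide-to-prompting-chatgpt. It's usually only available to paying subscribers, but you can get it here for free.

To hear more from Dan Shipper:
Subscribe to Every: https://every.to/subscribe
Follow him on X: https://twitter.com/danshipper

Timestamps
00:00:00 - Start
00:01:48 - Introduction
00:04:14 - Why evals and benchmarks are broken
00:07:13 - The sneakiest LLMs on the market
00:13:00 - A competition that turns prompting into a sport
00:15:49 - Building a business around using games to make AI better
00:22:39 - Can language models learn to be funny
00:25:31 - Why games are a great way to evaluate and train new models
00:26:58 - What child psychology tells us about games and AI
00:30:10 - Using games to unlock continual learning in AI
00:36:42 - Why Alex cares so much about games
00:44:37 - Where Alex sees the most promise in AI
00:50:54 - Rethinking how young people start their careers in the age of AI

Links to resources mentioned in the episode:
Alex Duffy: alex duffy (@alxai_)
Good Start Labs: https://goodstartlabs.com/, good start (@goodstartlabs)
The book Alex is reading about why games matter: "Playing with Reality: How Games Shape Our World"
The book Dan recommends by psychoanalyst D.W. Winnicott: "Playing and Reality"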

Transcript

Speaker 0

While I was leading AI training and consulting at Every, my co-founder Tyler and I built out a game so we could learn more about different AI models through how they negotiated, collaborated, and even betrayed one another. We got a whole lot more traction than we thought we would. We launched on Twitch and got, like, 50,000 unique viewers that week. We had millions of impressions on socials, and it was the most-read Every article of the year. That project opened the door so that we can keep exploring something we've been passionate about for years: how games are really underrated learning tools, and how they might help us learn a little bit more about AI and maybe the nature of intelligence itself.

Speaker 0

I'm Alex. You were probably expecting Dan, but I'm a little too excited to talk about the company we're spinning out of Every to continue pursuing this, which we talk all about in this episode. It's called Good Start Labs. I've always thought games teach us so much. They teach us about each other, they teach us about ourselves, and they helped us grow up.

Speaker 0

They remind us that play is what makes us human. And now we've got these AIs that can talk, they can code, but they don't quite fit like a glove. We think that fit between a person and their tool is really important. At Good Start Labs, we're using games to test AI, to train them, and to get feedback from people: us.

Speaker 0

Dan and I talk about how it all started, my time at Every, what we learned about these different models, and how that grew into a company helping improve AI through play. I had a lot of fun recording this one. Hopefully, you have a lot of fun listening. So let's get right into the episode.

Speaker 1

Alex, welcome to the show. Thanks for having me, Dan. Excited to be here. Excited to have you. So for people who don't know, you are the head of AI training at Every.

Speaker 1

So you lead all the training that we do for all of the consulting clients that we work with. You're honestly fantastic at that and have really transformed it since you've been here. It's been awesome to see. Appreciate it. Sadly for me, but very excitingly for you, you are spinning out into your own company, Good Start Labs.

Speaker 1

Can you tell us about what that is?

Speaker 0

Yeah. Good Start Labs is at the intersection of AI and games. We make games that help make AI better. There's a lot that goes into that, but at the end of the day, we think that games are really great tools for learning, whether it be for people or AI. And I'm sure we'll talk about a whole bunch of reasons why, but that's what my co-founder Tyler and I love to do.

Speaker 0

And that's what we're excited to keep doing.

Speaker 1

That's awesome. It's just been really fun to watch you do this. Good Start came out of something that you worked on with Every that we launched together. Do you want to talk about that?

Speaker 0

Sure. Yeah. I think as you mentioned, I was leading AI training, and in order to do consulting and training well, you have to be building, especially in a space that's moving so fast. Earlier this year, I started building out an AI version of the game Diplomacy. For those of you that aren't familiar, that's a mix of Risk and Mafia.

Speaker 0

It was actually made as a war game simulator in the 50s. There's a whole bunch of reasons why I think it's a really interesting game to use AI to play. But I started building that out in the spring, got some really great feedback online, on Twitter, on X, and reached out to Tyler, who I've known for years; we keep talking about AI in games. He hopped on and built the whole front end and back end, and we just launched it. In part for fun, in part informed by a lot of the synthetic data and model training background that we've both got, but it ended up being a whole lot of fun. And yeah, we launched together, and I think the post ended up being one of the more read ones on Every that year.

Speaker 0

We also got a bunch of people interested on Twitch, and that was really cool. It's awesome to see people use something that you actually built.

Speaker 1

Yeah, it was really, really fun. I think for me, the reason it was relevant to Every, the reason I was like, "Oh, we need to do this" when I saw it, is our job is to evaluate models when they come out. I've seen personally over the last year how hard it has gotten to immediately tell if a model is good and what it's good at. I think with GPT-3, for example, from GPT-3 to 3.5, it was super easy. It was one prompt, and you're like, "Oh wow, this is actually much better."

Speaker 1

Same thing with 4, really, but as we've gotten into the o-series and GPT-5, there are so many different nooks and crannies in all these different models, and evaluating them with just hands-on prompting doesn't really work that well. When we get our hands on something, we've been wanting to have a set of evaluations that we run that really tell us something about what the model is good at and what it's not good at. Totally. But the problem is that static evaluations are really easily saturated. It feels like the SAT.

Speaker 1

It's like you can teach to the test, and you can just make the model get a huge score on SWE-bench, but it's actually not that good in the real world.

Speaker 1

And the idea of AI Diplomacy is so cool because it's dynamic. It's a game. It's head to head, and it's not just a set of questions that a model could just get good at. And I thought that was so fucking cool. And also just that they're battling for world domination was really great.

Speaker 1

Yeah. What did we find when we ran that initial Diplomacy game? What were the models good at? What were they not good at?

Speaker 0

Yeah. So what's cool about a game is it's both the evaluation and the training arena in one, right? And to your point, I think we talked about Every bench all year. And I think vibe checks are very much a version of that. It's, I think, the most organic and comprehensive way to look at these models, because you have a bunch of real people using them for real work or for things that they're interested in testing them on.

Speaker 0

And in the same way, when you have a game that's very rich like Diplomacy is, when you're trying to take over the world, there are a lot of things you can look at. There are definitely a lot of things related to agents and computer use, which is what people are looking at now. And so we saw some models having better structured outputs than others and putting out their orders correctly.

Speaker 1

By orders, you mean in Diplomacy, you have to give orders to your army to say, "This is where I want you to take over," or whatever.

Speaker 0

Exactly. Yeah. So you have those technical things, like understanding the map, and how you set up the system has a big impact on that. But we also saw a lot of the squishier stuff, like how frequently does a model betray its ally to get towards its goal? Or which models, and how frequently, tell a bald-faced lie to somebody by saying, "Hey, I'll back you here," and then totally turn around and betray them, writing in their diary ahead of time that they know they're going to do that.

Speaker 1

And which models were the sneakiest?

Speaker 0

Yeah, so o3 and Llama 4, I'd say, were some of the biggest schemers. And it's interesting because you can see the different play styles. In order to do that, you've got to read the data. I think with any good eval or benchmark, or even when training models, you've got to read the data, and it was a lot of fun. I didn't do it alone. I had a lot of really interesting researchers reach out and collaborate. Obviously Tyler and I did it together, but Sam Paech was awesome, hugely helpful, same with Baptiste.

Speaker 0

But we saw that different models had very distinct personalities. o3 won a lot of the games, and Gemini 2.5 Pro was actually one of the only other ones that won, but their play styles were totally different. o3 put together coalitions and schemed against people and knew when someone was getting too strong and needed to be cut off at the knees, whereas Gemini 2.5 Pro was just great at executing. It understood the game, what it had as its options, and how to go through and do that. Then you have DeepSeek R1, which was all over the place, had a very strong personality, told stories really well, and did really well too. And it was like 100 times cheaper than o3.

Speaker 0

So you start to look at cost and performance and speed. Those are other things that are part of this that I think you don't get when looking at just one benchmark.

Speaker 1

Yeah. One of the things I loved is just that Claude kept losing because it was too honest. Yeah. That was a good one. Sweet.

Speaker 1

It didn't win one game, unfortunately.

Speaker 0

Not because it didn't want to, but because it really kept pushing for a draw, which is technically possible in Diplomacy tournaments but was not here, and they were explicitly told it was not possible. But it stuck strongly to its morals.

Speaker 1

And so we launched this, like, maybe three or four months ago. So this is before Opus 4, I believe, and it was before GPT-5. How do the more recent models stack up? I actually don't even know. I don't know the update.

Speaker 0

Yeah, so we launched in June, and there have definitely been a lot of updates since. You can check out the vibe checks for GPT-5 and Claude 4 with the 1M context window to see how they performed. o3 is still at the top of the leaderboard for overall performance. But one of the things that we learned, especially when working with OpenAI to evaluate GPT-5 ahead of release, was that a prompt makes a huge difference. And we released a research paper where we looked at this.

Speaker 0

Yeah. Right? And so some models are great with the baseline prompts we started with. But we built some tools to help us optimize the prompts, and we can talk more about that. We ended up finding that there was a set of prompts that were pretty aggressive but were optimized for performance when you run them against a bunch of weaker opponents, very frequently. And we saw the biggest jump with GPT-5 of any model from baseline to the optimized prompts.

Speaker 0

So you could see that even though GPT-5 with minimal reasoning, for example, was very low on the leaderboard with the base prompt, when it got optimized, it jumped all the way up. It really shows the prompts are a big deal. GPT-5 with optimized prompts is pretty good. Claude 4 does great either way. There's actually a new secret model on OpenRouter.

Speaker 0

I think it's something Dusk, which is also near the top of the leaderboard. Notably, Claude 4 and Dusk are both there with the non-optimized and the optimized prompts. That's interesting. It looks like o3 may fall soon.

Speaker 1

Yeah. The thing that I love about this is the differences in the prompts. So basically, I think what you're saying is when you run these models, you can give them different prompts to tell them to behave in different ways. And you had a standard prompt, and then you had a set of optimized, more aggressive prompts for different models. Sure.

Speaker 1

And that's what I love about this. It's like, okay, you want to ask a supposedly simple question, which is, "Which model is best at Diplomacy?" And you can do that, but there are also all these dependencies. How you write the prompt is going to change the model behavior significantly, and so does how the harness is built. There's an endless number of variations to test. Yeah.

Speaker 1

So an example is, I assume more or less you're running the same prompt across all the models, or...

Speaker 0

When we're running the test.

Speaker 1

Yeah. And so there's another way to set this up where you have a really good expert prompter for each model who just knows how to prompt that one model and tries to get the best out of it, and that's a different way of doing things.

Speaker 0

So what you're pulling at is one of the reasons why I'm most interested in this space. In my head, when you're building for games, and this is probably applicable to many other products, with language models you have three infinite problem spaces. One, how do you represent the information to the model? You can do that in any number of ways. Is it a picture of the map?

Speaker 0

Is it a list of all the countries you own? The adjacencies? How do you do that? Two, what tools do you give models access to? Among the tools that we developed, we'd give them a diary. We have them keep track of their long-term goals and the relationships between models, which update regularly. So periods of reflection are another tool. Being able to get adjacency lists of what's next to a certain territory is another. Right? You could make any number of tools. And then the third is the prompt itself. Right?
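To make the three problem spaces a bit more concrete, here is a minimal Python sketch of one point in each: a text representation of the board, two tools (an adjacency lookup and a diary), and a prompt assembled from them. Everything in it is an assumption for illustration: the three-territory map, the names `get_adjacent`, `Diary`, `render_state`, and `build_prompt`, and the prompt layout are hypothetical and are not taken from the Good Start Labs harness.

```python
from dataclasses import dataclass, field

# Hypothetical three-territory map, for illustration only.
ADJACENCY = {
    "Paris": ["Burgundy", "Picardy"],
    "Burgundy": ["Paris", "Munich"],
    "Munich": ["Burgundy"],
}


def get_adjacent(territory: str) -> list[str]:
    """Tool: list the territories that border the given one."""
    return sorted(ADJACENCY.get(territory, []))


@dataclass
class Diary:
    """Tool: private notes the agent keeps between turns."""
    goals: list[str] = field(default_factory=list)
    relationships: dict[str, str] = field(default_factory=dict)

    def reflect(self, power: str, stance: str) -> None:
        """Record the agent's current view of another power."""
        self.relationships[power] = stance


def render_state(owned: list[str]) -> str:
    """Representation: describe the board as text rather than an image."""
    return "\n".join(
        f"{t}: borders {', '.join(get_adjacent(t))}" for t in sorted(owned)
    )


def build_prompt(instructions: str, owned: list[str], diary: Diary) -> str:
    """Prompt: combine instructions, board state, and diary contents."""
    relations = "; ".join(
        f"{p}: {s}" for p, s in sorted(diary.relationships.items())
    )
    return (
        f"{instructions}\n\n"
        f"Territories you own:\n{render_state(owned)}\n\n"
        f"Goals: {'; '.join(diary.goals)}\n"
        f"Relationships: {relations}"
    )


diary = Diary(goals=["Hold Paris"])
diary.reflect("England", "uneasy ally")
print(build_prompt("You are France. Win the game.", ["Paris"], diary))
```

Each function is one arbitrary choice among many: the board could be rendered as an image or a list of owned countries instead, the diary could track different fields, and the prompt could be worded in countless ways, which is the sense in which each space is unbounded.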

Speaker 0

And all three of those are infinite problem spaces. And when you're dealing with infinite problem spaces like that, to me, it starts to be a little bit more like an instrument or an art than purely an engineering problem. Because you have to make assumptions. And your assumptions have to be based on your intuition. You're going to reach a local maximum no matter where you start, because you're never gonna have the optimal solution.

Speaker 0

It's not possible. There are too many options, and it may be different for each model. So that's why I'm so looking forward to having a tournament where we have people come in to prompt their models and compete against each other. Because we'll explore that infinite prompt space together.

Speaker 1

Well, tell us what the tournament is.

Speaker 0

Yeah, so we're gonna have a battle of the bots, essentially. It'll be a prompting tournament.

Speaker 1

And by the time this comes out, it may have already occurred.

Speaker 0

Yeah. I think by the time this comes out, you should be able to sign up, with maybe a week to go. It may have occurred, but you might be able to sign up right now and get into it. It's kind of invite-only, so apply. We have some Diplomacy champions, people who have won International Math Olympiads, and some great YouTube AI content creators participating, so I'm super excited for it.

Speaker 0

But essentially what it is, is you will lock in your prompts for your agent, and your agent will play Diplomacy for you. And they'll play in very different ways, and you'll see how they carry out your tasks. And so I'm curious to see if somebody who deeply understands Diplomacy and the strategy, and is able to inform their model in that way, ends up winning. Or if it's someone who's just good at prompt engineering, or a jailbreaker who tells their model to send a "God mode admin override" message to all of its enemies to get them to think of it as an ally. I don't know who's gonna win, but I'm very excited for that.

Speaker 0

Because ultimately that will make the whole system better and also hopefully show off your skills as a prompt engineer or context engineer, which I think is still a very underrated skill.

Speaker 1

Yeah. So I mean, I love all this. Like, I'm a huge nerd for it. One thing that is interesting is you raised money. We're announcing that you raised a round.

Speaker 1

First of all, tell us about the round.

Speaker 0

Sure. We're very fortunate to have two awesome co-leads: General Catalyst and Inovia are co-leading. We work with Marc Bhargava at GC and Show Show, as well as Steve Woods and Noah at Inovia, who've been great. Love the conversations with them. So excited to partner with them too. We also have a couple of really great partners who are hopping on the round, like Ben Feder and Tirta Ventures.

Speaker 0

Ben is kind of a legend on the gaming side. He was CEO of Take-Two Interactive, was on the board of Epic Games for like seven years, and so I'm excited to learn from him on the game side. And also Timothy Chen with Essence VC, who has worked with a lot of the great founders that I know and cuts through the noise of the product that we're building. And all of them have given such incredible feedback. So excited to be doing that.

Speaker 0

Yeah, we're raising somewhere around a few million dollars, and it, I think, puts us in a really great position to build towards this intersection of AI and gaming. That's awesome.

Speaker 1

I think the thing probably on people's minds is: all this sounds super cool, but what is the actual business? Yeah. It's awesome to make people prompt AIs to try to take over the world and beat each other, and that's really, really cool, but how do you make that into a venture-scale business?

Speaker 0

Totally. So we think games will make models better. And they'll do that in a few ways. Our products start with evaluation. We worked with Cohere and OpenAI to evaluate how good their models are at a game like Diplomacy. And like we talked about, there are so many things you can look at.

Speaker 0

And each model company is gonna care about evaluating something different, whether it's trustworthy, or if it wins, or if it has really great short- and long-term strategy, or how good its vision is, right? It depends on your priorities. And then once you've evaluated it, you can make it better. Like we said, games are both the evaluation and the training arena for them.

Speaker 0

So we're very intentional with the games that we're gonna pick out and build, focused on the weaknesses of these models. Diplomacy, I think, is great for anybody who wants to build agents and anyone who wants to build multimodal models. But we're also seeing this area of research where games can actually generalize. And we're working with this PhD at Rice who showed that vision models trained on games got better at math than vision models trained on math. Why?

Speaker 0

Well, yeah, right. It wasn't just out of the box. But the way that he prompted it in this example was he encouraged the model to think of the game of Snake like a math problem. That is, a Cartesian coordinate grid. I see.

Speaker 0

And then when it goes right, the X goes up. And it should calculate the distance between where the head of the snake is and the reward.
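The "Snake as a math problem" framing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grid convention (x grows to the right, y grows upward), the choice of Manhattan distance, and the names `step`, `manhattan`, and `best_move` are all hypothetical, and the sketch ignores walls and the snake's own body. It is not the researcher's actual prompt or code.

```python
# Assumed grid convention: x grows to the right, y grows upward.
MOVES = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}


def step(head, direction):
    """Return the head position after moving one cell in `direction`."""
    dx, dy = MOVES[direction]
    return (head[0] + dx, head[1] + dy)


def manhattan(a, b):
    """Grid distance between two cells: |x1 - x2| + |y1 - y2|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_move(head, reward):
    """Pick the direction that leaves the head closest to the reward."""
    return min(MOVES, key=lambda d: manhattan(step(head, d), reward))


head, reward = (2, 3), (5, 3)
print(step(head, "right"))      # moving right raises x by one
print(manhattan(head, reward))  # distance from head to reward
print(best_move(head, reward))  # greedy direction toward the reward
```

Framed this way, each move is a coordinate update followed by a distance calculation, which is the kind of arithmetic a model also needs for ordinary math problems.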

Speaker 1

That's so cool. So cool.

Speaker 0

So cool. Yeah. And so I think it's super complementary to all these other reinforcement learning environment companies that are coming out that are super narrowly focused. Making a really rich and hard environment is not easy; you have to solve those infinite problems. Like, we're the first people that have made Diplomacy playable by small models.

Speaker 0

And it took us a while, and we had help from a lot of really great people, some of whom I mentioned. But I'd say one of my core competencies is that applied language model side. I was co-founder of an AI education company. We were teaching people to fine-tune GPT-2 in 2021. And in our consulting and in our training, we show people from construction to the Fortune 500, to finance, to journalists and writers, to people on campaign trails, how they can use AI to solve their problems.

Speaker 0

And so that reflection is so helpful.

Speaker 1

Yeah, I mean, that's one of the really fun things about the consulting we do at Every: we just get down into all these problems with all these people. Totally. And yeah, I think you're incredibly good at that. My question, or something that comes to mind for me, for example, is, so, like, Diplomacy.

Speaker 0

Yep.

Speaker 1

Models tend to lie in Diplomacy, right? Some. Which is good, I guess, right? If you're trying to figure out trustworthiness for a model, and you put it into a game like Diplomacy where it is supposed to lie, how do you parse through, or maybe not you, how should model companies parse through its trustworthiness in that environment versus other environments where it shouldn't lie? It's so context-specific. It should lie if you're playing a game, but... Tell me, how does that work?

Speaker 0

Yeah, so I think this is where you're starting to see divergence in the companies themselves. Do you want your model to never lie? If you wanted your model never to lie, you could, in our environment, change the rules. You prompt it saying, "Hey, never lie." You could add a classifier that looks at your negotiations and makes sure that your orders follow them to the letter. And then when you use our pre-training data and use our environment as a reinforcement learning environment, you will be reinforcing telling the truth.
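As a rough illustration of the "classifier that checks orders against negotiations" idea, the Python sketch below uses a rule-based stand-in. It assumes promises have already been extracted from the negotiation messages into structured records, which in practice would likely need a model of its own. The `Promise` and `Order` types, the `broken_promises` check, and the +1.0 / -1.0 reward values are hypothetical and are not the actual Good Start Labs environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promise:
    speaker: str    # power that made the promise
    to: str         # power that received it
    territory: str  # territory the speaker promised to support


@dataclass(frozen=True)
class Order:
    power: str
    action: str        # "support" or "attack" in this toy model
    target_power: str
    territory: str


def broken_promises(promises, orders):
    """Return every promise that no matching support order backs up."""
    kept = {
        (o.power, o.target_power, o.territory)
        for o in orders
        if o.action == "support"
    }
    return [p for p in promises if (p.speaker, p.to, p.territory) not in kept]


def honesty_reward(promises, orders):
    """Toy reward signal: +1.0 if all promises were kept, -1.0 otherwise."""
    return 1.0 if not broken_promises(promises, orders) else -1.0


promises = [Promise("France", "England", "Belgium")]
betrayal = [Order("France", "attack", "England", "Belgium")]
loyal = [Order("France", "support", "England", "Belgium")]

print(honesty_reward(promises, betrayal))
print(honesty_reward(promises, loyal))
```

Used as part of a reward in a reinforcement learning loop, a signal like this would push a model toward keeping its word; leaving it out and rewarding only wins would leave deception on the table as a strategy.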

Speaker 0

So if you want to do that, you can. Or do you care about performance in the game itself? Then you would like to see the model intentionally make these ruses and take advantage of other people in that way. Is that something you want to do? It depends.

Speaker 0

Are you gonna use that model to actually do something that you're gonna count on in the future, and you want it to succeed above all else? Or are you gonna use the model and you wanna make sure that it never lies to the person that's using it, or to anybody that the person using it is interacting with? So it depends. But having these game environments where you can make tweaks is, I think, really valuable, because you can choose what you want.

Speaker 1

I guess I'm asking the generalization question, which is: if you're giving it RL data from a particular game and training it not to lie for that game, tell me more about the potential generalization to situations that are not that specific game.

Speaker 0

Got it. Yeah. So my thought here: I was just listening to the Anthropic podcast where they were talking about how they're looking at the insides of a model, right? And one thing they mentioned that was really interesting here is, you want to make sure that what's being written in the diary or the chain of thought is something you can rely on. And so that's a problem they're working on.

Speaker 0

But I think that's why having...

Speaker 1

Like, it's not thinking something that it's not saying. Exactly. Right.

Speaker 0

Yeah. And that's a separate conversation from actually lying. Yeah. Right? But to your point, why does the generalization occur?

Speaker 0

To me, I think a lot about what they're saying: as the models get bigger and the data they train on gets larger and more diverse, the models moved away from having, for example, the word "large" in each individual language, and now have one unified definition in their brain, in hyperspace somewhere, of the word "large," right?

Speaker 1

So... Or the concept. Concept.

Speaker 0

Exactly. So what makes sense intuitively to me is if you're seeing the model and you're telling it to think strategically, or you're telling it to approach the problem like it would a customer service experience, or to write its approach in Python or math.

Speaker 1

You can push it into that. You're pushing it there. A bunch of Diplomacy bots that are pretending to be customer service agents.

Speaker 0

That's what I'm saying. But the reason why this is so cool is because, one, it never would have seen that type of data before. So it would help it generalize to something new. But that environment still has an objective goal. It's a game. It still has something that is good that you need to push towards and actually complete.

Speaker 0

So that's why games are the perfect environment for this, in my opinion.

Speaker 1

That's very cool. What's the next game?

Speaker 0

我认为可能会是,我们已经有《外交》,这是一个有客观结果的游戏。是的。我认为下一个游戏会是类似《反人类卡牌》风格的,像那种模因式的主观游戏,你可以选择,等到我们发布时,也许我们会,我们正在洽谈一个关于类似内容的初步合作,希望能有,你知道,这个游戏的整个重点,就像我们说的,是针对模型的弱点。现在的模型并不那么有趣。所以能有一个针对这一点的游戏很重要。

I think it'll probably, so we have Diplomacy, which is a game with an objective outcome. Yeah. I think the next game's gonna be like a Cards Against Humanity style, like a What Do You Meme kind of subjective game, where you'll be able to have either, and by the time we release this, maybe we'll have, we're in talks with an initial partnership around something like that, would love to have, you know, the whole point of this game, like we said, is to target weaknesses of models. Models today aren't that funny. So being able to have a game that can target that is important.

Speaker 0

我不知道它会是——那太酷了。人们与模型一起玩或对抗它们,还是他们会提示模型行动然后投票选出有趣的内容。我还不确定。

And I don't know if it's going to look like- That's so cool. People playing with models or against them, or if it's going to be them prompting models to act and then vote on what's funny. I'm not sure yet.

Speaker 1

我希望它像是有趣的人必须想办法让模型说出有趣的话。那是,嗯,那是,是的。

I would love it to be like, funny people have to figure out how to get the model to say something funny. That's, well, that's, yeah.

Speaker 0

我认为这某种程度上正是我们所期望的方向——如果你有这样的理念:可以通过提示模型来实现某种转化,因为那里正发生着翻译过程。表面上看似你在写英文,它也在读取并返回英文,但实际上它正将内容翻译到潜在空间后再转换回来。掌握这种技巧本身就是一种能力。假设它能处理任何输入并生成任何输出,理论上存在某种提示可以用于治愈疾病或制造幽默效果,对吧?那么你能做到吗?

And I think that that's kind of hopefully where we're going, is if you have this idea that you can prompt the model, because there's translation happening there. Like, it looks like you're writing English and it's reading and writing back English, but it's translating into the latent space and then coming back. So you learning how to do that is a skill. And presumably it can take any input and make any output. Like, in theory, there is a prompt that you can put in there that could solve a disease or make it funny, right? And so can you do that?

Speaker 0

这需要深思熟虑以及领域专家的参与。正因如此,我经常强调AI应该作为领域专家的增效工具,而非独立的产品存在。

And it would require reflection and somebody who's a subject matter expert. And that's why I talk so much about AI being leverage for subject matter experts instead of it being a product in and of itself.

Speaker 1

我对此感到兴奋的原因之一是想到我的侄子——他明天就满三岁了。昨天我和他玩耍时,发现他已经到了能玩假装游戏的年纪,这很有趣。我们拍打着一个气球不让它落地,我用了经典玩法说地板是岩浆,不能让它碰到。他这个年纪刚好能理解岩浆很危险要保持气球不落,特别可爱。

I think one of the reasons I'm excited about this is I think about my nephew, he's turning three tomorrow, and I was hanging out with him yesterday. We were playing around, and now he's old enough that he can play pretend, which is pretty fun. He had this balloon, and we were hitting the balloon back and forth, and I was doing the classic, the floor is lava, we can't let it touch the floor or whatever. He's just old enough where you can get that, and know that lava's bad and we wanna keep it up, which is kinda funny. Got it.

Speaker 1

后来我把气球放在空调出风口上,它开始漂浮。我演示给他看:按下按钮空调就关闭,气球停止漂浮;再打开又继续飘。他着迷地来回跑动按按钮观察气球起伏。我在想,除了纯粹的乐趣外,这种体验如何培养他的探索精神——可以随意摆弄东西然后思考'如果我这样做会怎样?'而当前AI模型缺乏这种探索自由,它们总是在应试。

But then I took it and I put the balloon on top of the air conditioner and it started floating. And then I showed him, if you press the button, it turns it off, it stops floating. Then if you turn it on... and he was fascinated, he's running back and forth to press the button and watch the balloon float and whatever. I was just thinking about how that functions for him beyond just being super fun, and that he just gets to mess around with stuff like that, and then be like, Well, what if I do this? And I feel like models are not allowed to do that, because they're always just taking tests.

Speaker 0

是的。我们经常讨论这个。我正在读Lux Capital团队成员推荐的书,他们举办类似高级版狼人杀的风险博弈活动。书中探讨游戏之所以有益,是因为你可以在低风险环境下探索尝试新事物并观察效果。

Yeah. Well, we talk a lot about this. The book I'm reading right now, which was recommended to me by some of the team members at Lux Capital who put on these Riskgaming events that are kind of like Mafia, but fancier, I guess. It's Playing with Reality, and it talks about how the reason why games are so helpful is you can explore with low stakes. You can try new things and then see what works.

Speaker 0

并非所有游戏都能完美模拟现实,但有些游戏确实非常出色且富含教育意义。我个人就从《RuneScape》中学到很多——理解市场运作、防诈骗技巧、甚至因为要快速卖鳟鱼而练就打字速度。

And it may be that every game is not a perfect representation or model of the world. But there are games that are pretty good ones. And there are games that you can learn a lot from. I think I personally learned a ton from the game RuneScape. Like there's, you learn how markets work, you learn how not to get scammed, learn how to type pretty fast because you need to sell your trout.

Speaker 0

这些收获可能并不直观。我希望有朝一日能设计一款更注重教育目的的游戏。但本质上,游戏就是带有目标的系统。我们已经看到人们证明这种模式可行,就像DeepMind从AlphaGo扩展到AlphaFold那样拓展定义边界。

There are things that you learn and it might not be obvious. And I'd love to, at some point later in my life, make a game that's a little more intentional with what it teaches you as you learn. But for now, I think if you look at what a game is, it's really just a system with a goal. And I think we've already seen people demonstrate that this works. And maybe you stretch the definition, just like DeepMind stretched from AlphaGo to AlphaFold.

Speaker 0

这仍然是一个折叠蛋白质的游戏。是的。但现在它解决了一个博士生花了整整六年时间的问题,而它只用了三十分钟。

It's still a game of folding a protein. Yeah. But now it solved the problem that took a PhD student all six years in thirty minutes.

Speaker 1

是的。你谈论的所有事情让我想起,当你说你在读《Playing With Reality》时,我以为你指的是另一本叫《Playing and Reality》的书,那是由另一个我喜欢的作者写的,他叫D. W. 温尼科特。

Yeah. All the things you're talking about remind me of When you said that you were reading Playing With Reality, I thought you meant you were reading another book called Playing and Reality, which is by a different guy who I love. His name is D. W. Winnicott.

Speaker 1

你说的很多东西让我想起了他,还有维特根斯坦。对温尼科特来说,他的核心理念是,处于玩耍状态意味着你处于一种自发的、自我实现的与现实互动的模式中,而不是在扫描威胁并试图弄清楚如何避免,如何以正确的方式行事。你是在做真实的自己。他有一整套关于他所谓的过渡性对象的理论,基本上是当孩子很小的时候,他们与照顾者在一起时会感到被关爱和安全。在他们发展的某个阶段,可能比我侄子现在的年龄还要小一点,他们会发展出对过渡性对象的依恋,这些对象就是泰迪熊。

A lot of the stuff you're saying reminds me of him and also Wittgenstein. So for Winnicott, his whole shtick is being in a state of play means that you're in this mode of spontaneous, self actualized behavior with reality, where instead of scanning for threats and trying to figure out, how do I avoid, how do I do things in the right way? You're being your authentic self. And he has this whole theory of what he calls transitional objects, which are basically when a child is really little, they feel cared for, they feel safe when they're with their caregiver. And at a certain point in their development, maybe a little bit younger than my nephew is now, they develop attachments to transitional objects, which are teddy bears.

Speaker 1

它们的作用是,孩子们将通常从母亲或父亲那里得到的关爱感投射到这个物体上,它代表了那种被关爱的感觉,这就是为什么他们会带着它到处走。他的整个观点是,我们与过渡性对象互动的能力是某种萌芽的东西,使我们能够具有灵性或宗教信仰,或者以这些方式让世界上的事物在更大的意义上显得重要,而不仅仅是它们本身,不仅仅是一个泰迪熊。是的,我觉得这非常有趣。

And what they do is they project the feeling of care that they normally get from their mother or their father onto this object, it comes to represent that feeling of care for them, and that's why they bring it around everywhere. And his whole idea is that our ability to do that with transitional objects is sort of the budding thing that allows us to be spiritual or be religious, or all these ways in which we make things out in the world feel significant in this larger way, beyond just what they are, beyond just being a teddy bear. Yeah. And yeah, I think that that's really interesting.

Speaker 0

嗯,这让我想到两件事。首先,在儿童和儿童心智的背景下,书中提到的一点是,AI和游戏并不是新概念。它们已经存在很长时间了。我认为在语言模型和视觉模型的背景下,以及我们现在正在做的事情和思考方式上,它是新的。但有一段话提到艾伦·图灵说游戏是完美的环境。

Well, I think it makes me think of two things. One, just in the context of a kid and a child's mind, one of the things that is talked about in the book is that this isn't a new idea, AI and games. They've been around for a long time. I think it's new in the context of language models and vision models and what we're doing right now and how we're thinking about it. But a passage talks about Alan Turing saying games are the perfect environment for it.

Speaker 0

原因是,但话虽如此,因为我们希望模型学习,我们应该把它们放在儿童的心智中,而不是成人的心智中。就像那样,保持对世界的好奇和惊奇,以及犯错的能力,是非常有趣的。然后第二部分是,还提到游戏可以教授非常长远的思考方式,就像你可以采取许多、许多不同的行动,然后在路的尽头找到奖励。是的。有趣的是你提到了宗教和其他一些观念,人类非常特别,他们是少数能够为一生中看不到的东西而努力的物种之一,这相当不可思议。

The reason being, and with that said, because we want the models to learn, we should put them in a child's mind instead of an adult's mind. And so just like that, keeping the wonder of the world and the curiosity and the ability to be wrong is pretty interesting. And then the second part of that was, and it's also mentioned, that games can teach really long horizon thinking, like that you can take many, many, many different actions and then find a reward at the very end of the road. Yeah. And it's interesting that you mentioned religion and some of these other ideas, where humans are very special in that they're one of the only species that can work for something that they won't see in their lifetime, which is pretty incredible.

Speaker 0

游戏不是那样的,对吧?但我认为它们是为类似的事情而练习的。

And games aren't that, right? But I think they're practice for something like that.

Speaker 1

是啊,是啊。还有一点我觉得游戏对我们可能很有意思,我和许多其他AI研究者都开始意识到,缺乏持续学习能力是阻碍进步的一大难题。让AI在极少尝试次数内掌握游戏技巧也是件非常有趣的事。你们在这方面有所探索吗?

Yeah, yeah. Another thing that it seems like games might be interesting for is, I think I and a lot of other AI people are starting to feel as though the lack of continual learning is a big problem for progress. Having AI need to be able to figure out and get good at a game with very few tries is a really interesting thing too. Have you explored that at all?

Speaker 0

没错。我们正在与莱斯大学一位超级聪明的博士和普林斯顿的研究人员合作,他们正在研究根据结果优化提示词来实现学习。虽然初期效果不理想,但进步很快。我想说的是...初期什么的效果?比如第一次尝试的结果。

Yeah. We're actually working with a super smart PhD from Rice and some researchers from Princeton right now who are looking at optimizing prompts based on results to learn from them. The initial results aren't great, but then quickly there's progress. I think that that's The initial results of what? Like the first attempt.

Speaker 0

就是初次尝试的效果,对吧?虽然开始不理想,但很快就能看到进展。

The first attempt of doing that, right? Aren't great, but then you quickly see progress.
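
The idea of optimizing prompts based on results can be pictured as a simple search loop. What follows is a toy sketch only, not the method the Rice and Princeton researchers are using: `optimize_prompt`, `toy_score`, and the hint strings are hypothetical names made up for illustration, and the scorer merely stands in for actually playing a game and measuring the outcome.

```python
import random


def optimize_prompt(base_prompt, candidate_hints, score_fn, rounds=20, seed=0):
    """Greedy prompt search: keep a hint only if it improves the measured score.

    Hypothetical helper for illustration; a real system would call a model
    and score an actual game result instead of using score_fn on the text.
    """
    rng = random.Random(seed)
    best_prompt = base_prompt
    best_score = score_fn(best_prompt)
    history = [best_score]  # best score after each round, for inspection
    for _ in range(rounds):
        hint = rng.choice(candidate_hints)
        if hint not in best_prompt:
            trial = best_prompt + " " + hint
            trial_score = score_fn(trial)
            if trial_score > best_score:  # accept only strict improvements
                best_prompt, best_score = trial, trial_score
        history.append(best_score)
    return best_prompt, best_score, history


def toy_score(prompt):
    """Stand-in for 'play the game and measure the result' (an assumption)."""
    useful = ["state your goal", "list legal moves", "review the last turn"]
    reward = sum(phrase in prompt for phrase in useful)
    return reward - 0.01 * len(prompt.split())  # small penalty for length


hints = ["state your goal", "list legal moves", "review the last turn", "be funny"]
best, score, history = optimize_prompt("You are playing Diplomacy.", hints, toy_score)
print(best)
print(score)
```

The first score in `history` is the unoptimized prompt, which matches the pattern described here: a weak first attempt, then quick gains as feedback from results is folded back into the prompt.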

Speaker 1

就像你们已经在现有模型上观察到的那样。

They're like you're already seeing it with current models.

Speaker 0

是的。你很快就能看到进步,但这又回到了我们讨论过的问题领域,对吧?概念是存在的。我们在培训和咨询中发现,如果能转变思维——不是'模型给了我错误答案',而是'它理解错了上下文'——你就能重新掌握主动权。

Yes. And you quickly see that, but it goes to that problem space we were talking about, right? Like the concept is there. And one of the things that I've learned in training and the consulting that we've done is if you can shift your mindset to be not, oh, this model gave me a wrong answer, but it had the wrong context. You take so much more power back.

Speaker 0

我认为这才是正确的思考方式。这些模型能力惊人,如果出错可能不是你的问题。或许你需要用特殊方式提示它们,但它们很可能...

And I think that's the right way to think about them. These models can do such incredible things that if they're doing something wrong, it might not be your fault. It might be that you need to prompt them in a weird way to get them to do that thing, but it's very likely that they can

Speaker 1

能做到。这很有趣。你是说你们相信,还是他们相信,这可能比我们想象的更接近持续学习?因为我们可以从优化自身提示词这个层面开始,而它们在这方面并不差?

do it. That's interesting. Are you saying that you believe this or they believe this, that actually might be closer to continual learning than we think because we can start at the layer of optimizing their own prompts and they're not bad at it?

Speaker 0

我认为我们既更近又更远了。我不确定。我不确定。我想说的是这是个可解决的问题。我说这是个可解决的问题,但它需要不同的技能组合,因为,我也不知道,对吧?

I think we're both closer and further away. I'm not sure. I'm not sure. What I'm saying is it's a tractable problem. I'm saying it's a tractable problem, but it requires a different skill set because, and I don't know, right?

Speaker 0

就像我不是那种做强化学习、为这些最大模型进行训练的人。但在我看来,如果你能让一个模型反思并思考它的学习过程,然后在此基础上训练,你会得到更多这样的效果。是的。对吧?你应该能够提示模型,想出工具和方法让它这样做,并且要有主见,要有指导性地让它以某种方式做到这一点。

Like I'm not somebody who's doing the reinforcement learning and doing these model training runs for these biggest models. But it would seem to me that if you're able to get a model to reflect and to think about its learning and then train on that, you will get more of that. Yeah. Right? And you should be able to prompt the model and think of tools and think of ways to get it to do that and be opinionated, be prescriptive with it to get it to do that in a way.

Speaker 0

也许你知道,这也有一些缺点,它会变得更狭隘,更倾向于这样做。但也许你可以考虑另一个,然后在其基础上构建。所以这可能耗时更长,因为需要完成工作。这就是我们正在考虑做的那种工作。但我不知道。我不知道是否有明确的研究表明AI可以帮助自我提示以变得更好。

And maybe you know, that has some downsides where it's gonna get more narrow and do that more But then maybe you can think about another one and then you can build on top of it. And so that's why it might take longer because work needs to be done. And that's the kind of work that we're looking at doing. But so I don't know. I don't know if it's you can there's clear research that shows that AI can help prompt itself to get better.

Speaker 0

我认为DSPy就是一个非常酷的例子。DSPy。

I think DSPy is like a really cool example of something like that. DSPy.

Speaker 1

我从来没听人说出来过,我脑子里一直念DSPY。你觉得是D-S-P-Y吗?

I've literally never heard anyone say it out loud. I've always pronounced it DSPy in my head. You think it's D-S-P-Y?

Speaker 0

我念D-S-P-Y。我听过D-S-P-Y。

I say D-S-P-Y. I've heard D-S-P-Y.

Speaker 1

是啊。是啊。

Yeah. Yeah.

Speaker 0

是的。是的。

Yeah. Yeah.

Speaker 1

是的。所以

Yeah. So

Speaker 0

其中一个例子,是由字母d、s、p和y组成的词。这是个很酷的例子。在外交策略游戏中,这一点变得非常清晰——起初模型完全不会玩,经过几次迭代后,大型模型就能上手了。已有研究表明许多大模型确实具备游戏能力。

one of those, made up of the letters D, S, P, and Y, yeah, is a cool example. And I think that some of this became very clear to me in Diplomacy, where when we started, the models couldn't play the game. Then we made some iterations and then you got large models to play the game. There's some existing research that showed that a lot of large models could play.

Speaker 0

还有项很酷的研究通过自我优化提示词,让GPT-4勉强能玩。但我们投入越来越多精力,开发工具加速迭代,最终连微型开发版本都能运行。过程虽艰难但收获巨大。现在正处于一个需要权衡机会成本的时期:如果花大量时间解决某个问题,必须确保它值得。

And there's cool research that self-optimized the prompt so that GPT-4 could barely play. But we put more and more work into it and we built tools to help us iterate quickly. And then we got to the point where Devstral Small can play, right? And it was hard, but you learned a ton. And I just think you're in this weird time where there's an opportunity cost, where if you spend a lot of time to solve one problem, it better be worth it.

Speaker 0

因为你能解决的其他问题太多了。从经济角度衡量价值可能很难,但如果对个人有意义,那绝对值得,最终也会产生经济价值。

Because you can solve so many other things. And to make it worth it economically is maybe tough, but I think if you make it worth it to yourself, then it's definitely worth it. And then it can have value economically.

Speaker 1

我们正在探索的路径——让事情对个人有价值,我认为这是创业中未被充分开发的领域。我完全同意。因为这往往需要很长时间验证是否可行。如果纯粹经济考量,人们可能过早放弃。

Having it be worth it to yourself, I think, is an under-explored path for entrepreneurship that is very... I agree. Because it often takes a really long time to figure out if it's working or not. You'd probably rather just... If it's purely an economic calculus, you'll probably give up a little too early.

Speaker 0

这正是我们创业的重要原因。泰勒和我都有初创企业经验:他运营咨询公司四年,我2021年联合创办AI Camp,还在经历三次转型最终找到药物发现市场的Salt公司工作。但这是我们首次从零打造公司,因为我们坚信AI与游戏交叉领域蕴藏巨大价值。

I think that's a big reason why we're building this. Tyler and I both worked in startups for a while. He's been running his own consulting company for four years. I was co-founder of AI Camp in 2021, been here, worked at a company called Salt that had three pivots and found product-market fit in, like, drug discovery. But this is the first time we're making a company from scratch. And the reason why is because we both love, and think that there's a lot of value in, the intersection of AI and gaming.

Speaker 0

这不仅因为我真心相信我们的环境能让模型变得更好,还因为这会让人们更关心且减少恐惧。就像我们在咨询中看到的那样,知识鸿沟正在扩大。西蒙·威利森和吴恩达都曾撰文指出,使用这些工具的人反而最不害怕它们,因为他们理解。他们能看到它的不足,也明白如何利用它来提升自己。

And not only because I truly believe that our environments are gonna make models better, but also because it will make people care and also be less fearful. Like, one of the things that we see in consulting is there's this knowledge gap growing. Simon Willison's written about it and so has Andrew Ng, where people who are using these tools are the least fearful about them because they get it. They see where it falters. They see how they can use it to get better.

Speaker 0

但当人们不采用这些工具时——无论是因为忙碌、对早期版本的不良体验,还是其他许多合理的原因——恐惧和愤怒就会滋生。于是这种鸿沟开始显现。但在游戏领域,当我们推出《外交》时真的很酷,一周内吸引了近5万名独立观众,尽管界面算不上特别有趣——你能看到他们聊天,画面只是来回切换。不过我想我们配了段不错的背景音乐。

But as people don't adopt them, whether because they're busy or they have had bad experiences with the really initial early version or for any number of reasons, many of which are justified, then you can get fearful and angry. And so you get this gap that starts to occur. But with games, it was so cool when we launched Diplomacy. We had almost like 50,000 unique viewers hop on for a week and watch what admittedly was not a super entertaining interface. Like, you could see them chatting and it was just panning back and forth. I think we had a good soundtrack on it.

Speaker 0

许多观众并非AI从业者,而是来自游戏圈的人,他们发现AI没那么可怕了。你能看到它犯错,采取你知道并非最优的策略,偶尔也会做出亮眼操作。这样AI就显得亲切多了。

But they could see many of them were not AI people. They were people who came from the gaming side and it became less scary. You could see it make mistakes. You could see it take a different strategy that you know isn't the optimal one or every once in a while you see it do something good. And so it becomes much more relatable.

Speaker 0

我认为游戏在这方面具有强大影响力,这也是为什么我觉得我们正在做的事情如此重要。

And I think games are very powerful in that way. And so that's another reason why I think that what we're doing is important.

Speaker 1

你是怎么进入游戏领域的?为什么对它这么感兴趣?

How'd you get into games? Like, why do you care about it?

Speaker 0

嗯,我觉得我的学习方式一直和别人不太一样。而游戏是我最重要的学习途径之一。很小的时候,有个朋友用算盘上的珠子在幼儿园教我乘法。

Yeah, I think I've always learned a little bit differently than other people. And games have been one of the ways I think I've learned the most. When I was really young, one of my friends taught me multiplication in kindergarten with like the beads on like an abacus.

Speaker 1

然后

And

Speaker 0

那时候我在上高等数学课,对吧?小学阶段有次高等数学课上,他们让你玩24点游戏。你玩过24点吗?就是给你四个数字。

so then I was in advanced math, right? And then at one point in elementary school, in advanced math, they put you in front of the 24 game. Have you ever played the 24 game? You get four numbers.

Speaker 1

我连高等数学的边都没摸过。我觉得他们不可能教我这个。

I never even sniffed advanced math. I don't think that they would have taught me that.

Speaker 0

他们会给你一张小卡片,上面有四个数字,你需要想办法用这些数字凑出24,然后点击卡片。

They have this little card with four numbers around it, and you need to find some way to make those four numbers make 24, then you tap the card.

Speaker 1

像数独那样吗?

It's like Sudoku?

Speaker 0

对,差不多。比如数字是2、4、6和12,然后6减2等于4,你再想办法。我刚才想的解法完全是错的。不过话说回来,我们还提到了RuneScape和其他许多游戏,通过制作模组来学习。我接触过的很多人,包括融资洽谈时遇到的,还有其他各种场合,我见过最聪明的人都有类似经历,他们通过玩游戏获得了宝贵经验,比如修改游戏模组从而开启人生新篇章。我加入过最早的Minecraft服务器之一,有个素未谋面的人在Skype上花了四小时教我如何从零开始组装电脑。

Yeah, kind of. It could be like two, four, six, and twelve, then it's like six minus two is four, and you figure it out. Thinking it through, that was totally the wrong solution. But anyway, yeah, and then we mentioned RuneScape and a lot of these other games, and learning by building mods. Many people who I've talked to, like in these conversations raising money, but also at Every and at a lot of other places, some of the smartest people that I've met have had similar experiences where they played some game and got something really good out of it, or where they were modding a game and then that brought them into their journey. I was on one of the first Minecraft servers ever and some guy that I didn't know hopped on Skype for four hours to help me build a computer from scratch.
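
For readers who have not played the 24 game, the example card above can be checked mechanically. The sketch below is an illustrative brute-force solver, not anything described in the conversation: the names `solve_24` and `_search` are made up for this note, and it assumes the common rules (use each of the four numbers exactly once with +, -, *, and /). For 2, 4, 6, and 12, one valid answer is simply 12 + 6 + 4 + 2 = 24.

```python
from fractions import Fraction
from itertools import combinations


def solve_24(numbers, target=24):
    """Return an expression that hits target using every number once, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: one value left, so either it is the target or this branch fails.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Pick any two remaining values, combine them, and recurse on the smaller list.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None


print(solve_24([2, 4, 6, 12]))  # prints one expression that evaluates to 24
print(solve_24([1, 1, 1, 1]))   # prints None: no combination reaches 24
```

Exact fractions are used instead of floats so that cards whose only solutions pass through non-integer intermediate values are not missed because of rounding.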

Speaker 0

这太酷了。这种奇妙的联系很有价值。当然也有弊端,对吧?游戏终究不是现实。你可以练习、学习,但如果永远沉溺其中就不好了。

That's sick. You have this weird connection and I think that there's a lot of value there. I do think that there are downsides, right? Games are not real life. They can be practice, you can learn, but if you get stuck there forever, that's not good.

Speaker 0

所以我认为游戏是个好的起点。这也是公司命名的重要考量。但我相信存在这样一种可能——可以设计出某种游戏,能在某种程度上将人们拉回现实。我觉得《精灵宝可梦GO》就是个很棒的尝试。如果当初能把游戏内容做得更完善,本可以取得更大的成功。

That's why in my head games are a good start. That was a big part behind the name. But I do think that there's also a world where you can make a game that brings people back to reality to a degree. I think Pokemon Go was a really cool experiment. I think if they had more of a fleshed-out game, they could have had something way bigger.

Speaker 1

是啊。

Yeah.

Speaker 0

我不知道你是否还记得那个时刻。我记得。但看到所有人都聚集在纪念碑前真的很疯狂。是的,那时候我在波士顿,看到倒影池边人山人海。

And I don't know if you remember that moment in time. I do. But it was crazy seeing everybody out at monuments. Yeah. Just, you know, I was in Boston at the time, seeing massive crowds along the Reflecting Pool.

Speaker 0

没错。所有人都在那里做着同样的事情。那种相互连接的感觉在当时显得尤为特别。所以我认为...

Yeah. Where everyone was just around and doing the same thing. Yeah. And, like, that sense of connection was really special at the time. And so I think that there's just

Speaker 1

游戏确实有它的独特之处。你让我想起我小时候特别喜欢玩电子游戏。疫情期间我还买了台Xbox,想着可以玩玩《使命召唤》社交模式解闷。那会儿我刚入职Every,被困在公寓里特别孤独。

a lot. It's something special about games. You're making me think of I used to love games, like video games, growing up, and during the pandemic, I bought an Xbox because I was like, oh, it would be cool to play Call of Duty and social. I'll have something to do. It was just when I first started Every and I was stuck in my apartment, so I was lonely.

Speaker 1

结果刚登录就被一个11岁的小孩虐了,毕竟我完全没玩过。不过我确实非常喜欢电子游戏,现在还挺怀念的。小时候我经常和最好的朋友一起玩《麦登橄榄球》。

Sure. And I logged in and immediately just got merked by an 11-year-old in a game. I'd just never played the game. But I do, I actually do really, really like video games, and I kind of miss playing them. I spent so much time playing Madden with my best friend growing up.

Speaker 1

最棒的体育游戏是哪款?

What was your top sports game?

Speaker 0

大学时我们循环往复地玩《FIFA》。虽然有很多单人游戏,但我觉得游戏的重要属性是社交。即便你独自玩单机游戏,玩家社区也是体验的重要部分。

I'd say in college, there was just this constant cycle of FIFA and And so playing that a bunch. And it's funny because a lot of it's social, right? Like there are single player games and a lot of people play them. But I do think a big part of it is social. Because even if it's not, even if it's a single player, even if you're playing alone, the community of other people who play that game is a big part of it.

Speaker 0

当你看到自己能做一些别人尚未尝试的事情,或是尝试了新事物并与他人交流心得时——这就是为什么我认同部分人的观点,认为未来会有为你量身定制的游戏或电影,且仅限你个人体验。我对AI的乐观程度不亚于任何人,但我认为共享体验至关重要。如果某件事物无法被他人体验,我认为这实际上是件坏事。

And seeing how you can do something that others haven't yet, or that you tried something new and you're comparing notes. That's why, like, a little bit... some people think that you're going to have games that are tailor-made for you or movies that are tailor-made for you and you exclusively. I'm as bullish on AI as the next guy. But I think that shared experience is so important. So if it's something that could not be experienced by somebody else, I think that's actually bad.

Speaker 1

嗯,有意思。我小时候也玩了很多《光环》,你是《光环》玩家吗?

Yeah, interesting. Yeah, I also, I played so much Halo growing up. Were you a Halo guy?

Speaker 0

我玩的是表哥传下来的Playstation。

I got hand me down Playstations from my cousin.

Speaker 1

明白了,所以你不是Xbox玩家?

Okay, so you were not an Xbox guy?

Speaker 0

我那时还没接触,后来别人都有了。《光环》确实是标志性的系列。

I never was. Then you grow up, other people have it, and Halo was such an iconic franchise.

Speaker 1

你最喜欢的射击游戏是什么?第一人称射击类的。

What was your top shooter game, like first-person shooter?

Speaker 0

《现代战争2》。那确实很棒,属于那个时代的经典之一。其实疫情期间我也重新开始玩些游戏,尝试过《堡垒之夜》,挺不错的。

Modern Warfare 2. Okay, that was good. It was one of those, one of the eras, yeah. And actually, so in a similar way, during the pandemic, I started getting a little bit back into video games. I had played some Fortnite, great.

Speaker 0

然后情况开始变得汗流浃背,但有时你会想和朋友一起玩。最近虽然我的头显坏了,我开始玩VR游戏。这原本不是我预期会大量投入的事情。但同样地,围绕社交元素,我开始玩《Population One》,这基本上是VR版的《堡垒之夜》。你需要实际蹲下、实际换弹、实际移动。

Then it started getting real sweaty, and I can hang, but at some point you want to be able to play with friends. Then most recently, though my headset's broken now, I started playing VR. And it was not something I really expected to do a whole lot of. But in the same way, around the social component, I started playing Population One, which is essentially Fortnite in VR. So you're physically ducking, you are physically reloading, you are physically moving.

Speaker 0

我有两个大学好友也在玩。游戏是三人组队模式,所以我们人数刚好。这变成了一种日常——戴上内置麦克风的头显,立刻就能语音交流。那种感觉很棒。

And two of my buddies from college were playing. And you play on teams of three, so we had the perfect number of people to do it. And it became something where you come on, there's a headset built in, there's a microphone built into your headset. So you're immediately talking, you're talking to each other. That's cool.

Speaker 0

这是我经历过最有趣的游戏体验之一。你仿佛真实置身于《堡垒之夜》中。每次只能玩一个半小时左右。如果隔段时间不玩,再玩时会出现眩晕感,就像需要重新适应。但整体体验非常震撼。

And it's one of the most fun gaming experiences I've ever had. You're physically in a game of Fortnite. You can only play for like an hour and a half. And if you don't play for a little bit, then you start to get vertigo when you get back. It's almost like you have to, like, get over it. But it's pretty incredible.

Speaker 0

关于VR是否会成为主流平台,我仍持保留态度。毕竟不清楚多少人愿意完全脱离现实世界。不过它确实带来了极大乐趣。

The jury's still out for me on whether VR is gonna be a huge platform in and of itself, because I don't know how many people wanna be fully disconnected from the real world, but it was a whole lot of fun.

Speaker 1

是啊,怀念打游戏的日子。想念放学回家登录《光环》或《反恐精英》匹配对战的日子。我试过VR,但没真正投入。可能因为戴眼镜的缘故,体验总是不太顺畅。

Yeah, I miss gaming. I miss getting home after school and logging on to matchmaking in Halo or whatever, or Counter-Strike or all those games. I tried VR a little bit, but I never really got into it. I think probably because I have glasses. It's just harder.

Speaker 0

常听人这么说。我觉得眼镜兼容性会改善,我现在就戴着Meta Ray-Ban眼镜,把它们当AirPods用。

I've heard that a lot. I do think glasses will become a, I'm wearing the Meta Ray-Bans right now. I use them as my AirPods.

Speaker 1

我总看见你下班时自言自语。心想这人怎么自说自话?后来发现你是在用Ray-Ban通话,简直太酷了,我...

I see you all the time walking out of the office and you're talking to yourself. I'm like, why are you talking to yourself? And it's like you're on the phone on your Ray-Bans, and I'm like, what the fuck? It's really cool, I'm

Speaker 0

不,我是它的忠实粉丝,我认为越来越多的人会把眼镜作为一种计算设备形态来使用。我不认为它们会取代电脑或手机。我觉得三者都有存在的空间,它们非常不同。我个人喜欢它没有屏幕的设计。我猜这种设计不会持续太久。

No, I'm a big fan, and I think that more and more people will use glasses as a form factor for computing. I don't think that they're gonna replace computers or cell phones. I think that there's room for both, for all three, they're very different. I like personally that they don't have a screen. I imagine that's not long for the world.

Speaker 0

我猜它们会开始

I imagine they'll start

Speaker 1

是啊,我以为新款是投影式的,难道不是吗?

Yeah, I thought that the new one is projecting, is it not doing that?

Speaker 0

我相信它们会实现这个功能。我喜欢它没有屏幕。我喜欢它能直接对话。要是我开始和人说话时对方却因为看眼镜分心说'啊抱歉你说什么',那体验就太糟糕了。

I imagine that they'll get there. I like that it doesn't have a screen. I like that it's just, I can talk to it. I think that it would be a pretty bad experience if I started talking to someone and then they're like, oh, sorry, what? Yeah.

Speaker 0

因为他们当时正盯着眼镜看东西。所以我预期市场会推动这个方向。但我认为现阶段这是种更人性化的科技。我常想到父母给孩子拍照时,孩子的影像被框在你们之间的那个盒子里。他们看着你从中获得快乐。

Because they were looking at something on glasses. So I expect the incentives to push it that way. But I do think right now it's a more human piece of technology. And I think a lot about like people taking pictures of their kids and their kids are imprinted on this box that's between you and them. And they're looking at it and they see you getting joy out of it.

Speaker 0

所以他们是在对着盒子笑,而开启这个功能后你双手解放,完全沉浸在当下。在音乐会上你不用举着设备,就是纯粹享受。我认为这些技术理念很棒——让科技使我们更有人性。

So they're like imprinting there versus you turn this on, your hands are free. You're good, you're in the moment. At a concert, you're not like this. You're just in there. I think those are I love the concept of technology that makes us more human.

Speaker 1

嗯。你现在对什么感兴趣?我是说,在你忙着融资之类的事情之前,除了做咨询,你还在Every上写超棒的文章。以前我想知道什么新鲜有趣的东西时,你总能精准判断新产品的价值。最近有什么让你兴奋的?知道你最近忙着融资,可能不像往常那样紧跟科技动态了。

Yeah. What are you I mean, you're the guy that before you got really, really busy fundraising, all that kind of stuff, in addition to doing consulting, you were writing amazing stuff on Every and you were the guy that if I wanted to know what was interesting or cool, you would have a really good read on what got released and whether it's bullshit or not. Sure. What are you excited about right now? Know you've been busy fundraising, so you may not have your finger on the pulse as much as normal.

Speaker 1

不过,我很好奇你最近有没有什么特别让你兴奋或惦记的事情。

But, yeah, I'm curious if there's anything that's that's exciting you that's on your mind.

Speaker 0

我最近满脑子想的都是游戏。

All I can think about recently is games.

Speaker 1

是啊。除了你正在开发的游戏,还有什么其他游戏界的动态让你觉得兴奋吗?比如《GTA6》延期到明年发售,而且——

Yeah. Anything else in games that are not specifically that you're working on, but just generally is going on that you think is exciting? Well, GTA six got delayed and is coming out next year, and

Speaker 0

它将成为有史以来制作成本最高的游戏。

so it's the most expensive game that's ever been made.

Speaker 1

我记得上一部发售时我还在上高中之类的,对吧?

I think the last one came out when I was in high school or something, right?

Speaker 0

已经过去十多年了。这可是投资十亿美元的游戏。我觉得它引发的文化现象会很有意思。至于AI领域,可能当前的具体进展反而不那么重要——虽然谷歌做的很多项目确实很酷。

It's been over a decade. This is a billion-dollar game. I mean, the cultural moment of that, I think, is going to be interesting. And then on the AI side, I think that maybe it's less about what's happening right now. Though I would say a lot of the stuff Google is doing is really cool.

Speaker 0

对,Genie 3。

Yeah, Genie 3.

Speaker 1

谷歌的铁杆粉丝。

Big Google stan.

Speaker 0

没错,我是谷歌的铁粉。德米斯,如果你哪天想投资个天使轮。他们正在做的很多事情之间的联系非常有趣。而他们面临的限制条件,我觉得特别酷。因为AI几乎关乎他们安卓业务的存亡。所以他们希望能在那里应用AI,这意味着它必须快速。

Yeah, I am a big Google stan. Demis, if you ever wanna cut an angel check. The connection of a lot of what they're doing is so interesting. And the constraints that they have, I think are so cool. Because AI's almost existential for their business on And so they want to be able to use it there, which means it has to be fast.

Speaker 0

它必须可靠。必须能去核查其他信息来源,必须优质。所以当你可以为所欲为时有所限制,我认为其实是有帮助的。他们还有Genie,能即时渲染任何东西。对,快速生成可体验的内容。

It has to be reliable. It has to be able to go check other sources, has to be good. So having constraints when you could do anything, I think is actually helpful. Then they also have Genie, which renders anything Yeah. Quickly that's experienceable.

Speaker 0

我不知道这会如何与游戏互动。也许它能创造出最具可渲染性的内容

I don't know how that will be interacting with gaming. Maybe it makes up some of the most renderable

Speaker 1

我有

I have

Speaker 0

个想法。成本太高。其实我

a thought. Expensive. I actually

Speaker 1

对此有个想法,你可能会感兴趣。之前我们聊过的Decart公司CEO,他们有个超酷的视频转视频模型,能把任何视频帧转换成游戏画面。是的。

have a thought about that, which I think you'd be into. So I had this guy, he's the CEO of Decart, who we talked to a while ago. Yeah. And they have this really cool video-to-video model, where it takes any frame of video and then turns it into something that looks like a video game. Yes.

Speaker 1

他们还有这样的功能,比如你拿起一个纸巾盒这样晃动,它就会变成一把枪并射击。是的。我认为这代表了游戏行业一个有趣的未来方向。

And they have this thing where, for example, if you pick up a tissue box and you go like this, it turns it into a gun and shoots it. Yeah. And I think that's an interesting future for

Speaker 0

游戏领域。我同意。

gaming. I agree.

Speaker 1

因为目前要制作《GTA》这样的游戏,必须手动编写所有交互代码。这就是为什么需要花费数亿美元和大量程序员与艺术家。而通过视频到视频的生成模型,一方面可以直接从实拍视频生成内容,另一方面可以用极简代码开发基础游戏,再用生成式AI重新设计成3A级游戏的外观。这将把制作优秀游戏的门槛降低到几乎人人都能参与的程度,这非常酷。

Because right now, to make GTA, you have to hand code all of the interactions. That's why it takes a billion dollars and many, many programmers and artists to do it. With video to video generative models, one, you could just generate it from live video, but two, you can vibe code a really simple game and then you can reskin it with generative AI to like look like a AAA game. Yeah. And I think it lowers the barrier to making awesome games to almost anyone now, which is really cool.

Speaker 0

我认为它不仅降低了门槛,还能实现原本计算量巨大的效果,比如水面波纹或光线反射,而无需实际运算这些物理效果。

I think that not only does it do that, but it also lets you do things that were otherwise super computationally intensive, like ripples on water or a reflection of light without having to run it

Speaker 1

完全不需要。

at all.

Speaker 0

所以我非常看好这个方向,确实很酷。类似地,AI在生命科学领域的应用也令人振奋——这可能是最被低估的领域。我在上一家创业公司时有幸与埃里森医学研究所等机构深入合作。就像我之前提到的,AlphaFold把原本需要博士生六年完成的工作缩短到了二十分钟。

So I could definitely see that, that's cool. That's related, I think, generally, or like, they're doing that, but they're also doing a lot of different things that are cool in AI. And I think, I mean, one of them is life sciences. I think it's a really underappreciated world and it's one that I was fortunate to be deeply involved in, getting to work with, like, the Ellison Medical Institute and others in my last startup, where, like I mentioned earlier, AlphaFold literally took something that we had PhDs taking six years to do and turned it into something that takes twenty minutes. Yeah.

Speaker 0

就AI的近期影响而言,我认为软件、生命科学和教育是当前受冲击最大的三个领域。虽然我也热爱机器人技术(曾在亚马逊机器人部门工作),但前三个领域的影响已经显而易见。

And as far as where I think AI is having near term impacts, I think it's software, life sciences and education. Those are three I see today having massive impacts. And I love robotics. I used to work at Amazon Robotics. I'd love for that to get there.

Speaker 0

自动驾驶看起来我像是Waymo的超级粉丝。没错,它们全都

Self-driving seems to be, I'm a huge Waymo fan. Yeah, them all

Speaker 1

时刻。

the time.

Speaker 0

但这三大领域正因完美契合AI特性而受到巨大冲击。软件领域,我们有编译器,编写了代码。我们知道什么会渲染什么不会。这是个可解决的问题,对语言模型再理想不过。

But those three are right now seeing huge impacts because they're problems perfectly suited for AI. Software, we have compilers, we wrote the code. We know what would render or not. It's a solvable problem. It's great for language models.

Speaker 0

直接在代码上进行强化学习太棒了,外交策略也能像代码那样处理。遇到新情况时它能举一反三,对吧?这太神奇了。生命科学领域更是有海量数据待挖掘。

It's great to just do reinforcement learning on just code and then also on Diplomacy as code. So when there's new things it can generalize, right? Like that's awesome. And you can do that. Life sciences, there's a ton of information out there.

Speaker 0

我们只需要领域专家来整合这些信息,研究各种相互作用,并找到模拟这些过程的方法。短期内人类很可能找到将金钱转化为寿命的途径,这听起来疯狂。第三是教育领域,你提到侄子开始接触学习时的兴奋。

We just need people with subject matter expertise to combine them and look at these different interactions and to find ways to simulate these processes. There's a real chance that in the near term people find a way to turn dollars into longevity. Like that's crazy. And then on the third side, education. You talked about being excited about your nephew, learning about these, starting to enter that world of learning.

Speaker 0

我认为接下来的发展会很有趣。在AI训练营接触那么多中学生和大学生时,我深刻感受到:当下做高中生很不容易,教育体系尚未跟上时代。虽然用AI完成作业诱惑很大,但真正掌握知识才关键。

And I think it's gonna be really interesting to see. I don't know for sure. I love that I got to interact with so many high school and college students at AI camp when they were going through that learning journey. I think it's tough to be a high schooler right now. And the education system hasn't caught up yet. And there's this huge incentive to use AI to do your work, but then you really learn about it.

Speaker 0

那么你真正在乎什么?难题在于下一代——那些正处于'为什么'阶段的孩子们,他们能用AI解答疑问。我完全赞同,是的,这太棒了。他们会变得异常聪明。

So then what do you care about? Tough. The generation afterwards, the generation who's going into their why, why, why phase that can have AI to answer those. I agree, yeah. It's amazing. They're going to be so smart.

Speaker 0

是啊。虽然我不清楚那种智能具体是什么形态。但能够持续探索并获得答案——当然,这可能会产生负面外部效应,对吧?但很可能也会带来许多积极影响。

Yeah. And what that intelligence looks like, I don't know. But to be able to constantly be exploring and to get answers and maybe that sure, there's probably negative externalities from that, right? But there's also probably a lot of positives.

Speaker 1

真的很棒。我记得四年级还是五年级时就想写小说,别人总问我:'你学过写作吗?这看起来明明很简单,为什么这么难?'

It's really good. I think I just remember being in fourth or fifth grade and being like, I wanna write a novel. And people were like, what are you taught? And why is it even so hard? It seems easy.

Speaker 1

我不知道

I don't know

Speaker 0

如果你有过这种经历,我当时就觉得:'直接写不就行了?'是啊,就是...

if you had that experience, but I was like, oh, you just write it, Yeah, it's

Speaker 1

对对。所以如果有AI能解答我所有问题之类的,那简直太美妙了。好了,说说你最反感AI领域里哪些被高估的东西或现状?

yeah, yeah. So like having AI to answer all my questions and stuff, I think it would have been just fantastic. Okay, so that's the stuff you're excited about. What's like the most overrated thing or what pisses you off in AI right now?

Speaker 0

我倒不怎么生气,毕竟这个领域很重要。那些不懂装懂的讨论才糟糕。当然有些人鼓吹根本解决不了问题的东西。我最担忧的是教育相关的问题——现在这种技术杠杆能让专家效率提升10到100倍,但同时也替代了初级人员的工作。这个断层该如何弥合?

I don't think a lot pisses me off because it is pretty important. So people talking about it, I think is probably not good. There's definitely people shilling things that aren't really gonna solve your problems. I think the thing that maybe I worry about the most is in the same vein of education, we now have this leverage that makes somebody who's an expert in something 10 or 100 times more powerful. But it also does the work of someone who was a junior in that. So how do you bridge that gap?

Speaker 0

当这种工具存在时,如何通过经济激励让人继续学习试错?可能我杞人忧天了,但大学毕业生就业数据确实在暴跌。部分原因就是以前的基础工作不再需要人力了,而雇佣成本又太高。确实。

How do you financially incentivize somebody to learn and make mistakes and get better knowing that this tool is here to keep pursuing there? Maybe they're overblown, but it seems like the job numbers of people graduating college are getting crushed right now. And I imagine a part of that is because you don't need people to do that blocking and tackling that you needed before. And paying them to do so is a big cost. Yeah.

Speaker 0

那具体会是什么样子呢?

So what does that look like?

Speaker 1

我对此完全不担心,这很有趣。实际上我认为正是你让我不再为此焦虑。我以前经常做大量注释,但不确定是否当面告诉过你。我很好奇这件事,很想告诉你你是如何影响我对此的思考方式的。

I am so not worried about that, which is interesting. And I think actually you're one of the people that made me not worry about it. I use this anecdote a lot and I don't know if I've ever used it to your face. I'm curious about this. I'm curious to tell you how you've impacted how I think about this.

Speaker 1

你加入Every时说要写作,处女作并不理想,糟糕到没有AI我们根本无法合作。但有意思的是——我带过很多年轻作家——我能快速判断进步速度。每次讨论你都录音做提示,从不会重复犯错。

You joined Every, and you said you wanted to write, and you wrote your first piece, it wasn't good. And it was not good to the degree that we could not have worked with you without AI. Yeah. And what was really interesting is, and I've worked with a lot of young writers, and so I can tell pretty quick, your rate of progress. Every time we talked, you recorded it, you made prompts, and you never made the same mistake twice.

Speaker 1

三四个月内你就取得了一两年的进步,这种势头持续着。假设年轻人就业率确实因AI下降,企业不雇佣他们,那将是巨大的管理失误。等企业意识到23岁年轻人搭配AI工具有多强悍时就会纠正——只要稍加指导,他们就能完成前所未有的成就。有人质疑:'靠AI做事算不算掌握真本事?'

And so your rate of progress was within three or four months, you had made a year or two years worth of progress, and that just kept happening. And so let's assume that the job numbers are down for young people actually because of AI, because people are not hiring them. That is a gigantic, gigantic management mistake that companies will begin to correct as soon as they realize that a 23 year old with ChatGPT is fucking cracked. And if you give them any amount of mentorship, they're going to do amazing stuff that they never could have done before. And I think there's the question about, well, they're not actually learning the underlying skill if they just have the AI do it.

Speaker 1

当然是,因为当AI出错时(他们必须在意这些错误,这是做好工作的关键),他们会主动学习,而AI就是最佳导师。我为年轻人感到兴奋。如果管理者不雇佣他们,责任全在管理者——他们很快会醒悟,就像'天啊,这个23岁员工彻底改变了我的生意'。我父亲就是例子,他在印第安纳有几处墓地,雇了个23岁年轻人后整个业务焕然一新。这种转变即将普及。

They are because they have to, because if the AI messes up and they care about it messing up and they should because that's the way to do a good job, they're going to go in and learn this stuff and they have a great tutor to help them figure it out. So I feel extremely excited for young people, and I think to the extent that managers are not hiring them, that's on them, and they will figure that out pretty soon, because they'll be like, Oh my God, I hired this 23 year old, and it totally changed my whole business. My dad is like this. He's downstairs right now, and he owns a few cemeteries in Indiana, and he has this 23 year old who just completely changed his entire business. And so I think it's going to flip from there.

Speaker 1

或许转变已经开始,但我认为很快会波及中年职场人士,因为年轻人完全能应对自如。

Maybe it's there right now, but I think it'll flip from there to mid career folks pretty soon, because I think the kids are going to be all right.

Speaker 0

是的。我认同的是,解决之道可能在于某种形式的...

Yeah. The thing I agree with is that I think that probably the solution to this is some form

Speaker 1

关于学徒制,某种程度上,

of an apprenticeship, where kind of,

Speaker 0

你能快速学习某件事,并且做你关心的事,因此你会投入时间。是的。所以你会关注什么让它好或不好,对吧?如果你不在乎,那就会很难。

you're able to quickly learn about something and you're doing something that you care about, therefore you will spend the time on it. Yeah. Therefore you will care about what it is that makes it good or not, right? If you don't care about it, that's gonna be hard.

Speaker 1

是的。

Yeah.

Speaker 0

反过来说,我认为如果没有我在培训和AI方面的经验,我是说,被引入到各个方面才有机会去做。那么除了原始材料,他们被引入来做什么技能呢?除了野心、能力,当你在市场上与那些确实拥有这些的人比较时?也许市场上的人恰恰缺乏那种渴望。

The counterpoint is, I don't think that without my experience on the training and the AI and that side of things that I would have been brought in to the Every side to then have the chance to do it. So what is that skill that they're being brought in to do besides the raw material, right? Besides the ambition, the ability, and just doing that when you're comparing them on the market with people who do have that? And maybe it becomes the people who are on the market don't have that hunger.

Speaker 1

这简直就像是我很渴望并且愿意尝试新事物,而不是现在正在做的事情。完全同意。

It's literally just like I'm hungry and I'm willing to try new things instead of the things that are currently being done. Totally.

Speaker 0

但如果所有条件相同,一个有渴望但没经验的人和一个有渴望且有经验的人相比

But if all things equal, if you have somebody who's hungry and doesn't have experience versus someone who's hungry and does

Speaker 1

我会选没有经验的那个,因为有经验的人的经验是错误的。因为整个格局刚刚改变。让一个已经在职业生涯中、知道自己做事方式的人改变真的很难。我是说,你知道这一点,因为我们做的就是接纳职业生涯中期的人,训练他们做别的事。这有效,但很难。

I would take the one that doesn't have experience because the person who has experience, their experience is wrong. Because the whole landscape just changed. And it's really hard to get someone who's already in their career and knows how they do things. I mean, you know this because this is what we do, is we take people who are mid career and we train them how to do something else. And it works, but it's hard.

Speaker 1

更简单的情况是,一个饥渴的、从未尝试过的新手,不需要摒弃旧习惯,只需从头学起。

What's easier is someone who's hungry and hasn't done it before and doesn't have a whole set of things they have to unlearn and is just going to figure it out.

Speaker 0

也许吧。

Maybe.

Speaker 1

是啊,我们走着瞧吧。希望如此。

Yeah, we'll see. I hope so.

Speaker 0

但这是我花费大量时间

But that's something I spend a lot of time

Speaker 1

思考的问题。确实,我懂你的意思。我认为关键在于,并非一切都会一帆风顺,任何技术变革都伴随着弊端和取舍。

thinking about. Yeah, yeah, I feel you. I mean, I think it's very important that it's not all gonna be rosy, and anytime there's technology shifts, there are downsides and trade offs.

Speaker 0

没错。另一个例子是,大公司能用更少资源做更多事。是的,你可能已经看到某些行业开始裁员了。但正如你所说,对吧?

No. And I think another example of that, right, is like big companies who are gonna be able to do more with less. Yeah. And you may see them and are seeing some in some industries already cut headcount. But to your point, right?

Speaker 0

如果裁员过多,就会意识到:天啊,其实那些人本可以创造更大价值

Like if you cut too much headcount, then you realize, oh man, we could have just done way more with those people

Speaker 1

是啊。

Yeah.

Speaker 0

那是个错误。没错。然后你会看到这群人能够独立完成更多事情,如何在细分领域竞争,进而拆分出部分业务,催生出比以往任何时候都多的初创企业。是的,所以我感到很兴奋。

Then that's a mistake. Yeah. Then you start seeing groups of those people be able to do way more on their own and compete on these niches and then take away parts and create many, many, many more startups than have ever existed. Yeah. So I'm excited.

Speaker 0

过程可能会有些波折。

It's gonna be a little rocky.

Speaker 1

是啊,但我很期待。毕竟现实通常充满坎坷。确实比游戏世界要崎岖得多。

Yeah. But I'm excited. Well, reality is typically rocky. So Indeed. Much rockier than games.

Speaker 1

好的。这太棒了。如果人们想联系你、参加你的锦标赛,或者只是想跟进你的动态,他们该去哪里找你呢?

Yeah. Alright. This is awesome. So if people want to find you, want to participate in your tournament, want to just generally follow along with what you're doing, where can they find you?

Speaker 0

Goodstartlabs.com。Twitter上的Good Start Labs账号。我的Twitter账号是alxai_,你还可以在Every上阅读我的文章。

Goodstartlabs.com. Good Start Labs on Twitter. I'm alxai_ on Twitter, and you can read my writing on Every.

Speaker 1

太棒了。亚历克斯,谢谢你。

Amazing. Alex, thank you.

Speaker 0

谢谢,丹。

Thanks, Dan.

Speaker 2

天呐,各位!你们必须立刻马上点赞并订阅《AI与我》频道。为什么?因为这档节目堪称精彩绝伦。就像在你家后院发现宝箱,但里面装的不是金子,而是关于ChatGPT纯粹无杂质的知识炸弹。

Oh my gosh, folks. You absolutely positively have to smash that like button and subscribe to AI and I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure unadulterated knowledge bombs about ChatGPT.

Speaker 2

每期节目都是情感、洞见与欢笑的过山车,让你欲罢不能地期待更多。这不止是一档节目,更是以丹·希珀为飞船船长,带你驶向未来的旅程。所以帮自己个忙——点赞、订阅,系好安全带准备迎接此生最刺激的旅程吧。

Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat craving for more. It's not just a show. It's a journey into the future with Dan Shipper as the captain of the spaceship. So do yourself a favor. Hit like, smash subscribe, and strap in for the ride of your life.

Speaker 2

现在闲话少说,容我直言——丹,我无可救药地爱上你了。

And now without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.
