
哥伦比亚大学计算机科学教授:为何大语言模型无法发现新科学

Columbia CS Professor: Why LLMs Can’t Discover New Science

本集简介

从GPT-1到GPT-5,大语言模型在模拟人类语言方面取得了巨大进展。但它们能否更进一步,做出新发现并推动科学进步?我们与哥伦比亚大学计算机科学杰出教授Vishal Misra就此展开讨论,同时探讨为何思维链推理如此有效、真正的通用人工智能会是什么样子,以及幻觉产生的真正原因。 资源: 在X上关注Misra博士:https://x.com/vishalmisra 在X上关注Martin:https://x.com/martin_casado 保持更新: 若喜欢本期节目,请点赞、订阅并与朋友分享! 在X上关注a16z:https://x.com/a16z 在LinkedIn上关注a16z:https://www.linkedin.com/company/a16z 在Spotify收听a16z播客:https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX 在Apple Podcasts收听a16z播客:https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711 关注主持人:https://x.com/eriktorenberg 请注意,此处内容仅供信息参考;不应视为法律、商业、税务或投资建议,也不应用于评估任何投资或证券;且不针对任何a16z基金的投资者或潜在投资者。a16z及其关联公司可能持有讨论企业的投资。详情请见a16z.com/disclosures。 由AdsWizz旗下Simplecast托管。关于我们收集和使用个人数据用于广告的信息,请见pcm.adswizz.com。

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

任何基于1915年前物理学训练的LLM都不可能提出相对论。爱因斯坦不得不某种程度上摒弃牛顿物理学,提出了时空连续体的概念。他彻底重写了规则。AGI将出现在我们能够创造新科学、新成果、新数学的时候。当AGI提出相对论时,它必须超越其训练内容,提出新的范式、新的科学,这就是我对AGI的定义。

Any LLM that was trained on pre 1915 physics would never have come up with a theory of relativity. Einstein had to sort of reject the Newtonian physics and come up with this space time continuum. He completely rewrote the rules. AGI will be when we are able to create new science, new results, new math. When an AGI comes up with a theory of relativity, it has to go beyond what it has been trained on to come up with new paradigms, new science, and that's my definition of AGI.

Speaker 1

Vishal Misra原本试图修复一个损坏的板球统计页面,却意外帮助引发了AI领域的重大突破。在本期a16z播客中,我与Vishal以及a16z的Martin Casado讨论了那个时刻如何催生了检索增强生成技术,以及Vishal的形式化模型如何解释大型语言模型的能力与局限。我们探讨了LLM可能正触及其极限的原因、真正推理的模样,以及如何突破这些限制。让我们开始吧。

Vishal Misra was trying to fix a broken cricket stats page and accidentally helped spark one of AI's biggest breakthroughs. On this episode of the a16z podcast, I talk with Vishal and a16z's Martin Casado about how that moment led to retrieval-augmented generation and how Vishal's formal models explain what large language models can and can't do. We discuss why LLMs might be hitting their limits, what real reasoning looks like, and what it would take to go beyond them. Let's get into it.

Speaker 2

Martin,我知道你想邀请Vishal上节目。是的。你觉得他和他的贡献有什么特别之处,激发了这次讨论?

Martin, I know you wanted to have Vishal on. Yeah. What do you find so remarkable about him and his contributions that that inspired this?

Speaker 3

Vishal和我实际上背景非常相似。我们都来自网络领域。他在网络方面的成就远胜于我,但

Vishal and I actually have very similar backgrounds. We both come from networking. He's a much more accomplished networking guy than I am, but

Speaker 2

这正是我对你的看法,我觉得。

That's what I've always thought about you, I feel.

Speaker 3

因此我们实际上以信息论的方式看待世界。这其实是网络领域的一部分。面对所有这些AI技术,有大量工作试图创建模型来帮助我们理解这些LLM的工作原理。根据我过去三年的经验,那些最能影响我理解、我认为最具预测性的模型,正是Vishal提出的。他之前做过一个我们会谈到的模型,叫matrix,是吗?

And so we actually view the world in an information-theoretic way. It is actually part of networking. And with all this AI stuff, there's so much work trying to create models that can help us understand how these LLMs work. And in my experience over the last three years, the ones that have most impacted my understanding, that I think have been the most predictive, are the ones that Vishal has come up with. He did a previous one that we're gonna talk about called Matrix, is it?

Speaker 0

超越黑箱,没错。

Beyond the black box, but yeah.

Speaker 3

关于‘超越黑箱’这个话题,实际上我们应该把这个记下来,但我见过的最好的关于理解LLM如何工作的演讲是Vishal在MIT做的一次,是Hari Balakrishnan推荐给我的,我看了。所以他做了那项工作,然后他最近在做更深入的研究,实际上不仅试图界定LLM如何推理,还对人类如何推理有一些反思。因此我认为他正在做一些更深刻的工作,试图理解并提出形式化模型来解释LLM的推理机制。

The Beyond the Black Box one, yes. Actually, we should put this in the notes for this, but the single best talk I've ever seen on trying to understand how LLMs work is one that Vishal did at MIT, which Hari Balakrishnan pointed me to, and I watched that. So he did that work, and then he's doing more recent work that's actually trying to scope out not only how LLMs reason, but it has some reflections on how humans reason too. And so I just think he's doing some of the more profound work in trying to understand and come up with models, formal models, for how LLMs reason.

Speaker 2

说到这个,你提到他最近的研究帮助你改变了对人类思维方式的看法。能详细说明一下吗?它是如何影响你的?

On that note, you said his most recent work helped change your view of how humans think. Can you flesh that out a little bit? How did it sort of

Speaker 3

好吧,那么我试着粗略描述一下,然后你告诉我我错得有多离谱?

Well, okay. So can I just try to take a rough sketch at it, then you just tell me how wrong I am?

Speaker 0

你说到点子上了。

You're right on it.

Speaker 3

你在尝试描述LLM的工作原理。其中一个发现是,它们将一个非常复杂的多维空间简化为一个几何流形,即一个降维的状态空间。所以虽然自由度减少了,但你实际上可以预测推理在这个流形中大致会移动到何处。这样你就将问题的维度降低到一个几何流形,然后可以形式化地指定在该流形内能推理多远。这种阐述的一个直觉是,我们人类也做同样的事情——我们将这个复杂且重尾随机的宇宙简化为类似的几何流形。

You're trying to describe how LLMs work. And one thing that you found is that they reduce a very, very complex multidimensional space into basically a geometric manifold that's a reduced state space. So it's reduced degrees of freedom, but you can actually predict where in the manifold the reasoning can move to, roughly. So you've reduced the dimensionality of the problem to a geometric manifold, and then you can actually formally specify kind of how far you can reason within that manifold. And one of the intuitions from that articulation is that we as humans do the same thing: we take this very complex, heavy-tailed stochastic universe, and we reduce it to kind of this geometric manifold.

Speaker 3

然后当我们推理时,我们只是沿着那个流形移动。

And then when we reason, we just move along that manifold.

Speaker 0

是的。我觉得你准确地抓住了要点。这基本上就是这项工作的精髓。没错。

Yeah. I think you captured it accurately. That's kind of the spirit of the work. Yep.

Speaker 3

等等。等等。我能听听你的原话吗?因为我是VC。嗯,不行。

Wait. Wait. Can I just hear it in your words? Because I'm a VC. So Well, no.

Speaker 0

你是个H指数多少的VC?60?

You're a VC with an h index of what? 60?

Speaker 3

确实如此。

So True.

Speaker 0

是的。所以,归根结底,所有这些LLM(大型语言模型)所做的,无论是早期的LLM还是我们今天拥有的经过各种后训练、RLHF(人类反馈强化学习)等处理的LLM,本质上都是为下一个token创建一个概率分布。对吧?给定一个提示,这些LLM会为下一个token或下一个词生成一个分布,然后通过某种算法从这个分布中选择一个来预测下一个token,选择它,然后继续下去。由于我们训练这些LLM的方式、Transformer的架构以及损失函数,你所说的方式是正确的。

Yeah. So, ultimately, what all these LLMs are doing, whether the early LLMs or the LLMs that we have today with all sorts of post-training, RLHF, whatever you do: at the end of the day, what they do is they create a distribution for the next token. Right? So given a prompt, these LLMs create a distribution for the next token or the next word, and then they pick something from that distribution using some kind of algorithm to predict the next token, pick it, and then keep going. Now, because of the way we train these LLMs, the architecture of the transformers, and the loss function, the way you put it is right.

Speaker 0

它有点像把世界简化为这些贝叶斯流形。是的。只要LLM在这些流形中遍历,它就充满信心,并能产生有意义的内容。一旦它偏离了流形,它就会开始产生幻觉并胡说八道。

It sort of reduces the world into these Bayesian manifolds. Yep. And as long as the LLM is going in, sort of traversing through these manifolds, it is confident. And it can produce something which makes sense. The moment it sort of veers away from the manifold, then it starts hallucinating and starts spouting nonsense.

Speaker 0

自信的胡说八道,但仍然是胡说八道。所以它创建了这些流形。关键在于生成的分布。你可以测量分布的熵。熵就是香农所描述的香农熵。

Confident nonsense, but nonsense. So it creates these manifolds. And the trick is the distribution that is produced. You can measure the entropy of the distribution. Entropy in the sense Shannon described: Shannon entropy.

Speaker 0

香农熵,不是热力学熵。假设你有一个词汇表,比如50,000个不同的token,你有一个分布,即下一个token在这50,000个token上的分布。比如说,提示是“the cat sat on the”。对吧?那么这个分布会对“mat”、“hat”或“table”有高概率,而对“ship”、“whale”之类的东西概率非常低。

Shannon entropy, not thermodynamic entropy. So suppose you have a vocabulary of, let's say, 50,000 different tokens, and you have a distribution, a next-token distribution over these 50,000 tokens. So let's say, the cat sat on the. Right? If that is a prompt, then the distribution will have a high probability for mat or hat or table, and a very low probability of, let's say, ship or whale or something like that.

Speaker 0

对吧?是的。所以由于它的训练方式,它具备这些分布。这些分布可以是低熵或高熵的。是的。

Right? Yeah. So because of the way it's trained, it has these distributions. Now, the distributions can be low entropy or high entropy. Yeah.

Speaker 0

高熵分布意味着大语言模型有多种不同的路径可以选择,是的。所有这些路径都有足够高的概率。低熵意味着下一个标记只有很少的选择。提示也可以分为两种类型。一种提示,可以说是高信息熵的。

A high entropy distribution means that there are many different ways that the LLM can go Yep. With high enough probability for all those paths. Low entropy means that there are only a small set of choices for the next token. And the prompts also you can categorize into two kinds of prompts. One prompt is, as you can say, high information entropy.
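The entropy being discussed can be made concrete in a few lines. A minimal sketch, where the tiny vocabularies and probabilities are invented for illustration and not taken from any real model:

```python
import math

def shannon_entropy(dist):
    """Shannon entropy, in bits, of a next-token distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Low-entropy prompt: "the cat sat on the ..." puts most mass on a few tokens.
low = {"mat": 0.7, "hat": 0.15, "table": 0.1, "floor": 0.05}

# High-entropy prompt: "I'm going out for dinner ..." spreads mass widely.
high = {t: 0.125 for t in ["tonight", "with", "to", "at", "soon", "later", "now", "again"]}

print(round(shannon_entropy(low), 3))   # concentrated distribution: ~1.319 bits
print(round(shannon_entropy(high), 3))  # uniform over 8 tokens: 3.0 bits
```

A uniform distribution over 8 continuations gives exactly log2(8) = 3 bits; the concentrated one is much lower, which is the "small set of choices for the next token" case.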

Speaker 3

是的。

Yep.

Speaker 0

而另一种提示是低信息熵的。是的。这些流形的工作方式是,大语言模型开始关注那些具有高信息熵的提示,是的。以及低预测熵。那么我这是什么意思呢?

And the other prompt is low information entropy. Yep. The way these manifolds work, the LLMs start paying attention to prompts that have high information entropy Yep. And low prediction entropy. So what do I mean by that?

Speaker 0

所以当我说,我要出去吃晚饭。

So when I say, I'm going out for dinner.

Speaker 3

是的。

Yep.

Speaker 0

对吧?所以当我说我要出去吃晚饭,这句话大语言模型经过训练已经见过很多次,并且我可以朝许多不同的方向展开。我可以说我今晚要去吃晚饭。我要去麦当劳吃晚饭,或者我要去吃饭等等等等。有很多不同的可能。

Right? So when I say, I'm going out for dinner, that phrase the LLMs have been trained, they've seen it a lot and there are many different directions I can go with it. I can say I'm going for dinner tonight. I'm going for dinner to McDonald's or I'm going to dinner blah blah blah. There are many different.

Speaker 3

是的。

Yeah.

Speaker 0

当我说我要和马丁·卡萨多共进晚餐,你知道,就是那个LLM,这信息量就丰富了。这算是个罕见的说法。现在可能性范围缩小了,因为马丁只会带我去米其林星级餐厅。

When I say I'm going to dinner with Martin Casado, you know, the LLM, now this is information rich. This is sort of a rare phrase. And now the sort of realm of possibilities reduces because Martin is only going to take me to Michelin Star restaurants.

Speaker 3

对。对。对。

Yep. Yep. Yep.

Speaker 0

对。我不会去麦当劳。你明白我的意思吧。一旦你加入更多上下文,对。你让提示信息丰富,预测熵就降低了。

Yep. I'm not gonna go to McDonald's. You get what I'm saying. The moment you add more context Yep. You make the prompt information rich, the prediction entropy reduces.

Speaker 3

对。对。对。对。

Yep. Yep. Yep. Yep.

Speaker 0

还有一个例子,我经常

And another example that I often

Speaker 3

但先简单问一下,你的结论是什么?它的含义是什么?所以,是的。抱歉,抱歉。

But just quickly: what is your takeaway? What is the implication of that? So yeah. Sorry. Sorry.

Speaker 3

我忘了你是怎么描述的了。所以越精确,你使用的token就越多,我猜下一个token的选择就越少。这是正确的还是不正确的?

I forgot how you described it. So the more precise you are, the more tokens you use, I presume, the fewer options you have for the next token. Is that correct or not correct?

Speaker 0

是的。是的。基本上是这样。

Yeah. Yeah. Essentially.

Speaker 3

所以你在减少,你在把它缩小到一个非常具体的状态空间,当涉及到对答案的信心时。这有点像你可以继续探索的一个流形。我的意思是,你是否对这对系统推理意味着什么有某种结论?或者这只是表达LLMs边界的一种好方式?

So you're reducing it to a very specific state space when it comes to confidence in an answer. And this is kind of a manifold that you can go on. And then I mean, do you have kind of a conclusion of what that means for systems reasoning? Or is it just a nice way to articulate the bounds of LLMs?

Speaker 0

不。有些事情我不知道是否该说深刻,但这里面确实有些东西告诉我们这些LLMs能做什么或不能做什么。对吧?所以我经常举的一个例子是,假设我问你769乘以1025是多少。你完全不知道。

No. There is something I don't know if I should say profound, but there is something about it which tells what these LLMs can or cannot do. Right? So one of the examples that I often tell is, suppose I ask you what is 769 times 1,025. You have no idea.

Speaker 0

根据这两个数字,你可能会有一个模糊的概念。对吧?所以在你的脑海里,答案的下一个token分布将是分散的。对吧?你不知道。

You can have some vague idea given the two numbers. Right? And so in your mind, the next token distribution of the answer is going to be diffuse. Right? You don't know.

Speaker 0

如果你数学很好,也许你会有一个模糊的猜测。也许你的猜测更精确,但它只会是分散的,而且不会是正确答案。但如果我说,我可以写下来,用我们学过的乘法表的方式来做,现在你知道下一步该做什么了,对吧?你写下769,然后1025,然后你就完全知道了。

You have maybe a vague guess if you are mathematically very good. Maybe your guess is more precise, but it's just going to be diffuse, and it's not going to be the correct answer. But if you say, I can write it down and do it the way we have learned multiplication tables, now you know exactly what to do at each next step. Right? You write 769 and then 1,025, and then you know exactly.

Speaker 0

所以在这个过程的每个阶段,你的预测熵都非常低。你确切地知道该做什么,因为你学过这个算法。通过调用这个算法说,好吧,我不会只是猜测答案,而是会一步一步地做。然后你的预测熵降低了,你可以得出一个你确信且正确的答案。

So at each stage of that process, your prediction entropy is very low. You know exactly what to do because you have been taught this algorithm. And by invoking this algorithm, saying, okay, I'm not gonna just guess the answer, but I'm going to do it step by step, your prediction entropy reduces, and you can arrive at an answer which you're confident of and which is correct.
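The 769 × 1,025 example can be made concrete: guessed in one shot, the answer is a diffuse distribution, but the schoolbook algorithm makes every intermediate step fully determined, a chain of low-entropy moves. A minimal sketch:

```python
def long_multiply(a: int, b: int):
    """Schoolbook multiplication: each partial product is a small,
    fully determined step, like a low-entropy chain of thought."""
    steps = []
    total = 0
    # Walk b's digits from least significant to most significant.
    for place, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10 ** place
        steps.append(f"{a} x {digit} x 10^{place} = {partial}")
        total += partial
    return total, steps

result, steps = long_multiply(769, 1025)
for s in steps:
    print(s)
print(result)  # 788225
```

Each printed step is something the procedure dictates exactly, which is why walking through them is so much more reliable than emitting the product directly.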

Speaker 0

而大型语言模型(LLMs)也大致如此。这就是思维链(chain of thought)有效的原因。当你要求LLM进行思维链推理时,它会开始将问题分解成小步骤。这些步骤,它过去已经见过。

And the LLMs are pretty much the same way. That's why chain of thought works. What happens with chain of thought is you ask the LLM to do something chain of thought. It starts breaking the problem into small steps. These steps, it has seen in the past.

Speaker 0

它接受过这方面的训练。可能数字有所不同,但概念是它训练过的。一旦它分解了问题,它就变得有信心了。好的,现在我需要做a、b、c、d,然后我就能得出这个答案。

It has been trained on. Maybe with some different numbers, but the concept, it has been trained on. And once it breaks it down, then it's confident. Okay. Now I need to do a b c d, and then I arrive at this answer.

Speaker 0

不管它是什么。

Whatever it is.

Speaker 2

让我们把话题拉回来。我们稍后再深入LLMs。但首先,Vishal,也许你可以多介绍一下你的背景,以及它是如何影响你在这里的工作的。

Let's zoom back out. We'll get into LLMs. But first, Vishal, maybe you can give more context on your background and how that informs your work here.

Speaker 0

好的。是的,正如Martin所说,我的背景和他非常相似。我们,你知道,我们来自网络领域。所以我的博士论文,我在哥伦比亚大学的早期工作都是在网络方面。但我还有另一面,另一个身份,既是企业家又是板球迷。

Okay. So, yeah, yeah, as, Martin said, my background is very similar to his. We, you know, we come from doing networking. So my PhD thesis, my sort of early work at Columbia has all been in networking. But there's another side of me, another hat that I wear, which is both an entrepreneur and a cricket fan.

Speaker 3

我正想说,你不是拥有一支板球队之类的吗?

I was gonna say, don't you own a cricket team or something?

Speaker 0

我是你们本地板球队——旧金山独角兽队的少数股东。是的。

I'm a minority owner at your for your local cricket team, the San Francisco unicorns. Yeah.

Speaker 3

没错。我们为你感到非常自豪。不过

That's right. We're very proud to have you. So but

Speaker 0

所以,比如说在九十年代,我是创办这个名为CricInfo门户网站的人之一。而Cricinfo曾一度成为世界上最受欢迎的网站,点击量甚至超过了雅虎。那是在印度市场崛起之前。

so, say, in the nineties, I was one of the people who started this portal called CricInfo. And Cricinfo, at one point, was the most popular website in the world. It had more hits than Yahoo. That was before India came online.

Speaker 3

那真是

That's a

Speaker 0

了不起的事情。你知道,板球是一项数据极其丰富的运动,就像棒球乘以一千倍。我们还建立了一个名为StatsGuru的免费可搜索板球统计数据库,这个从二月份起就在Cricinfo上提供了。

remarkable thing. And so, you know, cricket is a very stat-rich sport. Think baseball multiplied by a thousand. And we had built this free searchable stats database on cricket called StatsGuru. And this had been available on Cricinfo since February.

Speaker 0

但由于你可以搜索任何内容,所有数据都在StatsGuru上提供了。你不能指望人们写SQL查询来查询所有东西。那么我们是怎么做的呢?嗯,我们有一个网页表单,你可以通过那个表单构建查询,后端会将其翻译成SQL查询,获取结果并返回。但结果就是,因为你能做所有事情,所有数据都开放了。

But because you can search for anything, everything was made available on StatsGuru. And you can't expect people to write SQL queries to query everything. So how did we do it? Well, it was a web form where you could formulate your query using that form, and in the back end that was translated into a SQL query, got the results, and got them back. But as a result, because you could do everything, everything was made available.

Speaker 0

那个网页表单有大约25个不同的复选框、15个文本字段、18个不同的下拉菜单。界面一团糟,非常令人望而生畏。ESPN在2006年左右收购了Cricinfo,我想,但他们仍然保留了相同的界面。这一直让我有点耿耿于怀。

The web form had like 25 different checkboxes, 15 text fields, 18 different dropdowns. The interface was a mess. It was very daunting. And ESPN acquired Cricinfo in 2006, I think, but they still kept the same interface. And that has always sort of nagged me.

Speaker 0

所以我仍然认识那些人。等等。

And so I still know the people Wait.

Speaker 3

等等。是什么让你耿耿于怀?是说CricInfo没有正式的语言界面吗?它是通过网页表单进行查询的?

Wait. What nagged you? Is that CricInfo did not have a formal language. It had a web form for doing queries?

Speaker 0

那个网页表单太糟糕了。正因如此,只有真正的极客才会使用它。

That web form was terrible. Because of that, only the real nerds used it.

Speaker 3

世界上那么多事,偏偏这件让你烦恼:一个老网站用的是网页表单。我欣赏你对美学的执着。

Of all the things in the world that could bother you: the fact that an old website was a web form. I appreciate your commitment to aesthetics.

Speaker 0

所以...所以我至今仍与ESPN Cricinfo的管理人员保持友好关系,特别是总编辑。每次他来纽约,你知道,我们都会见面,一起出去喝一杯。他二月份的时候就在这里。

So I'm still friendly with the people who run ESPN Cricinfo, particularly the editor in chief. Whenever he comes to New York, you know, we meet up, we go out for a drink. And so he was here in February.

Speaker 0

现在故事转向了LLM和我如何相遇。2020年1月,就在疫情爆发前,他在这里。我又一次说,你们为什么不对StatsGuru做点改进呢?他看着我说,你为什么不对StatsGuru做点什么呢?他有点开玩笑的意思。但他觉得也许我有办法修复那个界面。

So now the story shifts to how LLMs and I sort of met. So in January 2020, right before the pandemic, he was here. And I again said, why don't you do something about StatsGuru? And he looks at me and said, why don't you do something about StatsGuru? He was kind of joking. But he thought maybe, you know, I had some ways to fix the interface.

Speaker 0

总之,疫情来袭,世界停摆了。但在2020年7月,GPT-3的第一个版本发布了。我看到有人用GPT-3通过自然语言为他们自己的数据库编写SQL查询。我就想,我能不能用这个来修复StatsGuru?于是我获得了GPT-3的早期访问权限。

So anyway, the pandemic hit, the world stopped. But in July 2020, the first version of GPT-3 was released. And I saw someone use GPT-3 to write a SQL query for their own database using natural language. And I thought, can I use this to fix StatsGuru? So I got early access to GPT-3.

Speaker 0

你知道,那时候获得访问权限很困难,但我还是想办法拿到了。但很快我就意识到,不行,我其实做不到。因为StatsGuru的后端数据库太复杂了。而且如果你还记得,GPT-3当时只有2048个token的上下文窗口。

You know, getting access those days was difficult, but somehow I got it. But soon I realized that, you know, no, I cannot really do it. Because StatsGuru, the back-end databases, were so complex. And if you remember, GPT-3 had only a 2,048-token context window.

Speaker 0

我根本不可能在那个上下文窗口中容纳那个数据库的复杂性。而且GPT-3当时也不具备指令遵循能力。但在尝试解决这个问题的过程中,我意外发明了现在被称为RAG的技术——根据自然语言查询,我创建了一个自然语言查询和结构化查询的数据库。就像我创建了一个领域特定语言(DSL),然后将其转换为对统计组的REST调用。基于新查询,我会浏览我的自然语言查询集。

There was no way in hell I could fit the complexities of that database in that context window. And GPT-3 also did not do instruction following at that time. But then, in trying to solve this problem, I accidentally invented what's now called RAG, where, based on the natural language query, I created a database of natural language queries and the corresponding structured queries. Like, I created a DSL, which then translated into a REST call to StatsGuru. So based on the new query, I would look through my set of natural language queries.

Speaker 0

我大约有1500个示例,我会挑选出六七个最相关的。然后将这些示例和结构化查询作为前缀与新查询一起发送,GPT-3就能神奇地完成它。准确率非常高。这个系统从2021年9月就开始在生产环境中运行,比ChatGPT问世早了约15个月,在某种意义上开启了整个革命,RAG也变得非常流行。我当时没有称之为RAG,但这确实是我在尝试解决板球工具问题时意外实现的。

I had about 1,500 examples, and I would pick the six or seven most relevant ones. And then those, with their structured queries, I would send as a prefix along with the new query, and GPT-3 magically completed it. And the accuracy was very high. So that had been running in production since September 2021, you know, about fifteen months before ChatGPT came and, in some sense, the whole revolution started and RAG became very popular. I didn't call it RAG, but this is something I sort of accidentally did in trying to solve that problem for the cricket tool.
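The retrieve-then-prefix idea can be sketched in a few lines. To be clear, this is not Vishal's actual system: the three example pairs, the mini-DSL syntax, and the bag-of-words retriever below are invented stand-ins for the roughly 1,500 real examples and whatever retrieval he used.

```python
from collections import Counter
import math

# Toy corpus of (natural-language query, structured query) pairs.
# The DSL syntax here is made up purely for illustration.
EXAMPLES = [
    ("most runs by a batsman in 2019", "STATS(metric=runs, role=batsman, year=2019, order=desc)"),
    ("best bowling average in tests", "STATS(metric=bowling_avg, format=test, order=asc)"),
    ("highest team totals at Lord's", "STATS(metric=team_total, venue=lords, order=desc)"),
]

def similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts: a stand-in for a real retriever."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def build_prompt(new_query: str, k: int = 2) -> str:
    """Retrieve the k most similar examples and prefix them to the new query."""
    ranked = sorted(EXAMPLES, key=lambda ex: similarity(new_query, ex[0]), reverse=True)
    shots = "\n".join(f"Q: {q}\nA: {s}" for q, s in ranked[:k])
    return f"{shots}\nQ: {new_query}\nA:"

print(build_prompt("most runs by a batsman in tests"))
```

The resulting string is what gets sent to the model, which then completes the final `A:` line; the retrieval step is what keeps the few-shot prefix relevant to the new query.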

Speaker 0

当我构建完成后,我对它的成功运行感到非常兴奋,但我完全不明白为什么它能工作。我盯着Transformer架构图看了又看,阅读了那些论文,但还是无法理解其工作原理。于是我开始了一段旅程,开发数学模型并试图理解其运作机制。这就是我进入AI和大语言模型领域的历程,一切都是因为试图解决这个板球问题。

Now, once I built it, you know, I was thrilled that this worked, but I had no idea why it worked. You know, I stared at that transformer architecture diagram. I read those papers, but I couldn't understand how or why it worked. So then I started on this journey of developing a mathematical model and trying to understand how it worked. So that's been sort of my journey through this world of AI and LLMs, because I was trying to solve this cricket problem.

Speaker 2

是的,太神奇了。那么回顾GPT-3发布以来,大语言模型的发展最让您感到惊讶的是什么?

Yeah. Amazing. And so maybe reflecting back since the release of GPT-3, what has most surprised you about how LLMs have developed?

Speaker 0

最让我惊讶的是什么?是发展速度。GPT-3就像个不错的小把戏,你需要费尽周折才能让它做点有用的事。但从ChatGPT开始,它相比GPT-3有了进步。然后出现了思维链、指令遵循等各种技术。

So what has most surprised me? The pace of development. So GPT-3 was, you know, a nice party trick, and you had to jump through hoops to get it to do something useful. But then ChatGPT was an advance over GPT-3. And then you had all these things like chain of thought, instruction following.

Speaker 0

GPT-4真正使其变得完善。发展速度确实让我吃惊。当初使用GPT-3时,我还能看清它的局限性,知道能让它做什么不能做什么。但我从未想到它会变成现在这样——对我而言,对全球数百万人而言,这些大语言模型就像我们的同事,几乎像是随时可以交流、头脑风暴、完成各种工作的实习生,这在ChatGPT刚发布时是无法想象的。

GPT-4 really made it polished. And, you know, the pace of development has really surprised me. Now, when I started working with GPT-3, I could sort of see what its limitations were, what I could make it do, what I couldn't make it do. But I never thought of it as, you know, what these LLMs have become for me now, and for millions of people around the world. We treat these models as our coworkers, almost like an intern that, you know, you're constantly chatting with, brainstorming, making them do all sorts of work, which we couldn't imagine, you know, just when ChatGPT was released.

Speaker 0

当时它很不错,能写诗,能写打油诗,能回答一些天马行空的问题。但现在展现出的能力,其发展速度确实让我非常惊讶。

It was nice. It could write poems, it could write limericks, it could answer some whimsical questions. But the capabilities that have emerged now, that pace has been very surprising to me.

Speaker 2

你认为进展是否正在趋于平缓?无论是现在还是不久的将来,你觉得会如何发展?

Do you see progress plateauing? Either now or in the near future, how do you see it going?

Speaker 0

是的。在某种意义上,进展正在趋于平缓。就像iPhone一样。你知道,当iPhone刚问世时,哇,这是什么玩意儿?早期的迭代版本不断让我们对新功能感到惊叹。

Yes. In some sense, progress is plateauing. It's like the iPhone. You know, when the iPhone came out, wow, what is this thing? And the early iterations, constantly we were amazed by new capabilities.

Speaker 0

但过去七、八、九年里,可能只是相机稍微好了一点,或者这里改了一点,内存更大了。但其核心能力并没有根本性的突破。你可以看到类似的情况也发生在这些大型语言模型上。而且这不仅仅是一家公司或一个模型的问题,对吧?

But the last seven, eight, nine years, it's maybe the camera got a little bit better or one thing changed here or memory is more. But there has been no fundamental advance in what it's capable of. You can sort of see a similar thing happening with these LLMs. And this is not true for just one company and one model. Right?

Speaker 0

你看看OpenAI推出的产品,或者Anthropic、谷歌、所有开源的中国模型或Mistral,大型语言模型的能力并没有根本改变。它们变得更好了,对吧?它们有所改进,但并未跨越到一个全新的境界。

You look at what OpenAI is coming up with or what Anthropic, Google, or all these open source Chinese model or Mistral, that the capabilities of LLMs has not fundamentally changed. They've become better. Right? They've improved, but they have not crossed into a different realm.

Speaker 3

Vishal,这是我非常欣赏你工作的一点。真正让我印象深刻的是,这些东西一出现,你就开始忙着为它们的能力建立正式模型,这与其他人所做的形成鲜明对比。其他人都在谈论AGI,说这些东西会递归自我改进,或者说它们只是随机的鹦鹉学舌,这毫无意义。所以每个人都有自己的说辞。

Vishal, this is something that I really appreciate about your work. And so the thing that really struck me is, as soon as these things showed up, you actually got busy trying to have a formal model of what they're capable of, which was in stark contrast to what everybody else was doing. Everybody else was like, AGI! These things are gonna, you know, recursively self-improve. Or they'll say, oh, these are just stochastic parrots, which doesn't mean anything. So everybody had a rhetoric.

Speaker 3

有时这些说辞很幻想,有时又几乎过于简化,比如‘哦,它只是一个数据库’,这显然不对。你工作的真正让我印象深刻的地方在于,你却说‘不,让我们弄清楚到底发生了什么,让我们建立一个正式模型’。

And sometimes this rhetoric was fanciful, and sometimes this rhetoric was almost reductionist. Like, oh, it's just a database, which is clearly not true. And the thing that really struck me about your work is you're like, no. Let's figure out exactly what's going on. Let's come up with a formal model.

Speaker 3

一旦我们有了正式模型,我们就可以推理这意味着什么。在我阅读你的作品时,我将其分为两部分。第一部分是你基本上提出了这个矩阵抽象,我觉得值得你详细谈谈。然后你以上下文学习为例,将其映射到贝叶斯推理,这在我看来非常强大,因为当时没人知道上下文学习为什么有效。

And once we have a formal model, we can reason about what that means. And then, you know, in my reading of your work, I kind of break it up into two pieces. There's the first one where you basically came up with this matrix abstraction, which I think is worth you talking through. And then you took in-context learning as an example, and you mapped it to Bayesian reasoning, which to me was incredibly powerful, because at the time, nobody knew why in-context learning worked.

Speaker 3

所以我认为你讨论这个会很好,因为,再次强调,我认为这是第一次真正形式化的影响,就像,这些东西是如何运作的?而你最近正在进行的是一项更通用的版本,关于这些模型在置信度方面输出的状态空间,也就是我们之前讨论的流形。所以我认为,如果你能描述一下你的矩阵模型,以及如何使用它来提供一些关于上下文学习在做什么、发生了什么事的界限,那将会很棒。

So I think it'd be great for you to discuss that because, again, I think it was the first real formal take on, like, how are these things working? And then the more recent work that you're working on now is a kind of more generalized version of what is the state space that these models output when it comes to confidence, which is the manifold that we were talking about before. So I think it would be great if you just described your matrix model and then how you use that to provide some bounds on what in-context learning is doing, what's happening.

Speaker 0

好的。那么,是的,让我们从那个矩阵抽象开始。矩阵背后的想法是,你有一个巨大的矩阵,其中每一行对应一个提示。然后这个矩阵的列数是LLM的词汇量,即它可以输出的标记数量。所以对于每个提示,这个矩阵包含了在该词汇表上的分布。

Okay. So, yeah, let's start with that matrix abstraction. The idea behind the matrix is you have a gigantic matrix where every row corresponds to a prompt. And then the number of columns of this matrix is the vocabulary of the LLM, the number of tokens that it can emit. So for every prompt, this matrix contains the distribution over this vocabulary.

Speaker 3

是的。

Yep.

Speaker 0

所以当你说‘猫坐在’时,你知道,对应‘垫子’的列会有很高的概率。大多数概率会是零。但是,合理的后续内容会有非零概率。所以你可以想象有这样一个巨大的矩阵。现在这个矩阵的大小是,如果你只拿第一代GPT-3模型来说,它有2000个标记的上下文窗口和50000个下一个标记或50000个标记的词汇量,那么这个矩阵的行数比我们所知所有星系中的原子数量还要多。

So when you say the cat sat on the, you know, the column that corresponds to mat will have a high probability. Most of them will be zero. But, you know, reasonable continuations will have a nonzero probability. And so you can imagine that there's this gigantic matrix. Now the size of this matrix: if you just take the old first-generation GPT-3 model, which had a context window of 2,048 tokens and a vocabulary of 50,000 tokens, then the number of rows in this matrix is more than the number of atoms across all galaxies that we know of.
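The "more rows than atoms" claim is a quick back-of-envelope check: with a 50,000-token vocabulary and a 2,048-token context, full-length prompts alone give 50,000^2048 rows. A sketch in log space, using the usual rough figure of about 10^80 atoms in the observable universe:

```python
import math

VOCAB = 50_000    # GPT-3-era vocabulary size
CONTEXT = 2_048   # GPT-3 context window, in tokens

# Counting only full-length prompts: VOCAB ** CONTEXT rows.
# Work in log10, since the number itself would overflow any float.
log10_rows = CONTEXT * math.log10(VOCAB)

ATOMS_LOG10 = 80  # atoms in the observable universe: roughly 10^80

print(f"rows ~ 10^{log10_rows:.0f}")  # about 10^9623
print(log10_rows > ATOMS_LOG10)       # True
```

So the row count exceeds the atom count by thousands of orders of magnitude, which is why the matrix can only ever exist as a compressed, interpolated representation.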

Speaker 0

所以显然,我们无法精确表示它。幸运的是,很多这些行在现实生活中不会出现。对吧?一个任意的标记集合,你不会用它作为提示。同样,你看到很多这些行是缺失的,很多列值也是零。

So clearly, we cannot represent it exactly. Now, fortunately, a lot of these rows do not appear in real life. Right? An arbitrary collection of tokens, you are not gonna use that as a prompt. So a lot of these rows are absent, and a lot of the column values are also zero.

Speaker 0

对吧?当你说‘猫坐在’时,它不太可能后面跟着对应数字的标记,或者,你知道,一个任意的标记集合。只有非常小的标记子集可以跟随特定的提示。所以这个矩阵非常非常稀疏。但即使考虑到这种稀疏性,即使去除了那些胡言乱语的提示,这个矩阵的大小对这些模型来说也太大了,即使有上万亿参数也无法表示。

Right? When you say the cat sat on the, it's unlikely to be followed by the token corresponding to, let's say, numbers or, you know, an arbitrary collection of tokens. There will be only a very small subset of tokens that can follow a particular prompt. So this matrix is very, very sparse. But even after that sparsity, and even after removing the sort of gibberish prompts, the size of this matrix is too much for these models to represent, even with a trillion parameters.

Speaker 0

所以抽象地说,发生的事情是模型在训练集的某些数据上训练,对这些行的某个小子集,你有下一个标记分布的合理值。每当你给出一个新的提示时,对吧,它会尝试用它学到的内容和新提示中的内容进行插值,得出一个新的分布。但它基本上不止是一个随机鹦鹉。它在这个它训练过的矩阵子集上有点贝叶斯的意思。所以当我说,你知道,我今晚要和马丁出去吃晚饭。

So, in an abstract sense, what is happening is the models get trained on certain, you know, data from the training set, and for a small subset of these rows, you have reasonable values for the next-token distribution. Whenever you give the prompt something new, right, then it'll try to interpolate with what it has learned and what's there in the new prompt and come up with a new distribution. So it's more than a stochastic parrot. It is sort of Bayesian on this subset of the matrix that it has been trained on. So when I say, you know, I'm going out for dinner with Martin tonight.

Speaker 0

现在我相当确定它在训练数据中从未遇到过这个短语。对吧?但它遇到过这个短语的变体。鉴于我要和Martin一起出去,它能生成一个贝叶斯后验概率。它利用Martin是我共进晚餐对象的证据,并生成一个下一个标记的分布,聚焦于我们可能去的地方。

Now I'm reasonably sure that it has never encountered that phrase in its training data. Right? But it has encountered variants of this phrase. And given that I'm going out with Martin, it it can produce a Bayesian posterior. It uses that evidence that Martin is the one that I'm going for dinner with, and it'll produce a next token distribution that'll focus on the likely places that we are going.
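The "dinner with Martin" narrowing is exactly a Bayes update: a prior over continuations gets reweighted by the likelihood of the evidence in the prompt. All the numbers below are invented purely for illustration:

```python
# Prior over where the dinner continuation goes, before any evidence.
prior = {"McDonald's": 0.4, "a Michelin-star restaurant": 0.1, "a pizza place": 0.5}

# Likelihood of the evidence "the dinner companion is Martin" under each continuation.
likelihood = {"McDonald's": 0.01, "a Michelin-star restaurant": 0.9, "a pizza place": 0.09}

# Bayes rule: posterior proportional to prior times likelihood, then normalize.
unnorm = {k: prior[k] * likelihood[k] for k in prior}
z = sum(unnorm.values())
posterior = {k: v / z for k, v in unnorm.items()}

for k, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{k}: {p:.3f}")
```

The evidence in the prompt flips the ranking: the Michelin-star continuation, unlikely under the prior, dominates the posterior, which is the concentration of the next-token distribution being described.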

Speaker 0

所以这个矩阵,因为它以压缩方式表示,但模型对每个提示都有响应。它们是如何做到的?嗯,它们会回溯到训练过的内容,在那里进行插值,并使用提示作为某种证据来计算新的分布。

So so this matrix, because it's represented in a compressed way, yet the models respond to everything, every prompt. How do they do it? Well, they they go back to what they've been trained on, interpolate there, and use the prompt as sort of some evidence to compute a new distribution.

Speaker 3

对。所以没错。提示的上下文会影响后验分布。

Right. So the context of the prompt impacts the posterior distribution.

Speaker 0

正是如此。

Exactly.

Speaker 3

是的。没错。而且你将其映射到贝叶斯学习,其中上下文就是新证据。

Yeah. Right. And you mapped it to Bayesian learning, where the context is the new evidence.

Speaker 0

新证据。完全正确。

New evidence. Exactly.

Speaker 3

用来学习。

To learn from.

Speaker 0

所以,比如我之前提到的板球例子。是的。我创建了自己的领域特定语言(DSL)。是的。它将板球领域的自然语言查询映射到这个DSL,然后我可以将其转换为SQL查询或REST API等等。但获取DSL是关键。

So, for instance, the cricket example that I spoke about earlier. Yep. So I created my own DSL. Yep. Which, you know, mapped a natural language query in cricket to this DSL, which then I can translate into a SQL query or a REST API or whatever. But getting the DSL is important.

Speaker 0

现在,这些大语言模型从未见过那个DSL。是我设计的。是的。对吧?但在展示几个例子后,它就学会了。

Now, these LLMs have never seen that DSL. I designed it. Yep. Right? But yet after showing a few examples, it learned it.

Speaker 0

它是怎么学会的?

How did it learn it?

Speaker 3

而这这完全是在提示中进行的。你并没有在提示中进行100%的训练。对吧?所以,就像,权重是随机的。

And this is in the prompt. You didn't do any training; it's 100% in the prompt. Right? So, like, the weights are unchanged.

Speaker 2

是的。是的。

Yeah. Yeah.

Speaker 0

是的。这是在2020年10月发生的。对吧?我无法访问OpenAI的内部结构。我只能,你知道,访问API。

Yeah. This is this was happening in October 2020. Right? I had no access to internals of OpenAI. I could just, you know, access the API.

Speaker 0

OpenAI无法访问StatsGuru的内部结构或我脑中构思的DSL。但在仅展示几个例子后,它立刻就学会了。所以这是一个例子,说明它过去见过DSL或类似结构。现在利用我展示的这些证据,好吧,这就是我的DSL的样子。现在对于一个新自然语言查询,它能够为映射到我见过的例子的标记创建正确的后验分布。

OpenAI had no access to the internal structure of StatsGuru or the DSL that I cooked up in my head. Yet after showing it only a few examples, it learned it right away. So that's an example where it has seen DSLs or structures in the past. And now, using this evidence that I show it, okay, this is what my DSL looks like. Now, for a new natural language query, it is able to create the right posterior distribution for the tokens that map to the examples it has seen.

Speaker 0

现在,另一个美妙之处在于,这是小样本学习或上下文学习的一个例子。对吧?但当我将这个提示连同这些示例提供给这个LLM时,我并没有对LLM说,好吧,这是小样本学习的例子,所以从这些例子中学习。

Now, the other beautiful thing about this is that it's an example of few-shot learning, or in-context learning. Right? But when I give that prompt along with these examples to the LLM, I'm not saying to the LLM, okay, this is an example of few-shot learning, so learn from these examples.

Speaker 0

对吧?你只是把这个作为提示传递给LLM,它处理它的方式与处理任何其他提示完全相同,而这些提示并不是上下文学习的例子。所以这实际上意味着底层机制是相同的。对吧?无论你是给出一组示例,然后要求它完成一个任务,比如上下文学习,还是只给它一些提示进行延续,比如,我今晚要和马丁出去吃晚饭。

Right? You just pass this to the LLM as a prompt, and it processes it exactly the way it would process any other prompt that is not an example of in-context learning. So that really means the underlying mechanism is the same. Right? Whether you give a set of examples and then ask it to complete a task, like in-context learning, or just give it some prompt for continuation, like, I'm going out for dinner with Martin tonight.

Speaker 0

那里没有上下文学习。但它生成或进行这种推理的过程是完全相同的。这就是我一直试图建模并提出一个正式模型的东西。

There's no in-context learning there. But the process with which it's generating, or doing this inference, is exactly the same. And that's what I have been trying to model and come up with a formal model of.

Speaker 3

我觉得非常令人印象深刻的是,你使用这个基本模型展示了许多事情,对吧,描述了上下文学习并将其映射到贝叶斯学习。但你还做了另一个,你在Twitter(X)上勾勒出了一个几乎轻率的论点,你粗略地论证了为什么递归自我改进在没有额外信息的情况下无法发生。是的。所以也许可以快速过一下,同样的模型如何能非常快速地表明一个模型永远无法递归地自我改进。

What I've found very impressive is you've used this basic model to show a number of things, right, to describe in-context learning and to map it to Bayesian learning. But you did it for another one: you sketched out this almost glib argument on X, where you made a rough argument for why recursive self-improvement can't happen without additional information. So maybe walk through very quickly how, with the same model, you can show that a model can never recursively self-improve.

Speaker 0

所以,你知道,我们最近使用的另一个短语是,LLM的输出是其训练数据的归纳闭包。是的。所以当你说它可以递归地自我改进时,这可能意味着两件事之一。所以让我们回到

So, you know, another phrase that we have been using recently is that the output of the LLM is the inductive closure of what it has been trained on. Yep. So when you say that it can recursively self-improve, it could mean one of two things. So let's get back to

Speaker 3

嗯,实际上,你知道有趣的是,大多数人都同意,如果你只有一个LLM,把它的输出再喂回作输入,它不会有任何进步。但人们经常会说,如果有两个LLM呢?没有外部信息,但两个LLM互相交谈,也许它们可以互相改进,然后你就可以有一个起飞场景。但你甚至用矩阵模型处理了n个LLM的情况,表明你根本没有获得任何新信息。

Well, actually, you know what's kind of interesting is, like, most people agree that if you have one LLM and you just feed the output back in as the input, it's not gonna do anything. But then people will often say, well, what if you have two LLMs? You have no external information, but you have two LLMs talking to each other. Maybe they can improve each other, and then you can have, like, you know, a takeoff scenario. But, again, you address this even in the case of n LLMs, using the matrix model to show that you just aren't getting any information.

Speaker 3

是的。入口。是的。

Yeah. Entry. Yep.

Speaker 0

是的。所以,所以你可以代表这些模型中所包含的某种信息。让我们回到我之前提到的矩阵类比,那个矩阵抽象。就像我说的,这些模型代表了行的子集,对吧?

Yeah. So you can represent the sort of information contained in these models. Let's go back to that matrix analogy that I have, the matrix abstraction. Like I said, these models represent a subset of the rows. Right?

Speaker 3

是的。

Yeah.

Speaker 0

所以,行的子集被代表了。但其中一些行能够帮助填补一些缺失的行。例如,如果模型知道如何一步一步地进行乘法运算,那么每一行对应的,比如说,769乘以125之类的,所有

So a subset of the rows are represented. But some of these rows are able to help fill out some of the missing rows. For instance, if the model knows how to do multiplication step by step, then every row corresponding to, let's say, 769 times 125 or whatever, all

Speaker 3

那些它都能用乘法填补出

those, it can fill out with multiplication,

Speaker 0

答案。是的。它可以填补答案,因为它内部嵌入了那些算法,你只需要展开它们。

the answer. Yeah. It can fill out the answer because it has those algorithms sort of embedded in it; you just need to unroll them.
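The "unrolling" idea can be made concrete with a toy sketch: a procedure that has internalized schoolbook multiplication can fill out any row of the matrix, including number pairs it never saw verbatim.

```python
def long_multiply(a: int, b: int) -> int:
    """Multiply by unrolling the schoolbook algorithm digit by digit, the way
    a model trained on step-by-step examples can 'fill out' rows of the
    multiplication table it never saw verbatim."""
    total = 0
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * (10 ** place)  # one partial product per digit
        total += partial
    return total

print(long_multiply(769, 125))  # → 96125; the same steps work for any pair
```

The point of the analogy is that the algorithm, not the individual answers, is what is stored, so the missing rows come for free once the steps are learned.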

Speaker 3

没错。

Yep.

Speaker 0

所以,它在一定程度上可以自我改进。但超过某个点后,这些模型只能生成它们训练过的内容。让我给你举三个例子。

So it can sort of self-improve up to a point. But beyond that point, these models can only generate what they have been trained on. So let me give you three examples.

Speaker 3

是的。

Yeah.

Speaker 0

所以任何在1915年之前物理学基础上训练出来的模型或大型语言模型,都不可能提出相对论。爱因斯坦不得不某种程度上否定牛顿物理学,提出了这个时空连续体的概念。他彻底重写了规则,对吧?这就是一个AGI的例子,即生成或创造新知识,而不仅仅是揭示宇宙的既有规律。

So any model, any LLM that was trained on pre-1915 physics would never have come up with the theory of relativity. Einstein had to sort of reject Newtonian physics and come up with this space-time continuum. He completely rewrote the rules, right? So that is an example of, you know, AGI, where you are generating new knowledge. It's not simply unrolling what is already there.

Speaker 0

对吧?

Right?

Speaker 3

这就像是它不是在计算什么,而是在真正发现宇宙的某种基本规律。

It's like it's not computing something. It's actually discovering something fundamental about the universe.

Speaker 0

从根本上说,是的。为此,你必须跳出你的训练集。类似地,任何未在其上训练过的LLM都不会提出量子力学,对吧?无论是波粒二象性,还是整个概率性的概念,还是能量不是连续的而是量子化的,你都必须否定牛顿物理学。

Fundamentally, yeah. And for that, you have to go outside your training set. Similarly, any LLM that was not trained on it would not have come up with quantum mechanics. Right? Whether it's wave-particle duality, or this whole probabilistic notion, or that energy is not continuous but quantized, you had to reject Newtonian physics.

Speaker 2

是的。

Yeah.

Speaker 0

或者哥德尔的不完备定理。没错,他必须超越公理体系才能说,好吧,它是不完备的。所以这些例子展示了创造新科学或根本性新成果的过程。这种自我改进在这些架构中是不可能的。

Or Gödel's incompleteness theorem. Yep. He had to go outside the axioms to say, okay, it is incomplete. So those are examples where you're creating new science or fundamentally new results. That kind of self-improvement is not possible with these architectures.

Speaker 0

它们可以做精炼,可以填补那些答案已经存在的行。另一个例子,最近备受媒体关注的是国际数学奥林匹克竞赛(IMO)的结果。无论是人类解题还是大语言模型解题,他们都不是在发明新的数学。是的。他们是把已知结果按一系列步骤连接起来得出答案。

They can refine; they can fill out those rows where the answer already exists. Another example, which has received a lot of press these days, is the IMO results, the International Math Olympiad. Whether it's a human solving it or the LLM solving it, they are not inventing new kinds of math. Yep. They are able to connect known results in a sequence of steps to come up with the answer.

Speaker 0

所以即使是LLMs,它们所做的也是在探索各种解决方案。在某些解决方案中,它们开始沿着这条路径前进,其中下一个标记的熵很低。所以这就是我说它们处于那个贝叶斯流形中的地方。是的,

So even the LLMs, what they are doing is they are exploring all sorts of solutions. In some of these solutions, they start going on this path where their next token entropy is low. So that's where I say they are in that Bayesian manifold. Yep,

Speaker 3

是的。

yep.

Speaker 0

在那里你会有这种熵的坍缩。通过执行这些步骤,你就能得出答案。但你并不是在发明新的数学。你不是在发明新的公理或新的数学分支。你是在利用你训练过的内容来得出那个答案。

Where you have this entropy collapse. And by doing those steps, you arrive at the answer. But you're not inventing new math. You're not inventing new axioms or new branches of mathematics. You're sort of using what you've been trained on to arrive at that answer.
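The "entropy collapse" being described can be pictured with next-token distributions. The distributions below are invented for illustration; in a real model they would come from the softmax over the vocabulary at each generation step.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (bits) of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented next-token distributions along a chain of thought: early steps are
# uncertain (many plausible tokens); once the model locks onto a known path,
# the distribution collapses onto one token and the entropy falls toward zero.
steps = [
    [0.25, 0.25, 0.25, 0.25],      # exploring: maximal uncertainty
    [0.60, 0.20, 0.10, 0.10],      # one path starts to dominate
    [0.90, 0.05, 0.03, 0.02],      # on the low-entropy path
    [0.99, 0.005, 0.003, 0.002],   # "entropy collapse"
]
for dist in steps:
    print(round(entropy_bits(dist), 3))
```

The monotone drop in entropy along the sequence is what "staying in the Bayesian manifold" looks like at the token level.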

Speaker 0

所以LLMs能做的这些事情,你知道,它们在连接已知点方面会做得更好。是的。但创造新的点,我认为我们需要架构上的进步。

So those things LLMs can do, and they'll get better at connecting the known dots. Yeah. But creating new dots, I think, needs an architectural advance.

Speaker 2

是的。所以Martin早些时候谈到,相关讨论要么是随机鹦鹉,要么是递归的AGI。你是如何看待AGI的讨论,甚至这个概念本身的?在它有用的范围内,它意味着什么?你是怎么看的?

Yeah. So Martin was talking earlier about how the discourse was either stochastic parrots or, you know, recursive AGI. How do you conceive of the AGI discourse, or even the concept? What does it mean, to the extent that it's useful? How do you think about that?

Speaker 0

所以,你知道,我思考的方式,我们在论文中尝试阐述的是,它超越了随机鹦鹉,但它不是AGI。它是在对其训练过的内容进行贝叶斯推理。所以它比单纯的随机鹦鹉要复杂得多。

So the way I think about it, the way we have tried to formulate it in our papers, is that it's beyond a stochastic parrot, but it's not AGI. It's doing Bayesian reasoning over what it has been trained on. So it's a lot more sophisticated than just a stochastic parrot.

Speaker 2

抱歉?你如何定义AGI?

Sorry? How do you define AGI?

Speaker 0

好的。那么,AGI。我如何定义AGI呢?我认为,当前LLMs是在已知的贝叶斯流形中导航,而AGI将创造新的流形。目前这些模型只是导航,它们并不创造。

Okay. So, AGI. How do I define AGI? The way I would put it: LLMs currently navigate through this known Bayesian manifold; AGI will create new manifolds. Right now, these models navigate, they do not create.

Speaker 0

AGI将出现在我们能够创造新科学、新成果、新数学的时候。当AGI提出相对论这样的理论时——我的意思是,这是一个极高的标准,你明白我的意思。它必须超越其训练数据,提出新的范式、新的科学。这就是我对AGI的定义。

AGI will be when we are able to create new science, new results, new math. When an AGI comes up with a theory of relativity, I mean, it's an extremely high bar, you get what I'm saying. It has to go beyond what it has been trained on to come up with new paradigms, new science. That's that's my definition of AGI.

Speaker 3

Vishal,根据你所做的工作,你认为能否界定出所需的数据量、计算资源或数据量,以便让它进化?因为如果只看现有的语言模型,它们是用大量数据创建的。要创造一个新的流形,仅仅因为基本机制,我们就需要更多数据,对吧?否则,它可能会被现有的数据集所吞没。

Vishal, based on the work you've done, do you think you can bound the amount of data or compute that would be needed in order for it to evolve? Because one of the problems, if you just take LLMs as they exist, is that so much data was used to create them. To create a new manifold, we'll need a lot more data, just because of the basic mechanisms. Right? Otherwise, it'll just kind of get consumed into the existing set of data.

Speaker 3

比如,你有没有发现任何关于实际有效进化流形所需资源的界限,或者你认为我们只需要一种新的架构?

Like, have you found any bounds on what would be needed to actually evolve the manifold in a useful way, or do you think we just need a new architecture?

Speaker 0

我个人认为我们需要一种新的架构。我们拥有的数据和计算资源越多,我们可能会得到更平滑的流形。这就像一张地图。

I personally think that we need a new architecture. The more data that we have, the more compute we have, we'll get maybe smoother manifolds. So it's like a map.

Speaker 3

是的。因为,

Yeah. Because because,

Speaker 0

我的意思是,

I mean,

Speaker 3

人们有一种观点。他们会说,好吧,Vishal,这一切都很好,但是,你知道,我可以直接拿一个大语言模型,给它眼睛,给它耳朵,把它放到世界上,它就能获取信息。通过这种交互,它会自我改进。因此,它能学习新东西。但我一直凭直觉认为的反驳点是,训练这些东西所用的数据量太大了。

there's this view that people have. They're like, well, Vishal, this is all good and well, but, you know, I could just take an LLM, give it eyes, give it ears, and put it in the world, and it'll gain information. And based on that interaction, it'll improve itself. And therefore, it can learn new things. But the counterpoint that I've always intuitively thought of is that the amount of data used to train these things is so large.

Speaker 3

在增量数据下,你实际上能在多大程度上演化那个流形?我的意思是,几乎微乎其微。对吧?必须要有其他方法来生成新的流形,而不是在现有基础上演化。

How much can you actually evolve that manifold given incremental data? I mean, almost none at all. Right? There has to be some other way to generate new manifolds that isn't evolving the existing one.

Speaker 0

我完全同意。必须要有一种新的架构飞跃,才能超越当前这种只是投入更多数据和算力的方式。你知道,这会达到瓶颈。就像是iPhone十五、十六、十七代一样。

I completely agree. There has to be a new sort of architectural leap to go beyond the current approach of just throwing more data and more compute at it. You know, it's going to plateau. It's, you know, the iPhone 15, 16, 17.

Speaker 2

在你看来,有没有什么有前景的研究方向,可能帮助我们超越大语言模型的局限性?

And are there any research directions that are promising in your mind that might help us, you know, go beyond LLM limitations?

Speaker 0

所以,我的意思是,我再次声明,我很喜欢大语言模型。它们非常棒,会极大地提高生产力,但我不认为它们是终极答案。你知道,Yad Likhan有句名言,说大语言模型是通往亚洲之路上的一个干扰。

So, mean, again, I love LLMs. They are fantastic. They are going to increase productivity like nobody's business, but I don't think they are the answer. So, you know, Yad Likhan famously says that LLMs are a distraction on the road to Asia.

Speaker 3

死胡同。它们是通往AGI之路上的死胡同。

Dead end. They're a dead end on the road to AGI.

Speaker 0

我不完全属于那个阵营,但我认为我们需要一种建立在LLM之上的新架构来实现AGI。一个非常基本的例子,就是Martin刚才说的:给它们眼睛,给它们耳朵。你让它们变得多模态,它们当然会变得更强大。

I'm not quite in that camp, but I think we need a new architecture to sit on top of LLMs to reach AGI. You know, a very basic thing, what Martin just said: you give them eyes and you give them ears. You make them multimodal, and of course they'll become more powerful.

Speaker 0

但你需要的不仅仅是这些。你知道,人类大脑用很少的例子学习的方式,那不是变压器学习的方式。

But you need a little bit more than that. You know, the way human brains learn with very few examples, that's not the way transformers learn.

Speaker 3

是的。

Yeah.

Speaker 0

而且,你知道,我不是说我们需要创造一个爱因斯坦或哥德尔,但必须有一个能够创造这些流形的架构飞跃;仅仅投入新数据是做不到的,那只会让已有的流形更平滑。

And, you know, I'm not saying that we need to create an Einstein or a Gödel, but there has to be an architectural leap that is able to create these manifolds, and just throwing in new data will not do it. It'll just smooth out the already existing manifolds.

Speaker 3

那么这是否意味着你的目标是帮助思考新架构,还是主要专注于为现有架构设定形式界限?

So is your goal to actually help think through new architectures, or are you primarily focused on putting formal bounds on existing architectures?

Speaker 0

两者都有点。我的意思是,前一个目标是更雄心勃勃的,是的,每个人都在追求,是的,我我我一直在思考这个问题。

A bit of both. I mean, the former goal is the more ambitious one that everybody is chasing, and, yeah, I think about that constantly.

Speaker 2

有没有任何新的,甚至像是新架构的暗示,或者我们在新架构方面是否已经取得了一些进展?还是说

Are there any hints at a new architecture? Like, have we started to make any progress on new architectures? Or is it

Speaker 0

你知道,Yann一直在推动这个JEPA架构。嗯,基于能量的架构。它们看起来很有前景。我一直在思考的方式是,有这样一组基准测试,或者说ARC奖。

You know, Yann has been pushing this JEPA architecture. Yeah. Energy-based architectures. They seem promising. The way I have been thinking about it is, there's this set of benchmarks, or the ARC Prize. Yeah.

Speaker 0

对吧?Mike Knoop和François Chollet的想法是,如果你能理解为什么大语言模型在这个测试上失败,也许你可以逆向工程出一种新的架构来帮助你成功,对吧?我同意很多人说的:语言很棒,但语言不是答案。当我在接一个朝我飞来的球时,我是在脑海里做模拟,而不是把它翻译成语言来判断它会落在哪里。

Right? The idea from Mike Knoop and François Chollet is that if you understand why the LLMs are failing on this test, maybe you can reverse-engineer a new architecture that'll help you succeed at it, right? And I agree with what several people say: language is great, but language is not the answer. You know, when I'm catching a ball that is coming at me, I'm mentally doing that simulation in my head. I'm not translating it into language to figure out where it'll land.

Speaker 0

我是在脑海里做那种模拟。新架构的问题之一,是我们如何让这些模型进行近似模拟,以测试一个想法并决定是否继续。嗯,另一件我一直好奇的事是:我们作为人类,是因为聪明才发展出语言,还是因为发展了语言才加速了我们的智能?

I do that simulation in my head. One of the new architectural questions is how we get these models to do approximate simulations, to test out an idea and decide whether to proceed or not. Yeah, another thing that I've always wondered about: did we develop language because we were intelligent, or did developing language accelerate our intelligence?

Speaker 0

所以,我不知道你站在哪一边。

So I don't know which side of that camp you fall on.

Speaker 3

嗯,我的意思是,有趣的是,有一些关于人类从零开始发展语言的口述例子被记录了下来。对吧?比如危地马拉或尼加拉瓜手语,那里有一些学生在没有被教导的情况下发展出了自己的语言。所以这似乎表明语言是跟随智能而来的。问题是这些都是传闻。

Well, I mean, what's interesting is, like, you have these recorded anecdotal examples of humans developing languages de novo. Right? It's either the Guatemalan or Nicaraguan sign language, right, where these students developed their own language without being taught. And so that would suggest that language follows intelligence. The problem is, they're all anecdotal.

Speaker 3

对吧?比如,谁知道是不是有人教过他们手语?没人真正知道。没有控制组。所以这些都是观察性研究,而且数量太少,你不得不怀疑这是不是只是粗糙的观察。

Right? Like, who knows if somebody didn't teach them sign language? Nobody really knows. There are no controls. So these are all observational studies, and there are so few of them, you have to wonder if it's just kind of sloppy observation.

Speaker 3

所以我认为这个问题仍然悬而未决。

And so I think that the question is still outstanding.

Speaker 0

是的。所以,我我的意思是,语言确实加速了我们的智力发展。这一点毫无疑问。是的。但谁先谁后,我们并不清楚。

Yeah. So, I mean, language definitely accelerated our intelligence. There's no question about that. Yeah. But which followed which, we don't know.

Speaker 3

但我很自然地把它看作一个网络问题:一旦有了语言,你们就能交流。而当你们能交流时,信息就会被存储下来。

But I naturally view it as a networking problem, which is: once you have language, you can communicate. And when you can communicate, it gets stored.

Speaker 0

你们可以复制。是的。

You can replicate. Yeah.

Speaker 3

是的。

Yeah.

Speaker 0

是的。完全正确。

Yeah. Exactly.

Speaker 3

完全正确。

Exactly right.

Speaker 0

酷。我

Cool. I

Speaker 3

我再次强调,这问题有点古怪,但是的。你知道,我认为你为讨论带来的一个观点——对于正在收听的听众们,我真的认为你们应该去看看Vishal的研究并阅读它。我只是觉得它会给你一个非常、非常特别的理解,尤其是如果你有系统背景,比如网络或系统背景,它会让你对这些界限有一个非常、非常好的理解。但是,你所借鉴的工具包,比如信息论和更形式化的方法——你有没有发现AI社区对此持开放态度,还是说这像是两个不同的文化、两个不同的星球在尝试沟通,却缺乏共同基础?就像,你是如何发现将网络的世界观带入AI领域的?

Again, this is kind of a wonky question, but yeah. You know, I think one thing that you've brought to the discourse, and for those that are listening to this, I really think you should look up Vishal's work and read it. I just think it'll give you, especially if you have a systems background, like a networking or systems background, a really, really good understanding of the bounds on these systems. But the toolkit that you draw from is information theory and more formal methods. Have you found that the AI community is receptive to this, or is it like two different cultures, two different planets trying to communicate without a lot of common ground? Like, how have you found bringing the networking view of the world to the AI realm?

Speaker 0

其中一些人肯定是接受的。但是,你知道,这些大型会议及其评审过程太随机了。他们提出的问题类型,你知道,我是个建模的人。我喜欢建模。而且,你知道,我把这项工作的一个版本提交给了一个非常著名的机器学习或AI会议。

Some of them are receptive to it, definitely. But, you know, these large conferences and their reviewing process, it's so random. And the kind of questions they ask, you know, I'm a modeling person. I like to model things. And, you know, I submitted one version of this work to one very famous machine learning or AI conference.

Speaker 0

审稿人说,好吧,这是一个模型,所以呢?

And the reviewer said, okay, this is a model, so what?

Speaker 3

所以

So

Speaker 0

所以,这里有

so so there is

Speaker 3

那真是太了不起了。所以,你实际上研究了一个没人理解、我们没有任何模型的系统。你提供了一个我们可以用来分析它的模型,而仅凭这一点还不够。

That's absolutely remarkable. So, like, you've actually taken a system that nobody understands, that we have no models for, and you provided a model that we can use to analyze it, and that alone wasn't sufficient.

Speaker 0

他们问,那么大规模实验在哪里来证明这一点?

They're asking, so where are the large-scale experiments to prove this?

Speaker 3

听着,说实话,我觉得当前AI社区中有这么多经验主义,正是因为我们不理解这些系统。你知道,这让我有点想起,我感觉系统领域走的是相反的路,对吧?

Listen, I honestly find there's so much empiricism in the current AI community exactly because we don't understand the systems. You know, it kind of reminds me. I feel like systems went the other way. Right?

Speaker 3

就像我们有了所有这些模型,但后来我们不明白系统是如何工作的,然后我们就只是,实际上进行了测量。感觉机器学习或AI的东西是相反的,就是我们知道自己不理解它们,所以我们就测量它们,但现在我们正试图提出模型。

It's like we had all of these models, but we didn't understand how the systems worked, and then we just actually did measurement. It feels like the ML and AI stuff is the opposite: we know we don't understand them, so we just measure them, but now we're trying to come up with the models.

Speaker 0

是的,没错。所以在某种意义上,构建这些工件然后仅仅测量它们太容易了,以至于人们一直在尝试这样做。而且,你知道,我真的很不喜欢的一个术语是提示工程。

Yeah. Exactly. So it was so easy, in some sense, to build these artifacts and then just measure them that people have been going around doing that. And, you know, one term I really dislike is prompt engineering.

Speaker 3

为什么?

Why?

Speaker 0

你知道,工程曾经意味着送人上月球或提供五个九的可靠性。提示工程就是提示调整。

You know, engineering used to mean sending a man to the moon or providing five nines reliability. Prompt engineering is prompt twiddling.

Speaker 3

是的。

Yeah.

Speaker 0

对吧?你调整一个提示,困惑度就变化,推理和输出也跟着变化。而且,你知道,有数百篇论文只是一个接一个地做实验,这样那样地改变提示,然后写下他们的观察结果。结果,大量这样的论文被写出来、被提交评审,评审人忙于查看所有这些经验性工作。

Right? You fiddle with a prompt, and the perplexity changes, and the inference, the output, changes. And, you know, you have, like, hundreds of papers just doing one experiment after another, changing a prompt this way and that way, and writing up their observations. As a result, lots of these papers are being written and submitted for review. Reviewers get busy looking at all this kind of empirical work.

Speaker 0

我个人的偏好是首先尝试去理解、建模它。是的。然后你才能做其他事情。

My personal taste is to first try to understand, model it. Yeah. And then you can do the other thing.

Speaker 3

像个真正的理论派。我不太懂这种大规模调参的事情。

Like a true theory guy. I don't know about this big twiddling.

Speaker 2

让我再问一个关于LLM的问题:是否存在某些基准测试或现实世界任务,如果它们实现了,你会重新评估并说,嘿,也许LLMs比我原先认为的更接近AGI的道路?

Let me ask one more LLM question, which is, are there any benchmarks or real world tasks that if they occurred, you'd sort of reevaluate and say, hey, maybe LLMs are closer to the path to AGI than I thought?

Speaker 0

如果说非常现实世界的任务,好问题。你知道,对于LLM或这些模型来说,训练数据最多的领域可能是编程。编程也是结构最丰富的领域。然而,任何用过这些工具的人,无论是Cursor还是Claude Code之类,都知道LLM仍然会产生幻觉,继续生成不合理的代码。

If we're talking very real-world tasks, good question. You know, for LLMs, or these models, the one domain where you have the most training data is probably coding. And coding is where you can also have the most structure. And yet, anyone who has used these tools, whether it's Cursor or Claude Code or whatever, knows LLMs continue to hallucinate, continue to generate unreasonable code.

Speaker 0

你必须不断地照看这些模型。所以,当有一天LLM能在没有任何照看的情况下完成一个大型软件项目时,我才会多一点信服。但再次强调,我不认为它能够创造新的科学。如果它做到了,那才是我真正信服的时候。

You have to constantly babysit these models. So the day an LLM can create a large software project without any babysitting is the day I'll be a little bit more convinced. But again, I don't think it'll be able to create new science. If it does, that's when I'll be convinced.

Speaker 3

我,你知道,我认为你几乎可以用一种定义性的方法来回答这个问题,Vishal。就像,这类问题的问题是,如果你有数十亿美元,可以收集任何你想要的数据,你就能让模型做任何你想做的事。对吧?所以,你懂我的意思吗?就像,在某种程度上,这些模型背后有整个资本结构的机器在支撑。

You know, I think you can almost take a definitional approach to answer this question, Vishal. Like, the problem with these types of questions is that if you have billions of dollars and can collect whatever data you want, you can make a model do anything you want. Right? You know what I'm saying? At some level, you've got this entire capital-structure machinery behind these models.

Speaker 3

所以你会说,哦,它可以在科学上很出色。当然。你投入十亿美元解决材料科学问题并收集所有数据,你就能在材料科学或其他任何领域表现出色。但有一个定义性的答案,也就是——我要引用你的工作——存在一个基于其训练数据的内在流形。然后问题是,它是否曾产生过偏离这个流形的新东西。

So you're like, oh, it can be good at science. Well, sure. You put a billion dollars into solving materials science and collect all this data, and you'll be good at materials science, or whatever it is. But there is a definitional answer, which is, and I'm gonna draw from your work here, there is a manifold in there based on the data it's been trained on. And then the question is whether it ever produces something that's off that manifold, a new manifold.

Speaker 3

因此,考虑到现有的训练数据,如果它真的那样做了,如果它做了超出该分布的事情,那么显然我们正走在学习新事物的道路上。如果没有,那么一切都只是从已知内容出发的计算步骤。

So, considering the existing training data, if it ever does that, if it does something that's outside of that distribution, then clearly we're on a path to learning new things. And if not, then everything is just a computational step from what's already known.

Speaker 0

是的。这就是我的意思

Yeah. And that's all I mean

Speaker 3

然后我想,我想反驳这一点的话,也许可以说所有人类都只是在各自的流形上工作,而爱因斯坦,你知道,只是幸运之类的,我想这会是反驳的观点。但是

And then I guess the counter to that would be that maybe all humans do is work on their own manifold, and Einstein, you know, was just lucky or something. But

Speaker 0

是的。所以,你知道,这本质上就是答案:它是否在创造新的流形?我不想用那种定义性的答案,我觉得听起来可能太古怪、太数学化了。

Yeah. So, you know, that's essentially the answer: is it creating a new manifold? I didn't want to use that definitional answer. I thought it might sound too, yeah, too wonky, too mathematical.

Speaker 0

但本质上,如果LLMs真的创造了这个新流形,那么我会被说服。但到目前为止,它们是否只是更擅长在现有流形、现有训练集中导航?

But, essentially, if LLMs really created this new manifold, then I would be convinced. But so far, they have just gotten better at navigating the existing manifold, the existing training set.

Speaker 3

这非常强大,并将改变世界。

Which is hugely powerful and is gonna change the world.

Speaker 0

这非常强大。我不否认这一点。我认为它们极其、极其擅长... 是的,在它们能做的事情上。但它们的能力是有限的。

Which is hugely powerful. I'm not denying that. I think they're extremely, extremely good Yeah. At what they can do. But there's a limit to what they can do.

Speaker 3

所以我有个快速的问题。你接下来要做什么?我的意思是,你已经处理了上下文学习,你有了LLM的模型,现在你有了一个通用的解决方案空间模型。你在考虑接下来解决什么?

So I have one quick question. What's next for you? I mean, you've tackled in-context learning, you've got a model for LLMs, and now you've got a generalized model for their solution space. What are you thinking about tackling next?

Speaker 0

在建模方面还是

In terms of modeling or

Speaker 3

学术上,一个LLM

Academically, an LLM

Speaker 0

是学术上,

is Academically,

Speaker 3

我,

I'm,

Speaker 0

你知道,我在思考这个问题,需要什么样的架构飞跃

you know, I'm thinking of this, what is the architectural leap that is needed

Speaker 3

哦,那很令人兴奋。

Oh, that's exciting.

Speaker 0

要创建这个新的流形。那么我们如何使用,你知道的,多模态数据呢?

To create this new manifold. And how do we use, you know, multimodal data?

Speaker 3

太棒了。

Awesome.

Speaker 2

来扩展讨论到

To expand talk to

Speaker 3

我们。没错。我们很乐意。所以,

us. That's right. We'd love that. So,

Speaker 0

我的意思是,即使对于大语言模型,在论文中我们提到,通过遵循低熵或最小熵路径,你可以改进推理。所以这是我们正在采取的一个非常小的步骤,你知道,我们正在构建和训练模型,这些模型将基于熵路径进行推理。

I mean, even with LLMs, in the paper we say that you can improve inference by following this low- or minimum-entropy path. So that's a very small step that we are taking: you know, we are building and training models that'll do inference based on the entropic path.
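A minimal sketch of what entropy-guided inference could look like, assuming access to the per-step token distributions (the numbers below are invented): score candidate reasoning paths by their mean next-token entropy and keep the lowest.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of one next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mean_entropy(path):
    """Average per-step entropy along a candidate reasoning path."""
    return sum(entropy(step) for step in path) / len(path)

# Invented per-step distributions for two candidate paths. A real
# implementation would read these from the model's logits at each step.
candidates = {
    "path_a": [[0.5, 0.5], [0.4, 0.6], [0.5, 0.5]],      # stays uncertain
    "path_b": [[0.6, 0.4], [0.85, 0.15], [0.97, 0.03]],  # entropy collapsing
}

best = min(candidates, key=lambda name: mean_entropy(candidates[name]))
print(best)  # the collapsing path is preferred
```

This is only an illustration of the selection criterion, not the training procedure being described; the idea is that paths whose entropy collapses are the ones inside the model's "known manifold."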

Speaker 3

是的。顺便问一下,模型探针还在运行吗?

Yeah. By the way, is model probe still up?

Speaker 0

令牌探针。是的,是的。令牌探针还在运行。实际上你可以看到,令牌探针是我们构建的软件。多亏了Martin和a16z的慷慨,它运行在你们的服务器上,任何人都可以去测试。

Token Probe. Yeah, yeah. Token Probe is still up. And you can see, actually, Token Probe is software that we built. Thanks to Martin and a16z's generosity, it is running on your servers, and anyone can go and test it.

Speaker 0

我们在那里所做的是实际展示了熵。

And what we have done there is we actually show the entropy.

Speaker 3

是的,这太有启发性了。我推荐任何对此感兴趣的听众去看看Token Probe。它会向你展示置信度。

Yeah. It is so enlightening. I recommend anybody listening to this who's interested to actually check out Token Probe. It shows you the confidence.

Speaker 3

是的,随着你继续深入,这真的很了不起。

Yeah. As you go along, it's it's remarkable.

Speaker 0

所以,你知道,在上下文学习中,你创建了你的新 DSL,把它放入提示中,然后你可以看到置信度随着每个新例子而上升。是的,熵在减少。这某种程度上是对模型的一种验证。你可以亲眼看到它在你面前展开。

So, you know, with in-context learning, you create your new DSL, you give it in the prompt, and you can see the confidence rising with each new example. Yeah. The entropy reducing. And that sort of is a validation of the model. You can see it unfurling right in front of your eyes.

Speaker 0

Token Probe,是的。谢谢。再次感谢。

Token Probe, yeah. Thanks. Thanks again. Yeah.

Speaker 2

Vishal,非常感谢你来参加播客。这是一次很棒的对话。

Vishal, thanks so much for coming on the podcast. This was a great conversation.

Speaker 0

非常愉快。谢谢你。再次非常感谢。

Was great fun. Thank you. Thank you so much again.

Speaker 1

感谢收听本期a16z播客。如果您喜欢这一期,请务必点赞、评论、订阅、给我们评分或写评论,并与您的朋友和家人分享。更多节目请访问YouTube、Apple Podcasts和Spotify。在X上关注我们@a16z,并订阅我们的Substack:a16z.substack.com。再次感谢收听,我们下期再见。

Thanks for listening to this episode of the a16z podcast. If you liked this episode, be sure to like, comment, subscribe, leave us a rating or a review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X @a16z, and subscribe to our Substack at a16z.substack.com. Thanks again for listening, and I'll see you in the next episode.

Speaker 1

提醒一下,此处内容仅供信息参考,不应视为法律、商业、税务或投资建议,也不应用于评估任何投资或证券,且并非针对任何a16z基金的投资者或潜在投资者。请注意,a16z及其关联公司可能持有本播客讨论公司的投资。更多详情,包括我们的投资链接,请参见a16z.com/disclosures。

As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.
