本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
我要感谢我们的朋友Capital One赞助今天的节目。Capital One的技术团队不仅仅在谈论多智能体AI,他们已经部署了一个。它叫做聊天礼宾服务,正在简化购车流程。通过自我反思、分层推理和实时API检查,它不仅帮助买家找到心仪的汽车,还能协助安排试驾、获得融资预批准以及估算置换价值。先进、直观,且已部署使用。
I'd like to thank our friends at Capital One for sponsoring today's episode. Capital One's tech team isn't just talking about multi agentic AI, they already deployed one. It's called chat concierge and it's simplifying car shopping. Using self reflection and layered reasoning with live API checks, it doesn't just help buyers find a car they love, it helps schedule a test drive, get pre approved for financing, and estimate trade in value. Advanced, intuitive, and deployed.
这就是他们的技术实力。这就是Capital One的科技。
That's how they stack. That's technology at Capital One.
你知道,我们通过基准测试来衡量性能。如果我们只关心这个,似乎可以做得很好,因为如果你收集的数据与你想要表现良好的数据相似,你只需要投入大量计算资源。但是,如果我们以略有不同但同样从部署角度有意义的方式测试,这真的能解决问题吗?模型在什么时候会失效?为什么会发生这种情况?训练动态或数据整理中的哪些方面实际上影响了这种行为?以及什么是正确的干预方式,让这些问题消失?
You know, we measure performance on a benchmark. And if that's all we care about, it seems like we can do really well because if you collect data that looks like the data you wanna do well on, you can just, you know, throw a lot of compute at it. But, like, does that actually solve the task if we, you know, just test it in a slightly different way that is also meaningful from a deployment perspective? But, like, when does that break the models, and why does that happen? And how does the, you know, the training dynamics or the data curation, like, what aspects of these actually influence this behavior and what's the right way to intervene and make these problems, you know, go away.
好了,各位。欢迎收听twiml.ai播客的另一期节目。我是主持人Sam Charrington。今天,我邀请到了Aditi Raghunathan。Aditi是卡内基梅隆大学计算机科学助理教授。
Alright, everyone. Welcome to another episode of the twiml.ai podcast. I am your host, Sam Charrington. Today, I'm joined by Aditi Raghunathan. Aditi is an assistant professor of computer science at Carnegie Mellon University.
在开始之前,请记得在你收听今天节目的地方点击订阅按钮。欢迎来到播客,Aditi。
Before we get going, be sure to take a moment to hit that subscribe button wherever you're listening to today's show. Welcome to the podcast, Aditi.
是的。谢谢邀请。对这次对话感到兴奋。
Yeah. Thanks for having me here. Excited for the conversation.
我也对这次对话感到兴奋。我们将深入探讨你最近的一篇论文,该论文获得了ICML 2025的优秀论文奖。论文名为《掷骰子并三思而后行:超越下一词预测的创造极限》。但更广泛地说,你的实验室一直在深入研究当前LLM架构的一些局限性、机遇,以及我们需要理解什么以更好地利用AI模型,我期待与你探讨这些。首先,我想请你分享一下你的背景,以及是什么让你对AI和机器学习产生兴趣的。
I'm excited for the conversation as well. We are gonna be digging into one of your recent papers, which won an outstanding paper award at ICML 2025. That paper is called Roll the Dice and Look Before You Leap: Going Beyond the Creative Limits of Next-Token Prediction. But more broadly, your lab has been really digging into some of the limitations of current LLM architectures and opportunities and, you know, what we need to understand to better make use of AI models, and I'm excited to talk through those with you. To get us started, I'd love to have you share a little bit about your background and, you know, what got you excited about AI and machine learning.
回想我的本科时期,我一直对复杂性理论充满热情,非常喜欢思考什么是可能的、什么是不可能的这种优雅的思维方式。但与此同时,可能和其他一些人一样,我也渴望研究一些能产生即时影响、具有实用性的东西。当我开始做研究时,大约是2015、2016年,深度学习正在真正兴起,那时已经是ImageNet之后了。所以很自然地,我对此感到兴奋。但真正塑造我在这方面思考的时刻,是在斯坦福参加几次讲座时,其中一次是关于对抗样本的,它向我展示了这些能力强大的模型也会以惊人且可能看似愚蠢的方式失败。
So I guess in my undergrad, I was, you know, always excited by complexity theory, and I really liked the elegance of thinking about what's possible and what's not. But at the same time, maybe like several others, I also had the itch to work on something that had immediate impact, that was practical. And when I started doing research, it was around 2015, 2016 when deep learning was really taking off, and this was post ImageNet and so on. And so naturally, I got excited about it. But really, you know, the moments that shaped my thinking around this were when I was at Stanford attending a couple of talks, and one of them was on adversarial examples, which showed me how these really capable models can also fail in spectacularly surprising and maybe seemingly dumb ways.
这些失败还引发了一些关于安全性、在现实世界中的可靠性等直接的实际问题。这让我开始思考,我们该如何看待这些能力强大但也会出现故障的系统?这也与我对于复杂性、抽象思维以及对这些事物做出精确表述的兴趣相契合。感觉这是一个很好的结合,因为很多这些方面无法仅通过基准测试中的具体数字来捕捉,真的需要更进一步思考这些模型学到了什么、何时有效、何时失败等等。这就是我进入所有这些问题的缘由。
And they also had some immediate practical questions around security, reliability in the wild and so on. So that's sort of what got me thinking about like, how do we think about these really capable systems that also have these failures? And it also tied into my itch on complexity and think about abstractions and making precise statements about these things. And it kind of felt like a nice combination because a lot of these aspects cannot be captured by just specific numbers on a benchmark and really need to go one step beyond to think about what do these models learn, when do they work, when do they fail and so on. So that's sort of what got me into all of these questions.
当然,这个领域一直在快速变化,我也随之不断跟进。有趣的是,模型在很多方面确实变得非常强大,但一些根本性的失败依然存在。因此,我们可以继续对各种各样的模型提出同样的问题。也许令人警醒的是,我们在提升模型能力的同时,并没有真正推动其可靠性的提升。
And of course the field's really been changing rapidly. And so I've kind of gone along for the ride. And what's kind of interesting is that models have become very capable in a lot of ways, but some of these fundamental failures still remain. And so we can keep asking the same questions about various different models. And maybe it's also sobering that we haven't actually pushed the reliability the way we have pushed the capability of these models.
所以在某些方面,同样的问题随着时间的推移依然存在,但这些问题的背景也发生了很大变化。
And so in some ways, the same questions have remained over time, but also a lot of the context of these questions has been changing.
我们能用这些模型做的事情相当惊人,但我们仍在讨论并不真正理解它们如何工作,以及它如何有点像魔法。
It's pretty spectacular what we can do with the models, and yet we're still talking about not really understanding how they work and how it's a little bit of magic.
是的,完全正确。我认为同样令人担忧的是,当这种理解的缺乏成为真正的问题时——比如人们考虑如何让模型以某种方式变得安全,不输出有毒内容,不向人们发布危险信息,或以某种危险方式操纵人们。我们围绕这些事情的许多防护措施都非常脆弱,而且目前我们没有更好的方法,因为我们并不真正理解这些系统。因此,我希望能利用我们的理解,真正塑造出在这些所有情境下更可靠的模型。
Yes. Absolutely. And I think the part that's also concerning is when this lack of understanding becomes a real issue, when people think about getting models to be safe in some way, to not utter toxic content or release dangerous information to people or manipulate people in some way that's dangerous. A lot of our guardrails around these things are very brittle, and we currently don't have a way to do better because we don't really understand these systems. And so I want to really use our understanding in a way that actually shapes these models to be more reliable in all of these contexts.
那么,带着这个背景,请谈谈你是如何构建研究议程的,你的核心关注领域是什么?
So talk a little bit about with that context in mind, how you've kind of crafted a research agenda, what are your core focus areas?
是的。我一直对思考这个问题很感兴趣,这或许可以用'分布偏移'这个统称术语来概括,但它确实捕捉到了这样一个理念:我们在基准测试上衡量性能,如果只关注这一点,似乎我们能做得很好,因为如果你收集的数据大体上看起来与你希望表现出色的数据相似,你只需投入大量计算资源,似乎就能取得很好的效果。但同时,这能告诉我们模型在稍微不同但仍期望其表现良好的情境中如何工作吗?这就是我们在各种不同问题中采取的角度:我们可以最小化某些损失或瞄准某个基准,这正是许多模型试图做的。但如果我们以部署角度也有意义的方式稍微改变测试方法,它真的能解决任务吗?
Yeah. So I've always been interested in thinking about, like, this is maybe a catchall phrase of distribution shifts, but it does capture this idea that we measure performance on a benchmark, and if that's all we care about, it seems like we can do really well, because if you collect data that morally looks like the data you want to do well on, you can just throw a lot of compute at it and it seems like that just works really well. But at the same time, does that tell us anything about how the model works in any situation that's slightly different, but where you still expect the model to work well? So that's sort of the angle that we've taken in a variety of different questions that we ask: we can minimize some loss or try to target a certain benchmark, which is what a lot of these models are trying to do. But does it actually solve the task if we just test it in a slightly different way that is also meaningful from a deployment perspective?
但这种情况何时会让模型失效?为什么会发生这种情况?训练动态或数据策展的哪些方面实际上影响了这种行为?以及什么是正确的干预方式来解决这些问题?
But when does that break the models and why does that happen? And how does the training dynamics or the data curation, what aspects of these actually influence this behavior and what's the right way to intervene and make these problems go away?
基准测试性能与实际使用这些模型的体验之间日益扩大的差距,我认为正逐渐成为一个令人担忧的问题,尤其是在当前这个时刻,由于GPT-5的发布,根据基准测试它是目前最聪明的模型,但许多用户的体验在很多方面都存在不足。我认为这恰恰体现了当前基准测试方式的某种不足。
This growing gap between benchmark performance and the experience of using these models, I think, is kind of a growing concern and one that we're hearing a lot about at this particular moment in time, I think, because of the recent release of GPT-5, which according to the benchmarks is the smartest model around, yet the user experience for many has been lacking in a lot of ways. And I think that kind of exemplifies this idea of benchmarking as we think of it today being somewhat inadequate.
是的。或许一个具体但我觉得非常重要却未被充分思考的角度是:很多时候我们想以此作为起点进行某种微调。这可能只是为了安全对齐之类的事情,我们可以说某家公司会处理这个问题。但也有很多公司拥有自己的专有数据想要进行微调,或者你想根据自身情境进行个性化定制,或者希望它随时间改进,以及世界在不断变化等等。我们还没有好的方法来衡量这些模型的适应性,我认为这实际上是一个非常根本的问题。
Yes. And maybe one concrete way of thinking about this that I feel like is very important, but people haven't thought too much about is many times we want to use this as a starting point and do some kind of fine tuning. And so this could just be for safety alignment, that kind of stuff that we can say, oh, maybe one company just takes care of that. But also a lot of companies have their own proprietary data that they want to fine tune on, or like you want to personalize it in some way to your context, or you want it to improve over time and you know, things like that, or the world is changing. So we don't have a good way of measuring, you know, this adaptability of these models, which I also think is actually a very fundamental question.
举个例子,我和同事Graham Neubig(几个月前他问我这个问题)开始研究:如果我们有一些想要微调的数据,应该从哪个模型开始?应该选择在基准测试或这些数据上表现很好的模型,即模型的零样本性能吗?这是否自动意味着微调后它会更好?我们意识到答案其实是否定的。
And as an example, I was talking with a colleague, Graham Neubig, who kind of asked me this question several months ago: which model should we start from if we had some data that we wanted to fine tune on? Should we take the model that, as is, works really well on our benchmark or on this data, like the zero shot performance of the model? Does that automatically mean it's better after fine tuning? And we realized the answer is, really, no.
事实上,对于许多方面,模型在基准测试上的性能确实能反映其表现,包括可靠性或对许多分布偏移的鲁棒性。我们发现,正如社区所发现的,一般来说,训练数据越多,所有这些指标都会上升。但在模型适应性的这个方面——即在数据上训练后进一步适配的难易程度——我们实际上发现了相反的现象。如果你拿一个小模型并在大量数据上持续训练,最终会发现,投入更多计算资源训练更多数据的模型,作为微调起点反而比早期检查点更差。这确实是一个惊人的发现,因为这是在非人为设置下(如高质量数据),更多计算资源实际上让模型变差而不仅仅是饱和的首批现实案例之一。
And for a lot of things, model performance on a benchmark actually kind of tracks performance, including reliability or robustness to a lot of distribution shifts. What we found, like, you know, what the community has found, is that in general, the more data you train on, all of these numbers go up. But for this one aspect of how easy it is to adapt these models after you train them on data, we actually found the reverse happens at some point. So if you take a small model and keep training on a lot of data, we see that eventually the model that has been trained on more data, that you've thrown more compute at, is worse as a starting point for fine tuning than an earlier checkpoint that you had. So this was actually a really striking result, because it's one of the first realistic cases where more compute in a very non contrived setting, like on high quality data, is actually making a model worse, not just that it saturates but that it's actively worse.
另一个发生这种情况的场景是当人们试图服务量化模型以提高效率时。在那里我们也发现了类似趋势:在某个点上,随着给模型喂更多数据然后进行量化,用更多数据训练的模型仍然更好。但随后你会看到一个U型曲线,在某个点上更多数据实际上意味着量化后模型更差——这又是一个下游模型严格变差的案例,尽管你以最好的意图投入了更多计算资源并提供了优质数据。所以我认为这是一个非常有趣的发现,人们应该更多思考的是:使用这些模型的一个重要方面实际上是微调和适应,而我们当前仅仅通过改进预训练或这个过程的第一步来优化的做法,并不意味着在微调或后训练后就会表现良好。这不仅仅是一个理论极限,我们实际看到了这一点——很多人讨论Llama3时(这是在Llama4发布之前)就提到了这个问题。
And another context in which this happens is, you know, when people are trying to serve quantized models to improve efficiency. There also we find this kind of trend where, as you throw more data at these models and then you quantize them, the model trained with more data still is better at first. But then you see this U curve where at some point showing more data actually means that this model is worse after quantizing, which is again a case where your downstream model is strictly worse even though you've spent more compute with the best of intentions and shown good data. So I think this was a really interesting finding that people should think more about: one important aspect of using these models is actually fine tuning and adapting, and our current push towards just improving pre training, the first step of this process, doesn't mean the model is going to be good after doing your fine tuning or post training. And this is not just a theoretical limit. We actually see it, and a lot of people talk about it with Llama 3, so this is before the Llama 4 release.
所以人们正在讨论Llama 3与Llama 2的区别。很多人发现Llama 3的微调难度要大得多。你知道,学者们经常对模型进行微调,所以这其实也是Graham提出的问题。这某种程度上与一个观点相关:因为我们训练了如此海量的数据,导致Llama 3在基准测试上依然表现优异,但如果你想将其用于特定任务时效果反而更差。同样地,我们对检查点进行了实验——这项工作很棒,他们发布了所有检查点,方便我们进行分析。
So people are just talking about Llama 3 versus Llama 2. And a lot of people found that Llama 3 was much harder to fine tune. You know, academics regularly fine tune models, and so, you know, this was sort of Graham's question too. And it kind of ties into this idea that it's because we have trained on so much data that at some point Llama 3 is still really good at benchmarks but it's just worse if you want to use it for your task. And similarly we ran experiments on the checkpoints, because, thanks to great work, all these checkpoints are released, so we can do analysis on them.
我们发现最小的10亿参数模型,在3万亿token上训练后,经过微调或在人们关注的实际基准测试上进行后训练时,其表现反而不如训练token量更少的模型。这个激动人心的结果表明:我们可能需要重新思考预训练的一个维度——如何获得适合微调的优质起点?
And we found that the smallest size model, 1B I believe, that was trained on 3 trillion tokens is actually worse than the model that was trained on fewer tokens, after we do this kind of fine tuning or post training on very realistic benchmarks that people care about. So this was kind of an exciting result that shows that, you know, one axis of rethinking pre training is: how do we get a good starting point for fine tuning?
是否具体表现为模型训练token数量与参数量的比例,与微调性能呈反比关系?
Is it specifically the ratio between the number of tokens that the model is trained on and the number of parameters that you found to be inversely proportional to fine tuning performance?
是的。在精确结果方面,我们确实得出了这个结论。我们尝试过不同指数,但实验显示指数值最终相同,所以本质上就是比例关系。不过在微调实验中,我们没有绘制精确曲线,因为这很大程度上取决于具体关注的数据分布。
Yes. So for the precision result, that's exactly what we found. And, you know, we tried fitting different exponents, but it turns out that in our experiments the exponents turned out to be the same. So it's literally the ratio. But in our fine tuning experiments, we didn't fit precise curves, because a lot of that depends on the exact distribution of interest.
因此不存在适用于所有数据集的精确数学规律或统一趋势。但总体而言,我们发现较大模型在出现反效果前能吸收更多token。从某种意义上看,更大模型能更高效地消化更多token,但并不总是严格遵循比例关系,指数可能会有所不同。
So there's no clean mathematical result that would hold, and the trend won't be the same for all data sets. But in general, we find that a larger model can take in more tokens before it shows this sort of inverse effect compared to a small model. So in some sense, a larger model can absorb more tokens efficiently, but it's not always exactly the ratio. It could be a different exponent.
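As a rough illustration of this heuristic, here is a minimal Python sketch of checkpoint selection by tokens-per-parameter ratio. The budget of 2,500 tokens per parameter is an assumed number for illustration (loosely echoing the 1B-model, roughly 2.5-trillion-token point mentioned in this conversation), not a constant from the paper; `pick_checkpoint` and the checkpoint list are hypothetical.

```python
def tokens_per_parameter(train_tokens: float, params: float) -> float:
    """Coarse proxy for how 'overtrained' a checkpoint is."""
    return train_tokens / params

def pick_checkpoint(checkpoints, max_ratio=2_500):
    """Among (name, train_tokens, params) tuples, keep only checkpoints
    below the assumed tokens-per-parameter budget, then return the one
    that has seen the most data: more data helps until the ratio pushes
    the model into the brittle, hard-to-fine-tune regime."""
    eligible = [c for c in checkpoints
                if tokens_per_parameter(c[1], c[2]) <= max_ratio]
    if not eligible:
        raise ValueError("all checkpoints exceed the assumed ratio budget")
    return max(eligible, key=lambda c: c[1])

ckpts = [
    ("1B-1T", 1.0e12, 1.0e9),  # 1,000 tokens/param
    ("1B-2T", 2.0e12, 1.0e9),  # 2,000 tokens/param
    ("1B-3T", 3.0e12, 1.0e9),  # 3,000 tokens/param, past the assumed budget
]
print(pick_checkpoint(ckpts)[0])  # -> 1B-2T
```

In practice, as the conversation notes, the right budget depends on the fine-tuning distribution, so this cutoff would itself have to be probed empirically.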
这个结果某种程度上符合直觉:当比例较大时,模型可能过拟合程度更高,在微调过程中要消除旧知识并学习新知识会更为困难。
In some ways that strikes me as like an intuitive result in the sense that, you know, when that ratio is larger, the model's potentially more overfit and it would be harder to unlearn and learn things as you're trying to fine tune.
我认为另一种理解角度是:当我们训练模型时——我研究学习动力学等现象已有段时间——该领域的共识是模型先学习简单模式,再逐步掌握复杂模式。我同事有个形象比喻:就像用纸牌搭建建筑。开始时基础稳固,但随着添加复杂结构,整体稳定性反而会下降。
I feel like another way to think about this is usually when a model when we train models and, you know, some of I've looked at learning dynamics and things like that for a while now. And the converging result in that space is models learn simple things first, and then they learn increasingly complex things. And so like my colleague had this kind of visual image of like trying to like build something with cards. Like usually you start off with something really solid and then you kind of add more and more complicated things. But then like that also means that that structure gets less stable.
因此,如果你真的试图以某种方式调整这些模型,那么在某种意义上一切都会崩溃。所以我认为我们看到类似的情况,你迫使模型学习越来越复杂的东西,这是好事,因为它能拟合数据。但这也意味着模型在某种程度上是脆弱的,比如你试图通过在某些方向上最小化一些梯度步长来推动模型,但这会引入太多噪声,或者某种程度上破坏模型,导致大量遗忘。这就是大致情况,对于精度来说也是一样的。
And so if you actually try to adapt these models in some way, then everything collapses in some sense. And so I think we see something similar: you're forcing the model to learn more and more complex things, which is good because it's fitting the data. But that also means the model is brittle in some way, so when you try to push the model in some direction by taking a few gradient steps, that introduces so much noise, or kind of breaks the model, and causes a lot of forgetting. So that's sort of what happens, and it's the same thing for precision too.
就像,你知道,我们可以添加——你可以把这看作是通过改变权重来添加某种噪声。所以模型变得不那么……无法吸收那种噪声,然后就崩溃了。
Like, you know, you can think of quantization as adding some kind of noise by changing the weights. And so the models become less able to absorb that noise, and they just break.
这让我想到,观察这个比率是一个相当粗粒度的指标,但更进一步会很有趣。例如,如果你能理解训练数据在模型上的相对分布与你想要微调的方向之间的关系。比如,可能有一个模型在基准测试上表现更差,规模更小,或者其他原因让你认为它不会表现得那么好,但由于某种分布重叠之类的原因,它可能表现得更好。你认为这是一个合理的方向吗?
It strikes me that looking at this ratio is a fairly coarse grained metric, but that it would be interesting to go even further. If you could, for example, understand the relative distribution of the training data on the model relative to the direction you want to fine tune it to. Like, you might have a model that is, you know, benchmarks worse, you know, is smaller or other reasons why you might think it wouldn't do as well, but because of some like distribution overlap or something might do better. Is that a reasonable direction you think?
是的。所以有几种不同的方式来思考这个问题。一种纯粹是,模型是否更稳定,以至于它可以在不损失太多的情况下朝不同方向移动。对吧?所以这就是这种标记与参数的比率,大致可以对应这一点。
Yes. So there are different ways to think about this. One is purely: is the model more stable, in that it can move in different directions without losing too much? Right? And this token-to-parameter ratio could roughly correspond to that.
然后另一个问题就像你说的,我们实际上需要移动多少?我们在我们的灾难性过度训练ICML论文中有一些实验,其中一个代理指标就是学习率。所以如果模型即使在学习率较小的情况下也能获得良好的性能,那么这大致意味着模型没有移动太多,并且它某种程度上是有效的。我喜欢这种分布在某些意义上更接近。所以我们在那里也发现了一些有趣的趋势。
And then the other question, like you said, is how much do we actually need to move? And we have some experiments in our catastrophic overtraining ICML paper where one proxy for how much to move is just the learning rate. So if the model gets good performance even with a small learning rate, then that roughly means the model hasn't moved too much and, you know, it kind of works, like the distribution is closer in some sense. So we do find some interesting trends over there as well.
所以,如果我们在非常接近的东西上开始测量或微调,那么我们实际上看不到这种效应。模型能够接受更多标记并仍然表现出良好的性能,因为我们最终并没有太多更新模型。在极限情况下,如果微调与预训练匹配,那么我们只是回到通常的情况,随着我们展示更多数据,性能会下降。但随着变化更大,令人惊讶的是,这些趋势开始出现,在某个点上模型实际上变得更糟。一个挑战是,很难说哪些分布是接近或困难的,因为我们可能对此有一些直觉,但模型存储信息的方式可能不同,这就是为什么我们将学习率本身视为模型变化多少的代理指标,以此来说明分布有多不同。
So if we start measuring or fine tuning on things that are very close, then we actually don't see this effect. The models are able to take in more tokens and still show good performance because we don't really end up updating the model much. And in the limit, if the fine tuning matches the pre training, then we just get back the usual trend where things go down as we show more data. But as we make larger changes, surprisingly, these kinds of trends start happening, where at some point the model is actually getting worse. And one challenge is that it's not very easy to say which distributions are close or hard, because we might have some intuition for it, but how the model stores information might be different, which is sort of why we look at the learning rate itself as a proxy for how much the model changes, as a way of saying how different the distributions are.
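The learning-rate-as-proxy idea can be made concrete with a deliberately tiny toy: a one-parameter model fine-tuned by gradient descent toward a "close" target and a "far" one. With the same small learning rate and step budget, only the close target reaches low loss, mirroring the intuition that a small learning rate suffices when the distributions are close. All numbers here are made up for illustration; this is not an experiment from the paper.

```python
def finetune_loss(w_start: float, target: float, lr: float, steps: int) -> float:
    """Run gradient descent on the 1-D loss (w - target)^2 and return
    the final loss; the gradient of the loss is 2 * (w - target)."""
    w = w_start
    for _ in range(steps):
        w -= lr * 2.0 * (w - target)
    return (w - target) ** 2

# "Pretrained" weight, and two fine-tuning targets: one close to
# pretraining, one far from it.
w0 = 1.0
near, far = 1.1, 5.0

small_lr = 0.01
print(finetune_loss(w0, near, small_lr, steps=20))  # already tiny
print(finetune_loss(w0, far,  small_lr, steps=20))  # still large
```

If good performance requires cranking the learning rate up, that is a signal the target distribution is far from pretraining, and the brittleness effects discussed above start to matter.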
然后当我听到灾难性过度训练时,我想到的不仅仅是这种反向关系,而且可能还有一个悬崖,比如你达到一个点,然后你的微调能力就像掉下悬崖一样急剧下降,变得糟糕得多。你们有没有描述那个点是什么?
And then when I hear catastrophic overtraining, I think of not just that there's like this inverse relationship, but also that there's maybe like a cliff, like you reach a point and then your fine tune ability kind of falls off a cliff and it's much worse. Did you kind of characterize what that point is?
是的。所以我认为情况是这样的:实际上在某个阶段,总体上会存在一个机制,显示更多数据确实有帮助,无论模型大小如何,因为模型只是在学习东西。然后到了某个点,模型变得如此脆弱,以至于它从更多数据中学到的东西基本上被其脆弱性所覆盖。于是我们就开始看到这种跳跃。所以存在一个临界点,超过这个点后更多数据反而会对你不利。
Yeah. So what actually happens is, overall, there is a regime where showing more data does help, no matter the model size, because the model is just learning stuff. And then at some point, the model gets so brittle that whatever it learns from the additional data is overcome by its brittleness. And so we start seeing this jump. So there is a point where more data starts hurting you.
这就像是一个U型曲线的情况。而这个U型曲线的拐点实际上取决于我们讨论过的因素,比如你用于微调的数据类型。是的,所以我们在量化设置中能够精确描述这些现象,因为需要考虑的因素更少——你只是添加噪声。但在这种微调设置中会稍微复杂一些,因为我们无法准确说明目标分布是什么。
So it's like a U shaped kind of situation. And the point at which this U turns really depends, like we discussed, on the data that you're fine tuning on. We are able to characterize these effects precisely in the quantization setting because there are just fewer factors to account for, since you just add noise. But in the fine tuning setting it's a little more tricky, because we can't exactly say what the distribution of interest is.
但如果我们固定某个特定分布或某个关心的学习率,那么这个拐点实际上就变得可预测了。我们在论文中没有在微调设置中过于仔细地建模这个函数形式。我认为这将是非常有趣的未来工作方向——一旦确定了想要微调的分布类型,就可以决定如何分配计算资源并进行扩展定律研究等。但我们确实发现了看起来相当可预测的趋势,并且具有优美的数学形式。
But if we fix a certain distribution or a certain learning rate that we care about, then again this point actually becomes predictable. We didn't go into modeling this functional form too carefully in the fine tuning setting in our paper, and I think that would be very interesting future work: once you decide what kind of distributions you want to fine tune on, you can make decisions on how to allocate compute, do the scaling laws, and so on. But we do see trends that look fairly predictable, with a nice mathematical form.
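One way to picture the U shape is as a sum of two terms: a scaling-law-style term that improves with more pre-training tokens and a brittleness term that grows with them. This is a hypothetical functional form chosen purely for illustration, not the law fitted in the paper; `downstream_loss` and all its coefficients are assumptions.

```python
def downstream_loss(tokens_T, a=1.0, alpha=0.5, b=0.1, beta=1.0):
    """Hypothetical post-fine-tuning loss as a function of pre-training
    tokens (in trillions): a power-law term that falls with more data
    plus a degradation term that rises with it, giving a U shape."""
    return a / tokens_T ** alpha + b * tokens_T ** beta

budgets = [0.25, 0.5, 1, 2, 4, 8]          # trillions of tokens
losses = [downstream_loss(t) for t in budgets]
best = budgets[losses.index(min(losses))]
print(best)  # -> 4  (an interior minimum: the point where the U turns)
```

Under a form like this, the turning point shifts with the degradation coefficient, which is the kind of dependence on the fine-tuning distribution the conversation describes.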
那么,在完成这项研究并得出这些结果之后,如果你要处理未来任务并需要微调模型,这会如何改变你选择基础模型的思路?除了一般性原则之外,比如你是否会因为这些发现而直接排除某些模型?还是说你仍然会测试所有模型,看它们在你的数据上表现如何?
So having done this research and, you know, identified these results, if you were to approach a future task and need to fine tune a model, how would it change the way you think about base model selection? Beyond the generalities, it's good to know because it helps you understand, but would you still test everything and see how it performs on your data? Or is there just a whole set of models that you would no longer look at, for example?
这是个很好的问题。我想如果你使用的是10亿参数规模的模型——可能出于效率考虑你会选择这个规模——如果它的训练数据超过了2.5万亿甚至2万亿(因为公开检查点的粒度不够细),我会说:好吧,除非你确定不想对模型做太大改动,否则可能就不该使用这个模型。这是比较明确的一点。在所有灰色地带中,我认为我们确实需要做一些初步实验,获取多个不同的检查点来看看微调性能最终表现如何。
Yeah, that's a great question. I guess what we found is, for the 1B model size, which you might want to use for efficiency reasons, if it's trained beyond 2.5 trillion tokens, or maybe even 2 trillion, because we didn't have too much granularity in the public checkpoints, I would say, okay, maybe that's a point where you don't want to use that model unless you're sure that you don't want to change the model too much. So that's sort of one thing that's clear. In all the gray areas in between, I think we do have to do some preliminary experiments where we take a bunch of different checkpoints and see what the fine tuning performance actually ends up looking like.
这样我们就能大致了解自己处于U型曲线的哪个位置。所以我会说,这项研究给我们的启示是:我们大致能预期这种曲线形状,可以通过探测来确定当前模型处于哪个机制。这将指导我们找到可能的最优点。
And that would give us some sense of which part of the U curve we are on. So I would say that's the understanding it gives us: we kind of expect this shape, and we can try to probe to see which regime current models are in. And that would guide us to where the optimal could be.
这就是那篇题为《过度训练的语言模型更难微调》的论文。我们会在节目说明中链接所有讨论到的论文。那么,在进入创造力话题之前,我想先做个背景说明:你们发表了一篇很有意思的博客文章,综述了实验室在ICML上发表的所有论文,并将其分为局限性和机遇两部分。其中一个局限性就是过度训练,也就是我们刚讨论的内容。下一个是遗忘学习及其相关挑战,这与微调的概念密切相关。
And this is the paper, Overtrained Language Models Are Harder to Fine-Tune. We'll link to all the papers that we discuss in the show notes for folks. And so, just kind of contextualizing this, and we'll get to creativity in a second, but you published a really interesting blog post that surveyed all of your lab's papers at ICML, and it was kind of broken up into limitations and opportunities. One of those limitations is overtraining, and that's what we just talked about. The next one was unlearning and the challenges associated with unlearning, which is related to this idea of fine tuning.
再详细谈谈你在遗忘学习方面观察到的情况。
Talk a little bit more about what you've seen with unlearning.
是的,这确实非常相关。当你思考微调的使用场景时,一个是专门化处理,在你的用例上提升某些指标而非基准测试。但实际上,所有这些后期训练或微调方法的另一个重要应用场景在某种程度上都是安全性的考量。
Yes, it's actually very related. When you think about what the use cases of fine tuning are, one is, you know, just to specialize, to push a few numbers on your use case rather than on the benchmark. But actually another important use case of all of these post training or fine tuning methodologies is safety in some way.
很多对齐工作实际上是在事后教导模型区分好坏。类似地,我们可能尝试让模型遗忘有害知识或私人信息等。这就是微调在安全领域的具体应用。我们试图探究为什么遗忘学习如此困难——相关论文层出不穷,这让我想起博士期间研究的对抗样本,当时人们提出各种防御方案,但Carlini总能破解它们。
And so a lot of the alignment work is actually trying to teach the model post hoc what is good and what's bad. And similarly, we might try to unlearn harmful knowledge or unlearn private information and so on. So that's a very safety specific use case of fine tuning. And we tried to look at why it is so hard to do unlearning. There are so many papers that are published, and it very much reminded me of adversarial examples from my PhD, where people had all these ideas for defenses, but then, you know, Carlini would break all of them.
整体感觉确实很相似。观察这个领域的发展假设,人们通常直接使用基础模型作为起点,假设模型中可能存在的特定机制,并据此设计算法。
So yeah, it basically felt like that. And if you look at some of the assumptions, or how the unlearning field has progressed, people have tried to take the base model as it is, treat the starting point as given, and then assume certain things that might be happening in these models and use that to get algorithms.
插一句,让遗忘学习的概念更具体些。你提到安全性,例如训练数据中可能包含化学武器制造方法。现有工作大多围绕建立防护机制来检测并抑制模型讨论这些内容,但另一个方向是直接提取并擦除这些信息,这就是遗忘学习。
And maybe to interject and be more concrete about unlearning: you mentioned safety, but, you know, an example might be, you may have in the training data how to create a chemical weapon. And there's a whole line of work around building guardrails to detect that and suppress the model from talking about it. But another direction is to try to just extract that information, erase it from the model, and that's unlearning.
没错。可能是危害性信息,也可能是本不该训练涉及的隐私信息。当这些信息存在于模型中时,你希望彻底移除它。从隐私角度也能说明为什么人们不满足于防护机制,而是希望从根本上消除数据痕迹。
Yeah. That could be harmful information. It could also be private information that the model shouldn't have trained on, or that someone wants removed. So it's something that exists in the model, and you want to remove it. And I guess the privacy angle also tells you why guardrails feel insufficient: you really just want to remove it from the model and want it to be as if it was never trained on this data.
这就是这类技术的另一个应用场景。人们的初步尝试是通过微调使模型对需要遗忘的内容产生高损失值,但发现过度调整会破坏模型的其他知识。后来转向定位特定神经元或子空间进行局部擦除,但这种方法也受限于两个因素成效有限。
So that's the other use case for all of these things. And the first way to try to address this would be to fine tune so that the model has high loss on all of these things that you want to forget. Then people realized that you can't actually do too well that way, because you have to change everything a lot, and that destroys a lot of information in the model. And then there's another sense that maybe this information could be localized, that maybe we can find specific neurons or specific subspaces and try to just erase those parts. And even that has had limited success, for two reasons.
首先,我们必须找到一种方法来确定存储这些信息的神经元位于何处。其次,即使我们找到了这些神经元,也不清楚如何在不破坏其他所有内容的情况下正确擦除这些信息。但这里的前提假设是,首先确实存在这样的神经元。而我们的研究表明,实际上并非如此。在我们训练这些模型的方式中,没有任何理由鼓励或应该允许信息以这种理想的方式被解耦。
One is, we have to first find a way to figure out where those neurons are which store this information. And second is even, I mean, and even after we find that, it's not clear what's the right way to erase that without destroying everything else. But the assumption here is that there exists such neurons in the first place. And what we show is that that actually is not true. That there's no reason in how we've trained these models that encourages or that should allow this information to be disentangled in this nice way.
也许我们确实看到了一些分离,但由于这种分离效果不佳,这某种程度上解释了为什么我们的遗忘方法效果不理想。因此,我们转而提出,与其局限于一个起点不佳、未能真正解耦所有这些信息的情况,不如给自己灵活性:如果我们能够以某种方式训练模型,使其支持这种下游的遗忘操作,因为我们知道这可能是我们关心的一个应用场景。这启发了关于记忆汇点(memorization sinks)的研究,它采用了人们隐含地对模型做出的相同想法或假设,即信息可能被隔离在神经元中。但我们不是等待它奇迹般地发生,而是尝试通过设计来实际强制执行这一点。我们在论文中通过实验和分析发现,常规训练实际上并不会使这一假设成立,因为正如论文中的分析所示,主要原因是我们并没有真正鼓励或强制这种解耦。
And maybe we do see such separation, but because the separation is not very good, that's sort of why our unlearning methods don't work very well. So we instead say: instead of constraining ourselves to work with a starting point that is not very good and doesn't really disentangle all of these things, what if we give ourselves the flexibility to train our models in a way that enables this kind of downstream unlearning, because we know that that is a use case which we might care about? That inspired this work on memorization sinks, which takes the same idea or assumption that people implicitly make about models, that maybe information is isolated to neurons. But instead of waiting for it to happen by magic, we say: let's try to actually enforce that by design. And we find in our paper, both through experiments and analysis, that normal training does not actually lead to this assumption being true, because, as the analysis in the paper shows, we don't really encourage or force this sort of disentanglement.
而这实际上取决于训练算法的偏差。当前的训练算法并不具备实现这种分离的偏差。但我们可以鼓励
And it really depends on the bias of the training algorithm. And the current training algorithms don't have a bias to actually enable this kind of separation. But we can encourage
我们讨论的这个假设是指知识本质上是局部化的吗?
And this the assumption that we're talking about is that knowledge is localized essentially?
正是如此。是的,是的。知识是局部化的。没错。所以,你看训练目标,你只是将梯度传递给所有参数。
Exactly. Yes, yes. That knowledge is localized. Yes. So there's no — like, you look at the training objective, and you're just passing gradients to all the parameters.
没什么可做的。
There's nothing to do.
所以,关于这一点,确实存在一些看似出现的分离,但它并不完美,因为它不必完美。它没有被训练成那样。但如果我们转而训练以鼓励这种分离呢?这正是记忆同步(memorization syncs)背后的想法,我们试图为每个文档指定一些特定的神经元,这些神经元仅在该文档上更新,希望所有特定于该文档的信息都进入那些神经元,而这些神经元在其他文档上不被触及或更新。这就是这里的主要思想。
So there is some separation that seems to emerge, but it's not perfect, because it doesn't have to be; it's not trained to be. But what if we instead trained to encourage this sort of separation? That's exactly the idea behind memorization sinks, where for every document we designate some specific neurons that are only updated on that document, with the hope that all the information specific to this document goes into those neurons, and those neurons are not touched or updated on other documents. That's the main idea here.
当然,我们希望拥有共享神经元,因为我们希望模型能从所有这些信息中真正学习。我们不希望训练完全分散的模型,因为它们的性能不会那么强。因此,在预训练过程中,我们试图让模型学习一些共享的内容。这再次体现了训练或归纳偏置的美妙之处:通过这种架构,模型实际上被激励将跨所有文档共享的信息保留在那些在所有文档上更新的神经元中。这显然是一个更优的解决方案。
And we of course want to have shared neurons, because we also want the models to actually learn from all of this information. We don't want to train completely decentralized models, because those won't be as capable. So during pre-training we want the model to learn something that is shared. And this is where the beauty of the training, or the inductive bias, comes in again: with this architecture, the model is actually incentivized to keep the information that is shared across all the documents in the neurons that are updated on all the documents. That's just a strictly better solution.
而那些非常特定于某个文档的内容则保存在这些记忆神经元中。由于这些神经元不在其他任何文档上更新,这些信息得以保留、解耦并隔离。通过我们在较小规模上的实验(目前正在扩大规模),我们发现这种架构实际上能够实现一种良好的分离:特定文档的独特信息全部保存在特定的神经元中,而共享内容则可以在共享神经元中学习。在测试时,你只需丢弃这些记忆神经元,就可以正常使用了。
And the stuff that's very specific to a particular document is in these memorization neurons. Since those are not updated on any other documents, that information is preserved, disentangled, kept aside. And we find through our experiments, at a somewhat small scale but we're scaling that up now, that this architecture actually enables this kind of nice separation: what is unique to your particular document is all kept in specific neurons, whereas what is shared is allowed to be learned in the shared neurons. And at test time, you can just drop out these memorization neurons, and then you're good to go.
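A minimal numpy sketch of the masking idea described above — not the paper's implementation; the layer sizes, the `mask_for` helper, and the ReLU MLP are illustrative assumptions:

```python
import numpy as np

def mlp_forward(x, W, mask):
    # ReLU MLP hidden layer; masked-out neurons produce zero activation,
    # so under this gating they get no gradient on this document either.
    h = np.maximum(0.0, x @ W)
    return h * mask

n_shared, n_mem, n_docs, d = 4, 2, 3, 5     # toy sizes, purely illustrative
hidden = n_shared + n_mem * n_docs
rng = np.random.default_rng(1)
W = rng.standard_normal((d, hidden))

def mask_for(doc_id):
    # Shared neurons are always on; only this document's sink neurons join.
    m = np.zeros(hidden)
    m[:n_shared] = 1.0
    start = n_shared + doc_id * n_mem
    m[start:start + n_mem] = 1.0
    return m

# At test (or unlearning) time, drop all memorization neurons:
test_mask = np.concatenate([np.ones(n_shared), np.zeros(n_mem * n_docs)])

x = rng.standard_normal(d)
train_out = mlp_forward(x, W, mask_for(0))   # doc 0: shared + its own sinks
test_out = mlp_forward(x, W, test_mask)      # shared neurons only
print(test_out[n_shared:].sum())             # 0.0: sink activations dropped
```

The point of the gating is that gradients only flow through active neurons, so each document's sink neurons are updated on that document alone, while the always-on shared neurons absorb what is common across documents.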
有几个问题。那么,在训练时,你们是在识别那些之后想要从模型中移除的信息吗?
A couple questions. So at train time, are you identifying the information that you will later want to pull out of the model?
是的,这是个好问题。目前我们的设置方式是,我们只是标记那些可能想要移除的单元。所以,如果你觉得某个文档包含了你可能想要移除的信息,我们就会将整个文档与一组特定的神经元关联起来。这也可以是基于主题的。
Yeah, that's a good question. The way we're setting it up right now is that we just designate the units we might want to remove. So if you feel this document has information you might later want to remove, we associate the entire document with a specific set of neurons. It could also be, say, a topic.
比如说,如果我们不想在文档级别操作,而是想在主题级别操作,那么你就需要为某个特定主题的所有文档设置神经元。我们希望选择性地只激活那些神经元。因此,我们需要提前知道我们之后可能想要移除的抽象内容。
Say we don't want to operate at the document level but at the topic level; then you want neurons shared by all the documents of a particular topic, and we want to selectively activate only those neurons. So we do need to know in advance the abstraction that we might want to remove later.
看起来这些记忆汇(memorization sink)神经元的数量会是一个超参数,而且它们与总参数数量的比例也是一个有趣的问题。是的,没错。你们在这方面做了多少实验?
It seems like the number of these memorization sink neurons would be a hyperparameter, and the ratio of those to the total number of parameters is kind of interesting. Yes. Yes. Exactly. To what degree did you experiment with all of that?
是的,我们在这方面都进行了实验。我们需要让模型稍微大一些,因为我们确实需要更多的神经元。但另一个巧妙的技巧是,我们并不需要完全独立的神经元,只需要确保这些神经元在某种程度上是正交的即可。
Yeah, we experimented with all of this. The models do have to be a little bit bigger this way, because we do want more neurons. But one other nice trick is that we don't need completely separate neurons; we just need to make sure the neurons are somewhat orthogonal.
因此我们可以随机选择高维方向,它们几乎是正交的。这意味着我们可以不操作单个神经元,而是激活这些神经元但激活相当正交的不同空间,这同样有效。这是一个技巧,可以防止模型尺寸变得非常大——如果你要为每个文档或类似的东西设置特定神经元的话。这是我们确保模型尺寸不会大幅增加的一种方式。我们在论文中有消融实验,发现适度增加模型尺寸实际上能够鼓励这种行为。
So we can just pick random high-dimensional directions, and they are almost orthogonal. That means that instead of operating on individual neurons, we can take these neurons but activate different subspaces that are fairly orthogonal, and that works as well. It's a trick to avoid a really large model size, which you'd otherwise need if you had a specific neuron for every document or something like that. That's one way we make sure the model size doesn't go up much more than it is. And we have ablations in the paper where we find that with moderate increases in model size, we're actually able to encourage this behavior.
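The near-orthogonality claim is easy to check directly; the dimension and document count below are illustrative, not the paper's settings:

```python
import numpy as np

# Random directions in high dimensions are nearly orthogonal, so many
# documents can each get their own direction without one dedicated
# neuron apiece. Typical |cosine| between two random unit vectors in
# d dimensions is about 1/sqrt(d).
rng = np.random.default_rng(0)
d, n_docs = 4096, 1000
V = rng.standard_normal((n_docs, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)  # unit-normalize each row

cos = V @ V.T                  # pairwise cosine similarities
np.fill_diagonal(cos, 0.0)     # ignore self-similarity
worst = np.abs(cos).max()      # worst-case overlap between any two documents
print(worst)                   # small (a few percent), not exactly 0
```

With d = 4096, the typical overlap is about 1/64 ≈ 0.016, and even the worst pair out of roughly half a million stays under a few hundredths — which is why "fairly orthogonal" subspaces can substitute for disjoint neurons.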
这些神经元是局部化在架构中的特定层,还是分布式的?它们的拓扑结构是完全学习得到的,还是先验设置的——这些神经元的位置是预设好的吗?
Are these neurons localized in the architecture, like to a particular layer, or are they distributed? Is the topology totally learned, or is it set up a priori where these neurons are?
是的。我们在论文中的做法是:为每个文档随机选择一组哈希或某种随机神经元,激活这个特定的神经元组合。这实际上模拟了高维空间中某个方向的符号,就像是为这个新文档留下的空间。可以考虑更智能的方法或改进,比如应该关注哪些层。我们只关注了MLP层,并且只在MLP层中引入了这些机制,这基于人们的直觉:事实或事实信息通常存储在MLP层中。
Yeah. What we do in our paper is, for every document, we have a hash, or some set of random neurons, and that particular combination of neurons is activated. This effectively assigns the document its own direction in the high-dimensional space that's left to it. One could consider smarter approaches or improvements here, like which layers we should look at. We only looked at the MLP layers, and only introduced these mechanisms within the MLP layers, building on the intuition people have had that factual information is generally stored in the MLP layers.
但我可以想象,有很多研究可以在这方面做得更智能。不过我们目前只是随机选择。我们也可以尝试让相关文档在某种程度上共享相似的子空间。甚至可以有一个更柔和的版本,让所有东西都正交,但让稍微接近的东西更靠近一些,这样可能会进一步提升性能。我认为这里有很多可以探索的方向。
But I could imagine a lot of research one could do to be more intelligent about this; right now it's just random. We could also try to have related documents share similar subspaces in some way. We could even have a softer version of this where, instead of everything being orthogonal, things that are slightly related are placed closer together, which might further push up performance. I think there are a lot of things one could do here.
我对此非常兴奋,因为我认为这确实告诉我们如何通过设计来更好地控制这些模型。
And I'm very excited about this, because I think it really tells us ways in which we can get more control over these models by design.
这也让我想到了与Anthropic的电路追踪工作的重叠,以及是否有办法结合这些技术来更好地定位概念。
It also makes me think about the overlap with the Anthropic circuit-tracing work, and whether there is some way to combine these techniques to better localize concepts.
是的,这个观点很好。一个有趣的可能性是:也许自然训练已经倾向于形成某种结构,但还不够完美。所以我们可以从这些容易强化的结构开始,尝试实际硬性强化它们。
Yeah, that's a great point. One thing that could be interesting is: maybe natural training already has a propensity toward certain kinds of structure, but it's not perfect. So maybe we can start with structures that seem easy to enforce, and try to actually hard-enforce them.
所以也许这是一种方式,将这些模糊的可解释性内容变得更加硬性和具体,通过训练模型来实现更好的控制。是的。首先,我们确实做了类似的事情,人们认为知识是孤立的,所以我们想,好吧,让我们尝试真正隔离它,因为这似乎合理。我们可以想象做其他类似的事情。
So maybe that's a way to take this fuzzy interpretability stuff and actually train a model to make it more hard and concrete, to enable better control. And to a first order, we do something like that: people have thought that knowledge is isolated, so we said, okay, let's try to really isolate it, because it seems plausible. We could imagine doing other things like that.
那篇论文叫做《Memorization Sinks: Isolating Memorization During LLM Training》。这也让我想到,也许只是记忆化这个词的重叠,但现在很多讨论都在谈论LLM中记忆的作用,特别是记忆在基于注意力的LLM中不是一个非常稳健的特征,以及记忆架构是提高性能的有前途的方式。这与这个有任何关联吗?
That paper is called "Memorization Sinks: Isolating Memorization During LLM Training." It also brings to mind for me — maybe it's just the word overlap of memorization — but a lot of conversation now is about the role of memory with LLMs, in particular how memory is not a very robust feature of attention-based LLMs and how memory architectures are a promising way to increase performance. Does that relate to this in any way?
我的意思是,我可以从哲学的角度思考,我认为它们在某种程度上是相关的。在某种意义上,我们希望有另一种方式,记忆化实际上可能有用,不仅仅是为了隐私,而是告诉我们,我们可以分离那些应该保持不变或保留在模型中的东西,以及那些应该更新的东西。例如,事实会改变,但我们仍然希望保留模型的推理能力或语言能力。所以,如果我们有办法分离这些,那将有助于解决其中一些问题。因此,你可以将你所说的记忆理解为我们希望模型真正记住的东西。
I mean, I can think of a philosophical way in which they're somewhat related. In that another way this memorization work could actually be useful is not just for privacy: it tells us we can disentangle things that should be constant, or kept in the model, from things that should be updated. For example, facts change, but we still want to preserve the model's ability to reason, or its linguistic capabilities. If we had ways to disentangle those, that would help with some of these things. And you can think of memory, in the context you were describing, as the stuff that we actually want the model to remember.
所以,我们越能分离那些应该保留的东西与那些独立的东西,我认为在这个意义上,它们是相关的。也许我们可以交叉分享一些想法,比如具体的架构是如何开发的等等。
And so the more we can disentangle stuff that should be kept around from stuff that's independent of that — in that sense, I think they are sort of related. And maybe there are some ideas we can cross-share, like how the exact architectures are developed and so on.
是的。是的。是的。我不知道神经科学的影响会是什么,或者我们从神经科学中学到了什么,但我觉得,在人类中,事实记忆和概念记忆是不同的。所以我们应该有不同的方式。是的。
Yeah. Yeah. And I don't know what the neuroscience implications would be, or what we've learned from neuroscience, but it strikes me that in humans, fact memory and concept memory are different. And so we should have different — Yes.
是的。在AI方面的事情。
Yes. Things on the AI side.
没错。我认为那些可扩展但仍试图强制执行类似这样的架构是非常有前途的。
Exactly. And architectures that are scalable but still try to enforce something like this are, I think, very promising.
最后,在你列出的局限性清单中还有创造力,这就是你重要论文的切入点。请谈谈探索创造力的动机。
And so lastly, on the list of limitations you profiled is creativity, and that's where your big paper comes in. Talk a little bit about the motivation to explore creativity.
就像每个AI研究人员一样,我也尽可能多地使用LLM来自动化工作,让生活更轻松。有时候——现实中可能发生过也可能没有——你会提示模型生成作业题目。对吧?特别是因为你希望模型能产生一些新东西,比如我在想:模型能想出一些真正巧妙的东西吗?就像我无法想到的测试学生的题目,或者新的研究想法来撰写新的资助提案之类的。
So I guess, like every AI researcher, I try to use LLMs as much as I can to automate things and make my life easier. And sometimes — this may or may not have happened in real life — you prompt the model to generate homework problems. Right? Especially because you want the model to generate something new. I'm like, oh, can the model come up with something really clever to test students with that I couldn't come up with? Or new research ideas, to write new grant proposals or something.
我几乎总是发现——或者可能总是——模型从未能给我真正让我觉得‘我没想到过那个’的东西。它们在我尝试使用的几乎所有其他任务上都很出色:擅长总结,擅长分析一堆不同事物的共性并从中提炼内容。但在我给模型的这些开放式任务上,它们并不真正擅长。
And I've almost always found — or maybe always — that the models have never been able to give me something that truly made me go, "I hadn't thought of that." They're great at almost every other task I try to use them for. They're great at summarization. They're great at looking at what's common across a bunch of different things and drawing something from that. But they're not really good at these open-ended tasks I give them.
所以这某种程度上就是动机,或者说一直有个念头萦绕着我:是的,我就是觉得不太对劲,我们该怎么思考这个问题?社区里也有人一直在尝试分析研究想法,比如实际运行人类研究来看看模型是否能生成想法,这方面有很多反复讨论。所以这是一个非常模糊的领域,似乎是人们正在思考的问题。这是与Google研究员Vaishnav的合作项目,我们有过这样的对话,他也是下一篇关于下一个词预测局限性的论文的作者之一。
So that was sort of the motivation; there was always something lingering, like, yeah, something just doesn't feel right — how do we think about that? There's also work in the community on analyzing research ideas, actually running human studies to see whether models can generate ideas or not, and there's been a lot of back and forth. So it's something very fuzzy that people seem to be thinking about. This was in collaboration with Vaishnavh, who's a researcher at Google. We were having this conversation, and he was also an author on this next paper on the limitations of next-token prediction.
因此我想做一些类似的研究:尝试确定创造力的核心原则是什么,我们能否测试这些原则是否真的能从我们为这些模型设定的训练目标中涌现?这是一种我们可以找到答案的方法,因为拥有实际的基准和测试创造力——谁知道训练数据里有什么——所有这些都使得大规模回答这个问题更加困难。所以我们尝试采取不同的方法,从第一性原理出发思考创造力,设计简单任务,并在这些具体任务中观察:什么才是正确的目标?下一个词预测做了什么?我们如何提取创造性解决方案?
So I wanted to do a similar kind of research: let's try to identify what core principles we might need for creativity, and can we test whether those actually emerge from the training objectives we use for these models? That's a way we can get an answer, because building actual benchmarks and testing creativity — and who knows what's in the training data — all of that makes it harder to answer at scale. So we take a different approach: start from first principles, think about creativity, devise simple tasks, and see, in these concrete tasks, what is the right objective? What does next-token prediction do? How do we extract creative solutions?
这就是我们开始思考这个问题的过程。同时,我的实验室还有另一个项目,从更现实的问题解决场景(如推理)角度研究这个问题。在那里,我们也发现模型——尤其是在进一步训练后——它们的解决方案开始趋于同质化。对于熟悉推理时间设置的人来说,你可以给模型一次回答的机会,或者多次查询它。我们发现,当我们训练模型时,一般来说,它们在一次性设置中的性能确实会提高,但在多次查询的设置中反而变得更差。
So that was sort of how we started thinking about this. At the same time, I had another project in my lab that looks at this from slightly more realistic settings of problem solving, like reasoning. And there again we found that models, especially after you train them more, started collapsing in their solutions. For people familiar with inference-time settings: you can either give the model one shot to answer, or you can query it multiple times. And what we find is that when we train models, their performance in the one-shot setting generally does go up, but they get worse in the setting where you query them multiple times.
这意味着当模型训练过度时,它们似乎开始给出相同的解决方案,反复尝试相同的错误方法,而不是真正尝试多样化的解决方案。因此我确信这确实是个问题,模型在尝试创造性、多样化的解决方案方面并不擅长,即使在现实任务中也是如此。我们想从第一性原理研究这个问题。这确实是我们思考创造力的动机。一旦我们有了一些非常简单的任务来工作,这实际上使我们能够做出正式陈述或直观陈述,说明不同目标的作用:它们表现如何?我们能否做得更好?
What that means is that when models are trained too much, they seem to start giving the same solution and trying the same incorrect thing, rather than actually trying diverse solutions. So I was convinced that this is actually a problem — that models are not very good at trying creative, diverse solutions, even on realistic tasks — and we wanted to study it from first principles. That's really what motivated us to think about creativity. And once we had some really simple tasks to work with, that actually allowed us to make formal statements, or give intuitive statements, about what different objectives do: how do they perform, and can we do something better?
我们发现了一些非常不错、有趣的替代方案,可以改进人们现有的创造力范式。我们现在正尝试大规模运行这些方案,并将其中一些想法实际应用于训练更大规模的模型。我们对通过改变训练方式来提升模型创造力的可能性感到非常兴奋。
And we found some really nice, interesting alternatives to the current paradigms that people have to improve creativity. And we are now trying to run these at scale and put some of these ideas in actually training larger scale models. And we're kind of excited to see where we can take in terms of making models more creative by changing the way we train them.
那么谈谈创造力的目标意味着什么。这似乎很难用目标来捕捉。
So talk a little bit about what an objective for creativity means. That seems very difficult to capture in an objective.
是的。我们从认知科学的大量工作中汲取灵感,特别是博登的研究,它试图形式化一些创造力的概念。我们并不是在思考什么是创造力的正确定义,而是更多地借鉴认知科学家对创造力的定义并加以运用。这里有一个非常具体的例子,叫做组合创造力。
Yeah. We drew inspiration from a lot of work in cognitive science, particularly Boden's work, which tries to formalize some notions of creativity. So we're in no way trying to settle what the right definition of creativity is; instead, we lean on the definitions cognitive scientists have developed and use those. Here's a very concrete example: there's a notion called combinational creativity.
我认为文字游戏是一个很好的例子。比如一个笑话,让我想想……为什么这个……等等,我好像忘了笑点是什么。是的,我觉得这个笑话我看过太多次,已经觉得不好笑了。所以没错。
And I think a nice example of that is wordplay. So if we think of a joke, for example — let me try to pull up exactly which one; I've lost the punch line here. Yeah, I feel like I've seen this so many times that it's not funny to me anymore. So yes.
嗯,是的。举个例子:为什么稻草人会获奖?笑点是因为他在他的领域里出类拔萃(outstanding in his field)。思考为什么这有创意或有趣,是因为稻草人和获奖这两个词看似不同、无关,但实际上存在这种新颖的连接——'outstanding'(出类拔萃/站在田野中),这种意想不到的联系将这两个词联系在一起。
Well, yeah. So one example is: why did the scarecrow win an award? And the punch line is: because he was outstanding in his field. If we think about why this is creative, or why this is funny, it's because scarecrow and award are two seemingly different, unrelated words, but there is this actual connection — "outstanding" — that is somewhat novel, something you hadn't thought about, that links the two words together.
所以这是一个组合创造力的例子,我们试图看模型是否能找到意外的连接,比如通过图结构,或者找到两个有共同父节点的词,就像'outstanding'的两种不同解释。模型能否真正提供新的方式来实现这一点?我们将其抽象为:假设我们有一个图,我们教模型图中所有的边,模型能否发现新的兄弟节点?即两个看似不连接但实际上有父节点的节点。
So that's an example of this combinational creativity, where we're trying to see whether the model can find unexpected connections through a graph — find two words that have a common parent, like the two different interpretations of "outstanding." Can the model actually come up with new connections of this kind? The way we abstract this is: say we have a graph, and we teach the model all the edges in the graph; can the model then discover new siblings — two nodes that don't look connected but actually share a parent?
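A toy rendering of that sibling-discovery abstraction — the graph, node names, and helper code below are made up for illustration, not the paper's actual task data:

```python
from itertools import combinations

# The model is trained only on parent->child edges; the "creative" act is
# producing sibling pairs: two nodes that share a parent but never
# co-occur as an edge in training (like scarecrow/award via "outstanding").
edges = [("field", "scarecrow"), ("field", "outstanding"),
         ("award", "outstanding"), ("award", "trophy")]

children = {}
for parent, child in edges:
    children.setdefault(parent, set()).add(child)

# All valid sibling pairs implied by the latent graph structure:
siblings = {frozenset(pair)
            for kids in children.values()
            for pair in combinations(sorted(kids), 2)}
print(siblings)
```

Evaluation in this abstraction is then clean: a generated pair counts as creative only if it is in `siblings` (structure is obeyed) and was never shown during training (it is new).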
那么模型能否真正看到这些并生成更多这样的连接?它能否在图中找到新的关系?这就是创造力的一种抽象例子,真正遵循了博登关于组合创造力的研究。他们有很多例子说明,许多我们认为有创造力的事情,实际上就是在事物之间找到这些意外的联系。我们考虑的另一种创造力是探索性的,即我们只想自由地发现世界上有趣的事物。
So can the model see some of these and then generate more of them? Can it find new connections in the graph? That's one example of creativity, abstracted, really following Boden's work on combinational creativity. And they have a lot of examples of how many things we do that we think are creative are really about finding these unexpected connections between things. The other kind of creativity we consider is exploratory, where we just want to freely find interesting things in the world.
但当然它们仍然具有一定的结构,只不过这是一种潜在的结构,而我们正是想从这种结构中发现新事物。正如博登所解释的,例如,艺术家所做的很多工作都属于这种探索性创造。我们尝试用数学和符号将其表述出来,以便训练模型,比如在模型中训练生成圆形,遵循图的设定,模型能否发现图中它未曾训练过的新圆形或新三角形?这些就是我们思考创造力的方式。它们绝非完美,但我认为它们捕捉到了一些核心原则。
But of course those still have some structure to them — it's just a latent structure — and we want to find new things within that structure. As Boden explained, a lot of the work that artists do, for example, is this kind of exploratory creativity. We try to write this down in math and symbols we can train models on: say we train the model to generate a bunch of circles, following the graph kind of setting — can the model find new circles, or new triangles, in the graph that it wasn't trained on? So these are the ways we think about creativity. They're not perfect by any means, but I think they capture some of these core principles.
它们已经揭示了为什么某些训练目标可能对这些创造力概念实际上是好是坏,以及我们如何能更深入地探讨,比如什么是训练这些模型的正确方式,或者我们当前的训练可能缺失了什么,我们应该如何思考也许新的预训练方式来鼓励这种创造力。
And they already show why certain training objectives might actually be good or bad for these notions of creativity. We can dig more into that: what's the right way to train these models, what could our current training be missing, and how should we think about maybe new ways to pre-train these models to encourage this kind of creativity?
退一步讲,也许带点哲学意味,你如何区分你所思考的那种创造力,广泛地说,不一定指这些特定结构,而是更广泛地,你希望看到语言模型发展的方向?比如,我让LLM用搞笑的口音讲个笑话,它就能做这类有点创造性的事情。对于那些说"我经常用ChatGPT,它很有创造力"的人,你如何区分?
Taking a step back and maybe getting a little bit philosophical: how do you distinguish the kind of creativity that you're thinking about broadly — meaning not necessarily these particular constructs, but more broadly the direction you'd like to see LLMs go — from, you know, I ask an LLM to tell me a joke in a funny accent, and it can do these creativity-ish kinds of things? For someone who says, "Oh, I use ChatGPT all the time and it's very creative," how do you distinguish?
是的。我认为一种创造力的概念仅仅是,它是否生成了训练数据中没有的东西?对吧?而且是有意义的东西。就像,这已经很长时间以来都不容易了。
Yeah. I feel like one notion of creativity is just: is it generating something that's not in the training data — and something meaningful? Right? And for a long time, even that was not easy.
所以,模型能做到这一点,能生成某某风格的诗歌等等,这非常令人印象深刻,对吧?我们可以做所有那些事情。我认为区别在于开放式与封闭式。在这些情况下,我们仍然在大致说明我们想去哪里。比如如果我说用Y风格写X,我是在告诉模型要做什么,然后模型可以填充路径。
So it is very impressive that models can do this and can generate poems in the style of so-and-so, right? We can do all of that. I think the difference is what I'd call open-ended versus closed-ended. In these cases we're still roughly saying where we want to go. If I say "write X in the style of Y," I'm telling the model what to do, and then the model can fill out the path.
但我所思考的创造力是带我去我从未见过或想过的事物。我认为这正是我感觉我们所有的构造也在试图表达的,甚至从我们自己使用这些模型的体验来看,我感觉那也许是我们需要更多思考的地方。是的。所以再举一个例子,当我查看这些基准测试或人们尝试使用模型,例如KernelBench,以新方式解决问题时,感觉我们仍然必须仔细地提示模型关于策略、它已经做过的事情以及它应该探索的内容等等。所以我认为那部分仍然来自我们明确的指令,尽管它在生成新东西。
But the creativity I'm thinking about is: take me to things I've just never seen or thought of before. I think that's the part all our constructs are trying to get at, and even from our own use of these models, I feel that's maybe where we want to think more. Yeah. As another example, when I look at these benchmarks, or at people trying to use models to solve problems in new ways — KernelBench, for example — it feels like we still have to carefully prompt the model on the strategies, the things it has already done, the things it should explore, and so on. So that part is still coming from our explicit instruction, even though the model is generating new things.
所以我有点想看看,它们能否自己也独立完成这部分?我认为那才意味着有意义地超越我们所能做的,并发现新事物。如果我们能做到这一点,并且能大规模地做到,那么,我们实际上将真正发现许多有趣的新事物并找到新的联系。
And so I kinda wanna see, like, can they actually do this part on their own too? And I think that's what it would mean to meaningfully kind of go beyond, like, what we are able to do and find new things. And if we are able to do that and we're able to do that at scale, then, we're actually gonna really discover a lot of interesting new things and find new connections.
那么关于这个想法呢,嘿。我已经有了一个创意调节器。就是温度参数,我可以把它调到最高。哇哦。这会产生我完全没预料到的效果,或者你并没有提示它去做的事情。
And then how about the idea that, hey. I've already got a creativity dial. It's temperature, and I can crank it all the way up. And wow. That's doing things that I totally didn't expect or you didn't prompt it to do.
是的。在我们深入讨论之前,我还想说,当我尝试提示这些模型时——我们当时在准备ICML的演讲——我们试图让模型生成一些我能用的新笑话,比如稻草人那种类型的。结果其实非常糟糕。我们开玩笑说这是因为模型是用我们论文中展示的那种并不理想的目标训练的。那确实是一个我试图让模型说出一些新东西、创造新笑话但仍保持结构的案例,但它没能做到。
Yeah. Before we jump into that, I want to also say: when I was trying to prompt these models — we were working on the ICML talk — we tried to just see whether the model could generate jokes I could use, like the scarecrow kind of thing, new jokes. And it was actually really bad. We were joking that it's because the models were trained with the kind of objectives that we show in our paper are not very good. So that was a case where I was trying to get a model to say something new — a new joke that still had the structure — and it wasn't able to.
但也许,我的意思是,在我能力范围内尽可能进行了提示工程。
But maybe — I mean, to the best extent that I could prompt-engineer it.
这意味着,如果你随意想想,比如只是让模型给你讲个笑话,也许它会碰到一些你没听过的,因为世界上有很多笑话。但如果你真的深入研究,它们可能并不新颖,也可能并不好笑。是的。
Meaning that, you know, if you're thinking about it casually — just asking the model to tell you a joke — maybe it'll come up with something you haven't heard before, because there are a lot of jokes out there. But if you really dig into it, they're probably not novel, and probably not funny. Yeah.
而且我们无法得到一个遵守结构的笑话,比如连接意想不到的实体。提示可能是:'讲一个因为连接了意外实体而好笑的笑话'。这就是我们试图达到的那种文字游戏。所以是要找到一个我没想到的新联系,但模型在这方面做得不太好。你可以想到很多不同结构的笑话,但模型并没有发现这种隐藏笑点的特定方面。
And we couldn't get one that obeys the structure. For example, the prompt would be: tell me a joke that's funny because it connects two unexpected entities. Right? That was the kind of wordplay we were trying to get at — finding a new connection that I hadn't thought about — but it was kind of bad at doing that. You could think of many different structures of jokes, but this particular aspect, discovering a hidden punch line, the model didn't get.
我认为有一个很深层次的原因,就像我们在论文中解释的,很多训练数据并没有对这种结构进行监督——它不会先给出笑点再让模型生成笑话。相反,笑点是潜在的。这意味着模型无法学习正确的结构,最终只学会了一些局部的东西,像是记忆,而没有学会我们想要的真正过程:搜索所有可能的事物,找到新的东西,然后生成它们。这就是我们试图展示的这些模型训练中的局限性。
And I think there's a very deep reason, as we explain in the paper: a lot of the training data doesn't really have supervision on this structure. It doesn't present the punch line first and then the joke, so that the model could learn that; instead, the punch line is latent. And that actually means the models aren't able to learn the right structure. They end up learning something local, sort of like memorization, and they don't learn the true process we want: search over all possible things, find something new, and then generate it. That's the limitation we try to show in the training of these models.
有两件事让我印象深刻。一是我不确定人类能否很好地执行这种指令,比如根据给定的搞笑标准即时生成一个笑话。二是,可能有些东西与结构和创意或结构和冲动的概念是正交的。试图同时传达这两者可能会让大语言模型感到困惑,就像是规则与创造之间的冲突。
Two things jump out at me. One is that I'm not sure a human would do very well with that kind of instruction — like, generate a joke on the fly given these criteria for funny. But also, there's something maybe orthogonal in the idea of structure and creativity, or structure and impulse. Trying to convey both of those is maybe confusing to the LLM — like rules versus creation.
是的,这确实是个很好的观点。所以很多创造力不仅仅是说些疯狂的话,对吧?我们希望它仍然是有结构和有意义的。如果你想想像分子生物学或药物发现这样的应用场景,我们并不是要生成随机的新东西。我们实际上希望模型能够推断出正确的潜在结构,并根据这个结构生成新事物。
Yes, actually that's a good point. A lot of creativity is not just saying something crazy, right? We want it to still be structured and meaningful. And if you think about use cases like molecular biology or drug discovery, it's not that we want to generate random new things; we actually want the model to infer the right latent structure and generate new things according to that structure.
这就是我思考的那种创造力,也是我们在这些任务中尝试建模的。如何让模型在尝试新事物的同时仍然遵循某种结构,这是个很好的问题。我想你刚才问我关于温度参数的问题,这正好引向这个方向——你知道,只要调高温度,就能开始得到疯狂的结果,但模型也会变得不那么结构化。所以我们真正需要的是这种结构化的探索。同样地,当我尝试让模型生成作业题时,我不希望它拼凑出毫无意义的东西,虽然那可能很有创意,但我真正希望的是它能遵循某种特定的逻辑或结构。
So that's really the kind of creativity I'm thinking about, and that we try to model in these tasks. And that's a great point: how do we get a model to try new things while still obeying some structure? You were just asking me about temperature, and I think that leads right into this: crank up the temperature and you can start getting crazy things, but the model also becomes less structured. So we really want this sort of structured exploration from these models. Similarly, when I try to get a model to generate a homework problem, I don't want it to put together something meaningless — that might be "creative," but I really want it to follow some particular logic or structure.
这个想法很有趣,最有价值的创造力例子是结构与...
That's an interesting idea that the most valuable examples of creativity are, you know, a balance of structure and
冲动的平衡。是的。没错。正是这样。
impulse. Yes. Yes. Yes.
那篇论文是《掷骰子与三思而后行:突破下一个开放预测的创意极限》。其中'掷骰子'和'三思而后行'这两个部分是如何体现的?
That paper is roll the dice and look before you leap going beyond the creative limits of next open prediction. Where does the roll the dice and look before you leap parts come into that?
好的。首先我来谈谈'跳跃'部分,因为我觉得这与我们刚才讨论的内容更相关,就像是这种文字游戏或关联。他们研究的是需要做出思维跳跃的任务,这种跳跃在训练数据中通常没有明确说明,这就是为什么模型难以真正推断出这种思维。所以我们说的是,模型实际上应该被训练来学习如何进行这些跳跃,特别是结构化的跳跃,而不是仅仅展示跳跃的结果——这正是我们当前训练数据的情况,因为这些数据只有结果,没有背后的思维过程。
Okay. First I'll talk about the "leap," because I think that ties in closer to what we just chatted about — this wordplay, this connection. We're looking at tasks where there's a leap of thought that has to be made, one that's often not spelled out in the training data, which is why models struggle to actually infer that thought. So what we're saying is that models should be trained to learn how to take those leaps — especially structured leaps — rather than just being shown the outcome of the leaps, which is what our current training data is, because it doesn't include the thought process behind them.
这听起来有点像训练思维轨迹而不是训练答案。你觉得这方面有发展路径吗?
It sounds a little bit like training on thought traces as opposed to training on answers. Like, do you think there's a path there?
是的。我认为这肯定是实现这一目标的一种方式。但我们在论文中展示的另一部分实际上是不同的训练目标,比如无教师训练。这可能包括多标记预测,我们试图阻止模型仅通过局部观察来获得正确答案。但如果它不这样做,如果它必须实际生成整个模式或整个事物集,那么这实际上会鼓励模型具备这种全局理解。
Yes, I think that's certainly one way to do it. But another part we show in the paper is actually different training objectives, like teacherless training. That could be things like multi-token prediction, where we try to discourage the model from getting the right answer by just looking locally. If it has to actually generate the entire pattern, the entire set of things, that encourages the model to have this global understanding.
最近人们对扩散模型感到兴奋。我们发现扩散模型也有类似的特点,它们不会显示所有标记,而是会掩盖不同的内容。这实际上鼓励模型具备某种规划能力或更全局的处理能力。因此,在我们的实验中,我们发现这两种目标实际上效果更好。一种是多标记预测,我们强制模型一次生成多个标记,而不是只生成一个标记,看到正确答案后再生成下一个。
And people are excited about diffusion models lately. We find that diffusion models also have a similar thing, where they don't show all the tokens and they mask out different things. And so that actually encourages the model to have some sort of planning, or the ability to do more global things. And so we find these two objectives actually work a lot better in our experiments. One is multi-token prediction, where we just force the model to produce multiple tokens at a time, rather than producing just one token, seeing the correct answer, and then producing the next token.
因此,在多标记预测中,模型无法获得局部监督,必须在获得奖励之前把所有事情都做对。而扩散模型可以被认为具有这些不同的掩码或不同的排序方式。这也鼓励模型具备更全局的理解。所以,如果我们关心从模型中获得这种多样化的生成结果,这两种方法都是值得更认真追求的替代方案。这就是'三思而后行'的部分。
And so with multi-token prediction, the model does not get that local supervision; it has to get everything right before it gets reward. And diffusion models can be thought of as having these different masks or different orderings, so that also encourages the model to have a more global understanding. So both of those are alternatives that might be worth pursuing more seriously if we care about getting these kinds of diverse generations from the models. That's the part about look before you leap.
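The contrast between standard next-token supervision and the multi-token, teacherless objective described here can be sketched with a toy example. The function names and the k-token window are assumptions for illustration, not code from the paper:

```python
def next_token_pairs(seq):
    # Standard next-token training: at every position the model is shown
    # the ground-truth prefix and supervised on exactly one next token,
    # so it always gets local feedback.
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]

def multi_token_pairs(seq, k):
    # Multi-token (teacherless) training: from each prefix the model must
    # produce the next k tokens jointly, with no ground-truth correction
    # in between -- purely local shortcuts stop paying off.
    return [(seq[:i], tuple(seq[i:i + k])) for i in range(1, len(seq) - k + 1)]

demo = ["the", "cat", "sat", "on", "the", "mat"]
print(next_token_pairs(demo)[0])      # (['the'], 'cat')
print(multi_token_pairs(demo, 3)[0])  # (['the'], ('cat', 'sat', 'on'))
```

The point of the sketch is only that the supervision target changes from a single token to a joint block, removing the per-step teacher signal.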
'掷骰子'的部分也很有趣。我认为,关于如何从模型中获得多样化内容,有一个未被充分探索的方面。目前有两种方式。一种是我们自己思考多样化的内容。当人们提示模型时,他们会给出更具体的指令,然后又说,实际上我想做点别的事情。
The part of the roll the dice is also really interesting. I think there's an underexplored aspect of how to get diverse things from the model. Right now there are two ways. One is we think about the diverse stuff ourselves. And when people prompt the models, they give more specific instructions, and then they're like, actually, I want to do something else.
然后我们告诉模型,去做这个。这是一种方式。机器学习人员可能考虑的另一种方式就是提高温度参数,这样会产生更多样的标记。正如我之前所说,这往往也会破坏结构。因此我们试图思考,好吧,我们应该如何思考创造力?
And then we tell the model, go do this instead. So that's one way. The other way that machine learning people may be thinking about is just to increase the temperature, and that gives you more diverse tokens. And like I said earlier, that also tends to really destroy the structure. And so we try to think about, okay, how do we think about creativity?
什么是引导模型产生这种结构化随机性的正确方式?一种比温度采样更自然的方式是首先生成一个随机想法,然后遵循这个想法生成所有标记。所以引入随机性的合适位置可能是在开始时,模型承诺要探索某个方向。一旦它决定沿着那条路径前进,如果你继续提高温度,它就会变得杂乱无章。相反,我们选择某个方向,然后坚持到底。
Like what's the right way to kind of elicit this randomness from the model that can be structured? And one way that seemed more natural than temperature sampling is first generate a random idea and then follow that idea and generate all your tokens. So the right place to introduce randomness is probably at the beginning where the model commits to exploring something. And then once it decides to go down that path, if you keep increasing the temperature, then it's just gonna go all over the place. Instead, we pick something and then we stick to it.
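For context on why raising the temperature "goes all over the place": temperature sampling rescales the next-token distribution as p_i ∝ exp(logit_i / T) at every decoding step, so a high T flattens the distribution everywhere rather than injecting randomness at one chosen point. A minimal sketch (the logit values are made up):

```python
import math

def temperature_dist(logits, temperature):
    # p_i proportional to exp(logit_i / T); subtract the max before
    # exponentiating for numerical stability. Larger T pushes the
    # distribution toward uniform at *every* decoding step.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

print(temperature_dist([2.0, 0.0], 1.0))   # sharp: ~[0.88, 0.12]
print(temperature_dist([2.0, 0.0], 10.0))  # near-uniform: ~[0.55, 0.45]
```

Because this flattening applies uniformly to every token, it degrades local structure; the proposal below instead concentrates the randomness at the start of generation.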
因此在测试时,如果你想要更多样化的生成结果,我们会采样新的前缀或新的起点,然后让模型自行发挥。这就是我们说应该引入随机性的地方,就像'掷骰子'一样。所以我们先掷骰子,然后根据骰子的结果,我们选择那个想法并坚持下去。当然,这需要训练模型具备这种能力。我们在论文中提出了一种非常简单的方法,就是在训练数据中直接添加随机前缀。
And so at test time, if you want more diverse generations, we would sample new prefixes or new starting points, and then let the model do its thing. So that's where we're trying to say we should introduce randomness, which is the roll the dice. So we roll the dice first, and then, based on the outcome of the dice, we pick that thought and go with it. And this, of course, requires training the model to be able to do that. And we have a very simple way to do this in the paper, which is that, in the training data, we just have random prefixes that we prepend.
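That data-preparation step can be sketched as below, assuming token-level training sequences. The seed vocabulary, prefix length, and function name are inventions for illustration, not details from the paper:

```python
import random

def seed_condition(examples, prefix_len=4, vocab_size=256, rng=None):
    # Prepend a random "seed" prefix to every training sequence. The model
    # learns to condition its whole generation on the seed, so drawing a
    # fresh seed at test time acts like committing to a fresh random idea.
    rng = rng or random.Random(0)
    vocab = [f"<s{i}>" for i in range(vocab_size)]
    return [rng.choices(vocab, k=prefix_len) + list(ex) for ex in examples]

data = [["once", "upon", "a", "time"], ["the", "end"]]
for row in seed_condition(data):
    print(row)  # each row now starts with 4 random seed tokens
```

The original sequence is untouched; only the random prefix changes across examples, which is what lets the seed carry the "idea" at inference time.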
所以模型会基于一个随机前缀进行条件生成,你可以把它看作某种随机想法的代理,然后它生成内容。在测试时,我们可以通过改变这个前缀来从模型中获得新的想法或新的生成内容。有什么问题吗?
So the model conditions on a random prefix that you can think of that as like some proxy for a random idea and then it generates things. And at test time, we can get new ideas or new generations from the model by changing this prefix. Sorry?
是无意义的随机前缀还是有意义的随机前缀?
A random nonsensical prefix or a random sensical prefix?
是的,好问题。在我们的论文中,由于实验非常简单,无意义的随机前缀实际上就能奏效,这很令人惊讶。我们并不完全理解为什么它能工作。这也是非常有趣的未来研究方向。但作为概念验证,看起来似乎有些价值。
Yes, great question. So in our paper, because our experiments are so simple, a random nonsensical prefix actually just works, which is surprising. We don't really understand why it works. That's very interesting future work too. But as a proof of concept, it seems like, okay, like there's maybe some meat here.
目前我们团队正在研究使用更有意义的前缀,这些前缀实际上能捕捉想法的具体内容,比如想法的某些语义。然后基于这个条件来生成内容。如果这种预训练范式最终有效,这也将为我们提供更可控的多样化生成。不同于用明确指令提示模型,我们可以通过某种巧妙的方式改变我们基于的条件前缀,从而获得多样性。这就是我们想表达的:也许我们应该考虑的不是进行温度采样,而是告诉模型基于新的想法进行条件生成,这些想法可以是随机的,从而能够随机化或变得更加多样化。
And what we are looking at now in my group is having more meaningful prefixes, which actually capture what exactly the idea is, like some semantics of the idea, and then conditioning on that to generate things. And if this paradigm of pre-training ends up working, this will also give us more controlled diverse generations. Instead of prompting the model with explicit instructions, we can actually just change this prefix that we condition on in some nice way, and then we can get diversity in that sense as well. And so what we are trying to say is that, rather than doing temperature sampling, maybe we can tell the model to condition on new ideas, which can then be randomized or made more diverse.
这种训练和推理方式可能实际上是从这些模型中获取结构化多样性的更好方法。
And that way of training as well as inference might actually be a better way to get structured diversity from these models.
我们提到了GPT-5以及推理时计算、推理时扩展的概念。我认为我们从这个模型中学到的一点是,至少在较高层级,它会进行大量并行推理并将这些结果整合在一起。在某种意义上,你可能会认为,既然这些是并行处理的,你会获得思维多样性,但似乎也存在一种平均化效应,某种程度上冲淡了这些思维的表达。我觉得这与你的工作有关,也许他们需要在并行线程中使用更多随机种子之类的。
And we mentioned GPT-5 and also this idea of inference-time compute, inference-time scaling. And one of the things that I think we're learning about this model is that, at least at the higher tiers, it does a bunch of parallel inferences and kind of integrates those together. And, in some sense, you might think that, okay, assuming this stuff runs in parallel, you're gonna get this diversity of thought, but there also seems to be this averaging effect that kind of washes out the expression of that thought. And it strikes me that it's connected to kinda your work, and maybe they need more random seeds in their parallel threads or something.
是的。这也取决于这些并行思维实际上是如何产生的。我的意思是,我不知道这些思维在幕后究竟是如何生成的。所以有两点评论:一是如果他们在某种起点上进行强化学习,问题是我们如何恰当地利用这个起点来获得正确的多样性?
Yeah. It also depends on how these parallel thoughts are actually coming. I mean, I guess I don't know how exactly those thoughts are being generated behind the scenes there. So two comments. One is, if they're doing RL on sort of a starting point, the question is how do we leverage the starting point appropriately to get the right diversity?
你真的能超越基础模型能生成的内容吗?某些内容是否比其他内容更可能出现?如果是这样,我们能否以某种方式纠正这一点?如果我们想生成更多某一类内容而非其他呢?所以所有这些事情在当前范式下仍然相当具有挑战性。
Can you actually go beyond what the base model can generate? Are some things more likely than the others? In which case, can we correct that in some way? What if we want to generate more of one kind than the other? So all of these kinds of things are still going to be quite challenging with kind of the current paradigm.
因此,如果我们以一种允许这种多样化控制采样的方式进行预训练,那么希望我们能获得更多。但确实,我认为基本上不清楚的是,我们如何获得那种多样性,以及我们实际上如何覆盖各种可能性?如果我们只是让模型完全端到端处理,那么可能又会出现某种崩溃,模型可能无法生成我们需要的所有多样化内容。是的,其中一些可以通过仔细收集训练数据来模拟。但当然,如果你想做的事情与训练数据非常不同,那么所有这些事情如何运作呢?
So hopefully if we do pre training in a way that allows this kind of diverse control sampling, then there's more that we can get. But yeah, I think the thing that is basically unclear is like how do we get that diversity and how do we actually span things? Like if we just let it all be end to end from the model, then again there might be some collapse and the model might not be generating all the diverse things that we need. And yeah, and there's some of that that you can simulate by carefully collecting training data. But then of course, if you want to do stuff that's very different from your training data, like how do all of these things work?
是的,关于并行轨迹这类事情,这同样很难说。但我认为我们想表达的是,我们希望模型尝试一个新想法然后坚持下去,而不是在每一步都引入随机性。
Yeah, that's hard to say again with the parallel-traces kind of thing. But I think what we are saying is that we want the model to try a new idea and then stick to it, rather than introducing randomness at every step.
而随机前缀——在你的案例中,随机前缀是在推理时添加的吗?所以就像是你在提示词前面添加了这个随机前缀?
And the random prefixes in your case are at inference time? So it's like you're prepending this random prefix to the prompt?
是的。所以在我们的玩具设置中,没有所谓的提示概念,我想也没有指令,但是的,你可以把它看作是前置添加。是的,是的。你是在前面添加。
Yeah. So in our toy settings, there's no like notion of a prompt, I guess there's no instruction, but yeah, you can think of that as like prepend. Yeah. Yeah. You're prepending.
所以你是给出你的问题,然后添加随机字符串,再让模型生成。在实践中就是这样运作的。是的。但你也必须训练一个模型能够做到这一点。所以在训练时,我们也必须模拟这种情况,即模型在开头有随机字符串,并在训练中使用它们。
So you're giving your question and then adding the random string and letting the model generate. That's how it would work in practice. Yes. But you also have to train a model to be able to do this. So at training time too, we have to simulate this, where the model has random strings at the beginning that it uses in its training.
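Putting training and inference together, generation under this scheme might look like the sketch below. `model_generate` is a hypothetical stand-in for a model already trained with seed conditioning, and the seed format is invented; this is not a real API:

```python
import random

def diverse_generations(model_generate, question, n, prefix_len=4, rng=None):
    # "Roll the dice" once per sample: draw a fresh random seed prefix,
    # prepend it to the input, and let the model commit to that idea.
    # Diversity comes from varying the seed, not from raising temperature.
    rng = rng or random.Random(0)
    outputs = []
    for _ in range(n):
        seed = "".join(f"<s{rng.randrange(256)}>" for _ in range(prefix_len))
        outputs.append(model_generate(seed + question))
    return outputs

# Toy usage with an identity "model", just to show the shape of the calls.
for out in diverse_generations(lambda s: s, " 2+2=?", 3):
    print(out)
```

As the conversation notes next, this only helps if the model was trained on seed-prefixed data; prepending random strings to an off-the-shelf chat model's prompt would not have this effect.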
意思是,所以不要指望你可以在ChatGPT的提示开头随便加一些随机性。那将会——
Meaning, so don't expect that you can just add some randomness to the beginning of a prompt to ChatGPT. That's gonna
给你更好的访问权限。是的。是的。是的。
give you better access. Yes. Yes. Yes.
所以你刚才谈到了强化学习(RL)。你认为这种方法与RL训练兼容吗?
And so you kind of talked about RL. Do you think that the approach is compatible with RL training?
是的。这很吸引人。我认为总的来说,我们在基础模型采样方式上所做的任何改进都会直接转化为RL的进步,因为很多RL过程就是:你从模型中尝试很多东西,然后让模型更多地做它做得好的事情。所以如果你有更好的起点,那么所有这些方面都会自动得到改善。我认为这些改进在很多方面都会很有用,比如结构化探索等理念。
Yeah. This is fascinating. I think overall, any improvements that we can do to the base model in terms of how we sample that directly translates to RL because a lot of the RL is like, you try a bunch of things from the model and make the model do more of what it's doing well. And so if you just have better starting points, then you just will automatically improve all of these things. And the way I think a lot of this could be useful is in ideas like structured exploration and so on.
人们不断提到,探索是通往下一个前沿领域的最大挑战之一。所以如果我们想要在许多不同空间进行某种结构化探索,你对多样性控制得越多就越好。所有这些,我觉得都非常兼容,并且实际上对进一步的模型训练会非常有用。
And people keep bringing up, I guess, exploration as one of the biggest challenges to the next frontier. And so if we want to do some kind of structured exploration over many different spaces, the more control you have over the kind of diversity, the better. So all of that, I feel like, is very compatible and would actually be really useful for further model training.
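One way this could plug into RL-style pipelines is a best-of-n loop where exploration comes from distinct seeds rather than temperature. Everything here (`generate`, `reward`, the seed format) is a placeholder sketch, not anything described in the conversation:

```python
import random

def best_of_n_with_seeds(generate, reward, question, n, rng=None):
    # Structured exploration: draw n distinct random seed prefixes,
    # generate one rollout per seed, and keep the highest-reward one.
    # `generate` and `reward` are hypothetical stand-ins, not a real RL API.
    rng = rng or random.Random(0)
    candidates = []
    for _ in range(n):
        seed = f"<seed:{rng.randrange(10**6)}> "
        candidates.append(generate(seed + question))
    return max(candidates, key=reward)

# Toy usage: an identity "model" and string length as a stand-in reward.
print(best_of_n_with_seeds(lambda s: s, len, "prove the lemma", 3))
```

The idea matches the point above: if the base model's sampling is more diverse and controllable, the candidate pool an RL step selects from improves automatically.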
我已经多次参考这篇博客文章。这里面提到了这些局限性和机遇,实际上我们已经讨论了机遇部分。这涉及到记忆沉淀(memorization sinks)、种子条件化(就像前置随机文本)和多标记学习等概念。你已经暗示了一些未来的研究方向,但能否更广泛地谈谈你看到的这些不同努力的发展方向。
So I've referred back to this blog post a few times. And so there are these limitations and opportunities, and we've actually covered the opportunities. It's this idea of memorization sinks, seed conditioning, which is like prepending the random text, and multi-token learning. You've hinted at some future directions for your research, but talk a little more broadly about the direction that you see these various efforts going.
是的,我认为我们真的很想扩大这些工作的规模,既是为了我们自己,也是为了向人们证明这些是值得投资用于训练未来模型的干预措施。特别是这个适应性部分,我经常思考人们如何保持模型的新鲜度?就像现在,每个人都在每年训练新模型,所以也许我们还没有认真思考过这个问题,但总有一天我们会无法继续这样做。我认为这里的稻草人式回答就是做检索,这样你就能获得最新的事实,一切都没问题。我们在上次会议上还有另一篇口头报告论文,展示了模型在使用上下文和覆盖其参数化信息方面并不会那么出色。
Yeah, we really want to scale up a lot of these things, both for ourselves and to convince people that these are the interventions that are worth investing in for training their future models. And especially this adaptability part, I often think, how do people keep their models fresh? Like right now we are at a point where everyone is training new models every year, so maybe we haven't thought about this issue, but at some point we're kind of not going to be able to do that. And I think the strawman answer here is just do retrieval, like that gets you the latest facts and everything's okay. And we had this other paper that was an oral at the previous conference where we kind of show that models are not gonna be that good at using the context and overriding their parametric information.
所以这并不是保持模型更新的神奇解决方案。我认为我们正在尝试探索的方面真的很令人兴奋:我们如何真正制造出易于更新的模型?什么是分解或解耦这些应该保留的方面的正确方式?什么应该被更新?所以我对这个方面感到非常兴奋。
So this is not really a magic solution to keeping the models updated. I feel like the really interesting aspect here is: how do we actually make models that are easy to update? Like, what's the right way to decompose or disentangle these aspects of what should be preserved and what should be updated? So I'm really excited about that aspect.
我也非常兴奋能够扩展这些想法,以提升多样性和创造性生成。我们都在关注定理证明等领域,试图构建能够帮助数学家的系统。每次与所有想要使用大语言模型的人交谈时,我总是不禁思考:如何确保模型能够真正搜索并发现创造性的事物?例如,寻找反例。我们可以使用大语言模型,它们会很出色,但如何确保它们进行那种结构化的探索,比如以某种方式找到创造性的反例?
And I'm also really excited about scaling up these ideas for improving diversity and creative generations. We're all looking at things like theorem proving and trying to build systems that help mathematicians. And every time I talk to all of these people who want to use LLMs, I keep coming back to: how do we make sure models can actually search and find creative things? Like, for example, finding counterexamples. We can use LLMs, and they would be great, but how do we make sure they do that kind of structured exploration, like finding creative counterexamples in some way?
因此,我非常期待将这些想法应用于各种不同的领域,我认为这些是目前人们最感兴趣的应用,能够真正推动科学各个概念的前沿。这些就是我们正在思考的事情。当然,所有这一切,我想我们稍微谈到了试图理解这些模型的事实。我认为我们的许多实验也试图不仅仅是展示一个在该领域取得高分数的最新方法,而是真正希望提供一些经得起时间考验的目标,展示超越特定数据集或特定训练决策的基本特性的新理解。这将有助于指导我们尝试训练的下一代模型。
So I would be very excited about using these ideas in all of these different applications that I think are top of mind for people right now, the interesting applications that can really push the frontiers across various notions of science. So those are kind of the things that we're thinking about. And of course, with all of this, I think we spoke a little bit about trying to understand these models. A lot of our experiments are also trying less to be, here is a state-of-the-art method that gets high numbers on this, and really hoping to give some insights that stand the test of time, in terms of showing new understanding about the fundamental properties that go beyond specific data sets or specific training decisions that you might make. And that would help in guiding the next generation of models that we try to train.
因此,我们也在继续沿着这个方向努力。
So we are continuing our efforts along that axis as well.
好的,Aditi,非常感谢你分享一些关于你研究的内容。这是非常有趣的东西。
Well, Aditi, thank you so much for sharing a bit about your research. It's very interesting stuff.
是的。非常感谢。我特别想强调一点,我希望有更多人思考如何理解这些模型,因为我们真的需要大量工作在这方面。我认为,你知道,这更像是一个科学问题,这些是复杂系统,我们真的需要像科学家一样思考,设置正确的对照实验,提出正式的假设,进行测试,这里有很大的机会。我认为这将真正解锁我们如何进一步推动模型,尤其是当人们开始看到仅仅推动当前范式可能带来的收益递减时。
Yeah. Thank you so much. And I especially wanted to plug a little bit that I want more people thinking about trying to understand these models, because we really need a lot of work there. And I think, you know, it's more of a scientific question, where these are complex systems and we really have to think a bit like a scientist: set up the right controlled experiments, make formal hypotheses, test them out. And there's a huge opportunity here. And I think that will really unlock how we further push models, especially as people are starting to see maybe diminishing gains from just pushing our current paradigm.
所以,我认为那里有很多机会。因此,希望我们能有更大的社区关注这些方面。
And so I think there's a lot of opportunity there. And hopefully we can have a bigger community looking at these aspects.
太棒了。非常好。非常好。嗯,非常感谢你。
Awesome. Very good. Very good. Well, thank you very much.
谢谢。
Thank you.