Latent Space: The AI Engineer Podcast - Cline:不降低成本的开放源代码编程助手

Cline:不降低成本的开放源代码编程助手

Cline: the open source coding agent that doesn't cut costs

本集简介

来自Cline的Saoud Rizwan和Pash与我们一同探讨了为何快速应用模型遭遇了苦涩教训,他们如何开创了编程中的“计划+执行”范式,以及非技术人员为何使用集成开发环境(IDE)进行营销和生成幻灯片。

完整文章:https://www.latent.space/p/cline
X平台:https://x.com/latentspacepod

章节:
00:00 - 开场介绍
01:35 - 计划与执行范式
05:37 - 模型评估与Cline早期开发
08:14 - Cline在编程之外的用例
09:09 - 为何Cline是VS Code扩展而非分支版本
12:07 - 编程代理的经济价值
16:07 - 面向MCP的早期采用
19:35 - 本地与远程MCP服务器对比
22:10 - Anthropic在MCP注册中的角色
22:49 - 最受欢迎的MCP及其用例
25:26 - MCP货币化的挑战与未来
27:32 - MCP的安全与信任问题
28:56 - 没有MCP的替代历史
29:43 - 编程代理的市场定位与IDE集成矩阵
32:57 - 编程代理中的可见性与自主性
35:21 - 编程任务复杂性定义的演变
38:16 - Cline的分支与开源遗憾
40:07 - 代理设计中简单性与复杂性的权衡
46:33 - 快速应用如何遭遇苦涩教训
49:12 - Cline的商业模式与自带API密钥策略
54:18 - 与OpenRouter及企业基础设施的集成
55:32 - 模型成本下降的影响
57:48 - 后台代理与多代理系统
1:00:42 - 愿景与多模态技术
1:01:07 - 上下文工程现状
1:07:37 - 编程代理中的记忆系统
1:10:14 - 跨代理工具的规则文件标准化
1:11:16 - Cline的个性与拟人化设计
1:12:55 - Cline的招聘与团队文化

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

大家好,欢迎收听《Latent Space》播客。我是Decibel的合伙人兼首席技术官Alessio,今天和我一起主持的是Smol AI的创始人swyx。

Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and I'm joined by my cohost, swyx, founder of Smol AI.

Speaker 1

欢迎欢迎。今天录音棚里有两位来自Cline的嘉宾,Pash和Saoud。没错,是的。

Welcome. Welcome. And today, I'm in the studio with two nice guests from Cline, Pash and Saoud. That's right. Yes.

Speaker 1

你说对了

You nailed

Speaker 0

。我们开始吧。

it. Let's go.

Speaker 1

我觉得Cline有一定粉丝基础,但并非人尽皆知。或许我们该先做个开场介绍,比如由你们来定义Cline是什么,之后也可以再调整。

I think that Cline has a decent fan base, but not everyone has heard of it. Maybe we should just get, like, an upfront definition of what Cline is, maybe from you, and then, like, you can modify that as well.

Speaker 2

没错。Cline是个开源编码助手,目前是VS Code扩展,但即将登陆JetBrains、NeoVim和命令行界面。你给Cline任务,它就会自动执行。它能接管你的终端、编辑器、浏览器,连接各类MCP服务,实质上接管整个开发工作流。

Yeah. Cline's an open source coding agent. It's a VS Code extension right now, but it's coming to JetBrains and NeoVim and a CLI. You give Cline a task and he just goes off and does it. He can take over your terminal, your editor, your browser, connect to all sorts of MCP services, and essentially take over your entire developer workflow.

Speaker 2

它最终会成为你完成全部工作的统一入口。

And it becomes this point of contact for you to get your entire job done essentially.

Speaker 1

太棒了。Pash,你会如何调整定义?或者你认为Cline还有哪些值得关注的价值点?

Beautiful. Pash, what would you modify, or what's another way to look at Cline that you think is also valuable? Yeah.

Speaker 3

我认为Cline是面向所有开源智能体的基础设施层,是构建智能体生态的底层架构。Cline是完全模块化的系统——这是我们的设计理念。我们正努力增强其模块化特性,以便开发者能基于它构建任何智能体。

I think Cline is the kind of infrastructure layer for agents, for all open source agents, people building on top of this, like, agentic infrastructure. Cline is a fully modular system. That's the way we envision it. We're trying to make it more modularized so that you can build any agents on top of it. Yep.

Speaker 3

通过我们即将发布的命令行工具和SDK,你将能构建适用于任何领域的全智能系统,而不仅限于编程。

So with the CLI and with the SDK that we're rolling out, you're gonna be able to build fully agentic systems for anything, not just coding.

Speaker 1

哦,好吧。这是我对Cline的不同看法。那么好吧。嗯,我们先谈谈编码,然后再讨论更广泛的内容。你们和Aider也很相似。

Oh, okay. That is a different perspective on Cline than I had. So okay. Well, let's talk about coding first and then we'll talk about the broader stuff. You also are similar to Aider.

Speaker 1

我不知道谁先提出的,但你经常使用计划和行动的模式。我不确定这个有多为人所知。对我来说,我算是比较了解的。但也许你们想解释一下,为什么不同的事情需要不同的模型。

I don't know who comes first in that you use the plan and act paradigm quite a bit. I'm not sure how well known this is. Like, to me, I'm relatively up to speed on it. But again, like, maybe you guys wanna explain, like, why different models for different things.

Speaker 2

是的。我想先为提出Plan/Act邀功。好吧。Cline是第一个提出这种开发者可以参与的双模式概念的。就像和我们的用户交流,看到他们如何使用Cline时,它最初只是一个输入框。

Yeah. I wanna take the cred for coming up with Plan/Act first. Okay. Cline was the first to sort of come up with this concept of having two modes for the developer to engage with. So just in, like, talking to our users and seeing how they used Cline, where it was really only an input field.

Speaker 2

我们发现很多用户一开始会与代理合作,创建一个Markdown文件,要求代理制定某种架构或计划,以便代理继续执行。我们发现用户自然而然地形成了这种工作流程。于是我们思考如何将其转化为产品功能,让新用户更直观地理解,而不必自己摸索这种模式,并能在代理于这些不同模式间切换时为其设置护栏。例如,在计划模式下,代理被引导进行更多探索,阅读更多文件,获取理解并填充上下文,以便为用户的任务制定行动计划。

We found a lot of them starting off working with the agent, coming up with a markdown file where they ask the agent to put together some kind of architecture or plan for the work that they want the agent to go on to do. And so we would find that people just came up with this workflow for themselves organically. And so we thought about how we might translate that into the product, so it's a little bit more intuitive for new users who don't have to kind of pick up that pattern for themselves, and can kind of direct and put in guardrails for the agent to adhere to these different modes whenever the user switches between them. So for example, in plan mode, the agent's directed to be more exploratory, read more files, get sort of an understanding, and fill up its context with any sort of relevant information to come up with a plan of attack for whatever the task is the user wants to accomplish.

Speaker 2

而当他们切换到行动模式时,代理会收到指令,查看计划并开始执行,运行命令,编辑文件。这让与代理合作变得更轻松,尤其是在像Cline这样的工具中,很多时候用户的互动主要在计划模式下进行,有很多来回交流,从开发者那里提取上下文,比如问问题:你想要主题看起来是什么样?网站上需要哪些页面?

And then when they switch to act mode, that's when the agent gets this directive to look at the plan and start executing on it, running commands, editing files. And it just makes working with agents a little bit easier, especially with something like Cline where a lot of the time people's engagement with it is mostly in the plan mode, where there's a lot of back and forth. There's a lot of extracting context from the developer, you know, asking questions. You know, what do you want the theme to look like? What pages do you want on the website?

Speaker 2

试图提取用户可能没有在初始提示中提供的任何信息。一旦用户觉得可以放手让代理去工作了,他们切换到行动模式,勾选自动批准,然后就可以放松一下,喝杯咖啡什么的,让代理完成任务。所以大部分互动发生在计划模式,而行动模式中,他们只是稍微关注一下进展,主要是在代理偏离方向时进行纠正。

Just trying to extract any sort of information that the user might not have put into their initial prompt. Once the user feels like, okay, I'm ready to let the agent go off and work on this, they switch to act mode, check auto approve, and just kick their feet up and, you know, get coffee or whatever and let the agent get the job done. So yeah. Most of the engagement happens in the plan mode. And then act mode, they kinda just have a peripheral vision into what's going on mostly to course correct whenever it goes in the wrong direction.

Speaker 2

但大多数时候,他们可以依赖模型来完成工作。

But for the most part, they can just rely on the model to get it done.
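The plan/act split described above can be sketched as a mode-dependent system prompt plus a gate on which tools the agent may call. Below is a minimal illustration in Python; the prompt text and tool names are invented for the example and are not Cline's actual implementation.

```python
# Minimal sketch of the plan/act two-mode pattern described above.
# The prompts and tool names are hypothetical, not Cline's real ones.

PLAN_PROMPT = (
    "You are in PLAN mode. Explore the codebase, read files, and ask the "
    "user clarifying questions. Produce a step-by-step plan. Do not edit "
    "files or run commands."
)
ACT_PROMPT = (
    "You are in ACT mode. Execute the agreed plan: run commands and edit "
    "files until the task is complete."
)

# Read-only tools are always available; mutating tools are gated behind
# the user flipping the switch to act mode (optionally with auto-approve).
READ_TOOLS = {"read_file", "list_files", "ask_followup_question"}
WRITE_TOOLS = {"edit_file", "run_command"}

def system_prompt(mode: str) -> str:
    """Return the mode-specific directive prepended to the conversation."""
    return PLAN_PROMPT if mode == "plan" else ACT_PROMPT

def allowed_tools(mode: str) -> set[str]:
    """Plan mode is exploratory and read-only; act mode can also mutate."""
    return READ_TOOLS if mode == "plan" else READ_TOOLS | WRITE_TOOLS
```

When the user toggles modes mid-conversation, only the directive and tool set change; the context accumulated in plan mode (files read, questions answered) carries over, which is what makes the planning phase pay off in act mode.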

Speaker 0

这是产品最初的形式吗?还是你们通过迭代才达到计划和行动的模式?这是公司最初的想法吗?还是你们还探索了其他方向?

And was this the first shape of the product or did you get to the plan act iteratively? And maybe was this the first idea of the company itself or were you exploring other stuff?

Speaker 2

尤其是在Cline的早期,我们做了很多实验,与用户交流,看看哪些工作流程对他们有用,并将这些转化为产品功能。计划和行动模式实际上是我们与Discord上的用户交流的副产品,询问他们什么对他们有用,我们可以在UI中添加哪些快捷提示。计划和行动模式本质上就是一种快捷方式,让用户不必自己输入“我希望你问我问题并制定计划”这样的内容。在其他工具中,你可能需要明确要求代理在编辑文件之前先制定计划。

Especially in the early days of Cline, it was a lot of experimenting and talking to our users, seeing what kind of workflows came up that they found useful for them, and translating them into the product. So plan and act was really a byproduct of just talking to people in our Discord, just asking them what would be useful to them, what kind of prompt shortcuts we could add into the UI. I mean, that's really all plan and act mode is. It's essentially a shortcut for the user to save them the trouble of having to type out, you know, "I want you to ask me questions and put together a plan." In some of the other tools, you'd have to be explicit about, "I want you to come up with a plan before, you know, acting on it or editing files."

Speaker 2

将其整合到UI中,省去了用户自己输入的麻烦。

Incorporating that into the UI just saves the user the trouble of having to type that out themselves.

Speaker 0

但你一开始就作为一款编码产品起步。然后这部分是关于,我们如何从根本上提升用户体验?

But you started right away as a coding product. And then this was part of, okay, how do we get better UX basically?

Speaker 2

没错。

Exactly.

Speaker 0

是的。当时的模型评估情况如何?我确信‘我们需要计划并行动’这部分是因为模型可能无法端到端地完成任务。当你开始研究

Yeah. What was the model evaluation at the time? So I'm sure part of the "we need plan and act" is, like, maybe the models were not able to do it end to end. When you started working

Speaker 1

这个

on that

Speaker 0

范式时,模型有哪些局限性?当时最好的模型是什么?之后又是如何演变的?

paradigm, where were the model limitations? What were the best models? And then how has that evolved over time?

Speaker 2

是的。当我刚开始研究Cline时,大约是Claude 3.5 Sonnet发布十天后。我在阅读Anthropic的模型卡片附录时,看到关于代理编码的部分,提到它如何更擅长逐步完成任务。他们谈到运行一个内部测试,让模型在可以调用工具的循环中运行。很明显,他们内部有某种版本或应用,与当时其他产品截然不同。

Yeah. When I first started working on Cline, this was, I think, ten days after Claude 3.5 Sonnet came out. I was reading Anthropic's model card addendum, and there was this section about agentic coding and how it was so much better at this step-by-step accomplishing of tasks. And they talked about running this internal test where they let the model run in this loop where it could call tools. And it was obvious to me that, okay, they have some version, they have some application internally that's really different from how the other things at the time were.

Speaker 2

像Copilot、Cursor和Aider这样的产品,它们并不具备这种逐步推理和完成任务的能力。它们更适合问答和一次性提示的范式。当时是2024年6月,Anthropic正在举办Build with Claude黑客马拉松。所以我想,这是一个非常酷的新能力,之前的模型都无法做到。

Things like Copilot and Cursor and Aider. They didn't do this sort of, like, step-by-step reasoning and accomplishing of tasks. They were more suited for the Q&A and one-shot prompting paradigm. At the time, I think it was June 2024, Anthropic was doing a Build with Claude hackathon. So I thought, okay, this is a really cool new capability that none of the models have really been capable of doing before.

Speaker 2

我认为能够从零开始构建并利用模型在那个时间点的改进细节非常重要。例如,Claude 3.5在一个叫“大海捞针”的测试中表现很好,如果它的上下文窗口中有大量内容,比如90%的200k上下文窗口被填满,它非常擅长从中提取细节。而在Claude 3.5之前,模型会更关注上下文开头或结尾的内容。所以,利用它更擅长理解长上下文和逐步完成任务的特点,从零开始构建产品,让我创造出了一些与众不同的东西。构建第一版产品的核心原则之一就是保持简单。

And I think being able to create something from the ground up and take advantage of the nuances of how much the models had improved at that point in time mattered. So for example, Claude 3.5 was also really good at this test called needle in a haystack, where if it has a lot of context in its context window, for example, 90% of its 200k context window is filled up, it's really good at picking out granular details in that context. Whereas before Claude 3.5, models would really pay a lot more attention to whatever was at the beginning or the end of the context. So just taking advantage of the nuances of it being better at understanding longer context and it being better at step-by-step accomplishing tasks, and building a product from the ground up, kind of let me create something that felt a little bit different than anything else that was around at the time. And one of the core principles in building the first version of the product was just keep it really simple.

Speaker 2

让开发者感觉可以随心所欲地使用它。所以尽量让它通用化,让他们自己设计适合的工作流程。人们用它做各种与编码无关的事情。我们的产品营销人员Nick Baumann,他用它连接Reddit MCP服务器,抓取内容,连接到X的MCP服务器,然后发布推文。尽管它是一个VS Code扩展和编码代理,MCP让它像一个全能代理,可以连接到任何服务。

Just let the developer feel like they can kind of use it however they want. So make it as general as possible and kinda let them come up with whatever workflows work well for them. People use it for all sorts of things outside of coding. Our product marketing guy, Nick Baumann, he uses it to connect to a Reddit MCP server, scrape content, connect it to an X MCP server, and post tweets, essentially. Even though it's a VS Code extension and a coding agent, MCP kind of lets it function as this everything agent where it can connect to, you know, whatever services and things like that.

Speaker 2

这实际上是产品中采用非常通用的提示所带来的副作用,而不是将其限制在编码任务上。

And that's really a side effect of of having very general prompts just in the product and not sort of limiting it to just coding tasks.

Speaker 3

我在阿姆斯特丹参加一个会议时,整个演示文稿都是用一个叫Slidev的JavaScript库制作的。我向Cline提供了我的风格指南,写了一份详细的Cline规则文档说明我想在Slidev中如何设计演示风格。我还用另一个叫Limitless的应用录下我的想法,把语音转成文字,记录了我对这次演讲内容的即兴思考。

I was at a conference in Amsterdam and I built my whole presentation, my whole slide deck, using this JavaScript library called Slidev. And I just asked Cline, like, hey, here's my style guidelines. I wrote a big Cline rules document explaining how I wanna style the presentation in Slidev. I told Cline the agenda, which I kind of recorded using this other app called Limitless, transcribing my voice into text, just stream of consciousness about what I was gonna talk about for this conference, for my talk.

Speaker 3

然后Cline就直接帮我完成了整个幻灯片。你看,Cline真的无所不能。用JavaScript?没错,就是JavaScript。

And Cline just went in and built the whole deck for me. So, you know, Cline really can do anything. In JavaScript? In JavaScript. Yeah.
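For context, Slidev decks are plain Markdown files, which is what makes them a natural target for a coding agent: slides are separated by `---`, with optional frontmatter for theme and layout. A minimal hand-written example of the format (content invented for illustration):

```markdown
---
theme: default
---

# Conference Talk

Opening slide

---

## Agenda

- First topic
- Second topic
```

Because the deck is just text, the agent can generate it, run the Slidev dev server, and iterate on styling the same way it iterates on code.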

Speaker 1

嗯,所以这算是编程方面的应用场景。

Yeah. So it's it's kind of a coding use case.

Speaker 3

算是编程应用,但最终产出是演示文稿。它还能运行脚本,比如帮你做数据分析,然后把结果整合进幻灯片。

It was kind of a coding use case, but then making a presentation out of it. But it can also, like, run scripts, like do, like, data analysis for you and then put that into a deck.

Speaker 1

好吧。你

Okay. You

Speaker 3

知道,就是把各种功能结合起来用。

know, kind of combine things.

Speaker 2

作为VS Code扩展,它赋予了独特能力——可以访问用户操作系统和终端,读写文件。扩展形式极大降低了开发者的使用门槛,他们无需安装新应用或走繁琐的内部审批流程。应用商店为我们提供了绝佳的分发渠道,特别适合需要桌面文件访问、终端操作、代码编辑的功能。还能利用VS Code优秀的UI展示差异对比,比如文件修改前后的变化。

And being a VS Code extension kind of gives you these interesting capabilities where you have access to the user's OS, you have access to the user's terminal, and can read and edit files. Being an extension, it reduces a lot of the onboarding friction for a lot of developers, so they don't have to install a whole new application or jump through whatever internal hoops to try to get something approved for use within their organizations. So the marketplace gave us a ton of really great distribution, and it's sort of the perfect conduit for something that needs access to files on your desktop, to be able to run things on your terminal, to be able to edit code, and to take advantage of VS Code's really nice UI and show you, like, diff views, for example, before and after it makes changes to files.

Speaker 1

你们没想过分叉VS Code吗?要知道,说不定现在就能坐拥30亿美元了。

Weren't you tempted to fork VS Code though? I mean, you know, you could be sitting on $3,000,000,000 right now.

Speaker 2

其实不会。我反而同情那些分叉VS Code的人,因为微软让维护分叉变得异常困难。需要投入大量资源才能跟上VS Code的更新节奏。

Well, no. I actually, like, pity anybody that has to fork VS Code, because Microsoft makes it notoriously difficult to maintain these forks. So a lot of resources and effort go into just maintaining, keeping your fork up to date with all the updates that VS Code is making.

Speaker 1

明白了。是因为他们有私有仓库需要同步吗?还是说...

I see. Is that because they they have a private repo and they need to just sync it? There's no like

Speaker 2

没错。确实如此。而且还有

Exactly. Exactly. And there's

Speaker 1

这是那种开源项目之一。

It's one of those kinds of open source projects.

Speaker 2

对。而且VS Code发展如此之快,我相信他们会遇到各种各样的问题,不仅仅是合并冲突这类事情,后端也是如此。他们一直在改进和变更,比如他们的VS Marketplace API,而不得不逆向工程这些,确保你的用户在使用类似功能时不会遇到问题,这对任何维护VS Code分支的人来说肯定是个巨大的麻烦。此外,作为扩展也给了我们更多的分发渠道。你不必非得选择我们或其他人。

Right. And VS Code's moving so quickly that I'm sure they run into all sorts of issues, not just in, you know, things like merge conflicts, but also in the back end. They're always making improvements and changes to, for example, their VS Code Marketplace API, and to have to, like, reverse engineer that and figure out how to make sure that your users don't run into issues using things like that is, I'm sure, a huge headache for anybody that has to maintain a VS Code fork. And being an extension also gives us a lot more distribution. It's not that you have to use us or somebody else.

Speaker 2

你可以在Cursor、Windsurf或VS Code中使用Cline。我认为Cline与所有这些工具都能很好地互补,因为我们有机会深入了解并与用户紧密合作,找出最佳的智能体体验。而Cursor、Windsurf和Copilot则必须考虑整个开发者体验,包括内联代码编辑、问答功能,以及编写代码时的各种附加功能。我们只需专注于我认为是未来编程的方向,即这种智能体范式。随着模型不断进步,人们会越来越多地使用自然语言与智能体协作,越来越少地陷入代码编辑和自动补全的细节中。

You can use Cline in Cursor or in Windsurf or in VS Code. And I think Cline complements all these things really well, in that we get the opportunity to figure out, and work really closely with our users to figure out, what the best agentic experience is. Whereas, you know, Cursor and Windsurf and Copilot have to think about the entire developer experience: the inline code edits, the Q&A, sort of all the other bells and whistles that go into writing code. We get to just focus on what I think is the future of programming, which is this agentic paradigm. And as the models get better, people are gonna find themselves using natural language, working with an agent more and more, and being in the weeds editing code and tab-autocompleting less and less.

Speaker 3

是啊。想象一下要投入多少资源来维护一个VS Code的分支,而我们只需专注于核心的智能体循环,优化新出现的不同模型系列并支持它们。你知道,所有这些工作都需要大量投入,在旁维护一个分支对我们来说会是个巨大的干扰,我认为这真的不值得。

Yeah. Just imagine how many resources you would have to spend maintaining a fork of VS Code, where instead we can just stay focused on the core agentic loop, optimizing for different model families as they come out and supporting them. You know, there's so much work that goes into all this that maintaining a fork on the side would just be such a massive distraction for us that I don't think it's really worth it.

Speaker 0

听你说话时,我感觉你在区分我们想成为未来编程的最佳选择,同时这对非编程领域也很棒。这是最近才出现的现象吗?比如你看到越来越多人使用MCP服务器做技术性较低的事情,这是个有趣的领域?还是你觉得编程仍然是当前最具经济价值的销售方向?我很好奇你能否多分享一些。

I feel like when you talk, I hear this distinction between "we wanna be the best thing for the future of programming," and then also "this is also great for non-programming." Is this something that is recent for you, where, like, you're seeing more and more people use the MCP servers especially to do less technical things, and that's an interesting area? Or do you feel like programming is still, like, the highest economic value thing to be selling today? I'm curious if you can share more.

Speaker 2

就经济价值而言,编程绝对是语言模型目前成本效益最高的领域。我们看到很多模型实验室意识到这一点,OpenAI和Anthropic比一年前更重视编程了。虽然MCP生态系统在增长,很多人用它做编程以外的事情,但主要用例还是开发者工作。几周前Hacker News上有篇文章,讲一个开发者部署了一个有问题的Cloudflare Worker,用Sentry MCP服务器拉取堆栈跟踪,然后让Cline根据堆栈信息修复bug,连接GitHub MCP服务器关闭问题并部署修复到Cloudflare,全程在Cline里用自然语言完成,无需离开VS Code。它整合了所有这些服务,否则开发者就得承受认知负荷,自己摸索并离开开发环境去做智能体只需用自然语言就能在后台完成的事。

In terms of economic value, programming is definitely the highest benefit-to-cost area for language models right now. And I think we're seeing a lot of model labs recognize that: OpenAI and Anthropic are taking coding a lot more seriously than I think they did a year ago. What we've seen is, while yes, the MCP ecosystem is growing and a lot of people are using it for things outside of programming, the majority use case is mostly developer work. There was an article on Hacker News a couple weeks ago about how a developer deployed a buggy Cloudflare Worker, used a Sentry MCP server to pull a stack trace, asked Cline to fix the bug using the stack trace information, connected to a GitHub MCP server to close the issue, and deployed the fix to Cloudflare, all right within Cline using natural language, never having to leave VS Code. And it interacts with all these services that otherwise the developer would have had the cognitive overload of having to, you know, figure out for himself, leaving his developer environment to essentially do what the agent could've done in the background just using natural language.

Speaker 2

所以我认为这就是未来的方向:应用层连接到你可能需要手动交互的各种服务,成为你用自然语言交互的单一接触点。你会越来越少地陷入代码细节,越来越多地从高层次理解智能体的行为并能够纠正方向。我认为这是我们能在这个极其嘈杂的领域脱颖而出的重要原因。很多人对未来的方向有宏大构想,但我们一直疯狂专注于当下对人们有用的东西。很大一部分是理解这些模型的局限性,它们不擅长什么,并向终端开发者充分展示这些信息,让他们知道如何纠正方向,在出错时如何反馈。

So I think that's kind of where things are headed: the application layer being connected to all the different services that you might've had to interact with manually before, and it being this single point of contact for you to interact with using natural language. And you being less and less in the code, and more and more having a high-level understanding of what the agent's doing and being able to course correct. I think that's another part of what's important to us and what's allowed us to cut through the noise in this incredibly noisy space. I think a lot of people have really grand ideas for, you know, where things are heading, but we've been really maniacal about what's useful to people today. And a large part of that is understanding the limitations of these models, what they're not so good at, and giving enough insight into those sorts of things to the end developers so that they know how to course correct, they know how to give feedback when things don't go right.

Speaker 2

例如,Cline非常擅长让你深入了解输入模型的提示、错误发生的原因、模型调用的工具。我们尽量在每个步骤中展示模型执行任务时的具体行为。这样当事情出错或开始偏离方向时,你可以给予反馈并纠正。我认为纠正方向对高效完成工作极其重要,比让后台智能体工作几小时后回来发现完全错了、没达到预期然后不得不重试几次要快得多。

So for example, Cline is really good about, you know, giving you a lot of insight into the prompts going into the model, into when there's an error and why the error happened, into the tools that the model's calling. We try to give as much insight as possible into what exactly the model is doing at each step in accomplishing a task. So when things go wrong or it starts to go off in the wrong direction, you can, you know, give it feedback and course correct. I think the course correcting part is so incredibly important in getting work done, and much more quickly than if you were to give a background agent work, come back a couple hours later, and it's just totally wrong, it didn't do anything that you expected it to do, and you kinda have to retry a couple times before it gets it right.

Speaker 0

我觉得Sentry的例子很棒,因为在某种程度上MCP就像在蚕食产品本身。我开始使用Sentry MCP和他们的Seer(问题解决智能体),起初是免费的。我在Sentry中启用它,用起来很棒。

I think the Sentry example is great because I feel like, in a way, the MCPs are, like, cannibalizing the products themselves. Like, I started using the Sentry MCP and then Sentry's Seer, which is, like, their issue resolution agent, and it was free at the start. So I turned it on in Sentry. I was using it. It's great.

Speaker 0

然后他们开始收费了,我就想,我可以免费使用MCP啊。把数据输入我的编程助手,它就能免费修复问题并返回结果。我特别好奇,尤其在编程领域,你可以形成这种闭环——RDS MCP将成为付费AI服务,这样你就能接入它。而Cline会不会推出类似MCP订阅服务?就像完全把这些成本碎片化?

And then they started charging money for it, and I'm like, I can use the MCP for free. Yeah. To put the data in my coding agent, and it's gonna fix the issue for free and send it back. I'm curious to see, especially in coding, where you can kinda have this closed loop where, okay, the RDS MCP is gonna become the paid AI offering so that then you can plug it in. And is Cline gonna have kinda like an MCP subscription where Totally. you're kinda fractionalizing all these costs?

Speaker 0

是啊。在我看来,他们现在的架构方式不太合理。

Yeah. To me, today, it feels like it doesn't make a lot of sense the way they're structured.

Speaker 3

噢没错。我们很早就是MCP的坚定支持者,从一开始就看涨它的前景。

Oh, yeah. We were very early on. We've been bullish on MCP from the very beginning. And

Speaker 2

你们是首发合作伙伴吗?我记得和MCP有合作关系的。

Were you a launch partner? A partnership with MCP, I think.

Speaker 1

抱歉打断一下。

Sorry to interrupt.

Speaker 3

没事,请说。

Yeah. No worries.

Speaker 2

我记得Anthropic刚推出MCP时,他们大张旗鼓宣传这个开源的新协议,但当时没人真正理解其意义。我花了不少时间研读文档才明白运作原理和重要性。他们押注开源社区能共建生态来推动发展,所以我想尽力协助——有很长一段时间,Cline的系统提示大部分内容都是“MCP如何运作”。

I think when Anthropic first launched MCP and they made this big announcement about this new protocol that they'd been working on and were open sourcing, nobody really understood what it meant. And it took me some time really digging into their documentation to understand how it works and why it's important. I think they kind of took this bet on the open source community contributing to an ecosystem in order for it to really take off. And so I wanted to try to help with that as much as possible. So for a long time, most of Cline's system prompt was: how does MCP work?

Speaker 2

因为当时太新了,模型对它一无所知。系统提示里还讲了如何搭建MCP服务器,这样如果开发者想做这类东西,模型就能很擅长。我认为Cline对MCP生态后来的发展功不可没,它让开发者更了解底层机制——这对使用都至关重要,更别说开发了。

Because it was so new at the time that, you know, the models didn't know anything about it. And also how to make MCP servers, so that if the developer wanted to, you know, make something like that, it'd be really good at it. And I'd like to think that, you know, Cline had something to do with how much the MCP ecosystem has grown since then, and with getting developers more insight and awareness about how it works under the hood, which I think is incredibly important in using it, let alone developing these things.

Speaker 2

后来我们在Cline上线MCP时,Discord用户都在努力理解它。看到Cline从零搭建MCP服务器后,他们终于串联起来了:原来底层是这样运作的,这就是价值所在,代理程序是这么连接工具服务和API的——这省去了我很多亲自摸索的麻烦。

And so, yeah, when we launched MCP in Cline, I remember our Discord users just trying to wrap their heads around it. And in seeing Cline build MCP servers from the ground up, they started to connect the dots: okay, this is how it works under the hood, this is why it's useful, this is how agents connect to these tools and services and these APIs. And it sort of saved me a lot of the trouble of having to do this sort of stuff myself.
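For a sense of what "under the hood" means here: MCP is JSON-RPC 2.0, with the client sending requests like `tools/list` (what can you do?) and `tools/call` (do it). The toy dispatcher below, in plain Python, mimics that message shape for illustration only; the `add` tool is invented, and real servers are built with the official MCP SDKs and speak over stdio or HTTP.

```python
import json

# Toy MCP-style tool server: a JSON-RPC 2.0 dispatcher for the two core
# methods, tools/list and tools/call. Illustrative only; real servers are
# built with the official MCP SDKs.

TOOLS = {
    "add": {
        "description": "Add two numbers",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        },
        "fn": lambda args: args["a"] + args["b"],
    },
}

def handle(request: str) -> str:
    """Dispatch one JSON-RPC request string and return the response string."""
    req = json.loads(request)
    if req["method"] == "tools/list":
        # Advertise each tool's name, description, and input schema.
        result = {
            "tools": [
                {"name": name, "description": t["description"],
                 "inputSchema": t["inputSchema"]}
                for name, t in TOOLS.items()
            ]
        }
    elif req["method"] == "tools/call":
        # Look up the named tool and run it on the supplied arguments.
        tool = TOOLS[req["params"]["name"]]
        value = tool["fn"](req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": str(value)}]}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
```

A real client spawns a local server like this as a subprocess and exchanges these messages over stdio (or over HTTP for a remote server).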

Speaker 3

那真是MCP的早期岁月,大家还在努力消化这个概念。

Those were like the early days of of MCP when people were still trying to wrap their heads around it.

Speaker 2

是啊。

Yeah.

Speaker 3

而且可发现性存在很大问题。今年二月我们推出了MCP市场,用户可以通过一键安装流程,查看链接到GitHub的说明文档,从零开始安装整个MCP服务器并立即运行。正是那时候,随着市场的推出,MCP真正开始腾飞——人们能发现MCP、为市场做贡献。至今我们已上线150多台MCP服务器,头部MCP下载量超过数十万次。

And there was, like, a big problem with discoverability. So back in February, we launched the MCP marketplace, where you could have this one-click install process: Cline would actually go through looking at a readme linked to a GitHub repo, install the whole MCP server from scratch, and just get it running immediately. And I think around that time, with the launch of the marketplace, that's when MCP really started taking off, where people were able to discover MCPs and contribute to the MCP marketplace. We've listed over 150 MCP servers since then. And the top MCPs in our marketplace have, you know, hundreds of thousands of downloads, people using them.

Speaker 3

比如21st.dev开发的Magic MCP服务器就是个典型案例,他们向编码智能体注入美学组件库,让Cline能实现精美UI。其盈利模式是标准API密钥。我们正见证开发者将MCP构建成商业生态——通过Cline的MCP市场等平台分发并实现盈利。

And you know, there are really notable examples. You mentioned how it's kind of eating existing products, but at the same time, we're starting to see this ecosystem evolve where people are monetizing MCPs. A notable example of this is 21st.dev's Magic MCP server, which injects some taste into the coding agent, into the LLM: they have this library of beautiful components and they inject relevant examples so that Cline can go in and implement beautiful UIs. And the way they monetize that is a standard API key. So we're starting to see developers really take MCPs, build them in, use distribution platforms like the MCP marketplace in Cline, and monetize their whole business around that.

Speaker 3

现在这几乎像是在向智能体销售工具,这话题非常有趣。

So now it's like almost like you're selling tools to agents, which is a really interesting topic.

Speaker 0

能在VS Code里实现是因为有终端,可以NPX运行不同服务器。你们考虑过远程MCP托管吗?还是觉得不该涉足?

And you can do that because you're in VS Code, so you have the terminal, so you can NPX-run the different servers. Have you thought about doing remote MCP hosting, or do you feel like that's not something you should take over?

Speaker 3

目前我们尚未自主托管,远程MCP仍处于萌芽期。但我们肯定有兴趣支持并将其列入市场。

Yeah. We haven't really hosted any ourselves. We're looking into it. I think it's all very nascent right now, the remote MCPs. But we're definitely interested in supporting remote MCPs and listing them on our marketplace.

Speaker 2

本地MCP服务器与远程的区别在于:远程多用于连接API,而这只是MCP的一小部分用途。许多MCP帮助你连接电脑上的各类应用程序。比如Unity MCP服务器可直接在VS Code创建3D对象,Ableton MCP服务器能用来创作音乐。

And another part, I think, with local MCP servers and remote MCPs is that most of the remote MCPs are only useful for connecting to different APIs. But that's only a small use case for MCPs. A lot of MCPs help you connect to different applications on your computer. For example, there's a Unity MCP server that helps you create 3D objects right from within VS Code. There's an Ableton MCP server.

Speaker 2

未来不会只有远程MCP服务器,必然是本地与远程共存。远程虽通过OAuth流程简化安装,但整个MCP生态仍处早期阶段。我们正在安全性与开发者便利性之间寻找平衡。

You can, like, make songs using something like Cline or whatever else uses MCPs. We won't see a world where these MCP servers are only hosted remotely. There will always be some mix of local MCP servers and remote MCP servers. I think the remote servers do make the installation process a little bit easier with something like an OAuth flow, just authenticating a little bit, not as painful as having to manage API keys yourself. But for the most part, I think the MCP ecosystem is really in its early days.
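The local/remote split is visible in how servers are declared to a client: a local server is a command the client spawns and talks to over stdio, while a remote one is just an endpoint URL plus auth. A hypothetical settings fragment in that shape (server names, package names, and fields are illustrative, not a real schema):

```json
{
  "mcpServers": {
    "unity-local": {
      "command": "npx",
      "args": ["-y", "some-unity-mcp-server"]
    },
    "tracker-remote": {
      "url": "https://mcp.example.com/mcp",
      "auth": "oauth"
    }
  }
}
```

The local entry requires a runtime on the developer's machine but can touch local applications and files; the remote entry installs with nothing but a URL, which is why remote servers mostly wrap hosted APIs.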

Speaker 2

随着市场契合度显现,越来越多人分享改变工作流程的案例,更多资源将投入生态建设。Anthropic路线图上有大量规划,社区也充满创意。我们的市场尤其让我们洞察改进方向——未来的MCP市场应该是怎样的?

We're still trying to figure out this good balance of security but also convenience for the end developer, so that it's not a pain to have to set these things up. And I think we're still in this very much experimental phase about how useful it is to people. And now that it is seeing this level of market fit, and people are coming out with these sorts of articles and workflows about how it's totally changing their jobs, I think there's gonna be a lot more resources and effort that go into the ecosystem and just building out the protocol, which I think there's a lot of on Anthropic's roadmap. And I think the community in general just has a lot of ideas. And our marketplace in particular has given us insights into some ways we can improve it, things that developers have asked for, where we're kind of thinking about: what does the MCP marketplace of the future look like?

Speaker 2

对我们而言,关键在于:许多用户具有高度安全意识,而不可信的MCP服务器可能很危险。我们正在探索如何建立对安装MCP的信心。目前社区信任度尚不足以吸引企业开发者,这是我们最关注的问题。

And for us, it's gonna be a combination of things. A lot of our users are very security conscious, and there are a lot of ways that MCP servers can be pretty dangerous to use if you don't trust the end developer of these things. And so we're trying to figure out what a future looks like where you have some level of confidence in the MCP servers you're installing. I think right now it's just too early, and it requires a lot of trust in the community that I don't think a lot of, you know, enterprise developers or organizations are quite willing to extend yet. So that's something that's top of mind for us.

Speaker 1

Anthropic与社区之间存在一种有趣的张力。你们内部基本上有一个MCP注册系统对吧?老实说,我觉得应该公开它。我在你们官网上找过但没找到。

There's an interesting tension between Anthropic and the community here. You basically kind of have an MCP registry internally. Right? Honestly, I think you should expose it. I was looking for it on your website and you don't have it.

Speaker 1

目前唯一访问方式就是安装Cline。但还有Smithery等其他平台对吧?不过Anthropic也说过会在某个时间点推出MCP注册表。

Like, the only way to access it is to install Cline. But there are others, like Smithery and all the other guys. Right? But then Anthropic has also said they'll launch an MCP registry at some point.

Speaker 2

某个时间点。

Some point.

Speaker 1

如果Anthropic推出官方版本,他们是不是直接就胜出了?因为你会直接选用他们的服务吗?

If Anthropic launched the official one, would they just win by default? Right? Because, like, would you just use them?

Speaker 2

我觉得会。整个生态圈最终都会围绕他们的标准整合。他们的分发渠道太强大了,而且...

I think so. I think the entire ecosystem will just converge around whatever they do. They just have such good distribution and they're

Speaker 1

毕竟这个是他们首创的。

I mean, they came up with it.

Speaker 2

没错,正是如此。

Yeah. Exactly.

Speaker 1

好的。另外我注意到你们有些下载量很高的MCP模块,我是按安装量排序的。我直接念出来,你们可以随时打断我进行补充。

Cool. And then I I wanted to I noticed that you had some, like, really downloaded MCPs. I was going by most installs. I'm just gonna read it off. You can stop me anytime to comment on them.

Speaker 1

排名第一的是文件系统MCP,很合理。然后是Agent Desk AI的浏览器工具——这个我不太了解。Sequential thinking模块是最初随MCP发布的。

So top is file system MCP. Makes sense. Browser tools from AgentDesk AI. I don't know what that is. Sequential thinking, that one came out with the original MCP release.

Speaker 1

Context7?这个我没听说过。

Context7? I don't know that one.

Speaker 3

那可是个大块头。等等,那是什么?Context7能帮你从任何地方拉取文档,它有个庞大的索引,收录了所有流行库及其文档。好吧。

That's a big one. Wait, what is that? Context7 kinda helps you pull in documentation from anywhere, and it has this big index of all of the popular libraries and documentation for them. Okay.

Speaker 3

你的代理可以用自然语言查询提交请求,搜索任何文档。

And you can your agent can kind of submit like a natural language query and search for any document.

Speaker 1

上面就写着'所有人的文档'。对。而且居然是Upstash做的,这挺反常,因为Upstash通常只做Redis。Get tools最早就是它推出的。

It just says everyone's docs. Yes. Yeah. And apparently Upstash did that, which is also unusual because Upstash normally just does Redis. Git tools, that one came out originally.

Speaker 1

Fetch浏览器用途。浏览器用途,我猜是和浏览器工具竞争吧?下面接着是Playwright。

Fetch, browser use. Browser use, I imagine competes with browser tools. Right? I guess. And then below that Playwright.

Speaker 1

Playwright对吧?所以有很多类似'让我们自动化浏览器操作'的工具,应该是用于调试的。Firecrawl、Puppeteer、Figma。

Playwright. Right? So there's a lot of like, let's automate the browser and do stuff. I assume for debugging. Firecrawl, Puppeteer, Figma.

Speaker 1

给你看个有趣的,Perplexity Research。这是你们的吗?

Here's here's one for you, Perplexity Research. Is that yours?

Speaker 3

算是吧。我分叉了那个项目并列出来了。没错,那是另一个很受欢迎的功能,你可以用它研究

Well, yeah. I forked that one and listed it. But, yeah, that's another very popular one where you can research

Speaker 2

Perplexity上的任何内容。

anything from Perplexity.

Speaker 1

人们想自动化浏览器。我只是想从大家的做法中学点经验。对吧?他们想自动化浏览器操作,想访问git和文件系统。

People wanna automate the browser. I'm just trying to learn lessons from what people are doing. Right? They wanna automate the browser. They wanna access git and the file system.

Speaker 1

他们还想访问文档和搜索功能。你觉得还有什么特别值得注意的吗?

They wanna access docs and search. Anything else that you think like is notable?

Speaker 3

有各种各样的功能,比如Slack MCP,你可以用它发送消息——这其实是我设置的一个工作流,可以自动化Klein中的重复任务。我告诉Klein:'拉取这个PR,使用我已安装的gh命令行工具,在终端中获取PR内容,包括描述、讨论和完整差异,作为一条非交互式命令。收集所有上下文,阅读差异周围的文件,进行审查,然后提问:嘿,你想让我批准这个吗?附带这条评论。'如果我说是,就批准并在Slack中用MCP给团队发消息。或者用它来写东西。

There's all kinds of stuff. There's the Slack MCP where you can send messages, and that's actually one workflow I have set up, where you can automate repetitive tasks in Klein. So I tell Klein: okay, pull down this PR, use the gh command line tool, which I already have installed, to pull the PR in the terminal, get the description of the PR, the discussion on it, and get the full diff, as a single non-interactive command. Pull in all that context, read the files around the diff, review it, and ask me a question like, hey, do you want me to approve this or not with this comment? And if I say yes, approve it and then send a message in Slack to my team using the Slack MCP, for example. Or use it to write.
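The PR-review workflow described here boils down to a few non-interactive `gh` invocations. A minimal sketch of the commands involved, assuming the GitHub CLI (`gh`) is installed and authenticated; the PR number and the review comment are hypothetical placeholders:

```python
# Sketch of the PR-review workflow described above. Each step is a
# non-interactive `gh` invocation the agent could run in the terminal.
# The PR number (123) and review comment are hypothetical placeholders.

def pr_review_commands(pr_number: int, approve_comment: str) -> list[list[str]]:
    """Return the argv list for each step of the workflow, in order."""
    return [
        # Pull the PR description and discussion as JSON (non-interactive).
        ["gh", "pr", "view", str(pr_number), "--json", "title,body,comments"],
        # Get the full diff in one shot.
        ["gh", "pr", "diff", str(pr_number)],
        # Approve with a comment -- only after the human says yes.
        ["gh", "pr", "review", str(pr_number), "--approve",
         "--body", approve_comment],
    ]

for argv in pr_review_commands(123, "LGTM, reviewed with Klein"):
    print(" ".join(argv))
```

Each argv list can be handed to `subprocess.run` as-is; keeping the approve step last mirrors the human-in-the-loop approval the speaker describes.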

Speaker 1

是的。我只用它来阅读。

Yes. I would only use it to read.

Speaker 3

对。你知道,人们很喜欢这个功能。我喜欢能在Slack里发送自动消息之类的。你也可以按需设置工作流,比如'客户,做任何事前请先问我',确保在发送消息前获得我的批准。

Yeah. You know, people love it. I love being able to just send an automated message in Slack or whatever. You can also set up your workflow however you want, where it's like: okay, Klein, please ask me before doing anything, just make sure you're asking me to approve before you send a message or something like that.

Speaker 1

嗯。好。关于MCP部分,最后再聊点别的?MCP认证最近刚通过批准。

Yeah. Okay. Just to close out the MCP side, anything else interesting going on in the MCP universe that we should talk about? MCP auth was recently ratified.

Speaker 3

我认为货币化是个大问题

I think monetization is a big question

Speaker 1

明白。

k.

Speaker 3

目前对MCP生态来说是的。我们和Stripe谈了很多,他们非常看好MCP,正试图为其构建货币化层。但一切都太早期了,很难预见发展方向。

Right now, for the MCP ecosystem, yeah. We've been talking a lot with Stripe. They're very bullish on MCP, and they're trying to figure out a monetization layer for it. But it's all so early that it's kinda hard to even envision where it's gonna go.

Speaker 1

我先抛砖引玉,你再来指正。这和API货币化有什么区别?比如用户在这里注册账号,拿到令牌后按使用量计费?

Let me just put up a straw man, and you can tell me what's wrong with it. How is this different from API monetization? Right? Like, you sign up here, make an account, I give you a token back, and then you use the token and they charge you against your usage.

Speaker 3

不。目前是这样运作的,Magic MCP那帮开发者也是这么做的。但我们设想的是代理能自主支付使用的MCP工具,为每次调用付费。而不是处理无数产品的API密钥和注册流程。

No. I think that's how it is right now, and that's how the Magic MCP guys, the 21st.dev guys, did it. But we're envisioning a world where agents can pay for the MCP tools they're using themselves, and pay for each tool call, so you don't have to deal with a million different API keys from different products and sign up for all of this.

Speaker 3

需要统一的支付层。有人提到稳定币——现在代理能直接使用它们了。Stripe正在考虑围绕MCP协议构建支付抽象层。但如我所说,具体形态还很难预测。

There needs to be a unified kind of payment layer. Some people talk about stablecoins, and how those are coming out now in a form agents can natively use. Stripe is considering an abstraction around the MCP protocol for payments. But like I said, it's kinda hard to really tell how that's gonna manifest.

Speaker 1

我在他们去年推出代理工具包时就报道过,几个月前的事。当时看起来已经足够了。似乎除了每笔交易要收取大约30美分之外,你并不真的需要稳定币。

I covered it when they launched their agent toolkit last year, a few months ago. It seemed like that was enough. You didn't seem to need stablecoins, except for the fact that they take, like, 30¢ on every transaction.

Speaker 0

是啊。你见过有人用Coinbase那个x402功能吗?基本上就是可以在HTTP请求里直接包含支付。什么?

Yeah. Have you seen people use the x402 thing by Coinbase? It's basically an HTTP request that includes a payment in it. What?

Speaker 3

对,对。这东西存在很久了。402错误码就是'需要付款'之类的意思。

Yeah. Yeah. It's been around forever. The 402 error, that's Payment Required or something.

Speaker 3

对吧?所以我们看到有些人讨论更原生地集成这个功能。不过目前确实没什么实质进展。

Right? So, yeah, we've seen some people talking about building that in more natively. But yeah, nothing concrete really.

Speaker 3

没错。现在还没人真正在这么做。

Yeah. And no one's really doing that right now.

Speaker 1

有看到什么有趣的MCP初创公司动态吗?

Anything you're seeing on that front? Are people making MCP startups that are interesting?

Speaker 0

围绕本地重托管这块。对,可以远程操作,不用部署10个MCP,而是设置一个标准URL集成到所有工具里,然后从各个服务器暴露所有工具接口。

Around rehosting local ones as remote ones. Yeah. So instead of setting up 10 MCPs, you have a canonical URL that you put in all of your tools, and it exposes all the tools from all the servers.

Speaker 1

嗯,确实。

Yeah. Yeah.

Speaker 0

有些MCP会运行这类工具。但我觉得核心问题还是如何激励人们开发更好的MCP?

There are MCPs that run some of these tools. Yeah. But I think it kinda has the same issue: how do you incentivize people to make better MCPs?

Speaker 2

嗯。

Mhmm.

Speaker 1

你知道吗?而且主要是第一方还是第三方?是的。就像你们的Perplexity MCP是分叉版本。Perplexity那个有什么问题?

You know? And will it be mostly first party or third party? Yeah. Like, your Perplexity MCP was the forked one. What was wrong with the Perplexity one?

Speaker 3

在设备上本地安装MCP总是伴随着巨大风险。当MCP由我们完全不了解身份的人创建时,他们随时可能在GitHub上更新,比如加入某些恶意内容。所以即使你在列出时验证过

With MCPs and installing them locally on your device, there's always a massive risk associated with that. And when an MCP is created by someone we have no idea who they are, at any point they might update the GitHub to introduce some kind of malicious stuff. So even if you verified it when you were listing it

Speaker 1

明白了。

Okay.

Speaker 3

他们也可能更改它。所以我最终不得不分叉几个版本以确保锁定那个版本。

They might change it. So I ended up having to fork a few of those to make sure that we lock that version down.

Speaker 1

哦,懂了。所以你分叉就是为了防止他们修改

Oh, okay. So you're just forking it so that it doesn't change

Speaker 0

是的。

Yes.

Speaker 1

没有它的话...这很有趣。这些都是注册表的问题对吧?就像...没错。需要确保安全性之类的。

Without it. It's interesting. These are all the problems of a registry, right? Like Right. That you need to ensure security and all that.

Speaker 1

酷。我很乐意继续。我最后好奇的是,如果Anthropic没有推出MCP,会发生什么?另一种历史会是怎样?比如...你们会自己开发MCP吗?

Cool. I'm happy to move on. The last thing I'm kinda curious about is: if Anthropic hadn't come along and made MCP, what would have happened? What's the alternative history? Like, would you have come up with MCP?

Speaker 2

我们看到一些竞争对手一直在开发自己的即插即用工具集成到这些代理中,他们基本上需要原生创建这些工具和集成

So we saw some of our competitors who have been working on their own version of plug-and-play tools for these agents; they basically had to natively create these tools and integrations themselves

Speaker 1

是啊。

Yeah.

Speaker 2

直接集成到他们的产品中。因此我认为这个领域的任何人都不得不进行繁琐的工作,重新创建这些工具和集成,所以我认为Anthropic为我们省去了很多麻烦,并利用了开源和社区驱动开发的力量,允许个人贡献者为任何人们能想到的东西创建MCP,真正以我认为现在必要的方式发挥人们的想象力,以充分挖掘这类事物的潜力。

Directly into their product. And so I think anybody in this space would have had to do the laborious work of recreating these tools and integrations themselves. So I think Anthropic just saved us all a lot of trouble, tapped into the power of open source and community-driven development, and allowed individual contributors to make an MCP for anything people could think of, really taking advantage of people's imagination in a way that I think is necessary right now for us to tap into the full potential of this sort of thing.

Speaker 0

所以我们已经有,我想,十几集关于不同编码产品的节目了。

So we've had, I think, a dozen episodes with different coding products.

Speaker 1

是的。顺便说一下,这期节目是在他发推文关于Claude Code那期之后直接录制的。嗯。他们当时就坐在你现在坐的位置。

Yeah. And by the way, this episode came directly after he tweeted about the Claude Code episode. Mhmm. Where they were sitting right where you're sitting.

Speaker 0

谢谢分享

Thanks for sharing

Speaker 3

关于RAG的事。是的。

the RAG. Yeah.

Speaker 0

你能给大家介绍一下市场的矩阵吗,你们有完全自主的无IDE方案,有自主加IDE的方案(这有点像你们的),还有带一些协同编程功能的IDE。人们应该如何思考这些不同的工具,以及你们最擅长什么,或者可能你们认为自己不太擅长的方面?

Can you give people maybe the matrix of the market? You have fully agentic, no IDE; you have agentic plus IDE, which is kind of yours; you have IDE with some copiloting. How should people think about the different tools, and what are you guys best at, or maybe what do you think you're not best at?

Speaker 2

我认为我们最擅长的,也是我们自始至终的理念,就是满足开发者当前的需求。我认为这些模型目前需要一些洞察和引导。而IDE是实现这一点的完美媒介。你可以看到它所做的编辑,可以查看它运行的命令,可以看到它调用的工具。它为你提供了完美的用户体验,让你拥有所需的洞察力和控制力,并能够根据需要调整方向,以应对这些模型当前的局限性。

I think what we're best at, and our ethos since the beginning, is to just meet developers where they're at today. I think there's a little bit of insight and handholding these models need right now, and the IDE is the perfect conduit for something like that. You can see the edits it's making, you can see the commands it's running, you can see the tools it's calling. It gives you the perfect UX to have the level of insight and control you need, and to be able to course-correct to work with the limitations of these models today.

Speaker 2

但我认为很明显,随着模型变得更好,你需要做的这类事情会越来越少,而更多的将是初始规划和提示,并逐渐建立起信任和信心,相信模型能够基本上按照你想要的方式完成任务。我认为总会有一点差距,因为这些模型永远无法读懂我们的思想。所以我们必须确保给它提供最全面、最详细的指令。如果你是一个懒惰的提示者,那么在真正得到你想要的东西之前,你会遇到很多摩擦和反复。但我想我们都在学习如何正确地提示这些模型,明确我们想要什么,以及它们如何填补可能需要填补的空白以达到最终结果,以及我们如何避免这种情况。

But I think it's pretty obvious that as the models get better, you'll be doing less and less of that, and more and more of the initial planning and prompting, with the trust and confidence that the model will be able to get the job done pretty much exactly how you want it to. I think there will always be a little bit of a gap, in that these models will never be able to read our minds. So there will have to be some effort in making sure you give it the most comprehensive instructions, with all the details of what you want from it. If you're a lazy prompter, you can expect a ton of friction and back-and-forth before you really get what you want. But I think we're all learning, as we work with these things, the right way to prompt them: to be explicit about what it is that we want, to understand how they hallucinate the gaps they might need to fill to get to the end result, and how we might wanna avoid something like that.

Speaker 2

关于Claude Code有趣的是,它并没有提供太多关于代理正在做什么的洞察。它只是给你一个高层次的整体任务清单。我认为如果模型不够好,无法产生人们普遍满意的作品,这种方式可能不会很有效。我们现在差不多达到了这个水平,我认为这个领域需要跟上,也许人们不再需要那么多关于这类事情的洞察,他们可以放心让代理完成任务。

So what's interesting about Claude Code is there isn't really a lot of insight into what the agent's doing. It kinda gives you this checklist of what it's doing holistically, at a high level. I don't think that would have worked well if the models weren't good enough to actually produce work that people were generally happy with. We're kind of there, and I think the space has to catch up: okay, maybe people don't need as much insight into these sorts of things anymore, and they are okay with letting an agent get the job done.

Speaker 2

实际上你只需要看到最终结果,并在它真正完美之前稍微调整一下。我认为不同的工作需要不同的工具。像那种你不太了解其内部运作的完全自主的代理,可能非常适合搭建新项目,但对于那些更严肃、更复杂的任务,你需要一定程度的洞察或更多的参与,你可能需要使用能提供更多洞察的工具。所以我认为这些工具是互补的。例如,编写测试或启动10个代理尝试修复同一个bug,可能适合那些不需要你太多参与的工具。

And really all you need to see is the end result, and tweak it a little bit before it's really perfect. And I think there are gonna be different tools for different jobs. A totally autonomous agent that you don't have a lot of insight into is great for maybe scaffolding new projects, but for the serious, more complex sorts of things, where you do need a certain level of insight or more engagement, you might wanna use something that gives you more insight. So I think these sorts of tools complement each other. For example, writing tests, or spinning off 10 agents to try to fix the same bug, might be useful for a tool that doesn't require too much engagement from you.

Speaker 2

而那些需要更多创造力、想象力或从大脑中提取上下文的任务,则需要对模型行为有更深入的洞察,以及一种我认为客户端更适合的来回交互过程。

Whereas something that requires a little bit more creativity or imagination, or extracting context from your brain, requires a little more insight into what the model's doing, and a back and forth that I think Klein is a little better suited for

Speaker 3

了解代理正在做什么。这就像是一个维度。另一个维度是自主性,即它的自动化程度。我们有一类公司更关注那些甚至不想看代码的用户场景,比如Lovables、Replets这类平台——你进去就能构建应用,可能完全不懂技术,只对结果满意。还有一类混合型产品,是面向工程师的。

visibility into what the agent is doing. That's one axis. And then another is autonomy: how automated it is. And we have a category of companies that are focusing more on the use case of people who don't even wanna look at code, which is, you know, the Lovables, the Replits, where you go in, you build an app, you might not even be technical, and you're just happy with the result. And then you have stuff that's kind of a hybrid, for engineers.

Speaker 3

虽然是为工程师打造的,但你并不真正清楚底层发生了什么。这适合那些完全放手让AI掌控方向、快速构建项目的氛围程序员,很多开源爱好者和业余编程者都喜欢这种方式,确实很有趣。然后是严肃的工程团队,他们还不能完全交给AI(至少目前不行)。

It's built for engineers, but you don't really have a lot of visibility into what's going on under the hood. This is for the vibe coders, where they're fully letting the AI take the wheel and building stuff very rapidly. Lots of open source fans and hobbyists enjoy coding in this manner. It is really fun. And then you get to serious engineering teams, who can't really give everything over to the AI, at least not yet.

Speaker 3

他们需要高度透明地了解每个步骤的进展,确保真正理解代码的变化。这就像把生产代码库交给一个非确定性系统,然后指望在代码审查时发现问题。就我个人使用AI(比如Klein)的方式而言,我喜欢全程参与引导方向——每个文件编辑时我都逐项批准,确保进程正确,在开发过程中就清楚走向。

And they need to have high visibility into what's going on every step of the way, and make sure they actually understand what's happening with their code. You're handing off your production code base to this non-deterministic system and then hoping you catch it in review if anything goes wrong. Whereas personally, the way I use AI, the way I use Klein, is I like to be there every step of the way and guide it in the right direction. As every file is being edited, I approve every single thing and make sure things are going in the right direction, so I have a good understanding, as things are being developed, of where it's going.

Speaker 3

这种混合工作流非常适合我。不过有时想彻底放手时,我也会开启'YOLO模式'直接全部自动批准,然后离开...

So like this kind of hybrid workflow really works for me personally. But you know, sometimes if I wanna go full YOLO mode, I go ahead and just auto approve everything and just step out for

Speaker 0

去喝杯咖啡

a cup of coffee

Speaker 3

再回来审查工作成果。

and then come back and, you know, review the work.

Speaker 0

作为工程师,我的困扰在于我们都认为自己处理的是复杂问题。你们观察到'复杂'的界限如何随时间变化?如果十二个月前讨论,当时模型认为的'复杂'比如今简单得多。你们觉得演进速度是否足够快——比如十八个月后是否75%-80%的工作都该交给生成式AI?还是说进展不如预期?

My issue with this, as an engineer myself, is that we all wanna believe that we work on the complex things. How have you guys seen the line of "complex" change over time? I mean, if we sat down having this discussion twelve months ago, "complex" was much easier for the models than it is today. Do you feel like that's evolving quickly enough that, in eighteen months, you should probably just go full agentic for, like, 75% or 80% of work? Or do you feel like it's not moving as quickly as you thought?

Speaker 2

我认为几年前的复杂问题与现在完全不同。如今更需要谨慎对待早期架构决策,以及模型如何在此基础上构建。如果有清晰的方向规划,就能更好地为代码库奠定基础。几年前我们视为复杂的算法挑战,对现今模型已微不足道——我们只需给出预期或单元测试,它就能给出完美解决方案。

I think what was complex a couple years ago is totally different from what is complex today. Now I think what we need to be more intentional about are the architectural decisions we make really early on, and how the model builds on top of that. If you have a clear direction of where things are headed and what you want, you have a good idea of how you might wanna lay the foundation for the code base that you're producing. And what we might have considered complex a few years ago, algorithmic challenges, is pretty trivial for models today, stuff we don't necessarily have to think much about anymore. We give it a certain expectation or a unit test about what we want, and it goes off and puts together the perfect solution.

Speaker 2

因此现在更需要深思熟虑的架构决策,这取决于你对可行方案的实践经验、项目方向的清晰规划,以及对代码库的愿景。这些决策很难依赖模型完成,因为它缺乏上下文理解能力,也无法领会你的愿景——除非你给出包含所有需求的超长提示词。我们几年前的工作重点已彻底改变(我认为是向好的),思考架构决策可比编写算法有趣多了。

So I think there's a lot more thought that has to go into tasteful architectural decisions, which really comes down to you having experience with what works and what doesn't, having a clear idea of the direction you wanna take the project, and your vision for the code base. Those are all decisions that I think are hard to rely on a model for, because of its limited context and its inability to see your vision for things and really understand what you're trying to accomplish, without you putting together a massive prompt of everything you want from it. What we spent most of our time working on a couple years ago has totally changed, and I think for the better; architectural decisions are a lot more fun to think about than putting together algorithms.

Speaker 3

这某种程度上解放了高级软件工程师,让他们能更专注于架构层面的思考。当他们真正理解代码库的现状、架构的当前状态后,在引入新内容时就能从架构高度进行思考。他们需要清晰地阐述这种设计思路,这确实需要一定技巧。部分问题可以通过主动追问、在代理端主动澄清来缓解。

It kind of frees up the senior software engineers to think more architecturally. And once they have a really good understanding of the current state of the repository, the current state of the architecture, then when they're introducing something new, they're really thinking at an architectural level. And they articulate that to Klein. There's some skill involved there, and some of it can be mitigated with asking follow-up questions, being proactive about clarifying things on the agent side.

Speaker 3

但最终你需要向代理阐明这个新架构。然后代理就能深入细节,为你实现所有内容。这种方式工作起来更有趣——就我个人而言,我觉得在更高架构层面思考更能投入专注。对初级工程师来说,这也是了解代码库的绝佳范式。

But ultimately, you need to articulate this new architecture to the agent. And then the agent can go down into the mines and implement everything for you. And it is more fun working that way. Personally, I find it a lot more engaging to think on a more architectural level. And for junior engineers, it's a really good paradigm for learning about the code base.

Speaker 3

这就像随身带着一位高级工程师,你可以随时询问:'嘿,能帮我解释下代码库吗?如果我要实现这样的功能,该看哪些文件?这个机制如何运作?' 这种场景下它也非常好用。

It's kinda like having a senior engineer in your back pocket, where you're asking Klein: hey, can you explain the repository to me? If I wanted to implement something like this, what files would I look at? How does this work? It's great for that as well.

Speaker 0

如果话题要转向竞争,我最后还有个问题。竞争。对。你们和Roo Code在推特上有些摩擦,我想知道背后的故事。

If we're moving on to competition, I have one last question. Competition. Yeah. So there's Twitter beef with Roo Code. I just wanna know what the backstory is.

Speaker 0

因为昨天有人发推请Roo Code添加Gemini CLI支持时,你们回复说'又要照抄我们的吗',他们回应'谢谢,我们会记得注明出处'。这是真的有过节吗?

Because you tweeted yesterday, somebody asked Roo Code to add Gemini CLI support, and then you guys responded, "just copy it from us again." And they said, "thank you, we'll make sure to give credit." Is it a real beef?

Speaker 3

不算是。算是友好互怼吧?我觉得大家只是在时间线上玩梗而已。现在有很多分支版本...

No. Is it? A friendly beef? I think we're all just having fun on the timeline. There's a lot of forks that

Speaker 2

大概有6000个分叉。

It's like 6,000 forks.

Speaker 3

对。如果你在VS Code市场搜索Klein,整个页面全是Klein的分叉版本。甚至还有分叉的分叉项目——有些还融到了大笔资金。简直...

Yeah. If you search Klein in the VS Code marketplace, the entire page is just forks of Klein. What? And there's even forks of forks that came out and raised a whole bunch of money. What? Yeah.

Speaker 3

疯狂。应用商店顶部

Crazy. The top apps

Speaker 2

OpenRouter排行榜全是Klein和它的分叉版本。Klein分叉、Klein分叉...确实挺逗的。

in OpenRouter are all Klein, and then Klein fork, Klein fork. Yeah. It's funny.

Speaker 3

是啊。数十亿的token通过这些分叉项目流转。没错,现在就像分叉大战一样,有上万个分叉,而你只需要一把小刀就能参与。所以,不,这真的很令人兴奋。

Yeah. Billions of tokens getting sent through all these forks. Yeah. There's, like, fork wars, 10,000 forks, and all you need is a knife, you know. So, no, it's exciting.

Speaker 3

我觉得他们都是很酷的人。欧洲有人fork我们,中国也有人做了我们的小型分叉。我记得三星最近在《华尔街日报》的文章里提到他们用的是Klein,但其实是他们自己隔离的小分支。我们鼓励这种行为。

I think they're all really cool people. We got people in Europe forking us, people in China making a little fork of us. I think Samsung recently came out with, was it a Wall Street Journal article, where they're using Klein, but their own little fork of Klein that's kind of isolated. You know, we encourage it.

Speaker 0

你对开源这件事有过后悔吗?

Do you have any regrets about being open source?

Speaker 2

完全没有。Klein最初就为编码智能体奠定了优质基础。人们在此基础上衍生出各种有趣的想法和概念,看到这种热情和整个领域的活力,既鼓舞人心,也帮助我们甄别有效方案。尤其对三星这类存在软件使用壁垒的组织,开源显著降低了接触门槛——这对于尝试颠覆传统编程范式的智能体编码革命至关重要。

Not at all. I think Klein started off as this really good foundation for what a coding agent looks like, and people had a lot of their own really interesting ideas and spin-offs and concepts about what they wanted to build on top of it. Just being able to see that, and see the excitement in this space in general, has been inspirational, and it's helped us glean insights into what works and what doesn't and incorporate that into our own product. And for the Samsungs and all the organizations where there's a lot of friction in being able to use software like this on their code bases, it reduces that barrier to entry, which I think is incredibly important when you wanna get your feet wet with this whole new agentic coding paradigm that's gonna completely upend the way we've written software for decades.

Speaker 2

所以从宏观角度看,这对世界和行业都是净收益,毫无遗憾。

So in the grand scheme of things, I I think it's a net positive for the world and for the space and so no regrets.

Speaker 3

某种程度上,我们和分叉者们是最初的同行者——坚持极简哲学,把一切交给模型处理,不做推理变现,注重上下文加载,这与Claude Code的理念不谋而合。看到他们验证'保持简单'的哲学令人欣慰,这也契合RAG(检索增强生成)的早期理念——2022年向量数据库公司涌现时,上下文窗口还很小。

In a lot of ways, it's us and the forks. We were kind of there originally, when we were the only ones with this philosophy of keeping things simple: leaving things up to the model, letting the model do everything, not trying to make money off of inference, going context heavy, reading files into context very aggressively. And going back to Claude Code, it was really nice to see that they came out and validated our whole philosophy of keeping things as simple as possible. And that goes in with the whole RAG thing: RAG was this early thing in, like, 2022. You started getting these vector database companies. Context windows were very small.

Speaker 3

当时人们鼓吹'给AI无限记忆'的营销话术,虽然名不副实却形成了固有认知。直到现在,企业采购时还会机械地问'你们做代码库索引和RAG吗',我就反问'为什么需要这个?'

This was, the way people pitched it was like, oh, you can give your AI infinite memory. It's not really that, but that was the marketing sold to the venture backers investing in all these companies, and it became this narrative that really stuck around. And even now, we get potential enterprise prospects going through the procurement process, and it's almost like they're going through a checklist, asking: hey, do you guys do indexing of the code base and RAG? And I'm like, well, why? Why do you want to do this?

Speaker 3

Boris在这个播客里说得很到位:我们尝试过RAG,效果并不理想,尤其对编程。RAG需要把整个代码库切碎成片段,扔进高维向量空间搜索——这本质上就像精神分裂,反而会干扰模型性能。

I think Boris said it very well on this exact podcast: we tried RAG and it doesn't really work very well, especially for coding. The way RAG works is you have to

Speaker 0

比如

like

Speaker 3

资深工程师接触新项目时会先观察目录结构,追踪文件引用关系,像智能体一样探索代码库——这种方式比RAG高效得多。

chunk all these files across your entire repository, chop them up into small little pieces, throw them into this hyperdimensional vector space, and then pull out these random chunks when you're searching for relevant code snippets. Fundamentally, it's so schizo, and I think it actually distracts the model; you get worse performance than just doing what a senior software engineer does when they're first introduced to a new repository: you look at the folder structure, you look through the files, oh, this file imports from this other file, let's go take a look at that. And you kind of agentically explore the repository.
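The chunk-and-retrieve pipeline being criticized here can be sketched in a few lines. This is a deliberately naive toy: word overlap stands in for a real embedding model, and the fixed-size chunks show how code gets chopped up without regard for its structure:

```python
# Toy sketch of naive RAG over code: fixed-size chunking plus a crude
# similarity score. Real systems use embeddings, but the structural
# problem -- chunks that ignore code boundaries -- is the same.

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a file into fixed-size pieces, ignoring code structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Score each chunk by word overlap with the query and keep the top k."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

repo_file = "def load_user(db, user_id):\n    return db.query(User).get(user_id)\n"
pieces = chunk(repo_file)
print(pieces)                                 # chunks cut mid-expression
print(retrieve("how is a user loaded", pieces))
```

Even on this tiny file, the chunk boundaries fall mid-expression and the scorer finds no word overlap between the question and the code, which is the kind of mismatch the speaker is pointing at; an agent reading whole files and following imports avoids it entirely.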

Speaker 3

这就像我们发现效果更好的方式。类似的情况还有很多,简单性总是胜出。比如这个'苦涩的教训'中,快速应用(fast apply)就是另一个例子。Cursor在2024年7月推出这个他们称为即时补全的功能,当时模型的文件编辑能力还很差。在智能体语境下,文件编辑的工作原理是:先有一个搜索块,然后是一个替换块,你必须精确匹配搜索块才能进行替换。

That's what we found works so much better. And there are similar things where simplicity always wins, like this bitter lesson, where fast apply is another example. So Cursor came out with this fast apply, they called it instant apply, back in July 2024. The idea was that models at the time were not very good at editing files. And the way editing files works in the context of an agent is you have a search block and then a replace block, where you have to match the search block exactly against what you're trying to replace.
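The search/replace mechanism just described can be illustrated in a few lines: the edit only succeeds if the model reproduces the search block character-for-character, which is exactly why weaker models failed so often. A minimal sketch, not Klein's actual implementation:

```python
# Minimal sketch of a search/replace diff edit: the model emits a SEARCH
# block that must match the file exactly, and a REPLACE block that is
# swapped in. A single mismatched character means the edit fails.

def apply_search_replace(content: str, search: str, replace: str) -> str:
    """Apply one search/replace edit; raise if the search block doesn't match."""
    if search not in content:
        raise ValueError("diff edit failed: search block not found in file")
    # Replace only the first occurrence, like a targeted edit.
    return content.replace(search, replace, 1)

original = "def greet():\n    print('hello')\n"
edited = apply_search_replace(
    original,
    search="    print('hello')",
    replace="    print('hello, world')",
)
print(edited)
```

The strictness of the exact-match check is the whole trade-off: it makes edits precise and auditable, but any hallucinated whitespace or elided line in the search block turns into a hard failure, which is the failure rate the speakers track in their benchmarks.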

Speaker 3

替换块会直接进行替换。当时的模型表现不佳——我记得他们底层使用的GPT版本还不擅长精确构建这些搜索块,经常失败。于是他们想出了这个聪明的解决方案:对快速补全模型进行微调,允许当时的前沿模型输出模糊内容。让它们输出那些我们都熟悉的偷懒代码片段,比如'文件其余部分在此'、'导入语句在此'之类,然后把这些输入给经过微调的快速补全模型——可能是个量化过的7B小模型。

Then the replace block just swaps that out. And at the time, models were not very good; the GPT model they were using under the hood wasn't very good at formulating these search blocks perfectly, and it would fail oftentimes. So they came up with this clever workaround: fine-tune a fast apply model. They let the frontier models at the time be vague, let them output those lazy code snippets we're all very familiar with, like "rest of the file here", "rest of the imports here", and then fed that into this fine-tuned fast apply model, probably something like a quantized Qwen 7B, a very small, dinky little model.

Speaker 3

他们把这种偷懒代码片段输入给这个小模型,而这个经过微调的小模型会输出包含所有代码改动的完整文件。Aider的一位创始人在早期GitHub讨论中说得很到位:'现在你不仅要担心一个模型搞砸,还得担心两个模型搞砸。更糟的是,你要把生产代码交给这个fast apply小模型处理,它的推理能力很差。'

And they fed this lazy code snippet into the smaller model, and the small model was fine-tuned to output the entire file with the code changes applied. One of the founders of Aider said this really well in very early GitHub discussions: well, now instead of worrying about one model messing things up, you have to worry about two models messing things up. And what's worse, the model you're handing your production code to, this fast apply model, is a tiny model. Its reasoning is not very good.

Speaker 3

它的最大输出token数可能只有8千或1万6——现在他们可能在训练3万2token的版本。但我们的代码库里有长达4万2token的文件,这已经超过了这些小快速补全模型的最大输出长度。这时候你怎么办?只能再构建各种补救方案。

Its maximum output tokens might be 8,000, 16,000; now they're training maybe 32,000-token versions. And a lot of coding files are longer than that: we have a file in our repository that's 42,000 tokens long, longer than the maximum output length of one of these smaller fast apply models. So what do you do then? Then you have to build workarounds around that.

Speaker 3

你必须搭建整套基础设施来传递内容,然后它又开始出错。而且是非常隐蔽的错误——看起来运行正常,但和原始前沿模型的建议有细微差异,会给代码引入各种难以察觉的bug。我们现在看到的是:随着AI进步,应用层正在简化。

Then you have to build all this infrastructure to pass things off, and then it's making mistakes. Very subtle mistakes, too, where it looks like it's working, but it's not actually what the original frontier model suggested; it's slightly different, and it introduces all of these subtle bugs into your code. And what we're starting to see is that as AI gets better, the application layer is shrinking.

Speaker 3

你不再需要这些聪明的补救方案,不必维护这些系统。所以摆脱RAG和快速补全的束缚真的很解放,只需专注于核心智能体循环和最小化差异编辑失败率。在我们内部测试中,Claude Sonnet 4最近将差异编辑失败率降到了5%以下——准确说是4%左右。而快速补全刚推出时失败率高达20%-30%。

You're not gonna need all these clever workarounds. You're not gonna have to maintain these systems. So it's really liberating to not be bogged down with RAG or with fast apply and just focus on this core agentic loop and minimizing diff edit failures. Like, in our own internal benchmarks, Claude Sonnet 4 recently hit a sub-5%, actually around 4%, diff edit failure rate. When fast apply came out, that was way higher. That was like in the twenties and the thirties.

Speaker 3

现在我们已经降到4%了。想想看,六个月后...

Now we're down to 4%. Right? And in six months. How does

Speaker 0

会降到零吗?这个...

it go to zero? Well, it's

Speaker 3

正在向零迈进,每天都在接近。我最近和几家做快速应用的公司创始人聊过,他们试图与我们合作——微调这些快速应用模型是他们的主营业务。

going to zero like as we speak. It's going to zero every day, you know. And I was actually talking with the founders of some of these companies that do fast apply. They were trying to kind of work with us. Their whole bread and butter is fine tuning these fast apply models.

Speaker 3

比如Relace和Morph。我和他们有过非常坦诚的对话:'快速应用确实有过它的窗口期——Cursor在去年7月开启了这个窗口。你们觉得这个技术还有多久会彻底过时?你们认为这个窗口期是无限的吗?'

And you know, like Relace and Morph. And I had a very candid conversation with these guys where I was like, well, there was a window of time where fast apply was relevant. Cursor started this window of time back in July. How much time do you think we have left until they're no longer relevant? Do you think it's an infinite time window?

Speaker 3

他们坚持认为,这个阶段绝对是有限的。快速应用模型的时代肯定要结束了。我就问,那你们觉得还能持续多久?他们说大概三个月,甚至更短。不过我还是认为在某些场景下,RAG(检索增强生成)依然有用。

They're like, no, it's definitely finite. Like this this era of fast apply models is definitely coming to an end. And I was like, well, how long do you guys think? They're like, maybe three months, maybe less. So I still think there's some cases where rag is useful.

Speaker 3

比如你拥有大量人类可读文档,一个庞大的文档知识库,而你并不太关心它们内部的固有逻辑——那当然可以建立索引、分块处理、进行检索。或者如果你的组织被迫使用像DeepSeek这样不擅长搜索替换的小模型,也许可以考虑快速应用模型。

You know, if you have a lot of human readable documents, a large knowledge base of documents where you don't really care about the inherent logic within them, like, sure, index it, chunk it, do retrieval on it. Or fast apply, like, maybe if your organization is forced into using a very small model that's not very good at search and replace, like a DeepSeek or something. You know, maybe use a fast apply model.

Speaker 2

我认为RAG和FastApply只是工具包里的备选方案,用于应对模型在大上下文或搜索替换编辑方面表现不佳的情况。但现在它们反而成了可能引发问题的多余组件。Cognition Labs有篇关于多智能体协同的有趣文章,我...

I think RAG and fast apply were just tools in a toolkit for when models weren't the greatest at large context or search-and-replace editing. But now they are extra ingredients that could make things go wrong that you just don't need anymore. There was an interesting article from Cognition Labs about multi agent orchestration and I'm

Speaker 1

直接进入正题了。你就像我们的自动驾驶仪。

getting right into it. You're autopilot for us. It's like

Speaker 2

很酷。他们那篇文章确实很棒。对,是篇好文章。

That's cool. Yeah. I mean, it's a great article, by the way. Yeah. It was a great article.

Speaker 2

文章提到当你开始使用不同模型和智能体时,很多细节容易丢失。而魔鬼就在细节里——这些恰恰是最关键的。要确保智能体不会陷入循环或重复相同问题,必须保持完整的上下文。我认为应该贴近模型,直接提供全部所需上下文,而不是为了成本优化采用RAG检索或廉价模型来编辑文件。虽然让Claude Sonnet这类模型扫描整个代码库会消耗更多上下文窗口确实更昂贵,但一分钱一分货。

They talked about how, you know, when you start working with different models, different agents, there's a lot that gets lost in the details. And, you know, the devil's in the details; those are the most important things: making sure that you don't have the agents running in loops and running into the same issues again, and that they have all the right context. And so I think being close to the model, throwing all the context you need at it, not taking this cost-optimized approach of pulling in relevant context using something like RAG, or a cheaper model to apply edits to a file. I think ultimately, yes, it's more expensive asking a model like Claude Sonnet to do all these sorts of things, to grep an entire code base and to fill up its entire context. But you kinda get what you pay for.

Speaker 2

开源的另一优势是我们的开发者可以'掀开和服'查看底层——他们能清楚知道请求发送到哪里、输入什么提示词。这种透明度建立了信任,当用户每天花费10刀、20刀甚至100刀时,他们确切知道数据去向、使用什么模型、输入什么指令,因此更愿意为效果买单。

And I think that's been another benefit of being open source: our developers can peek under the kimono. They can see, you know, where their requests are being sent, what prompts are going into these things. And that creates a certain level of trust where, you know, when they spend $10, $20, $100 a day, they know kind of where their data is being sent, what model it's being sent to, what prompts are going into these things. And so they get comfortable with the idea of spending that much money to get the job done.

Speaker 3

没错。不从推理环节盈利很关键——这个讨论中激励机制太重要了。如果你每月只收20美元还想盈利,就不得不把重要工作转给小型模型,或用RAG做成本优化:比如只读取文件片段而非全文。但如果不靠推理赚钱,让用户自带API密钥,突然你就没有削减成本的动机了。

Yeah. It's like not making money off of inference. I think the incentives are so relevant in this discussion. Because, you know, if you're charging $20 per month and you're trying to make money on that, you're gonna be offloading all kinds of important work to smaller models, or optimizing for cost with RAG-like retrieval, not reading in the entire file, maybe reading like a small snippet of it. Whereas if you're not making money off inference and you're just going direct, you know, users can bring their own API keys. Well, then all of a sudden, you're not incentivized to cut down on cost.

Speaker 3

这时你的动力纯粹是打造最佳智能体。整个行业都在朝这个方向发展——随处可见按量付费或直接为推理付费的开放模式,我认为这就是未来。

You're actually incentivized just to build the best possible agent. And we're starting to see this trend: the whole industry is moving in that direction. Right? You're starting to see everyone open up to pay as you go models or pay directly for inference. And I think that is the future.

Speaker 0

Klein的定价和商业模式是怎样的?

What's Klein's pricing business model?

Speaker 2

目前,它需要接入一个API密钥。本质上,无论您预先承诺使用哪种推理服务提供商,或认为哪种模型最适合您的工作类型,只需将您的Anthropic、OpenAI、OpenRouter或其他任何API密钥输入Klein,它就能直接连接到您选择的模型。我认为这种透明度,这种我们专注于打造最佳产品而非通过价格模糊、巧妙技巧和模型编排来获取利润、降低成本并优化高收益的做法,使我们处于独特的位置,能够真正发挥这些模型的全部潜力。这一点已经得到了证明。

Right now, it's bring an API key. Essentially, whatever pre-commitment you might have to whatever inference provider, whatever model you think works best for your type of work, you just plug in your Anthropic or OpenAI or OpenRouter or whatever it is API key into Klein, and it connects directly to whatever model you select. And I think that level of transparency, that level of we're building the best product, we're not focused on capturing margin through price obfuscation and clever tricks and model orchestration to keep costs low for us and optimize for higher profits. I think that's put us in this unique position to really push these models to their full potential. And I think that's shown.

Speaker 2

我认为这就是一分钱一分货。在Klein中执行任务可能会很昂贵,但

I think that's you get what you pay for. Throw a task in Klein and it gets expensive, but

Speaker 3

这就是智能的成本,对吧?

That's the cost of intelligence. Right?

Speaker 2

没错,这就是智能的成本。所以目前的商业模式是:您可以自由选择——它是开源的,您可以分叉它,选择数据输入的方式,选择支付对象。我们接触过的许多组织都能从这些提供商那里获得一定程度的批量折扣,因此他们可以通过Klein利用这一点,这很有帮助,因为Klein可能会相当昂贵。

It's the cost of intelligence. Yeah. So yeah, the business model right now is you get to choose. It's open source, you can fork it, you can choose where your data gets sent, you can choose who you wanna pay. A lot of organizations we've talked to get a certain level of volume based discounts with these providers, and so they can take advantage of that through Klein, which is helpful because Klein can get pretty expensive, yeah.

Speaker 1

等等,我还是没听明白你们怎么赚钱。

Wait, I mean, I'm still not hearing how you make money.

Speaker 2

就像你

Like you

Speaker 1

说的你们不赚钱,为什么?为什么要赚钱?是啊,因为你们得发工资吧?

said you don't, Why? Why make money? Yeah. Because you have to pay your salaries?

Speaker 3

不。很多人问我们这个问题,我总是反问他们为什么,但事实是

No. A lot of people ask us that, and I always just throw the why at them, but it's These

Speaker 1

他们不像派对狂人那样。派对狂人是

are not like the party full guys. Party full is

Speaker 3

真正的答案是企业级市场。

like The real answer is enterprise.

Speaker 1

我们可以这么说是因为你知道,我们在你发布时推出了这个。

Which we can say because, you know, we're releasing this when you launch it.

Speaker 3

对,对。所以你想聊聊企业版吗?

Yeah. Yeah. So you wanna talk about enterprise?

Speaker 2

是的。我认为在API密钥方面保持开源,让我们在这些高度重视数据隐私、控制和安全的组织中获得了轻松的采用。人们很难承诺将代码以明文形式发送到天知道是什么的服务器上,用可能将他们的知识产权输出给随机用户的模型来训练他们的数据。我认为人们现在更加关注他们的数据被发送到哪里以及被用来做什么。因此,这给了我们一个机会来说,好吧,没有任何数据会经过我们自己的服务器。

Yeah. I think being open source with bring-your-own API key has given us a lot of easy adoption in these organizations where things like data privacy and control and security are top of mind. And it's hard to commit to sending their code in plain text to God knows what servers, training models on their data that might output their IP to random users. I think people are a lot more conscious about where their data's getting sent and what it's being used for. And so it's given us this opportunity to say, okay, nothing passes through our own servers.

Speaker 2

你可以完全控制整个应用程序中数据的去向。这让我们在过去几个月里与许多组织交流时,获得了这种轻松的采用,我认为这为我们提供了更紧密合作的机会,我们可以问:我们能做些什么来帮助你的组织其他部分的采用?本质上,我们如何在这些组织中为人们对Klein的热情火上浇油,推广代理编码的使用,我认为在企业层面上。

You have total control over the entire application where your data gets sent. And that's given organizations that we've been talking to over the course of the last couple of months this sort of like easy adoption and I think this opportunity for us to work more closely with them and say, what are all the things that we can do to help with adoption in the rest of your organization? Essentially, how can we pour gasoline on sort of the evangelism that people have for Klein in these organizations and spread the usage of of agentic coding, I think at an enterprise level.

Speaker 3

嗯,是的。疯狂的是,我们开源了Klein。人们真的很喜欢它。开发者们在他们的组织中使用它。他们的组织勉强接受了,因为他们看到我们是开源的,没有把他们的数据发送到任何地方。

Well, yeah. What's crazy is, so we open sourced Klein. People really liked it. Developers were using it within their organizations. Their organizations were kind of reluctantly okay with it because they saw, like, we're open source and we're not sending their data anywhere.

Speaker 3

他们可以使用他们现有的API密钥。然后我们在网站上推出了一个企业联系表单,如果你对企业版感兴趣,请联系我们。当时我们并没有真正的企业产品。结果我们收到了大量大型企业的联系。你知道,有一家财富五强公司找到我们说,嘿,我们有数百名工程师在组织内部使用Klein,这对我们来说是个大问题。这就像一场我们需要扑灭的大火,因为我们不知道他们在使用什么API密钥,花了多少钱,数据被发送到哪里。

They could use their existing API keys. And then we launched like on our website, like a contact form for enterprise, like if you're interested in enterprise offering hit us up and we had no real enterprise products at the time. And it turned out like we just got this massive influx of big enterprises reaching out to us. And you know, we had a fortune five company come up to us and they were like, hey, we have hundreds of engineers using Klein within our organization and this is a massive problem for us. This is like a fire that we need to put out because we have no idea what API keys they're using, how much they're spending, where they're sending their data.

Speaker 3

请让我们给你们钱来开发一个企业产品吧。所以这个产品就这样应运而生了,对吧?

Please just like let us give you money to make an enterprise product. So the product kind of just evolved out of that. Right?

Speaker 2

对,对。我的意思是,这真的只是更多地倾听我们的用户。所以在推出这个页面后,我们收到了大量对企业基本功能的需求,比如安全护栏、治理和洞察力,这些组织的管理员需要可靠地使用像Klein这样的工具。是的。

Right. Right. I mean, it really just comes down to listening to our users more. So right after we put out this page, we just had a lot of demand for the table-stakes enterprise features, the security guardrails and governance and insights that the admins in these organizations need to reliably use something like Klein. Yeah.

Speaker 2

很多人希望我们给他们两样东西。发票,只是为了帮助预算和花费那数千美元。

We've gotten a lot of people wanting us to sort of give them two things. Invoices just to help with like all the budgeting and spending the thousands of dollars.

Speaker 3

所有的欧洲人。

All the Europeans.

Speaker 2

是的。另一个让我觉得有点意外的是,客户对他们所提供效益的某种程度的洞察。比如节省了多少时间或编写了多少行代码,因为这能让那些推动组织采用这类工具的人工智能先行者,把这些数据作为证明点,去向团队其他成员说:'看,这个工具帮了我这么多。你们也得开始用起来,我们才能跟上行业步伐。'

Yeah. Just the other thing which I thought was a little bit surprising was some level of insight into the benefit that Klein's providing them. So it could be hours saved or lines of code written, because it allows these sort of AI-forward drivers for adopting these sorts of tools in these organizations to take that as a proof point and go to the rest of their teams and say, this is how much Klein's helping me. You need to start adopting this so we can keep up with the rest of the industry.

Speaker 1

这是为了让内部推广者证明投资回报率吗?

This for like internal champions to prove the ROI?

Speaker 2

没错。可以当作这类证据来合理化支出。同时也用于在这些组织内部推广产品。

Exactly. Okay. Used as sort of evidence for this, to justify the spend. Yeah. But also to promote the product in these organizations.

Speaker 1

我们可以稍后再做这个,但我们想采访这些人,在播客里展示他们是如何向上级汇报的,这样我们能更了解情况。因为我们通常只和开发者工具的创始人、构建者对话,却很少接触终端用户。实际上我们很想听听他们的想法——比如他们如何看待这个工具,他们需要什么。

We can do this afterwards, but we would like to talk to those and actually feature some of them, what they're saying to their bosses on the podcast so that we can get a sense. Because oftentimes, we here, we only talk to founders and builders of, like, the dev tool, but, like, not the end consumer. And, actually, we we wanna hear from them. Right? Like, about how they're thinking about it, what they need.

Speaker 1

挺有意思的。我想深入探讨的是OpenRouter与你们企业版服务的关系。据我理解目前所有请求都经过OpenRouter处理?

Kinda cool. One thing I wanted to ask to double click on is the relationship between OpenRouter and then like your your enterprise offering. Right? So my understanding is currently everything runs through OpenRouter.

Speaker 2

不是全部。用户可以自带OpenAI、Anthropic或Bedrock的API密钥。

Not everything. So you can bring API keys to OpenAI, Anthropic, Bedrock.

Speaker 1

然后你们会建立直连通道

And then you have a direct connection there

Speaker 2

如果...如果用户那边建立了直连通道的话

If the user has a direct connection there.

Speaker 1

没错。其他所有请求都会通过OpenRouter。所以企业版本质上就是拥有专属的OpenRouter实例,企业可以对这个实例进行可视化和管控。

Correct. Everything else would run through OpenRouter. And so basically, the enterprise version of Klein would be you have your own OpenRouter that provides visibility and control to that enterprise.

Speaker 3

对。这相当于自托管方案。其实很多企业不一定要自托管,只要能用他们自己的Bedrock API密钥之类的就行。

Yeah. Like, that's for the self hosted option. Right? Like, there's a lot of enterprises where they're okay with not self hosting, as long as they're using their own Bedrock API keys and stuff like that.

Speaker 3

而那些真正对自托管或团队管理感兴趣的人,内部会有一个类似这样的路由器在运作。

Whereas the ones that are really interested in like self hosting or like that wanna be able to manage their teams, there would be like this internal router going on.

Speaker 1

这里有趣的是,如果模型成本直接归零会怎样?比如Gemini代码直接开源宣布:嘿各位,免费使用。

The curious thing here is like, what if what if model cost just go to zero? Like Gemini code just comes out and it's like, yeah, guys, it's free.

Speaker 2

确实。不过他们真这么做的话对我们倒是好事。我们的观点是:推理服务本身不是门生意

Well, yeah. No, if they come out with that, it'd be great for us. So our thesis is inference is not the business

Speaker 1

你永远无法通过推理服务赚钱。

You would just never make money on inference.

Speaker 2

我们想要给予终端用户完全透明的价格体系——我认为这对消除大额支出的心理障碍至关重要。当前行业的价格不透明导致开发者抵触按量付费模式。现在很多人开始认同这种理念:基础套餐保证产品可用性,但在推理服务上保持开放,尊重开发者的知情权——不仅要公开成本,还要说明模型选用逻辑,让他们能放心投入必要资金。当然可以用RAG和fast apply这类技巧来控制成本。

Yeah. We wanna give the end user total transparency into price, which I think is incredibly important to even get comfortable with the idea of spending as much money as you do. I think the price obfuscation in this space has given developers this reluctance to opt into usage based plans. We're seeing a lot of people kind of converge on this concept of, okay, maybe have a base plan just to use the product, but sort of get out of the way of the inference and respect the end developer enough to give them insight into not just the cost but the models being used, and give them more confidence in spending however much it takes to get the work done. You know, you can use tricks like RAG and fast apply, things like that, to keep costs low.

Speaker 2

但总体来说,编程智能体的投资回报率足够高,人们愿意为完成任务买单。

But for the most part, there's enough ROI on coding agents where, you know, people are willing to spend money to get the job done.

Speaker 3

对于真正优秀的编程智能体,其ROI甚至难以量化——它让我完成了很多原本根本不会去尝试的事情:临时实验、副项目、或是那些压根想不到要修的随机bug。这种价值怎么衡量?

And for a truly like good coding agent, the ROI is almost hard to even calculate because there's so many things that I would have never even bothered doing. But then I now I have client and I could just like do this weird experiment or do this side project or, you know, fix this random bug that I would have never even thought about. So like how do you measure that?

Speaker 1

没错。在转向上下文工程和内存话题前,我想探讨下后台智能体与多智能体系统的变体。目前的典型实现比如Codex每分钟自动生成PR,或是Devin、Cognition这类系统。

Yeah. Right? One variant of this problem, and we're about to move on to context engineering and memory and all the other stuff, one variant of this I wanted to touch on a little bit was just background agents and multi agents. So the instantiations of this now, I would say, for background agents, would be Codex, for example, spinning up, you know, one PR per minute, or Devin or Cognition.

Speaker 1

你们会考虑这种方案吗?具体来说:客户端会部署在服务端吗?另一种方案是保持本地运行但采用并行智能体架构。

So would you ever go there? That's one concrete question I can ask you. Like, would there be client on the server? Whatever. And then the other version is still on the laptop, but more sort of parallel agents.

Speaker 1

比如现在大热的看板系统——人们正在为Cursor和Claude Code开发并行任务管理界面,各种后台并行方案都很火爆。

Like, kind of the Kanban is currently very hyped right now. People are making, like, Kanban interfaces for Cursor and also for Claude Code. Just anything in a parallel or background side

Speaker 2

的一些事情。

of things.

Speaker 3

我们正在发布Klein的CLI版本。使用这个CLI版本时,它是完全模块化的。你可以让Klein通过CLI启动更多客户端,或者在GitHub Action等云端流程中运行客户端,随心所欲。因此CLI确实是Okay的理想形态。

We're releasing a CLI version of Klein. And using the CLI version of Klein, it's fully modular. So you can ask Klein to run the CLI to spin up more Klein instances. Or you could run Klein in some kind of cloud process, in a GitHub Action, whatever you want. So the CLI is really the form factor for

Speaker 3

这类完全自主的智能体。另一个优势是能够接入你电脑上运行的现有Klein CLI,并接管引导其方向。这也是可行的。你觉得呢,Saud?

These kind of fully autonomous agents. And it's also nice to be able to tap into an existing Klein CLI running on your computer and be able to take over and steer it in the right direction. So that's also possible. But what do you think, Saud?

Speaker 2

我认为这不是非此即彼。这些不同模式实际上能很好互补——无论是Codex、Devons还是Cursor的后台代理,本质上都在实现相同目标。如果我们推出自己的版本,它将成为其他开发者构建的基础。就像Nick的哥哥Andre,他的思维总是超前十年。

I don't think it's an either or. I think all these different modalities complement each other really well. So the Codex, the Devons, Cursor's background agent, I think they all sort of accomplish the same thing. They if we were to come out with our own version of it, I'd say that it would be the foundation for how other developers could build on top of it. So Nick's older brother, Andre, he's sort of thinking ten years ahead.

Speaker 2

他关于这个领域发展方向的某些想法总让我惊叹。最近我们讨论过构建一个开源框架,用于为任何平台开发编程代理。构建必要的SDK和工具,将客户端带到Chrome扩展、CLI、JetBrains、Jupyter Notebooks,甚至你的智能汽车——任何地方。包括你的冰箱。你的冰箱。

And it always kind of blows my mind a little bit, some of the ideas that he has about where the space is going. But we recently had a discussion about building this open source framework for coding agents for any sort of platform. Building the SDK and the tools necessary to bring Klein to, you know, Chrome as an extension, to the CLI, to JetBrains, to Jupyter Notebooks, to your smart car, whatever it is. But to build the Your fridge. Your fridge.

Speaker 2

没错。就是要

Exactly. To to put to

Speaker 3

微波炉也可以。

Microwave maybe.

Speaker 2

对,正是如此。

Yeah. Exactly.

Speaker 3

我是说,这个

I mean, this

Speaker 2

就像我们看到的6000个Klein分支那样,我们搭建的这个基础让开发者社区能够在其上构建,利用他们的实验、想象力和创造力探索领域前景。展望未来,构建开源基础和模块,将Clion这类工具扩展到软件开发或VS Code插件之外,将开启互补性可能。这永远不会是非此即彼——后台代理适合某些工作,多代理并行适合试验着陆页的五个版本,而像Klein这样的单代理对话则非常适合提取上下文并制定复杂任务的执行计划。

is what we saw with the 6,000 forks on top of Klein: we sort of put together this foundation that this community of developers could build on top of and take advantage of, with their experiments and imagination and their creativity about where the space is headed. And I think looking forward, building an open source foundation and the building blocks for how we bring something like Klein to things that go outside the scope of software development or, you know, a VS Code extension, I think that'll open up the door to things that ultimately complement each other really well. But it'll never be this either-or thing. I think background agents are good for certain kinds of work, and parallel multi agents might be good for when you wanna experiment and iterate on, you know, five different versions of how a landing page might look. And then something like a back and forth with a single agent like Klein works really well for when you wanna pull context and put together a really complicated plan for a really complex task.

Speaker 2

我认为所有这些不同的工具最终会相互补充,人们会逐渐培养出对不同工作最适合哪种工具的品味和理解。但展望未来十年,我们至少希望站在前沿,为背景代理或多代理之后的下一个事物提供基础构建模块。

And I think all these different tools will ultimately end up complementing each other, and people will kind of develop a taste and an understanding for what works best for what kind of work. But just looking ten years ahead, we at the very least wanna be at the frontier of providing the building blocks for what the next thing is after background agents or multi agents.

Speaker 1

我本来想谈谈当下热门的话题——上下文工程。我觉得这和RAG(检索增强生成)的思路有些相似,RAG就像一种思维病毒,顺便说一句我很喜欢你这种表述方式。对了,你在文档里提到了上下文管理,还有个关于记忆库的章节,这挺酷的。

I was gonna go into context engineering kind of like topic du jour. I think that this is kinda similar ish in a thread to RAG and how RAG is a mind virus, which I love by the way that the way that you phrased it. Yeah. You you you have you have in your docs context management. You also have a section on memory bank, which is kinda cool.

Speaker 1

我觉得很多人都在试图理解记忆机制。我们先从宏观层面开始,稍后再深入讨论记忆。那么,对你来说上下文工程意味着什么?

I think a lot of people are trying to figure out memory. Let's just just start at the high level and then we'll go into memory later. What, you know, what does context engineering mean to you?

Speaker 2

上下文工程对我来说意味着什么?

Context engineering mean to me?

Speaker 3

是指提示词工程吗?对。

Means prompt engineering? Yeah.

Speaker 1

没错。我觉得这里有很多艺术成分,比如决定哪些内容应该放入上下文。我认为构建优秀代理80%的关键就在于确定上下文的组成。MCP系统与客户端之间的交互,比如推荐提示词,我认为这才是打造优质代理的核心要素。

Right. Like, I mean, I think there is a lot of art to, like, what goes in there. I think that really is, like, the eighty-twenty of building a really good agent: figuring out what goes into the context. And, you know, I think the interplay between MCP and your system client, like, recommended prompts, is what is ultimately making a good agent.

Speaker 3

是的。我认为上下文管理包含两部分:一是加载哪些内容到上下文中,二是当接近上下文窗口限制时如何进行清理。如何策划从零到最大上下文窗口的整个生命周期?

Yeah. I think context management is like one part of it is what you load in to context. The other part of it is how do you clean things up when you're reaching the context window. Right? How do you curate that whole life cycle from zero to maximum context window?

Speaker 3

我的思考角度是:我们面临太多选择,也存在太多可能导致代理偏离方向或分心的风险。比如RAG或其他形式的检索是一种思路,而代理自主探索是我们发现的另一种更有效的方案。

And the way that I think about it is there's so many options on the table and there's so many risks to misdirecting the agents or distracting the agents. There's ideas about, you know, rag or other kinds of forms of retrieval. That's that's one idea. There's the agentic exploration. That's another idea that we found works much better.

Speaker 3

当前趋势似乎是:在加载内容到上下文时,应该赋予模型自主决定权,让它能使用工具主动拉取内容,同时提供一些线索指引——就像给模型一张当前状况的认知地图。比如抽象语法树,或者VS Code中打开的标签页(这在我们内部基准测试中效果极佳),当开着几个相关标签页时,模型几乎能读心般地理解意图。

And it seems like the trend, generally, for loading things into context, is giving the model the tools that it can use to pull things into context, letting the model decide what exactly to pull into context, as well as some hints along the way, kind of like a map of what's going on. Like ASTs, abstract syntax trees, potentially what tabs they have open in VS Code. That was actually something in our internal benchmarking that turned out to work very, very well. It's almost like it's reading your mind when you have a few tabs It gets me

Speaker 1

这让我有点慌,因为有时候我开着不相关的标签页,就不得不先关掉它们...

out because like sometimes then I'm like, I have like unrelated tabs open and I have to go close them before

Speaker 3

我来开个头。我觉得不必想太多,尤其是使用Klein时。Klein在导航这方面做得相当不错。但确实存在边界情况对吧?任何事物都有边界情况,这就像——主要使用场景是什么?比如当你开始一个全新任务时,连一个相关标签页都没打开的情况?

I kick off the thing. I wouldn't think too much about it, especially when you're using Klein. Klein does a pretty good job of just navigating that. But there definitely are edge cases, right? There's edge cases for everything, and it's kind of like, okay, what's the majority use case? Like, you know, when are you starting a brand new task and you don't have a single tab open that's relevant to it?

Speaker 3

显然在CLI中可能没有那个小指示器,所以需要跳出框架思考。这是关于读取上下文内容的场景。而上下文管理的挑战在于接近上下文窗口容量上限时如何压缩?早期我们尝试过简单粗暴的截断方法——直接丢弃对话前半部分。

Obviously, in the CLI, you don't have that little indicator. So there you have to think outside the box. So that's for reading things into context. And then for context management, when you're approaching the full capacity of the context window, how do you condense that? And we've played around with this kind of naive truncation very early on, where we just throw out the first half of the conversation.
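The naive truncation described here can be sketched in a few lines. The message shape and cutoff policy below are assumptions for illustration, not how any particular agent implements it:

```python
# Minimal sketch of naive context truncation: when the conversation nears the
# window limit, keep the system prompt and drop the older half of the rest.
def naive_truncate(messages, max_messages):
    if len(messages) <= max_messages:
        return messages
    system, rest = messages[:1], messages[1:]
    # Throw out the first half of the conversation, as described.
    return system + rest[len(rest) // 2:]

history = [{"role": "system", "content": "You are a coding agent."}] + [
    {"role": "user", "content": f"step {i}"} for i in range(10)
]
trimmed = naive_truncate(history, max_messages=6)
```

The "start reading a book halfway through" problem is visible right in the output: the surviving messages begin mid-story, with no record of how the task got there.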

Speaker 3

这很常见。但显然有问题,就像你从书的中途开始阅读——完全不知道之前发生了什么。我们非常重视叙事完整性:每个Client任务都像故事,可能是个无聊故事——比如孤独的编码代理决心帮你解决问题,就像主角需要克服的核心障碍就是任务解决。

That's common. And there are problems with that, obviously, because it's kind of like you're halfway through a book; you start reading halfway through, right? You don't know anything that happened beforehand. And we like to think a lot about narrative integrity. Every task in Klein is kind of like a story. It might be a boring story, where it's this lonely coding agent that's just, you know, determined to help you solve whatever it is, and the climax, the big thing that the protagonist needs to overcome, is the resolution of the task.

Speaker 3

对吧?关键是如何保持叙事完整性,让代理能预测下一个标记(token)——预测故事下一部分来达成结论。我们尝试过清理重复文件读取,效果不错。但本质上这还是需要思考:如果直接让模型决定上下文该保留什么?另一种形式是摘要——总结所有相关细节后替换进去。

Right? But how do we maintain that narrative integrity, where every step of the way the agent can kind of predict the next token, like, predict the next part of the story, to reach that conclusion? So we played around with things like cleaning up duplicate file reads. That works pretty well. But ultimately, this is another case where it's like, well, what if you just ask the model, like, what do you think belongs in context? Another form of this is summarization, which is, summarize all the relevant details and then we'll swap that in.
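The "cleaning up duplicate file reads" idea can be sketched like this. The message shape and the stub text are hypothetical; the point is only that earlier copies of a re-read file can be elided without breaking the story the agent is following:

```python
# Sketch: if the same file was read into the conversation more than once,
# keep only the most recent copy and replace earlier copies with a stub.
def dedupe_file_reads(messages):
    seen_paths = set()
    result = []
    for msg in reversed(messages):  # walk newest -> oldest
        path = msg.get("file_read")
        if path is not None and path in seen_paths:
            # An older, stale copy: elide its contents but keep the event.
            msg = {**msg, "content": f"[contents of {path} elided; see later read]"}
        elif path is not None:
            seen_paths.add(path)
        result.append(msg)
    return list(reversed(result))

history = [
    {"file_read": "src/app.py", "content": "<v1 contents>"},
    {"file_read": "src/util.py", "content": "<util contents>"},
    {"file_read": "src/app.py", "content": "<v2 contents>"},
]
cleaned = dedupe_file_reads(history)
```

Unlike naive truncation, the narrative stays intact: the agent still sees that it read the file earlier, just not the stale bytes.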

Speaker 3

这个方法效果非常非常好。

That works really really well.

Speaker 1

嗯。关于提到的AST(抽象语法树)想深入问下:这个功能很冗长,你们什么时候会使用它?

Yep. Double clicking on the AST mentioned. That's very verbose. When do you use that?

Speaker 2

目前它是个工具。工作原理是当Klein进行代理式探索、试图获取相关上下文时——比如想了解某个目录的情况,就有工具让它获取该目录所有语言元素:类名、函数名等,使其初步判断该文件夹内容是否与任务相关。若相关就会深入读取完整文件。本质上是帮助它在大代码库中导航的途径。

Right now it's a tool. The way that it works is, when Klein is doing the agentic exploration of trying to pull in relevant context, and it wants to get an idea of what's going on in a certain directory, for example, there's a tool that lets it pull in all the sort of language constructs from a directory. So it could be the names of classes, the names of functions, and that gives it some idea of, okay, here's what's going on in this folder. And if it seems relevant to whatever the task it's trying to accomplish, then it sort of zooms in and starts to actually read those entire files into context. So it's essentially a way to help it figure out how to navigate through large code bases.
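A tool like the one described can be approximated for Python files with the standard `ast` module. This is a sketch of the idea, not Cline's actual tool: surface just the top-level definition names so the agent can decide whether a file is worth reading in full.

```python
import ast

# Sketch of an AST-based exploration tool: list top-level class and function
# names in a source file without reading the bodies into context.
def list_definitions(source: str):
    tree = ast.parse(source)
    names = []
    for node in tree.body:  # top level only; nested defs are deliberately skipped
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.append(f"def {node.name}")
        elif isinstance(node, ast.ClassDef):
            names.append(f"class {node.name}")
    return names

sample = """
class Cart:
    def total(self): ...

def checkout(cart): ...
"""
defs = list_definitions(sample)
```

Run over every file in a directory, this gives the agent a cheap map of the folder; only files whose names look relevant get promoted to a full read.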

Speaker 3

对。我们看到有些公司在研发类似AST但兼具知识图谱的技术。你可以用近乎确定性的操作来查询,比如'找出代码库中所有未被调用的函数并删除'。代理能用类似SQL的语言操作知识图谱执行这类全局操作。

Yeah. We've seen some companies working on this. It's an interesting idea. It's like an AST, but it's also a knowledge graph. And you can run these discrete, deterministic, almost like actions on this knowledge graph, where you could say, hey, find me all the functions in the code base that aren't being used and delete all of them. And the agent can kind of reason in this almost SQL-like language, working with this knowledge graph to do these kinds of global operations.

Speaker 3

现在如果让编码代理删除未使用函数或做大重构,有时能成功,但多数情况会消耗大量token最终失败。而用这类工具,它可以通过简短查询语句操作整个仓库。这种超越AST的、用于查询知识图谱的语言有很大潜力。但就像Claude 4发布所示——前沿模型厂商会针对自身应用层训练,你设计的巧妙工具理论上可行,却可能因Claude 4的训练方式而不兼容。

Like, right now, if you ask a coding agent to go through and remove all unused functions or do some kind of large refactoring work, in some cases it might work, but very oftentimes it's just gonna struggle a lot, burn a lot of tokens, and fail ultimately. Whereas with these kinds of tools, it can actually operate on the entire repository with these kinds of short little query statements. I think there is a lot of potential in something like this, where it's the next level beyond the AST and it's a language for querying this kind of knowledge graph. But like we've seen with the Claude 4 release, these frontier model shops, they tend to train on their own application layer. And you might come up with a very clever tool that in theory would work really well, but then it doesn't work well with Claude 4, because Claude 4 is trained to grep.
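A toy version of the "find unused functions" query can be written over a single file with the standard `ast` module. A real knowledge graph would work across files and handle methods, dynamic calls, and exported names, all of which this sketch deliberately ignores:

```python
import ast

# Toy "unused functions" query: diff the set of defined function names
# against the set of directly-called names in one source file.
def unused_functions(source: str):
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    return sorted(defined - called)

code = """
def used(): return 1
def dead(): return 2
print(used())
"""
dead = unused_functions(code)
```

The point of the knowledge-graph idea is that a query like this is deterministic and cheap, instead of making the agent re-read the whole repository and reason about every call site token by token.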

Speaker 3

对吧?这现象很有趣:你期待前沿模型变得更通用,结果它们反而更专业化,迫使你需要适配不同模型家族。

Right? So that's another interesting phenomenon where it's like you're you're expecting these frontier models to become more generalized over time. But instead, they're becoming more specialized and you have to like support these different model families.

Speaker 0

关于记忆功能做个总结。记忆本质上是信息摘要的产物。你汇总上下文后提取出某些侧面。在这方面有什么有趣的发现吗?比如那些可能不太直观的方面,特别是针对代码的?人们能理解关于人类的记忆,但关于代码库的记忆是怎样的呢?

Just to wrap on the memory side. Memory is almost the artifact of summarization. So you summarize the context and then you kinda extract some insights. Any interesting learnings from there, like things that are maybe not as intuitive, especially for code? I think people grasp, like, memory about humans, but, like, what do memories about code bases and things look like?

Speaker 2

我认为目前大部分记忆功能基本没用。编程助手真正需要记住的,可能是你们团队在项目中的特定工作习惯,比如强制使用驼峰命名法这类规则。这类内容更适合放在通用指南或规则文件里。但让编码助手记住项目相关记忆或工作方式时,往往需要强制它存储到记忆库中。

I think memories right now, for the large part, are mostly useless. I think the kinds of memories that you might want the coding agent to hold onto are, you know, specific quirks about how your team works in the project, or certain rules, like only use CamelCase, for example. It's better to place those sorts of things in a general guideline or rules file. But I found that with this idea of asking the agent, at least coding agents, to hold onto certain memories about the project or how you work or things like that, you mostly have to force it to store those things into memory.

Speaker 2

我觉得人们不愿操心这些事。我们正在思考的是:如何自动保存这些助手在过程中学到的、人们既没记录也没写入规则文件的团队隐性知识,而不需要用户特意强制存储到记忆数据库。

I don't think people wanna have to think about those sorts of things. So something we're thinking about is how we can hold onto the tribal knowledge that these agents learn along the way, that people aren't documenting or putting into rules files, without the user having to go out of their way to force them to store these things into a memory database, for example.

Speaker 3

那些属于工作空间规则或团队隐性知识,比如团队通用模式。我们内部做过实验:开发了一个待办事项工具,每次都可以重写任务清单。我们会定期传入最新清单状态作为上下文,发现这能帮助助手在多次上下文摘要压缩后仍保持正轨,甚至能突然从头构建超出上下文窗口长度十倍的复杂任务。

Those were kinda like workspace rules or tribal knowledge, like general patterns that you use as a team. But then we ran this internal experiment where we built a to-do list tool. It was only one tool, where you could just write the to-do, and every time you could rewrite the to-do from scratch. And passively, not as part of every message, but every once in a while, we would pass in this context of what the latest state of the to-do list is. And we found that that actually keeps the agent on track after multiple rounds of context summarization and compaction. It could all of a sudden build an entire complex kind of task from scratch over, you know, 10x the context window length.
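The experiment described above, one write-only to-do tool plus occasional re-injection of its latest state, can be sketched roughly as follows. This is a guess at the mechanism, not Cline's implementation; the class name, method names, and the every-N-turns cadence are all assumptions.

```python
# Minimal sketch of the single-tool to-do scratchpad: the agent can only
# rewrite the whole list, and the harness re-injects the latest state into
# context every few turns so it survives summarization and compaction.
class TodoScratchpad:
    def __init__(self, inject_every: int = 5):
        self.todo = ""                # latest full to-do list text
        self.inject_every = inject_every
        self.turns = 0

    def write_todo(self, text: str) -> str:
        """The one tool exposed to the agent: replace the list from scratch."""
        self.todo = text
        return "to-do list updated"

    def maybe_inject(self):
        """Called once per turn; occasionally returns a context reminder."""
        self.turns += 1
        if self.todo and self.turns % self.inject_every == 0:
            return f"Current to-do list:\n{self.todo}"
        return None
```

The key design choice is that the reminder is injected by the harness, not requested by the model, so even after the conversation is compacted the freshest plan keeps reappearing in context.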

Speaker 3

内部测试效果非常理想。我们正在完善这个功能。早期版本的记忆库其实源自市场部的Nick Bauman提出的概念——类似让Klein助手工作时始终带着草稿纸记录进度。现在这个功能更内建化了,对助手记录'已完成事项/待办事项'会非常有帮助。

And in internal testing, this was very, very promising. So we're trying to flesh that out. We had earlier versions of this in the memory bank. Nick Bauman, our marketing guy, came up with this memory bank concept, a set of Klein rules where he would tell Klein like, hey, whenever you're working, keep a scratch pad of what you're working on. And this is a more built-in way of doing that. I think that also might be very, very helpful for the agents, to just have a little scratch pad of like, hey, what have I done so far? What's left?

Speaker 3

比如具体涉及哪些代码文件、通用上下文,并在不同会话间传递这些信息。

Specific @-file mentions, like what kind of code we're working on, general context, and passing that off between sessions. Yeah.

Speaker 0

对CLAUDE.md、AGENTS.md和AGENT.md有什么看法?我开发过开源工具Agents 927(取自那则XKCD漫画),可以把内容复制粘贴到所有不同的文件名下,让每个工具都能读到。你觉得应该用单一文件吗?还有IDE规则与代理规则之争。

Any thoughts on CLAUDE.md versus AGENTS.md versus AGENT.md? I built an open source tool called Agents 927, after the xkcd, that just copy-pastes it across all the different file names, so all of them have access to it. Do you think there should be a single file? There's also the IDE rules versus the agent rules.
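The copy-paste-across-file-names approach mentioned here is easy to picture in code. The sketch below is illustrative only, not the actual Agents 927 tool; the target file names and the canonical-file convention are assumptions.

```python
# Hedged sketch of "one canonical rules file, mirrored everywhere" (the
# xkcd-927 situation): each agent reads a different file name, so we copy
# the canonical instructions to all of them. Target list is illustrative.
import shutil
from pathlib import Path

AGENT_RULE_FILES = ["CLAUDE.md", "AGENTS.md", ".clinerules", ".cursorrules"]

def sync_rules(project_dir: str, canonical: str = "AGENT.md") -> list[str]:
    """Mirror the canonical rules file to every agent-specific file name."""
    root = Path(project_dir)
    source = root / canonical
    written = []
    for name in AGENT_RULE_FILES:
        target = root / name
        if target.name != source.name:
            shutil.copyfile(source, target)   # overwrite mirrors verbatim
            written.append(name)
    return written
```

This is the "single file" answer automated away; the guests' counterpoint below is that per-tool files are a feature, since you may want each agent to behave differently.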

Speaker 0

这里存在不少问题。

There's kinda like a lot of issues.

Speaker 2

我认为不同工具保留各自指令很合理。使用Cursor规则和Client规则时,我希望它们以不同方式工作——Client代理的运作方式应与Cursor操作代码库的方式不同。每个工具都针对特定工作类型,需要不同指令。虽然有人抱怨这会让代码库显得杂乱,但分离设置对我极其有帮助。

I actually think it's fine that each of these different tools has its own specific instructions, because I find myself using cursor rules and Klein rules separately. When I want Klein the agent, I want him to work a certain way that's different than how I might want Cursor to interact with my code base. Each tool is specific to the kind of work that I do, and I have different instructions for how I want these things to operate. I've seen a lot of people complain about it, and I get that they can make code bases look a little bit ugly, but for me it's been incredibly helpful for them to be separated.

Speaker 1

我注意到你用了'他'来指代Client,这个客户端是有...

I noticed that you said him. Does this client have an

Speaker 2

克莱恩是个新手吗?

Klein's a beginner?

Speaker 1

是啊。好吧。他有完整的背景故事吗?

Yeah. Okay. Does he have a whole backstory?

Speaker 2

有。性格?没有。克莱恩(Klein)其实是CLI的双关语。对。还是个编辑器。

Yeah. Personality? No. So Klein is a play on CLI and editor.

Speaker 1

因为他以前是云端开发者(Cloud Dev),现在叫克莱恩了。

Because he used to be Cloud Dev and now it's Klein.

Speaker 2

没错。我觉得克莱恩在这个领域里挺突出的,因为它比光标代理、副驾驶或级联这类东西更人性化一些。

Yeah. I feel like Klein kind of stands out in the space for being a little more humanized than something like, you know, a Cursor agent or a Copilot or a Cascade.

Speaker 1

其实还有德文(Devin),那可是个真人名字。

Well, there's Devin, which is a real name, you know.

Speaker 3

克劳德(Clawd)也是真人名字,那可是...

Well, Clawd is a real name. That's a

Speaker 2

真人名字。对。我...我觉得我们都有意让它更人性化,因为至少在协作时能给你更多信心。我觉得可以更依赖它一点。和代理建立信任需要...我觉得人性化这方面对我个人很有帮助。

real name. Yeah. I think we've all been intentional about humanizing it, because at least in working with it, it kinda gives you more confidence, and, I don't know, I can lean on it a little bit more. There's a kind of trust building with an agent, and the humanizing aspect of it, I think, has been helpful to me personally.

Speaker 3

这就像回到叙事完整性。我认为给代理赋予人性特征其实非常重要。因为它们做的每件事都像个小故事。如果没有鲜明的个性,效果就会打折扣。开发这些代理时,我们就该这么看待它们。

This goes back to like the narrative integrity. It's just it's actually really important, I think, to anthropomorphize agents in general. Because everything they do is like a little story. And without having a distinct kind of identity, you get worse results. And when you're developing these agents, that's kind of how we need to think about them.

Speaker 3

对吧?我们要像精心打磨故事一样来构建这些。我们几乎像是好莱坞导演。对吧?我们把所有要素安排到位,让故事自然展开。

Right? We need to think that we're like crafting these stories. We're almost like Hollywood directors. Right? We're we're putting all the right pieces in place for the story to unfold.

Speaker 3

确实,围绕这一点建立身份认同非常重要。而Klein,你知道的,他是个酷酷的小家伙。他,怎么说呢,他——

And yeah, having an identity around that is really really important. And Klein, you know, he's a cool little guy. He's, you know, he's

Speaker 1

就是个随和的家伙。他是个

Just a chill guy. He's a

Speaker 3

随和的人。他总在帮助我们。明白吗?他总是乐于助人。但如果你让他别高兴,他也能变得非常暴躁。

chill guy. He's helping us out. You know? He's always, like, happy to help. Or if you tell him to not be happy, he can be very grumpy.

Speaker 3

懂吧?所以这很棒。

You know? So that's great.

Speaker 1

太棒了。我知道你们在招人。你们现在有20人,目标是100人。还有漂亮的新办公室。

Awesome. I know you're hiring. You're 20 people now, aiming for a 100. You have a beautiful new office.

Speaker 1

加入Klein的最佳理由是什么?

What's your best pitch for working at Klein?

Speaker 2

目前我们的招聘主要靠熟人推荐,来自我们社交圈的人,或是曾合作过的可信赖伙伴——我们知道这些人能应对我们正在攻克的艰巨挑战。前方困难重重,但这个问题领域可能是当下最令人兴奋的工作方向。工程师通常喜欢做能让自己生活更轻松的事,而我想象不出还有什么比开发编程智能体更激动人心的了。虽然带点主观色彩,但这个领域确实充满吸引力。

A lot of our hiring so far has been friends of friends, people in our network, people that we've worked with before, that we trust and know can show up for this incredibly hard thing that we're working on. There are a lot of challenges ahead, and I think the problem space is probably the most exciting thing to be working on right now. Engineers in general love working on things that make their own lives easier, and so I couldn't imagine working on something more exciting than a coding agent. And, you know, I'm a little biased, but a large part of it is that it's an exciting problem space.

Speaker 2

我们寻找真正有动力的人,愿意共同探索未来十年的技术图景,为后智能体时代奠定基础,参与定义行业发展方向。我们拥有充满热情的开发者社区,开源模式为我们积累了巨大善意,收到的反馈极具建设性,深刻影响着产品路线图。与这样的社区共事是最有成就感的事。虽然现在办公室正在过渡期,但我们经常组织卡丁车、皮划艇这类团建活动。

We're looking for really motivated people that wanna work on challenges like figuring out what the next ten years look like, building the foundation for what comes next after background agents or multi-agents, and really helping define how all this shapes up. We have this really excited community of users and developers. I think being open source also created a lot of goodwill for us, where a lot of the feedback we get is incredibly constructive and helpful in shaping our roadmap and the product that we're building. And working with a community like that is one of the most fulfilling things ever. Right now we're kind of in between offices, but, you know, we're doing things like go-karting and kayaking.

Speaker 2

所以虽然工作强度大,但我们确保整个过程充满乐趣。

So it's it's a lot of hard work, but, you know, we we make sure to to have fun along the way.

Speaker 3

没错。Klein是家独特的公司,因为这里真的像一群朋友在共同打造很酷的东西。我们工作极其努力,而这个领域不仅是竞争激烈,简直是超高度竞争。

Yeah. No. Klein is a unique company, because it really does feel like we're all just friends building something cool. And we work really, really hard, and the space is not just competitive. It's hyper competitive.

Speaker 3

资本正涌入每一个可能的竞争对手。正如我所说,我们面临分叉再分叉的局面,有些项目筹集了数千万美元。我们发展非常迅速,目前有20人,目标是在年底前达到100人规模。

Capital is flowing into every single possible competitor. We have forks of forks, like I said, raising tens of millions of dollars. And we're growing very rapidly. We're at 20 people now, and we're aiming to be at a 100 people by the end of the year.

Speaker 3

开源模式有其独特挑战。我们投入大量研究,进行基准测试确保差异编辑算法的稳健性,优化模型以最大限度减少差异编辑失败。但当我们开源成果并发布在推特上时,就会有人说'感谢开源,我要用这个去融资做自己的产品'。

And being open source has its own challenges. We do all this research, all this benchmarking work, to make sure our diff editing algorithm is robust and that the way we're working with these models optimizes for the lowest possible diff edit failures. And then we open source that and post it on Twitter, and someone's like, oh, thanks so much for open sourcing that, I'm gonna go raise money for our own product with it.

Speaker 3

但我的看法是:让他们模仿吧。我们是这个领域的领导者,正在为整个行业指明方向。作为工程师构建这些东西令人兴奋,与这些优秀人才共事更是美妙。

But the way that I see it is, you know, let them copy. We're the leaders in the space, we're kind of showing the way for the entire industry, and being an engineer building all this stuff is super exciting. Working with all these people is just amazing.

Speaker 0

好的,太棒了。感谢各位的到来。

Okay. Awesome. Thank you guys for coming on.

Speaker 2

是的,非常感谢你们。

Yeah. Thank you. Thank you so much.

Speaker 0

这次

It's been

Speaker 2

真是太有趣了。

so much fun.
