

Latent Space: The AI Engineer Podcast — Claude Code: Anthropic's CLI Agent

Episode Description

More info: https://docs.anthropic.com/en/docs/claude-code/overview

The AI coding wars have now split into four battlegrounds:

1. AI IDEs: led by the two leading startups, Windsurf (acquired by OpenAI for $3B) and Cursor (valued at $9B), with many competitors behind them (Cline, GitHub Copilot, etc.).
2. Vibe coding platforms: Bolt.new, Lovable, v0, and others, all growing fast and reaching tens of millions in revenue within months.
3. Teammate agents: Devin, Cosine, etc. Give them a task and they submit a complete PR (with mixed results).
4. CLI-first agents: following Aider's early success, a number of alternatives have emerged, including products from both major labs (OpenAI Codex and Claude Code). Their core advantages: 1) composability, 2) pay-per-token pricing.

Having already covered the first three categories, today's guests are Boris, lead engineer on Claude Code, and Cat, its product manager. If you take one idea away from this episode, remember Boris's line: Claude Code is less a product than a Unix utility.

This fits Anthropic's "do the simple thing first" product principle. Whether it's the memory implementation (an auto-loaded markdown file) or the prompt-summarization approach to compaction (just ask Claude to summarize), they consistently choose the smallest building blocks that are useful, understandable, and extensible. Even major features like "/think" planning and # tags in markdown follow a design centered on text I/O as the core interface, much like the original Unix philosophy.

Claude Code is also the most direct way to code with Sonnet, with none of the hidden prompting and optimization layers of other products. Users feel this immediately: Claude Code users spend about $6 per day on average, while Cursor charges only $20 per month. Reportedly, some engineers inside Anthropic have spent over $1,000 in a single day!

If you're building AI developer tools, this episode is also full of first-hand insights on CLI tool design, the trade-offs between interactive and non-interactive modes, and how to balance feature creation. Enjoy!

Timestamps
[00:00:00] Intro
[00:01:59] Origins of Claude Code
[00:04:32] Anthropic's product philosophy
[00:07:38] What Claude Code does and doesn't do
[00:09:26] CLAUDE.md and simple memory
[00:10:07] Claude Code vs. Aider
[00:11:23] Parallel workflows and the Unix utility philosophy
[00:12:51] Cost considerations and pricing models
[00:14:51] Key features shipped since launch
[00:16:28] Claude Code writes 80% of its own code
[00:18:01] Custom slash commands and MCP integration
[00:21:08] Terminal UX and the tech stack
[00:27:11] Code review and semantic checks
[00:28:33] Non-interactive mode and automation
[00:36:09] Engineering productivity metrics
[00:37:47] Balancing feature creation and maintenance
[00:41:59] The future of memory and context
[00:50:10] Sandboxing, branching, and agent planning
[01:01:43] Future roadmap
[01:11:00] Why Anthropic is good at developer tools

Transcript


Speaker 0

Hello, AI engineers. A few weeks ago, engineering legend and former guest Steve Yegge from Sourcegraph wrote an enthusiastic review: I've been using Claude Code for a couple of days, and it has been absolutely ruthless in chewing through legacy bugs in my gnarly old code base. It's like a wood chipper fueled by dollars. It can power through shockingly impressive tasks using nothing but chat.

Speaker 0

It seems the majority of high-taste testers agree. Since then, the Claude Code team has been on an absolute tear, delivering weekly updates, shipping best practices for agentic coding, and dedicated Claude Code docs. As GitHub Copilot turns four years old, we now see four major battlegrounds for coding agents. One, AI IDEs like Windsurf and Cursor, now worth over $12,000,000,000. Two, vibe coding platforms like Bolt.new,

Speaker 0

Lovable, and v0. Three, autonomous outer-loop agents like Cognition's Devin, Cosine's Genie, and upcoming guest Factory AI's droids. We've covered all three categories of coding agents. And today, we're taking a look at the newest one: the CLI-based agents like Aider, OpenAI Codex, and Claude Code.

Speaker 0

We're excited to share that the Claude Code team will be presenting at the upcoming AI Engineer World's Fair in San Francisco, which now has early-bird tickets on sale. On June 3, spend the day learning in hands-on workshops. On June 4, take in tracks across MCP, Tiny Teams, vibe coding, LLM recommendation systems, GraphRAG, agent reliability, infrastructure, AI product management, and voice AI. On June 5, eight more tracks for reasoning and RL, SWE agents, evals, retrieval and search, security, generative media, design engineering, robotics, and autonomy. For CTOs and VPs of AI, there are now two leadership tracks, AI in Fortune 500 and AI Architects, the latter named after our very well received podcast with Bret Taylor of Sierra and OpenAI.

Speaker 0

The Claude Code team will be presenting on the SWE agents track on June 5. Join us at ai.engineer. Watch out and take care.

Speaker 1

Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and I'm joined by my cohost, swyx, founder of Smol AI.

Speaker 2

Hey. And today, we're in the studio with Cat Wu and Boris Cherny. Welcome.

Speaker 3

Thanks for having us.

Speaker 4

Thank you.

Speaker 2

Cat, you and I know each other from before. I just realized, Dagster as well.

Speaker 4

Yeah.

Speaker 2

And then Index Ventures, and now Anthropic.

Speaker 4

Exactly.

Speaker 2

It's so cool to see a friend that you know from before now working at Anthropic and shipping really cool stuff. And Boris, you're a celebrity, because we were just outside getting coffee and people recognized you from your video.

Speaker 3

Oh, wow. Really? That's new.

Speaker 2

Isn't that neat?

Speaker 3

Yeah, I've definitely had that experience once or twice

Speaker 2

Yeah.

Speaker 3

In the last few weeks, yeah.

Speaker 2

Thank you for making the time. We're here to talk about Claude Code. Most people have probably heard of it, and we think quite a few people have tried it. But let's get a crisp up-front definition: what is Claude Code?

Speaker 3

Yeah. So Claude Code is Claude in the terminal. Claude has a bunch of different interfaces: there's desktop, there's web. And Claude Code runs in your terminal.

Speaker 3

Because it runs in the terminal, it has access to a bunch of stuff that you just don't get if you're running on the web or on desktop. So it can run bash commands, it can see all of the files in the current directory, and it does all of that agentically. And I guess maybe the question under the question is, where did this idea come from? Part of it was we just wanted to learn how people use agents. We're doing this with the CLI form factor because coding is a natural place where people use agents today, and there's product-market fit for this thing.

Speaker 3

But yeah, it started as this crazy research project, and obviously it's kind of bare-bones and simple. But it's an agent in your terminal.

Speaker 2

That's how the best stuff starts.

Speaker 1

Yeah. How did it start? Did you have a master plan to build Claude Code? Or...

Speaker 3

There was no master plan. When I joined Anthropic, I was experimenting with different ways to use the model in different places, and I was doing that through the public API, the same API that everyone else has access to. One of the really weird experiments was this Claude that runs in a terminal, and I was using it for weird stuff: looking at what music I was listening to and reacting to that, screenshotting my video player and explaining what was happening there, things like that. It was a pretty quick thing to build and pretty fun to play around with.

Speaker 3

And then at some point I gave it access to the terminal and the ability to code, and suddenly it just felt very useful. I was using this thing every day. It expanded from there. We gave the core team access and they all started using it every day, which was pretty surprising. And then we gave all the engineers and researchers at Anthropic access, and pretty soon everyone was using it every day.

Speaker 3

I remember we had this DAU chart for internal users, and I was just watching it, and it was vertical for days. And we were like, alright, there's something here. We've got to give this to external users so everyone else can try it too.

Speaker 3

That's where it came from.

Speaker 1

And were you also working with Boris already? Or did this come out and start growing, and then you were like, okay, maybe we need to make this a team, so to speak?

Speaker 4

Yeah. The original team was Boris, Sid, and Ben. Over time, as more people adopted the tool, we felt like, okay, we really have to invest in supporting it, because all our researchers are using it and it's our one lever to make them really productive. At that point, I was using Claude Code to build some visualizations. I was analyzing a bunch of data, and sometimes it's super useful to spin up a Streamlit app and see all the aggregate stats at once, and Claude Code made it really, really easy to do.

Speaker 4

So I think I sent Boris a bunch of feedback. And at some point, Boris was like, do you want to just work on this? And that's how it happened.

Speaker 3

It was actually a little more than that on my side. You were sending all this feedback, and at the same time we were looking for a PM and looking at a few people, and I remember telling the manager, hey, I want Cat. She's the only PM I want on this.

Speaker 1

I'm sure people are curious: what's the process within Anthropic to graduate one of these projects? You have a lot of growth, then you get a PM. When did you decide, okay, it's ready to be opened up?

Speaker 3

Generally at Anthropic, we have this product principle of do the simple thing first, and I think the way we build product is really based on that principle. So you staff things as little as you can and keep things as scrappy as you can, because the constraints are actually pretty helpful. In this case, we wanted to see some signs of product-market fit before we scaled it.

Speaker 2

Yeah. I imagine so. We're putting out the MCP episode this week, and I imagine MCP also now has a team around it in much the same way; it's now very much officially an Anthropic product. So I'm curious, Cat, how do you view PMing something like this? I guess you're grooming the roadmap.

Speaker 2

You're listening to users, and the velocity is something I've never seen coming out of Anthropic.

Speaker 4

I think I've come in with a pretty light touch. Boris and the team are extremely strong product thinkers, and for the vast majority of the features on our roadmap, it's really just people building the thing they wish the product had. So very little is actually top-down. I feel like I'm mainly there to clear the path if anything gets in the way, and to make sure we're all good to go from a legal, marketing, etcetera perspective.

Speaker 2

Yeah.

Speaker 4

And then in terms of the very broad or long-term roadmap, the whole team comes together and thinks about, okay, what do we think models will be really good at in three months? And let's make sure that what we're building is compatible with the future of what models are capable of.

Speaker 2

I'd be interested to double-click on this: what will models be good at in three months? That's something people always say to think about when building AI products, but nobody knows how to think about it, because everyone just says it's generically getting better all the time, we're getting AGI soon, so don't bother. How do you calibrate three months of progress?

Speaker 4

Well, if you look back historically, we tend to ship new models every couple of months or so, so three months is just an arbitrary number I picked. The direction we want our models to go is being able to accomplish more and more complex tasks with as much autonomy as possible. That includes making sure the models are able to explore and find the right information they need to accomplish a task, making sure they're thorough in accomplishing every aspect of a task, and making sure they can compose different tools together effectively.

Speaker 4

Those are the directions we care about.

Speaker 3

Yeah. Coming back to code, this approach affected the way we built Claude Code too. We know that if we wanted a product with very broad product-market fit today, we would build a Cursor or a Windsurf or something like that. Those are awesome products that so many people use every day; I use them. But that's not the product we wanted to build.

Speaker 3

We wanted to build something much earlier on that curve, something that will maybe be a big product a year from now, or however long it takes as the model improves. That's why Claude Code runs in a terminal: it's a lot more bare-bones, and you have raw access to the model, because we didn't spend time building all this nice UI and scaffolding on top of it.

Speaker 1

When it comes to the harness, so to speak, and the things you want to put around it, one piece might be prompt optimization. Obviously I use Cursor every day, and there's a lot going on in Cursor beyond my prompt, for optimization and whatnot. But I know you recently released features like compacting context. How do you decide how thick the layer on top of the CLI needs to be?

Speaker 1

That's kind of the shared-interface question. At what point do you decide, okay, this should be part of Claude Code, versus this is something for the IDE people to figure out, for example?

Speaker 3

Yeah. There are three layers at which we can build something. Being an AI company, the most natural way to build anything is to build it into the model and have the model do the behavior. The next layer is probably the scaffolding on top, which is Claude Code itself. And the layer after that is using Claude Code as a tool in a broader workflow, composing it with other things. For example, a lot of people use Claude Code with tmux to manage a bunch of windows and a bunch of sessions happening in parallel.

Speaker 3

We don't need to build all of that in. Compaction is this thing that kind of has to live in the middle, because it's something we want to just work when you use Claude Code; you shouldn't have to pull in extra tools on top of it. Rewriting memory in this way isn't something the model can do by itself today, so you have to use a tool for it, and so it has to live in that middle area. We tried a bunch of different options for compacting, like rewriting old tool calls, and truncating old messages but not new messages.

Speaker 3

In the end, we actually just did the simplest thing, which is to ask Claude to summarize the previous messages, return that, and that's it. It's funny: when the model is so good, the simple thing usually works. You don't have to over-engineer it.
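The "simplest thing" compaction described here can be sketched in a few lines. This is a hypothetical reconstruction for illustration, not Anthropic's implementation; `summarize` stands in for a real API call asking Claude to summarize the conversation.

```python
def summarize(messages):
    # Stand-in for a real model call, e.g. asking Claude:
    # "Summarize this conversation so it can be continued later."
    return f"Summary of {len(messages)} earlier messages."

def compact(history, keep_recent=4, max_len=20):
    """If the history is too long, replace everything but the most recent
    messages with a single summary message and carry that forward."""
    if len(history) <= max_len:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [{"role": "user", "content": summarize(old)}] + recent
```

The thresholds here are arbitrary; in the real tool, compaction can also fire automatically as the context window fills up.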

Speaker 2

Yeah. We do that for Claude Plays Pokemon too. It's interesting to see that pattern reemerging.

Speaker 1

And then you have the CLAUDE.md file for the more user-driven memories, so to speak. It's maybe the equivalent of Cursor rules, I would say.

Speaker 3

Yeah. CLAUDE.md is another example of this idea of do the simple thing first. We had all these crazy ideas about memory architectures, and there's so much literature about this, so many different external products, and we wanted to be inspired by all of it. But in the end, what we did was ship the simplest thing: it's a file that has some stuff in it, it's automatically read into context, and there are now a few versions of this file.

Speaker 3

You can put it in the root, or in child directories, or in your home directory, and we'll read all of those in slightly different ways. But yeah, the simplest thing that could work.
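As a sketch, the multi-location lookup described here might look like the following: walk from the working directory up toward the filesystem root collecting `CLAUDE.md` files, then check the home directory. The exact search order and the `~/.claude/CLAUDE.md` location are assumptions for illustration, not the tool's actual code.

```python
from pathlib import Path

def find_claude_md(start_dir, home=None):
    """Collect CLAUDE.md files from start_dir up to the root, then from the
    home directory. A sketch of the lookup idea, not the real implementation."""
    found = []
    d = Path(start_dir).resolve()
    for directory in [d, *d.parents]:
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
    home_file = Path(home or Path.home()) / ".claude" / "CLAUDE.md"
    if home_file.is_file():
        found.append(home_file)
    return found
```

Reading the nearest file first lets directory-specific notes take precedence over project-wide ones.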

Speaker 1

I'm sure you're familiar with Aider, which is another tool that people in our Discord loved, and when Claude Code came out, the same people loved Claude Code. Any thoughts on inspiration you took from it, things you did differently, maybe a design principle where you went a different way?

Speaker 3

Yeah. This is actually related to the moment I got AGI-pilled. Okay, maybe I can tell that story.

Speaker 3

There was this CLI Claude tool, the predecessor to Claude Code. It was a research tool written in Python; it took about a minute to start up. It was very much written by researchers; it wasn't a polished product.

Speaker 3

When I first joined Anthropic, I was putting up my first pull request. I hand-wrote it, because I didn't know any better. And my bootcamp buddy at the time, Adam Wolff, was like, you know, instead of hand-writing it, maybe just ask Claude to write it. And I was like, okay, I guess so. It's an AI lab; maybe there's some capability I didn't know about.

Speaker 3

So I started up this terminal tool, which took about a minute to start, and I asked Claude, hey, here's the description, can you make a PR for me? And after a few minutes of chugging along, it made a PR, and it worked. I was just blown away, because I had no clue there were tools that could do this kind of thing.

Speaker 3

I thought single-line autocomplete was the state of the art before I joined. That's the moment I got AGI-pilled. And yeah, that's where Claude Code came from.

Speaker 2

I think people are obviously interested in comparing and contrasting, because to you this is the house tool; you work on it. People are interested in figuring out how to choose between tools. There are the Cursors of the world, the Devins of the world, there's Aider, and there's Claude Code. We can't try everything all at once. My question would be, where do you place it in the universe of options?

Speaker 3

Well, you can ask Claude to just try all these tools and...

Speaker 2

I wonder what it would say. No self-favoring at all.

Speaker 3

Claude plays engineering. I don't know; we use all these tools in house too. We're big fans of all this stuff. Claude Code is obviously a little different from some of these other tools in that it's a lot more raw. Like I said, there isn't this big, beautiful UI on top of it; it's raw access to the model.

Speaker 3

It's as raw as it gets. So if you want a power tool that lets you access the model directly and use Claude for automating big workloads, for example, if you have a thousand lint violations and you want to start a thousand instances of Claude and have each one fix a violation and then make a PR, then Claude Code is a pretty good tool. Got it. It's a tool for power workloads and power users, and I think that's where it fits.

Speaker 1

Is parallel versus single path one way to think about it? The IDE is really focused on the one thing you want to do, versus Claude Code, where less supervision is required and you can spin up a lot of them. Is that the right mental model?

Speaker 3

Yeah, and there are some people at Anthropic who have been racking up thousands of dollars a day with this kind of automation. Most people don't do anything like that, but you totally could. We think of it as a Unix utility, right? The same way that you would compose grep or cat.

Speaker 3

It's the same way: you can compose Claude Code into workflows.
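The Unix-utility framing above can be sketched as a fan-out: one prompt per file, many agent invocations in parallel. `run_agent` below is a stub standing in for shelling out to the real CLI (something along the lines of `claude -p "<prompt>"`, shown only as a comment); the file names and prompt template are illustrative, not from the episode.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(prompt):
    # Stub for a real invocation, e.g. something like:
    #   subprocess.run(["claude", "-p", prompt], capture_output=True, text=True)
    return f"done: {prompt}"

def fan_out(files, template="Fix the lint violations in {}", workers=8):
    """Run one agent prompt per file, in parallel, Unix-utility style."""
    prompts = [template.format(f) for f in files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_agent, prompts))
```

Because each invocation is just text in, text out, the same pattern composes with shell pipelines, cron jobs, or CI steps.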

Speaker 1

The costing is interesting. Do people pay internally, or do you get it for free? If you work at Anthropic, can you just run this thing as much as you want every day?

Speaker 3

It's free internally. Nice.

Speaker 1

Yeah. I think if everybody had it for free, it would be huge. Because if I think about it, I pay Cursor $20 a month. I use millions and millions of tokens in Cursor that would cost me a lot more in Claude Code. And I think a lot of people I've talked to don't actually understand how much it costs to do these things.

Speaker 1

And they'll do a task and go, oh, that cost 20 cents; I can't believe I paid that much. Going back to the product side, how much do you think of it as your responsibility to try to make it more efficient, versus that's not really what we're trying to do with the tool?

Speaker 4

We really see Claude Code as the tool that gives you the smartest abilities out of the model. We do care about cost insofar as it's very correlated with latency, and we want to make sure this tool is extremely snappy to use and extremely thorough in its work. We want to be very intentional about all the tokens it produces. I think we can do more to communicate cost to users. Currently, we're seeing costs around $6 per day per active user.

Speaker 4

So over the course of a month it does come out a bit higher than Cursor, but I don't think it's out of band, and that's roughly how we're thinking about it.

Speaker 3

I would add that the way I think about it is that it's an ROI question, not a cost question. If you think about an average engineer's salary, and as we were talking about before the podcast, engineers are very expensive. If you can make an engineer 70% more productive, that's worth a lot. I think that's the way to think about it.

Speaker 2

So if you're targeting Claude Code to be the most powerful end of the spectrum, as opposed to the less powerful but faster, cheaper side, then there are typically people who recommend a waterfall. Right? You try the faster, simpler one; that doesn't work; you upgrade, you upgrade, you upgrade, and finally you hit Claude Code. At least for people who are token-constrained and don't work at Anthropic.

Speaker 2

And part of me wants to just fast-track all that. I want to fan out to everything all at once, and if I'm not satisfied with one solution, I'll just switch to the next. I don't know if that's realistic.

Speaker 3

Yeah, we're definitely trying to make it a little easier to make Claude Code the tool you use for all the different workloads. For example, we launched thinking recently. For any kind of planning workload where you might have used other tools before, you can just ask Claude, and it'll use chain of thought to think things out.

Speaker 2

I think we'll get there. Maybe we'll do it this way: how about we recap the brief history of Claude Code? Between when you launched and now, there have been quite a few ships. How would you highlight the major ones?

Speaker 2

And then we'll get to...

Speaker 3

...the thinking tool. I think I'd have to check your Twitter to remember everything.

Speaker 4

I think a big one we've gotten a lot of requests for is web fetch. Yep. We worked really closely with our legal team to make sure we shipped as secure an implementation as possible. So web fetch works if a user directly provides a URL, whether that's in their CLAUDE.md or in their message directly, or if a URL is mentioned in one of the previously fetched URLs.

Speaker 4

And this way, enterprises can feel pretty secure about letting their developers continue to use it. We shipped a bunch of auto features: autocomplete, where you can press tab to complete a file name or file path; auto-compact, so users feel like they have infinite context, since we compact behind the scenes. And we also shipped auto-accept, because we noticed a lot of users were saying, hey, Claude Code can figure it out; I've developed a lot of trust in Claude Code.

Speaker 4

I want it to just autonomously edit my files, run tests, and then come back to me later. So those are some of the big ones.

Speaker 2

Vim mode, custom slash commands.

Speaker 4

People love Vim mode. Yeah, that was a top request too. That one went pretty viral.

Speaker 3

Yeah. Memory, those are recent ones, like the hashtag to remember.

Speaker 2

So yeah, I'd love to dive into the technical side, any feature that was particularly challenging. Paul from Aider always says how much of Aider was coded by Aider. So the question is, how much of it was coded by Claude Code? Obviously there's some percentage, but I wonder if you have a number. Like, 50?

Speaker 2

80?

Speaker 4

High.

Speaker 3

Probably near 80, I'd say. Yeah.

Speaker 4

Very high.

Speaker 3

Yeah.

Speaker 4

A lot of human code review, though.

Speaker 3

Yeah, a lot of human code review. I think some of the stuff has to be handwritten and some of the code can be written by Claude, and there's a wisdom in knowing which one to pick and what percentage for each kind of task. Usually where we start is: Claude writes the code, and if it's not good, then maybe a human dives in. There's also some stuff where I actually prefer to do it by hand, like intricate data-model refactoring.

Speaker 3

不会交给Claude去做,因为我有非常强烈的个人见解,而且直接动手实验比向Claude解释要容易得多。所以,我认为总体上大概有90%的代码是由Claude编写的。

Won't leave it to Claude because I have really strong opinions and it's easier to just do it and experiment than it is to explain it to Claude. So, I think that nets out to maybe like 90% Claude-written code overall.

Speaker 1

是的。我们在投资组合公司中听到很多这样的情况,尤其是A轮公司。他们写的代码大约85%是由AI生成的。

Yeah. We're hearing a lot of that in our portfolio companies, more like series A companies. It's like 85% of the code they write is AI generated.

Speaker 3

是的。没错。

Yeah. Yeah.

Speaker 1

所以是的。嗯,那完全是另一个话题了。关于自定义斜杠命令,我有个问题。你怎么看待自定义斜杠命令和MCP?这一切是如何结合在一起的?

So yeah. Well, that's a whole different discussion. The custom slash commands, I had a question. How do you think about custom slash commands and MCP? Like, how does this all tie together?

Speaker 1

Claude Code中的斜杠命令算是MCP的扩展吗?人们是否在构建一些本不该是MCP,而只是自成体系的东西?大家应该如何看待这个问题?

Are slash commands in Claude Code kind of like an extension of MCP? Are people building things that should not be MCPs, but are just kind of like self-contained things in there? How should people think about it?

Speaker 3

是的。我的意思是,显然我们是MCP的忠实粉丝。你可以用MCP做很多不同的事情,比如自定义工具和自定义命令等等,但与此同时,你不应该被迫使用它。所以如果你只想要一些非常简单且本地化的东西,本质上就像保存好的提示词,那就用本地命令好了。

Yeah. I mean, obviously we're big fans of MCP. You can use MCP to do a lot of different things. You can use it for custom tools and custom commands and all this stuff, but at the same time you shouldn't have to use it. So if you just want something really simple and local, you just want some essentially like prompt that's been saved, just use local commands for that.
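To make that concrete: a local (custom) slash command is essentially just a saved prompt in a markdown file. A minimal sketch, assuming the `.claude/commands/` layout described in the Claude Code docs (the exact path and naming may have changed since this recording, and the lint rules here are hypothetical):

```shell
# A project-scoped slash command is just a markdown prompt checked into
# the repo. This creates a hypothetical /project:lint command.
mkdir -p .claude/commands
cat > .claude/commands/lint.md <<'EOF'
Review the current diff and flag:
- spelling mistakes in comments and strings
- comments that no longer match the code they describe
- uses of the built-in fetch instead of our networking library
EOF
```

Inside an interactive session this would then show up as `/project:lint`; no MCP server is involved.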

Speaker 3

一直以来,我们一直在思考如何以便捷的方式重新暴露这些功能。举个例子,假设你有一个本地命令,你能把它重新暴露为一个MCP提示吗?当然可以。因为Claude Code既是MCP客户端也是MCP服务器。或者类似地,假设你传入一个自定义的bash工具,有没有办法把它重新暴露为MCP工具?

Over time, something that we've been thinking a lot about is how to re-expose things in convenient ways. So for example, let's say you had this local command, could you re-expose that as an MCP prompt? Yep. Because Claude Code is both an MCP client and an MCP server. Or similarly, let's say you pass in a custom bash tool, is there a way to re-expose that as an MCP tool?

Speaker 3

所以,是的,我们认为一般来说你不应该被绑定到特定的技术上。你应该使用任何对你有用的东西。

So, yeah, we think generally you shouldn't have to be tied to a particular technology. You should use whatever works for you.

Speaker 1

没错。比如Puppeteer,我觉得这是和Claude Code一起使用的绝佳工具,对吧,用于测试。有一个Puppeteer MCP服务器,但人们也可以编写自己的斜杠命令。我很好奇,MCP最终会定位在哪里,也许每个斜杠命令都利用MCP,但命令本身不是MCP,因为它最终会被定制化。

Yeah. Because there's some, like Puppeteer. I think that's a great thing to use with Claude Code, right, for testing. There's a Puppeteer MCP server, but then people can also write their own slash commands. And I'm curious where MCPs are gonna end up being, where it's like, maybe each slash command leverages MCPs, but no command itself is an MCP because it ends up being customized.

Speaker 1

我觉得人们还在试图弄清楚这一点。就像是,这应该放在运行时还是MCP服务器里?我认为人们还没有完全搞清楚界限在哪里。

I think that's what people are still trying to figure out. It's like, should this be in the runtime or in the MCP server? I think people haven't quite figured out where the line is.

Speaker 3

是的。对于像Puppeteer这样的东西,我认为它可能属于MCP,因为那里也有一些工具调用。所以把它封装在MCP服务器里可能挺好的。

Yeah. For something like Puppeteer, I think that probably belongs in MCP because there there's a few, like, tool calls that go in that too. And so it's probably nice to encapsulate that in the MCP server.

Speaker 4

而斜杠命令实际上只是提示词,所以它们并不是真正的工具。我们正在考虑如何暴露更多的自定义选项,以便人们可以自带工具或关闭Claude Code自带的一些工具。但这里也有一些棘手之处,因为我们想确保人们带来的工具是Claude能够理解的,并且人们不会因为带来一个让Claude困惑的工具而意外地影响他们的体验。所以我们正在努力解决这方面的用户体验问题。

Whereas slash commands are actually just prompts, so they're not actually tools. We're thinking about how to expose more customizability options so that people can bring their own tools or turn off some of the tools that Claude Code comes with. But there is also some trickiness there because we wanna just make sure that the tools people bring are things that Claude is able to understand and that people don't accidentally inhibit their experience by maybe bringing a tool that is confusing to Claude. So we're just trying to work through the UX of it.

Speaker 3

我再给你举个例子说明这些东西是如何连接的。在Claude Code的GitHub仓库内部,我们有一个运行的GitHub Action。这个GitHub Action通过一个本地斜杠命令调用Claude Code。这个斜杠命令是lint。所以它只是用Claude运行一个linter。

I'll give you an example also of how this stuff connects. For Claude Code internally, in the GitHub repo, we have this GitHub Action that runs. And the GitHub Action invokes Claude Code with a local slash command. And the slash command is lint. So it just runs a linter using Claude.

Speaker 3

它做的一些事情对于基于静态分析的传统linter来说相当棘手。例如,它会检查拼写错误,但也会检查代码是否与注释匹配。它还检查我们是否使用特定的库进行网络获取,而不是内置库。有很多这些特定的检查项很难仅用Lint来表达。理论上,可以进去为这个写一堆Lint规则。

And it's a bunch of things that are pretty tricky to do with a traditional linter that's based on static analysis. So for example, it'll check for spelling mistakes, but it also checks that code matches comments. It also checks that we use a particular library for network fetches instead of the built-in library. There's a bunch of these specific things that we check that are pretty difficult to express just with lint. And in theory, you can go in and write a bunch of lint rules for this.

Speaker 3

有些你可以覆盖,有些你可能无法覆盖,但说实话,直接在本地命令里用markdown写一条要点然后提交要容易得多。所以我们做的就是,Claude通过GitHub Action运行,我们用project:lint调用它,也就是调用那个本地命令。它会运行linter,识别任何错误,进行代码更改,然后使用GitHub MCP服务器将更改提交回PR。所以你可以把这些工具组合起来,我认为这就是我们看待Claude Code的方式:它只是生态系统中的一个工具,可以很好地组合,而不对任何特定用法持有成见。

Some of it you could cover, some of it you probably couldn't, but honestly it's much easier to just write a one-bullet markdown note in a local command and just commit that. And so what we do is, Claude runs through the GitHub Action, we invoke it with project:lint, which just invokes that local command. It'll run the linter, it'll identify any mistakes, it'll make the code changes, and then it'll use the GitHub MCP server in order to commit the changes back to the PR. And so you can kind of compose these tools together, and I think that's a lot of the way we think about Claude Code: it's just one tool in an ecosystem that composes nicely without being opinionated about any particular use.
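The flow Boris describes could be wired up roughly like this. This is a hedged sketch only: the real action isn't public, so the step names, flags, and secret names here are assumptions:

```yaml
# Hypothetical GitHub Actions job: run Claude Code headlessly with the
# local /project:lint command on each pull request.
name: semantic-lint
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Claude Code lint
        # -p runs non-interactively; allowed tools are pre-approved so
        # Claude can edit files and use git without a permission prompt.
        run: claude -p "/project:lint" --allowedTools "Edit" "Bash(git:*)"
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

With a GitHub MCP server configured as well, the same run can push the resulting fixes back to the PR branch, which matches the composition described above.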

Speaker 2

这很有趣。我的简历里有一段奇怪的经历:我曾是Netlify的CLI维护者,所以对这方面稍有研究。有一个Claude Code的反编译版本,似乎已经被撤下了,但从公开信息看,你们使用了Commander JS和React Ink。我只是有点好奇,比如在某个时刻,你们甚至不是在构建Claude Code,而是在构建一个通用的CLI框架,任何开发者都可以根据自己的目的进行定制。你们必须考虑这一点吗?

It's interesting. I have a weird chapter in my CV: I was the CLI maintainer for Netlify, and so I have a little bit of a dive here. There's a decompilation of Claude Code out there that has since been taken down, but it seems like you use Commander JS and React Ink — that's the public info about this. And I'm just kinda curious, like, at some point you're not even building Claude Code, you're kinda just building this general-purpose CLI framework that any developer can hack to their purposes. Do you have to think about this?

Speaker 2

这种级别的可配置性更像是CLI框架或某种以前不存在的新形态。

Like, this level of configurability is more like a CLI framework, or some new form factor that didn't exist before.

Speaker 3

是的。在一个非常棒的CLI上做hack确实很有趣,因为这样的CLI并不多。

Yeah. It's definitely been fun to hack on a on a really awesome CLI because there's not that many of them.

Speaker 2

是的。

Yeah.

Speaker 3

但没错,我们是Ink的忠实粉丝。

But yeah, we're we're big fans of Ink.

Speaker 2

是的。Vadim Demedes,我们实际上用过他的东西。我们在很多项目中使用了React Ink。

Yeah. Vadim Demedes, we actually used his stuff. We used React Ink for a lot of our projects.

Speaker 3

哦,酷。是的。Ink太棒了。它在很多方面有点hacky和janky。

Oh, cool. Yeah. Yeah. Ink is amazing. It's sort of hacky and janky in a lot of ways.

Speaker 3

就像你有React,然后渲染器只是将React代码转换为ANSI转义码作为渲染方式。有很多东西根本不起作用,因为ANSI转义码就像是上世纪七十年代开始编写的东西。没有很好的规范,每个终端都有点不同。所以以这种方式构建,对我来说有点像过去为浏览器构建时,你必须考虑IE6、Opera、Firefox等等。

It's like you you have have React and then the renderer is just translating the React code to like ANSI escape codes as their way to render. There's all sorts of stuff that just doesn't work at all because ANSI escape codes are like you know, it's like this thing that started to be written like the nineteen seventies. And there's no really great spec about it. Every terminal is a little different. So building in this way, it feels to me a little bit like a building for the browser back in the day where you have to think about like Internet Explorer six versus Opera versus Firefox and whatever.

Speaker 3

你必须经常考虑这些跨终端的差异。但是,是的,我们是Ink的忠实粉丝,因为它帮助抽象了这些。我们也使用Bun,是Bun的忠实粉丝。它让编写和运行测试快了很多。

Like, you have to think about these cross-terminal differences a lot. But, yeah, big fans of Ink because it helps abstract over that. We also use Bun. So big fans of Bun. It makes writing and running our tests much faster.

Speaker 3

我们还没有在运行时使用它。

We don't use it in the runtime yet.

Speaker 2

这不仅仅是为了速度,但你说说看,我不想——我不想替你把话说完。但我的印象是它们能帮你完成编译,生成可执行文件。

It's not just for speed, but you tell me, I don't wanna I don't wanna put words in your mouth. But my impression is they help you ship the compilation, the executable.

Speaker 3

是的,没错。所以我们用Bun来把代码编译到一起。

Yeah. Exactly. So we use Bun to compile the code together.

Speaker 2

嗯。Bun还有其他优点吗?我只是想追踪Bun与Deno的讨论对比,因为Deno也在其中。

Yeah. Any other pluses of Bun? I just wanna track the Bun versus Deno conversation, since Deno's in there.

Speaker 3

其实我很久没用过Deno了,有一阵子了。是的。那正是

I actually haven't used Deno in a while, it's been a while. Yeah. That's what

Speaker 2

很多人都这么说。是的。

a lot of people say. Yeah.

Speaker 3

是的。Ryan早年开发的,它里面有一些我觉得非常酷的想法,但确实没能达到同样的流行程度。嗯,仍然有很多很酷的点子。比如,不通过NPM、直接从任意URL导入,我觉得

Yeah. Ryan made it back in the day, and there are some ideas in it that I think were very cool, but yeah, it just never took off to that same degree. Yeah. Still a lot of cool ideas. Like, being able to just import from any URL instead of NPM, I think is

Speaker 2

相当了不起。ESM的梦想。是的,非常酷。好的。

pretty amazing. Dream of ESM. Yeah. Very cool. Okay.

Speaker 2

另外,在谈思考工具之前,我还想问另一个功能:自动接受。我正在尝试围绕对智能体的信任形成一个小框架。对吧?什么时候你会说,好,去自主行动吧。

Also, I was gonna ask you about one other feature before we get to the thinking tool: auto accept. I have this little framework I'm trying to develop around trust in agents. Right? When do you say, alright, go autonomous?

Speaker 2

什么时候你需要把开发者拉进来?有时你让模型决定。有时你觉得,这是一个破坏性操作,总是要问我。我只是好奇你们内部是否有关于何时自动接受以及这一切走向的启发式规则。

When do you pull the developer in? And sometimes you let the model decide. Sometimes you're like, this is a destructive action, always ask me. And I'm just curious if you have any internal heuristics around when to auto accept and where all this is going.

Speaker 4

我们花了很多时间构建权限系统。我们团队的Robert正在领导这项工作。我们认为给开发者控制权非常重要,让他们能够说,嘿,这些是允许的权限。通常,这包括模型总是被允许读取文件或读取任何内容。然后由用户来决定,嘿,它是否被允许编辑文件?

We're spending a lot of time building out the permission system. Robert on our team is leading this work. We think it's really important to give developers the control to say, hey, these are the allowed permissions. Generally, this includes stuff like the model's always allowed to read files or read anything. And then it's up to the user to say, hey, is it allowed to edit files?

Speaker 4

允许运行测试吗?这些可能是最安全的三个操作。然后还有一长串其他操作,用户可以根据正则表达式匹配来设置允许列表或拒绝列表。

Is it allowed to run tests? These are like probably the safest three actions. And then there's like a long list of other actions that users can either allow list or deny list based on regex matches with the action.
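As a concrete sketch of such an allow/deny list: the `permissions` key shape below follows the current Claude Code settings docs and may differ from what shipped at the time of this conversation, and the specific rules are hypothetical.

```shell
# Hypothetical project settings: pre-approve safe actions (reads, tests)
# and hard-deny dangerous ones, matched by tool name and argument pattern.
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "allow": [
      "Read",
      "Bash(git status)",
      "Bash(npm test:*)"
    ],
    "deny": [
      "Bash(rm -rf:*)"
    ]
  }
}
EOF
```

Anything not matched by either list would still fall back to the interactive permission prompt.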

Speaker 1

如果有版本控制,写文件怎么会不安全呢?我认为

How can writing a file ever be unsafe if you have version control? I think that's

Speaker 3

是的。我认为有几个不同的安全方面需要考虑,所以把它分解一下会很有帮助。对于文件编辑,其实安全性问题相对较小,尽管仍然存在安全风险——比如模型获取URL时遭遇提示注入攻击,然后在磁盘上写入恶意代码而你却没发现。当然,代码审查作为另一层保护存在。但总的来说,文件权限方面最大的问题是模型可能会做错事。

Yeah. I think there are a few different aspects of safety to think about, so it could be useful just to break that out a little bit. So for file editing, it's actually less about safety, I think, although there is still a safety risk, because what might happen is, let's say the model fetches a URL and there's a prompt injection attack in the URL, and then the model writes malicious code to disk and you don't realize it. Although, you know, there is code review as a separate layer of protection there. But I think generally for file writes, the model might just do the wrong thing.

Speaker 3

这是最重要的一点。我们发现如果模型在做错误的事情,越早发现并纠正,体验就越好。如果你让模型沿着完全错误的方向运行十分钟后才纠正,体验就会很糟糕。所以通常这样识别失败是更好的方式。但有些情况下你也需要让模型自由发挥。

That's the biggest thing. And what we find is that if the model is doing something wrong, it's better to identify that earlier and correct it earlier, and then you're gonna have a better time. If you wait for the model to just go down this totally wrong path and then correct it ten minutes later, you're gonna have a bad time. So it's better to usually identify failures that way. But at the same time, there's some cases where you just want to let the model go.

Speaker 3

比如当Claude Code在为我编写测试时,我会直接按shift+tab进入自动接受模式,让它运行测试并迭代直到通过。因为我知道这是相当安全的操作。而对于像Bash工具这样的工具就完全不同了,因为Claude可能会运行rm -rf,那就糟了。这可不是什么好事。

So for example, if Claude Code is writing tests for me, I'll just hit shift-tab, enter auto accept mode, and just let it run the tests and iterate on the tests until they pass. Because I know that's a pretty safe thing to do. And then for some other tools like the Bash tool, it's pretty different, because Claude could run, you know, rm -rf, and that would suck. Right. That's not a good thing.

Speaker 3

所以我们绝对需要人工参与来捕捉这类问题。模型经过训练不会这样做,但这些是非确定性系统。所以仍然需要人工监督。我认为总体趋势是人类干预的时间间隔正在缩短。

So we definitely want people to be in the loop to catch stuff like that. The model is trained and aligned to not do that, but these are non-deterministic systems. So you still want a human in the loop. I think that generally the way that things are trending is kind of less time between human input.

Speaker 2

你看过METR论文吗?没有。他们基本上为人类输入间隔时间建立了一个摩尔定律,理念是每3-7个月翻倍。Anthropic在这个基准测试上目前表现非常出色。

Did you see the METR paper? No. They establish a Moore's law for time between human input, basically. And it's basically doubling every three to seven months is the idea. And Anthropic is currently doing super well on that benchmark.

Speaker 2

它大约能自主完成人类需要50分钟的任务(按第50百分位的人类努力计),这挺酷的。确实。强烈推荐看看。

It's roughly autonomous for fifty-minute tasks at the fiftieth percentile of human effort, which is kinda cool. True. Highly recommend that.

Speaker 1

我经常把Cursor设为YOLO模式直接运行。不过没关系。所以

I put cursor in YOLO mode all the time and just run it. But but it's fine. So

Speaker 2

氛围编程。对吧?

Vibe coding. Right?

Speaker 3

就像,是的。

Like, is Yeah.

Speaker 2

所有最新的潮流。

All of the latest fads.

Speaker 1

当你谈到对齐和模型训练时,有几个有趣的点。我总是把它放在Docker容器里,并且每个命令前都加了Docker Compose前缀。昨天,我的Docker服务器没启动。我心想,哦,Docker没在运行。让我就在Docker外面运行它吧。

And there's a couple things that are interesting when you talked about alignment and the model being trained. So I always put it in a Docker container, and I have it prefixed every command with, like, the Docker Compose. And yesterday, my Docker server was not started. And I was like, oh, Docker is not running. Let me just run it outside of Docker.

Speaker 1

然后我就想,哇,哇,哇,哇,哇。你应该启动Docker并在Docker里运行它,我会到外面去。

And I'm like, woah, woah, woah, woah, woah. You should start Docker and run it in Docker and I'll go outside.

Speaker 4

抱歉

I'm sorry

Speaker 1

要说的是,这是一个非常好的例子,你知道,有时候你以为它在做某件事,但实际上它在做另一件事。至于代码审查方面,我很想多聊聊这个。我觉得你提到的linter部分,可能人们第一次听会忽略它,不太会注意到。但从基于规则的linting到语义linting的转变,我认为非常棒且超级重要。很多公司都在尝试如何实现自主的PR审查,但我至今还没见过一个我真正在用的。

That is, like, a very good example of, you know, sometimes you think it's doing something and then it's doing something else. And for the review side, I would love to just chat about that more. I think the linter part that you mentioned, maybe people skipped over it, it doesn't register the first time, but going from rule-based linting to semantic linting, I think, is great and super important. And a lot of companies are trying to figure out how do you do autonomous PR review, and I've not seen one that I use so far.

Speaker 1

它们都感觉一般般。所以我很好奇,你如何看待闭环或改进这一点,特别是弄清楚到底应该审查什么?因为当你氛围编程时,这些PR会变得非常大。你知道吗?有时候我会想,哦,哇。

They're all kinda mid. So I'm curious how you think about closing the loop or making that better, and figuring out, especially, what are you supposed to review? Because these PRs get pretty big when you vibe code. You know? Sometimes I'm like, oh, wow.

Speaker 2

哦,LGTM。

Oh, LGTM.

Speaker 1

你知道吗?就像,我真的需要读所有这些吗?大部分看起来挺标准的,但我敢肯定里面有些部分模型能理解,但可能有点超出分布,需要仔细看。所以,是的。我知道这是个很开放的问题,但你有任何想法都会很棒。

You know? It's like, am I really supposed to read all of this? It kinda seems most of it seems pretty standard, but, like, I'm sure there are parts in there that the model would understand that are, like, kinda out of distribution, so to speak, to really look at. So yeah. I know it's a very open ended question, but any thoughts you have would be great.

Speaker 3

我们的思考方式是,就像我之前说的,Claude Code是一种原语。所以如果你想用它来构建代码审查工具,可以做到。如果你想构建安全扫描、漏洞扫描工具,也可以。如果你想构建语义linter,同样可以。希望有了Claude Code,如果你想做这些,只需要几行代码就能实现。

The way we're thinking about it is, Claude Code, like I said before, is a primitive. So if you want to use it to build a code review tool, you can do this. If you want to build a security scanning, vulnerability scanning tool, you can do that. If you want to build a semantic linter, you can do that. And hopefully Claude Code makes it so that if you want to do this, it's just a few lines of code.

Speaker 3

你也可以让Claude来编写这段代码,因为Claude在编写GitHub Actions方面确实非常出色。

And you can just have Claude write that code also, because Claude is really great at writing GitHub Actions.

Speaker 4

是的,需要提到的一点是我们确实有一个非交互模式,也就是我们在这些场景中用来自动化Claude Code的方式。而且,很多使用Claude Code的公司实际上都在使用这种非交互模式。比如他们会说,我的代码库里有成千上万的测试,其中一些已经过时,一些不太稳定,然后他们会让Claude Code去检查每一个测试并决定如何更新它们,比如是否应该弃用某些测试。

Yeah, one thing to mention is we do have a non-interactive mode, which is how we use Claude in these situations to automate Claude Code. And also, a lot of the companies using Claude Code actually use this non-interactive mode. So they'll, for example, say, hey, I have hundreds of thousands of tests in my repo. Some of them are out of date, some of them are flaky, and they'll send Claude Code to look at each of these tests and decide, okay, how can I update any of them? Like, should I deprecate some of them?

Speaker 4

我该如何提高我们的代码覆盖率?所以这已经成为人们非交互式使用Claude Code的一个非常酷的方式。

How do I, like, increase our code coverage? So that's been a really cool way that people are non-interactively using Claude Code.

Speaker 2

这里的最佳实践是什么?因为在非交互模式下,它可能会无限运行,而你并不一定会审查所有输出结果,对吧?所以我有点好奇,在非交互模式下有什么不同?最重要的超参数或需要设置的参数是什么?

What are the best practices here? Because when it's non interactive, it could run forever and you're you're not you're not necessarily reviewing the output of everything. Right? So I I'm just kinda curious how does how is it different in non non interactive mode? What are the the most important hyper parameters or arguments to set?

Speaker 3

是的。对于还没用过这个功能的朋友来说,非交互模式就是使用claude -p,然后在引号里传入提示语,就这么简单。就是-p标志。一般来说,它最适合只读任务,这是它表现非常好的地方,你不需要过多考虑权限和无限运行之类的问题。

Yeah. And for folks that haven't used this, non-interactive mode is just `claude -p` and then you pass in the prompt in quotes, and that's all it is. It's just the -p flag. Generally, it's best for tasks that are read-only. That's the place where it works really well and you don't super have to think about permissions and running forever and things like that.

Speaker 3

例如,一个只运行但不修复任何问题的linter。或者,我们正在做一个项目,使用claude -p来为Claude Code生成变更日志。每个PR只是查看提交历史,然后决定哪些应该进入变更日志,哪些不应该。因为我们知道大家一直要求有变更日志,所以我们就让Claude来构建它。所以,非交互模式非常适合只读任务。

So for example, a linter that runs and doesn't fix any issues. Or for example, we're working on a thing where we use `claude -p` to generate the changelog for Claude Code. So every PR, it's just looking over the commit history and being like, okay, this makes it into the changelog, this doesn't. Because we know people have been requesting changelogs, so we're just getting Claude to build it. So non-interactive mode: really good for read-only tasks.

Speaker 3

对于需要写入的任务,我们通常建议在命令行中传入一组非常具体的权限。你可以做的是传入allowed tools,然后允许使用特定的工具。比如,不仅仅是Bash,还可以具体到git status或git diff。所以只给它一组可以使用的工具,或者编辑工具

For tasks where you want to write, the thing we usually recommend is pass in a very specific set of permissions on the command line. So what you can do is pass in allowed tools, and then you can allow a specific tool. So for example, not just Bash, but, for example, git status or git diff. So just give it a set of tools that it can use, or, you know, the edit tool

Speaker 2

之类的。它仍然有默认工具或文件读取、grep、系统工具如Bash和LS以及内存工具,对吧?所有这些都

or something. It still has default tools or file read, grep, systems tools like Bash and LS and memory tools. Right? All those are

Speaker 3

所以它仍然有,是的。它仍然有所有这些工具,但allowed tools只是让你预先接受权限,因为非交互模式下没有权限提示。是的。

So it still has — yeah, it still has all these tools, but allowed tools just lets you pre-accept permissions, since you don't have the permission prompt in non-interactive mode. Yeah.
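Putting those pieces together, the invocations being described might look like the following. This is an illustrative sketch: it requires the `claude` CLI, and the flag spelling follows the docs at the time of recording, so treat it as an assumption rather than a guaranteed interface (the prompts and file names are made up).

```shell
# Read-only query: -p prints the result and exits; no permission
# pre-approval is needed because nothing is written.
claude -p "summarize the flaky tests in this repo"

# Write-capable run: pre-accept only a narrow set of tools, so Claude
# can edit files and run git status/diff, but nothing else.
claude -p "fix the failing test in src/date.test.ts" \
  --allowedTools "Edit" "Bash(git status)" "Bash(git diff)"
```

This is also the shape that scales down nicely for Cat's advice below: run it on one case first, check the behavior, then widen the scope.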

Speaker 4

我们还强烈建议从小规模开始。比如先在一个测试上试运行,确保其行为合理,优化你的提示词,然后扩展到10个测试,确保成功,或者如果失败,就分析失败的模式,并逐步扩大规模。所以千万不要一开始就运行修复10万个测试的任务。

And we'd also definitely recommend that you start small. So like test it on one test, make sure that that has reasonable behavior, iterate on your prompt, then scale it up to 10, make sure that it succeeds or if it fails, just like analyze what the patterns of failures are and gradually scale up from there. So definitely don't kick off a run to fix like a 100,000 tests.

Speaker 2

是的。我想说的是,现在这个阶段,你知道,我脑子里一直有个标语:在Anthropic,有Claude Code在生成代码,然后Claude Code还会审查自己的代码。就像,在某个时刻,不同的人都在设置这一切。你并不真正掌控它,但它正在发生。我思考的重点是,我们有工程副总裁、CTO们在听。这对单个开发者来说固然很好,但对那些负责技术、整个代码库、工程决策的人来说,这一切正在发生。

Yeah. So at this point, you know, this tagline is in my head that basically at Anthropic, there's Claude Code generating code and then Claude Code also reviewing its own code. Like, at some point, right, different people are setting all this up. You don't really govern that, but it's happening. The point of the thing I was thinking about was, we have, you know, VPs of engineering, CTOs listening. This is all well and good for the individual developer, but for the people who are responsible for the tech, the entire code base, the engineering decisions, all this is going on.

Speaker 2

我的开发者们,比如,我管理着大约100名开发者。他们任何人在这个时候都可能在做这些事情。我该如何管理?我的代码审查流程该如何改变?我的变更管理该如何调整?

My developers, like, I I manage, like, a 100 developers. Any of them could be doing any of this at this point. What do I do to manage this? How does my code review process change? How does my change management change?

Speaker 2

我不知道。

I don't know.

Speaker 4

我们与很多副总裁和CTO讨论过这个问题。他们实际上往往非常兴奋,因为他们会试用这个工具,下载它,问几个问题。当Claude Code给出合理的答案时,他们真的很兴奋,因为他们觉得,哦,我可以理解代码库中的这种细微差别。有时他们甚至用Claude Code发布小功能。我认为通过这种与工具互动的过程,他们建立了很大的信任。很多人实际上来找我们,问我们如何能更广泛地推广它?

We've talked to a lot of VPs and CTOs about it. They actually tend to be quite excited, because they experiment with the tool, they download it, they ask it a few questions, and when Claude Code gives them sensible answers, they're really excited because they're like, oh, I can understand this nuance in the code base. And sometimes they even ship small features with Claude Code. And I think through that process of interacting with the tool, they build a lot of trust in it. And a lot of folks actually come to us and ask us, how can I roll it out more broadly?

Speaker 4

然后我们经常会与开发效能副总裁等举行会议,讨论这些关于如何确保人们编写高质量代码的担忧。我认为总的来说,仍然很大程度上取决于每个开发者自己坚持很高的代码质量标准。即使我们使用Claude Code编写大量代码,合并代码的个人仍然有责任确保这是维护良好、文档齐全、具有合理抽象的代码。所以我认为这种情况会继续存在:Claude Code并不是一个独立的、自己提交代码的工程师,仍然很大程度上取决于个人贡献者对产生的代码负责。

And then we'll often have sessions with, like, VPs of dev prod and talk about these concerns around how do we make sure people are writing high-quality code. I think in general, it's still very much up to the individual developer to hold themselves to a very high standard for the quality of code that they merge. Even if we use Claude Code to write a lot of our code, it's still up to the individual who merges it to be responsible for this being well-maintained, well-documented code that has reasonable abstractions. And so I think that's something that will continue to happen, where Claude Code isn't its own engineer that's committing code by itself; it's still very much up to the ICs to be responsible for the code that's produced.

Speaker 3

是的。我认为Claude Code也让很多质量工作变得容易得多。例如,我已经好几个月没有手动编写过单元测试了。

Yeah. I think Claude Code also makes a lot of this quality work a lot easier. For example, I have not manually written a unit test in many months.

Speaker 4

我们有很多单元测试。

We have a lot of unit tests.

Speaker 1

我们有

We have

Speaker 3

很多单元测试。这是因为Claude编写了所有测试。以前我觉得在别人的PR上要求写测试很不好意思,因为,你知道,他们大概心里清楚——

a lot of unit tests. And it's because Claude writes all the tests. Before, I felt like a jerk if on someone's PR I'm like, hey, can you write a test? Because, you know, they kinda know they —

Speaker 2

那还是覆盖率的问题。是的。

Is that still Coverage. Yeah.

Speaker 3

好吧。你知道,他们某种程度上知道自己应该写测试,这可能是正确做法,但在他们脑子里会做权衡,只想更快地发布。所以你总是觉得要求测试有点讨厌,但现在我总是要求,因为Claude可以直接写测试。

Okay. And you know, they kinda know they should probably write a test and that's probably the right thing to do, and somewhere in their head they make that trade-off where they just wanna ship faster. And so you always kinda feel like a jerk for asking, but now I always ask, because Claude can just write the test.

Speaker 2

你是

You're

Speaker 3

对的。不需要人工工作,只需让Claude来做。我认为随着编写测试变得更容易,编写lint规则也变得更容易,实际上现在比以往更容易拥有高质量代码。

right. There's no human work, just ask Claude to do it. And I think with writing tests becoming easier and with writing lint rules becoming easier, it's actually much easier to have high-quality code than it was before.

Speaker 2

你相信哪些指标?比如,很多人其实不相信100%代码覆盖率,因为有时候那是在优化错误的东西。可以说,我不确定。但显然你在代码质量指标方面有很多经验,那么什么指标仍然是有意义的?

What are the metrics that you believe in? Like, is it there's a lot of people actually don't believe in 100% code coverage because sometimes that is kind of optimizing for the wrong thing. Arguably, I don't know. But, like, obviously, you have a lot of experience in different code quality metrics, But what what what is still what still makes sense?

Speaker 3

老实说,我认为这很大程度上取决于工程团队。我希望有一个万能答案。

I think it's very engineering team dependent, honestly. I wish there's a one size fits all answer.

Speaker 2

是啊。就是那个解决方案。

Yeah. It's just the one solution.

Speaker 3

对某些团队来说,测试覆盖率极其重要。对其他团队,类型覆盖率很重要,特别是在使用严格类型语言时,比如避免JavaScript和Python中的any类型。是的。我认为圈复杂度经常被批评,但老实说它仍然是个不错的指标,因为在衡量代码质量方面没有更好的方法了。

For some teams, test coverage is extremely important. For other teams, type coverage is very important, especially if you're working in a strictly typed language and, for example, avoiding `any`s in JavaScript and Python. Yep. I think cyclomatic complexity kind of gets a lot of flack, but it's still honestly a pretty good metric, just because there isn't anything better in terms of ways to measure code quality.

Speaker 2

好的。那么生产力方面,显然不是代码行数,但你在乎衡量生产力吗?我相信你在乎。

Okay. And then productivities, obviously not lines of code, but do you care about measuring productivity? I'm sure you do.

Speaker 3

是的。说实话,代码行数其实并不差。

Yeah. You know, lines of code honestly isn't terrible.

Speaker 0

天啊。

Oh god.

Speaker 3

它确实有缺点。是的。它很糟糕。代码行数作为衡量标准很糟糕,有很多原因。

It's a it has downsides. Yeah. It's it's terrible. Lines of code is terrible for a lot of reasons.

Speaker 2

是的。

Yes.

Speaker 3

但真的很难找到更好的替代方案。所以它是最不糟糕的。它是最不糟糕的。比如,代码行数,也许还有PR数量之类的。

But it's really hard to make anything better. So It's the least terrible. It's the least terrible. There's, like, lines of code, maybe, like, number of PRs.

Speaker 2

你的GitHub贡献图有多绿。对。对。

How green your GitHub is. Yeah. Yeah.

Speaker 4

我们真正想要重点抓的两个指标是:第一,缩短周期时间。也就是说,使用这些工具后,你的功能上线速度提高了多少?这可能是从首次提交到PR合并的时间。虽然很难精确衡量,但这是我们的目标之一。另一个我们想更严格衡量的指标是,那些原本不会开发的功能数量。

The two that we're really trying to nail down are, one, decrease in cycle time. So how much faster are your features shipping because you're using these tools? That might be something like the time between first commit and when your PR is merged. It's very tricky to get right, but it's one of the ones that we're targeting. The other one that we wanna measure more rigorously is the number of features that you wouldn't have otherwise built.

Speaker 4

我们有很多收集客户反馈的渠道。我们在Cloud Code上看到的一个模式是,有时客户支持或客户成功团队会发帖说,某个应用有某个bug。然后有时十分钟后,该团队的一名工程师就会说,Cloud Code已经修复了它。在很多这种情况下,当你联系他们并说'嘿,这太酷了'时。

We have a lot of channels where we get customer feedback. And one of the patterns that we've seen with Cloud Code is that sometimes customer support or customer success will like post, hey, like, this app has like this bug. And then sometimes ten minutes later, one of the engineers on that team will be like, Cloud Code made a fix for it. And a lot of those situations when you, like, ping them and you're like, hey. That was really cool.

Speaker 4

他们会说,是的。如果没有Cloud Code,我可能不会做那个修复,因为那会偏离我原本要做的事情太多。它可能只会被归入漫长的待办清单。这类事情正是我们想要更严格衡量的。

They were like, yeah. Without Cloud Code, I probably wouldn't have done that because it would have been too much of a divergence from what else otherwise gonna do. It would have just ended up in this long backlog. This is the kind of stuff that we really want to measure more rigorously.

Speaker 3

那对我来说是另一个AGI震撼时刻。很多个月前,有一个非常早期的Claude Code版本。Anthropic的一位工程师Jeremy构建了一个机器人,它查看Slack上的一个特定反馈频道,并将其连接到Claude Code,让它自动提交PR来修复所有这些问题。虽然它无法修复每一个问题,但它解决了很多问题。我会——

That was the other AGI-pilled moment for me. There was a really early version of Claude Code many, many months ago. And this one engineer at Anthropic, Jeremy, built a bot that looked through a particular feedback channel on Slack, and he hooked it up to Claude Code to have it automatically put up PRs with fixes to all this stuff. And it couldn't fix every issue, but it fixed a lot of the issues. I would —

Speaker 2

大概百分之十?五十?

Is it like ten percent? Fifty?

Speaker 3

你知道,这是早期阶段,所以我记不清具体数字了,但高得令人惊讶,以至于我成为了这种工作流程的信徒。而我之前并不是。

You know, this was early on, so I don't remember the exact number, but it was surprisingly high, to the point where I became a believer in this kind of workflow. And I wasn't before.

Speaker 1

不过,这在某种程度上不也很可怕吗?当你能构建太多东西时,几乎就像也许你不应该构建那么多东西。我觉得这是我最纠结的地方。它给了你创造、创造、再创造的能力,但到了某个时刻,你就必须支持、支持、再支持。

So isn't that scary too, in a way? Where you can build too many things, it's almost like maybe you shouldn't build that many things. I think that's what I'm struggling with the most. It gives you the ability to create, create, create. But then at some point, you gotta support, support, support.

Speaker 2

这就像《侏罗纪公园》。就像,科学家们太专注于他们能否做到。是的。是的。完全正确。

This is the Jurassic Park. Like, scientists are so preoccupied with whether you could. Yeah. Yeah. Exactly.

Speaker 1

我不知道。我们应该...是的。你是怎么做决策的?比如,既然实际实现某事的成本正在降低,作为一个产品经理,你如何决定什么才是真正值得做的?

I don't know. We should. Yeah. How how do you make decisions? Like, now that the cost of actually implementing the thing is going down as a PM, how do you decide what is actually worth doing?

Speaker 4

是的。我们对于全新的功能仍然设定了非常高的标准。大部分修复都是关于,嘿,这个功能坏了,或者存在一个我们之前没处理过的奇怪边缘情况。所以这更像是打磨粗糙的边缘,而不是构建一个完全全新的东西。对于全新功能,我认为我们设定了相当高的标准:它必须非常直观易用,新用户体验要最小化,让人一看就明白它是如何工作的。

Yeah. We definitely still hold a very high bar for net new features. Most of the fixes were like, hey, this functionality is broken or this like there's a weird edge case that we hadn't addressed yet. So it's very much like smoothing out the rough edges as opposed to building something completely net new. For net new features, I think we hold a pretty high bar that it's very intuitive to use, the new user experience is like minimal, it's just like obvious that it works.

Speaker 4

我们有时实际上会用Claude Code来制作原型,而不是写文档。是的。所以你会得到可以试玩的原型,这通常能让我们更快地感受到:嘿,这个功能准备好了吗?或者,这是正确的抽象吗?这是正确的交互模式吗?

We sometimes actually use Claude Code to prototype instead of using docs. Yeah. So you'll have prototypes that you can play around with, and that often gives us a faster feel for, hey, is this feature ready yet? Or is this the right abstraction? Is this the right interaction pattern?

Speaker 4

所以它能让我们更快地对一个功能建立起真正的信心,但这并不能绕开我们确保该功能确实符合产品愿景的过程。

So it gets us faster to feeling really confident about a feature, but it doesn't circumvent the process of us making sure that the feature definitely fits the product vision.

Speaker 3

有趣的是,随着构建东西变得更容易,它改变了我编写软件的方式,就像 Cat 说的,以前我会写一份大的设计文档,对于某些问题集,有时我会在构建之前思考很长时间。而现在,我会直接让 Claude Code 为其制作三个版本的原型,我会试用这个功能,看看我更喜欢哪一个,这比一份文档能更好、更快地给我提供信息。我认为我们行业还没有完全内化这种转变。

It's interesting how, as it gets easier to build stuff, it changes the way that I write software. Like Cat's saying, before, I would write a big design doc, and for some set of problems, I would think about a problem for a long time before I would build it. And now I'll just ask Claude Code to prototype three versions of it, and I'll try the feature and see which one I like better, and that informs me much better and much faster than a doc would have. I think we haven't totally internalized that transition yet in the industry.

Speaker 1

是的。对于我内部构建的一些工具,我也有同样的感觉。人们问我,我们能做这个吗?我就说,行,我直接把它做出来。然后感觉,嗯,感觉还挺不错的。

Yeah. I feel the same way about some tools I build internally. People ask me, could we do this? And I'm like, yeah, I'll just build it. And it's like, well, it feels pretty good.

Speaker 1

我们应该把它打磨一下,你知道,或者有时候会觉得,不,那样不行。

We should polish it, you know. Or sometimes it's like, no, that doesn't work.

Speaker 2

令人安心的是,你知道,你的最大成本上限是——我的意思是,即使在理论上成本无上限的 Anthropic,成本也大约是每天 6 美元。这让人很放心,因为我想,每天 6 美元?没问题。每天 600 美元?那我们可得谈谈了。是的。

It's comforting that your max cost, I mean, even at Anthropic, where it's theoretically unlimited, the cost is roughly $6 a day. That gives people peace of mind, because I'm like, $6 a day? Fine. $600 a day, we have to talk. Yeah.

Speaker 2

你知道吗?

You know?

Speaker 1

是啊。我每月花200美元制作吉卜力工作室风格的照片。所以一切都很好,完全物有所值。

Yeah. I paid $200 a month to make Studio Ghibli photos. So it's all good. That is totally worth it.

Speaker 4

你提到了内部工具,这实际上是我们看到正在兴起的一个很大用例。因为在很多操作密集型工作中,如果能快速搭建一个内部仪表盘或操作工具(比如批量处理上千封邮件访问权限),这类需求往往不需要精美设计,只需要能用就行。Claude Code特别擅长这类从零到一的任务。

You mentioned internal tools, and that's actually a really big use case that we're seeing emerge. Because a lot of times, if you're working on something operationally intensive, you can spin up an internal dashboard for it, or an operational tool where you can, for example, grant access to a thousand emails at once. For a lot of these things, you don't really need a super polished design. You kinda just need something that works. And Claude Code's really good at those kinds of zero-to-one tasks.

Speaker 4

比如我们内部使用Streamlit后,数据可视化能力大幅提升。正因为能可视化,我们才能发现仅看原始数据时无法察觉的规律。

Like, we use Streamlit internally, and there's been a proliferation in how much we're able to visualize, and because we're able to visualize it, we're able to see patterns that we wouldn't have otherwise seen if we were just looking at raw data.

Speaker 3

没错。我上周也在做一个辅助网站,直接把设计稿截图拖进终端对Claude Code说:"这是设计稿,能实现吗?"它确实实现了,虽然有点粗糙。

Yeah. Like, I was working on this side website last week, and I just showed Claude Code the mock. I took the screenshot I had, dragged and dropped it into the terminal, and I was like, hey Claude, here's the mock. Can you implement it? And it implemented it, and it worked, though it was a little bit crummy.

Speaker 3

我就说:'现在用Puppeteer检查并迭代直到符合设计稿'。它反复修改了三四次,最终效果就和设计稿一模一样了。要知道这些以前可全是手动工作。

I was like, alright, now look at it in Puppeteer and iterate on it until it looks like the mock. And then it did that three or four times, and then the thing looked like the mock. Yeah. This was just all manual work before.

Speaker 2

我想再请教两个关于智能体功能的问题。我对记忆功能很感兴趣,你们提到过自动压缩和带标签的记忆机制。虽然最简单的方法有效,但好奇是否见过其他有趣的方案?或者内部探索过的记忆黑科技值得分享?

I think we're gonna ask about two other features of the overall agent pieces that we mentioned. So I'm interested in memory as well. We talked about auto-compact and memory using hashtags and stuff. My impression is that, like you say, the simplest approach works, but I'm curious if you've seen any other requests that are interesting to you, or internal hacks of memory that people have explored that you might wanna surface to others.

Speaker 3

记忆处理有多种方案,大多依赖外部存储。比如Chroma?对,就是这类。

There's a bunch of different approaches to memory. Most of them use external stores of various sorts. There's Chroma? Yeah. Exactly.

Speaker 3

是的,类似项目很多。主要是K近邻检索和图存储这两种主流模式。

Yeah. There's a lot of projects like that. It's either k-NN retrieval or kind of like graph stores; those are the two big shapes for this.

Speaker 2

你相信知识图谱在这方面能发挥作用吗?

Are you a believer in knowledge graphs for this stuff or

Speaker 3

你知道,如果在我加入Anthropic和这个团队之前你跟我聊过,我可能会说,是的,绝对是。但现在我实际上觉得一切都在于模型。就像最终胜出的总是模型。随着模型

You know, if you had talked to me before I joined Anthropic and this team, I would have said, yeah, definitely. But now I actually feel everything's the model. Like, that's the thing that wins in the end. And just as the model

Speaker 2

变得

gets

Speaker 3

更好,它会吸收其他一切。所以,在某个时间点,模型会编码自己的知识图谱。只要你给它合适的工具,它甚至会编码自己的知识库故事。是的。但具体工具方面,我认为还有很大的实验空间,我们目前还不确定。

better, it subsumes everything else. So, you know, at some point, the model will encode its own knowledge graph. It'll encode its own KB store if you just give it the right tools. Yeah. But for the specific tools, I think there's still a lot of room for experimentation; we just don't know yet.

Speaker 2

在某种程度上,我们是不是因为缺乏上下文长度而在凑合?比如,我们现在做的这些记忆相关的事情,如果有一个1亿token的上下文窗口,我们是不是就不在乎了?

In some ways, are we just coping for lack of context length? Like, are we doing things for memory now that if we had like a 100,000,000 token context window, we don't care about?

Speaker 3

这个问题很有意思。

It's an interesting one.

Speaker 4

我当然很想要一个1亿token的上下文窗口。

I would love to have a 100-million-token context window, for sure.

Speaker 2

有些人声称已经做到了。我们不知道是真是假。

Some people have claimed to to have done it. We don't know if it's true or not.

Speaker 3

Sean,我想问你一个问题。如果把世界上所有知识都放进你的大脑里,是的。假设有某种治疗可以让你大脑拥有任意长度的上下文,你拥有无限的神经元。你会想这样做吗?还是你仍然希望把知识记录在外部?

I guess here's the question for you, Sean. If you took all the world's knowledge and you put it in your brain, and let's say there is some treatment that you could get so that your brain can have any amount of context, you have infinite neurons. Is that something that you would wanna do, or would you still wanna record knowledge externally?

Speaker 2

把知识放进我脑袋里和用代理工具来做是两回事,因为我想控制代理。我想让自己变得无限,但我希望使用的工具是有限的,因为这样我就知道如何控制它们。这甚至不是安全论证,更像是我想知道你知道什么。如果你不知道某件事,有时候这样反而更好。

Putting it in my head is different from trying to use an agent tool to do it, because I'm trying to control the agent. I'm trying to make myself unlimited, but I wanna make the tools that I use limited, because then I know how to control them. And it's not even a safety argument. It's just more like, I want to know what you know. And if you don't know a thing, sometimes that's good.

Speaker 3

就像审计意图的能力。

Like the ability to audit what the intent is.

Speaker 2

我不知道这是不是小脑思维,因为这算不上什么惨痛教训——实际上,有时候你就是想控制输入上下文中的每个细节。但越是放手让模型自主运作(就像说'耶稣来掌舵'那样信任模型),你就越不知道它到底在关注什么。

And I don't know if this is small-brain thinking, because this is not very bitter-lesson: actually, sometimes you just want to control every part of what goes into the context. And the more you just, you know, Jesus-take-the-wheel trust the model, the less idea you have of what it's paying attention to.

Speaker 3

是啊。不知道你有没有看到Chris Olah团队上周发布的机制可解释性研究?

Yeah. I don't know, did you see the mech interpretability stuff from Chris Olah and the team that was published?

Speaker 2

上周那个。看到了。

Like last week. Yeah.

Speaker 3

对,上周发布的。

Last week. Yeah.

Speaker 2

嗯,有什么特别之处吗?

Yes. What about it?

Speaker 3

我在想这类技术是不是未来方向。这样就能更便捷地审计模型本身。

I wonder if something like this is the future, so there's an easier way to audit the model itself.

Speaker 2

嗯。

Mhmm.

Speaker 3

如果你想查看存储的内容,直接审计模型就可以了。

And so if you wanna see like what what is stored, you can just audit the model.

Speaker 2

没错。最关键的是他们能掌握每个token激活的特征,可以进行调节或抑制。但我不确定这是否能细化到上下文中的具体知识单元。

Yeah. The main salient thing is that they know what features activate per token, and they can tune them up, suppress them, whatever. But I don't know if it goes down to the individual item of knowledge from context, you know.

Speaker 3

目前还不行。但我在想,这或许就是"惨痛教训"式的版本吧。

Not yet. Yeah. But I wonder if, you know, maybe that's the bitter-lesson version of it.

Speaker 2

对。对。还有其他关于记忆功能的意见吗?如果没有,我们可以继续讨论规划和思考。

Right. Right. Any other comments from memory? Otherwise, we can move on to planning and thinking.

Speaker 4

我们观察到人们以非常有趣的方式使用记忆功能,比如让Claude记录它执行的所有操作日志,这样随着时间的推移,Claude就能逐渐理解你的团队工作内容、你在团队中的角色、你们的目标以及你喜欢的工作方式。我们希望能找出最通用的实现方案以便广泛推广。我认为在开发Claude Code这类产品时,实际实现功能的工作量反而小于调整这些功能以确保它们能很好地适用于广大用户群体,即覆盖广泛的使用场景。所以记忆功能有很多有趣的潜力,我们只是希望确保在广泛推广前它能开箱即用。

We've been seeing people play around with memory in quite interesting ways, like having Claude write a logbook of all the actions it's done, so that over time, Claude develops this understanding of what your team does, what you do within your team, what your goals are, how you like to approach work. We would love to figure out what the most generalized version of this is so that we can share it broadly. I think when we're developing things in Claude Code, it's actually less work to implement the feature and a lot of work to tune these features to make sure that they work well for general audiences, across a broad range of use cases. So there's a lot of interesting stuff with memory, and we just wanna make sure that it works well out of the box before we share it broadly.
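The logbook idea Cat describes can be sketched in a few lines. This is a conceptual illustration, not Claude Code's actual implementation; the file name, the JSONL format, and the entry fields are all assumptions:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical file name; users improvise this pattern with prompts,
# it is not a built-in Claude Code feature.
LOGBOOK = Path("claude-logbook.jsonl")

def record_action(action: str, outcome: str) -> None:
    """Append one structured entry describing an action the agent took."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "outcome": outcome,
    }
    with LOGBOOK.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def load_logbook(max_entries: int = 50) -> str:
    """Render the most recent entries as plain text to feed back into context."""
    if not LOGBOOK.exists():
        return ""
    lines = LOGBOOK.read_text().splitlines()[-max_entries:]
    entries = [json.loads(line) for line in lines]
    return "\n".join(f"- {e['ts']}: {e['action']} -> {e['outcome']}" for e in entries)
```

An append-only log keeps writes cheap, and rendering only the tail bounds how much of the context window the memory consumes.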

Speaker 2

同意这一点。我认为这里还有很大的发展空间。

Agree with that. I think there's a lot more to be developed here.

Speaker 3

我想记忆功能的一个相关问题是:如何将信息纳入上下文?

I guess a related problem to memory is how do you get stuff into context?

Speaker 2

知识库。对吧?比如知识库。

Knowledge base. Right? Like knowledge base.

Speaker 3

是的。最初我们尝试的早期Claude Code版本实际上使用了RAG技术。我们对代码库进行了索引,当时应该用的是Voyage,就是现成的RAG方案,效果相当不错。我们尝试了几个不同版本,先是RAG,后来又试了几种不同的搜索工具。

Yeah. Originally, very, very early versions of Claude Code actually used RAG. So we indexed the codebase, and I think we were just using Voyage, so just off-the-shelf RAG, and that worked pretty well. And we tried a few different versions of it. There was RAG, and then we tried a few different kinds of search tools.

Speaker 3

最终我们确定采用智能代理搜索作为解决方案。主要有两个——也许是三个重要原因。首先是它的表现远超其他方案,优势非常明显。

And eventually we landed on just agentic search as the way to do it. And there were two big reasons, maybe three big reasons. One is it outperformed everything. By a lot.

Speaker 3

这个结果令人惊讶。

And this was surprising.

Speaker 2

在什么基准测试中?

In what benchmark?

Speaker 3

主要是直觉判断。内部使用感受。虽然也有些内部基准测试,但主要还是靠直觉。就是感觉更好用。

This was just vibes. Internal vibes. There's some internal benchmarks also, but mostly vibes. It just felt better.

Speaker 2

在代理式RAG中,意思是你可以让它根据需要查找任意多个周期。

In agentic RAG, meaning you just let it look up in however many cycles it needs.

Speaker 3

是的。就是使用常规的代码搜索,你知道,GLOB、GREP,就是常规的

Yeah. Just using regular code searching, you know, glob, grep, just regular

Speaker 2

代码搜索。常规代码搜索。是的。

code search. Regular code search. Yeah.

Speaker 3

是的。所以有一个是这样的,然后第二个是RAG需要做整个索引步骤。这带来了很多复杂性,因为代码会不同步,然后还有安全问题,因为这个索引必须存放在某个地方,如果那个提供商被黑客攻击了怎么办?所以对公司来说这样做责任很大。即使对我们的代码库来说,它非常敏感,所以我们有点不想把它上传到第三方的东西,可以是第一方的东西,但我们仍然有这个不同步的问题。

Yeah. So that was one, and then the second one was this whole indexing step that you have to do for RAG. There's a lot of complexity that comes with that, because the code drifts out of sync, and then there's security issues, because the index has to live somewhere, and what if that provider gets hacked? So it's just a lot of liability for a company. Even our own codebase is very sensitive, so we don't want to upload it to a third-party thing. It could be a first-party thing, but then we still have the out-of-sync issue.

Speaker 3

而代理式搜索完全避开了所有这些。所以本质上,以延迟和代币为代价,你现在拥有了非常棒的搜索,而没有安全方面的缺点。

And agentic search just sidesteps all of that. So essentially, at the cost of latency and tokens, you now have really awesome search without security downsides.
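A toy version of what "agentic search" means in practice: instead of querying a prebuilt vector index, the agent repeatedly calls grep/glob-style tools over the working tree. The sketch below stubs the agent's decision loop as a fixed list of pattern refinements; in Claude Code the model itself picks each next pattern based on earlier results:

```python
import re
from pathlib import Path

def grep(pattern: str, root: str, glob: str = "**/*.py") -> list[tuple[str, int, str]]:
    """A grep-style tool: return (file, line_number, line) for every match."""
    rx = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).glob(glob)):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip, like grep would
        for number, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((str(path), number, line.strip()))
    return hits

def agentic_search(patterns: list[str], root: str) -> list[tuple[str, int, str]]:
    """Stand-in for the agentic loop: try each refinement in turn and stop
    at the first pattern that matches anything."""
    for pattern in patterns:
        hits = grep(pattern, root)
        if hits:
            return hits
    return []
```

Because nothing is indexed ahead of time, there is no store to secure and nothing to drift out of sync; the cost shows up as extra latency and tokens per lookup instead.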

Speaker 1

嗯,记忆就像规划。对吧?有点像记忆就像我喜欢做什么,然后规划就像是现在利用那些记忆来制定一个做这些事的计划。有一个

Well, memory is like planning, right? Memories are kinda like what I like to do, and then planning is, now use those memories to come up with a plan to do these things. There was one

Speaker 2

或者也许可以这样说,记忆有点像过去,就像我们已经做过的事情。而规划有点像我们将要做什么。是的。它们在某个点上会交叉。

Or maybe put it as: memory is sort of the past, like, what we already did. And planning is kinda what we will do. Yeah. It just crosses over at some point.

Speaker 1

是的。我认为从外部看可能有点混淆的是你如何定义思考。有扩展思考;有规划意义上的思考,也就是执行前的思考;然后还有在执行过程中思考你正在做的事情,这就是think工具。

Yeah. I think the maybe slightly confusing thing from the outside is what you define as thinking. So there's extended thinking. There's thinking as in planning, which is thinking before execution. And then there's thinking about what you're doing as you go, which is the think tool.

Speaker 1

你能带大家了解一下这些区别吗?

Can you maybe just run people through the differences?

Speaker 2

听你说话我真的很困惑。嗯,

I'm really confused listening to you. Well,

Speaker 3

它是一个工具。所以如果你要求Claude思考,它就可以思考。通常,最佳使用模式是你让Claude做一些研究,比如使用一些工具,将一些代码拉入上下文,然后要求它进行思考。之后它可以制定计划,在执行前进行规划步骤。有些工具有明确的规划模式,比如Roo Code有这个功能,Cline也有。

It's one tool. So Claude can think if you ask it to think. Generally, the usage pattern that works best is you ask Claude to do a little bit of research, like use some tools, pull some code into context, and then ask it to think about it. And then it can make a plan and do a planning step before you execute. There are some tools that have explicit planning modes, like Roo Code has this, and Cline has this.

Speaker 3

其他一些工具也有这个功能,比如你可以在规划和执行模式之间切换,或者可能有几种不同的模式。我们考虑过这种方法,但我认为我们的产品方法类似于我们对模型的方法,即"惨痛教训"。所以保持自由形式,保持非常简单,保持接近底层。因此,如果你想让Claude思考,只需告诉它思考,比如制定计划,认真思考,先不要写任何代码,它通常应该遵循这一点。你也可以随时这样做。

Some of the other tools have it, like you can shift between plan and act mode, or maybe a few different modes. We've thought about this approach, but I think our approach to product is similar to our approach to the model, which is the bitter lesson. So just free-form, keep it really simple, keep it close to the metal. And so if you want Claude to think, just tell it to think: be like, make a plan, think hard, don't write any code yet, and it should generally follow that. And you can do that as you go, too.

Speaker 3

所以可能有一个规划阶段,然后Claude编写一些代码或其他什么,之后你可以要求它再思考和规划一下。你随时都可以这样做。

So maybe there's a planning stage, and then Claude writes some code or whatever, and then you can ask it to think and plan a little bit more. You can do that anytime.

Speaker 1

是的。我读了关于think工具的博客文章,它说虽然听起来类似于扩展思考,但这是一个不同的概念。扩展思考是Claude在开始生成之前所做的,而think工具是在开始生成之后,它必须停下来思考?这些都是由Claude Code框架完成的吗?

Yeah. I was reading through the think tool blog post, and it said, while it sounds similar to extended thinking, it's a different concept. Extended thinking is what Claude does before it starts generating, and the think tool is for once it starts generating, when it has to stop and think? Is this all done by the Claude Code harness?

Speaker 1

所以基本上,人们不需要真正考虑两者之间的区别,是这个意思吗?

So people don't really have to think about the difference between the two, basically, is the idea?

Speaker 3

是的。你不需要考虑它。好的。而且它

Yeah. You don't have to think about it. Okay. And it's

Speaker 1

所有这些都很有帮助。这很有帮助。因为有时候我会想,我是不是没有正确思考?

That is helpful. Because sometimes I'm like, man, am I not thinking right?

Speaker 3

是的。这实际上是,在Claude Code中,所有这些都是思维链。所以我们不使用think工具。任何时候Claude Code进行思考,都是通过思维链完成的。

Yeah. It's all chain of thought, actually, in Claude Code. So we don't use the think tool. Anytime that Claude Code does thinking, it's all chain of thought.

Speaker 2

我对此有一个见解。这又是我们在录制前讨论过的事情,即在Claude Plays Pokémon黑客松中,我们访问了更多分支环境功能,这意味着我们可以获取任何虚拟机状态,进行分支,稍微推进一下,并在规划中使用它。然后我意识到昨天的要点基本上是,在每个时间点都这样做成本太高了。但如果你把它作为一个工具提供给Claude,并在某些情况下提示它使用该工具,似乎是有意义的。我只是有点好奇,比如,你对整体沙盒化、环境、分支、可回滚性等的看法。

I had an insight on this. This is again something we discussed before recording, which is that in the Claude Plays Pokémon hackathon, we had access to more of a branching-environments feature, which meant that we could take any VM state, branch it, play it forward a little bit, and use that in the planning. And then I realized the TLDR of yesterday was basically that it's too expensive to just always do that at every point in time. But if you give it as a tool to Claude and prompt it in certain cases to use that tool, it seems to make sense. I'm just kinda curious about your takes on sandboxing, environments, branching, rewindability, overall.

Speaker 2

这是你立即提出的,我没想到的事情。这对Claude有用吗?还是Claude对此没有意见?是的。我我可以

This is something that you immediately brought up, which I didn't think about. Is that useful for Claude? Or does Claude have no opinions about it? Yeah. I could

Speaker 3

关于这个可以聊上好几个小时。Claude 大概也能做到。

talk for hours about this. Claude probably can too.

Speaker 2

是啊。要我说的话,我们先从你那里获取原始 token,然后就可以用这些数据训练 Claude。顺便说一句,这档播客本质上就是这样。我们就是在为人们生成 token。

Yeah. If you ask me. Let's get original tokens from you, and then we can train Claude on that. By the way, that's explicitly what this podcast is. We're just generating tokens for people.

Speaker 3

这是预训练还是后训练?

Is this is this the pre training or the post training?

Speaker 2

那是预训练数据集。我们得参与进去。

That's a pre training dataset. Like, we gotta get in there.

Speaker 3

天啊。没错。怎么购买?怎么获取一些 token?从沙盒开始,理想情况下我们想要的是始终在 Docker 容器中运行代码,这样它就有自由度,之后你可以在上面用其他工具进行快照,可以快照、回滚,做所有这些操作。

Oh man. Yeah. How do I buy? How do I get some tokens? Starting with sandboxing: ideally, the thing that we want is to always run code in a Docker container, so that it has freedom, and then you can snapshot with other kinds of tools on top, snapshot, rewind, do all this stuff.

Speaker 3

不幸的是,对所有事情都使用 Docker 容器就像要做很多工作,大多数人不会这么做。所以我们想要某种方式来模拟其中一些功能,而不必完全使用容器。现在有一些可以做的事情。比如,有时候如果我有一个规划问题或研究类问题,我会让 Claude 并行研究几条路径。如果你直接要求它,现在就可以做到。

Unfortunately, working with a Docker container for everything is just a lot of work, and most people aren't gonna do it. So we want some way to simulate some of these things without having to go full container. There's some stuff you can do today. For example, something I'll do sometimes is, if I have a planning question or a research-type question, I'll ask Claude to investigate a few paths in parallel. And you can do this today if you just ask it.

Speaker 3

比如说,我想重构 x 来实现 y。你能研究三种不同的实现方案吗?并行进行。用三个智能体来完成。所以在用户界面中,当你看到一项任务时,那实际上就像是一个子 Claude。

So say, you know, I want to refactor x to do y. Can you research three separate ideas for how to do it? Do it in parallel. Use three agents to do it. And so in the UI, when you see a task, that's actually like a sub-Claude.

Speaker 3

它是一个执行这个任务的子代理。通常当我处理复杂问题时,我会要求它并行研究三次或五次或任意多次。然后 Claude 会挑选出最佳选项并为你总结。

It's a sub-agent that does this. And usually when I do something hairy, I'll ask it to investigate three times or five times or however many times in parallel. And then Claude will pick the best option and summarize it for you.
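The fan-out-then-pick workflow Boris describes could be sketched like this. The `investigate` stub and its scoring rule are placeholders for what would really be separate sub-Claude sessions with their own context:

```python
from concurrent.futures import ThreadPoolExecutor

def investigate(approach: str) -> dict:
    """Stub for one sub-agent run. In Claude Code each of these is a
    separate sub-agent; here we fake a score so the workflow is runnable."""
    # Hypothetical scoring rule: pretend shorter descriptions are simpler designs.
    return {"approach": approach, "score": 100 - len(approach)}

def fan_out(approaches: list[str], max_workers: int = 3) -> dict:
    """Run the investigations in parallel, then a 'lead' step picks the best."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(investigate, approaches))
    return max(results, key=lambda r: r["score"])
```

As Boris notes, you can also have the lead step present all the options to the user instead of picking, e.g. by returning the full `results` list rather than the max.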

Speaker 1

但 Claude 如何挑选最佳选项?难道你不想自己选择吗?在'应该由你选择'和'我应该做最终决定'之间的交接点是什么?

But how does Claude pick the best option? Don't you want to choose? What's your handoff between you should pick versus I should be the final decider?

Speaker 3

我觉得这取决于具体问题。你也可以让 Claude 把选项呈现给你。

I think it depends on the problem. You can also ask Claude to present the options to you.

Speaker 2

可能,你知道,它存在于技术栈的不同层面,与Claude Code本身不同。Claude Code作为一个CLI工具,你可以在任何环境中使用它。所以如何组合使用取决于你自己。我们是否应该讨论模型在何时以及如何失败?因为我觉得这对你们来说是另一个热门话题。

Probably, you know, it exists at a different part of the stack than Claude Code specifically. Claude Code is a CLI, so you could use it in any environment. So it's up to you to compose it together. Should we talk about how and when models fail? Because I think that was another hot topic for you.

Speaker 2

我就开放讨论吧。比如,你们如何观察Claude Code失败的情况?

I'll just leave it open. Like, how do you observe Claude Code failing?

Speaker 4

模型确实有很大的改进空间,我觉得这非常令人兴奋。我们的大多数研究团队实际上每天都在使用Claude Code。这对他们来说是一个很好的方式,可以非常实际地接触并体验模型失败的情况,这让我们更容易在模型训练中针对这些问题,从而提供更好的模型,不仅是为了Claude Code,也是为了我们所有的编程客户。我觉得最新Sonnet 3.7的一个特点是它非常执着。它非常、非常有动力去完成用户的目标,但有时它会过于字面地理解用户的目标。

There's definitely a lot of room for improvement in the models, which I think is very exciting. Most of our research team actually uses Claude Code day to day. And so it's been a great way for them to be very hands-on and experience the model failures, which makes it a lot easier for us to target these in model training and to actually provide better models, not just for Claude Code, but for all of our coding customers. I think one of the things about the latest Sonnet 3.7 is that it's a very persistent model. It's very, very motivated to accomplish the user's goal, but it sometimes takes the user's goal very literally.

Speaker 4

因此,它并不总是满足请求中隐含的部分,因为它过于专注于,比如,我必须

And so it doesn't always fulfill the implied parts of the request, because it's just so narrowed in on, like, I must

Speaker 2

它被锁定在

It's locked in.

Speaker 4

完成任务上。所以我们正在尝试弄清楚,如何给它多一点常识,让它知道在努力尝试和“不,用户绝对不想要那个”之间的界限。

get x done. And so we're trying to figure out, okay, how do we give it a bit more common sense, so that it knows the line between trying very hard and, no, the user definitely doesn't want that.

Speaker 3

是的。就像那个经典例子:嘿,去让这个测试通过。然后,你知道,五分钟后,它说,好了,我把所有东西都硬编码了。测试通过了。我就说,不。

Yeah. Like, the classic example is, hey, go get this test to pass. And then, you know, five minutes later, it's like, alright, well, I hard-coded everything. The test passes. And I'm like, no.

Speaker 3

那不是我要的。硬编码答案。是的。但这就是问题所在。不过,它只会从这里变得更好。

That's not what I wanted. It hard-coded the answer. Yeah. But that's the thing. It only gets better from here.

Speaker 3

比如,这些用例有时候能行,你知道,不是每次都能成功,而且模型有时会过于努力,但它只会变得更好。

Like, these use cases work sometimes today, not every time, and the model sometimes tries too hard, but it only gets better.

Speaker 2

是的。

Yeah.

Speaker 4

是的。比如上下文就是一个重要问题,很多时候如果你进行很长的对话并且多次压缩内容,可能你最初的一些意图就不像刚开始时那么强烈了。所以模型可能会忘记你最初告诉它要做的一些事情。因此我们对更大的有效上下文窗口等功能感到非常兴奋,这样你就能处理这些复杂的长达数十万token的任务,并确保Claude Code全程保持正轨。这将会是一个巨大的提升。

Yeah. Context, for example, is a big one, where a lot of times, if you have a very long conversation and you compact a few times, maybe some of your original intent isn't as strongly present as it was when you first started. And so maybe the model forgets some of what you originally told it to do. So we're really excited about things like larger effective context windows, so that you can have these gnarly, really long, hundreds-of-thousands-of-tokens tasks and make sure that Claude Code stays on track the whole way through. That would be a huge lift.

Speaker 4

而且不仅对Claude Code,对每个编程公司都是如此。

And not just for Claude Code, but for every coding company.

Speaker 2

昨天David Hershey主题演讲中有个有趣的故事。他其实怀念3.5版本的常识性,因为3.7版本太执着了。3.5版本有一些有趣的故事,显然它会放弃任务,而3.7版本就不会。当Claude 3.5放弃时,它开始给游戏开发者写正式请求来修复游戏。他还有一些截图,非常精彩。

Fun story from David Hershey's keynote yesterday. He actually misses the common sense of 3.5, because 3.7 is so persistent. 3.5 actually had some entertaining stories where apparently it would give up on tasks, and 3.7 just doesn't. And when Claude 3.5 gave up, it started writing formal requests to the developers of the game to fix the game. And he has some screenshots of it, which is excellent.

Speaker 2

所以如果你在听这期节目,可以在YouTube上找到,因为我们会发布。非常非常酷。我想捕捉的一种失败形式是你我们在喝咖啡时提到的,就是Claude Code在会话间记忆或缓存方面做得不够。对吧?所以它每次都会重新构建完整的状态。

So if you're listening to this, you can find it on YouTube, because we'll post it. Very, very cool. One form of failing which I wanted to capture was something you mentioned while we were getting coffee, which is that Claude Code doesn't have that much between-session memory, or caching, or whatever you call that. Right? So it re-forms the whole state from scratch every single time.

Speaker 2

这是为了对中间可能发生的变更做最小假设。那么它能保持多大的一致性呢?就像说的,我认为一个失败之处是它会忘记过去在做的事情,除非你通过CLAUDE.md等方式明确选择加入。

That's so as to make the minimum assumptions about the changes that can happen in between. So, like, how consistent can it stay? Right? Like I said, I think that one of the failures is that it forgets what it was doing in the past unless you explicitly opt in via CLAUDE.md or whatever.

Speaker 2

这是你担心的问题吗?

Is that something you worry about?

Speaker 4

这绝对是我们正在努力解决的问题。我认为,目前对于想要跨会话恢复工作的人,我们最好的建议是告诉Claude:把这个会话的状态写进这个文本文档里(可能不是CLAUDE.md,而是另一个文档)。然后在新会话中告诉Claude从那个文档读取。是的。

It's definitely something we're working on. I think our best advice now for people who wanna resume across sessions is to tell Claude, hey, write down the state of this session into this text doc, probably not the CLAUDE.md, but a different doc. And in your new session, tell Claude to read from that doc. Yeah.

Speaker 4

但我们计划构建更原生的方式来处理这个特定工作流程。

But we'll we plan to build in more native ways to handle this specific workflow.

Speaker 3

这种情况有很多不同的案例。对吧?有时候你并不希望Claude拥有上下文。这有点像git,有时候我只想要一个没有任何历史的新分支。

There's a lot of different cases of this, right? Sometimes you don't want Claude to have the context. And it's sort of like git. Sometimes I just want a fresh branch that doesn't have any history.

Speaker 3

但有时候我已经在某个PR上工作了一段时间,需要所有的历史上下文。

But sometimes I've been working on a PR for a while and I need all that historical context.

Speaker 2

对。

Right.

Speaker 3

所以我们想支持所有这些情况,但要做到一刀切确实很棘手。不过总的来说,我们的代码方法是确保它能开箱即用,无需额外配置。所以一旦我们实现这一点,就会有所成果。

So we kinda wanna support all these cases, and it's tricky to do one-size-fits-all. But generally, our approach with Claude Code is to make sure it works out of the box for people without extra configuration. So once we get there, we'll have something.

Speaker 1

你们是否设想过这样一个未来:提交记录在拉取请求中扮演更重要的角色?比如我们如何追溯代码变更历史?你知道的,PR中有大量关于代码如何变化的历史信息可以供模型参考。但如今模型主要关注的是分支的当前状态。

Do you see a future in which the commits play a bigger part? Like, in a pull request, how did we get here? There's a lot of history in how the code has changed within the PR that could inform the model. But today, the models are mostly looking at the current state of the branch.

Speaker 3

是的。实际上Claude在某些情况下会查看完整历史记录。比如当你让Claude帮你创建PR时,它会查看自你分支从主分支分离以来的所有变更,然后在生成拉取请求消息时综合考虑这些变化。

Yeah. So for some things, Claude will actually look at the whole history. For example, if you tell Claude, hey, make a PR for me, it'll look at all the changes since your branch diverged from main, and then take all of those into account when generating the pull request message.

Speaker 4

你可能会注意到在使用过程中它在运行git diff。我认为它很擅长追踪这个分支到目前为止发生了哪些变化,并确保在继续完成任务之前充分理解这些变更。

You might notice it running git diff as you're using it. I think it's pretty good about tracking, hey, what changes have happened on this branch so far, and making sure that it understands that before continuing on with the task.

Speaker 3

其他人做过的一件事是要求Claude在每次更改后都提交。你可以把这个写在CLAUDE.md里。我觉得这些高级用户工作流非常有趣。比如有些人要求Claude每次更改后都提交,这样他们就能轻松回退。还有人要求Claude每次都创建一个工作树(worktree),这样就能在同一个仓库中并行运行多个Claude实例。

One thing other people have done is ask Claude to commit after every change. You can just put that in the CLAUDE.md. There are some of these power-user workflows that I think are super interesting. Like, some people are asking Claude to commit after every change so that they can rewind really easily. Other people are asking Claude to create a worktree every time, so that they can have a few Claudes running in parallel in the same repo.

Speaker 3

从我们的角度来看,我们希望支持所有这些功能。所以再说一次,Claude Code就像是一个基础工具,无论你的工作流程如何,它都应该能够无缝融入。

I think from our point of view, we wanna support all of this. So again, Claude Code is like a primitive, and it doesn't matter what your workflow is. It should just fit in.
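The worktree workflow boils down to one `git worktree add` per parallel session. A sketch that just builds the commands (the path naming scheme here is illustrative, not a Claude Code convention):

```python
def worktree_commands(repo: str, branches: list[str]) -> list[list[str]]:
    """Build the `git worktree add` invocations for running one Claude Code
    session per branch, each in its own checkout."""
    commands = []
    for branch in branches:
        checkout = f"{repo}-wt-{branch}"  # made-up naming scheme
        commands.append(["git", "-C", repo, "worktree", "add", "-b", branch, checkout])
    return commands
```

Each returned command could be passed to `subprocess.run()`, after which you would start a separate `claude` session inside each checkout directory; the worktrees share one object store, so commits from every session land in the same repo.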

Speaker 1

我知道3.5 Haiku发布时在Aider排行榜上排名第四。你们是否设想过Claude Code可以有这样的场景:比如通过提交钩子用Haiku持续做代码检查之类的工作,然后让3.7承担主要任务?

I know that 3.5 Haiku was the number four model on Aider when it came out. Do you see a world in which Claude Code has, like, a commit hook that uses maybe Haiku to continuously do something like the linter stuff, and then you have 3.7 as the main model?

Speaker 3

是的。如果你想要的话确实可以这样做。你是指通过预提交钩子或者GitHub Action之类的方式吗?

Yeah. You could actually do this if you want. So you're saying, like, through a pre-commit hook or a GitHub Action, or

Speaker 1

对对对。就像运行Claude Code那样,就像你之前演示的Lint示例那样。是的。

Yeah. Yeah. Yeah. Like, kinda run Claude Code, like the lint example that you had. Yeah.

Speaker 1

我想在每次本地提交时运行它,比如在提交到PR之前。

I wanna run it at each commit locally, like, before it goes to the PR.

Speaker 3

是的。所以如果你愿意,今天就可以这么做。如果你在使用Husky或者任何预提交钩子系统,或者直接用git预提交钩子,只需添加一行 claude -p 加上你的指令,这样每次都会运行。

Yeah. So you could do this today if you want. If you're using Husky, or whatever pre-commit hook system you're using, or just git pre-commit hooks, just add a line, `claude -p` and then whatever instruction you have, and that'll run every time.

Speaker 1

不错。你只需指定Haiku。其实没什么区别,对吧?可能效果会稍差一些,但它仍然可以用?

Nice. And you just specify Haiku. There's really no difference, right? Maybe it'll work a little worse, but it would still be supported?

Speaker 3

是的。如果你想,可以覆盖模型。通常我们使用Sonnet,默认用Sonnet处理大多数事情,因为我们发现它的表现更优。没错。

Yeah. You can override the model if you want. Generally, we use Sonnet. We default to Sonnet for most everything just because we find that it outperforms. Yep.

Speaker 3

不过,是的,如果你想,可以覆盖模型。

But, yeah, you can override the model if you want.

Speaker 1

是啊。但我没那么多钱在提交钩子上运行。

Yeah. I don't have that much money to run a commit hook through it.

Speaker 2

顺便提一下预提交钩子,我曾在一些地方工作,他们坚持使用预提交钩子;也曾在另一些地方,他们坚决不用,因为觉得会妨碍提交和快速推进。我有点好奇,你们有什么立场或建议吗?

Just as as a side note on pre commit hooks, I have worked in places where they insisted on having pre commit hooks. I've worked in places where they insisted they'll never do pre commit hooks because they get in the way of committing and moving quickly. I'm just kinda curious, like, do you have a stance or recommendation?

Speaker 3

天啊。这就像问用制表符还是空格一样,对吧?

Oh god. That's like asking about tabs versus spaces, isn't it?

Speaker 2

有点类似。但我觉得,在某些方面,如果测试失败,用Claude Code修复测试会更简单;另一方面,在每个点都运行这个成本更高。所以,这其中有权衡。

A little bit. But, you know, I think it is easier in some ways: like, if you have a breaking test, go fix the test with Claude Code. In other ways, it's more expensive to run this at every point. So there's trade-offs.

Speaker 3

对我来说,最大的权衡是希望预提交钩子运行得足够快,这样无论是人还是Claude,都不必等上一分钟

I think for me, the biggest trade-off is you want the pre-commit hook to run pretty quickly, so that whether you're a human or a Claude, you don't have to wait, like, a minute

Speaker 2

为了所有运行的东西。快速的

for all the things to run. The fast

Speaker 3

是的。所以一般来说,你知道,我们代码库的预提交应该运行

Yeah. So generally, you know, pre commit for our code base should run

Speaker 2

只是类型检查。

Just types.

Speaker 3

是的。大概不到五秒左右,就是类型检查和Lint之类的。然后更耗时的东西你可以放在GitHub Action或GitLab或你用的任何工具里。

Yeah. It's like less than five seconds or so, just types and lint, maybe. And then more expensive stuff you can put in a GitHub Action or GitLab or whatever you're using.

Speaker 2

同意。我不知道。我喜欢提出明确的建议,这样人们可以采纳并说,这家伙说了,我们应该在团队里这么做。而且,这可以作为决策的基础。

Agreed. I don't know. I like putting prescriptive recommendations out there so that people can take this and go like, this guy said it, we should do it in our team. And, like, that's that's a basis for decisions.

Speaker 3

是的。是的。是的。

Yeah. Yeah. Yeah.

Speaker 2

酷。还有其他技术故事要讲吗?你知道,本来想更宏观地谈谈产品相关的东西,但你可以尽情深入技术细节。

Cool. Any other technical stories to tell? You know, wanted to zoom out into more product y stuff, but, you know, you can get to as technical as you want.

Speaker 3

我不知道。有个可能有趣的小故事是,在Claude Code发布的前一晚,我们正在处理最后几个问题,团队熬夜到很晚。有个问题困扰我一段时间了,就是我们当时用的markdown渲染。如今Claude Code里的markdown渲染很美,在终端里渲染得非常漂亮,能很好地处理加粗、标题、间距等。但我们试了很多现成的库。

I don't know. Like, one anecdote that might be interesting is the night before the Claude Code launch, we were going through to burn down the last few issues, and the team was up pretty late trying to do this. And one thing that was bugging me for a while is the markdown rendering that we were using. The markdown rendering in Claude Code today is beautiful. It's really nice rendering in the terminal, and it does bold and headings and spacing and stuff very nicely. But we tried a bunch of these off-the-shelf libraries to do it.

Speaker 3

我想我们试了两三个甚至四个不同的库。但没一个完美的。有时段落和列表之间的间距有点不对。嗯。或者有时文本换行不太正确。

And I think we tried like two or three or four different libraries. Just nothing was quite perfect. Sometimes the spacing was a little bit off between a paragraph and like a list. Mhmm. Or sometimes the text wrapping wasn't quite correct.

Speaker 3

或者有时颜色不完美。每个库都有这些问题,而这些markdown渲染器都很受欢迎,在GitHub上有几千星,维护了很多年,但它们并不是为终端设计的。所以在发布前一晚10点左右,我说,好吧,我来搞定。我就让Claude给我写一个markdown解析器,它真的写了。零样本。

Or sometimes the colors weren't perfect. So each one had all these issues, and all these markdown renderers are very popular and have thousands of stars on GitHub and have been maintained for many years, but they're not really built for a terminal. And so the night before the release, at like 10PM, I'm like, all right, I'm gonna do this. So I just asked Claude to write a markdown parser for me, and it wrote it. Zero shot.

Speaker 3

是的。虽然不是完全零样本,但经过一两次提示后,它就搞定了。这就是Claude Code中现在使用的markdown解析器,也是markdown看起来如此美观的原因。

Yeah. It wasn't quite zero shot, but after, you know, maybe one or two prompts, it got it. And that's the markdown parser that's in Claude Code today, and the reason that markdown looks so beautiful.

Speaker 2

这个例子很有趣。我想,现在实现功能的新标准变得很有意思。就像这个例子,通常你会使用某些现成的库,但出于各种原因对其不满意,现在你可以直接创建一个替代方案并投入使用。

That's a fun one. It's interesting what the new bar is, I guess, for implementing features. Like like this exact example where there's libraries out there that you normally reach for that you find, you know, some dissatisfaction with for literally whatever reason, you could just spin up an alternative and go off of that.

Speaker 3

是的。我觉得AI在去年改变了很多事情。但很多这类问题,就像我们之前的例子,以前你可能不会自己构建的功能或会使用库,现在你可以自己动手了。编写代码的成本在下降,生产力在提升。只是我们还没有完全消化这真正意味着什么。但我预计会有更多人开始这样做,比如编写自己的库或者直接发布每个功能。

Yeah. I feel like AI has changed so much, literally in the last year, but a lot of these problems are, you know, like the example we had before: a feature you might not have built before, or where you might have used a library, now you can just do it yourself. The cost of writing code is going down and productivity is going up. We just have not internalized what that really means yet. But yeah, I expect that a lot more people are going to start doing things like this, like writing your own libraries or just shipping every feature.

Speaker 1

宏观来看,你们显然没有单独的Claude Code订阅服务。我很好奇路线图是什么。这会长期处于研究预览阶段吗?还是会变成一个正式产品?我知道你们和很多CTO和副总裁谈过。

Just to zoom out, you obviously do not have a separate Claude Code subscription. I'm curious what the road map is. Like, is this just gonna be a research preview for much longer? Are you gonna turn it into an actual product? I know you were talking to a lot of CTOs and VPs.

Speaker 1

会有Claude Code企业版吗?愿景是什么?

Is there gonna be Claude Code Enterprise? What's the vision?

Speaker 4

是的。我们有一个永久团队负责Claude Code。团队正在扩大。我们对长期支持Claude Code感到非常兴奋。所以,是的,我们计划会持续运营一段时间。

Yeah. So we have a permanent team on Claude Code. We're growing the team. We're really excited to support Claude Code in the long run. And so, yeah, we plan to be around for a while.

Speaker 4

关于订阅制本身,我们讨论过这个问题。这在很大程度上取决于大多数用户是否更喜欢它而不是按量付费。到目前为止,按量付费让人们更容易开始体验产品,因为没有前期承诺。在一个人们更多使用脚本自动化Claude Code的自主世界里,这也更合理。但我们也听到了关于价格可预测性的担忧,如果这要成为我的主要工具。

In terms of the subscription itself, it's something that we've talked about. It depends a lot on whether or not most users would prefer that over pay-as-you-go. So far, pay-as-you-go has made it really easy for people to start experiencing the product because there's no upfront commitment. And it also makes a lot more sense in a more autonomous world in which people are scripting Claude Code a lot more. But we also hear the concern around, hey, I want more price predictability if this is gonna be my go-to tool.

Speaker 4

所以我们仍然处于摸索阶段。对于企业来说,鉴于Claude Code很像是个体贡献者(IC)的生产力倍增器,并且大多数IC可以直接采用它,我们一直在帮助企业解决关于安全性和生产力监控的问题。是的,我们发现很多人看到公告后想了解更多,所以我们一直在进行这些交流。

So we're very much still in the stages of figuring that out. I think for enterprises, given that Claude Code is very much a productivity multiplier for ICs and most ICs can adopt it directly, we've been just supporting enterprises as they have questions around security and productivity monitoring. And so, yeah, we've found that a lot of folks see the announcement and they wanna learn more, and so we've been just engaging in those conversations.

Speaker 2

你们有可靠的生产力提升数据吗?比如,和你谈过的认可这个话题的人,我们是说30%吗?有些数字会更有助于证明其价值。

Do you have a credible number for the productivity improvement? Like, for people who nodded in topic that you've talked to, like, you know, are we talking, you know, 30%? Some number would help justify things.

Speaker 3

我们正在努力获取这个数据。是的,我们应该...这是我们正在积极进行的工作。但就我个人经验而言,它大概让我的生产力翻了一番。

We're working on getting this. Yeah, we should. It's something we're actively working on. But anecdotally, for me, it's probably two x my productivity.

Speaker 2

天哪。

Oh my god.

Speaker 3

所以我就觉得,我是个工程师,每天从早到晚都在写代码。是的。对我来说,效率大概是两倍吧。是的。我觉得Anthropic有些工程师可能提升了10倍的生产力。

So I'm just like, I'm an engineer that codes all day every day. Yeah. For me, it's probably two x. Yeah. I think there's some engineers at Anthropic where it's probably 10 x their productivity.

Speaker 3

然后还有些人还没真正搞懂怎么使用它,你知道,他们可能就用它来生成一些提交信息之类的。那大概只有10%的提升。所以我认为可能范围很大,我们需要更多研究。

And there's some people that haven't really figured out how to use it yet, and, you know, they just use it to generate like commit messages or something. That's maybe like 10%. So I think there's probably a big range, and I think we need to study it more.

Speaker 4

举个例子,有时候我们一起开会,销售或合规部门的人会说,嘿,我们真的需要某个功能。然后Boris会问几个问题来理解需求规格。大约十分钟后,他就说,好了,功能已经做好了,我稍后会合并。

For reference, sometimes we're in meetings together and sales or compliance or someone is like, hey, like, we really need like x feature. And then Boris will ask a few questions to like understand the specs. And then like ten minutes later, he's like, alright. Well, it's built. I'm gonna merge it later.

Speaker 4

还有其他事吗?所以这感觉和我之前担任的任何其他产品经理角色都截然不同。

Anything else? So it's how it feels definitely far different than any other PM role I've had.

Speaker 1

你是否预见到会开放这样一个渠道:非技术人员直接与Claude Code对话,实例再来找你?他们自己找到它、与之交谈并解释需求,然后由你来做代码审查和实现方面的工作。

Do you see yourself opening that channel of the nontechnical people talking to Claude Code, with the instance then coming to you? Like, they're ready to find it and talk to it and explain what they want, and then you're doing the code review side and implementation.

Speaker 3

是的。我们实际上已经做了不少这样的事。比如我们团队的设计师Megan,她不是程序员,但她正在提交并合并拉取请求。她用Claude Code来实现。

Yeah. We've actually done a fair bit of that. Like, Megan, the designer on our team, she is not a coder, but she's landing pull requests. She uses Claude Code to do it.

Speaker 2

她设计用户界面?

She designs the UI?

Speaker 4

是的。而且她正在向我们的控制台产品提交PR。所以这不仅仅是在Claude Code上构建,而是在我们单代码库的整个产品套件中进行构建。

Yeah. And she's landing PRs to our console product. So it's not even just building on Claude Code, it's building across our product suite in our mono repo.

Speaker 2

对。是的。

Right. Yeah.

Speaker 3

是的。同样地,你知道,我们的数据科学家也用Claude Code,对吧?就像,你知道,比如BigQuery查询。前几天有个财务人员来找我,说,嘿,我一直在用Claude Code。我当时就想,什么?你是怎么安装上的?

Yeah. And similarly, you know, our data scientists use Claude Code, right? Like, you know, for like BigQuery queries. And there was some finance person that came up to me the other day and was like, hey, I've been using Claude Code. I'm like, what? Like, how did you even get it installed?

Speaker 3

你想用Git吗?他们说,是啊。是啊。自己搞定了。没错,他们正在用。

You wanna use Git? And they're like, yeah. Yeah. Figured it out. And yeah, they're using it.

Speaker 3

他们说,因为Claude Code是个Unix工具,所以可以用管道输入。他们做的是把数据放进CSV,然后cat那个CSV文件,通过管道传给Claude Code,再向它提问关于这个CSV的问题,他们一直这么用。

They're like, so Claude Code, you can pipe into it because it's a Unix utility. And so what they do is they take their data, put it in a CSV, and then they cat the CSV, pipe it into Claude Code, and then they ask it questions about the CSV. And they've been using it for that.
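The CSV workflow described here can be sketched in a couple of shell lines. The `claude` invocation is shown as a comment since it needs the CLI installed and an API key; `-p` (print mode) is Claude Code's non-interactive flag, and the file name and data are made up:

```shell
# Build a small CSV of hypothetical data.
cat > revenue.csv <<'EOF'
month,revenue
Jan,1200
Feb,1800
EOF

# Because Claude Code behaves like a Unix utility, stdin composes with it
# the same way it does with any other tool:
#   cat revenue.csv | claude -p "Which month had higher revenue, and by how much?"

# The same pipe works with ordinary Unix tools, e.g. peel off the header:
cat revenue.csv | head -n 1
```

The point is composability: anything that can produce text on stdout, a `cat`, a database export, another script, can feed Claude Code without any integration work.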

Speaker 1

是啊。那对我来说会非常有用。因为我很多时候做的就是这样,有人给我一个功能请求,我差不多重新写一下提示词,放进代理模式,然后审查代码。能有PR等着我就太好了。

Yeah. That would be really useful to me. Because really what I do a lot of the times, like, somebody gets me a feature request, I kinda, like, rewrite the prompt. I put it in agent mode, and then I review the code. It would be great to have the PR wait for me.

Speaker 1

我在第一步基本没什么用。就像,你知道,接收功能请求并提示代理去写代码,我其实没做什么。我的工作真正开始是在第一次运行完成后。

I'm kinda useless in the first step. Like, you know, taking the feature request and prompting the agent to write it, I'm not really doing anything. Like, my work really starts after the first run is done.

Speaker 2

我正想说,我能看到两方面。所以,好吧。也许我可以简化成这样:在非技术人员参与的工作流程中,技术人员应该在开始时介入,还是在结束时介入?对吧?

And I was gonna say, like, I can see it both ways. So, like, okay. So maybe I'll simplify this to: in the workflow of non-technical people in the loop, should the technical person come in at the start or come in at the end? Right?

Speaker 2

显然,那是最高杠杆的事情。因为,有时候你就是需要技术人员问出非技术人员不知道要问的正确问题,这真的会影响实现。

Obviously, that's the highest leverage thing. Because, like, sometimes you just need the technical person to ask the right question that the non technical person wouldn't know to ask. And that really affects the implementation.

Speaker 1

但这不是模型更好的教训吗?模型也会擅长追问后续问题?就像,你知道,如果你告诉模型,嘿。

But isn't that the better lesson of the model that the model will also be good at asking the follow-up question? Like, you know, if you're, like, telling the model, hey.

Speaker 2

那正是你最不信任模型去做的事情。对吧?是啊。抱歉,你继续。

That's what you trust the model to do the least. Right? Yeah. Sorry. Go ahead.

Speaker 2

是啊。

Yeah.

Speaker 1

是的。不。如果你,比如说,告诉模型,嘿,你就是那个需要翻译这个非技术人员请求的人

Yeah. No. If you, like, tell the model, hey, you are the person that needs to translate this nontechnical person's request

Speaker 2

是的。

Yeah.

Speaker 1

是的。转换成最适合Claude Code的提示

Yeah. Into the best prompt for Claude Code

Speaker 2

是的。

Yeah.

Speaker 1

来进行首次实现。没错。就像,我不知道模型今天会有多好。我没有评估标准,但这对我来说似乎是个有希望的方向。就像,对我来说,审查10个PR比处理10个请求,然后运行代理10次,再等待所有这些运行完成并审查要容易得多。

To do a first implementation. Yep. Like, I don't know how good the model will be today. I don't have an eval for that, but that seems like a promising direction for me. Like, it's easier for me to review 10 PRs than it is for me to take 10 requests, then run the agent 10 times, and then wait for all of those runs to be done and review.

Speaker 3

我认为现实情况介于两者之间。我们花了很多时间跟随观察用户,观看不同资历和技术深度的人使用Claude Code。我们发现的一件事是,那些非常擅长在任何上下文中提示模型的人,也许他们甚至不是技术人员,但就是非常擅长提示,他们在使用Claude Code方面非常高效。而如果你不擅长提示,Claude Code就更容易偏离轨道,做出错误的事情。所以我认为在模型发展的现阶段,花时间学好如何提示模型绝对是值得的。

I think the reality is somewhere in between. We spend a lot of time shadowing users and watching people at different levels of seniority and technical depth use Claude Code. And one thing we find is that people that are really good at prompting models from whatever context, maybe they're not even technical, but they're just really good at prompting, they're really effective at using Claude Code. And if you're not very good at prompting, then Claude Code tends to go off the rails more and do the wrong thing. So I think at this stage of where models are today, it's definitely worth taking the time to learn how to prompt models well.

Speaker 3

但我也同意,也许在一两个月或三个月后,你就不再需要这个了,因为,你知道,苦涩的教训总是会赢。

But I also agree that, you know, maybe in a month or two months or three months, you won't you won't need this anymore because, you know, the bitter lesson always wins.

Speaker 1

请。请做吧。请在交通中做吧。

Please. Please do it. Please do it in traffic.

Speaker 2

我认为人们对分叉或自定义Claude Code有广泛的兴趣。所以我们必须问,为什么它不是开源的?

I think there's a broad interest in people forking or customizing Claude Code. So we have to ask, why is it not open source?

Speaker 3

我们正在调查。啊。好的。

We are investigating. Ah. Okay.

Speaker 2

所以还不是时候。

So it's not yet.

Speaker 3

这其中涉及很多权衡。一方面,我们的团队规模很小,如果开源的话,我们非常期待开源贡献。但维护所有内容并跟进需要大量工作。我维护了很多开源项目,团队里很多其他人也是。这真的非常耗时耗力。

There's a lot of trade-offs that go into it. On one side, our team is really small, and we'd be really excited for open source contributions if it was open source. But it's a lot of work to maintain everything and keep up with it. I maintain a lot of open source stuff and a lot of other people on the team do too. And it's just a lot of work.

Speaker 3

就像,管理贡献和所有这些事务本身就是一份全职工作。

Like, it's a full time job managing contributions and all this stuff.

Speaker 2

是的。我只想指出,你可以采用源码可用(source available)的方式,这样无需经历完全开源的法律障碍,就能解决很多个人使用场景。

Yeah. I'll just point out that you can do source available and that's solves a lot of individual use cases without going through the legal hurdles of a full open source.

Speaker 3

没错,正是如此。我的意思是,源码里没什么真正秘密的东西,而且显然都是JavaScript,你直接反编译就行了。

Yeah, exactly. I mean, I would say like there's nothing that secret in the source and obviously it's all JavaScript, you can just decompile it.

Speaker 2

编译产物就在那儿,是的。这非常...

The compiled output is out there, yeah. It's very

Speaker 3

总的来说,我们的方法是,你知道,所有秘方都在模型里,而这只是模型之上最薄的一层包装。我们真的没法做得更精简了。这是最精简的东西。是的。所以里面真的没多少东西。

And generally, our approach is, you know, all the secret sauce is in the model, and this is the thinnest possible wrapper over the model. We literally could not build anything more minimal. This is the most minimal thing. Yeah. So there's just not that much in it.

Speaker 2

如果有一个并非最简单的、你感兴趣的其他架构,你会选择什么作为替代方案?你知道,我们这里讨论的是智能体架构。对吧?比如,这里有一个循环,它遍历并通过一种相对直观的方式调用模型和工具。如果你要从头重写,并选择那条世代相传更艰难的道路,那会是什么样子?

If there was another architecture that you would be interested in that is not the simplest, what would you have picked as an alternative? You know, and we're just talking about agentic architectures here. Right? Like, there's a loop here, and it goes through, and you sort of pull in the models and tools in a relatively intuitive way. If you were to rewrite it from scratch and choose the generationally harder path, what would that look like?

Speaker 4

嗯,Boris已经重写过这个了。Boris和团队已经重写了大概五次了。

Well, Boris has rewritten this. Boris and the team have rewritten this like five times.

Speaker 2

哦,这倒是个故事。

Oh, that's a story.

Speaker 3

是的。比如,用Claude来写Claude?

Yeah. Like, Claude coding Claude?

Speaker 4

我认为按设计来说,这是最简单的事情了。

Much the simplest thing, I think, by design.

Speaker 2

好的。所以它变得更简单了。它变得更简单了。实际上它变得更复杂了。

Okay. So it just got simpler. It got simpler. It really got more complex.

Speaker 3

我们是从头开始重写的,是的,大概每三四周一次。这就像忒修斯之船,对吧?每个部件都在不断被替换,就因为Claude非常擅长编写自己的代码。

We've rewritten it from scratch, yeah, probably every three or four weeks or something. And it's like a ship of Theseus, right? Every piece keeps getting swapped out, just because Claude is so good at writing its own code.

Speaker 2

是的。我的意思是,到头来,真正不能随便破坏的是接口。对吧。Claude Code、MCP等等这些。所有这些基本上都需要保持不变,除非你有充分的理由去改变它。

Yeah. I mean, at the end of the day, the thing where breaking changes matter is the interface. Right. Claude Code, MCP, blah blah blah. All that has to kinda stay the same unless you really have a strong reason to change it.

Speaker 4

是的。我认为大部分改动都是为了简化,比如在不同组件间共享接口,因为我们最终只想确保给模型的上下文是最纯粹的形式,并且框架不会干扰用户的意图。所以很多工作其实就是移除那些可能碍事或让模型困惑的东西。

Yeah. I think most of the changes are to make things more simple, like to share interfaces across different components because ultimately, we just wanna make sure that the context that's given to the model is in the purest form and that the harness doesn't intervene with the user's intent. And so very much a lot of that is just removing things that could get in the way or that could confuse the model.

Speaker 3

是的。在用户体验方面,有些事相当棘手,这也是为什么我们让设计师来负责终端应用——设计终端界面其实非常困难。这方面的文献资料不多。我做产品有一段时间了,所以我知道如何为应用、网页以及面向工程师的开发体验工具进行构建。但终端算是比较新的领域。

Yeah. On the UX side, something that's been pretty tricky and the reason that we have a designer working on a terminal app is it's actually really hard to design for a terminal. There's not a lot of literature on this. I've been doing product for a while, so I kind of know how to build for apps and for web and for engineers in terms of tools that have DevEx. But terminal is sort of new.

Speaker 3

有很多非常古老的终端用户界面使用curses之类的库来构建复杂的UI系统,但按今天的UI标准来看,这些都感觉非常过时。因此,我们花了很多功夫来探索如何让应用在终端中显得新颖、现代且直观。是的,我们不得不自己摸索出很多设计语言。

There's a lot of these really old terminal UIs that use curses and things like this for very sophisticated UI systems, but these are all they all feel really antiquated by the UI standards of today. And so it's taken a lot of work to figure out how exactly do you make the app feel fresh and modern and intuitive in a terminal. Yep. And we've had to come up with a lot of that design language ourselves.

Speaker 2

没错。我相信这会随着时间不断发展。好的。最后一个问题。这是不是更通用化了?

Yep. I mean, I'm sure you'll be developing over time. Cool. Closing question. Is it just more general?

Speaker 2

比如,我觉得很多人都在想,可以说Anthropic拥有AI工程领域最好的品牌形象,特别是在开发者和编码模型方面。现在加上编码工具,它就拥有了模型、工具和协议的全套产品组合。对吧?而一年前的今天,这一点并不明显。就像Claude 3发布时,它更像是一个通用模型之类的。

Like, I think a lot of people are wondering, and Anthropic has, I think it's easy to say, the best brand for AI engineering, like, you know, developers and coding models. And now with, like, the coding tool attached to it, it just has the whole product suite of model and tool and protocol. Right? And I don't think this was obvious one year ago today. Like, when Claude 3 launched, it was just more like, these are general purpose models and all that.

Speaker 2

但Claude Sonnet确实成为了编程工具的首选模型,我认为这建立了Anthropic的品牌,而你们现在正在扩展。那么为什么Anthropic在开发者中如此成功?似乎每次我和Anthropic的人交谈,他们都说,哦,是的,我们只是有了这个想法,推动它,然后它就成功了。我就在想,这里是没有集中的策略,还是说,你知道,其实有一个总体策略?

But Claude Sonnet really took the scene as, like, the coding model of choice, and I think that built Anthropic's brand, and you guys are now extending it. So why is Anthropic doing so well with developers? Like, it seems like there's just no centralized strategy. Every time I talk to Anthropic people, they're like, oh yeah, we just had this idea and we pushed it and it did well. And I'm just like, is there no centralized strategy here, or, you know, is there an overarching strategy?

Speaker 3

听起来像是个产品经理的问题。

Sounds like a PM question to me.

Speaker 4

我不知道。

I don't know.

Speaker 2

我会说,Dario并不像在紧盯着你们的脖子说,去构建最好的开发工具。他只是,你知道,让你们做自己的事情。

I would say like Dario is not like breathing down your necks going like build the best dev tools. Like, he's just, you know, letting you do your thing.

Speaker 3

是的。每个人都只想构建很棒的东西。

Yeah. Everyone just wants to build awesome stuff.

Speaker 4

就像我感觉模型本身就很想写代码。是的,我认为很多这些都源于模型本身在代码生成方面非常出色。我们很大程度上是建立在一个不可思议的模型基础上的。那是Claude Code能够实现的唯一原因。

It's like, I feel like the model just wants to write code. Yeah. I think a lot of this trickles down from the model itself being very good at code generation. Like, we're very much building off the back of an incredible model. Like, that's the only reason why Claude Code is possible.

Speaker 4

我认为为什么模型本身擅长代码有很多原因,但我觉得,一个高层次的解释是,世界上这么多东西都是通过软件运行的。而且对优秀软件工程师的需求巨大。这也是一个几乎只需要一台笔记本电脑或开发机或一些硬件就能完成的事情。所以这只是一个非常适合LLMs的环境。在这个领域,我们觉得通过做得很好,可以释放很多经济价值。

I think there's a lot of answers to why the model itself is good at code, but I think, like, one high level thing would be so much of the world is run via software. And there's, like, immense demand for great software engineers. And it's also something that, like, you can do almost entirely with just a laptop or, like, just a dev box or, like, some hardware. And so it it just, like, is an environment that's very suitable for LLMs. It's an area where we feel like you can unlock a lot of economic value by being very good at it.

Speaker 4

那里有非常直接的投资回报率。我们也非常关心其他领域,但我认为这只是模型往往表现相当出色的一个领域,团队也非常兴奋能在其上构建产品。

There's, like, a very direct ROI there. We do care a lot about other areas too, but I think this is just one in which the models tend to be quite good and the team's really excited to build products on top of it.

Speaker 1

而且你提到你们在扩大团队?你们想招聘吗?

And you're growing the team you mentioned? Do you wanna hire?

Speaker 3

是的。我们是在招聘。

Yeah. We are.

Speaker 1

什么样的人适合加入你们的团队?

Who's like a good fit for your team?

Speaker 3

我们没有特定的标准。如果你对编程和这个领域充满热情,如果你有兴趣了解模型如何运作、终端如何工作以及所有相关技术,那就联系我们吧。我们随时乐意交流。

We don't have a particular profile. So if you feel really passionate about coding and about the space, if you're interested in learning how models work and how terminals work and how all these technologies that are involved, yeah, hit us up. Always happy to chat.

Speaker 1

太棒了。感谢你的到来,这次交流很愉快。

Awesome. Well, thank you for coming on. This was fun.

Speaker 4

谢谢。

Thank you.

Speaker 3

谢谢邀请我们。

Thanks for having us.

Speaker 4

这很有趣。

This was fun.
