
LangChain与Erick Friis探讨的代理式AI工程

LangChain and Agentic AI Engineering with Erick Friis

本集简介

LangChain 是一个流行的开源框架,用于构建将大型语言模型(LLM)与外部数据源(如 API、数据库或自定义知识库)集成的应用程序。它常用于聊天机器人、问答系统和工作流自动化。其灵活性和可扩展性使其成为创建复杂 AI 驱动软件的事实标准。

Erick Friis 是 LangChain 的创始工程师,负责领导集成与开源工作。Erick 做客本期播客,探讨了 LangChain 的创作灵感、代理流与链式流的对比、代理式 AI 设计的新兴模式等话题。

本期主持人 Sean 曾担任学者、初创公司创始人和谷歌员工,发表过涵盖从 AI 到量子计算等广泛主题的著作。目前,Sean 是 Confluent 的 AI 驻场企业家,专注于 AI 战略和思想领导力。您可以通过 LinkedIn 联系 Sean。

赞助咨询:sponsor@softwareengineeringdaily.com

《LangChain与代理式AI工程——对话Erick Friis》一文首发于 Software Engineering Daily。

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

LangChain是一个流行的开源框架,用于构建将大语言模型与外部数据源(如API、数据库或自定义知识库)集成的应用程序。

Langchain is a popular open source framework to build applications that integrate LLMs with external data sources, like APIs, databases, or custom knowledge bases.

Speaker 0

它常用于聊天机器人、问答系统和工作流自动化。

It's commonly used for chatbots, question answering systems, and workflow automation.

Speaker 0

其灵活性和可扩展性使其成为创建复杂人工智能驱动软件的事实标准。

Its flexibility and extensibility have made it something of a standard for creating sophisticated AI driven software.

Speaker 0

Erick Friis 是 LangChain 的创始工程师,负责领导集成和开源工作。

Erick Friis is a founding engineer at LangChain, and he leads their integrations and open source efforts.

Speaker 0

Erick参加本期播客,将讨论LangChain的创作灵感、Agent流程与Chain流程的对比、Agent式AI设计的新兴模式等内容。

Erick joins the podcast to talk about what inspired the creation of LangChain, agentic flows versus chain flows, emerging patterns of agentic AI design, and much more.

Speaker 0

本期节目由Sean Falconer主持。

This episode is hosted by Sean Falconer.

Speaker 0

查看节目说明获取关于Sean工作及联系方式的更多信息。

Check the show notes for more information on Sean's work and where to find him.

Speaker 1

Erick,欢迎来到节目。

Erick, welcome to the show.

Speaker 2

嘿,Sean。

Hey, Sean.

Speaker 1

很高兴你能来。

Glad to have you here.

Speaker 1

我经常使用LangChain,所以很期待深入讨论这个话题。

I've used LangChain a bunch, so I'm excited to get into it and talk about it.

Speaker 2

是啊,我也是。

Yeah, me too.

Speaker 1

作为创始团队成员,能否回顾一下最初是什么激励你们创建LangChain?

So you were there from the beginning, so I just want to kind of go back to that point from what you can recall: what, for you and the other people who were part of that initial founding team, inspired you to create LangChain?

Speaker 1

你们是如何识别出需要这样一个串联大语言模型的框架抽象层的?

Like, how did you sort of identify this need for a framework that chains together LLMs and creates this abstraction layer?

Speaker 2

是的。

Yeah.

Speaker 2

毫无疑问。

Definitely.

Speaker 2

这里有个重要的背景是,开源项目的启动(哈里森在2022年10月发起)与围绕它成立公司的时间(大约在2023年)之间存在某种区分。

And some important context there is that there's kind of a distinction between the starting of the open source project, which Harrison did in October 2022, and when the company formed around it, which was in 2023.

Speaker 2

所以这个开源项目出现得正是时候,恰好在ChatGPT真正发布之前,那时大家刚开始用这些大语言模型进行开发。

So the open source project kinda came out right place, right time, right before ChatGPT really launched, when everyone kinda started building with these LLMs.

Speaker 2

这完全是哈里森的创意结晶。

And that was really Harrison's brainchild.

Speaker 2

他当时主要在用GPT-3工作——不知道你是否接触过那些文本补全模型(而非聊天补全模型),但那些模型用起来相当具有挑战性。

He was kind of working with GPT-3, which I don't know if you interacted with some of the text completion models instead of the chat completion ones, but they were relatively challenging to work with.

Speaker 2

基本上是你输入一串文字,然后它就会继续输出后续文字。

It was kind of string in, and then it just kind of continued the string output.

Speaker 2

因此作为用户需要进行更多手动操作,早期人们使用LangChain的主要原因之一就是输出解析的概念,再到如何利用当时的最新研究将其转化为某种代理循环。

And so there were a lot more manual steps you had to do as a user of those. The concept of output parsing was one of the big reasons that people used LangChain in the early days, all the way through, like, how do we turn this into some sort of agentic loop using some of the current research of the time.

Speaker 2

作为公司我们延续了这一方向,持续跟进最新模型以及人们想要使用的集成方案。

And so we've kind of continued that effort as the company where we're kinda keeping up with the latest models and kind of integrations that people wanna work with.

Speaker 2

这主要是我负责的领域,同时也在为原型设计和生产应用打造可用的最新研究成果实施方案。

That's primarily the area that I work in, as well as making kind of usable implementations, both for prototypes as well as production applications, of kind of the latest and greatest research.

Speaker 2

比如:如何用这些新模型构建软件?

Like, how do you build software with these new models?

Speaker 1

回顾早期GPT-3时代,他们的API确实不太好用,但总体来说,我认为这些基础模型公司在直接使用API方面已经进步很多了。

So back in the early days of, you know, GPT-3, their APIs were not necessarily the easiest thing to use, but as a whole, I think the foundation model companies have gotten a lot better at making their APIs usable directly.

Speaker 1

那么这种情况如何改变 LangChain 的聚焦点?在基础API之外,他们还能带来哪些附加价值?

Like, how does that kind of change LangChain's focus in terms of the value add that they're bringing above and beyond just having essentially a cleaner API into some of this stuff?

Speaker 2

完全正确。

Totally.

Speaker 2

是的。

Yeah.

Speaker 2

让我们大致比较一下2022年10月与现在的情况——当时的技术格局是大家主要使用GPT-3。

So let's kinda compare October 2022 to now, where the landscape in October 2022 was you kinda used GPT-3.

Speaker 2

或许你也用过早期的 Claude 文本补全模型。

Maybe you used one of the early Claude text completion models as well.

Speaker 2

我认为最初的LangChain库中有三个集成模块,就是那些大语言模型。

And so there were, I think, three integrations in that original LangChain library, which were those LLMs.

Speaker 2

那时用户需要手动处理所有消息格式化工作,手动将输出解析成消息。而这些聊天模型的简化之处在于:本质上仍是文本补全模型,但它们经过了特定对话格式的训练——比如人类消息、AI消息交替出现的模式。

And at that point, you had to, as a user, handle all of your kind of message formatting, all of your output parsing into messages manually, where the kind of simplification of what these chat models are doing is it's still just a text completion model, but they're trained on very specific formats of alternating, like, human message, AI message, human message, AI message.

Speaker 2

这使得API提供商能够以比我们单纯观察模型输出更严格的方式,确保下一条消息必然是AI生成的。

And so that allows the API providers to actually, like, guarantee that the next message is gonna be an AI message in a more strict way than we can do as kind of just observers of the output of the model.

Speaker 2

由于它们承担了更多这类工作,我们的关注点自然就更多转向了技术栈的上层,这么说应该能理解吧?

And so with them handling more of that, our focus becomes a lot kind of further up the stack, if that makes sense.

Speaker 2

我们当前的主要焦点其实是放在 LangGraph 上。

So our main focus right now is really on LangGraph.

Speaker 2

如何将这些智能体作为状态机来编排?大语言模型虽然非常强大,但仅靠简单的 ReAct 循环(即让LLM访问所有指定工具并执行后反馈结果)还不足以构建功能性软件。

How do you orchestrate these kinds of agents as state machines, where the LLMs are clearly very powerful, but they're not quite powerful enough yet to build functional software with just that simple ReAct loop, where a ReAct loop is just giving the LLM access to all the tools that you wanna give it access to, letting it go and execute those tools, and then piping that output back into the model.

Speaker 2

这就是最基本的 ReAct 循环。

And so that's the simplest ReAct loop.
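The ReAct loop described here can be sketched in plain Python. This is an illustrative stub, not LangChain's implementation: `react_loop`, `stub_model`, and the `add` tool are hypothetical stand-ins for a real LLM and real tools.

```python
from typing import Callable, Dict, List

def react_loop(model: Callable, tools: Dict[str, Callable], prompt: str, max_steps: int = 5) -> str:
    """Minimal ReAct-style loop: the model either calls a tool or answers."""
    messages: List[dict] = [{"role": "human", "content": prompt}]
    for _ in range(max_steps):
        action = model(messages)                 # the model decides the next step
        if action["type"] == "final":            # model produced a final answer
            return action["content"]
        tool_output = tools[action["tool"]](action["args"])  # execute the chosen tool
        messages.append({"role": "tool", "content": tool_output})  # pipe result back
    return "stopped: step limit reached"

# Stub model: call the calculator once, then answer with its result.
def stub_model(messages):
    last = messages[-1]
    if last["role"] == "tool":
        return {"type": "final", "content": f"The answer is {last['content']}"}
    return {"type": "tool_call", "tool": "add", "args": (2, 3)}

tools = {"add": lambda args: str(args[0] + args[1])}
print(react_loop(stub_model, tools, "What is 2 + 3?"))  # The answer is 5
```

Everything the real frameworks add (tool schemas, structured tool-call parsing, streaming) layers on top of this basic loop.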

Speaker 2

效果并不理想,因为随着提供给模型的工具数量增加,它开始会调用错误工具。

Doesn't work all that well because as you increase the number of tools that you provide to the model, it starts calling the wrong tool.

Speaker 2

有时调用时参数也不相关,诸如此类的问题。

Sometimes it doesn't call it with, like, relevant parameters to it and those kinds of things.

Speaker 2

所以我们重点攻关的方向是...

And so what we've kind of been doubling down on is, okay.

Speaker 2

根据模型特性(不同模型差异很大),通常让LLM访问五个左右的工具,或者设计某种流程使其在不同阶段能访问不同工具集。

It depends a lot on the model, but give your LLM access to, like, five tools, or have some sort of flow where at different steps it might have access to different ones.

Speaker 2

我们以邮件助手为例来说明。

Let's use an email assistant as an example.

Speaker 2

你可以将邮件分类为招聘类入站邮件、来自招聘人员的邮件,或是候选人主动联系的邮件。

You might classify an email either as, like, recruiting inbound or, like, from a recruiter or, like, a candidate reaching out to you.

Speaker 2

在这两种不同场景下,你可能需要授予它访问不同工具的权限。

And in those two different instances, you might wanna give it access to different tools in terms of, okay.

Speaker 2

我可能什么都不想授权,只让它回复个'嗨'之类的。

I wanna give it access to nothing and just respond like, hey.

Speaker 2

比如根据背景公司或候选人的情况,起草'感兴趣'或'不感兴趣'的回复草稿。

Like, write a draft of interested or not interested depending on the background company or candidate coming in.

Speaker 2

你可能需要将其关联到招聘系统如Greenhouse或ATS,以记录他们曾给你发过邮件。

You might wanna attach that to something like Greenhouse or your applicant tracking system in order to track that they emailed you.

Speaker 2

这样你就能在我们于 LangGraph Studio 中可视化的这种流程图里,实际细分这类请求。

And so you can actually segment that request in this kind of, like, nice graph flow that we visualize in LangGraph Studio.
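The email-triage idea above (classify first, then hand the model a different tool subset) can be sketched roughly like this; all names are hypothetical, and a real classifier would be an LLM call rather than keyword matching:

```python
# Route each email to a different tool subset depending on its classification.
TOOLS_BY_CATEGORY = {
    "recruiter": ["draft_reply"],                   # just respond interested/not interested
    "candidate": ["draft_reply", "add_to_ats"],     # also log them in the ATS
    "other": [],
}

def classify(email: str) -> str:
    """Stub classifier; in practice this step would be an LLM call."""
    text = email.lower()
    if "recruiter" in text:
        return "recruiter"
    if "applying" in text or "candidate" in text:
        return "candidate"
    return "other"

def tools_for(email: str) -> list:
    """The agent at the next step only sees the tools for its branch."""
    return TOOLS_BY_CATEGORY[classify(email)]

print(tools_for("Hi, I'm a recruiter at Acme"))        # ['draft_reply']
print(tools_for("I'm applying for the backend role"))  # ['draft_reply', 'add_to_ats']
```

The point is the shape: each branch of the graph gets a small, relevant tool list instead of one giant shared one.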

Speaker 1

这是否意味着需要某种观点作为框架,来指导人们如何整合这些智能体?

Is that really about sort of having like an opinion, I guess, as like a framework for how people need to stitch together these agents?

Speaker 1

与其犯下创建一个庞然大物式智能体的错误——让它能访问上千种工具,

So rather than someone kind of like, you know, making the mistake of creating sort of this almost like a, you know, monolith agent where it's gonna have access to a thousand tools.

Speaker 1

你的意思是:不要那样做。如果采用这种流程观点,它本质上会迫使你将系统拆分成更合理的模块化组件。

You're saying, like, don't do that; if you adopt this opinionated flow, it will essentially force you to break things up into modular components that make more sense.

Speaker 2

正是如此。

Precisely.

Speaker 2

这其实是 LangChain 演进过程中有趣的部分——我们最初提出的智能体抽象是 LangChain 的 AgentExecutor,它主要实现了那个宽泛的 ReAct 循环。

And that actually is kind of a fun part of LangChain's evolution, where the first agent abstraction we came out with was the LangChain AgentExecutor, which really just implemented that broad ReAct loop.

Speaker 2

需要说明的是,只要以正确方式设计工具访问权限——限制工具数量并提供优质的调用提示和描述,很多人使用它都取得了很大成功,这样智能体才能真正调用到这些工具。

And to be clear, lots of people are using that and being very successful with it, as long as they're engineering the tools it has access to in the right way: you kinda have to have a limited number of tools, and you have to have really good prompting and descriptions for how to call them, such that the agent actually ends up calling them.

Speaker 2

显然,当你这样做时,使用最新最强大的大模型会表现得更好。

And it obviously performs better with kind of the latest and greatest larger models when you do that.

Speaker 2

这些 LangGraph 流程确实能让你使用更小的模型,既节省成本,也适合在性能较弱的硬件上运行。

And some of these LangGraph flows really enable you to use smaller models as well, both for cost savings or maybe you wanna run on hardware that isn't as powerful.

Speaker 1

是的。

Yeah.

Speaker 1

那我们继续讨论智能体,不过在深入之前,能否先解释一下智能体模式与我们之前见过的固定流程架构(比如 RAG 等)究竟有何不同?

So let's stay on agents, but maybe before we go deeper on that topic, like, can you explain sort of, like, what is different about agentic versus sort of what we've maybe seen previously of these fixed flow architectures through things like RAG and so on?

Speaker 2

好的。

Yeah.

Speaker 2

在我看来,关键区别在于它是前馈式应用还是循环式应用。

The distinction in my mind is really whether it's a feed forward application or a cyclic application.

Speaker 2

因此我们将其区分为链式结构与智能体/图结构(如果用 LangGraph 构建的话)。

And so we kind of distinguish them as chains versus agents, or graphs if you're building them with LangGraph.

Speaker 2

而链式结构总会执行完毕。

And a chain always finishes.

Speaker 2

它只是按部就班地完成所有步骤。

Like, it always just kind of goes through the steps.

Speaker 2

以 RAG 场景为例,它会执行检索步骤,查找需要插入提示词的文档,将提示词传递给大语言模型,最终生成类似 Perplexity 那样的精美描述。

Maybe for the RAG case, it does a retrieval step, looks up some documents that it wants to paste into your prompt, passes that prompt to the LLM, and generates some nice description that you might get out of a Perplexity or something like that.

Speaker 2

在智能体版的RAG中,你不仅完成检索生成输出,甚至可能进行事实核查,比如:

In the agentic version of RAG, you really do that retrieval step, generate that output, and then you might even, like, fact check that output and say, hey.

Speaker 2

这个内容是否准确?

Is this factually accurate?

Speaker 2

或者执行其他步骤来过滤输出结果。

Or you might do some other steps that kind of filter out that output.

Speaker 2

如果不满意,你完全可以回到起点并说:

And if you don't like it, you can actually just bounce back to the beginning and say, like, hey.

Speaker 2

请根据编辑节点/子智能体提供的反馈重新生成。

Regenerate this based on this feedback that our editor node or editor sub agent told it to do.

Speaker 1

你如何从根本上避免陷入不断反思和规划的死循环,导致这个周期永远无法真正完成?

How do you essentially avoid a situation where you're running an endless loop of reflection and planning, and this cycle never actually finishes?

Speaker 2

是的。

Yeah.

Speaker 2

有几种不同的策略。

There's a few different strategies.

Speaker 2

默认情况下,LangGraph 设有递归限制。

So by default, LangGraph has a recursion limit.

Speaker 2

你可以把这想象成在Python或其他编程语言中递归调用函数次数过多时遇到的问题,最终会触及堆栈限制。

So you can really think of this as the same problem that you end up in when you, like, recursively call a function too many times in Python or in any language where it'll kind of hit that stack limit.

Speaker 2

这相当于图结构中的类似概念——它就是为了触及这个限制而设计的。我们可以通过某些方式处理这种情况,比如根据之前所有步骤生成的产物,在达到限制时优雅退出。

It's kind of the equivalent concept for a graph where it's really designed to hit that, and there's ways that you can kind of handle that case such that we can kinda gracefully exit when we hit that based on the artifacts that we've generated through all of those steps.

Speaker 2

但我们也看到很多人直接在图中通过状态跟踪来实现。

But we've also seen a lot of people implement kind of just tracking in the state of a graph.

Speaker 2

LangGraph 的一个重要背景是:它的模型就是这些相互连接的节点和边,但所有节点都基于同一个模式(schema)运作。

Some important background on LangGraph: the model is, like, all these nodes and edges that you connect to each other, but all of the nodes operate on the same schema.

Speaker 2

因此我们称之为类型化字典——即图的状态。

And so we call that typed dictionary the state of the graph.

Speaker 2

于是你可以设置一个状态字段,比如'答案验证次数'。

And so you can have a state field of, like, number of times I've fact checked the answer.

Speaker 2

初始值为零,每次验证后递增。

And it starts at zero, and you just increment it each time.

Speaker 2

这只是编写for循环的另一种方式。

It's just a different way of writing a for loop.

Speaker 2

当达到三次时,我们就说:

And when we hit three, we say, like, okay.

Speaker 2

好了,这个事实核查可以结束了。

Like, we're kinda done fact checking this.

Speaker 2

现在我们就回复用户说,嘿。

Now let's just respond to the user and say, like, hey.

Speaker 2

我不完全确定这是正确答案,但这是我们最终得出的结论。

I'm not completely sure if this is the right answer, but here's what we kind of ended up with.

Speaker 2

这样就限制了你的智能体生成答案的时间上限。

And so that caps the amount of time that your agent can spend producing an answer.
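The "increment a counter in state and exit gracefully at three" pattern can be sketched as a plain loop; `generate` and `fact_check` below are stubs standing in for LLM calls, and the names are hypothetical:

```python
def fact_check_flow(generate, fact_check, max_checks=3):
    """Regenerate until the checker passes or we hit the cap, then exit gracefully."""
    state = {"answer": None, "num_checks": 0, "feedback": None}
    while True:
        state["answer"] = generate(state["feedback"])
        state["num_checks"] += 1
        ok, feedback = fact_check(state["answer"])
        if ok:
            return state["answer"]
        if state["num_checks"] >= max_checks:
            # Graceful exit: respond with what we have, flagged as unverified.
            return f"(unverified) {state['answer']}"
        state["feedback"] = feedback  # loop back to the generator with editor feedback

# Stubs: the generator only produces a good answer after one round of feedback.
def generate(feedback):
    return "draft" if feedback is None else "revised"

def fact_check(answer):
    return (answer == "revised", "please cite a source")

print(fact_check_flow(generate, fact_check))  # revised
```

As the transcript says, this is really just a different way of writing a for loop, with the counter carried in the graph's state.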

Speaker 1

那么在行为模式方面,人们基于 LangGraph 构建时采用了哪些智能体模式?

And in terms of, like, patterns of behavior, what are some of the agentic patterns that people are using on top of LangGraph?

Speaker 2

是的。

Yeah.

Speaker 2

好问题。

Great question.

Speaker 2

首先大多数人会从 ReAct 智能体开始——虽然不该说所有人,但确实很多人会先尝试这个,因为它非常简单。

So the first one that everyone starts with, well, I shouldn't say everyone, but many people start with, is that ReAct agent that I mentioned before, because it's so simple.

Speaker 2

它只有两个节点。

It's just two nodes.

Speaker 2

一个是LLM调用节点,另一个是我们称为工具节点的组件,它只执行与工具相关的代码并生成可传回模型的输出。

One of them is the LLM-calling node, and one of them is what we call the tool node, which just executes the code associated with the tool and produces the output that can be passed back to the model.

Speaker 2

这是初学构建智能体时最快获得成就感的方式,你能真正做出根据输入自动发邮件或Slack消息的东西。

That is kind of the quickest dopamine hit when you're kinda getting started building agents where you can really build something that goes and sends an email for you or sends a Slack message to you based on some input that came in.

Speaker 2

但我们很快发现人们开始加入'人在回路'类输出,比如每次调用发邮件模式时都想要先审核。

But we very quickly see people start adding human-in-the-loop type steps where, okay, whenever I call my send-email tool, I really wanna review that first.

Speaker 2

所以在实际执行发送步骤前,我会中断(interrupt 是 LangGraph 中的概念),向用户展示已写好的草稿,他们可以选择提供反馈、生成并查看新草稿,或直接点击发送。

So before actually executing the send step, I'll interrupt (interrupt is kind of the concept in LangGraph) and show the user the draft that I've written, and they can choose to give some feedback that can then be edited and shown as a new draft, or just hit send and send it away.
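The interrupt-before-send pattern can be mimicked in plain Python. This is not LangGraph's actual `interrupt` API, just a sketch where the hypothetical `review_fn` stands in for the human:

```python
def send_email_with_review(draft_fn, review_fn, send_fn):
    """Pause before the send step: show the draft, apply feedback, or send."""
    draft = draft_fn(None)
    while True:
        decision, feedback = review_fn(draft)  # the "interrupt": hand control to the human
        if decision == "send":
            return send_fn(draft)
        draft = draft_fn(feedback)             # revise and show a new draft

# Stubs standing in for an LLM and a UI prompt.
def draft_fn(feedback):
    return "Hi, thanks for reaching out." if feedback is None else f"Hi, thanks! ({feedback})"

reviews = iter([("edit", "mention the role"), ("send", None)])
def review_fn(draft):
    return next(reviews)  # scripted human: one round of feedback, then approve

result = send_email_with_review(draft_fn, review_fn, lambda d: f"sent: {d}")
print(result)  # sent: Hi, thanks! (mention the role)
```

In a real deployment the loop would suspend at the review step and resume when the user responds, rather than blocking in process.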

Speaker 2

所以 ReAct 通常是人们的起点。

So ReAct is kinda where people get started.

Speaker 2

'人在回路'是我们反复看到的模式,现已成为 LangGraph 中的一等概念。

Human in the loop is one of the patterns that we see recurring, and it's now a first-party concept in LangGraph.

Speaker 2

人们经常构建的另一种方式是,我们现在有了全局状态存储的概念。

Other ones that people are building a lot with, we now have a concept of kind of a global state store.

Speaker 2

当你开始与智能体对话时,我们将其视为一个线程,类似于ChatGPT中的交替消息,但你可能需要某种能在多次交互中持续追踪的记忆机制。

So whenever you start a conversation with an agent, we consider that a thread kind of similar to how you would have alternating messages in ChatGPT, but you might want to have some sort of memory that's actually tracked between multiple interactions with your agent.

Speaker 2

这就是我们在检查点机制中引入全局状态存储的原因。

And so that's where we get into kind of that global state store in the Checkpointer.

Speaker 2

我们尝试过许多不同版本的内存方案,最终得出的结论是:少即是多。

We've experimented with a lot of different versions of memory, and what we've kind of come to is a realization that kinda less is more.

Speaker 2

对吧?

Right?

Speaker 2

就像能够设置和获取键值对,可能还附带按会话过滤的功能,或是关联特定线程的其他功能。

Like, just being able to set and get keys potentially with some added features around filtering by sessions or, like, other threads that are kind of associated with a certain one.

Speaker 2

这些是有用的抽象功能,但自动编辑和修剪对话历史之类的功能可能没那么实用。

Those are useful abstractions, but things like automatically editing and trimming conversation histories are maybe not as useful.
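A minimal version of the get/set store with namespace filtering described here might look like the following. This is a sketch, not LangGraph's actual store API; the class and namespace layout are hypothetical:

```python
class MemoryStore:
    """Minimal long-term memory: get/set keys, namespaced so sessions can be filtered."""
    def __init__(self):
        self._data = {}

    def put(self, namespace, key, value):
        self._data.setdefault(namespace, {})[key] = value

    def get(self, namespace, key, default=None):
        return self._data.get(namespace, {}).get(key, default)

    def search(self, namespace):
        """List everything stored under one namespace (e.g. one user)."""
        return dict(self._data.get(namespace, {}))

store = MemoryStore()
# Memories keyed by (user, category) persist across conversation threads.
store.put(("user_123", "preferences"), "tone", "formal")
store.put(("user_123", "preferences"), "signature", "Erick")
print(store.get(("user_123", "preferences"), "tone"))  # formal
print(store.search(("user_123", "preferences")))
```

The "less is more" point is that this small surface (put, get, search by namespace) covers most memory needs without any automatic history rewriting.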

Speaker 1

确实。

Yeah.

Speaker 1

我认为另一个简单模式是基础反思机制。

I think another, like, simple pattern too is basic reflection.

Speaker 1

就像用户使用ChatGPT时,可以尝试这样的操作:先让AI写封涉及某些要点的邮件,然后复制输出内容粘贴回去,要求分析改进,这样反复几次就会得到更好的版本。

I mean, you can even experiment with this as a user in ChatGPT: say, like, write me an email that touches on these things, then copy that email output, paste it back in, and say, analyze this and improve it, and you'll get a better version. Do that a couple of times.

Speaker 1

这本质上就是自动化反思智能体的理念。

And essentially that is the idea of like automated reflection agent.

Speaker 1

只不过你现在是手动操作。

You're just doing it manually.

Speaker 2

完全正确。

Totally.

Speaker 1

那么在创建这些抽象层时,考虑到技术发展如此迅速,构建这类抽象会不会很困难?

And then in terms of like creating these abstractions, like given how fast everything is moving all the time, like, is it hard to create these types of abstractions?

Speaker 1

比如,在技术快速迭代的情况下(从2023年到现在接近2025年,而AI领域的两年相当于四十年),你该如何选择合适的抽象层级?

Like, how do you choose sort of the right abstraction when you've been doing this since 2023, but now we're nearing 2025, and in the life of gen AI, two years is really like forty years or something like that.

Speaker 1

而且技术发展实在太快了。

And so like things are moving so quickly.

Speaker 1

就是说,你怎样才能选对抽象层级,避免陷入需要不断做破坏性变更的境地——随着新知识新技术不断涌现?

Like, how can you kinda choose the right abstraction and not get into a place where you end up with, like, having to make, like, a ton of, you know, breaking changes as you, like, learn new things and new things come out?

Speaker 2

是啊。

Yeah.

Speaker 2

好问题。

Great question.

Speaker 2

这确实是我们持续面临的挑战,相信作为 LangChain 用户的你,以及收听本期播客的许多用户都有同感。

And this is a constant struggle for us, as I'm sure you as a LangChain user have experienced, as well as lots of the users listening to this podcast.

Speaker 2

我们经历了多次迭代——2022 版的 LangChain 主要采用那种无所不包的"黑箱"链式结构,最简单的就是 LLMChain 类,底层做了大量魔法操作,调试极其困难。

We have gone through a lot of iterations, where the 2022 version of LangChain was really about these kind of all-encompassing opaque chains. The simplest one was, like, an LLMChain class, which actually did a lot of magic under the hood, and it was really difficult to debug.

Speaker 2

到2023年,我们重点转向 LangChain 表达式语言,将链条拆解为独立步骤,但很多步骤仍存在不透明性。

And then in 2023, we really focused on the LangChain Expression Language, where you would kinda compose these chains as distinct steps, but a lot of the steps were still a little bit opaque.

Speaker 2

比如你必须预先知道JSON输出解析器会接收字符串并输出某种字典结构这类细节。

Like, you had to know that the JSON output parser would take a string in and output some sort of a dictionary and those kinds of things.

Speaker 2

而今年我们全面转向了 LangGraph。

And then this year, we've really gone towards LangGraph.

Speaker 2

尽管我们仍兼容旧体系,但从用户视角看,每次变革都会带来割裂感。

And so each of those, even though we still support all the old things, from a user's perspective, every single time that changes, it can feel jarring.

Speaker 2

对吧?

Right?

Speaker 2

因为现在放在首页中心位置的快速入门,已经和我最初学习 LangChain 时接触的内容不一样了。

Because the kind of front-and-center quick start is now something that I didn't learn when I first learned LangChain.

Speaker 2

我认为我们在版本公告和旧模型兼容方面做得不错——毕竟仍有大量用户在使用 LangChain 表达式语言。

And I think we've done a pretty good job of kind of announcing those and then still supporting the old models, because, obviously, we have a lot of users operating on the LangChain Expression Language in particular.

Speaker 2

但我认为这个理念已经变得越来越精简,每个使用 LangChain 的人要么是 Python 开发者,要么是 JavaScript 开发者。

But I think the philosophy has really just become more and more bare bones, where everyone who comes to LangChain is either a Python developer or a JavaScript developer.

Speaker 2

就我们维护的两个包而言,社区在Go和Kotlin方面也有一些自发性的开发工作。

At least for the two packages that we maintain; there are some community-driven efforts in Go and Kotlin.

Speaker 2

还有其他一些语言支持。

There's a few other ones.

Speaker 2

但对于两个主要语言,大家都懂这两种语言。

But for the two main ones, everyone knows those two languages.

Speaker 2

所以我们能让人们写的纯Python代码越多越好,因为这是他们已理解的内容,而且基本上没有黑魔法。

And so the more just raw Python that we can let people write, the better because it's things that they already understand, and it's kinda no magic included there.

Speaker 2

所以在 LangGraph 中,所有东西本质上就是一个 Python 函数。

And so with LangGraph, everything is really just a Python function.

Speaker 2

主要抽象概念和 NetworkX 类似(如果你用过的话),比如创建图,graph.add_node,然后通过边连接这两个节点,或用条件边连接这些节点,诸如此类的操作。

And the kind of main abstraction is the same as NetworkX, if you'd used that before, where you're saying, like, create my graph, graph.add_node, graph.add_node, and then connect these two nodes to each other with an edge, or connect these two nodes with a conditional edge, which does those kinds of things.
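The add_node / add_edge / conditional-edge flavor of the API can be approximated with a toy graph class. This is a simplification, not LangGraph itself, and it also demonstrates the recursion limit mentioned earlier; all names are illustrative:

```python
class Graph:
    """Toy state-machine graph: nodes are plain functions over a shared state dict."""
    def __init__(self):
        self.nodes, self.edges, self.conditional = {}, {}, {}

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def add_edge(self, src, dst):
        self.edges[src] = dst

    def add_conditional_edge(self, src, router):
        self.conditional[src] = router  # router(state) -> next node name or "END"

    def run(self, start, state, limit=25):
        node = start
        for _ in range(limit):  # the recursion limit, as discussed above
            state = self.nodes[node](state)
            node = self.conditional[node](state) if node in self.conditional else self.edges.get(node, "END")
            if node == "END":
                return state
        raise RuntimeError("recursion limit hit")

g = Graph()
g.add_node("draft", lambda s: {**s, "text": "v" + str(s.get("tries", 0))})
g.add_node("check", lambda s: {**s, "tries": s.get("tries", 0) + 1})
g.add_edge("draft", "check")
g.add_conditional_edge("check", lambda s: "END" if s["tries"] >= 2 else "draft")
final = g.run("draft", {})
print(final)  # {'text': 'v1', 'tries': 2}
```

The loop back from "check" to "draft" is what makes this a graph rather than a chain: a chain always finishes, a graph can cycle until its router says stop.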

Speaker 2

当然还有很多附加功能,比如可以通过抛出特定错误来中断,或者使用检查点功能来存储状态或记忆。

And I'm sure there's lots of bells and whistles on the side that you can use for interrupts if you throw a particular kind of error, or these checkpointer features where you're storing state or memory.

Speaker 2

但入门时,相比从未见过的 LangChain 表达式语言代码,看 LangGraph 代码会直观得多。

But in order to get started, seeing some LangGraph code makes a lot more sense than seeing some LangChain Expression Language code if you've never seen it before.

Speaker 2

不知道你用过没有,它有很多管道操作符。

I don't know if you've used it, but it's a lot of kind of pipe operators.

Speaker 2

看起来更像Bash而不是Python。

It looks a lot more like Bash than Python.

Speaker 2

这就是我们一直秉持的理念。

And so that has been really the philosophy.

Speaker 2

这在我看来是个重大转变。

That's kind of been the big change in my mind.

Speaker 1

好的。

Okay.

Speaker 1

在创建这些抽象概念时是否也很困难?比如,你如何看待不同模型会有不同的限制?

And is it hard as well to and when you're creating these abstractions, like, how do you think about, like, how different models are gonna have different limitations on them?

Speaker 1

如果我突然从 GPT-4 切换到其他版本的 GPT 或 Claude 之类的模型,上下文窗口的大小可能会受到影响。

And depending on if I, you know, switch the model suddenly from GPT-4 to a different version of GPT or Claude or whatever, then the size of the context window could be impacted by that.

Speaker 1

也许其他类型的功能也会因此受到影响。

Maybe other types of features could be affected by that.

Speaker 2

确实如此。

Definitely.

Speaker 2

随着行业发展,实际上限制条件已经发生了很大变化。我刚加入 LangChain 时,模型间的主要区别就是上下文窗口。

And as the industry has evolved, actually, the constraints have changed a lot, I would say, where when I first joined LangChain, the main difference between models was the context window.

Speaker 2

对吧?

Right?

Speaker 2

具体数字我记不清了,但GPT-3.5 Turbo最初大概是4000个token的上下文窗口,后来可能提升到了16000。

I'm gonna forget the actual numbers, but I think, like, GPT-3.5 Turbo had a 4,000 token context window to start, and maybe it came up to 16,000 later.

Speaker 1

是的。

Yeah.

Speaker 1

但是后来

So But then

Speaker 2

很多小模型的窗口只有 1,024 个 token,你几乎塞不进想发送的消息。

a lot of the smaller models were, like, 1,024 tokens, and so you could, like, barely fit in the messages that you wanted to send to them.

Speaker 2

它们会突然报错终止,就像'好吧,你超出token限制了',有时甚至出现在输出生成到一半时。

And they terminated with these really jarring errors where it's like, okay.

Speaker 2

然后你就只能得到这些不完整的、没什么用的部分输出。

You exceeded the token window, sometimes in the middle of the output it was generating, and then you just get, like, these partial things that weren't that useful.

Speaker 2

而如今最主要的区别可能是工具调用功能。工具调用绝对是 LangChain 和 LangGraph 用户最看重的模型功能,它能很好地提供工具调用和结构化输出——你提供模式,LLM就能生成你需要的所有字段,这成为代码与LLM之间绝佳的接口。

And then nowadays, the main distinction I would call out is probably tool calling, where tool calling is easily the most important feature that LangChain and LangGraph users are using out of the models. It's really useful for providing some sort of tool calling and structured output, where you provide a schema and the LLM generates all the fields that you ask for, which is a really nice interface point between code and these LLMs.

Speaker 2

不同模型在这方面的表现差异很大。

And different models perform very differently.

Speaker 2

即便是同一款模型——我们稍后可以详细聊聊这个话题,但 Meta 开源的 Llama 系列模型,其工具调用性能在不同供应商间存在显著差异,这取决于它们如何解析这些工具调用——这恰好是我们当前正在研究的一个有趣现象,因为要在文档的供应商页面上明确指出这点确实有些困难。

And even with the same model, and we can chat a little bit about this in a second, but, like, with the open source Llama line of models from Meta, the tool calling performance is actually markedly different with different providers, depending on how they've implemented parsing of those tool calls. That's another fascinating thing that we're working with right now, where it's kinda difficult to call that out on some of the provider pages in our documentation.

Speaker 2

但回到你最初关于我们如何管理这些问题的提问,第一步其实就是完善文档。

But to answer your original question about how do we kinda manage a lot of that, the first step is really just documentation.

Speaker 2

对吧?

Right?

Speaker 2

我们会在供应商页面添加备注,比如:注意

We add notes to the provider pages where it's like, hey.

Speaker 2

这个模型可能带有工具调用的勾选标记、叉号标记,甚至在某些情况下会有警告标志,提示:注意

This model has a check mark for tool calling or an x for tool calling or maybe even a warning sign in some cases where it's like, hey.

Speaker 2

这个模型声称支持工具调用,但实际上从未调用过工具。

This model says it has tool calling, but it never actually calls tools.

Speaker 2

而下一步就是围绕这些情况构建合理的抽象层。

And the next step is really building abstractions that make sense around those.

Speaker 2

因此现在我们的库里提供了 bind_tools 和 with_structured_output 方法,直接调用它们,要么能正常工作,

And so now in the library, we have these bind_tools and with_structured_output methods that, if you just call them, they either work.

Speaker 2

对吧?

Right?

Speaker 2

要么会返回某种结构化输出,要么抛出未实现的错误提示:注意

They give you some sort of structured output, or they throw a NotImplementedError of, like, hey.

Speaker 2

该供应商不提供工具调用功能,所以你也不能绑定工具。

This provider doesn't offer tool calling, and so you can't bind tools to it.
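The either-work-or-raise behavior for `bind_tools` can be sketched with a capability flag. The class names here are hypothetical, not LangChain's real model classes:

```python
class ChatModel:
    """Base interface: providers without tool calling fail loudly, not silently."""
    supports_tool_calling = False

    def bind_tools(self, tools):
        if not self.supports_tool_calling:
            raise NotImplementedError(f"{type(self).__name__} does not support tool calling")
        self._tools = tools  # remember the tools for later invocations
        return self

class ToolCallingModel(ChatModel):
    supports_tool_calling = True

class TextOnlyModel(ChatModel):
    pass

ToolCallingModel().bind_tools([{"name": "search"}])  # works
try:
    TextOnlyModel().bind_tools([{"name": "search"}])
except NotImplementedError as e:
    print(e)  # TextOnlyModel does not support tool calling
```

An explicit error at bind time is much easier to debug than a model that claims tool support but never emits a tool call at runtime.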

Speaker 1

嗯。

Mhmm.

Speaker 1

某种程度上说——你可能年纪太小不记得了——这感觉就像互联网早期,当时HTML和JavaScript等标准尚未统一。

In some ways, and you're probably too young to remember this, but it feels like the early days of the web, when there was no standardization around things like HTML and JavaScript.

Speaker 1

那时你不得不设置控制循环或控制语句,比如:如果是这个特定版本浏览器,那么它必须这样运行,或者我只能进行这种调用。

And you'd have to have these control loops, essentially, or control statements, like: okay, if it's this specific browser of this specific version, this is the way it needs to behave, or this is the call that I can make.

Speaker 1

这可能只是早期阶段的副产品。

And it's just probably a byproduct of early days.

Speaker 1

发展速度很快,每个人都急于将产品推向市场,这必然会导致各种标准不统一的情况。

Things are moving quickly, everybody's trying to push things out to production, and there's going to be essentially a lack of standardization across all these things.

Speaker 1

当你引入开源模型时,不同的人会提供不同的服务,对某些功能(比如工具响应)的理解也可能存在差异,大家都会有自己的一套实现方式。

Then when you bring in the open source models they're gonna be served by different people, there's gonna be potentially different interpretations of how to, you know, respond to something like tools, for example, and people are gonna have their own takes on those.

Speaker 2

完全同意。

Totally.

Speaker 2

如果听众想深入了解背景知识,可以查看最初 Anthropic 集成的部分源代码——实际上 AWS Bedrock 至今仍保留着一个这样的集成版本,其中所有内容都还被当作文本补全处理。

And if any of the listeners kinda wanna dig into the lore here, if you look at some of the source code for the original Anthropic integration, or, actually, still, like, AWS Bedrock has one version of their integration that's kind of like this, where everything is still a text completion.

Speaker 2

很多消息解析逻辑其实是在 LangChain 集成中完成的,想想我们曾经生活在这样一个魔幻世界:特别是在 Bedrock 中,居然要用 if 语句来判断。

So a lot of that message parsing logic actually happens in the LangChain integration, which is kind of a crazy world that we lived in once upon a time, where there was an if statement of, like, in Bedrock in particular.

Speaker 2

对吧?

Right?

Speaker 2

就像这样:如果是Anthropic模型,就用这种方式解析。

It's like, if it's an Anthropic model, parse it in this way.

Speaker 2

如果是 Cohere 模型,就用另一种方式解析输出,因为实际输出的消息令牌是不同的——这显然是各家供应商为了快速抢占市场采取的权宜之计。

If it's a Cohere model, parse the output in this way, because the message tokens that are actually outputted are different, which is, yeah, obviously kind of a speed-to-market type thing that you see across all these different providers.

Speaker 1

关于用 LangGraph 实现智能体,你能详细说说具体流程吗?

In terms of implementing agents with LangGraph, can you kind of walk through, like, what is that process?

Speaker 1

比如说,我想开始使用 LangGraph。

Like, you know, I wanna get started with LangGraph.

Speaker 1

我想构建一个基础智能体。

I wanna build a basic agent.

Speaker 1

需要做哪些准备?

Like, what do I need to do?

Speaker 2

没错。

Yeah.

Speaker 2

我将从最基础的部分开始介绍,我们准备了多种媒体形式来帮助入门,因为我们发现有些人更喜欢视频教程。

I'll actually start from the very beginning where we actually have a lot of different media formats to get people started because we've realized that some people like following video tutorials.

Speaker 2

有些人则偏好可以复制粘贴代码的书面文档,还有些人特别喜欢从现成的模板开始自行扩展。

Some people like following kind of written documentation where you can copy paste code, and some people really like starting from, like, a complete template that they just extend themselves.

Speaker 2

针对这三种需求,我们有LangChain学院(academy.langchain.com),提供视频形式的课程。

So for those three, we have the LangChain Academy, which is academy.langchain.com, which is a video format for this.

Speaker 2

我们有 LangGraph 文档。

We have the LangGraph documentation.

Speaker 2

直接谷歌搜索就能找到快速入门指南。

Just Google that, and there's a quick start.

Speaker 2

或者使用 LangGraph Studio,它提供五个入门模板,这些模板在学院课程中也会用到。

Or we have LangGraph Studio, which has kind of five templates to get you started, which is actually used in Academy if you end up doing that.

Speaker 2

如果你想深入学习,强烈推荐去我同事制作的资源站看看,他们的内容比我此刻能描述的精彩得多。

So, definitely, if you wanna dive in further, would recommend going to one of the sources from my coworkers who have made much better content than I can describe right now.

Speaker 2

但简而言之,目前全部内容都是基于Python的。

But for the kind of brief answer right now, it's all just Python.

Speaker 2

你可以打开Jupyter Notebook或文本编辑器,创建一系列Python函数来定义流程图中的各个步骤。

So you either open up a Jupyter Notebook or you open up a text editor and you create a bunch of Python functions representing the different steps you want your graph to take.

Speaker 2

有趣的是,这种图形界面对于接触过低代码编辑器的人来说更直观,整个编排过程让我联想到LabVIEW或那些通过连接节点来设计机器人流程的老式工具。

And funnily enough, the kind of graph interface for this tends to be more intuitive for people who've interacted with, like, no code type editors, which the whole orchestration of it actually reminds me a lot of, like, LabVIEW or, like, some of those kinds of old connect a bunch of edges between different nodes for robotics type things.

Speaker 2

你定义好这些操作并连接它们,然后直接运行查看输出结果。

And you define those operations, you connect them up, and then you really just run it and see what the output is.

Speaker 2

我漏掉了一个重要步骤:定义模式。

One important step I left out is defining that schema.

Speaker 2

默认情况下,我们建议存储消息历史记录。

So by default, we recommend just storing, like, a message history.

Speaker 2

最简单的智能体甚至不需要任何工具,它只是不断累积消息——我发送消息时将其追加到消息状态,LLM回复时也同样追加。

So the most simple agent actually doesn't even have any tools where it's just accumulating like, I send a message and it appends it to the messages state, and then the LLM sends a message and it appends that to the messages state.

Speaker 2

在交互中你们来回交流,但同时你也可以存储一些信息,比如对话轮次,并在每个节点递增计数。

And you just go back and forth in an interaction, but then you could also store something of, like, number of turns of conversation and just increment that at each node.
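The loop Erick describes, appending the user message, appending the LLM reply, and incrementing a turn counter in the same shared state, can be sketched in plain Python. This is not the real LangGraph API, just the shape of the idea: node functions that read and update one state dict, with a hypothetical `fake_llm` standing in for an actual model call.

```python
def fake_llm(messages):
    # Stand-in for a real model call; echoes the last user message.
    return {"role": "assistant", "content": f"You said: {messages[-1]['content']}"}

def user_node(state, text):
    # Append the user's message and bump the shared turn counter.
    state["messages"].append({"role": "user", "content": text})
    state["turns"] += 1
    return state

def llm_node(state):
    # Append the model's reply, incrementing the same shared counter.
    state["messages"].append(fake_llm(state["messages"]))
    state["turns"] += 1
    return state

state = {"messages": [], "turns": 0}
state = llm_node(user_node(state, "hello"))
print(state["turns"])                    # 2
print(state["messages"][-1]["content"])  # You said: hello
```

The state schema here is just a dict; as Erick notes later, anything you put in it should stay serializable if you want checkpointing or hosting to work.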

Speaker 2

你可以存储LLM应该访问哪些工具,并随时间推移进行调整。

You could store which tools the LLM should have access to and then modify that over time.

Speaker 2

你确实可以在其中存储任何信息,但需要注意的是,如果存储了不可序列化的内容,你将无法使用某些检查点或托管功能,如果这说得通的话。

And you can really store any of that in there with the caveat that if you store anything that's not serializable, you won't be able to use some of the checkpoint or kinda hosted features, if that makes sense.

Speaker 1

就工作流程和编排而言,这些都是在我的环境中进行的,本质上是我在托管自己的代码?

In terms of the workflow and orchestration, that's all happening within my environment where I'm essentially hosting my code?

Speaker 2

完全正确。

Totally.

Speaker 2

是的。

Yeah.

Speaker 2

所以实际上,是的,这是个重要区别。

So actually, yeah, important distinction.

Speaker 2

LangGraph主要是一个开源项目,但我们还有LangGraph平台,算是我们的托管服务。

LangGraph is kind of mostly an open source project, but then we also have LangGraph Platform, which is kind of our hosting.

Speaker 2

如果你用过Next.js和Vercel,这种模式类似,LangGraph负责所有编排工作,它知道如何执行一切。

If you've used, like, Next.js and Vercel, it's kind of a similar model where LangGraph is all the orchestration; it knows how to execute everything.

Speaker 2

我们有一些开源版本的检查点工具,可以让你序列化状态,并利用数据库功能快速前进或回退执行过程。

We have some open source versions of checkpointers that allow you to kinda serialize that state and kinda fast forward and rewind through some of your execution using essentially database features for that.

Speaker 2

但LangGraph平台的核心是作为REST API托管所有内容,并提供可视化功能。

But then LangGraph platform is really about hosting everything as a REST API and also visualizing it.

Speaker 2

实际上我们在LangSmith中有一些功能,这是我们用于调试和可观测性的商业产品,可以让你可视化图表,通过这类中断功能手动与状态交互。

So we actually have some features in LangSmith, which is our kind of debugging and observability commercial product that lets you visualize your graph, interact with the state manually through these kinds of interrupts and things like that.

Speaker 2

总的来说,这能让长期构建这类系统变得更简单。

And it overall just kind of makes it easier to build some of these things over time.

Speaker 2

这两者都提供相当慷慨的免费套餐,但必须明确指出它们不属于开源产品的一部分。

And both of them have a generous free tier, but have to call out that they are not part of the open source offering.

Speaker 1

对于图中的节点,内存是否在节点间共享?

For the nodes in the graph, is memory shared across nodes?

Speaker 2

状态内存是在所有节点间共享的,并且各节点上的数据完全一致。

So the state memory is shared across all the nodes, and it's identical across all the nodes.

Speaker 2

这基本上就是整个抽象设计的核心理由。

And that's kind of the whole reason for the abstraction.

Speaker 2

我认为如果不是这种情况,可能直接用原生Python编写所有代码会更合理。

I think if that weren't the case, it would probably make sense to just write everything as raw Python.

Speaker 2

需要说明的是,现在仍有很多开发者选择这样做。

And to be clear, lots of developers still do that.

Speaker 2

但检查点的全局状态也会存储在所有线程中。

But then the checkpointer, kinda global state, that is stored across all threads as well.

Speaker 2

你可能发送一条消息,这条消息会触发大约六个节点的连锁反应,最终返回给用户的内容通常是消息历史中的一条记录。

So you might send a single message, which ends up kicking off a sequence of, like, six nodes just from that one message before it returns something that's meant to be shown to the user, which is typically a message on the message history.

Speaker 2

所有这些节点的执行不会影响你与它的对话——如果是我启动的这个线程,而那个全局状态对我们双方都是可访问的。

But the execution of all those nodes does not affect, like, your conversation with it if I was the one to start that thread, whereas that global state is accessible to both.

Speaker 2

所以数据存在几个不同层级的分组。

So there's kind of a few layers of grouping of data.
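A rough sketch of those layers of grouping, with hypothetical names (this is illustrative, not LangGraph's actual data model): each thread keeps its own checkpoint history, which is what enables the rewind/fast-forward behavior mentioned earlier, while a separate store is visible across all threads.

```python
# Hypothetical two-layer model: per-thread checkpoints vs. a shared store.
checkpoints = {}   # thread_id -> list of state snapshots (one conversation each)
shared_store = {}  # visible to every thread

def save_checkpoint(thread_id, state):
    # Each thread accumulates its own history of snapshots.
    checkpoints.setdefault(thread_id, []).append(dict(state))

def rewind(thread_id, steps=1):
    # Jump back through one thread's history without touching other threads.
    history = checkpoints[thread_id]
    return history[-1 - steps]

save_checkpoint("sean", {"messages": ["hi"]})
save_checkpoint("sean", {"messages": ["hi", "hello!"]})
save_checkpoint("erick", {"messages": ["hey"]})
shared_store["user_prefs"] = {"lang": "en"}

print(rewind("sean"))             # {'messages': ['hi']}
print(len(checkpoints["erick"]))  # 1
```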

Speaker 1

就扩展性而言,我认为本质上需要由托管这些服务的开发者来应对运行这类智能体流程时可能遇到的规模挑战。

In terms of, like, scaling these, I guess it's essentially on the developer who is gonna be hosting this to take on the the scale challenges that they might run into with running one of these agent flows.

Speaker 1

对吧?

Right?

Speaker 2

具体指哪方面的扩展?

Scale in which regard?

Speaker 1

比如我用LangGraph构建了某个智能工作流并部署到服务器上,当有人开始访问时,一方面会冲击我的托管基础设施,而如果使用开源版本,编排和工作流也会在我的环境中运行。

Well, in terms of if I build something in LangGraph and some agentic workflow and then I put it up on a server somewhere and someone starts hitting it, you know, essentially one, it's going to be hitting my hosting infrastructure, but the orchestration and workflow, if I'm using the open source version, that's also hosted within my environment.

Speaker 1

所以我理解主要责任在我,需要满足各种规模需求,并通过必要时进行拆分来设计可扩展架构。

So I'm assuming it's on me to essentially meet whatever the scale requirements and also architect it for scale by, you know, potentially breaking this up as needed.

Speaker 2

我会从两个方面来回答这个问题。

I'll answer this in kinda two ways.

Speaker 2

LangGraph本身是负责执行这些节点以处理单个请求的。

So LangGraph itself is the one that's kinda helping execute those nodes just for a single request.

Speaker 2

因此如果你的架构设计,比如全部使用同步API而非异步API,就会比使用Python异步方案占用线程更长时间。

And so if you have some architecture, if you, for example, like, use all the synchronous APIs instead of async APIs, that's gonna hog a thread for much longer than if you use something like Python async for that.
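The thread-hogging point can be illustrated with stdlib `asyncio`: two async waits of 0.2 s overlap instead of running serially, whereas blocking calls would tie up the thread for the full 0.4 s. The `slow_tool` coroutine is a made-up stand-in for an I/O-bound LLM or tool call.

```python
import asyncio
import time

async def slow_tool(name, delay):
    # Stand-in for an I/O-bound call (LLM API, database, etc.).
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    # Async nodes yield the thread while waiting, so the two 0.2 s calls overlap.
    results = await asyncio.gather(slow_tool("a", 0.2), slow_tool("b", 0.2))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)          # ['a', 'b']
print(elapsed < 0.35)   # overlapped, not 0.4 s of serial waiting
```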

Speaker 2

开发者需要对自己的图代码实现负责,这涉及到如何构建执行图的决策。

And there's decisions in terms of, like, how you implement your graph that developers will always have to kinda take responsibility for because that's their graph code.

Speaker 2

但当我将其打包成类似FastAPI端点后,在部署FastAPI端点前,你需要决定是自己构建这个端点并托管在EC2的Docker容器或其他服务,还是直接使用LangGraph平台。

But then once I have that packaged into, like, essentially a FastAPI endpoint or something like that, right before you have the FastAPI endpoint, you have a decision of whether you wanna build that FastAPI endpoint yourself and kinda host that just in a Docker container on EC2 or some sort of hosting service, or you can kinda go with LangGraph Platform.

Speaker 2

我们既有与LangGraph捆绑的云版本,也提供可自托管的LangGraph平台容器,其中包含免费层级。

We have both a cloud version of it, which is packaged with LangGraph, or you can kinda self-host that LangGraph Platform container, where we have a free tier.

Speaker 2

之后我们会强制执行企业许可证。

And then after that, we kind of enforce an enterprise license for that.

Speaker 2

但说到重点,构建能承载高并发查询的基础设施确实存在挑战,这正是LangGraph平台的价值所在——我们为您承担这部分工作。

But I think that, kinda getting to your point, there are challenges associated with building infrastructure that hosts these at scale, where you're getting lots and lots of queries per second, and that's really where LangGraph Platform comes into play, and we kinda take that on for you.

Speaker 1

LangGraph平台在后端基础设施方面做了哪些处理高并发的设计?

What's happening on LangGraph Platform in terms of, you know, what you can share about the back end infrastructure to handle that scale?

Speaker 2

是的。

Yeah.

Speaker 2

核心策略之一是将存储与计算分离。

So a lot of it is kinda segmenting the storage from the compute.

Speaker 2

如前所述,我们通过节点和边的概念来组织执行流程。

So as mentioned before, we have this concept of nodes and edges, so that's kind of the execution.

Speaker 2

我们还设计了检查点机制作为存储状态,允许工作节点执行直到遇到首个中断或自然结束。

And then we have this concept of a checkpointer, which is the storage state, allowing kind of one worker to execute the nodes until it hits the first interrupt or it ends for whatever reason.

Speaker 2

然后另一个计算节点可以实际接管并在稍后处理,如果有不同的请求进来。

And then another compute node can actually pick that up and work on it later if a different request comes in.

Speaker 2

因此这带来了一些负载均衡方面的挑战。

So there's some load balancing challenges associated with that.

Speaker 2

甚至像实现检查点这样的事,或者如何妥善处理基础设施故障,都存在挑战。

There are even challenges on just, like, implementing a checkpointer in a way that handles infrastructure failures well.

Speaker 2

比如数据库连接中断这类情况。

So, like, database connection goes down, those kinds of things.

Speaker 2

所有这些都算是托管服务的一部分。

All of those are kind of part of the hosted offering.

Speaker 2

目前我们将这些都存放在Postgres中,所以托管服务的检查点在那里,而本地版本则完全基于SQLite,虽然运行良好但扩展性稍逊。

Right now, we stick all of those in Postgres, so the checkpointer for hosted is there, and the local one is all based on SQLite, which works well but is kind of not as scalable.
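A minimal sketch of what a SQLite-backed checkpointer might look like, using only the stdlib; this is not LangGraph's actual schema or implementation, just the idea. Note the state must be JSON-serializable, the same caveat mentioned earlier.

```python
import json
import sqlite3

# Illustrative SQLite-backed checkpointer: snapshots keyed by thread and step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkpoints (thread_id TEXT, step INTEGER, state TEXT)")

def save(thread_id, step, state):
    # State must be JSON-serializable to round-trip through the database.
    conn.execute("INSERT INTO checkpoints VALUES (?, ?, ?)",
                 (thread_id, step, json.dumps(state)))

def load_latest(thread_id):
    # Resume a thread from its most recent snapshot.
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ? ORDER BY step DESC LIMIT 1",
        (thread_id,)).fetchone()
    return json.loads(row[0]) if row else None

save("t1", 0, {"messages": ["hi"]})
save("t1", 1, {"messages": ["hi", "hello!"]})
print(load_latest("t1"))  # {'messages': ['hi', 'hello!']}
```

Swapping the connection for Postgres is essentially what separates the local setup from the hosted one described above.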

Speaker 1

那么在工具集成方面呢?

What about in terms of, like, tool integration?

Speaker 1

比如调用第三方工具时总会存在一定的潜在脆弱性。

Like, you're gonna have a certain, like, potential fragility with calling out to some third party tool.

Speaker 1

我可以编写函数去,比如说从CRM系统拉取数据,但这个调用可能会失败。

So I can write the function going to, you know, maybe it pulls data from my CRM or something like that, but then that call could fail.

Speaker 1

所以这是否意味着开发者需要遵循分布式系统的最佳实践,比如重试机制这类操作?

So is it really, I guess, like on the developer to sort of follow the best practices around like, you know, distributed systems and retries and things like that?

Speaker 1

还是说部分功能由LangChain来分担?

Or is some of that offloaded by LangChain?

Speaker 2

好问题。

Great question.

Speaker 2

关于节点重试,我们有些功能可以简化这个操作。

So node retries, we have some features that make it easy to use those.

Speaker 2

但大多数情况下,我们看到用户会自己实现重试逻辑,比如使用Tenacity这类重试库,因为不同场景需要不同的重试策略——比如发送邮件的函数就不该无限重试,因为空响应可能意味着邮件已发送成功,只是我们无法确认。

But for the most part, we're seeing people implement their retries themselves with something like Tenacity or kinda one of these retry libraries, just because you actually want different retry behavior in different situations. Like, a send-email function, you probably don't wanna retry that indefinitely, because an empty response, to me, it could have still sent the email, and I'm just not sure.

Speaker 2

你需要某种自定义逻辑来检查邮件是否已发送。

You need some kind of custom logic to check if the email was sent or not.

Speaker 2

但从谷歌检索结果这类操作可以无限重试,因为这只会触及速率限制等问题,而不会影响任何外部状态。

But something like retrieving results from Google or something like that, that can be retried pretty indefinitely, because you might hit rate limits or things like that, but you're not affecting any external state.
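That distinction can be sketched as a small retry helper: idempotent calls (like a search) get retried freely, while a non-idempotent one (like sending email) gets a single attempt and then needs custom verification. The `flaky_search` function is hypothetical; in practice a library like Tenacity covers the retry policy itself.

```python
import time

def with_retries(fn, attempts=3, idempotent=True, delay=0.0):
    # Only idempotent calls are safe to retry blindly; a send-email style
    # call gets one shot, since it may have succeeded despite the error.
    tries = attempts if idempotent else 1
    last_err = None
    for _ in range(tries):
        try:
            return fn()
        except Exception as err:
            last_err = err
            time.sleep(delay)
    raise last_err

calls = {"n": 0}
def flaky_search():
    # Fails twice (e.g. rate limited), then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("rate limited")
    return ["result"]

result = with_retries(flaky_search, attempts=5)
print(result)      # ['result']
print(calls["n"])  # 3
```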

Speaker 2

但长期来看有这个可能,目前暂无计划。

But long term, potentially, but no plans for that currently.

Speaker 2

我们确实与几家让工具协作更便捷的服务商合作。

We do work with a few providers that do make working with tools easier.

Speaker 2

最近开始合作的Arcade AI公司专注工具认证领域,当多用户处理不同服务的权限时,他们能解决这个难题。

One of the companies we started working with recently is this company Arcade AI that does a lot of stuff around auth for tools, where if you have multiple users kind of handling all the different permissions to their different services can sometimes be a challenge, and they handle that.

Speaker 1

我很好奇你对推理优化的看法,包括成本和性能两方面。虽然能构建这些完成复杂任务的智能工作流很棒,

I'm just curious what your thoughts on in terms of like optimization around inference, both from a cost perspective and also performance perspective, because it's great that I can build like these agentic workflows that can do really complicated, amazing things.

Speaker 1

但每次依赖模型API调用进行推理时,不仅有财务成本,还存在性能开销。

But every time I'm relying on a API call to a model to perform some inference cycle, like, there's not only a financial cost associated with that, but there's also performance costs with that.

Speaker 2

确实。

Yeah.

Speaker 2

说得好。

Great point.

Speaker 2

我们有些客户的有趣案例表明,至少在成本方面,多数人遵循的哲学是赌这些成本会下降。

And we have a few fun anecdotes from a few of our customers on this, where, at least on the cost side, the philosophy is mostly that we're betting that the cost of these things comes down.

Speaker 1

对。

Like Yeah.

Speaker 1

规模经济效应。

Economies of scale.

Speaker 2

没错。

Exactly.

Speaker 2

就像OpenAI的推理成本过去一年下降了约50倍,随着模型执行方式的优化,短期内这种趋势很可能持续。

Like, OpenAI inference cost has gone down by, like, 50x or something, it's, like, more than an order of magnitude in the last year, and so that'll probably continue to happen at least in the short term as we get smarter about how we execute these models.

Speaker 2

速度方面往往也会随之提升。

And then speed also tends to come with that.

Speaker 2

比如,在模型性能方面,很多工作就是通过降低精度或采用不同的稀疏策略来缩小模型规模,这样既能兼顾两者优势,又不会真正牺牲性能。

Like, a lot of the kind of model performance side is just making the models smaller, either through kind of decreasing precision or doing different kinds of, like, sparsity strategies, and that kinda gets you the benefit of both while not really sacrificing performance.

Speaker 2

但在开源领域使用尤其能带来额外收益。

But there is still a benefit to using especially on the open source side.

Speaker 2

对吧?

Right?

Speaker 2

相比Llama 70B模型,Llama 7B模型能提供更优的速度和成本表现。

You're gonna get better speed and cost characteristics out of a Llama 7B model than a Llama 70B model.

Speaker 2

所以我们看到短期内人们主要在分类步骤上采用这种策略——用7B模型仅仅因为它更快更便宜。

And so that's the main area that we see people pulling that lever in the short term, where you might have a classification step that's run on a 7B model just because it's a lot faster, a lot cheaper.

Speaker 2

但通常当需要为用户生成内容时,你会转而使用更大规模的模型来处理重要工具调用,这些调用往往决定着应用程序的控制流。

But then, typically, when you're generating output for users and things like that, you tend to fall back on the larger models for important tool calls that, like, kind of decide the control flow of the application.

Speaker 2

特别是更复杂的任务,通常会选用更大的模型。

Especially for more complicated ones, you tend to use a larger model.
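A toy sketch of that routing decision, with made-up model names and cost units: cheap classification steps go to a small model, while user-facing generation and control-flow tool calls go to a large one.

```python
# Hypothetical model router: small/cheap model for classification steps,
# larger model for user-facing output. Names and costs are invented.
COSTS = {"small-7b": 1, "large-70b": 20}  # illustrative relative cost units

def pick_model(step):
    return "small-7b" if step["kind"] == "classify" else "large-70b"

pipeline = [
    {"kind": "classify", "input": "route this ticket"},
    {"kind": "generate", "input": "draft the customer reply"},
]

total = 0
for step in pipeline:
    model = pick_model(step)
    total += COSTS[model]
    print(step["kind"], "->", model)
print("relative cost:", total)  # 21, vs. 40 if every step used the large model
```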

Speaker 2

于是我们形成了新的铁三角关系:我认为可以把成本和延迟归为同一顶点,

So we kind of have this new iron triangle where you're talking about I would actually group, like, cost and latency together in one corner.

Speaker 2

准确性特征作为另一顶点,然后还有可靠性——这个总是令人担忧的维度。

You have kinda accuracy characteristics in another, and then you have, like, reliability, which is kind of the always concerning one.

Speaker 1

对于那些基于LangGraph甚至旧版LangChain开发的用户,他们通常会遇到哪些需要应对的典型挑战?

For those that are building on, you know, LangGraph or even the sort of older version of LangChain, like, what are some of the typical challenges that they run into that they have to navigate?

Speaker 1

他们本质上应该了解哪些重点事项?

Like, what what are some of the things that they should know about essentially?

Speaker 2

是的。

Yeah.

Speaker 2

好问题。

Great question.

Speaker 2

首先,我很想听听大家的想法。

Well, first of all, would love to hear from folks.

Speaker 2

我们现在所有的文档页面都开放了评论功能。

We now have comments on all of our docs pages.

Speaker 2

我们当然会通过拉取请求和问题来监控GitHub动态。

We monitor our GitHub, obviously, through kind of pull requests and issues.

Speaker 2

这周主要在处理积压的拉取请求。

This week has largely been working through the backlog on pull requests there.

Speaker 2

但就用户入门遇到的挑战而言,我认为主要是信息过载问题。

But in terms of challenges that users see getting started, I think the main one is information.

Speaker 2

这其实是整个行业普遍存在的问题,某种程度上也是我们在这个领域相对成功的原因——我们在文档中详细记录了这些新策略,让用户能立即实施。

And this is kind of a problem across the industry, and this is actually, I think, part of the reason that we've been relatively successful in this space is we document a lot of things in our documentation on these new strategies in a way that you can kind of immediately implement.

Speaker 2

但每天都有新东西出现,要消化这些海量信息往往很有挑战性。

But with something new coming out every day, it is often challenging to kind of drink from that fire hose of information.

Speaker 2

这也是我们内部经常纠结的问题,就像...好吧...

And so this is also something that we struggle with internally where it's like, okay.

Speaker 2

LangGraph的快速入门指南在哪里?

What is the quick start to LangGraph?

Speaker 2

对吧?

Right?

Speaker 2

目前我们大概有三四种不同的入门方案,取决于你是想部署在LangGraph平台的生产环境,还是只想做个聊天机器人玩玩。

And I think right now, we have, like, three or four different ones depending if you want to kinda target more of, like, a hosted production application on LangGraph Platform or if you're just kinda playing and want to build some sort of chatbot.

Speaker 2

所以我认为明年年初要重点解决的问题是:作为新用户,我该如何选择适合自己的入门路径?

And so I think the main one that I would like to work on kind of in the beginning of next year is is how do I decide which happy path I wanna take as a new user?

Speaker 2

这确实很有挑战性。

And it's challenging.

Speaker 2

这个问题我们每三个月就要重新梳理一次,因为技术前沿总是在变化。

It's something that we rework every three months just because the the state of the art is always different.

Speaker 1

你在LangChain上见过哪些最令人惊讶或最具创意的应用?

What are some of the most surprising or creative applications that you've seen built on top of LangChain?

Speaker 2

哦,好问题。

Oh, good question.

Speaker 2

我觉得还行。

I think okay.

Speaker 2

既惊人又有创意。

Surprising and creative.

Speaker 2

今年早些时候让我惊讶的一个案例是,Uber的一些工程师在GitHub Universe上做了个演示。

One of the ones that surprised me earlier this year is some engineers from Uber actually gave a presentation on this at GitHub Universe.

Speaker 2

他们演示了如何用基于LangGraph构建的代码助手来编写单元测试。

Gave a presentation on writing unit tests with a kind of code assistant built on top of LangGraph.

Speaker 2

代码助手现在显然超级热门。

And code assistants are obviously super popular right now.

Speaker 2

它们能带来非常显著的生产力提升,所以很多公司都在研发这类工具。

There are very distinct productivity gains from them, and so a lot of different companies are working on them.

Speaker 2

但最酷的是能看到他们如何在LangGraph中逐步布局节点来实现这个功能。

But it was really cool to see kind of the step-by-step process of, like, how you actually lay out nodes in LangGraph to do this.

Speaker 2

这个演讲视频就在YouTube上。

And and the talk is on YouTube.

Speaker 2

强烈推荐大家去听听。

Would highly recommend giving it a listen.

Speaker 2

这是他们工程团队部分成员实际使用的真实代码助手案例。

So that one was kind of a real life kinda code assistant that that was being used by some portion of their engineering org.

Speaker 2

我特别喜欢Elastic的安全助手。

I really like Elastic's security assistant.

Speaker 2

这个项目挺有意思的。

That's been kind of a fun one.

Speaker 2

实际上他们与我们合作已经很久了。

They actually they've been working with us for a long time.

Speaker 2

他们最初是在代理执行器上构建的第一个版本,就是我之前提到的那个带有扩展功能的ReAct循环,最近他们又将其迁移到了LangGraph。

So they built the first iteration of that on the agent executor, which was kind of that ReAct loop that I mentioned before with some extensions to it, and then they've recently migrated that to LangGraph as well.

Speaker 2

这主要是关于生成安全规则——我忘了具体术语叫什么——就是那种软件自动隔离机制,以及通过监控日志来编写相关规则的功能,这个确实很酷。

And that's really about generating security rules for, I forget what the term is, but the kind of automated quarantining of software, and, like, monitoring logs for writing rules for that has been a cool one.

Speaker 2

然后是所有客户支持相关的案例,不同人想在客户支持系统中构建的步骤差异之大让我很惊讶,这确实出乎我的意料。

And then all the customer support ones, just the variance in what steps different folks want to build into their customer support systems is quite surprising to me, where I don't know.

Speaker 2

我觉得这是个我自己涉猎不多的领域。

I I think it's a domain that I haven't done as much work in myself.

Speaker 2

因此,为这类流程可能需要集成的工具数量之多,以及人们实现方式的多样性,都相当令人意外。

And so just the number of different tools that you might have to integrate for those kinds of flows has been relatively surprising and and the different ways that people wanna do it.

Speaker 1

是啊。

Yeah.

Speaker 1

我是说,如果能实现单元测试自动化、安全治理的某些环节、值班文档这类虽然不总是有趣但必须完成的工作,我想会让很多人感到高兴。

I mean, I think if you can get to a place where we can automate like unit tests, certain parts of security, governance, on call documentation, like the types of things that, you know, are not always the funniest jobs but have to be done, I think that would make a lot of people happy.

Speaker 2

我同意。

I agree.

Speaker 1

那么,如果把视野放大到LangChain之外呢?

What about, you know, zooming out even beyond LangChain?

Speaker 1

当前AI领域有哪些让你感兴趣并持续关注的关键创新?

Like, what are some of the key innovations that are happening in AI right now that are interesting to you and you're sort of keeping an eye on?

Speaker 2

首先,工具调用性能无疑是最突出的亮点。

So first and foremost, tool calling performance is definitely the best one.

Speaker 2

人们在LangGraph中使用工具调用的方式,本质上是将其作为开放域推理模型——当你拥有某些上下文数据并提出问题时。

So the way people are using tool calling in LangGraph is really as an open-domain reasoning model, where you have some sort of context and data and you have a question of like, hey.

Speaker 2

比如:这个联系我咨询表单的客户是重要客户还是普通客户?

Is this an important customer or a not important customer that just reached out to my inbound form?

Speaker 2

目前我们尚未达到一个临界点,即能够完全信任模型能直接做出正确决策。

And there's kind of a threshold that we haven't hit yet where you kinda just trust the model to make the right decision out of the box.

Speaker 2

我认为这可能是火上浇油的因素之一,就人们能用它们构建什么而言。

And I think that's probably the one that kind of throws gasoline on this fire in terms of, like, what people can build with them.

Speaker 2

现阶段即便你成功运行了测试集中的前100个案例,第101个以意外方式失败的案例,对于你的发布能力而言仍是重大打击。

Where right now, even if you successfully run your first, like, 100 things in your test set through it, the hundred and first that fails in kind of an unexpected way is kind of a big hit in terms of, like, your ability to to release that.

Speaker 2

人们正在用LangGraph以巧妙方式设置防护栏,确保这类意外情况不会流出。

And people are using LangGraph in cool ways to kinda guardrail that and make sure those kinds of unexpected ones don't go out.
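One way such a guardrail node might look, as a hedged sketch (the checks and the fallback here are invented for illustration): validate the model's output before release, and route failures to a safe path instead of shipping the unexpected case to the user.

```python
def guardrail(output, banned=("refund everyone",)):
    # Validation node: block outputs that fail checks before they reach the user.
    if not output.strip():
        return False, "empty output"
    for phrase in banned:
        if phrase in output.lower():
            return False, f"banned phrase: {phrase}"
    return True, "ok"

def release(output):
    ok, reason = guardrail(output)
    # Route to a safe fallback instead of shipping the unexpected case.
    return output if ok else "[escalated to a human: " + reason + "]"

print(release("Thanks for reaching out!"))  # shipped as-is
print(release("We will refund everyone."))  # escalated
```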

Speaker 2

但推理能力确实是目前亟待提升的方面。

But the reasoning capability is definitely something that leaves it to be desired.

Speaker 2

我个人对正在兴起的多模态输入输出模型非常期待——从GPT-4视觉预览开始,到现在GPT-4o已原生支持文本和图像作为输入。

I'm personally very excited about these multimodal input and output models that we're starting to see, where we have seen kind of text and image modalities as inputs to models for a little while now, kinda starting with GPT-4 Vision Preview, and then GPT-4o is now out of the box capable of that.

Speaker 2

现在我们看到更多能输出文本、图像、音频混合内容的模型,虽然可靠性尚不足,但展现了未来训练出优质模型后的可能性图景。

But now we're seeing more models that are actually outputting kind of mixes of text, images, audio that are not necessarily the most reliable yet, but they kind of show a vision for what the future could look like once once training in those models kinda produces something that's that's very good.

Speaker 2

不过就我个人而言,对某些视频类模型的兴奋度相对较低。

Potentially a hot take, but, personally, I'm a little bit less excited about some of these, like, video models and things like that.

Speaker 2

它们能创作很酷的艺术品,在创意领域非常适合头脑风暴。

I think they make really cool art, and they're really good for brainstorming in the creative space.

Speaker 2

但对我个人来说可能实用性稍逊。

But I guess, for me personally, may maybe a little bit less useful.

Speaker 2

我们肯定会看到人们在LinkedIn上生成这类提示词,然后通过生成评论来逐步完善它们。

But we're totally gonna see people, like, generating prompts for those on LinkedIn and then generating kinda critiques of them to kinda edit over time.

Speaker 2

我确信这将成为常见用例。

I'm sure that'll be a use case that we see.

Speaker 2

回到多模态这个话题——

Back to the modalities one, though.

Speaker 2

我对音频领域真的非常期待。

I'm I'm really excited about audio.

Speaker 2

我觉得,比如OpenAI的实时功能、应用中的高级语音模式,以及实时API,这些都是我个人非常感兴趣的领域。

I think, like, OpenAI real time or advanced voice in the app, and then the real time API, is an area that I'm personally very excited in.

Speaker 2

我其实是从2015年开始接触NLP(自然语言处理)的,具体时间记不太清了。

I actually got my start in NLP back in, what was it, 2015?

Speaker 2

我的第一份实习是在一家叫Jibo的公司,他们开发了一个出自媒体实验室的小型白色机器人。

My first internship was at a company called Jibo, which is this little, like, white robot that was out of the Media Lab.

Speaker 2

当时我们正在研究开放式文本策略,但我觉得通过语音与事物互动的能力,当它运行良好时,是一种相当神奇的体验。

And we were doing some strategies for kinda open ended text there, but kind of the ability to interact with something over voice, I think, is a pretty magical experience when it works well.

Speaker 2

实际上过去几个月我一直在用高级语音模式练习普通话,为两周前的台湾之行做准备。

And I was actually using, like, advanced voice mode to practice my Mandarin for the last few months for a trip I took to Taiwan two weeks ago.

Speaker 2

能随时随地在口袋里装个老师教你任何想学的东西,这种感觉非常有趣。

And it's pretty fun to be able to kind of have a teacher in your pocket for anything you want.

Speaker 1

是啊。

Yeah.

Speaker 1

而且我认为这为实时翻译打开了大门,距离《星际迷航》多年前提出的通用翻译器之类的愿景已经不远了,这确实非常惊人。

Also, I think it opens up the door to, like, real time translation, and not being that far from the vision that Star Trek put out years ago of, like, the universal translator and things like that is pretty amazing.

Speaker 1

我在Google Assistant工作过一段时间,所以深知语音技术失灵时的风险和挫败感。

I worked on Google Assistant for a while, so I know the perils of and frustrations around voice when it doesn't work.

Speaker 1

因此看到现在这种阶梯式的进步,感觉真的很不可思议。

So it's like incredible, the step functions that are happening.

Speaker 1

回到你之前提到的多模态话题,虽然目前表现还不完美,但看看图像生成从2022年到2023年的进步,再到2024年的视频生成,技术改进的速度本质上是指数级的。

And I think, you know, going back to your point around multimodal and how it's not like perfect performance today, but if you even look at like image generation from 2022 to where it went from there to 2023 to then like videos in 2024, like the speed with which things are getting better is is like exponential essentially.

Speaker 1

所以应该用不了多久,多模态的表现就会比现在好得多。

So it probably won't take that long until multimodal is significantly better performance than where it is now.

Speaker 2

这正是我们所期待的。

That's the hope.

Speaker 1

没错。

Yeah.

Speaker 1

在将基于语言模型的应用投入生产时,人们通常遇到的最大障碍是什么?

In terms of productionizing LM based applications, like, is the biggest hurdle that people typically run into?

Speaker 2

我可能会说首先是可靠性问题

I would probably say reliability is the first one

Speaker 1

就是那些标准系统层面的东西

The standard systems stuff.

Speaker 2

没错

Exactly.

Speaker 2

但不是基础设施可靠性的问题

Like but not infrastructure reliability.

Speaker 2

在这种情况下,我认为更多是输出具有不确定性

In this case, I think it's probably more the outputs are nondeterministic.

Speaker 2

所以你需要定义标准,比如针对客服应用的标准

And so you need, like, even just defining the criteria for your, like, customer support application.

Speaker 2

比如要有多少比例的邮件不会激怒客户,我才能将其投入生产?

Like, what percent of my emails have to not piss off a customer in order for me to release this in production?

Speaker 2

定义这类标准是我们某种程度上必须做的,就像培训这类系统的人类操作员一样

Like, defining criteria like that is something that, I guess, we've had to do to some extent with, like, kinda training human operators of these kinds of things, where it's like, okay.

Speaker 2

一旦我们认为某人具备足够的可信度来处理这些事

As soon as we feel like someone has enough, I don't know, credibility to respond to these things.

Speaker 2

但目前仍有大量评估仅凭感觉进行——Harrison周二和Character AI的James做了个演讲,提到Character早期很多评估其实就是研究人员试用系统时凭感觉判断

But the amount of evals that are still happening just with, like, vibe checks is significant. Harrison actually did a talk on Tuesday with James from Character AI, and he was talking about how a lot of the early evals at Character were literally just the researchers playing with the systems and doing just vibe checking it.

Speaker 2

我认为这种情况仍然普遍存在,因为定义具体评估标准确实很有挑战性

And I think there's still a lot of that going on just because defining the criteria, like, defining concrete evaluation criteria is is really challenging.

Speaker 2

现在我们有很多系统可用

And now we have lots of systems.

Speaker 2

LangSmith就有些很棒的系统,可以在线运行评估,也能在发布新版本前进行检查

LangSmith has some great systems for running evals both online as well as kind of before you release new versions of your application.

Speaker 2

但要想发出任何这些邮件,你真的需要下功夫明确具体内容。

But in order to get any of those emails out, you really do need to put in the effort to define what that is.

Speaker 1

是啊。

Yeah.

Speaker 1

还有一批新兴公司正在投资开发新产品,比如Brain Trust,我对这个很期待。

And there's a whole new crop of companies that are investing building products like, you know, Brain Trust, I'm excited about.

Speaker 1

我想我们很快会邀请他们上节目。

I think, you know, we're having them on the show sometime soon.

Speaker 1

确实有很多人在尝试解决这个问题,这本质上是个必须解决的问题,因为其非确定性特质,很难判断你的改动是否真的在推动事情往正确方向发展。

Like, there's people trying to address that issue, and it is something that fundamentally has to be addressed, because with the nondeterministic nature, it's very hard to tell, when you make a change, whether you're actually moving things in the right direction or not.

Speaker 1

很多时候就像把手指伸到空中,感受当下风向那样。

And it is a lot of just kind of like, you know, dipping your finger up in the air and seeing which way the wind is flowing at at the moment.

Speaker 1

我觉得现在有很多这种'氛围检测'的做法。

I think there is a lot of this, like, vibe checking that's going on.

Speaker 2

完全同意。

Totally.

Speaker 2

我认为LangSmith的评估功能实际上在将这种氛围检测转化为真实评估方面做得很好。

I think LangSmith evals actually have done a really good job of helping turn those vibe checks into real evals.

Speaker 2

我们大约一年前推出了一个标注队列,看到人们如何用它从实时数据甚至内部数据中筛选数据集,再将其转化为评估指标。

Like, we launched an annotation queue, I think, about a year ago at this point, and the way people have used that to kind of curate datasets from live data or even just from, like, internal data in terms of interacting with things and then converting those into evals.

Speaker 2

我们当然也有相关功能,比如使用LLM作为评判,或者直接运行代码来评估这类内容。

We obviously have features for, like, using LLM-as-a-judge or just, like, running code to evaluate those kinds of things.
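Turning vibe checks into evals might look roughly like this sketch: a dataset of question/answer pairs scored by a mix of deterministic code checks and a judge function. The `stub_judge` here is a trivial keyword heuristic standing in for a real LLM-as-a-judge call, and the dataset is invented.

```python
def stub_judge(question, answer):
    # Stand-in for an LLM-as-a-judge call; here just a keyword heuristic.
    return 1.0 if "langgraph" in answer.lower() else 0.0

def code_check(question, answer):
    # Deterministic code-based eval: answer must be non-empty and short.
    return 1.0 if 0 < len(answer) <= 200 else 0.0

dataset = [
    {"q": "What orchestrates the nodes?", "a": "LangGraph runs the graph."},
    {"q": "What orchestrates the nodes?", "a": ""},
]

def run_evals(dataset, evaluators):
    # Average the evaluator scores for each row of the dataset.
    return [
        sum(ev(row["q"], row["a"]) for ev in evaluators) / len(evaluators)
        for row in dataset
    ]

scores = run_evals(dataset, [stub_judge, code_check])
print(scores)  # [1.0, 0.0]
```

Running the same harness before each release is what turns "it feels better" into a number you can compare across versions.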

Speaker 2

与众多不同客户合作这个过程非常有趣,能看到各组织如何思考这个问题。

And it's been really interesting working with lots and lots of different customers on that to just see kind of how different organizations think about that.

Speaker 1

太棒了。

Awesome.

Speaker 1

好的Eric,非常感谢你的参与。

Well, Eric, thanks so much for being here.

Speaker 1

我真的很喜欢。

I really enjoyed it.

Speaker 2

非常感谢,Sean。

Thanks so much, Sean.

Speaker 2

祝你今天愉快。

Have a good day.

Speaker 1

干杯。

Cheers.
