ElevenLabs just hit a $6.6 billion valuation, but its CEO says the real opportunity is no longer in voice

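Episode description

ElevenLabs made its name building realistic AI voices. Founded by two Polish engineers fed up with bad movie dubbing, the company is now profitable and valued at $6.6 billion, having doubled its valuation in just nine months. Its recent $100 million tender offer was led by Sequoia Capital and ICONIQ, with participation from a16z and others. Its technology powers everything from Fortnite characters to customer service bots, and it is competing head-on with OpenAI to become the industry standard for AI voice. On this episode of TechCrunch's Equity podcast, we bring you a conversation with CEO Mati Staniszewski from this year's Disrupt, where he made a surprising admission: he thinks voice models will be commoditized within a few years. So what is ElevenLabs' plan when everyone else catches up? Listen to the full episode to learn:

• Why ElevenLabs is moving beyond voice models to build a platform for conversational AI agents
• How the company is tackling deepfakes with digital watermarking, AI detection and device authentication
• Why Staniszewski believes AI-generated content will soon outpace human-created content
• ElevenLabs' push into music generation and its partnership plans for fused audio and video models

Subscribe to Equity on Apple Podcasts, Overcast, Spotify and other platforms. You can also follow @EquityPod on X and Threads.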

Bilingual subtitles

Speaker 0

准备好部署真正可用的AI了吗？

Ready to ship AI that works?

Speaker 0

立即前往 mongodb.com/build 开始构建。

Start building at mongodb.com/build.

Speaker 0

大家好，欢迎回到TechCrunch旗舰播客《Equity》，本节目聚焦初创企业的商业动态。

Hello, and welcome back to Equity, TechCrunch's flagship podcast about the business of startups.

Speaker 0

我是丽贝卡·贝兰，本期节目我们将邀请行业专家，帮助我们深入探讨科技界的一项趋势。

I'm Rebecca Bellan, and this is the episode where we bring on industry experts to help us explore a trend in the tech world and dive deep.

Speaker 0

ElevenLabs 凭借打造逼真的AI语音而声名鹊起。

ElevenLabs has made a name for itself building realistic AI voices.

Speaker 0

这家公司的起源是两位波兰工程师，他们对糟糕的电影配音感到不满，如今已成长为一家估值60亿美元的公司，不仅实现盈利，还正与OpenAI正面竞争，为从《堡垒之夜》角色到客服机器人等各种应用提供技术支持。

What started as two Polish engineers annoyed by terrible movie dubbing is now a $6 billion company that's not only profitable, but also going toe to toe against OpenAI to power everything from Fortnite characters to customer service bots.

Speaker 0

今天，我们带来的是ElevenLabs首席执行官马蒂·斯坦尼舍夫斯基在今年Disrupt大会上的一段对话，他在对话中做出了一项令人惊讶的表态。

Today, we're bringing you a conversation with ElevenLabs CEO Mati Staniszewski from this year's Disrupt, where he made a surprising admission.

Speaker 0

他认为，语音模型将在短短几年内变得商品化。

He thinks voice models will be commoditized in just a couple of years.

Speaker 0

那么，当其他人都赶上来时，ElevenLabs的计划是什么？

So what's ElevenLabs' plan when everyone else catches up?

Speaker 0

让我们来听一听。

Let's take a listen.

Speaker 1

感谢大家的到来。

Thank you everyone for being here.

Speaker 1

今天我邀请到了ElevenLabs的联合创始人兼首席执行官Mati Staniszewski。

I'm joined today by Mati Staniszewski, the ElevenLabs co-founder and CEO.

Speaker 1

Mati，感谢你来到这里。

Mati, thank you for being here.

Speaker 2

谢谢你们邀请我。

Thanks for having me here.

Speaker 2

大家好。

Hi, everyone.

Speaker 1

我们马上进入正题。

We're gonna get right into it.

Speaker 1

ElevenLabs 制作 AI 音频模型和 AI 语音模型，几年前有很多初创公司都在训练 AI 基础模型，但现在数量已经少了很多。

ElevenLabs makes AI audio models, AI voice models, and there were a lot of startups training AI foundation models a few years ago, and now there are far fewer.

Speaker 1

但 ElevenLabs 一直坚持下来，并在过去几年中取得了显著增长。

But ElevenLabs has managed to stick it out and actually grow quite a bit in the last few years.

Speaker 1

你们现在有很多产品。

You have a lot of products right now.

Speaker 1

你们做的东西非常多。

There's a lot of things that you guys do.

Speaker 1

我想深入了解这些产品。

I wanna get into them.

Speaker 1

但你们今天最关注的是什么？

But what's your biggest focus today?

Speaker 2

作为一家公司，我们致力于解决的一个问题是人类与技术的互动方式。

As a company, one thing that we are aiming to solve is how humans and technology interact.

Speaker 2

因此，我们所做的许多工作都服务于这一共同使命。

So a lot of the work that we do underpins that common mission.

Speaker 2

实际上，回顾我们历史上解决的诸多问题，总是从一个问题开始：问题是什么？

And effectively, as you think about a lot of the problems that we solved across the history, it always started, what's the problem?

Speaker 2

我们能解决它吗？

Can we solve it?

Speaker 2

接下来我们遇到的又是什么问题？

What's the next problem that we came across?

Speaker 2

就像你所说的，我们最初从语音开始。

So like you said, we started with voice.

Speaker 2

因此，在公司初创时，一个关键问题是，我们周围的技术听起来都不够像人声。

So one of the key things that came when we were starting the company is that a lot of the technology around us just didn't sound human.

Speaker 2

当时有很多知识和信息都无法获取，我们试图通过开发自己的文本转语音模型和语音模型来解决这个问题，让声音听起来更像真人。

You had a lot of the knowledge, information that just wasn't accessible, and we tried to solve this by creating our own text-to-speech model and our own voice models, to make it sound human.

Speaker 2

接着，下一个问题出现了：要创造出如此出色的作品，光有这些还不够。

And then the next thing came about, which is that to be able to create a lot of that incredible work, you need a lot more than that.

Speaker 2

你还需要声音、音乐，还需要通过语音转文本来理解语音。

You need sounds, you need music, you need understanding of speech through speech-to-text.

Speaker 2

因此，我们一次又一次地构建了这套系统，最终为该领域的创作者打造了我们的创意平台服务。

So we, time and time again, started building that across, and built effectively our creative platform offering for creatives in the space.

Speaker 2

而更近一段时间以来，我认为这是过去两年公司最核心的焦点：我们意识到，信息正从静态向动态转变，人们终于能够以前所未有的方式与技术互动。

And then more recently, and I think that's the biggest focus across the company over the last two years, we realized that there's a big shift from static information into dynamic information where finally you can interact with the technology in ways you could never before.

Speaker 2

随着智能代理、对话代理、语音代理的兴起，人们现在可以直接与设备对话，它们也能理解并回应你。

With the rise of agents, conversational agents, voice agents, suddenly you can speak with the devices, they can understand you and speak back.

Speaker 2

但要真正让这些代理变得有价值，关键在于把所有信息和知识库整合进代理，让它能够与现有系统集成（无论是谷歌还是Salesforce的数据），然后找到一种可测试、可长期评估、可监控且具备适当安全机制的部署方式。

But the main thing to actually make them valuable is bringing all of the information, the knowledge base inside agent, being able to integrate it with the existing systems, whether it's the Google, whether it's the Salesforce data, then figuring out the way to deploy the agent in a way which is testable, where you can evaluate over time, you can monitor, you have the right safeguards in place.

Speaker 2

因此，我们今天的核心任务是帮助一些最大的公司和初创企业部署他们的对话代理，使他们能够以全新的方式与用户互动。

So the main thing for us today is helping companies, from some of the biggest to startups, deploy their conversational agents and interact with their users in new ways.

Speaker 1

我想再回到代理这个话题，但这里面涉及的内容太多了。

So I wanna get back to agents, but that's a lot of things.

Speaker 1

而且你们今年夏天还推出了全新的AI生成音乐平台。

And there's also the new AI generated music platform that you guys launched over the summer.

Speaker 1

我觉得你们在某些方面与OpenAI形成了竞争。

I see you guys as competing with OpenAI in some aspects.

Speaker 1

我觉得你们在某些AI语音工具上与Adobe竞争。

I see you guys competing with Adobe on some AI voice tools.

Speaker 1

我们该如何看待你们呢？

How should we think about you?

Speaker 2

在整个AI领域，我们看到的是，当基础模型现在能够生成所有模态时，重叠正在增加。

Across the general AI space, what we are seeing is that there's increasing overlap when the foundational models now can create all the modalities.

Speaker 2

它们也越来越转向多模态方法。

Increasingly, they also shift to multimodal approaches.

Speaker 2

是的。

Yes.

Speaker 2

你说得对。

You're right.

Speaker 2

会有大量的重叠。

There will be a lot of overlaps.

Speaker 2

我们的重点是帮助人们在创意流程中获得最佳的音频体验，并协助部署对话式代理。

Our key focus is helping people get the best of audio for creative workflow, and then helping deploy conversational agents.

Speaker 2

当然，支撑这一点的——也许正是让我们在这个领域独一无二的原因——是我们自研模型，而你们提到的那些公司，对于他们来说，能够充分利用语音并部署最佳语音的用例非常罕见。

And of course, underpinning that, and maybe that's what makes us unique in the space is that we build our own models, where all of the companies that you mentioned, for them, those use cases that use the best of voice and deploy the best of voice are very rare.

Speaker 2

因此，无论是构建模型，这些公司通常不会自己开发模型，而是选择与其他公司合作。

So whether that's building the models, frequently those companies won't build the models themselves and will choose to partner with others.

Speaker 2

而你需要部署智能体或进行创意工作的应用层面实际上相当复杂。

And then the application side that you need to deploy the agent or the creative work is actually pretty complex.

Speaker 2

如果你要制作一本有声书，你需要经历多个步骤。

If you're creating, let's say, an audiobook, you need to go through a number of steps.

Speaker 2

我们上台之前就聊过这一点。

And we spoke about this before heading on stage.

Speaker 2

你需要纠正发音，确保每个词都完美无瑕。

You need to correct the pronunciation to make sure the words sound perfect.

Speaker 2

你需要为每个角色选择合适的声音。

You need to select the right voices for each of the characters.

Speaker 2

因此，整个有声书的创作和配音过程非常复杂。

So the whole exercise of creating and narrating the audiobook is pretty tricky.

Speaker 2

其实公司刚起步时就是这种情况：我们首次发布ElevenLabs界面时，它只是一个很小的输入框，用户可以粘贴大约一条推文长度的文字，然后下载生成的音频。

When we started the company, actually on that case, when we first released our ElevenLabs interface, it was this tiny box where people could copy-paste effectively a tweet's length of text and download the work.

Speaker 2

但很快我们就发现，这远远不够。

And what quickly transpired is it's not enough.

Speaker 2

然而，我们仍然有一些用户会这么做，比如一位有声书作者，他会把整本书逐段复制粘贴，下载后拼接起来。

However, we still had users that would... we had one of these audiobook authors who would copy-paste his entire book, download it, and stitch it together.

Speaker 2

他要进行大约300次复制粘贴，然后发布，效果却非常好。

It was like 300 copy pastes, and then release it and it was great.

Speaker 2

这位用户后来带着其他人回来了，我们意识到，要真正完成这项工作，界面需要提供更多的功能。

That person returned with a number of other people, and we realized, okay, you need a lot more of the interface to actually build it.

Speaker 2

在智能代理方面也发生了完全相同的情况，我们曾与一家名为Hippocratic的医疗健康领域客户深度合作。

The same exact thing happened with agents, where we worked deeply with a customer in a healthcare space, a company called Hippocratic.

Speaker 2

他们有一个非常出色的用例：帮助自动化患者拨打医疗单位时的预约安排，并在之后回拨提醒患者吃药和确认预约。

They do this incredible use case where they help automate appointment scheduling for patients calling in to the healthcare units, but then also call them back to remind them about taking medicine and remind them about taking the appointment.

Speaker 2

所有这些功能都是通过语音代理实现的。

And all of that is done through voice agents.

Speaker 2

但要实现这一点，他们投入了大量时间来整合文本转语音、作为大脑的大语言模型、语音转文本，并将这些技术协调起来，最终落实到实际应用中。

But to be able to do that, they invested so much time to combine the text-to-speech, the LLMs as the brain, the speech-to-text, orchestrate that all together, and then bring that to their real actions.

Speaker 2

随后，许多其他客户也出现了类似情况，我们意识到，要规模化解决这一用例，我们必须做更多工作。

And then the same thing appeared across so many of other customers, and we knew that to be able to solve that use case at scale, we need to do a lot more.

Speaker 2

我认为，这是我们与其他许多公司不同的独特之处。

And I think that's the unique thing that we don't see many other companies do.

Speaker 2

应用层面非常浅显。

The application side is very shallow.

Speaker 1

我在人工智能基础模型领域看到的是，公司一直在试图变得更大。

Something I've seen in the AI foundation model space is that companies keep trying to get bigger.

Speaker 1

他们不断试图涉足更多领域。

They keep trying to tackle more.

Speaker 1

他们需要更多资金，因此必须更具雄心。

They need more funding, so they need to be more ambitious.

Speaker 1

我想知道，你们目前几乎整个公司都聚焦在音频上，但你们有没有考虑过转向其他类型的模型，比如生成视频或图像的模型？

I wanna know, you know, you've basically centered your entire company on audio right now, but have you considered focusing on other kinds of models, generating video or images kind of models?

Speaker 2

你说得对，蒂加。

You are right, Tiga.

Speaker 2

在大多数其他模态中，模型的规模正在扩大。

The size of the models is scaling in most of the other modalities.

Speaker 2

在音频领域，略有不同的是，规模仍然不是那么重要。

In audio, the slight difference is that it's still less about the scale.

Speaker 2

更重要的是模型架构。

It's a lot more about the model architecture.

Speaker 2

我很幸运，有一位出色的联合创始人。

I have the luck of having a brilliant co-founder.

Speaker 2

他是我认识的最聪明的人，领导着我们大量的研究，已经成功聚集了全球最顶尖的一批音频研究人员，并攻克了一些模型架构上的难题。

The smartest person I know, who leads a lot of our research, has been able to assemble some of the best researchers in the world for audio, and has been able to crack some of the model architecture challenges.

Speaker 2

我们仍然认为，在未来一两年内，音频领域仍将如此。

And we still think this will be true in the audio space for the next year or two.

Speaker 2

从长远来看，未来几年内将会趋于商品化。

Over the long term, it will commoditize over the next couple of years.

Speaker 2

这很关键。

It's critical.

Speaker 1

你觉得会商品化吗？

You think it will commoditize?

Speaker 2

从长远来看，是的。

In the long term, yes.

Speaker 2

即使存在差异，我认为这些差异在某些语音、某些语言上仍然存在，但随着时间推移，这些差异会越来越小。

Even if there are differences, which I think will be true for some voices, some languages, the differences will be smaller over time.

Speaker 2

所以现在

So today

Speaker 1

既然你觉得它们会商品化，那为什么还要做这些模型？

So why make the models if you think they'll commoditize?

Speaker 2

因为在短期内，它们是你能获得的最大优势和最大突破。

Because in the short term, they are the biggest advantage and the biggest step change you can have.

Speaker 2

如今，语音仍然听起来不够自然，我们需要解决这个问题。

Today, the voices still don't sound good, we need to solve that problem.

Speaker 2

现在，当交互听起来不好时，我们需要解决这个问题。

Now when the interactions don't sound good, we need to solve that problem.

Speaker 2

解决这个问题的唯一方法是自己构建模型。

The only way to solve it is by building the models yourself.

Speaker 2

从长远来看，也会有其他参与者来解决这个问题。

And then over the long term, there'll be other players that will solve it too.

Speaker 2

希望我们能一次又一次地率先做到。

Hopefully, we are first, time and time again.

Speaker 2

但目标是解决这些问题。

But the goal is to solve those.

Speaker 2

但针对你的问题，我们观察到的模式是：如果你想要可靠且可扩展的应用场景，你仍可能为不同场景使用不同的模型。

But to your question, the pattern we are seeing, if you want reliable, scalable use case, you will likely still use different models for different use cases.

Speaker 2

因此，要选择各领域中最优秀的。

So best of breed across.

Speaker 2

例如在智能体领域，你希望在可靠性与可扩展性方面作为关键组件，同时还需要模型的可解释性。

So, like in the agent space, you want reliability and scalability as the key component, and you want explainability of the model.

Speaker 2

所以这时采用语音转文本、LLM、文本转语音的串联方式就很合适，你清楚每一个步骤，也能以正确的方式将其与工具连接起来。

So then you are good with using speech-to-text, LLM, text-to-speech, so you know each of the steps and can link it up to the tooling in the right way.
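The cascaded setup described here (speech-to-text, then an LLM, then text-to-speech) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ElevenLabs' implementation: the class `CascadedVoiceAgent`, its `trace` list, and the three stub functions are hypothetical stand-ins for real models. It shows why a cascade is easy to inspect: each stage's output is recorded separately.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CascadedVoiceAgent:
    """Chains three independent stages. All names here are hypothetical."""
    speech_to_text: Callable[[bytes], str]
    llm: Callable[[str], str]
    text_to_speech: Callable[[str], bytes]
    # One (stage name, output) entry per step, so every turn can be audited.
    trace: List[Tuple[str, object]] = field(default_factory=list)

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.speech_to_text(audio_in)
        self.trace.append(("speech_to_text", transcript))
        reply = self.llm(transcript)
        self.trace.append(("llm", reply))
        audio_out = self.text_to_speech(reply)
        self.trace.append(("text_to_speech", len(audio_out)))
        return audio_out


# Stub stages standing in for real models (assumptions for illustration only).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


agent = CascadedVoiceAgent(fake_stt, fake_llm, fake_tts)
reply_audio = agent.handle_turn(b"book a table for two")
```

Because each stage is a plain callable, any one of them can be swapped or tested on its own, which is the "best of breed" idea mentioned earlier in the conversation.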

Speaker 2

如果是稍微不同的领域，我们预计在未来一两年内，会出现你提到的情况，即越来越多的模型将转向多模态或融合方法。

If it's a slightly different space, what we do expect in the next year or two is kind of what you mentioned, where an increasing number of models will move to multimodal or fused approaches.

Speaker 2

因此，你将同时生成音频和视频，或者在对话场景中同时生成音频和LLM输出。

So you will create audio and video at the same time, or audio and LLMs at the same time in a conversational setting.

Speaker 2

谷歌发布的Veo 3这一令人惊叹的模型，就是一个很好的例子，展示了将这些技术结合在一起能实现什么效果。

Google, with Veo 3, the incredible model that they've released, is a good example of what you can achieve if you combine those together.

Speaker 2

我们正计划与其他公司合作并独立开展实验，看看能否将我们的音频专长与其他模型的专长结合起来，让这一切对我们的客户来说更简单。

And we are planning to do experiments in partnership with other companies and independently to see whether we can combine our audio expertise with some of the expertise of other models to make that easier for our customers.

Speaker 2

因为从长远来看，这确实是必需的。

Because ultimately, in the long long term, yes, it will be needed.

Speaker 1

你考虑与哪些其他公司合作，它们正在训练AI视频模型？

Other companies training AI video models you're thinking about partnering with?

Speaker 2

要么是其他AI公司，要么是开源项目。

Either other AI companies or open source.

Speaker 1

嗯。

Yeah.

Speaker 1

这些公司是哪些？

What are some of those companies?

Speaker 2

我们还在摸索这个问题。

We are still figuring this one out.

Speaker 2

我们和几位亲密伙伴保持着密切沟通。

We have close conversations with a few of our close friends.

Speaker 1

这是我们经常看到的一个趋势，比如Sora和Veo 3，音频和视频是同时生成的。

That is a trend that we see a lot where, you know, with Sora and Veo 3, you generate both at the same time.

Speaker 1

我想象创意人士正越来越多地想要这样做。

And I imagine creatives are increasingly looking to do that.

Speaker 1

我的意思是，你面临的另一个挑战是，你要与这些玩家竞争，而他们的资金远比你雄厚。

I mean, there's another challenge that you face: you're competing against these players, and they're far more capitalized than you are.

Speaker 1

所以，你是否更倾向于考虑合作，或者未来可能被他人收购？

So are you thinking more about partnerships or potentially being acquired by someone in the future?

Speaker 1

比如，你们和他们竞争吗？

Like, do you compete with them?

Speaker 2

我们正在打造一家具有代际意义的公司，我们希望保持独立，构建能长期创造价值的东西。

We are building a generational company, and we want to stay independent and build something that will create that value in the long term.

Speaker 2

因此，我们关于如何结合模型与其他工作的整体理念是：你不能仅仅依赖模型。

So the whole thesis of how we think about combining the model and other work is that you cannot just rely on the model.

Speaker 2

你需要构建模型，但同时也需要应用端。

You need to build the models, but then you'd also need that application side.

Speaker 2

这就是为什么对我们来说，将模型视为起步优势、进行投资，并在音频领域做到最好，同时构建创意平台和代理平台至关重要。

That's why for us, it's so important to treat models as the head start, invest, and be the best in the audio space, while at the same time build a creative platform offering and build agents platform offering.

Speaker 2

这就是构建人工智能与构建产品的结合。

And that's the combination of building AI and building product.

Speaker 2

这就是魔力所在。

That's the magic.

Speaker 2

就像软件与硬件的结合曾是苹果公司的魔力一样。

The same way, like, software and hardware was the magic for Apple.

Speaker 2

我们认为，产品与人工智能的结合将成为下一代最佳应用场景的魔力所在。

We think the product and AI will be the magic in the next generation of the best use cases.

Speaker 2

现在，我认为棘手的地方在于音频本身，这也是未来几年至关重要的部分。

Now, what I think is tricky, and that's where we expect in the next couple of years is essential, is on the audio side itself.

Speaker 2

到目前为止，我们打造的文本转语音模型已经超越了基准，最近是语音转文本，现在又轮到了音乐。

So far we've been able to create a text-to-speech model that was beating benchmarks, then recently speech-to-text, and now music.

Speaker 2

接下来的第二步是如何将这个模型与，比如说，一个开源的视频模型结合起来。

Then there's a second step of how you combine that model with, let's say you take an open source video model.

Speaker 2

这种融合步骤实际上具有挑战性且复杂。

That fusing step is actually challenging and tricky.

Speaker 2

因此，在这里，我们也希望成为首批解决这一问题的人，以高质量且可靠的方式实现它。

So here too, we hope to be one of the first to figure that out, to do it in both a high-quality and a reliable way.

Speaker 2

我们目前并不计划训练竞争对手所投入的那些计算或数据密集型模型。

We don't today plan to train some of the compute or data intensive models that competitors do.

Speaker 2

当然，这只是一个细分领域。

And of course, that's just a segment.

Speaker 2

这就是为什么做好产品方面如此重要。

That's why it's so important to do the product side.

Speaker 1

不管你们在做什么，你们提到的应用层确实为你们取得了巨大成功。

Well, whatever you're doing, I mean, you talk about the application layer, and that has really been a success for you guys.

Speaker 1

我知道很多公司都在使用ElevenLabs。

I know a lot of companies that are really using ElevenLabs.

Speaker 1

当我看到OpenAI和谷歌及其产品套件时，我没有看到任何一家公司与你们构建的许多产品形成直接竞争。

And I'm curious when I look at OpenAI and Google and their suite of products, I don't see a one to one competitor to a lot of the things that you build.

Speaker 1

我很想知道，为什么你们的一些大型竞争对手还没有涉足这些产品，或者我有没有

And I'm curious why you think some of your larger competitors haven't encroached on these products or I have

Speaker 2

不知道。

don't know.

Speaker 2

不知道。

I don't know.

Speaker 2

他们当然会覆盖一系列使用场景。

They, you know, of course they do a set of use cases across.

Speaker 2

所有这些公司都会有一些版本的创作者工具和某种形式的分发渠道。

All of those companies will have some version of a creator tooling and some version of distribution.

Speaker 2

但我认为，真相在于，所有这些情况下，关键都在于专注。

But I think the truth is in all those cases, it's a matter of focus.

Speaker 2

就我们而言，我们一直专注于解决人机交互这一问题：最初是语音，现在则是把它与其他系统连接起来；而我认为，许多公司还有太多其他环节需要去修复和应对，这也无可厚非。

And I think in our case, we're strictly focused on solving this human-computer interaction problem; it started with voice, now it's linking it to other systems, and I think many of the companies have just so many other parts that they rightly are trying to fix and tackle.

Speaker 2

但与此同时，当你思考竞争时，在很多方面，我们并不以那种方式看待他们。

But at the same time, I think, as you think about the competition, in many ways we are not looking at them in that way.

Speaker 2

当然，我们会关注最新的动态，但我们的愿景是独立于他们的做法的。

Of course, you know, we are seeing what's the newest things that are happening, but kind of our vision is independent of what they do.

Speaker 1

我认为有一个论点可以说明，批评者会说，你们所创建的是一个高风险的平台，人们可以利用它来制作深度伪造内容，而像OpenAI和谷歌这样已经因他们的AI工具面临大量批评的公司，可能不希望让自己陷入这种风险。

I think there's one argument to be made where critics would say that what you've created is something of a risky platform that people have been able to use to create deepfakes, and maybe OpenAI and Google, who are already facing a lot of criticism for their AI tools, don't want to subject themselves to that.

Speaker 1

你认为这个论点公平吗？

And do you think that's a fair argument?

Speaker 2

你也可以说，随着最近Sora 2的发布，他们所承担的风险比我们愿意承担的大得多。我们把自己视为语音领域的领先公司，因此把产品发布与安全措施结合起来非常重要。

You could also argue that with the recent Sora 2 release, they are taking much more risk than we would ever. As we think about ourselves as a leading company in the voice space, it's so important that you combine a lot of releases with the safeguards.

Speaker 2

对我们来说，这意味着从我们发布那一刻起，你总是可以追溯内容的来源，知道是谁生成了这些内容。

So for us, what this means is that from the moment we release, you can always trace the content back to who generated that content.

Speaker 2

你可以追踪并判断内容是否由AI生成，我认为这在该领域将变得非常重要——明确区分AI内容与非AI内容。

You can trace and understand whether it's AI or not AI, which I think will be an important part in the space, that you understand AI versus not AI content.

Speaker 2

其次，我们在对不同声音和内容的审核上投入了大量资源，以便在内容生成前就能发现或标记出来以供进一步审查。

But then second, we do invest a lot in the moderation of different voices and different content itself, so you can catch that before it's created or flag it for further review.

Speaker 2

最后一点是，尤其是在过去一年里，我们与行业其他参与者广泛合作，将我们发现的这些方法推广开来，希望将其打造成行业标准。

And then the last thing is that, especially over the last year, we collaborated a lot with other players on how we can use a lot of those things that we find ourselves, to hopefully bring that as a standard to the space.

Speaker 2

其中一个例子是，你需要能够检测AI生成的内容，但不仅仅针对ElevenLabs，而是针对所有音频公司。

So one of those examples is you need to be able to detect AI content, not only for ElevenLabs, but for all audio companies.

Speaker 2

而我们现在看到的最大趋势是，大量开源公司以及来自中国的其他商业模型正作为竞争对手涌现，而这些内容你无法检测出来。

And now the biggest wave that we see is a lot of open source companies, a lot of other commercial models from China that are coming as a competition, and you cannot detect that.

Speaker 2

因此，我们正与牛津大学、加州大学伯克利分校和Reality Defender合作，共同开发有效的分类模型来检测这些内容。

So we are collaborating with the University of Oxford, Berkeley, and Reality Defender to create effectively classifier models to detect it.

Speaker 2

但也许最后一点是个很好的例子：几个月前，我们与英国一家慈善机构合作，他们利用语音代理，在来电时根据IP地址判断是否可能是诈骗电话，如果是，语音代理就会接管通话，目的是拖延诈骗者的时间，效果非常好。

But maybe one last thing, which was a good example: recently, a few months ago, we worked with a charity in the UK where they used AI voice agents so that, as a call was coming in, based on IP, if it was likely a scammer, the voice agent would take over with the purpose of wasting the scammer's time, and it worked beautifully.

Speaker 2

这是一种利用AI语音精准对抗诈骗和欺诈的巧妙方式。

And it was a nice way of using AI voices exactly against the scam and fraud that was happening.

Speaker 0

厌倦了AI搜索总是抓不住重点、无法理解上下文吗？

Tired of AI search that misses the point and can't understand context?

Speaker 0

MongoDB的Voyage AI帮助解决了这个问题。

Voyage AI by MongoDB helped solve this.

Speaker 0

由来自斯坦福、麻省理工和伯克利的世界级研究团队打造，Voyage AI提供行业领先的嵌入和重排序模型，精准捕捉语义并优化结果。

Built by a world-class research team from Stanford, MIT, and Berkeley, Voyage AI delivers industry-leading embedding and reranking models that capture meaning and refine results with precision.

Speaker 0

无论您是在构建语义搜索、RAG应用还是代理型AI，都可追求更高精度与可靠性，同时避免复杂性。

Whether you're building semantic search, RAG applications, or agentic AI, aim for greater accuracy and reliability without the complexity.

Speaker 0

准备好部署真正有效的AI了吗？

Ready to ship AI that works?

Speaker 0

立即前往mongodb.com/build开始构建。

Start building at mongodb.com/build.

Speaker 1

我想问问关于深度伪造的问题，因为感觉一年前我们一直在讨论这个问题。

I wanna ask about the deepfakes because it feels like a problem that we were talking about a lot a year ago.

Speaker 1

然后，我认为很多人都觉得，我们生活在一个无法分辨AI语音和真实语音差异的世界里。

And then I think a lot of people have just kind of felt like we're in a world where we can't tell what is the difference between an AI voice and a real voice.

Speaker 1

我的意思是，你对此有何回应？

I mean, what's your response to that?

Speaker 1

你觉得在未来几年会怎样？

Like, how do you think in the next few years?

Speaker 1

你能跟我谈谈你们为应对这个问题所做的努力吗？

Like, can you talk to me about some of your efforts to combat that?

Speaker 2

所以这有两个方面。

So there are two parts.

Speaker 2

一方面，我们肯定认为，AI生成的内容将超过人类生成的内容。

One is, what will definitely happen is we think there'll be more AI generated content than human content.

Speaker 2

你可能最近看到过一些AI音乐的例子，那里AI生成的内容已经接近超过50%。

You might have seen some of that recently on AI music, where that's closely becoming over 50%.

Speaker 2

所以我认为，每个人都应该把这一点当作默认前提。

So I think that should be a default that everybody is operating on.

Speaker 2

AI生成的内容将比人类创作的内容更多。

There will be more AI-generated content out there than human content.

Speaker 2

因此，我们必须预期并意识到，绝大多数现有内容都可能是AI生成的。

And by extension, we'll need to expect and know that most of the content that's out there can be AI generated.

Speaker 2

现在棘手的是，将来会出现完全由AI生成的内容，也有部分经过编辑的AI内容，以及少量由AI辅助编辑的内容。

Now the tricky thing is, what will start happening is, you will have, of course, fully AI generated content, but also semi edited AI content, and then a little bit AI edited content.

Speaker 2

这样一来，就很难分辨了。

And then it's like very hard to tell.

Speaker 2

举个例子，Audible目前禁止AI有声书上架。

So, Audible today bans AI audiobooks from being out there.

Speaker 2

我们面临着一个有趣而棘手的困境，一些使用AI的作者有时只用它来生成一句话。

And we have this interesting, tricky dilemma with some of the book authors that use it, that sometimes you would use it for a sentence.

Speaker 2

这算是AI生成的内容吗？

Is that AI generated?

Speaker 2

还是可以接受的？

Or is it fine?

Speaker 2

或者100%由AI生成，那可能就不合适了。

Or 100% AI generated, probably not fine.

Speaker 2

所以，这个平衡点在哪里呢？

So it's like, where is that balance?

Speaker 2

当你开始编辑、去除声音、消除噪音时，AI当然现在是个热门词，但它实际上已被应用于整个制作过程的各个环节。

And then as you start editing, removing sounds, removing noise, AI is of course such a now buzzwordy term, but it actually is used in all parts of that production process.

Speaker 2

因此，这变得极其复杂。

So it becomes super tricky.

Speaker 2

所以，这是第一点。

So that's the first thing.

Speaker 2

我会明确说明，我认为我们的预期是，大部分现有内容都将由AI生成。

I would make it clear, and I think it's the expectation that we are going with that most of the content out there will be AI generated.

Speaker 2

然后第二点，真正的解决方案是什么？

And then the second thing, what's the actual solution?

Speaker 2

如果我们能跳过常规思维，与致力于此的政府和机构合作，这可以看作是一个三步走的方案。

If we like leapfrog in how we think about it, and how we are collaborating with the governments and institutes working on that, it's kind of a three step approach.

Speaker 2

第一层是检测人类生成的内容。

The first layer would be detecting human content.

Speaker 2

因此，与其关注AI生成的内容，不如实际检测这是否来自人类一方。

So instead of thinking about AI generated content, you actually detect whether this is a human side.

Speaker 2

人类真实性是一个独立的问题，但总体而言，我们会将设备编码为属于我自己的设备。

There's a separate problem of the human authenticity piece, but in general, we encode on the device that this is a device belonging to me.

Speaker 2

如果我给你打电话，你的设备会解码，确认这是来自Mati的来电，然后你就知道这很可能是一个可信的来源。

If I'm calling you, your device decodes, okay, this is a call coming from Mati, and then you know that this is likely a trustworthy source.

Speaker 2

一切都没问题。

All fine.

Speaker 2

第二层是用户主动选择加入、带有水印的AI内容。

The second level is opted-in, watermarked AI content.

Speaker 2

我们每个人都会拥有自己的语音助手，代表我们执行任务。

So all of us here will have their own voice agent that will do tasks on our behalf.

Speaker 2

它会打电话给餐厅订位，会帮我们重新安排医生预约，所以你需要这样的语音代理。

It will call a restaurant to book, it will reschedule a doctor's appointment, and you will need the voice agent.

Speaker 2

但你希望这是一个经过认证的语音代理，对方也需要知道这是一个经过认证的语音代理。

But you want it to be an authenticated voice agent, and the other side to know that this is an authenticated voice agent.

Speaker 2

在这里，水印技术的作用至关重要，它能传递我授权语音代理代表我行事的信息。

And here, the watermarking aspect of effectively carrying the information that I gave the voice agent permission to act on my behalf will be important.

Speaker 2

第三层是，默认情况下，所有其他内容都是AI生成的或伪造的。

And then the third layer, by default, everything else is AI and fake.

Speaker 2

然后你可以添加额外的模型，比如检查或不检查，但这才是我们应该走向的方向。

And then you can add additional models like check or not check, but that's where we should be going.
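The three layers described above can be sketched as a simple decision order: verified human device first, then an opted-in authenticated AI agent, and otherwise assume AI or fake. The Python below is a hedged illustration only. The names `Trust`, `IncomingCall`, `sign`, and `classify` are hypothetical, and an HMAC tag over the caller ID is used purely as a stand-in for both device authentication and watermarking; real audio watermarks are embedded in the signal itself and are not described in the conversation.

```python
import hashlib
import hmac
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trust(Enum):
    VERIFIED_HUMAN_DEVICE = "layer 1: verified human device"
    AUTHORIZED_AI_AGENT = "layer 2: opted-in, watermarked AI agent"
    ASSUME_AI = "layer 3: treat as AI or fake by default"


@dataclass
class IncomingCall:
    caller_id: str
    device_tag: Optional[str] = None  # stand-in for a device-level credential
    agent_tag: Optional[str] = None   # stand-in for an agent watermark


def sign(key: bytes, caller_id: str) -> str:
    """Hypothetical tag: HMAC-SHA256 of the caller ID, hex encoded."""
    return hmac.new(key, caller_id.encode("utf-8"), hashlib.sha256).hexdigest()


def classify(call: IncomingCall, device_key: bytes, agent_key: bytes) -> Trust:
    # Layer 1: the device proves the call comes from its human owner.
    if call.device_tag is not None and hmac.compare_digest(
        call.device_tag, sign(device_key, call.caller_id)
    ):
        return Trust.VERIFIED_HUMAN_DEVICE
    # Layer 2: an AI agent the owner explicitly authorized to act for them.
    if call.agent_tag is not None and hmac.compare_digest(
        call.agent_tag, sign(agent_key, call.caller_id)
    ):
        return Trust.AUTHORIZED_AI_AGENT
    # Layer 3: everything else is assumed to be AI or fake.
    return Trust.ASSUME_AI


DEVICE_KEY = b"demo-device-key"  # illustrative keys, not real secrets
AGENT_KEY = b"demo-agent-key"

human_call = IncomingCall("mati", device_tag=sign(DEVICE_KEY, "mati"))
agent_call = IncomingCall("mati", agent_tag=sign(AGENT_KEY, "mati"))
unknown_call = IncomingCall("unknown")
```

The point of the ordering is the default: a call that carries no valid tag falls through to the last branch, matching the "by default, everything else is AI and fake" stance. Additional detection models would slot in after that branch.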

Speaker 2

我们目前主要在三个层面运作，尤其是第二和第三层。

And we operate on all three levels, mostly on the second and third ones today.

Speaker 2

第一层涉及整个领域的协作，非常复杂。

The first one is super tricky about the collaboration of the entire space.

Speaker 2

但我们正在探索并已开始实施水印技术，以及可以跨平台携带的更多元数据。

But on the second, we are exploring and already working on watermarking, and then on additional metadata that you can carry across.

Speaker 1

具体到你们的产品，你们是否看到有人使用ElevenLabs的工具来策划对抗性攻击？

Specifically to your products, do you see people using ElevenLabs tools to orchestrate adversarial attacks?

Speaker 2

不。

No.

Speaker 2

我们一直在与这些行为作斗争。

We are fighting all the time against those.

Speaker 2

因此，这既包括审核和标记系统，也包括在您使用产品之前就实施的限制，这些措施已经阻止了大量不良用户。

So it's both the system of moderation and flagging that, but then also just the restrictions we have before you even use the product are fighting out a lot of the bad users.

Speaker 2

当然，如果出现新的使用场景，我们会尝试扩展这些措施。

Of course, if the new use cases appear, we try to extend this.

Speaker 2

我认为这里棘手的地方在于，如果从全球范围来看滥用问题，这项技术已经公开存在，而且没有ElevenLabs所设的任何限制。

I think the tricky thing here is, if you think about misuse in the world, that the technology is already out there in the open without any of the restrictions that ElevenLabs puts in place.

Speaker 2

因此，当你考虑开源时，完全可以实现一些恶意用途，而无法实现追踪、审核或检测这段内容是否为AI生成，或由谁生成。

So as you're thinking about open source, you can do exactly some of the nefarious cases, and there is no way to have traceability, to have moderation, to have the ability to detect whether this was AI or not, or who generated that.

Speaker 2

因此，这是一个复杂的领域，我们也在与其他方合作来解决这个问题。

So it's a tricky space, and here too, we are collaborating with others to fight this.

Speaker 1

我对这一切很好奇，比如，你说未来AI生成的内容会比人类生成的还多。

I'm curious with all this, like, talk about, like, you know, you said there's gonna be more AI generated content out there than human.

Speaker 1

在人工智能时代，你认为人类生成的声音和人类创作的音乐价值何在？

What do you think the value is of a human generated voice and a human generated piece of music in the AI age?

Speaker 2

看，这涉及很多方面。

Look, there's so many things.

Speaker 2

一是细微差别和语气，我认为这在人工智能和人类领域都适用，比如方言、口音和声音的细微差异。

One is the nuance, and I think it applies actually both in the AI and the human space: the nuance of dialect, the accent, the voice.

Speaker 2

我会稍微谈一下这一点。

So I'll speak about this a little bit.

Speaker 2

但还有第二点，当你想到音乐时，如果你在制作有声读物或电影，归根结底，讲的是故事。

But then there's also the second piece, as you think about music, if you think about creating an audio book, if you're creating a movie, at the end of the day, it's about the story.

Speaker 2

讲的是人们在特定时刻所感受到的某些情感。

It's about some of the emotions that people felt in a given moment.

Speaker 2

那种能让你与对方产生连接、真正使其具有人性的东西。

Something that connects you to the other side that actually makes it human.

Speaker 2

我相信你关注着这个领域里的很多人，而真正让作品如此珍贵的，往往远不止被创作出来的艺术本身。

I'm sure you follow a lot of people in the space, and frequently it's a lot more than just the art that was created that makes it so valuable.

Speaker 2

我认为，只要我们所有人都在这里，这一点就会持续存在：你会与他人产生连接，追随那个故事和情感部分。

And I think this will remain true as long as all of us are here, where you do attach to the other person, you follow that story and an emotional part.

Speaker 2

在语音方面，首先，即使在今天的AI领域，我们也创建了一个语音市场，每个人都可以创建并分享自己的声音，当其他用户使用这些声音时，你就能因此赚钱。

And then on the voice side, so first of all, even in AI space today, we've created a voice marketplace where everybody can create their voice and share the voice, and when the voice is being used by other users, you earn money as a result.

Speaker 2

这种语音支持所有70种语言。

The voice works across all 70 languages.

Speaker 2

我们最受欢迎的声音是一个西班牙语声音，但它说起英语来效果非常出色。

Our top voice is a Spanish voice that sounds incredible in English.

Speaker 2

因此，大多数用户都是英语使用者。

So most of the users are English speakers.

Speaker 2

如果你们中有任何人想在这个平台上注册，我们还需要更多声音。

If any of you want to register on that platform, we need more voices.

Speaker 2

但有太多不同类型的声音，它们有效地展示了方言、口音和不同语言的重要性。

But there's just so many different ones that effectively show how dialect, accent, and different languages are important.

Speaker 2

然而，我们只是刚刚触及表面。

And still, we are just scratching the surface.

Speaker 2

我们已经通过这种方式分享了10000个声音，并向社区返还了1000万美元。

We have 10,000 voices shared this way, paid $10,000,000 back to the community.

Speaker 2

我认为，在整个宇宙中，还有一股其他独特声音的浪潮。

And I think there's like just a wave of other special, unique voices across the universe.

Speaker 2

所以这是第一点。

So that's one.

Speaker 2

第二点是，在更深层次上建立连接的能力。

And then two, ability to connect on the deeper level.

Speaker 1

我的意思是，你认为我们是否正在进入这样一个世界：你是某个人的粉丝？

I mean, do you think we're moving to a world where there's someone that you're a fan of?

Speaker 1

比如，你可能是一个电影导演或故事讲述者的粉丝，你会主动寻找他们的作品，但围绕他们的所有内容都可能变成AI生成的。

Like, maybe there's a filmmaker or a storyteller you're a fan of and, you know, you seek out their content, but then maybe all the content around them becomes AI generated.

Speaker 1

你明白吗？

Do you understand?

Speaker 1

我的意思是，我认为你所做的很多工作，其实是在边缘地带操作，比如旁白或背景音乐，你说这些都可以交给AI来完成。

I mean, because I think a lot of what you're doing is, you know, you're kind of taking these things at the edges, like the voice over or like the background music, and you're saying that can just be AI.

Speaker 1

但你也意识到，这会造就一个能成为艺术家的人更少的市场。

But you realize that that creates a market where fewer people can be artists.

Speaker 2

首先，关于你问题的第一部分，我非常喜欢的一位艺术家是will.i.am。

So first of all, to the first part of the question, one of the artists I love is will.i.am.

Speaker 2

他将在11月11日的ElevenLabs峰会上发言，当然。

He'll be speaking at the ElevenLabs Summit on November 11, of course.

Speaker 2

但我们之前稍微讨论过这一点。

But we spoke a little bit about this.

Speaker 2

当然，质量必须得好。

And of course, you know, the quality needs to be good.

Speaker 2

我认为我们已经接近这个目标了，但目前还没达到。

I do think we are close, but it's not there yet.

Speaker 2

但是的，我可以想象像这样的艺术家如何创作他们的作品，你可以在此基础上扩展和个性化，或者进行本地化。

But yes, I do imagine where artists like this could create their work and you could extend it and personalize it, or you could localize it.

Speaker 2

夏奇拉以英语和西班牙语演唱而闻名。

So Shakira famously sings in English and in Spanish.

Speaker 2

如果这种模式能推广到所有不同语言，那将令人难以置信。

If that was possible across all the different languages, it would be incredible.

Speaker 2

所以这是第一部分。

So that's the kind of the first part.

Speaker 2

第二部分，是的，随着技术革新，会有更多人创作内容，但同时，它也解放了那些以前因负担不起而无法创作的人——比如在音乐领域，你曾经必须去录音棚才能录制。

The second part, yes, with the technology innovation, there will be more people that create content, but at the same time, it unlocks all the people that couldn't create before because it was unaffordable, because you needed to go to a studio to record if you just zoom into the music.

Speaker 2

现在，所有这些资源都对那些没有预算去录音、找不到工作室或设备的人开放了。

Now all of that is accessible to people that don't have the budget to go record, find a studio, or find equipment.

Speaker 2

在以前，如果没有录音棚的访问权限，音乐创作根本不可能实现。

Two shifts before, it would just not be possible for the music to be created if you didn't have access to the studio space.

Speaker 2

我对人工智能技术也有同样的感受。

And I feel the same about AI technology.

Speaker 2

是的。

Yes.

Speaker 2

内容更多了，但这也降低了其他人开始创作的门槛，而以前这是不可能的。

There's more content, but it lowers the bar for other people to be able to start, which has not been possible.

Speaker 2

其次，仍然会有同样数量的人关注并渴望建立更深层次的联系，甚至更多，而这以前是不可能的。

And then two, still you will have the same amount of people following and keen to have the deeper connection or even more, which wasn't possible before.

Speaker 1

我想稍微聊聊你的业务，因为你的业务确实发展得非常兴旺。

I wanna just get into your business for a second here because you do have a really booming business.

Speaker 1

几个月前，你们宣布已经实现了超过2亿美元的年经常性收入（ARR）。

You announced a few months ago that you guys had surpassed $200,000,000 in ARR.

Speaker 1

并且你们希望到2025年底能突破3亿美元的年经常性收入（ARR）。

And that by the end of 2025, you'd like to surpass $300,000,000 in ARR.

Speaker 1

你们离这个目标还有多远？

How close are you to that goal?

Speaker 2

我们已经很接近了。

We are close.

Speaker 2

我们仍然按计划进行。

We are still on target.

Speaker 2

我们与思科、德国电信等公司建立了令人难以置信的合作伙伴关系，从自动化客户体验工作到开拓新的应用场景，这种趋势甚至延伸到一些增长最快的初创公司，比如Perplexity，它将把这项技术用作设备交互的一部分。

We have just incredible partners, from companies like Cisco or Deutsche Telekom automating their customer experience work to innovating on new use cases, which is true all the way to some of the fastest growing startups, like Perplexity, which will use that as part of the interaction with the devices.

Speaker 2

但即使在这一领域之外，一个几乎堪称积极惊喜的是，一些政府也开始积极参与，并探索如何利用这项技术。

But even outside of that space, what was probably almost a positive surprise is some of the governments leaning in and how they can use that.

Speaker 2

最近，我们宣布了与乌克兰政府的合作，他们正在打造首个智能体政府（agentic government），让社会各阶层的人都能通过AI获得教育，并通过AI客服获取政府服务，这令人难以置信，而且是无偿提供的。

Recently, we announced our work with the government of Ukraine, who are creating the first agentic government, where all the people across society will have access to education through AI and will be able to access the government services through AI customer support, which is just incredible to see, and it's pro bono.

Speaker 2

但纵观这一切，很明显，我们与技术的互动方式正在发生变化，希望我们ElevenLabs能发挥重要作用，推动这一变革，并在年底前突破3亿美元的目标。

But across all of that, it's very clear that the way we interact with technology is changing, and hopefully at Eleven, we can play a big role in making that happen and crossing the 300 by the end of the year.

Speaker 1

你们已经拓展到了许多人们意想不到的领域，比如与政府以及音乐行业的合作。

You've expanded into a lot more places than I think people expected, working with governments, for example, and in music.

Speaker 1

我很好奇，五年后，你如何看待Eleven Labs？

And I'm curious, five years from now, you know, how do you think about Eleven Labs?

Speaker 1

我们应该如何理解你们呢？

And, like, how should we think about you?

Speaker 2

我希望我们能成为任何创意需求的首选之地。

I hope that we are a go-to place for anything creative.

Speaker 2

如果你正在寻找一个可靠的对话代理来部署，我们就是你的首选之地。

We're a go-to place if you are trying to deploy a conversational agent that's reliable.

Speaker 2

并且，我们的使命，即让人们能够轻松顺畅地与科技领域正在涌现的惊人创新进行沟通，已经得以实现，而我们在其中发挥了重要作用。

And our mission of solving the problem of making it easy and seamless to communicate with that incredible innovation that's happening across technology is solved, and we've played a big role in that.

Speaker 1

好的。

Alright.

Speaker 1

我想我们的时间到了，但谢谢你，马蒂。

I think that's our time, but thank you, Matty.

Speaker 1

感谢马克斯。

Appreciate Max.

Speaker 1

好的。

Alright.