欢迎收听《零知识》播客,我们将探讨零知识研究和去中心化网络的最新进展。
Welcome to Zero Knowledge, a podcast where we talk about the latest in zero knowledge research and the decentralized web.
我是主持人安娜。
The show is hosted by me, Anna
还有我,弗雷德里克。
And me, Fredrik.
本周我们邀请到IBM研究院的高级研究科学家弗拉维奥·贝尔加马斯基。
This week, we are sitting down with Flavio Bergamaschi from IBM Research, where he is a senior research scientist.
本周我们将主要讨论全同态加密(FHE)。
This week, we're gonna be primarily talking about FHEs.
欢迎来到节目,弗拉维奥。
So welcome to the show, Flavio.
谢谢。
Thank you.
我们过去曾简单讨论过全同态加密,但本期节目将特别有趣,因为能让我们了解FHE研究领域的最新动态。
Now, we've talked a little bit about FHE in the past, but I think this is going to be a very interesting episode because it's going to give us a chance to catch up with what's been happening in the FHE research space.
我认为我们从未真正从基础层面讨论过它是什么,当前的技术状态如何,甚至其发展历程。
I don't think we've ever really talked about, you know, at a fundamental level, sort of what it is or, what the current state of technology is or even its development over time.
我想我们在开场白中稍微提到过,人们对全同态加密能做什么或不能做什么似乎存在很多先入为主的观念。
I think we talked a little bit about this in the preamble: there seem to be a lot of preconceptions about what fully homomorphic encryption is capable of doing or not capable of doing.
深入探讨这些问题并了解现状会很有趣。
It'll be interesting to dig into all these things and see what status quo is.
完全同意。
Totally.
好的。
Alright.
我喜欢在关于全同态加密(简称FHE)的对话开始时,先提出一个我通常向听众提问的问题。
So I like to start all these conversations about fully homomorphic encryption, FHE for short, with a question that I typically ask the audience when I'm presenting.
想象一下,如果你能在不解密的情况下对加密数据进行计算,你能做些什么。
So imagine what you could do if you could compute on encrypted data without ever decrypting it.
对吧?
Right?
那么如果数据始终处于加密状态,有哪些现在无法实现的事情能通过这项技术完成?
So if the data was always encrypted, what are the things that you don't do today that you would be able to do with this technology.
你有时会得到哪些类型的回答?
What are the kinds of answers that you get sometimes?
哦,有好几种。
Oh, there are several.
但在人们真正认识到它在隐私和机密性等方面带来的价值之前,你会发现大家会更愿意分享信息,因为这些信息永远不会以明文形式存在。
But I guess until people actually realize what it brings to the table in terms of privacy and confidentiality and everything, you see that people will be more inclined to share information because that information is never in the clear.
嗯。
Mhmm.
而且人们可以在不泄露信息的情况下对这些数据进行计算。
And also that people can compute on it without leaking the information.
所以即使发生数据泄露等情况,由于数据始终加密,风险也会小很多。
So you might have data breaches and everything else, but because the data is always encrypted, there is less of a risk.
这在当前显得尤为重要,因为我们看到需要更多共享数据来追踪全球动态。
And this seems super relevant right now as we see the need for more kind of shared data in order to map the goings on in the world.
然而现在人们非常担忧,比如,如何才能真正保持这些信息的私密性?
And yet now there's a lot of concern that, like, how how could you actually keep any of that private?
我们有必要大规模计算所有这些数据。
It's necessary that we compute all of these things en masse.
但确实,我们似乎还没有找到既能造福个体又能公平对待个体的方法工具。
And yet it feels like we don't yet have the tools to do this in a way that will be beneficial to the individual, or fair to the individual.
或许这样说更准确
Maybe that's a better way of saying it.
关键在于如今人们不像从前那样重视隐私和机密性,尤其是个人隐私和机密性。
The main thing is that these days people care less about privacy and confidentiality, individual privacy and confidentiality, than they used to.
现在人们几乎在社交媒体上分享一切,而这项技术或许能让我们在不必过多暴露个人生活、行为和行踪的情况下实现很多目标。
There is a lot of social sharing of pretty much everything, and with this technology it may be possible to achieve a lot without having to give away so much about your life, about what you do, or about where you go.
是的。
Yeah.
我们之前节目中也讨论过这个话题,我觉得有趣的是人们已经习惯了某种生活方式,某种程度上提升隐私意味着要回溯并剥夺某些既得权益。
And we've had some discussions on this in the show in the past, and I think it's interesting: people have gotten accustomed to a certain lifestyle, and to some degree fixing it, like increasing privacy, means going back and taking things away from someone.
但倒退几乎从来都行不通,所以需要找到新的解决方案,让人们既能做想做的事,又能有所改进。
But going back almost never works, so you need to find new solutions to let people do what they want to do, but still improve.
比如真正保护他们的隐私。
So things like actually still preserving their privacy.
我们深入探讨了很多,我还是想知道,你在IBM研究院工作。
We'll dig super deep into things, but I still want to know: you're at IBM Research.
IBM为什么对这个感兴趣?
Why is IBM interested in this?
IBM的研究科学家具体做什么工作?
And what does a research scientist at IBM do?
没错。
Right.
实际上IBM在差不多11年前就发明了这项技术。
So IBM actually invented this technology about ten, almost eleven years ago.
这项技术发明时,它解决了一个困扰人们十多年的问题:如何在加密数据上执行复杂计算。
And when it was invented, it kind of solved a question that people had had for more than a decade on how you could perform complex computation on encrypted data.
但那时候它运行得非常慢,慢得可怕。
But at that time it was very slow, terribly slow.
所以当时有句流行语说,'哦,这数学理论很棒,但在我有生之年是看不到实际应用了'。
So one of the quotes at that time was people saying, oh, this is fantastic mathematics and everything else, but not in my lifetime.
哦,'有生之年无望'是吧,天啊。
Oh, not in my lifetime was the, oh, no.
但现在不同了,随着算法进步和全球协作,它的速度已经能满足某些实际应用场景了。
Well, now, with all the advances in algorithms and collaboration across the world, it's at a speed that's practical for certain use cases.
对吧?
Right?
这正是我们现在专注的方向——当性能、算法和应用性达到实用水平后,就能将其推向市场了。
And this is one of the points that we focus on now: because it is becoming practical in terms of performance, algorithms, usability, and so on, you can start putting it into the market.
虽然现在还不像2009年那样处于技术萌芽期。
It's still not as early days as it was in 2009.
对吧?
Right?
但此刻正是转折点,处理速度已能满足需要隐私和保密性的应用场景。
But it is at that inflection point where things are fast enough for applications that require privacy and confidentiality.
回到弗雷德里克的问题,关于IBM研究院,或许我们该更深入了解下,IBM研究院的目标究竟是什么?
But going back to that question from Fredrik, like IBM Research, maybe it's good for us to understand a little bit more, like, what is the goal of IBM Research?
是为了探索某些理论方向以寻找应用可能,还是已有IBM项目推动了这项研究?
Is it to explore certain sort of theoretical threads to see if there's applications, or was there already some IBM project that prompted this?
最初启动时更偏向理论性,后来通过资助金支持逐步推动技术发展。
Well, when it started, it was more theoretical, and it has evolved with funding grants that let us advance the technology.
现在我们正与客户和用户共同设计,将其转化为实际资产。
And now we are actually looking at co-designing what becomes an asset with clients and with the users.
这很关键——就像我说的,想象一下能不解密就直接计算加密数据会带来什么可能。
And this is an important thing because, like I asked, imagine what you could do if you could compute on encrypted data without decrypting it.
因此我们需要与客户深入交流,更好地理解应用场景。
So we need to talk to clients to understand better the use cases.
我们掌握大量用例,但这些都基于现有认知范畴。
We have a plethora of use cases, but those were related to what we know.
对吧?
Right?
所以,如果我提供一项技术,让你能做到以前无法实现的事情,但只有你自己知道那些是之前做不到的。
So suppose I offered you a technology that allowed you to do things that you were not able to do before; only you know what you were not able to do before.
对吧?
Right?
因此这就是这种共同设计的过程。
So that's why it's this co-design.
IBM研究院通常会在非常前沿的领域开展研究。
So IBM Research typically will do research in very advanced areas.
我们还会尝试推动这些新兴技术落地,验证这些理论进展是否真能应用于日常生活。
We will also try to bring these emerging technologies forward and validate whether those theoretical advances are actually useful for everyday usage.
所以这是个非常非常长线的布局。
So it's a very long game.
确实是个长期布局,但当你到达那个转折点时,就会变成一场速战速决的游戏。
It is a long game, but it's the kind of thing where, when you get to that inflection point, it becomes a very short game and you have to play it quickly.
因为当它到达转折点时,对所有人来说都是转折点。
Because when it gets to the inflection point, it gets to the inflection point for everybody.
对吧?
Right?
所有从事同态加密的人也都处于这个技术转折点。
Everybody out there that's doing homomorphic encryption is also at this inflection point of the technology.
我们之前邀请过Luca DeFeo,大概是几个月前,也可能是几周前或几个月前。
So we had Luca De Feo on, I guess, a few months ago.
他也是IBM研究院的一员,主要从事同源映射方面的研究。
And he's also somebody who's in IBM Research, working, in his case, on isogenies.
我有点好奇,你会和Luca合作吗?
I'm curious a little bit, like, would you ever be working with Luca?
你们有不同的团队专注于这些不同课题吗?
Do you have different groups that focus on these different topics?
具体是怎么运作的?
How does that work?
我们有专注于不同课题的小组,但大家都会相互协作。
We have different groups that focus on different topics, but we all collaborate.
我们之间经常保持沟通交流。
We all talk to each other.
这不像你独自回到花园后的小屋,闭门造车然后带着新发明出来。
So it's not like you go back to your shed at the back of the garden, do everything, and come back out of it with a new invention.
现在的工作方式不是这样的。
It doesn't work like that anymore.
对吧?
Right?
所以协作才是关键。
So collaboration is the key.
明白了。
Got it.
实际上我们与IBM内部的其他研究人员合作很多,同时也与其他机构的研究人员合作。
And we actually collaborate a lot with other researchers in IBM, but also with researchers in other institutions.
我们的一些竞争对手在算法方面也是我们的合作伙伴。
Some of our competitors are also our collaborators on the algorithmic points.
酷。
Cool.
所以这不像是一个人偷偷做完然后带着新成果出现那样。
So it's not like one person does it hidden away and comes back with something.
现在不是这样运作的。
It doesn't work that way.
这是
It's
这很有趣。
That's interesting.
如今协作要多得多。
A lot more collaboration these days.
这其实是我之前的一个疑问,关于IBM研究院与其他机构,比如英特尔研究院或其他研究团队之间是否存在某种竞争关系。
That was actually a question that I had was about whether or not there was sort of a competitiveness between maybe IBM Research and, I don't know, like Intel Research or these other groups that will be looking into things.
我在想,这里是否有点贝尔实验室或施乐帕克研究中心那样的氛围?
I wonder, does it have any echoes of Bell Labs, or a Xerox PARC kind of vibe?
这些企业研究部门之间的?
These, like, corporate research departments?
嗯,我没有。
Well, I didn't.
贝尔实验室早期开展研究时,我还没参加工作。
I wasn't at work when Bell Labs was doing what they were doing right at the beginning.
对吧?
Right?
不过我读过很多关于他们的资料。
But I read a lot about them.
所以现在的情况和以前有些不同,但差异并不大,对吧?
So it is somewhat different than it was before, but not that much, right?
因为在企业内部,我们在IBM内部进行协作。
Because within the corporations we collaborate internally in IBM.
我们也会发表我们的研究成果,而像英特尔、微软这样的公司,以及学术界和大学等机构,同样会发表他们的工作成果。
We also publish the work that we do, and other people like Intel, Microsoft, and the academics, and the universities, and so on, they also publish their work.
我们共享信息,发表后供人们使用,他们可能会加以改进,然后我们就能看到这些改进。
And we share information: we publish it, other people use it and might enhance it, and then we see those enhancements.
同样地,如果他们提出新想法,我们也会合作。
Similarly, if they come up with some new ideas, we collaborate.
可以说,现在整个行业对合作的态度友好了许多,但同时竞争依然激烈。
So the community now has become a lot more amicable to collaboration, let's put it this way, but we are also fierce competitors.
所以这是一种相当微妙的局面。
And so it's a tricky scenario.
对吧?
Right?
在某些方面我们合作密切,而在另一些方面则竞争激烈。
On certain things we collaborate a lot, and on certain things we compete a lot.
因此我们正在合作制定同态加密的标准。
So we are collaborating on the standards for homomorphic encryption.
这是一个临时组建的联盟。
So it's a consortium which is ad hoc.
所有从事同态加密研究的团队都认为或许我们应该将其标准化,否则大家会各自为政。
Everybody that's doing homomorphic encryption said, oh, perhaps we should standardize this, otherwise everybody's going in different directions.
如果由此诞生产品,人们如何确信它符合特定标准。
And if a product comes out of it, how can people be assured that it follows certain standards?
过去几年我们一直在努力推进这项工作。
So we have been busy for the last few years trying to get there.
标准化确实困难重重,因为各方都有自己的实现方案等等。
It is a hard thing to standardize, because each one had their own implementation and so on.
但现在我们正逐渐就某些方案达成共识,并寻找合适的标准组织加入或提交标准提案。
But now we are converging on certain schemes, and we are looking for which standards body to join or to propose the standards to.
这听起来有点像零知识证明社区的现状,只是阶段稍晚——他们也在努力为电路设计及零知识证明的各种约束系统表达方式制定标准。
Sounds a little bit like the same, maybe slightly later in stage than the zero knowledge proof community, which is also trying to work towards creating standards for things like circuits and how to express these different constraint systems of a zero knowledge proof.
如果每个人都采用自己的方法,那么所有工具都无法互相兼容。
If everyone comes up with their own method, then none of the tools are interoperable.
你无法用一个库生成证明,再用另一个库验证它,诸如此类的操作都无法实现。
You can't, you know, create a proof with one library and verify it with another or anything like that.
如果实现标准化,就能开展更多的跨领域协作。
If it was standardized, then you'd have a lot more cross collaboration.
确实如此。
That's very true.
我们正处于这个阶段。
And we are in that phase.
对吧?
Right?
我们认识到必须制定一个标准。
So we recognize that there needs to be a standard.
作为一个团体,我们正努力达成共识,并确定应在哪个层面实现这种互操作性。
And as a group, we are trying to come up with that and identify at what level that interoperability should occur.
是的。
Yeah.
那么更具体地谈谈全同态加密(FHE),您在这方面的研究是什么?您又是如何对它产生兴趣的?
So focusing in a little bit more on FHEs, what's your research in this and, you know, how did you become interested in it?
嗯,我大约从2014年开始接触这个领域,当时我们约克镇的同事们正在进行大部分关于全同态加密的研究,而我们则在探索加速这项技术的可能性。
Well, I started working with it back in 2014 or so, when our colleagues in Yorktown were doing most of the research around FHE, and we were looking at options to accelerate that.
自那时起,我就一直投入时间研究它。
And since then, I've been devoting time to it.
更近些时候——我说的‘更近’是指2017年左右,大约2017年年中之后。
And more recently, and when I say more recently, I mean mid-2017 or so.
那时我们开始进入所谓的发展新阶段。
Then we started entering what we call a new phase in the development.
因为在2009年,当我们的同事发明全同态加密时,我们开始了所谓的可行性验证阶段。
Because back in 2009, when a colleague of ours invented FHE, we started what we call the plausibility phase.
就是验证这项技术是否真的可行。
Let's check whether this is plausible or not.
对吧?
Right?
而从2012年左右开始,我们进入了改进阶段,这个阶段还行。
And from around 2012 onwards, we started the improvement phase, which is: okay.
它是可行的。
It's plausible.
我们怎样才能让它运行得更快一些?
How can we make it go a little faster?
对吧?
Right?
我们一直在进行这个改进阶段的工作。
And we have been working on this improvement phase.
但在2018年左右,整个行业都进入了可用性阶段,这时我们开始让技术更加健壮,便于普通用户使用。
But back in 2018, across the industry, everybody started the usability phase, which is when you start making the technology more robust and usable by normal users.
你不需要成为数学家或密码学家也能编写。
You don't need to be a mathematician or a cryptographer to write.
你仍然需要那些人的帮助。
You still need the help of those.
对吧?
Right?
我们一直非常专注于标准部分和可维护性。
We've been focused on the standards part very much and the serviceability.
所以现在有很多软件工程工作正在进行。
So there's a lot of software engineering work going right now.
因此我从2014年开始就参与了不同阶段的工作。
And so I've been involved since 2014 in different phases of this.
现在由我领导这个团队。
And now I lead the team.
我们有一个团队,大家共同努力使这项技术更具可用性。
Well, we have a team, and we all work together on making this a more usable technology.
在加入IBM之前你是做什么的?
What were you doing before IBM?
哦,
Oh,
在加入IBM之前,我有过几段不同的人生经历。
before IBM, I had several past lives.
这么说吧。
Let's put it this way.
我们都有过这样的经历。
As have we all.
我是一名物理学家。
I am a physicist.
在加入IBM之前,我主要从事高性能计算和雷达技术方面的工作。
And before IBM, I worked a lot on high performance computing and radar technologies.
哦,真酷。
Oh, cool.
所以我想回到回答问题的重点,一个人能做些什么?
So I want to get to that point of answering the question, what could one do?
但我觉得要回答那个问题,我们首先得弄清楚FHE到底是什么。
But I think to get there, we have to first answer what FHEs actually are.
就像我们之前提到的,节目中也简单讨论过,但我想我们从未真正从基础层面定义过它。
As we said earlier, we've talked about it a little bit on the show before, but we've never really defined it, I think, at any sort of fundamental level.
全同态加密。
So fully homomorphic encryption.
明白吗?
You know?
我理解'全'这个词,也理解'加密'这个词。
I understand the word fully, and I understand the word encryption.
中间那个词是什么意思?它们又是如何关联起来的?
What's the thing in between and how does it link together?
我想从你之前的解释开始,对加密数据进行计算,我觉得这是个很好的总结。
I guess starting from your explanation from before, computing on encrypted data, I think that's a good summary.
嗯。
Mhmm.
但如果我们再深入一层,FHE到底意味着什么?
But if we dig one level deeper, you know, what does FHE mean?
对。
Right.
2009年,当IBM研究员克雷格·金特里发明FHE时,在此之前,所谓的加密数据计算,指的是能否在不解密的情况下对加密数据执行基本数学运算——主要是乘法和加法。
So in 2009, Craig Gentry, an IBM researcher, invented FHE. When you say compute on encrypted data, it means: can you perform the basic mathematical operations, which are basically multiplications and sums, on encrypted data without decrypting it?
在此之前,你只能在同一种加密方案中实现其中一种运算,这在一定程度上限制了实际可操作的范围。
Until then, you could do one or the other within what we call the same scheme, which limited to some extent what you could actually do.
对吧?
Right?
能在同一方案中同时进行加法和乘法运算,就能实现复杂计算,因为你可以把这些运算作为基础构建模块。
Being able to do both sums and multiplications within the same scheme allows you to perform complex computation, because you can use those as your building blocks.
比如机器学习,你能用机器学习语言做什么?
So if you think of machine learning, what is the machine learning language that you can use?
你可以进行加法和乘法运算,然后在这个基础能力之上构建其他所有功能。
You can do sums and multiplications, and now you build everything else on top of that basic capability.
对吧?
Right?
有了这个基础,你可以进行矩阵运算,或者人们所说的多项式求值,通过多项式求值,你可以近似实现各种数学功能,这意味着你能够构建非常复杂的运算。
With that, you can think of matrix computation, and you can think of what people call polynomial evaluation; with polynomial evaluation you can approximate mathematical functions, which basically means that you can build very complex operations.
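As a rough illustration of the polynomial-evaluation point, the sketch below approximates the logistic (sigmoid) function with a low-degree Taylor polynomial, using nothing but additions and multiplications. The function names and the choice of a degree-3 Taylor expansion are assumptions made for this example, not something described in the conversation, and the code runs on plain numbers rather than ciphertexts.

```python
import math


def sigmoid_poly(x):
    """Degree-3 Taylor approximation of the logistic function around 0.

    Only additions and multiplications (by x or by constants) are used,
    which are the operations a homomorphic scheme would provide.
    """
    x2 = x * x
    x3 = x2 * x
    return 0.5 + 0.25 * x + (-1.0 / 48.0) * x3


def sigmoid(x):
    """Reference logistic function (uses exp and division), for comparison only."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(x, sigmoid_poly(x), sigmoid(x))
```

The approximation is only reasonable near zero; a real application would pick the polynomial and its degree to match the input range it expects.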
在这种加密的黑箱中同时具备加法和乘法的能力,这就是‘完全’的部分吗?
Is the ability to do both addition and multiplication in this encrypted kind of black box the "fully" part?
这就是你称之为‘全同态’的原因吗?
Is that why you say fully homomorphic?
因为你能同时进行两种运算,还是说‘完全’另有所指?
Because you can do both at the same time, or does the "fully" mean something else?
‘完全’意味着除了加法乘法外还有其他特性。
The fully means that plus something else.
好的。
Okay.
明白了。
Okay.
所以‘完全’意味着你可以进行加法、乘法运算,再加上
So fully is you can add and multiply plus
再加上能够进行任意深度的计算,意味着你可以几乎无限次地进行加法和乘法运算。
Plus the ability to have arbitrary depth of computation, meaning that you can keep doing additions, multiplications almost forever.
这就是‘完全’部分的含义。
That's the fully part.
酷。
Cool.
部分同态加密只能支持其中一种运算。
Partially homomorphic encryption is either one or the other.
明白了。
Got it.
对吧?
Right?
还有另一种叫做“有限同态加密”(somewhat homomorphic encryption),它只能进行有限次数的加法和乘法运算,超过这个次数后就无法解密或理解数据了。
And there is another one, which is somewhat homomorphic encryption, where you can do just a certain number of multiplications and additions before you can't decrypt anymore, before you can't make sense of the result.
明白了。
Got it.
所以它没有你刚才描述的那种深度,我猜。
So it doesn't have the depth that you just described, I guess.
没错。
That's right.
是的。
Yes.
不过,深度已经足够用了。
Well, but there is enough depth.
我们至今构建的大多数应用并不需要无限深度。
So most of the applications that we have built to date did not require infinite depth.
所谓的无限深度并非真正无限,而是当你运算到一定深度后,可以执行一个叫做'自举'的操作——它本质上会重新加密所有内容到顶层,然后你可以继续运算,再次重新加密到顶层,如此循环。
And the depth is not literally infinite: you compute up to a certain depth, and then you perform an operation called bootstrapping, which basically recrypts everything back to the top; then you can keep computing, recrypt to the top again, and keep going.
这就是完全同态加密的部分。
So this is the "fully" part of fully homomorphic encryption.
好的。
Okay.
这就是完全。
That's the fully.
是的。
Yes.
那么,如果我理解正确的话,这本质上意味着你可以对任意类型的计算进行操作。
So I think that if I understand it correctly, that essentially means you can do it on any sort of arbitrary computation.
所以,如果我完全拥有它,我基本上可以表达任何程序来对这些数据进行计算,对吧?
So I can essentially express any arbitrary program to compute over this data, right, if I have it fully.
而如果我在深度上受限,那么我的程序能做的事情也会受限。
Whereas if I'm limited in depth, I'm limited in what my program can do.
是这样吗?
Is that right?
你的程序在算法上是受限的。
You are limited on the algorithm of your program.
是的。
Yeah.
对。
Yes.
比如操作步骤的数量。
Like the number of operational steps.
但我们可以将其与协议结合,如果有两方进行计算,服务器端在云端计算,客户端参与。
But we can combine that with protocols if you have two parties computing: the server computing in the cloud, and the client.
你可以将计算推进到某一阶段后传回给你,解密后重新加密,再传回去继续计算。
You could potentially perform the computation up to a certain level, send it back to you, you decrypt, recrypt, and send it back to be computed again.
这都涉及到我们之前略过的计算特性之一——每次加密信息时,本质上都是将其隐藏在海量干扰和噪声中。
So it all has to do with one of the characteristics of that computation that we kind of skipped: whenever you encrypt a message, you are basically hiding it in a lot of clutter, in a lot of noise.
我们就是这么称呼它的。
That's what we call it.
所以你的信息就藏在这种模糊的噪声里。
And your message is hidden in this fuzzy noise.
每次进行加法运算时,噪声都会累积。
Whenever you perform an addition, that noise adds up.
而每次进行乘法运算时,噪声则会倍增。
But whenever you perform a multiplication that noise multiplies.
对吧?
Right?
所以如果噪声增长过大,你就无法再解密结果了。
And if this noise grows too big, you can't decrypt the result anymore.
实际上你在结果上看不到任何有效信息。
You can't actually see anything on the result.
因此相关技术会随着计算过程逐步降低噪声,但降噪能力有限,最终你必须中止当前的计算。
So there are techniques to reduce the noise as you go along, but there is only so much you can reduce, and then you have to stop the computation that you are doing.
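To make the noise behaviour concrete, here is a toy sketch in Python. It is not the lattice-based construction discussed in the conversation: it is a deliberately insecure, symmetric scheme over the integers in the style of van Dijk, Gentry, Halevi, and Vaikuntanathan, chosen only because it fits in a few lines. The key `SECRET_P`, the fixed noise value `r=10`, and all function names are hypothetical choices for this example; a real scheme would use fresh random noise and far larger parameters.

```python
import random

# Hypothetical odd secret key; far too small to be secure.
SECRET_P = 1_200_001


def encrypt(bit, r=10, p=SECRET_P):
    """Hide a bit under even noise 2*r plus a random multiple of the key p.

    r is fixed here so the demonstration is reproducible; that is insecure.
    """
    q = random.randrange(1, 1_000)
    return p * q + 2 * r + bit


def decrypt(c, p=SECRET_P):
    """Remove the multiple of p, then remove the even noise."""
    return (c % p) % 2


def noise(c, p=SECRET_P):
    """Message plus noise: what is left once the multiple of p is removed."""
    return c % p


def product_of_ones(k):
    """Multiply together k fresh encryptions of the bit 1."""
    c = encrypt(1)
    for _ in range(k - 1):
        c = c * encrypt(1)
    return c


if __name__ == "__main__":
    one, zero = encrypt(1), encrypt(0)
    print("1 + 0 ->", decrypt(one + zero))   # addition of ciphertexts acts as XOR on bits
    print("1 * 0 ->", decrypt(one * zero))   # multiplication acts as AND on bits
    print("noise after add:", noise(encrypt(1) + encrypt(1)))
    print("noise after mul:", noise(encrypt(1) * encrypt(1)))
    for k in (4, 5):
        print(k, "factors decrypt to", decrypt(product_of_ones(k)))
```

With these parameters a fresh encryption of 1 carries noise 21. Adding two of them gives 42, multiplying gives 441, and by the fifth factor the noise (21 to the fifth power) exceeds the key, so the product of five ones no longer decrypts to 1. That is the "noise grew too big" failure in miniature.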
正如之前提到的,某些特定电路结构能支持你完成所需计算。
So there are certain circuits, as was mentioned before, that allow you to perform the computation that you need to do.
例如在机器学习中的逻辑回归,我们无需重新加密或引导操作就能实现。
For instance, logistic regression for machine learning, we can do that without requiring this recryption or this bootstrapping.
因此,在进行引导或重新加密之前,可以执行相当复杂的计算。
So it can be quite complex computation that one can do before you need to bootstrap or to recrypt.
嗯。
Mhmm.
在一些与神经网络相关的场景中,你需要执行引导操作。
And in some of the more neural-network-related scenarios, you need to perform bootstrapping.
这让我想到中间那个词——同态。
So this leads me to the the middle word, homomorphic.
对。
Right.
它的意思就是变形后仍保持相同性质。
It just means that it morphs to something that is the same.
它是属性保持的,不过
It's property preserving, but
确实如此。
It is.
没错。
Exactly.
它保留了什么呢?
What is it preserving?
同态。
Homomorphic.
当运算具有同态特性时,我们暂且先不考虑加密。
When operations have homomorphic characteristics, let's just forget about encryption for a moment.
对吧?
Right?
如果你有两个值,值a和值b,以及一个函数f,你可以分别计算f(a)和f(b),然后通过乘法或加法对这两个结果进行运算。
So if you have two values, a and b, and you have a function f, you can compute f(a) as one thing and f(b) as a separate thing, and then you can combine those two by multiplying them or by adding them together.
如果具有同态性质,就意味着f(a)加f(b)等于f(a+b)。
Now, if this has homomorphic properties, it means that f(a) plus f(b) equals f(a + b).
这就是它的来源。
And that's where it comes from.
所以如果我能以这种方式执行该操作,就意味着我分开执行的操作与合并执行代表的是同一回事。
So if I can perform that operation that way, it means that what I'm doing separately represents the same thing as if it was done together.
因此,如果该函数是加密函数,我可以说值a的加密与值b的加密进行运算后,等于a加b的加密结果。
So if that function is an encryption function, I can say that the encryption of the value a operated with the encryption of the value b is equal to the encryption of a plus b.
对吧?
Right?
而这正是你在加密数据上执行此类计算,最终仍能表示相同结果的方式。
And that is how you perform this computation on encrypted data that represents the same result at the end.
这是否意味着,这种思路为并行处理的概念开辟了可能性?因为数据规模会...等等
Does this open up an opportunity for parallelization, this idea of doing things at the same time, because the sizes would be... wait.
实际上,可能我完全理解错了。
Actually, maybe this is completely off.
但输出结果的规模是否相同?
But are the sizes of the outcomes the same?
你所说的a加b,是指结果的规模还是其他什么?
Is that when you say a plus b, or is it more like the outcome?
不。
No.
是结果。
The outcome.
实际值。
The actual value.
结果。
Outcome.
所以如果我有一个加密的1加上加密的2,就等于加密的1加2。
So if I have an encryption of one plus encryption of two, that is the same as the encryption of one plus two.
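The "encryption of one plus encryption of two" statement can be tried out with a classic additively homomorphic scheme. The sketch below uses textbook Paillier encryption with tiny primes; it is offered only as an example of a scheme that supports one operation (addition), not as the fully homomorphic scheme under discussion. The parameter values and function names are assumptions for illustration and are not secure.

```python
import math
import random

# Toy parameters: tiny primes chosen for readability, not security.
P, Q = 61, 53
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse, available in Python 3.8+


def encrypt(m):
    """Encrypt an integer 0 <= m < N with fresh randomness r."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover the plaintext from a ciphertext using the secret LAM and MU."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Combine two ciphertexts so the result decrypts to the sum of the plaintexts."""
    return (c1 * c2) % N2


if __name__ == "__main__":
    c = add_encrypted(encrypt(1), encrypt(2))
    print(decrypt(c))  # decrypts to 1 + 2
```

Because encryption is randomized, the combined ciphertext is not bit-for-bit equal to a fresh encryption of 3; what holds is that it decrypts to the same value.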
好的。
Okay.
对。
Right.
这是其中一部分。
This is this is one part.
明白了。
Got it.
关于并行化,这是某些实现具备的另一特性,它允许你将数据打包进一个密文中。
Now regarding parallelization, this is another property that certain implementations have which allows you to pack data in a ciphertext.
那么什么是密文呢?
So what is a ciphertext?
密文是指获取一个值并对其进行加密,结果就是密文——即你的消息或数值被隐藏在这层噪声中。
A ciphertext is what you get when you take a value and encrypt it: the result is your message, your value, hidden in this noise.
这就是你的密文。
So this is your ciphertext.
由于密文的构建方式,你可以认为一个密文内部可以包含多个被编码的元素。
Because of the way the ciphertext is constructed, you could consider that a ciphertext can have several elements encoded inside.
对吧?
Right?
如果在密文中编码了多个元素,那么当你对两个密文进行操作时,这些操作会按元素逐个执行。
And if you have several elements encoded inside the ciphertexts, whenever you do an operation between two ciphertexts, you do them element wise.
因此我可以将数值1、10和15全部放入同一个密文中。
So I can put the value one, the value 10, the value 15, all in the same ciphertext.
然后所有密文中对应的元素位置都存储着其他数值。
And then in other ciphertexts I have other values in the corresponding elements.
无论我是进行乘法还是加法运算,这些操作都是按元素逐个进行的。
Whenever I multiply or whenever I add two ciphertexts, those operations happen element-wise.
即第一个密文的第一个元素与第二个密文的第一个元素进行运算。
So the first element of the first ciphertext with the first element of the second ciphertext.
这确实能实现并行化——只要你能将问题转化为向量解法。从数学角度或者说逻辑上,它具有一种特性(我不确定现在还有多少人记得学校教的这个概念),称为单指令多数据流。
And this does allow for parallelization, as long as you can turn your problem into a vector solution. It has a property, and I'm not sure how many people remember it from school, which is called single instruction, multiple data.
通过单条指令对多个数据进行操作,这就是其实现原理。
You have multiple data being operated on by a single instruction, and that's how you get it.
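The slot-wise behaviour described here can be modelled with ordinary lists. In the sketch below a "packed ciphertext" is just a Python list with a fixed number of slots and no encryption at all; the slot count and the function names are assumptions for illustration. It only shows the data layout: one add or multiply call touches every slot at once.

```python
def pack(values, slots=8):
    """Place values into a fixed number of slots, padding unused slots with zeros."""
    if len(values) > slots:
        raise ValueError("too many values for one packed ciphertext")
    return list(values) + [0] * (slots - len(values))


def add_packed(x, y):
    """One 'instruction' that adds every pair of corresponding slots."""
    return [a + b for a, b in zip(x, y)]


def mul_packed(x, y):
    """One 'instruction' that multiplies every pair of corresponding slots."""
    return [a * b for a, b in zip(x, y)]


if __name__ == "__main__":
    first = pack([1, 10, 15])
    second = pack([2, 3, 4])
    print(add_packed(first, second))
    print(mul_packed(first, second))
```

Recasting a problem so that independent values sit in matching slots is the "turn your problem into a vector solution" step.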
不知道现在学校还教不教这个了。
I don't know if they teach that in schools anymore.
但是
But
是的。
Yes.
我不确定学校是否还教这个,但SIMD至少在工程领域,当你想优化某些东西时,就会想到它。
I don't know if they teach it in schools, but SIMD, at least in engineering, is what you reach for when you think, oh, I want to optimize something.
让我看看是否有某些SIMD指令可以利用,让一切运行得更快。
Let me see if there's some SIMD instruction somewhere that I can exploit to make everything faster.
完全一样。
That's exactly the same.
我们非常擅长利用这一点,将问题转化为输入数据,使其能以SIMD方式解决。
And we got very good at exploiting that, at taking a problem and converting the input data in a way that it can be solved in a SIMD way.
全同态加密另一个有趣的方面是,如果所有中间值都被加密,代码中的条件变量也被加密,如何进行分支判断?
The other interesting aspect of fully homomorphic encryption is this: if everything is encrypted, all the intermediate values and all the conditional variables in your code, how do you branch?
因为你无法进行测试。
Because you can't test.
你无法测试变量是真是假来做出决策。
You can't test whether a variable is true or false to make a decision.
所以你最终不得不执行代码中的所有部分。
So you end up having to execute all the paths in your code.
对。
Right.
这基本上意味着你开始设计代码时减少那些分支,因为它们代价高昂
Which basically means that you start designing the code with fewer of those branches, because those cost
非常大。
a lot.
对吧?
Right?
并尝试让它们更像电路一样运作。
And try to make them more like a circuit.
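One common way to write code "like a circuit" is to compute both sides of a branch and blend them arithmetically with the condition bit. The sketch below shows that pattern on plain integers; the helper names and the discount example are hypothetical, and under encryption `cond` would be an encrypted 0 or 1 that the program never gets to inspect.

```python
def select(cond, if_true, if_false):
    """Return if_true when cond is 1 and if_false when cond is 0.

    Uses only multiplication, addition, and subtraction, so no comparison
    or jump depends on the value of cond.
    """
    return cond * if_true + (1 - cond) * if_false


def price_in_cents(amount, is_member):
    """Both 'branches' are always evaluated; the condition bit picks the result."""
    member_price = amount * 90      # hypothetical 10% discount, in cents
    regular_price = amount * 100
    return select(is_member, member_price, regular_price)


if __name__ == "__main__":
    print(price_in_cents(5, 1))
    print(price_in_cents(5, 0))
```

Since every branch is evaluated, the cost is the sum of all paths, which is why nested conditionals get expensive quickly.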
是的。
Yeah.
这很有道理。
That makes sense.
通常来说,尤其如果你不够谨慎,程序中很容易出现路径爆炸的情况,确实如此。
And usually, especially if you're not very careful, you can very easily have an explosion of paths in a program where it's Yes.
我是说,确实如此。
I mean, it Yes.
这有点像是指数级增长。
It's sort of increases exponentially.
哦,不。
Oh, no.
是啊。
Yeah.
哦,对。
Oh, yeah.
我们我们经历过这种情况,因为这是我们同态加密研究资助项目的一部分。
We have been through that, because it's part of the grants that we have for working on homomorphic encryption.
我们有一项来自美国政府的项目,旨在构建一个安全计算工具链,其中同态加密是其中之一。
We have one from the US government to construct a toolchain for secure computation, homomorphic encryption being one of the techniques.
所以我们的构想是,程序员可以在更高层次上通过常规编码方式加上一些注解,来指定应用程序的安全约束条件。
So the idea is at the higher level the programmer can specify the security constraints of the application and code in a normal way with some annotations.
然后这个工具链会处理编译器版本、中间语言等所有环节,将其转换为同态加密代码。
And then this toolchain will handle the compiler, the intermediate language, and everything else needed to convert it down to homomorphic encryption code.
作为该项目的一部分,我们正在设置他们所称的'挑战'。
So as part of this program, we are given what they call challenges.
挑战其实就是具体应用场景。
Challenges are use cases.
对吧?
Right?
就是给定一个应用场景,你必须用现有的技术、算法和理论,以最佳方式来解决它。
A given use case that you have to solve the best way you can with the technology that you have, with the algorithm that you have, with the theory that you have.
而目前有些解决方案的复杂度确实是指数级的。
And some of the solutions are really exponential today.
对吧?
Right?
但我们知道如何推进此事,并利用我们讨论过的一些技术,比如SIMD操作或信息打包。
But we know how to move this forward and take advantage of some of the techniques that we discussed, like the SIMD operations or packing of more information.
但我们无法在程序员层面实现这一点。
But we can't do that at the programmer level.
我们希望程序员编写任何程序,然后由编译器负责进行修正。
We expect the programmer will write any program, and then the compiler is the one that has to fix it up.
如今是由密码学家帮助应用开发者理解。
Today it is a cryptographer helping the application writer to understand.
所以我们必须理解业务需求是什么?
So we have to understand what's the business case?
你的
What is your
是的。
Yeah.
算法?
Algorithm?
好的。
Okay.
那么如果我们以这种形式处理数据,是否能让同态加密更容易实现性能提升?
So what if we manipulate the data into this form, so that it is easier for the homomorphic encryption to achieve performance?
在我们继续之前——可能这个问题解释起来太复杂了——但我很好奇,当初的突破点究竟是什么?
Before we move on, and maybe this gets too complicated to explain, I don't know, but I'm curious, what was the innovation?
具体来说,2009年究竟采取了什么关键步骤,才真正实现了将加法与乘法运算统一在同一个框架下?
Like, what was the step that, you know, was taken in 2009 that actually managed to achieve bringing together addition and multiplication under one roof here.
没错。
Right.
这并非某个人突然灵光一现就说'啊,我要解决这个问题'那么简单。
So it wasn't like someone woke up and just said, oh, I'm going to solve this.
对吧?
Right?
这是建立在大量先前研究基础上的,那些研究当时正在进行。
So this was building on a lot of research that had happened before and that was happening at the time.
而最后一步是要理解如何利用格结构,使其在同一方案中同时具备乘法和加法的特性。
And the final step was to understand how to work with lattices that would exhibit the property for both multiplications and additions within the same scheme.
没错,这才是最棘手的部分。
Right, so that was the tricky part.
但当时全球有很多研究人员在合作,克雷格实际上在他的博士论文中借鉴了其他人的研究,只是前人没能走得更远。
But a lot of researchers around the world were collaborating, and Craig's PhD thesis was actually based on research that other people had done but that had not gotten far enough.
对吧?
Right?
所以他能够整合不同成果,最终提出了可行的解决方案。
So he was able to combine different things and come up with a solution that would work.
自那以后,出现了许多改进方案和不同方法来实现相同目标。
And since then, there have been a lot of improvements and different solutions that accomplish the same.
我们能不能具体聊聊格...这个该怎么念?
Can we actually talk about what lattice... how do you say this?
是类似基于格的密码学吗?
Is it like lattice based cryptography?
你是这么表述的吗
Is that the way you word
那个?
that?
对。
Yeah.
我们可能需要几期播客才能讲清楚这个。
We're gonna need a few podcasts just to go through that.
这确实是个好消息。
That's actually good to hear.
也许我们应该
Maybe we should
这么做。
do that.
绝对应该这么做。
Should absolutely do that.
我们之前讨论过基于格的加密技术,尤其是一些量子安全算法依赖于这方面的原理。
We have touched on lattice based crypto before, especially there are some, you know, quantum secure algorithms that depend on some of this.
对吧?
Right?
这是否意味着
Does that mean that
是的。
Yes.
你知道,全同态加密是量子安全的吗?
You know, the FHE is quantum secure?
确实是。
It is.
这意味着全同态加密——我喜欢称之为量子抗性的。
That means that FHE, I like to say, is quantum resistant.
是的。
Yeah.
这是最恰当的说法。
That's the appropriate way to put it.
就我们目前所知,没有任何量子算法能够以低于普通计算机的复杂度破解同态加密。
To the best of our knowledge today, there is no quantum algorithm that can break homomorphic encryption with any less complexity than a normal computer.
因此我认为它是量子安全或者说抗量子的。
So it is quantum safe or quantum resistant, I would say.
我想我们已经讲完这三个词了。
So I think we've covered the three words.
对吧?
Right?
全同态加密。
Fully homomorphic encryption.
关于最后一个词'加密',我们还有什么需要补充的吗?
Is there anything else we have to say about the last one, encryption?
我是说,它基于格密码这个事实对我来说是个新闻。
I mean, the fact that it is lattice based is news to me.
这很有趣。
That was interesting.
是的。
Yes.
没错。
Yes.
它是基于格密码的。
It is lattice based.
而且它还是个公私钥方案。
And also it's a public/private key scheme.
你可以用同一个密钥进行加密和部分计算操作,但只有持有私钥才能解密。
So you encrypt with one key that you also use for certain of the computations, but you can only decrypt if you have the secret key.
我想提出一个使用场景,如果我拥有完全同态加密的无限使用权,我会这样应用它。
I wanna throw out a use case, something that I imagine I would use if I had unlimited access to fully homomorphic encryption.
然后你可以告诉我现实中我们离这个目标有多接近。
And then you can tell me how close we are to that in reality.
好的。
Okay.
而且这个应用场景现在非常相关,因为基本上目前不存在具备端到端加密的视频会议软件。
And the use case is very relevant right now: there basically doesn't exist any video conferencing software that has end-to-end encryption.
原因是服务器需要处理,你看,每个人都在发送他们的视频流。
And the reason is that the server so, you know, every person is sending their video feed.
如果用点对点方式实现,你需要同时接收可能10个视频流,而只发送1个。
And if you did it in a P2P way, then you would be receiving maybe 10 video feeds and you'd be sending one.
但每秒接收10个视频流,约30兆比特的流量,对任何人的网络连接来说都太大了。
But receiving 10 video feeds, something like 30 megabits per second, is too much for anyone's connection.
所以实际做法是先发送到中央服务器,由服务器将这些视频流复用合并后再发回一个流。
So what actually is done is you send it to a central server, that central server multiplexes together all these video streams and sends you one.
因此无论会议室有多少人,作为参与者你始终只发送和接收一个视频流。
So regardless of how many people are in a room, you know, you as a participant only send and receive one stream.
服务器之所以能将所有这些不同的流合并成一个,是因为它们没有被加密。
And the server can multiplex all these different streams into one because they're not encrypted.
因此视频会议无法实现端到端加密,因为它无法完成这个基本操作。
And so you can't have end-to-end encryption in video conferencing because, you know, the server can't perform this basic action.
人们讨论过各种解决方案,但一个非常巧妙的方法是:我向服务器发送加密流,服务器能接收这10个不同的加密流,计算它们的合并版本并发送出去,然后参与者可以解密。
So people talk about various ways of solving this, but one very neat way of solving it would be if I can send the server an encrypted stream, and then the server can take all these 10 different encrypted streams and compute a multiplexed version of that and send it out, and then the participants can decrypt it.
如果不需要同态加密所需的大密文量,这确实是个有趣的用例。
That would be an interesting use case if it wasn't for the size of the ciphertext required for homomorphic encryption.
对吧?
Right?
因为密文的体积非常庞大。
Because the size of the ciphertext is very large.
是的。
Yeah.
而且使用不同的密钥进行加密和计算虽然可行,但目前还处于研究阶段。
And having different keys for encryption and computation is possible, but it is something that's on the research side right now.
所以我们有一些关于如何实现这一点的想法。
So we have ideas on how that can be done.
比如我用我的密钥加密,你用你的密钥加密,安娜用她的密钥加密,然后我们将所有内容复用在一起,再发送回去解密。
Like I encrypt with my key, you encrypt with your key, and Anna encrypts with her key, and then we multiplex everything together, and then we send it back for decryption.
问题在于你会使用哪个密钥进行解密。
The question is which key you would use for decrypting.
目前有一些方法可以实现这一点,其中一些还处于非常实验性的阶段。
So there are ways, some of them very experimental at the moment, that allow you to do that.
但在同态加密方面,这将涉及密文处理。
But on the homomorphic encryption side, the issue would be the ciphertext.
如果我们能以某种方式编码视频,使其能在密文中表示,那将导致非常大的数据量,再加上计算需要一定时间来完成。
If we are able to encode the video in a way that we can represent in the ciphertext, it would be very large, plus the computation would take some time to perform.
这种复用操作将无法
That multiplexing operation wouldn't be
无法实现实时处理。
It wouldn't be real time.
不会那么简单。
Wouldn't be as simple.
这么说吧。
Let's put it this way.
同态加密适用于可以接受一定延迟的应用场景。
Homomorphic encryption is good for applications where you can accept some delay.
目前它对实时计算效率不高,因为加解密过程耗时较长。
Today it's not efficient for real-time computation because the encryption and decryption take a long time.
我们一直在研究加速方案,探索如何利用硬件加速。
We have been working on acceleration and we have been looking at how to use hardware acceleration.
即便是硬件加速的目标,我们也只能将某些基准用例的同态计算开销降低到10倍。
And even the goal for hardware acceleration is that we can bring the overhead of homomorphic computation for certain benchmark use cases down to 10 times.
嗯。
Mhmm.
也就是说,进行同态加密计算所需时间是非加密计算的10倍。
Meaning, it takes 10 times longer to perform computation homomorphically than it takes to perform it without encryption.
您能说说这些基准应用具体是什么吗?它真正擅长哪些方面?
What are those benchmark applications and and sort of what is it really good at, would you say?
我们获得的基准应用主要与机器学习场景相关,无论是基于回归的机器学习还是卷积神经网络机器学习场景,看我们能否在最多10倍开销下实现。
So the benchmark applications that we have been given are related to machine learning scenarios, either regression-based machine learning or convolutional neural network scenarios, and whether we can do that with at most 10 times the overhead.
所以这类似于在加密数据上训练模型?
So this would be something like training a model on encrypted data?
在加密数据上训练模型或进行推理,也就是实际预测。
Training a model on encrypted data or doing the inference, doing the actual prediction.
使用基于回归的算法对加密数据进行预测,在给定变量数量(我们案例中是16个)的情况下,我们已将其开销降至50倍。
So doing the prediction on encrypted data using regression based algorithms with a given number of variables, 16 in our case, we got it down to 50 times.
这相当于一个数量级。
So it's one order of magnitude.
不是上千倍。
It's not a thousand.
所以我得到了50倍的开销。
So I got that to 50 times the overhead.
现在进行训练,耗时会更长,因为即便没有加密,训练过程本身也比单纯推理耗时更多。
Now, doing the training takes even longer, because training, even without the encryption, takes longer than just the inference.
但训练模型的能力至关重要。
But the ability to train a model is important.
另一个成功的实验案例是我们与某银行客户合作完成的,即用加密的新数据对现有模型进行再训练。
The other experiment, a successful experiment that we did as part of the work with one of our banking clients, was to retrain an existing model with new data that's encrypted.
因为使用回归算法或机器学习算法的人常会遇到这种情况:随着数据变化,模型会逐渐失去原有的稳定性。
Because this is something that most people who use regression-based or machine-learning-based algorithms run into: over time, as your data changes, your model loses its stability.
它不再像从前那样稳定了。
It's not as stable as it was before.
所以当你获得用于预测的新数据或新趋势时,能否以加密形式接收这些数据并重新训练模型?
So if you're getting new data or new trends relevant to the prediction that you are making, can you take that new data in an encrypted form and retrain your model?
没错。
Right.
这正是我们证明可以实现的一个功能。
So this is one thing that we showed could be done.
我们发表了一篇相关论文,并作为概念验证进行了实现。
We published a paper on that and we implemented it as a proof of concept.
因此,我们的下一步是将现有成果泛化,使其成为同态加密之上可执行的高级功能之一。
So our next steps will be to generalize what we did and offer that as one of these higher-level functions that you can do over homomorphic encryption.
你不仅限于做乘法和加法,实际上可以调用一个能实现更具体功能的函数。
You don't need to do just multiplications and additions; you can actually call a function that will do something more tangible.
难点在于哪些因素导致了大量额外时间的消耗?
What is the tricky bit? What causes a lot of the extra time?
是因为你提到的视频案例中数据量的问题吗?是数据规模还是计算复杂度,或者两者兼有?
Is it because, with the video example you mentioned, of the size of the data? So is it the size of the data or the complexity of the computation, or maybe both?
哪个因素会让情况变得更糟?
Like, which one makes the situation worse?
如果我用两个小数相乘,会比用两个大数相乘慢很多吗?
If I had two small numbers and multiply them together, would it be way slower to have two large numbers and multiply them together?
还是说连续进行两次乘法运算会更慢?
Or would it be way slower to, like, multiply them twice?
你明白我的意思吗?
You know what I mean?
对。
Right.
这更多与数据量有关。
It's it's more related to the volume.
让我们暂时从视频场景中抽象出来。
So let's abstract from the video scenario for a moment.
加密。
Encryption.
当我加密某物时,正如我所说,我实际上是把我的信息、我的秘密藏在大量噪声之中。
So when I encrypt something, basically, as I said, I'm hiding my message, my secret, in a lot of noise.
其表现形式是通过构建一个多项式,对吧?
The way this is represented is by constructing a polynomial, right?
密文由两个多项式表示,而不仅仅是一个。
The ciphertext is represented by two polynomials, not only one.
要实现所需的安全性,这个多项式的阶数需要达到6万甚至更高。
And the order of this polynomial, to achieve the security that is needed, is of the order of 60,000 or more.
所以我们可以根据安全需求适当降低,比如3.2万左右。
Depending on the security level, it can be a little less, like 32,000 or so.
但这就是多项式的阶数范围。
But that's the order of the polynomial.
而这个多项式的系数可以有400、600甚至更多比特位。
Now the coefficient of that polynomial can have 400, 600 or more bits.
对吧?
Right?
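As a rough back-of-the-envelope check on why ciphertexts get so large, the figures just quoted can be multiplied out. This is only a sketch: the degree 32,768 (a power of two near the "32,000 or so" mentioned) and the 600-bit coefficient width are assumptions picked from the ranges in the conversation, not the parameters of any particular library.

```python
# Rough ciphertext-size estimate from the figures quoted in the conversation.
# All three constants are illustrative assumptions, not library parameters.
POLYS_PER_CIPHERTEXT = 2      # "two polynomials, not only one"
DEGREE = 32_768               # assumed power of two near "32,000 or so"
COEFF_BITS = 600              # from "400, 600 or more bits"


def ciphertext_bytes(polys: int, degree: int, coeff_bits: int) -> int:
    """Bytes needed to store every coefficient of every polynomial."""
    return polys * degree * coeff_bits // 8


size = ciphertext_bytes(POLYS_PER_CIPHERTEXT, DEGREE, COEFF_BITS)
print(f"{size} bytes, roughly {size / 2**20:.1f} MiB for a single ciphertext")
```

Under these assumptions a single ciphertext occupies several megabytes, even when it carries only one small value.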
目前没有计算机能进行600比特或1000比特的计算。
So there is no computer today that can natively do 600-bit or 1,000-bit computation.
因此你需要将其分解为更小的表示形式,基本上就是把600比特的系数分解成相当于54或64比特的系数。
So you need to break that into a smaller representation, which basically takes those 600-bit coefficients and breaks them into the equivalent of 54- or 64-bit coefficients.
但这样最终会得到更多的系数项。
But then you end up with a lot more of those.
对吧?
Right?
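One common way to break a wide coefficient into machine-word pieces is a residue number system based on the Chinese Remainder Theorem. The sketch below is a toy illustration of that idea, not a description of how any specific library does it: the helper names, the greedy choice of moduli, and the sample values are all assumptions made for the example.

```python
from math import gcd, prod


def coprime_moduli(count: int, start: int = 2**62 - 1) -> list[int]:
    """Greedily pick `count` pairwise-coprime odd moduli below 2**62."""
    moduli: list[int] = []
    candidate = start
    while len(moduli) < count:
        if all(gcd(candidate, m) == 1 for m in moduli):
            moduli.append(candidate)
        candidate -= 2
    return moduli


def to_rns(value: int, moduli: list[int]) -> list[int]:
    """Split one wide integer into small residues, one per modulus."""
    return [value % m for m in moduli]


def from_rns(residues: list[int], moduli: list[int]) -> int:
    """Recombine residues with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)
    return total % big_m


MODULI = coprime_moduli(10)   # ten 62-bit words cover more than 600 bits
BIG_Q = prod(MODULI)
a = 3**370 % BIG_Q            # stand-ins for two wide coefficients
b = 7**210 % BIG_Q

# Multiply residue by residue: every stored residue stays below 2**62, and
# each intermediate product fits in 128 bits.
product_rns = [
    (x * y) % m
    for x, y, m in zip(to_rns(a, MODULI), to_rns(b, MODULI), MODULI)
]
assert from_rns(product_rns, MODULI) == (a * b) % BIG_Q
```

The price is the one mentioned next in the conversation: one wide coefficient becomes ten small ones, so there are many more numbers to process.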
而在底层计算这些运算的方法,实际上是通过执行快速傅里叶变换来实现的。
And the way you compute these operations at the lower level is actually by performing fast Fourier transforms.
所以是FFT。
So FFTs.
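In lattice schemes the transform is usually a number-theoretic transform, that is, an FFT over integers modulo a prime. The toy below shows the principle with a deliberately tiny modulus. It is a sketch under simplifying assumptions: the transform is the naive quadratic version rather than a fast butterfly, the constants 17, 8 and 9 are chosen only for illustration, and the product wraps modulo x^8 - 1, whereas real schemes typically work modulo x^n + 1.

```python
Q = 17      # toy prime modulus; real schemes use far larger primes
N = 8       # transform length; must divide Q - 1
OMEGA = 9   # primitive N-th root of unity mod Q (9**4 % 17 == 16)


def ntt(values, root):
    """Evaluate the polynomial at the powers of `root` (naive O(N^2) form)."""
    return [
        sum(v * pow(root, i * j, Q) for j, v in enumerate(values)) % Q
        for i in range(N)
    ]


def intt(values):
    """Inverse transform: use the inverse root, then divide by N mod Q."""
    inv_n = pow(N, -1, Q)
    inv_root = pow(OMEGA, -1, Q)
    return [(x * inv_n) % Q for x in ntt(values, inv_root)]


def multiply_via_ntt(a, b):
    """Transform, multiply pointwise, transform back."""
    fa, fb = ntt(a, OMEGA), ntt(b, OMEGA)
    return intt([(x * y) % Q for x, y in zip(fa, fb)])


def multiply_schoolbook(a, b):
    """Reference cyclic polynomial product, coefficient by coefficient."""
    out = [0] * N
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[(i + j) % N] = (out[(i + j) % N] + x * y) % Q
    return out


a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
assert multiply_via_ntt(a, b) == multiply_schoolbook(a, b)
```

After the transform, multiplying two polynomials is just a pointwise product, which is what makes the fast version of this transform attractive for polynomials with tens of thousands of coefficients.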
可以再回来。
Can back again.
就是这样。
There you go.
所以你可以看到,即使我只有一个数字,比如数字5,或者说数字1,它也只有几个比特。
So you can see that even if I have one number, let's say the number five, or the number one, that's only a few bits.
当我需要在密文中表示这些时,最终会得到一个庞大的密文来存放值1。
When I need to represent those in this ciphertext, I end up with this massive ciphertext to put the value one.
如果我利用之前提到的打包技术,就可以在其中放入多个值,从而能够同态地执行运算。
If I exploit that packing that I mentioned before, I can put multiple values in there so the operations can be performed homomorphically.
这就是计算发生的地方,对吧?
And that's where the computation goes, right?
所以这就是额外开销产生的地方。
So that's where the additional overhead happens.
而降低这种计算成本的关键,在于审视具体用例,看看我能在多大程度上利用密文所具备的这种SIMD特性。
And the trick in making this computation not so expensive is actually looking at the use case and seeing how much I can exploit that SIMD characteristic that my ciphertext has.
没错,这种利用程度越高越好,因为计算1个元素和16,000个元素的乘法成本是相同的。
Right, the more I can do that the better because the cost of me doing a multiplication of one element or 16,000 elements is the same.
对吧?
Right?
在我们之前的预测场景中,使用的批量预测大小就是16,000。
So in the prediction scenario that we had, the batch size that we were using for predictions was 16,000.
因此我能在十秒内完成16,000次预测——在256位安全级别下仅需不到十秒。
So I could do 16,000 predictions in about ten seconds, just under ten seconds, for a security level of 256 bits.
如果不利用这种能力,单次预测同样需要十秒。
To do one prediction takes the same ten seconds if I don't exploit that capability.
但这同时也意味着,如果我有另一个线程可以运行单独的预测任务,我就能在同样的十秒内完成另一批16,000次预测。
But that also means that if I had another thread where I can run a separate prediction, I can do another batch of 16,000 within the same ten seconds.
所以数量会从16,000翻倍到32,000,再到64,000,而总时间仍是十秒。
So it goes 16,000, 32,000, 64,000 within the same ten seconds.
因此人们常常试图用均摊成本来讨论同态加密的性能表现。
So quite often people try to talk about the performance of homomorphic encryption in terms of the amortized costs.
我不太认同这种方式,因为除非你进行批量处理,否则根本谈不上什么成本均摊。
I don't like that too much because unless you are doing a batch, you are not gonna amortize anything.
对吧?
Right?
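The point about amortization can be made concrete with the numbers from the prediction example. This is a minimal sketch that assumes a flat ten seconds per batch (the figure quoted was "just under ten seconds") and 16,000 slots; the function name is made up for the illustration.

```python
BATCH_SECONDS = 10.0   # "just under ten seconds" per batch, rounded up
SLOTS = 16_000         # predictions packed into one batch


def seconds_per_prediction(predictions_needed: int) -> float:
    """Cost per useful prediction when a single batch is evaluated."""
    if not 1 <= predictions_needed <= SLOTS:
        raise ValueError("one batch holds between 1 and SLOTS predictions")
    return BATCH_SECONDS / predictions_needed


for used in (1, 100, SLOTS):
    cost = seconds_per_prediction(used)
    print(f"{used:>6} predictions -> {cost:.6f} s each")
```

The batch costs the same whether one slot or all of them carry real work, so the sub-millisecond amortized figure only applies when the batch is actually full.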
嗯...
So Mhmm.
我
I
这里有个非常基础的后续问题:乘法和加法在耗时上是否存在差异?
have sort of a very basic follow-up question here: is there a difference in terms of the timing of multiplication and addition?
是的。
Yes.
比如,如果你用的是部分同态加密或其他类型的加密方案,我记不清具体是哪些了。
Like, so if you had this partial homomorphic encryption or one of the other ones, I forget what they were.
某种程度上是。
Somewhat.
部分同态加密,然后某种程度上是。
Partial and then Somewhat.
对。
Yes.
如果只进行加法运算,速度会更快吗?
If you were doing just addition, would that be faster?
即使使用完全同态加密实现,如果只进行加法运算,速度也比进行乘法运算要快。
Even using a fully homomorphic encryption implementation, if I'm doing only additions, it's faster than if I'm doing multiplications.
乘法运算的成本更高。
Multiplications are costlier.
好的。
Okay.
但乘法几乎无处不在,所以当你进行推理或预测部分时,你需要做一次矩阵乘法和一次多项式求值。
But multiplications are used in pretty much everything so when you're doing the inference part or the prediction part you do one matrix multiplication plus one polynomial evaluation.
所以你不能,但某些类型的聚合操作——这其实是个重要的问题——比如我在处理电子表格中的列数据时,如果只是进行求和、计算标准差或平均值这类聚合运算。
So you can't avoid them entirely. But this is actually an important question that you ask, because with certain types of operations, let's say I have a spreadsheet with column values and I'm only doing aggregation, which is basically summing those values and computing standard deviations or averages and so on.
这可以高效完成,因为你只需要将所有列相加,最后计算平均值或标准差(后者涉及平方运算)。
That can be done quite efficiently because now the only thing you are doing is all the columns are going to be added and at the end you're going to compute an average or you're going to compute a standard deviation which involves a squaring.
根据你想隐藏(即保护)的内容,可以采用混合方案——比如若不在意数据库条目数量,就不需要对除法进行同态加密。
Depending on what you want to hide, and when I say hide I mean what you want to protect, you can use some hybrid schemes: I don't need to do the division homomorphically if I'm not worried about revealing how many entries I have in the database.
也就是我的电子表格中有多少行数据。
So how many how many rows I had in my in my spreadsheet.
对。
Yeah.
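The aggregation scenario can be sketched with a toy additively homomorphic scheme. Note the swap: the code below uses textbook Paillier, which supports only additions, rather than a lattice-based FHE scheme, and its primes are far too small to be secure. Because Paillier cannot square a ciphertext, this sketch has the data owner upload an encryption of each value and of its square, which is a workaround specific to this example. The row count stays in the clear and the division happens after decryption, in the spirit of the hybrid approach described above.

```python
import math
import secrets

# Two Mersenne primes: far too small for real security, chosen so the demo
# runs instantly. Every name here is illustrative.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
N_SQUARED = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P - 1, Q - 1)
MU = pow(LAMBDA, -1, N)                                # inverse of LAMBDA mod N


def encrypt(message: int) -> int:
    """Textbook Paillier encryption with generator N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + message * N) * pow(r, N, N_SQUARED) % N_SQUARED


def decrypt(ciphertext: int) -> int:
    u = pow(ciphertext, LAMBDA, N_SQUARED)
    return (u - 1) // N * MU % N


def add_encrypted(ciphertexts: list[int]) -> int:
    """Multiplying Paillier ciphertexts adds the plaintexts underneath."""
    total = encrypt(0)
    for c in ciphertexts:
        total = total * c % N_SQUARED
    return total


# Data owner: encrypt one spreadsheet column, plus the squares of its values.
values = [12, 7, 30, 5, 21]
enc_x = [encrypt(v) for v in values]
enc_x2 = [encrypt(v * v) for v in values]

# Untrusted server: sees only ciphertexts and the (non-secret) row count.
enc_sum = add_encrypted(enc_x)
enc_sum_sq = add_encrypted(enc_x2)
count = len(enc_x)

# Data owner again: decrypt two numbers, then divide in the clear.
total = decrypt(enc_sum)
total_sq = decrypt(enc_sum_sq)
mean = total / count
variance = total_sq / count - mean**2
print(f"mean={mean}, standard deviation={math.sqrt(variance):.3f}")
```

With a scheme that also supports multiplication, the squares could be computed on the server instead of being uploaded.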
所以确实存在一些人们可以操作的有趣场景。
So there are some interesting scenarios that people can do.
我第一次深入研究全同态加密构造是在一个学习小组里。
One of the first times that I dug into FHE constructions was in a study club that I did.
实际上是关于Nigel Smart的SPDZ协议,它结合了多方计算和全同态加密。
It was actually about Nigel Smart's SPDZ protocol, which combined MPCs and FHEs.
嗯。
Mhmm.
我当时就在想这种技术与其他方法的结合。
And I wondered about that combination with other techniques.
你看,全同态加密可以像我们看到的那样与多方计算结合使用,但它们是否也能与零知识证明协同使用或以某种方式结合呢?
You know, FHEs can be used in tandem with MPCs, as we've seen, but can they also be used in tandem or somehow together with, like, zero knowledge proofs?
所有这些技术都可以互为补充。
All these technologies can be complementary.
它们可以相互配合。
They can complement each other.
所以全同态加密并非万能的解决方案。
So FHE is not the answer for everything.
对吧?
Right?
所以这个方案虽然不错,但在处理安全问题时,必须分析你具体要保护什么、存在哪些风险,从而运用恰当的技术组合来降低风险。
So it is a nice scheme and so on, but whenever you deal with security you have to analyse what exactly you want to protect and what is at risk, and therefore use the correct combination of technologies to mitigate any risk.
比如当下常见的一个场景是联合营销,多家公司拥有部分重叠的数据集,希望共同进行某种分析。
So let's say one of the scenarios that's quite common these days is called co marketing, when you have different companies that may have data sets that overlap and they want to perform some type of analysis on this data set.
但他们显然需要保护数据中的个人隐私信息,比如姓名等。
But they want obviously to protect the privacy of the entries in there or the names of people or so on.
这时可以采用同态加密来计算所谓的集合交集。
So you could use homomorphic encryption to compute what is called the set intersection.
集合交集只会显示各方在特定用途上的共同数据部分。
The set intersection reveals only what is common to everybody, in terms of whatever you use to match on.
举个病毒爆发前的简单例子:前往某城市的旅客会搭乘特定航班、预订酒店、光顾餐厅等等。
Let's take a simple example from before this virus scenario: people traveling to a given city will be using a given airline, they will be reserving hotels, they're going to be going to restaurants and so on.
而这些数据通常都是分散的。
And quite often this is all disjoint.
对吧?
Right?
所以如果有人想进行联合营销,他们会让航空公司与某个城市的某些餐厅、酒店等沟通,以了解有多少乘客入住特定酒店或在特定餐厅用餐等等。
So if someone wants to do a co marketing, they're going to say the airline is going to talk to some of the restaurants in a given city, to some of the hotels, and so on to understand how many of their passengers stay at a given hotel and eat at a given restaurant and so on.
因为如果你这样做,并且了解他们的消费模式,你就可以推出促销活动,比如如果从我这里购买机票,就可以在这家酒店享受折扣或在这家餐厅获得代金券等等。
Because if you do that and if you have an understanding of their spending patterns, you can come up with a promotion whereby you say oh if you buy the ticket from me you get discount on this hotel or a voucher for this restaurant or so on.
这就是联合营销的场景。
So this is the co marketing scenario.
要做到这一点,你需要弄清楚我的乘客是否入住了你的酒店,这就涉及到隐私部分,即人们的姓名。
To do that you need to find out whether my passengers stayed at your hotel, so there is the privacy part which is the people's names.
但我对个人不感兴趣,联合营销关注的是聚合数据,是你提供的 demographics(人口统计数据)。
But I'm not interested in the individual; the interest for co-marketing is in the aggregation, in the demographics that you bring up.
因此,你可以使用同态加密来完成这个集合交集的计算,或者使用其他计算速度更快的加密类型来计算集合交集,然后再用同态加密来计算你想要进行的聚合。
So you can use homomorphic encryption to compute this set intersection, or you could use other types of encryption that may be faster to compute the set intersection and then use homomorphic encryption to compute the aggregations that you want to perform.
对,所以这是一种技术,举个例子,比如使用AES加密你要搜索的键,使用同态加密加密你要计算的值。
Right, so this is just an example of a technique: using, let's say, AES for encrypting the keys that you're going to search on, and using homomorphic encryption to encrypt the values that you're gonna be computing on.
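The matching half of that hybrid can be sketched as follows. Two things are swapped in and should be read as assumptions: a keyed HMAC-SHA256 tag stands in for the deterministic encryption of the search keys (AES was the example given), and the values are left as plain integers where a real deployment would hold homomorphic ciphertexts. The company names, customer names and shared key are all invented for the illustration.

```python
import hashlib
import hmac

# Hypothetical secret agreed between the two companies, unknown to the matcher.
SHARED_KEY = b"demo-key-agreed-by-both-companies"


def tag(identifier: str) -> str:
    """Deterministic keyed tag: equal names give equal tags, nothing more."""
    message = identifier.lower().encode()
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()


# Each company uploads {tag: value}. The values would be homomorphically
# encrypted in a real deployment; they are plain integers in this sketch.
airline = {tag(n): spend for n, spend in [("alice", 420), ("bob", 310), ("carol", 150)]}
hotel = {tag(n): nights for n, nights in [("bob", 2), ("carol", 5), ("dave", 1)]}

# The matcher intersects tags without ever seeing a customer name.
common = airline.keys() & hotel.keys()

overlap_count = len(common)
total_spend = sum(airline[t] for t in common)    # would be a homomorphic sum
total_nights = sum(hotel[t] for t in common)     # would be a homomorphic sum
print(overlap_count, total_spend, total_nights)
```

Only aggregates over the overlapping customers come out, which matches the co-marketing goal of learning about the group rather than the individual.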
因此,未来我们预期的大多数案例将涉及如何将同态加密与其他类型的保护措施结合使用。
So most of the cases that we expect in the future will involve combining homomorphic encryption for certain things with other types of protection.
对吧?
Right?
所以我喜欢的一个场景——实际上用同态加密实现并不难——就是我们被要求居家隔离前常做的事:每当出门上班时,第一件事就是拿起手机询问某个服务‘路况如何?’
So the one scenario that I like, which is actually not that difficult to do homomorphically, is the kind of thing that we used to do before we were told to lock down at home: whenever you leave home for work, the first thing you would do is pick up your phone and ask a service somewhere, what's the traffic like?
‘最近的咖啡店在哪里?’
Where's the nearest coffee shop?
诸如此类。
And so on.
当你这样做时,基本上就泄露了大量隐私。
And when you do that, you are basically giving away a lot of your privacy.
是的。
Yeah.
因为执行该操作的服务商会收集所有信息,并能结合你邻居的数据来了解你可能不愿让人知晓的生活规律,对吧?
To whatever service is performing that operation because they're collecting all that information and they can combine with your neighbors or so they start to understand your patterns of life which you may not want people to know, right?
因此,如果我们能通过同态加密在早晨向服务发起同样的查询,服务会给你答案,却不知道它给了你什么答案,只有你能解密这个答案。
So if we're able to perform that same query to a service in the morning homomorphically, the service would give you the answer without knowing which answer it gave you, and only you can decrypt that answer.
这将改变许多人的商业模式,但我的工作专注于隐私保护这一边。
So it would change the business models of a lot of people out there but my job is on the privacy side.
你不能再免费进行那种查询了,因为公司无法再出售数据。
You couldn't make that query for free anymore because the company can't sell the data anymore.
是的。
Yes.
完全同意。
Totally.
这涉及到的一个事实是:每次提问时,你都在暴露意图。
So that has to do with the fact that whenever you ask a question, you are revealing intent.
当你暴露意图时,实际上你正在泄露大量
And when you reveal intent, you are giving away a lot of
的数据。
Of data.
是的。
Yes.
信息。
Information.
那么回到你向观众提出的第一个问题,关于这种工具可能实现哪些应用,最受欢迎的应用有哪些,或者你目前对哪些应用感到兴奋?
So going back to that first question that you posed to the audience, this idea of, like, what applications could be possible with such a tool, what are the most popular applications, or what applications are you currently excited about?
我知道你刚才提到了几个。
I know you've just mentioned a couple.
我们谈到了营销方面的应用,但也许你可以再分享几个。
We had this sort of marketing one, but maybe you can share a few more.
我们先来看看各个行业。
So let's look at the industries first.
对吧?
Right?
受监管的行业,比如金融业、医疗保健行业等等。
So industries that are regulated, and those would be the finance industry, the health care industry, and so on,
而政府部门,由于保密性要求以及人们可能希望执行的不经意查询,是最需要运用这项技术来保护信息隐私和机密性的主要候选领域。
and government, because of confidentiality and the oblivious queries that people might want to perform, are the biggest candidates for using this technology to protect the privacy and the confidentiality of the information.
保护隐私机密性意味着几件事。
Protecting privacy and confidentiality means several things.
许多对此非常重视的行业和企业会将数据存储在高度隔离的环境中,以防止数据被盗或外泄。
A lot of the industries and companies that are very serious about it will have that data in a very segregated environment, so people can't steal the data or can't exfiltrate data.
但数据外泄仍是我们面临的最大问题之一,因为归根结底,公司内部有权访问机密私人数据的人员可能出于各种原因将这些数据带出公司。
But data exfiltration is also one of the biggest issues that we have, because at the end of the day someone in a company who has access to confidential private data may, for whatever reason, take that data out of the company.
因此,任何涉及隐私数据且受GDPR等更多法规约束的应用或环境,现在都需要对其隐私进行更严格的控制,这些都是潜在适用场景。
So with any application or environment where the data is private, and with more regulations like GDPR and others, there is now a need for more control over privacy; those are potential candidates for that.
所以我经常思考的一个场景是:即便在同一家公司内部,不同部门之间也存在这种情况。
So one of the scenarios that we focus on a lot is, even within a given company, different organizations in the same institution.
比如有些机构同时提供零售银行、投资、保险、健康保险服务,甚至经常还拥有你接受治疗的医院。
Let's say there are institutions that provide you with retail banking, with investment, with insurance, with health insurance, and quite often they own the hospitals that you get treated in.
所以
So
这些数据来自该组织的不同部门,可能因运营需求而汇集在一起。
it's a lot of data coming from different parts of this organization that may be coming together for any operational requirements that they might have.
实际上有人正在查看这些明文数据。
Someone is actually seeing that data in the clear.
一旦发生这种情况,该团队就成为数据滥用或泄露的最薄弱环节。
And when that happens, that group is the weakest link for data misuse or exfiltration.
因此,能够在保持数据分析师执行所需分析的同时,在该层面应用同态加密技术,使他们只能看到分析结果而非具体数据条目,这类应用正是我们的目标场景。
So being able to apply homomorphic encryption at that layer, while still allowing data analysts to perform the analysis that they want to do but see only the result of the analysis and not the individual entries: those are the types of applications that are candidates.
这不仅涉及金融领域,还包括医疗健康行业。
And that involves not only finance but also involves healthcare.
是的。
Yeah.
当前我们认为需要进行数据分析的场景是最佳应用方向。
So the scenarios where data analytics needs to be performed are the ones that we see as a good candidate these days.
我们曾运行的一个概念验证项目,就是通过分析交易数据来预测某人未来三个月是否需要贷款。
One of the proof of concepts that we ran was on analyzing transaction data to predict whether someone will be needing a loan in the next three months.
如果能准确预测,企业就能进行贷款追加销售。
If you can predict that well, the companies can upsell loans.
这是一种应用场景。
That's one scenario.
但其中隐含的另一层是,当你这样做时,实际上是在对个人的财务健康状况进行某种分析,这些分析可用于提供消费建议或其他各种组合服务。
But the other one that's kind of hidden in there is that when you do that, you are actually performing some sort of analysis on the financial health of the individual, which could then be used to provide advice on spending or all sorts of other combinations that people do.
比如,你可以整合信用卡账单并按计划还款等等。
Like, you can aggregate your credit cards to pay it in a given schedule or so on.
这很有趣,因为即便在你刚举的例子中,事情总有两面性。
It's so funny because even in that example that you just gave, there's always two sides to it.
对吧?
Right?
既有潜在积极的社会影响——比如发现可能需要贷款的人,提示他们可能陷入财务困境意味着我们可以提供帮助;也有追加销售行为,这通常与帮助背道而驰。
There's the potentially positive societal impact of, like, finding people who maybe would need a loan, suggesting that they might be in financial trouble, meaning maybe we can help, or we can upsell, which is usually the opposite of helping.
是的。
Yes.
确实如此。
Indeed.
有意思。
Interesting.
还有一些场景,比如在医疗领域,假设你想检查自己是否有某种遗传性疾病,无论是哪种,对吧?
And there's also the scenarios where for instance on the healthcare scenario, let's say that you want to check yourself for some inherited condition, whichever one that be, right?
眼睛的颜色、头发的颜色,或是某种疾病。
Color of the eyes or color of the hair or some illness.
所以你会去做一些基因筛查,对吧?
So you're gonna go through some genetic screening, right?
然后你会把你的基因样本送到实验室去检测。
And then you're gonna send your genetic material to be tested by a lab.
但这些都是公开进行的。
But that's all in the open.
有了同态加密,我们可以用同态方式实现这一点。
With homomorphic encryption we could do that in a homomorphic way.
因此只有当你拿回结果时才能解密答案,或者根据信任级别由你的医生来解密。
So only you when you get back the result can decrypt the answer or your doctor depending on the trust level that people have.
另一种情况是当你有多个实体,它们从不共享数据。
And the other one is when you when you have multiple entities that would never share their data.
如果现在它们能以安全方式共享数据,仅将聚合结果或聚合后的结果展示给相关方,这些就是即将出现的新应用场景。
If now they can share their data in a secure way, where only the aggregation or the result of that aggregation is revealed to the parties, those are some of the new applications that are gonna be appearing.
虽然这听起来非常像多方计算(MPC),简直就像是专为MPC设计的场景。
Although that one sounds an awful lot like MPCs, like kind of exactly like what an MPC is designed to do.
对吧?
Right?
这种多源数据聚合,最终只展示组合结果的方式。
This aggregation of multiple sources with only the sort of combined results displayed.
是否需要将全同态加密(FHE)与多方计算(MPC)结合才能实现你刚才描述的场景?
Would you need to combine FHEs with MPCs to make that possible, what you just described?
不需要。
No.
我想这要看情况
I guess it depends
取决于具体应用场景
Depends on the application.
MPC本质上是一种分布式计算,即各方分别计算部分答案,然后汇总得出最终结果
MPC is basically multi-party computation, in the sense that each party will compute a little bit of the answer and then they have to combine everything to find the final answer.
正如你所说,每个人只能看到被允许查看的内容
And as you said everybody will see what they are allowed to see.
对于同态加密,你可以结合MPC使用,有多种技术手段可以实现
Homomorphic encryption you could use with MPC; there are all sorts of techniques to do that.
正如我所说,它们是相辅相成的
As I said they complement each other.
我提到的场景中,服务器可能负责部分计算后传回给你解密或重新加密,这某种程度上就是两个实体间缺乏单边信任的多方计算
The scenario I mentioned, where a server might be computing part of the computation and sends it back to you, and you decrypt or re-encrypt, is kind of a multi-party computation between two entities where there isn't much trust on one side.
但在数据与计算需要外包给不太可信环境(无论是多租户环境还是云环境)的场景下,人们会更倾向于使用同态加密
But the scenarios where you might want to outsource the data and the computation to a not-so-trusted environment, be it a multi-tenant environment or a cloud environment, are the ones where people will be using homomorphic encryption more.
也就是说,我确实信任云服务,但信任程度有限。
Which is: I have the cloud there, I trust it, but I don't trust the cloud that much.
这实际上是一种威胁模型。
This is actually one of the threat models.
其中一种威胁模型被称为'诚实但好奇'。
One of the threat models is known as honest but curious.
它会诚实地执行计算,但同时会窥探数据内容。
It's gonna honestly perform that computation but it's gonna be looking inside.
而对于威胁模型更为复杂的场景,即执行计算的实体可能不诚实时,可以将其他协议与同态加密结合使用来实现目标。
And for scenarios where you have a threat model which is a lot more complex, where the entity performing the computation might be dishonest, there are other protocols that, in combination with homomorphic encryption, could be used to accomplish that.
因此你可能有一组参与者,其中部分诚实、部分不诚实或怀有恶意,我们可以为此组合其他协议。
So you might have a collection of participants, some of them honest, some dishonest, and some malicious, and we can combine other protocols for that.
这更像是应用类别而非单一应用的问题。
It's more like categories of applications as opposed to a single application.
我认为回到最初的问题,这也不存在完美的解决方案。
And I think to the point of the original question, it's also not like there's the perfect answer.
就像,这很大程度上取决于你能想象用它来做什么?
Like, it is very much: what can you imagine doing with this?
可能还有一些事情是至今没人想到的。
There probably are things that no one has yet imagined.
确实如此。
That's very true.
我们的一些客户提出过一些想法,比如'这个能做吗?'
Some of our clients came up with some, oh, can we do this?
首先,这些问题中有些是非常实用的方法,是的,那是一个特定的用例。
Well, some of those are very pragmatic approaches where, yep, that's a given use case.
完美。
Perfect.
这很合理。
Makes sense.
还有一些问题是,你想做什么?
And some are, what do you want to do?
但这可能是个商业案例。
But there might be a business case.
所以这是这种互动中最棘手的部分——不仅要与开发人员沟通,对吧?
So this is the tricky part of this interaction: it is not only to talk to the developers, right?
关键在于如何以业务部门能理解的方式呈现这项技术。
It's exposing this technology in an understandable way to the line of business.
因为现在业务部门可能会说:哦,我可以把这两组数据结合起来。
Because now the line of business might say, oh I can combine these two pieces of data.
我从没想过还能这样做。
I never thought I could do that.
如果能实现这个,我还能发现其他可能性。
If I can do this, I can discover something else.
对吧?
Right?
是的。
Yeah.
是的
Yeah.
总结一下,我想稍微回到你刚才提到的内容
To wrap up, I wanna bring it back a little bit to what you mentioned.
你们正在进入可用性阶段
You're entering this phase of usability.
那么你们具体在构建什么,做些什么来让我这样的普通开发者能方便使用呢?
So what are you actually building and doing to make this usable to me, the average developer?
对
Right.
我们正在做的是将我们的库开源
So what we are doing is this: our library is open source.
它存放在GitHub的homenc/HElib仓库
It's on GitHub at homenc/HElib.
去年我们发布了四个测试版,今年初我们发布了1.0正式版
And throughout last year, we made four beta releases and early this year we released version one.
我们很快会发布1.0.1版本,这是1.0版的更新迭代
Soon we're gonna release another refresh of version one, which is 1.0.1.
但包含了各种功能增强和问题修复
But it has enhancements and fixes and everything.
这部分内容都在GitHub上公开
This is the GitHub part so it's there.
同时我们也在研究如何让开发者更容易上手使用
But we have also been working on how to make everything more consumable to developers.
我们有多种实现方式
And there are several ways of doing that.
其一是创建自包含的示例代码,并配套不同难度的教程
One is by creating self contained examples with little tutorials that people can follow at different levels.
有些人可能只想了解基本功能
Someone might just want to know what I can do.
对吧?
Right?
所以我可以执行一个查询。
So I can do a query.
很好。
Great.
还有人想知道,当他们执行这个查询时,幕后发生了什么。
And someone wants to know what happens under the covers when they do this query.
因此我们正在着手创建这些场景和示例。
So we are working on creating those scenarios and examples.
第一个示例将发布在IBM Research的
And the first one will be available on the IBM Research
“Try Our Tech”网站上,几周后即可供人们参考。
"Try Our Tech" website in a few weeks for people to follow there.
然后可以跟进到GitHub。
And then a follow-up to the GitHub.
与此同时,正如我之前提到的,美国政府有一笔拨款用于将其开发成工具包。
And in parallel with that, as I mentioned before, there's a grant of the US government to make this as a toolkit.
对吗?
Right?
因此,每当我们开发出可用的新功能时,都会及时向公众开放使用。
So every time something new that we do becomes usable, we will make it available for people to use.
这类资助项目也是开源的吗?还是有附加条件?
Is that kind of grant work open source as well or are there conditions?
不是的。
No.
我们的库是完全开源的。
Our library is all open source.
采用Apache许可证。
It's under the Apache license.
太棒了。
Cool.
我们会把这些链接放在节目笔记里,方便大家查找。
We'll get those links in the show notes so people can find them.
是的。
Yep.
作为最后的总结,我在想如果有人真的想深入研究这个领域,无论是想参与研究还是开始着手相关工作,你会给他们什么建议?
As a final sort of note, I wonder, if someone really wants to dig into this, wants to, you know, either get involved with the research or start working on this, what would your advice be to them?
我们目前正在将开源项目推向一个更具协作性的环境。
We are right now actually bringing what is open source to a more collaborative environment.
我们很快就会公布一个平台,让想要提供帮助的人参与进来。
And we'll announce soon where people who want to help can go.
已经有一些人在提供帮助,我们也采纳了部分来自社区的协作成果。
Some people have been helping, and we have been taking some of the contributions that people have been giving.
但我觉得目前大家普遍缺少的是一个引导新人的教程,对吧?
But I guess the one thing that's missing from everybody is a kind of a tutorial that brings people in, right?
所以如果有一个解释清楚的教程,你并不需要成为密码学专家也能参与。
So you don't need to be a hardcore cryptographer if there is a tutorial that explains.
另外就是,如果你愿意学习格密码学和模运算这些知识的话。
And the other thing is if you are willing to learn about lattice crypto and modular arithmetic and some of these things.
所以如果你不畏惧学习,那将是一个轻松参与协作与学习的途径。
So if you're not afraid of learning, then that is an easy way to collaborate and learn.
嗯,如果要描述我们的观众特点,'不惧学习'绝对排在前列。
Well, I hope if there's one way to describe our audience, not afraid of learning would be on top.
我们拭目以待,看看是否有人会对此产生足够兴趣深入探究。
So we'll see if anyone gets interested enough to dig in.
是啊。
Yeah.
也感谢你帮助我们更深入地探索全同态加密技术。
And thanks for helping us explore fully homomorphic encryption more deeply.
非常感谢你参与本期节目。
Thank you very much for being on the show.
哦,谢谢。
Oh, thank you.
太好了。
Great.
也感谢我们的听众朋友们的收听。
And to our listeners, thanks for listening.
谢谢大家的收听。
Thanks for listening.