本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
我是比尔·肯尼迪,现在是Go时间了。
So I'm Bill Kennedy, and it's now go time.
《Go Time》是一档每周播客,我们讨论围绕Go编程语言、社区以及其间一切的有趣话题。
It's Go Time, a weekly podcast where we discuss interesting topics around the Go programming language, the community, and everything in between.
如果你目前正在使用Go,或者希望使用Go,那么这档节目就是为你准备的。
If you currently write Go or aspire to, this is the show for you.
好了,各位。
Alright, everybody.
欢迎回到《Go Time》的又一期节目。
Welcome back for another episode of Go Time.
这是第六期。
It's episode number six.
我是埃里克·圣马丁。
I'm Erik St. Martin.
今天和我一起的还有Brian Ketelsen。
Today here with me, we have Brian Ketelsen.
打个招呼吧,Brian。
Say hello, Brian.
大家好。
Hello.
我们还有Carlisia Campos。
And we have Carlisia Campos.
打个招呼。
Say hello.
很高兴来到这里。
Glad to be here.
大家好。
Hi, everybody.
今天我们还有一位特别嘉宾。
And we have a special guest with us today.
今天和我们在一起的是来自Ardan Labs和GoBridge的比尔·肯尼迪。
Bill Kennedy from Ardan Labs and GoBridge is here with us today.
你可能也通过他如今在世界各地举办的工作坊认识他。
You might also know him from all of the workshops that he does, like, all over the world now.
是这样吧,比尔?
Right, Bill?
是的。
Yeah.
今年我有幸去了几次欧洲。
I've been lucky enough to get into Europe a couple of times this year.
是的。
Yeah.
这太糟糕了。
This is bad.
太疯狂了。
It's crazy.
我们每天都在别的地方看到你。
It's like every day we see you somewhere else.
我不知道飞机是怎么准时赶到你的工作坊的。
I don't know how the planes arrive in time for your workshops.
安排时间确实有时候很困难。
Scheduling is difficult sometimes for sure.
我的意思是,你的常旅客里程有多少了?
I mean, what's your mileage look like for frequent flyer miles?
我觉得我现在大概有13万英里了。
I think I'm at, like, 130,000 miles right now.
我真不羡慕你。
I don't envy you.
钻石精英?
Diamond Elite?
在美国航空,我现在是白金会员,正在向行政白金会员迈进。
On American, I'm now Platinum, on my way to Executive Platinum.
不错。
Nice.
但这些确实不是你应该追求的目标。
But, yes, these are not goals that you should want to achieve.
你有喜欢的、装在小瓶里的香皂和洗发水吗?
Do you have your favorite soap and shampoo that comes in a small bottle?
我会尽可能利用酒店提供的任何东西。
I leverage whatever the hotel has to the extent that I can.
所以今天,我们将和比尔聊聊机械同理心。
So today, we're gonna be talking with Bill about mechanical sympathy.
我觉得这个话题会非常有趣。
So I think this is gonna be a really interesting topic.
在进入主题之前,我们先聊聊新闻和一些有趣的项目。
Before we get into that, let's talk news and interesting projects.
在开始和比尔讨论之前,大家有什么有趣的事情想分享吗?
Anybody have anything interesting they wanna talk about before we get into it with Bill?
你知道,从我的角度来看,Go 新闻这周相当平静,但我确实发现了两个相对有趣的项目。
You know, it was a pretty quiet week in Go news from my perspective, but I did find two relatively interesting projects.
第一个项目,我认为可能会赢得年度最佳黑客奖。
The first one, I thought might be a winner in the best hack of the year award.
在节目笔记中,你会找到来自 acksin.com 博客文章的链接,他们拼凑出一种把 StatsD 类型的指标发送到 Google Analytics 的方法,这看起来像是一次有趣的强行适配,而且似乎运行得相当不错。
In the show notes, you'll find a link to the blog post from acksin.com, where they hacked together a way to send StatsD-type metrics to Google Analytics, which seems like an interesting shoehorn, and it looks like it works pretty well.
所以你可以通过不当使用Google Analytics,为你的服务器获得一个不错的免费StatsD监控。
So you get a nice, free StatsD monitoring for your servers, using Google Analytics inappropriately.
我完全赞同这个计划。
I approve completely of this plan.
不过,你知道,有趣的是你可以把它和你已经在谷歌分析中收集的指标放在一起看,看看其中一些因素可能会如何影响你的转化漏斗。
You know, but the interesting thing about that though is you can see it alongside, metrics that you're already collecting in Google Analytics and how some of those things might impact, your funnel.
所以我一时想不出会拿它做什么具体用途,但我认为它确实有潜力变得很有价值。
So I can't think of any specific uses for it off the bat, but I think it has the potential to be valuable.
至少可以说,这很有趣。
It was interesting to say the least.
我不确定我会把它部署到一个有用的生产系统上。
I'm not sure I would put it in production on a useful system.
万一谷歌决定识别出这种流量并开始丢弃它呢?
What if Google decided that they could figure out that traffic and start tossing it?
是啊。
Yeah.
我觉得我更喜欢Grafana之类的工具。
I think I'd prefer Grafana or something like that.
但是
But
Datadog 一路到底。
Datadog all the way.
Datadog 也是好东西。
Datadog is good stuff too.
所以我发现的第二个有趣的项目是众多依赖管理项目中的一个。
So the second interesting project I found is is one in the multitude of vendoring projects.
这个项目叫 Manul,拼写是 m-a-n-u-l,你可以在节目笔记中找到它的链接。
This one's called Manul, m a n u l, and you'll find the link to that in the show notes.
它同样是使用 Git 子模块进行依赖管理的工具,看起来是支持 Git 子模块的较优依赖管理包之一。
And it's another one that does vendoring, with Git submodules this time, and it looks to be one of the better vendoring packages that supports Git submodules.
因此它有一些非常不错的
So it had some very nice
命令和实用工具。
commands and utilities with it.
不过我很好奇他们是如何解决使用子模块的一些缺点的,因为很多人对使用 Git 子模块持保留意见。
I'm interested to see, though, how they solve some of the drawbacks of using submodules, because a lot of people have reservations about using git submodules.
它的运作方式存在一些固有的缺陷。
There's kind of some inherent flaws with the way it works.
比如第一个问题就是,你仍然依赖于该仓库在未来是否依然存在。
Like number one would be that you're still reliant on that repository to exist in the future.
所以如果它宕机了,或者有人决定删除他们的项目,因为这种事情完全不会发生。
So if it went down or somebody decided to delete their project, because that totally never happens.
或者甚至只是重命名一下。
Or even just rename it.
你仍然无法访问代码,但这也与子模块的工作方式有关。
You still wouldn't have access to the code, but some of it also comes from the way submodules work.
所以如果我拉取你的项目,我需要执行 git submodule update 来更新我本地的那些子模块,但如果我不这么做,我仍然在使用那些子模块的旧版本。
So if I pull down your project, I need to do a git submodule update to update my local versions of those submodules, but if I don't do that, I'm still running with my prior versions of those submodules.
但当你检出代码时,它并不会自动更新我的子模块。
But just checking out your code doesn't move my submodules along with it.
所以我可能会不小心提交了你项目的旧版本,而这些错误非常容易被忽略。
So I can accidentally commit my older versions of your stuff, and those changes are really easy to miss.
而且这些子模块在合并时也存在一些问题。
And there are a couple of issues with the way those things get merged, too.
所以我很好奇他们是如何解决这些问题的。
So I'm interested to see how that's solved.
因为人们经常覆盖彼此的子模块。
Because people step on each other's submodules all the time.
你会看到这种情况,比如我拉取了你的更改,但没注意到你更新了子模块。
You see it where, you know, I pull down your changes, but I didn't notice you had submodule updates.
但当我提交我的更改时,我的子模块版本和你的不同,结果我就覆盖了你的版本。
But then I make my commit, and my submodule versions are different from yours, and I just kind of step on yours.
不过这些问题几年前就有人遇到了,所以也许 Git 现在已经有了一些应对措施。
But I mean these are problems that people were having years ago, so maybe there's some stuff in git now that accounts for it.
也许工具本身也稍微做了一些处理。
Maybe the tool accounts for it a little bit too.
我想如果你在提交钩子之类的地方做点处理,大概是可以解决的。
I guess if you did it on, like, a commit hook or something, you probably could.
但确实挺有意思的。
But yeah, it's interesting though.
子模块可能很有用,但也可能很麻烦,不过我想编程里的所有东西都差不多,对吧?
Submodules can be valuable and they can also be a pain, but I guess everything in programming can be, right?
总是如此。
Always.
你们之前用过子模块吗?
Have you guys used submodules before?
有人用过吗?
Anybody?
我用过。
I have.
我用过。
I have.
我用的时候没遇到什么问题。
I did not run into any problem with it.
没做什么疯狂的操作。
Didn't do anything crazy.
只是把子模块放进去用来访问它。
Just drop a submodule there to access it.
是的。
Yeah.
我会找一个关于这些陷阱的链接,并在发布前放到节目笔记中。
I'll find a link surrounding some of those pitfalls, and we'll drop it in the show notes before this is released.
这件事已经过去几年了,所以我一时想不起具体的名字,但我知道很多人遇到了很多奇怪的问题。
This has been a couple years, so I can't remember the name of one off the top of my head, but I know people were having a lot of weird issues.
我们还想聊点别的吗?
So anything else we want to talk about?
我没什么了。
That's all I had.
我也没有。
I don't have anything.
我知道我们想聊什么。
I know what we do want to talk about.
机械同理心?
Mechanical sympathy?
没错。
Exactly.
是的。
Yes.
首先,这个名字是从哪里来的?
First first thing, where did this name come from?
我们之前聊过这个,但我想听当事人亲口讲讲。
We were talking about this earlier, but I want to hear it from the horse's mouth.
这名字不是我起的。
It didn't come from me.
这个术语,我想我是从马丁·汤普森那里听来的;看过他视频的人都知道,他说这是从一位赛车手那里听来的。
This is a term that I think I got from Martin Thompson; anyone who's watched any of his videos knows he says he got it from a race car driver.
对。
Yeah.
杰基·斯图尔特是一位一级方程式车手。
Jackie Stewart was a Formula One driver.
我认为在一次采访中,他曾说过类似这样的话:你不需要是工程师或机械师才能成为赛车手,但你必须具备机械同理心。
And I think during an interview, he had said something along the lines of: you don't need to be an engineer or a mechanic or something to be a race car driver, but you need to have mechanical sympathy.
基本上,他的意思是,只要对机器——也就是汽车——的工作原理有一定理解,就能让你成为一个更好的驾驶员。
And basically, he was just implying by having some level of understanding of how the machine, the car worked, that it made you a better driver.
而且我认为,正如比尔指出的,马丁·汤普森——我想是他——开始将这一理念应用到编程中。
And I think, as Bill pointed out, Martin Thompson, I think it was, started applying that to programming.
所以比尔,你能不能给我们详细讲讲,你认为这个概念是如何应用于编程的?
So Bill, would you like to fill us in a bit on how how you think that that concept applies to programming?
是的。
Yeah.
我的观点仅限于Go语言这一方面,这也是我在培训中非常关注的一点。
I mean, I only have a perspective on it from the Go side, and it's something I really focus on in the training.
我在培训中主要关注两个方面:数据导向设计和机械同理心,并试图展示Go语言本身是如何紧密契合这两个理念的。
I kind of focus on two things in the training, data oriented design and mechanical sympathy and try to show how the language Go itself is very in tune around these two ideas.
我深信,如果你不理解你所处理的数据,你就无法真正理解你试图解决的问题。
And really believe, that if you don't understand the data that you're working with, you do not understand the problem that you're trying to solve.
一切都从这里开始。
It all starts there.
就像我们试图解决的每一个问题,本质上都是某种形式的数据操作问题。
Like everything, every problem that we're trying to solve is really a data manipulation problem in some fashion, in some way.
所以一切都始于数据,这个观点是:如果你不了解你所处理的数据,你就无法理解这个问题。
So it all really starts with the data. And it's this idea that if you don't understand the data you're working with, you don't understand the problem.
如果你不了解解决这个问题的成本,你就无法真正地思考如何解决它。
And if you don't understand the cost of solving that problem, you can't really reason about solving it.
要能够评估成本,你必须理解每一行代码的作用,以及它如何影响操作系统和硬件——这些硬件正是用来执行你花时间编写的指令的。
And to be able to reason about the cost, you have to have some understanding of what every line of code is doing and how that's affecting the operating system and the hardware, which is there to execute those instructions that you're spending time writing to begin with.
因此,我真正感兴趣并思考的是这种关系,以及Go语言如何帮助我们实现这一点。
So I I think it's it's that relationship that I'm really interested in and and think about in terms of what Go is doing to help us.
当你谈到机械同理心时,你指的是磁盘、缓存、CPU等物理层面的东西,是电气层面的事务。
So when you talk about mechanical sympathy, you're talking about things at the physical level, like the disks, the caches, the CPU, electrical things.
作为程序员,我们需要关心其中多少内容?
How much of that as a programmer do we have to care about?
我真正关注的是你所处理的数据。
I really focus it around the data that you're working with.
因此,我学到的一点是,我们今天所使用的硬件是处理器。
And so, you know, one of the things that I've learned is that the hardware that we're working today are processors.
它们现在都是多核处理器,每个核心都有自己的本地缓存。
You know, they're now multicore processors, and every core has their own sets of local caches.
在许多情况下,L1和L2缓存属于每个核心。
The L1 and L2 caches, in many cases, belong to each core.
核心之间可以共享L3缓存,而你已无法直接访问主内存。
Cores could then share an L3 cache, and you just don't have direct access to main memory anymore.
因此,如果你编写的代码导致硬件无法预测你所访问的数据,就会出现缓存未命中,这可能耗费数百个时钟周期。
So if you're writing code where the hardware cannot predict access to the data you're working with, then you're going to have these cache misses that can cost you hundreds of clock cycles.
在 Scott Meyers 的一次演讲中提到的一种架构里,每次缓存未命中都会耗费107个时钟周期。
In one architecture that Scott Meyers uses in one of his talks, it's 107 clock cycles every time you have a cache miss.
不过,这个数值会因硬件不同而有所变化。
And now that's going to change from hardware to hardware.
但如果你想象使用某种链表数据结构,在每次迭代中访问列表中的不同节点,而列表中的每个节点都与缓存系统不兼容,不在同一缓存行上。
But if you can imagine employing some sort of linked list data structure, where on every iteration you're accessing a different node in the list, and every node in that list is not sympathetic to the caching system, doesn't exist on the same cache lines.
我的意思是,你可能在不知不觉中大量访问内存,却根本不知道为什么速度这么慢。
I mean, you could be chugging through memory without even realizing it, without even understanding why it's as slow as it is.
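这里用一个极简的 Go 示例来把“指针追逐”讲具体(其中的 `node` 类型和数据规模都是演示用的假设,并非节目原文):两个函数计算同样的和,但切片走的是一块连续内存,而链表每走一步都要追一个可能落在堆上任意位置的指针。
As a minimal Go sketch of that pointer-chasing point (the `node` type and sizes are illustrative assumptions, not from the episode): both functions compute the same sum, but the slice walks one contiguous block while the list chases a pointer on every step.

```go
package main

import "fmt"

// node is one element of a linked list; each hop is a pointer
// dereference that can land anywhere in the heap.
type node struct {
	value int
	next  *node
}

// sumList walks the list node by node; the hardware cannot predict
// where the next node lives, so cache misses are likely.
func sumList(head *node) int {
	total := 0
	for n := head; n != nil; n = n.next {
		total += n.value
	}
	return total
}

// sumSlice walks a contiguous backing array; each step is a fixed
// stride the prefetcher can pick up on.
func sumSlice(s []int) int {
	total := 0
	for _, v := range s {
		total += v
	}
	return total
}

func main() {
	const n = 1000
	s := make([]int, n)
	var head *node
	for i := n - 1; i >= 0; i-- {
		s[i] = i
		head = &node{value: i, next: head}
	}
	// Same answer, very different memory behavior: 499500 499500.
	fmt.Println(sumList(head), sumSlice(s))
}
```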
所以让我们稍微退一步,因为很多人
So from Let's let's back up here just a a second too because, a lot
许多人都来自动态语言,比如 Ruby、Python,甚至 Go 也把这些概念对你隐藏了。
of people come from dynamic languages, you know, Ruby, Python, and even Go abstracts these concepts from you.
让我们花点时间谈谈 CPU 缓存以及它们是什么。
Let's take a second and talk about CPU caches and what those are.
因为我认为,很多人甚至都不熟悉 CPU 缓存是什么。
Because I would argue that probably a lot of people aren't even familiar with what a CPU cache is.
所以我们必须以非常高层的抽象层面来讨论这个问题,因为硬件差异真的很大。
So we've gotta talk about this at a very high representative level because hardware is really different.
但本质上,我们处理的是一种带有缓存的硬件,从我们的角度来看,它们可能都是一样的。
But in essence, we're dealing with a piece of hardware that has caches, and from our perspective, it can all be treated the same.
因此,其理念是硬件需要将我们所使用的内存尽可能靠近它。
And so the idea is that hardware needs to have the memory that we're working with as close to it as possible.
而今天的情况是,如果你需要主存中的哪怕一个字节,它也必须从主存移动到一级或二级缓存中才能被使用。
And what's going to happen today is, if you need even a byte of memory that's sitting out in main memory, it's got to move from main memory into, let's say, the L1 or L2 cache for it to be used.
这些缓存是以缓存行的形式被加载和卸载的,而目前默认的缓存行大小通常是64字节。
And these caches get pulled in and out on cache lines, and the default cache line today is probably 64 bytes.
因此,现在的理念是,如果你的指令正在处理数据——而这正是我们所做的。
And so the idea now is, if you have instructions that are working with data, which is what we do.
对吧?
Right?
我的意思是,这正是我们每天都在做的事情。
I mean, this is what we do all day.
我们在读取内存。
We're reading memory.
我们在向内存写入。
We're writing to memory.
这些内存现在必须进入缓存系统,我们才能使用它。
This memory has to now get into the caching system in order for us to be able to use it.
这些数据将通过64字节的缓存行在主存和缓存之间来回移动。
This data is gonna be moving on these 64-byte cache lines from main memory and back.
因此,为了顺应硬件,我们可以尽量以连续的块来处理数据。
And so one of the things that we can do to be sympathetic with the hardware is try to work with data in as contiguous blocks as possible.
当我们的数据是连续的时候,通常意味着你是在遍历这些数据。
When our data is contiguous, at that point you're usually iterating over that data.
遍历数据可以产生所谓的可预测访问模式,现代硬件能够识别这些模式。
And iterating over data can create what are called, you know, predictable access patterns to that data that the hardware today can pick up on.
因此,如果我们真的想让硬件充分发挥其潜力,就必须顺应它。
And so if we really want to give the hardware its best opportunity to take advantage of everything that's in there, we've got to be sympathetic with it.
我们必须从数据的工作集角度来审视数据。
We've got to try to look at data in a way of what are our working sets of data?
我们能否将数据连续排列、连续处理,并围绕它创建可预测的访问模式?
Can we lay data out contiguously, work with data contiguously, and can we create predictable access patterns around that?
因此,硬件可以识别出哪些是接下来可能用到或肯定会用到的缓存行,并在这些指令需要它们之前将它们提前加载到缓存中。
So the hardware can pick up on what the next cache lines are that are probably in play, or will definitely be in play, and pull those into the caches before the next instructions need them.
人们该如何学习什么是可预测的访问模式呢?
How does somebody learn about what predictable access patterns look like?
他们又该如何实现这种模式呢?
And what can they do to achieve that?
从今天的视角来看,数组确实是硬件层面最重要的数据结构,因为只有数组能让你创建连续的内存块。
From today's perspective, it is the array that is really the most important data structure from the hardware perspective, because it is the array that allows you to create contiguous blocks of memory.
嗯,我想结构体也是这样对齐的,对吧?
Well, I guess structs are aligned that way too, right?
再说一遍,我没听清。
Say it again, I'm sorry.
结构体也是连续对齐的。
Structs are also aligned contiguously.
没错,但假如我要创建一个用户结构体,并且要创建十万条这样的数据,如果我没有把它们连续排列,而是用链表的方式存放,这些用户值就会随机地分散在内存各处。
They are, but if I was gonna create a user struct and create 100,000 of those, and I didn't lay them out contiguously, say I created just a linked list of these particular user values, they'd be laid out almost randomly all over memory.
当你开始遍历这个链表时,硬件无法识别出任何模式,你基本上只能逐个访问内存,因为每次访问都会导致缓存未命中。
And when you start walking down that linked list, the hardware is not going to be able to pick up on any pattern there, and you're basically going to be chugging through memory, because every access is going to be a cache miss.
因此,我们试图通过尽可能将所有数据集中在一起,尽量减少占用的缓存行数量来消除这种情况。
So we're trying to eliminate that by trying to keep all of the data that we can as close together as possible on the least number of cache lines as possible.
没错,因为多年来,尽管处理器的速度没有显著提升,但它们在多任务处理方面变得好得多。
Right, because over the years, even though processors haven't gotten significantly faster, they've become much better at multitasking.
因此,即使处理器在同一周期内正在执行数学计算,它也可以预取下一次步长并加载下一个缓存行,这样在下一次迭代时,数据已经就位,几乎是免费的。
So while the processor may be performing a math calculation in the same cycle, it can be making the next stride and pulling the next cache line in, so that on the next iteration the data is already there, basically for free.
如果它能预测下一个缓存行的位置,那么它完全可以做到这一点。
If it can predict what that next cache line is, then it absolutely can do that.
但如果我们不配合、不帮助它预测这些内容,它就无法提前加载缓存行,直到它确切知道‘现在我需要的数据就在这里’。
But if we're not being sympathetic and helping it be able to predict these things, then it can't pull that cache line until it knows exactly, now this is where the data is that I need.
抱歉。
I'm sorry.
我本来想说,访问模式——你提到的关于数据以及你如何使用数据的重要性。
I was just going to say, so the access patterns, how you talk about it being important thinking about the data and how you're working with it.
我想了想,主要有两个要点:时间局部性和空间局部性,也就是处理内存中相邻的数据,或者同时处理相同的数据块,对吧?
I guess the two main points I can think of are basically temporal and spatial locality: working with things that are located next to each other in memory, or working on the same pieces of data at the same time, right?
这跟你提到的尽量减少缓存未命中有点关系。
Kind of to your point where you can minimize the number of cache misses.
是的。
Yeah.
而且希望即使我这么说吧,你不可能完全避免缓存未命中,但如果你有一组数据需要频繁处理,一旦被加载进来,它就一直在那里,你可以反复利用。
And hopefully, I mean, you're not gonna avoid cache misses altogether, but if you have a working set of data that you're gonna be doing a lot of processing on, once it gets pulled in, it's there and you can leverage it.
如果你一直在内存中来回跳转,而且这种跳转还比较随机,那就只能慢慢拖着走了。
If you're bouncing around memory all the time and it's somewhat random, you're just going to chug through it.
我们来举个例子说明一下什么是内存中的来回跳转。
Let's give an idea of bouncing around memory.
对我而言,链表就是一个典型场景:你有一个数据节点,这个节点指向另一个数据节点,而那个节点又指向下一个数据节点。
A linked list, to me, could be a scenario where, right, you have a node of data, that node of data points to another node of data, and that node of data points to another node of data.
而这些数据是何时、如何创建的,决定了它们可能分布在堆中的任意位置,具体取决于它们的创建方式、时间以及连接方式。
And depending on how and when that data was created and hooked up, it could be almost anywhere in the heap.
在这种情况下,你无法保证每个节点都在同一个缓存行上,甚至无法保证它们在相邻的缓存行中。
You can't guarantee in that case that every single node is on the same cache line or even in cache lines that are next to each other.
我想另一个例子是多维数组,对吧?按行遍历和按列遍历的区别。
And I guess another example would be like a multidimensional array, right, iterating over row based versus column based.
是的。
Yeah.
是的。
Yeah.
我们在训练中实际上有一些相关的基准测试示例,可以明显看到性能上的差异。
We actually have some benchmarking examples of that in the training, where you actually see a significant difference in performance.
如果你按行遍历,速度会快得多;而按列遍历则相当于逆着数据的自然顺序进行,效率很低。
If you go row based, you see it's much faster than if you go column based, which is kind of going against the grain.
是的。
Yeah.
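行优先与列优先的差别可以用下面这个假设性的示例来演示(矩阵大小是随意选的):两个函数结果相同,但 `sumRows` 按数据在内存中的排布顺序访问,而 `sumCols` 在相邻两次读取之间要跨过一整行的字节。
The row-versus-column difference can be sketched like this (the matrix size is arbitrary): both functions return the same total, but `sumRows` visits memory in layout order while `sumCols` strides a full row's worth of bytes between reads.

```go
package main

import "fmt"

const dim = 256

// sumRows visits the matrix in row-major order, the same order the
// values sit in memory, so consecutive reads share cache lines.
func sumRows(m *[dim][dim]int) int {
	total := 0
	for r := 0; r < dim; r++ {
		for c := 0; c < dim; c++ {
			total += m[r][c]
		}
	}
	return total
}

// sumCols reads down each column, skipping dim ints between reads,
// which touches a new cache line on nearly every access.
func sumCols(m *[dim][dim]int) int {
	total := 0
	for c := 0; c < dim; c++ {
		for r := 0; r < dim; r++ {
			total += m[r][c]
		}
	}
	return total
}

func main() {
	var m [dim][dim]int
	for r := range m {
		for c := range m[r] {
			m[r][c] = 1
		}
	}
	// Identical totals; benchmarked, the row-major walk is typically much faster.
	fmt.Println(sumRows(&m), sumCols(&m)) // 65536 65536
}
```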
不过,这很有趣,对吧?
So it's interesting, though, right?
因为我们通常认为内存是免费的,对吧?
Because we we typically think about memory for free, right?
你知道,我们就像,哦,是的。
You know, we're like, oh, yeah.
它在内存里。
It's it's in RAM.
内存很快。
RAM's fast.
至少我不需要为此访问磁盘。
At least I don't have to go to disk for it.
对吧?
Right?
但是像先列后行或先行后列这样遍历数组的方式,确实很好地说明了访问RAM比CPU缓存慢了多少。
But doing something like column-first versus row-first iteration over an array like that really demonstrates how much slower it is to go to RAM than to the CPU cache.
而且随着规模增大,这种差异会愈发明显。
And it really rears its head the bigger that data grows.
还有其他因素,仅仅因为我们使用的典型操作系统有——我总是记错这个名字。
And there's other things too, even just because we use typical operating systems have the I always mess up this name.
我想它叫做转译后备缓冲器(TLB),基本上它把你的程序所持有的内存地址映射到真正的物理内存地址。
The translation lookaside buffer, I think is what it's called, where basically it maps the addresses your program has to the real, physical memory addresses.
它也有页表,可能会被换出,然后需要重新加载进来。
And that has pages too that you can blow out, and it kinda has to load back in.
这代价很高。
That's expensive.
对吧?
Right?
所以每次当你因为处理的内存不是相邻位置而导致缓存失效时。
So every time you blow that out, it's because you're not working with memory that's located next to each other.
你知道,我对此的兴趣其实是在接触Go语言后开始发展的,因为在此之前,我使用的是C#,我们有列表、队列、栈和各种数据结构。
You know, my interest in this actually started to develop when I came into Go, because before that, I was in C#, and we had lists, we had queues, we had stacks, we had data structures.
对吧?
Right?
就连 C++ 也给我们提供了所有这些数据结构。
And even C++ gave us all these data structures.
当我接触 Go 时,我就想:我的那些数据结构去哪儿了?
And when I came into Go, I was like, where are all my data structures?
我不理解这个。
Like, I don't understand this.
所以我只看到一个数组。
So I just see an array.
我看到一个切片,当时我根本不懂它,还看到映射。
I see a slice, which I honestly didn't understand at the time, and I see maps.
这其实很可笑,因为我根本没理解切片是什么。
And it's really silly because I didn't really understand what slices were.
我只是觉得它们就是普通的数组。
I just thought they were really just arrays.
在学校里,我们被教导说,数组很难使用。
And back in school, we were really taught that, you know, arrays are difficult to work with.
在刚开始使用Go语言的头几个月里,我实际上避开了切片,转而使用链表,因为我真的不明白为什么我们没有那些数据结构。
And I actually avoided slices for the first couple of months working in Go using linked lists because I I I honestly didn't understand why we didn't have data structures.
但最终,在某个时刻,我意识到大家都在使用切片,而语言本身也在引导你使用切片。
And, eventually, at some point, I realized that everybody's using slices and the language is pushing you towards slices.
于是我意识到,我必须真正弄清楚这到底是什么。
And I figured out I had to really learn what this is.
现在,当你从这个角度回头审视时,切片的底层数据结构其实就是一个数组。
And now when you step back and you look at it from this point of view, I mean, the the underlying data structure for the slice is an array.
对吧?
Right?
切片是Go语言中最重要的数据结构。
The slice is the most important data structure in Go.
随着我每个月都更深入地了解Go语言,我不断发现,Go其实一直在引导我们编写富有同理心的代码。
And as I peel this onion every month about more and more about Go, all I keep seeing is how Go is pushing us towards writing sympathetic code.
Go在无声无息中引导我们去做正确的事。
Go is pushing us towards doing the right things without anybody realizing it.
Go 希望我们使用这些切片,因为这样我们实际上是在操作数组和连续内存,它为我们提供了在不知不觉中与硬件保持协同的最佳机会。
Go wants us to work with these slices because then we're really working with arrays and contiguous memory, and it's giving us our best opportunity to have these sympathies without even realizing that we're being sympathetic with the hardware.
所以就这一点而言,Go 对我来说是一种极其迷人的语言。
So Go to me is just an incredibly fascinating language when it comes to that.
在语言的其他方面也是如此,比如并发层面的操作系统调度器,你也在不知不觉中与之保持协同。
And other areas of the language too where you see that you're really being sympathetic or like the operating system scheduler on the concurrency side without even realizing it.
对吧?
Right?
我们经常告诉人们去做的这些惯用法和做法,并不仅仅是因为。
Just these idioms and these things that we tell people to do all the time, they're based not just on, hey.
我们希望你这么做。
We want you to do this.
它们基于性能、简洁性、可读性等真实因素。
They're based on real things around performance, simplicity, readability, those types of things.
这一切最终形成了一个完整的循环。
It all kind of comes full circle.
是的。
Yeah.
我认为有很多编程习惯用法可以遵循来提供帮助。
I think there are a lot of programming idioms that can be followed to help.
但我觉得你是对的。
But I think you're right.
我想我从未真正考虑过某些语言功能,它抽象掉了这些东西,并默认使我们的程序更具协同性。
I guess I had never really considered some of the language functionality that's abstracting away these things and making our programs more sympathetic by default.
通道也是一个很好的例子,对吧?
Channels are a good example too, right?
你知道,你在线程之间传递数据,这样数据就可以一直留在那个特定线程的本地缓存中。
You know, you're passing pieces of data between threads so that the data can stay local to the cache of that particular thread.
或者引用类型呢?
Or what about the reference types?
你的切片、你的映射、你的通道值。
Your slice, your maps, your channel values.
对吧?
Right?
我们总是被告诉,不要共享这些。
We're always told, do not share these.
每个人都可以获得这些值的副本。
Everybody can get a copy of these values.
这样做让我们避免给垃圾回收器带来压力。
And what that's doing is it's allowing us to not put pressure on the GC.
我们可以充分利用栈,因为每个人都可以获得这个切片的副本。
We get to leverage the stacks to the fullest extent because everybody can get a copy of this slice.
真正需要在堆中共享的,只是底层的那部分内容。
The thing that's being shared underneath is the only part that, let's say, necessarily has to be in the heap, just that.
所有这些我们需要在程序边界之间传递的小小对象,我们都可以利用栈。
And for all these little values that we need to pass around across these program boundaries, we get to leverage the stack.
对吧?
Right?
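下面是一个假设性的示例,说明“大家各拿一份副本”如何让值留在栈上:`user` 类型、它的字段和取值都是演示用的。用 `go build -gcflags=-m` 编译时,可以看到只有按指针返回的那个值被移动到堆上。
Here's a hypothetical sketch of how "everybody gets a copy" lets values live on the stack; the `user` type, its fields, and the values are made up for illustration. Compiling with `go build -gcflags=-m` shows that only the value returned by pointer gets moved to the heap.

```go
package main

import "fmt"

// user is a made-up type for illustration.
type user struct {
	name  string
	email string
}

// byValue returns a copy; nothing needs to outlive this frame, so u
// can live on the stack and never bothers the garbage collector.
func byValue() user {
	u := user{name: "bill", email: "bill@example.com"}
	return u
}

// byPointer shares u with the caller; u must outlive this frame, so
// escape analysis moves it to the heap and the GC now owns it.
func byPointer() *user {
	u := user{name: "bill", email: "bill@example.com"}
	return &u
}

func main() {
	// `go build -gcflags=-m` reports "moved to heap" only for byPointer's u.
	fmt.Println(byValue().name, byPointer().name)
}
```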
因为有两个领域我们会想要关注性能。
Because there's gonna be two areas where we're gonna wanna focus around performance.
一个我认为会是关于数据导向设计,以及我们是否对硬件和缓存系统友好?
One will be, I think, around data oriented design and are we being sympathetic with the hardware and the caching systems?
我们是否以最佳方式处理数据?
Are we working with data the best way we can?
另一方面则是,我们能否减轻垃圾收集器的压力,让它不必频繁运行?
And then the other side is gonna be, can we reduce pressure on the on the garbage collector so it doesn't have to run as much?
对吧?
Right?
我认为有两个方面,我们可以在第一天就关注性能问题,尤其是在性能不足的时候。
And these are two areas, where I think we can focus just day one around performance when we're not getting enough of it.
但我总是告诉班上所有人,不要被这些东西搞得束手束脚。
But I tell everybody in my classes all the time, I go, don't become paralyzed by all this stuff.
你必须先让你正在做的东西运行起来,然后才能分析和衡量哪些是有效的。
You have to get whatever it is you're working on working first, and then you can profile and measure what's working.
但性能分析工具非常出色。
But the profiling tooling is amazing.
对吧?
Right?
你可以看到那些容易解决的问题,然后找出真正值得投入时间的地方。
You can see the low hanging fruit and then look at where you can spend real time.
你的精力应该用在哪儿?
Where does your time need to be?
然后这些工具会帮助你理解:我该如何在这里获得更好的性能?
And then these things kick in to help you understand how can I get some better performance here?
我有没有考虑到缓存系统?
Am I not being sympathetic with the caching system?
我有没有考虑到操作系统?
Am I not being sympathetic with the operating system?
我有没有考虑到垃圾回收器?
Am I not being sympathetic with the garbage collector?
因为我在这里分配了太多不需要的东西。
Because I'm just allocating too much stuff here where when I don't need to.
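作为一个假设性的示例,下面用标准库的 testing 包来衡量每次操作的分配次数(字符串内容纯属演示):这是在深入性能分析之前,用数字看到“我在这里分配太多了”的一种快捷方式。
As a hypothetical sketch of measuring that allocation pressure with the standard library's testing package (the strings are arbitrary): this is one quick way to see "I'm allocating too much here" in numbers before reaching for the full profiler.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// concat grows a string with +=, allocating a fresh string on almost
// every iteration and putting pressure on the garbage collector.
func concat(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// join builds the same result with strings.Builder, which amortizes
// the allocations into a growing buffer.
func join(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := []string{"mechanical", " ", "sympathy", " ", "in", " ", "go"}
	for name, fn := range map[string]func([]string) string{"concat": concat, "join": join} {
		// testing.Benchmark works outside `go test` and reports allocations.
		r := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				fn(parts)
			}
		})
		fmt.Println(name, "allocs/op:", r.AllocsPerOp())
	}
}
```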
那数据导向设计呢?
How about the data oriented design?
我的意思是,我理解我们不希望在还不知道需要优化什么、在哪里优化之前就提前最大化性能。
I mean, I understand we don't want to maximize performance ahead of time before you know what you need to optimize and even where you need to optimize.
数据导向设计这个概念呢?
How about the concept of data oriented design?
我完全可以想象你用一种并非数据导向的方式来设计软件,并且让它跑起来。
I can totally see you designing your software in a way that is not data oriented and still making it work.
你可能会有性能问题,也可能没有。
And you might or might not have performance issues.
但假设你确实有,你知道,你想做一些调整。
But let's say you do, you know, you want to change things around.
在我看来,如果你一开始没有以数据导向的方式思考,之后要做的改动将会非常大。
It seems to me, if you didn't start out thinking in a data oriented way, the changes will be so great.
相比之下,如果你从一开始就以这种方式思考,重做的代价就不会那么大。
The redoing will be so great, versus if you had started out thinking in that way.
我们是否应该这样做,不是为了过早地优化性能,而是因为从数据导向设计开始,是否还能带来超越性能的其他好处,比如代码的可读性、可维护性之类的?
And should we be doing that not so much in terms of let's try to optimize performance too early, but is there are there payoffs of starting out with data oriented design that go beyond performance, maybe just code readability, maintainability, that kind of thing?
Go 是一种面向对象的编程语言,但我不希望人们用 Go 写面向对象的程序,我认为这就是界限。
So Go is an object oriented programming language, but I don't want people writing object oriented programs in Go, and I think that's the line.
如果你在写面向对象的软件,你就没有先考虑数据,而是在考虑所有那些关系。
I think if you're writing object oriented software, you're not thinking about the data first, you're thinking about all of those relationships.
面向对象的编程设计最终往往会形成链表。
Object oriented programming designs tend to create linked lists at the end of the day.
它们就是在做这件事。
That's what they're doing.
在我看来,它们不契合当今硬件的工作方式。
They're, to me, not sympathetic with the way the hardware works today.
对我来说,这关乎尽可能分离你所处理的数据和作用于这些数据的行为。
For me, this is about separating where you can, the data that you're working on and then the behavior that's going against that data.
我是函数的忠实粉丝。
I'm a big fan of functions.
我热爱函数。
I love functions.
当初接触Go语言时,最棒的一点就是我能重新使用函数了。
One of the things that was so great when I got into Go was that I had my functions back.
并非所有东西都必须作为类的方法存在。
Not everything had to be a method on a class.
而且我认为,当你以‘这是我的状态,这是一些行为’的方式使用函数时,它们也能帮助大幅减少代码量。
And I think functions can also help reduce a huge amount of your code when you're using them in a sense where here's my state and here's some behavior.
我的意思是,方法在Go语言中扮演着非常重要的角色。
I mean, methods play a huge role in Go.
我并不是说你就不会使用方法。
I'm not saying that you're not gonna have methods.
但对我来说,关键在于不要以面向对象的设计模式来架构事物,而是要真正思考:这是数据,这些是操作,这是输入,这是输出,以及如何用最少的代码来实现它?
But for me, it's about not architecting things in terms of object oriented design patterns, but really thinking about: this is the data, these are the manipulations, this is the input, this is the output, and how do I do that with the least amount of code?
数据导向设计这个概念我认为源自游戏编程领域。
And now the data oriented design concept came from the game programming world, I I believe.
是的。
Yes.
而且他们面临的问题也非常相似。
And a lot of their problems were similar.
对吧?
Right?
他们需要事情快速发生,因为他们需要高帧率。
They needed things to happen fast because they need high frame rates.
因此,他们尝试以一种方式组织代码,使他们处理的数据在空间上集中在一起。
So they tried to start organizing their code in a manner so that the data they were working with was spatially located.
他们将经常一起使用的数据分组在一起进行传递,而不是处理对象。
So they grouped the data they worked with commonly together to pass around versus working with objects.
是的。
Yes.
对。
Right.
他们必须在固定的时间内完成若干任务,而时间是不会变的。
They have to do n number of things in x amount of time, and time is not changing.
对吧?
Right?
所以他们必须让这一切发生。
So they have to make that happen.
所以,是的,他们逐渐意识到,他们必须比任何人都更注重与硬件的协同。
And so, yeah, they started to learn that they had to be even more sympathetic than anyone else.
但我认为,只要可行、合理,充分利用切片这个概念,就能在你不知不觉中为你带来很多好处。
But I think the slice, the idea of being able to leverage the slice as much as possible when it is practical and reasonable, is giving you a lot of this without you even realizing it.
这就是我喜欢Go语言的一点:Go语言给了我们所需的东西,并通过说‘我不会给你其他数据结构’来引导我们走向这些理念。
And that's one of the things I love about Go is that Go has given us the things that we need and is pushing us towards these things by saying, well, I'm not going to give you any other data structure.
我只给你映射和切片。
I'm giving you maps, I'm giving you slices.
即使是映射,其底层也在利用连续内存。
Even maps are leveraging contiguous memory underneath.
对于所有引用类型值,如果我们不共享或传递它们,每个人都会得到一份副本。
Then with all the reference type values, if we're not sharing them, or passing them around, everybody gets a copy.
你只是在获得这些东西。
You're just getting these things.
但没错,我想我最早是在迈克·阿克顿2014年的一次演讲中听到这个术语的,他深入探讨了底层细节,有些内容我甚至都看不懂,讲的是他在构建游戏系统时如何利用数据导向设计。
But yeah, I think I first heard the term from Mike Acton in a talk from 2014, where he goes into lower-level detail, some of which I can't even understand, about how he's leveraging data oriented design in the gaming systems he's building.
所以当我们研究这期节目时,我发现了‘伪共享’这个术语。
So when we were researching this show, I came up I found the term false sharing.
这个概念在这个整体图景中是如何体现的?
How does that fit into this whole picture?
没错。
Yeah.
现在我们开始深入到硬件层面了。
So now we're getting really deep inside the hardware a little bit.
但关键是,由于每个核心都会加载缓存行,如果你有两个线程,分别运行在两个不同的核心上,并且它们操作的数据恰好位于同一个缓存行中,那么实际上你就有了两份该数据的副本——一份在核心一的缓存中,另一份在核心二的缓存中。
But the idea is that because every core is going to be loaded with cache lines, If you have, let's say, two threads, one each running on a different core, working with the same data that happens to be on the same cache line, you now technically have two copies of that data, one in the cache for core one and one in the cache for core two.
因此,即使每个线程只是操作缓存行中的不同字节,你也不会遇到并发问题。
And even if each thread is working with a different byte on that cache line, you don't have a concurrency issue there.
你也不会有数据竞争的问题。
You don't have a data race issue.
但你确实会遇到一种情况:同一份数据被重复存储在两个不同核心的缓存中。
But you do have a situation where the same data is now duplicated inside of a cache for two different cores.
而虚假共享正是由此产生的。
Now the false sharing comes in because of that.
从你的角度来看,这其实是‘虚假’的共享,因为你并没有真正地共享数据。
It's false because, from your perspective, you're not really sharing data.
但从硬件的角度来看,这些数据确实被共享了。
But from the hardware perspective, this data is being shared.
虚假共享的问题并不源于读取操作,因为如果你只是读取数据,问题只会在数据被修改时出现——一旦某个核心上的线程修改了该缓存行中的任何数据,所有其他核心中该缓存行的副本都会被标记为无效。
The problem with false sharing doesn't come from reading, because if you're only reading data there's no problem. It's when that data gets mutated, because as soon as one thread on one core mutates any data in that cache line, all other copies of that cache line in all other cores now have to be considered dirty.
当另一个线程想要操作该缓存行时,它的缓存行副本是脏的,此时你必须等待新的缓存行版本加载进来。
And when that other thread goes to do something on that cache line, its own copy of the cache line, and it's dirty, you now have to wait for a new version of the cache line to come in.
这可能会导致性能问题。
That can create performance problems.
Scott Myers 举的一个例子是,有人创建了一个全局的计数器数组。
An example that Scott Myers uses is that somebody's created a global array of counters.
这些计数器全部位于同一个缓存行上,假设共有16个计数器,你启动16个线程,每个线程获取该缓存行上的一个独立计数器索引,且每个线程都运行在独立的核心上。
So all of these counters, let's say there's 16 counters all on the same cache line, and you launch 16 threads, each thread getting its own index of counter on this cache line, and all 16 threads getting their own core.
现在,这个缓存行及其中的计数器共有16份副本,每个核心各持有一份。
We now have 16 copies of this cache line and its counters, one in every single core.
每当一个线程写入并递增其计数器时,其他15个缓存副本都必须被标记为脏。
And every time one thread writes, increments its counter, all 15 other caches now have to be marked as dirty.
你正在频繁地访问内存,因为每个线程对其计数器执行自增操作时,都会迫使其他所有线程等待其缓存行副本被更新。
And you're chugging through memory because every thread that does a plus plus on their counter is causing every other thread now to have to wait for their copy of the cache line to get updated.
这正是伪共享的真正含义。
So that's really what false sharing is all about.
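The Scott Meyers counter example can be sketched in Go. This is an illustrative sketch, not code from the show, and the function and constant names are invented; the slowdown itself only shows up under a benchmark, but the layout is the point — all the counters share one backing array, so neighboring counters land on the same cache line.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	numCounters = 16
	incs        = 100_000
)

// falseSharingCounters packs all counters into one contiguous array, so
// roughly eight 8-byte counters share each 64-byte cache line. No goroutine
// ever touches another's counter (so there is no data race), but every
// increment dirties the line in every other core's cache.
func falseSharingCounters() [numCounters]int64 {
	var counters [numCounters]int64
	var wg sync.WaitGroup
	for g := 0; g < numCounters; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < incs; i++ {
				counters[g]++ // heavy cache-line ping-pong between cores
			}
		}(g)
	}
	wg.Wait()
	return counters
}

func main() {
	fmt.Println(falseSharingCounters())
}
```

The results are still correct; the cost is purely in the cache-coherency traffic the hardware has to do behind the scenes.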
所以一个例子就是,如果你有一个单一的后备数组来存放所有的计数器。
So an example of that would be if you had a single backing array holding all of your counters.
是的。
Yep.
所以我们经常看到这样的情况。
And we see stuff like this all the time.
对吧?
Right?
所以,比如在你的包中,如果你有一个公开导出的数组或切片,而且它没有被追加过。
So, like, in your package, if you had a publicly exported array, or slice for that matter, that just isn't appended to.
嗯,即使它被追加过。
Well, even when it is appended to.
但举个例子,你有一个包含八个八字节整数的数组,你用作计数器。
But for example, you have an array of, you know, eight 8-byte integers that you're using as counters.
所以你的例子就是,如果使用这些计数器的每个线程被调度到不同的CPU或核心上,那么递增其中任何一个都会导致该特定缓存行的所有缓存失效。
So your example would be if each one of the threads using those were scheduled onto different CPUs, or cores, that incrementing any one of those would cause all of the caches to be blown out for that particular cache line.
没错。
That's right.
因为从你的角度来看,你并没有缓存。
Because from your perspective, you're not caching.
但从硬件的角度来看,你确实有缓存。
But from the hardware perspective, you are.
因为每个核心都有自己独立的相同缓存行副本。
Because every core has its own copy of that same exact cache line.
所以我想这其实呼应了你的整个数据导向设计。
So and I I guess this kind of echoes back to your whole data oriented design.
对吧?
Right?
因为如果你把正在处理的所有数据都保持在本地,它们就不会被集中放在其他地方。
Because if you were keeping all the data locally that you're working with, it wouldn't be grouped together somewhere else.
对吧?
Right?
因为计数器放在一起没有意义。
Because the counters don't make sense together.
它们各自独立才有意义。
They make sense individually.
因此,由于每个goroutine栈以及在该特定情况下任何goroutine的栈帧都将位于其独有的缓存行上,解决此类问题的方法是在局部变量上执行计数器操作,该变量将位于每个线程(或在此情况下每个goroutine)的缓存中。
The answer to that is: since every goroutine stack, and the stack frame in that particular case for any goroutine, is going to be on its own unique cache line, the solution to something like that would be to perform your counters on a local variable that would be in the cache for each goroutine, or each thread in that case.
因此,每次每个线程执行加一操作时,都是在独有的缓存行上进行的。
Therefore, every time each thread performs a plus plus, it's on a unique cache line.
在该算法结束时,你可能会对全局进行一次最终写入,这不会对你造成影响。
And at the end of that algorithm, you might perform one last write to the global, and that's not going to hurt you.
那就是一次性搞定,砰砰砰砰。
That's one time boom, boom, boom, boom.
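The local-variable fix Bill describes might look like this in Go. Again a hypothetical sketch with made-up names: each goroutine accumulates on its own stack, which sits on its own cache line, and performs exactly one final write to the shared slice.

```go
package main

import (
	"fmt"
	"sync"
)

const incs = 100_000

// localThenPublish has each worker count in a stack-local variable and
// publish its result with a single write at the end, instead of hammering
// a shared cache line on every increment.
func localThenPublish(numWorkers int) []int64 {
	results := make([]int64, numWorkers)
	var wg sync.WaitGroup
	for g := 0; g < numWorkers; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			var local int64 // lives on this goroutine's stack
			for i := 0; i < incs; i++ {
				local++ // no shared cache line is touched here
			}
			results[g] = local // one last write to shared memory
		}(g)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(localThenPublish(8))
}
```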
数据局部性,当我们讨论的不仅是读取还有写入时,也能在与缓存系统工作方式相协调方面提供巨大帮助。
Data locality, when we're talking about not just reading, but writing, can also add a huge help in terms of being sympathetic with the way the caching system works.
所以如果你必须为每个人挑选几个要点,在开发过程中要意识到的事情,至少开始这段与硬件更协调的旅程,这些要点会是什么?
So if you had to pick just a couple of takeaways for everyone, things to be cognizant of when developing, to at least start the journey of being more sympathetic to the hardware, what would those be?
我告诉每个人,如果你不确定如何做某事,就问问什么是做这件事最符合语言习惯的方式,然后照着做。
I tell everyone, if you're not sure how to do something, ask the question: what is the most idiomatic way to do this in Go? Then do that.
因为许多这类答案已经针对与操作系统或硬件保持友好进行了优化。
Because many of those answers are already tuned to being sympathetic with the operating system or the hardware.
我告诉人们的下一件事是,如果你在处理数据,尝试以值切片作为你的初始数据加载方式。
The next thing I tell people is if you're working with data, try to work with slices of values as your initial load of data.
你可以共享这些切片的不同元素,但你所处理的核心数据,我们尽量保持其连续性。
You can share different elements of those slices, but the core data you're working with, we try to keep it as contiguous as possible.
这不可能完美,因为你将处理字符串和引用类型,它们会指向其他东西。
It's not going to be perfect because you're going to have strings and you're going to have reference types that will have pointers to things.
但编译器是一个工具,如果我们与它合作,它会尽力帮助我们实现这一点。
But the compiler is a tool, it's going to do its best if we work with it to help us there.
尝试思考,当你处理非常大的数据集时,任何给定时刻你可能正在处理的工作集是什么?
Try to think about if you're working with very large sets of data, what are the working sets that you might be working with at any given time?
尽量将这些数据集中在一起,并尽可能避免使用链表之类的东西,因为它们无法帮助你形成可预测的访问模式。
Try to keep that together and really try to avoid when you can and when it's practical, things like link lists that are not going to really help you create predictable access patterns.
有时候,无论你在做什么,你试图构建的算法并不适合数组和线性遍历等结构,事实就是如此。
There are times where, whatever you're doing, the algorithms you're trying to build are just not going to be practical for arrays and linear traversals and things; it is what it is.
但我认为很多时候,你可以以某种方式布局数据,并在某种方式下工作,从而获得这些协同效应,同时仍然实现你试图实现的算法。
But I think a lot of times you can lay that data out in a way and work within a way where you can gain these sympathies and still implement the algorithms that you're trying to implement.
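As a hypothetical illustration of the "slices of values" advice: keeping a working set in one contiguous slice gives a linear, predictable access pattern, unlike a linked list of separately allocated nodes. The `item` type and field names here are invented for the sketch.

```go
package main

import "fmt"

// item is a small value type; a []item lays all elements out
// contiguously in one backing array.
type item struct {
	Price int64
	Qty   int64
}

// sumPrices walks the slice front to back: a linear, prefetch-friendly
// traversal over contiguous memory, with no pointer chasing.
func sumPrices(items []item) int64 {
	var total int64
	for i := range items {
		total += items[i].Price * items[i].Qty
	}
	return total
}

func main() {
	items := []item{{Price: 10, Qty: 2}, {Price: 5, Qty: 3}}
	fmt.Println(sumPrices(items)) // 35
}
```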
那么在更低层级上布局数据呢?
So what about laying out your data at a lower level?
我记得几个月前我们讨论过这个,你来坦帕拜访我们时,谈到了结构体的大小以及将它们保持在字边界内的问题。
I know when we talked about this a few months ago, when you came to visit us in Tampa, you talked about the size of the structs and keeping them within word boundaries.
这如何影响处理速度?
How does that affect processing speed?
我不太确定。
I'm not really sure.
如果我那么说过,那可能表述得不准确;关键在于这个结构体是不是数据。
If I said that, I may have misspoken. It depends on whether the struct is data.
对我来说,当我查看结构体时,我会从两个不同的角度来看待它。
To me, when I look at a struct, I look at it in two different ways.
我要问的是,这个结构体代表的是纯粹的数据,还是某种概念——比如一个goroutine池?
What I'll ask is, does this struct represent pure data, or is it a struct that's going to be some concept, like a pool of goroutines?
我正在创建一个goroutine池。
I'm creating a pool of goroutines.
我会创建这个东西的多个实例。
I'm going to create multiple instances of this thing.
它在管理goroutine,就让我干活吧。
It's managing goroutines, just let me do work.
这是一种情况。
That's one thing.
但如果这个结构体是纯粹的数据,那么它的大小就应该是它所需的大小,无论是四千字节还是24字节,它就是那样。
But if the struct is going to be pure data, then the size of that struct is what it needs to be, whether it's four k or 24 bytes, it is what it is.
但那时我关注的是填充的概念。
But what I'm looking at then is the concept of of padding.
对吧?
Right?
如果这是一个纯数据结构,我要创建十万甚至更多这样的结构体,并将它们在内存中连续排列,我就不会希望字段的排列方式导致中间出现多余的填充字节,从而让我使用比实际需要更多的内存。
If it's pure data where I'm gonna create a 100 thousands of these structs and even lay them out contiguously in memory, I don't wanna lay the fields out in such a way where I'm going to get extra padding bytes in between that's gonna cause me now to have to use more memory than I need to.
但这种情况仅在我认为这个结构体是纯粹的数据时才成立。
But that's only when the struct in my in my view, the struct is really pure data.
除此之外,我希望按照结构体所承担的功能,以逻辑上合理的方式排列字段。
Other than that, I want to lay fields out in a struct that makes sense organizationally to what that struct is doing.
是的,抱歉。
Yep, sorry.
我本来想说,如果你看一下结构体,当结构体足够大,无法全部放入一个缓存行时,这可能会成为一个因素,对吧?
I was just going to say: one of the things I think might come into play there, if you look at structs, is if the struct is large enough that it doesn't all fit in a cache line, right?
如果你把属性放在结构体的顶部和底部,在进行常规操作时,可能会不断导致缓存行失效。
If you're using properties at the top of it and at the bottom of it, you could keep blowing out the cache line as you're just doing typical work.
因此,有时为了确保经常一起使用的数据能正确对齐,将它们分组排列是有意义的。
So I think sometimes it might come into play to organize your struct in a manner so that the things that are often used together are grouped together to ensure that they align properly.
但我的意思是,这已经深入到性能优化的细节了。
But I mean, this gets into going in-depth into performance optimizations.
有时候这有点过头了。
And sometimes it's a little too far.
这可能是一种微观优化。
Can be a level of micro optimization.
如果这些结构体,比如说,跨越了64字节,它们仍然分布在两个缓存行上,而下一个结构体可能又延续到下一个缓存行。
If these structs, let's say they do span over 64 bytes, they're still being laid out across two cache lines, and the next struct might spill into the next cache line.
所以你可能还是会遇到同样的问题。
So you still might see the same problems anyway.
如果你开始修改这些内容,那么我们就又回到了虚假共享的问题。
If you start mutating these things, right, then we go back to the false sharing issues.
而且,你知道,现在的硬件设计得非常擅长快速复制数据。
And, you know, the hardware today is designed to copy copy data really fast too.
我告诉人们,别因为觉得你的结构体太大而恐慌,觉得现在我们必须到处共享它。
I tell people, don't panic because you think you've got a struct that's too large to copy and now we're just going to start sharing it everywhere.
直到你进行性能分析,真正了解情况之前,都不要轻易下结论。
Until you do some performance profiling, you don't really know.
所以我更希望代码能围绕我们要解决的问题保持合理,而不是在编写代码时就开始考虑性能。
So I'd rather the code be really reasonable around what we're trying to solve and not start thinking about performance as you're writing the code.
我们总可以在之后再进行性能分析和优化。
We can always go in performance and profile it later.
然后我们可能会决定,你知道吗?
Then we just might decide that, you know what?
是的。
Yeah.
根据我们的使用方式,这个结构体太大了,不适合复制。
This was too large to make copies of based on how we're using it.
在这种情况下,跨程序边界共享它性能更好。
And it was better performing and sharing this across these program boundaries.
有哪些简单的方法或经验法则可以帮助人们实现这种面向数据的设计,即把会一起使用的数据分组放在同一位置?
What are some easy things to do, some easy rule of thumbs that can help people achieve this data oriented design, thinking about grouping data that you're going to use together in the same place?
在开始一个程序时,你是如何思考这些问题的?
When you start out a program, how do you think about these things?
我坚信我们解决的每一个问题都是数据问题。
I really believe that every problem we solve is a data problem.
这本质上是某种数据操作。
It's some data manipulation.
因此,在项目一开始,我首先会问:我们处理的是什么数据?
And so the very first thing I'm doing on projects is I'm asking what is the data that we're working with?
我的输入是什么?
What is my input?
我们想要实现什么目标?
And what is it we're trying to achieve?
我们要到达哪里?
Where are we trying to get to?
这是我的输入,这是我的输出。
Here's my input, here's my output.
然后我们才能开始思考如何从这里到达那里。
And then we can start thinking about how we're to get from here to there.
有时这些问题非常复杂。
Sometimes these are really complex problems.
我们必须将它们分解为非常小的、可实现的、更小的数据转换问题。
We've got to break them down into really small, obtainable, smaller data transformation problems.
当我开始思考这些数据是什么样子时,我会考虑:这部分是纯粹的数据吗?这部分是否更多是围绕我们如何进行操作而构建的?还有Eric和Brian之前提到的,我们知道这些数据可能会跨越多个缓存行,因为数据量相当大。
That's for me when I start thinking about what does this data look like, is some of this pure data, is some of this more constructs around how we want to do the manipulations, and then things that Eric and Brian were already saying, well, we know that this is going to go across maybe multiple cache lines as a pretty large data.
我们能否将工作集组合在一起?
Can we group the working sets together?
诸如此类的问题。
These types of things.
我不会因此完全陷入瘫痪,因为我们必须解决问题。
I don't get completely paralyzed over it because we have to solve the problem.
如果你连东西都搞不起来,就几乎无法做任何这些事情。
If you don't get something to work, you can't apply any of this.
你必须先让某些东西运行起来。
You gotta get something to work first.
但我认为Go语言的精妙之处在于,它推动我们朝着第一次就做对的方向前进——只要我们遵循惯用法、使用值切片,并按照社区近年来倡导的方式行事。
But I think what's brilliant is Go is pushing us in the direction to do things fairly right the first time, if we follow the idioms, if we work with slices of values, if we're doing things the way that, as a community over the last few years, we've been directing people to do.
你有什么资源可以让我们开始探索这些概念吗?
Do you have any resources for us to go out and start exploring these concepts?
有的。
Yeah.
在Go培训的GitHub仓库ardanlabs/gotraining下,我有一个名为Reading的文件夹。
On the Go training GitHub repo, under ardanlabs/gotraining, I actually have a folder in there called Reading.
我整理了大量链接供大家阅读,其中有一整个部分是关于CPU缓存、Linux操作系统、调度器工作原理等内容的。
I've got a ton of links that I've pulled out for people to read and there's a whole section there around CPU caches and the Linux operating system and how the scheduler works and things like that.
在每一节的培训材料中,都有大量链接和资源可供深入学习。
Throughout the training material for each section, there's a ton of links and resources to learn more.
我所知道的一切都来自这些视频和文章,而且我也会经常重读它们,因为内容实在太丰富了。
Everything that I know comes from these videos and articles, and I'm always rereading them as well, because there's so much there.
有时候我需要花几个月时间消化某些内容,之后会回头再读一遍,获得更多收获。
It takes me sometimes a couple of months to absorb some stuff and then I'll go back and read it again and get more.
所以,是的,这些东西都能找到,我努力整理了一个很好的合集,全部都在培训仓库里。
So yeah, it's all out there and I've tried to create a good collection of this stuff and it's all there in the training repo.
好吧,我们稍微换个话题。
Alright, have a little bit of a change of a subject.
现在有一个草根运动,我不确定你是否知道,今年有好几个人在谈论在GopherCon上扮演你。
There's a grassroots movement going around, I'm not sure if you're aware of it but there are several people that are talking about cosplaying as you this year at GopherCon.
你知道这件事吗?
Did you know about this?
你能给他们一些建议,比如去哪里找帽子吗?
Could you give them some advice on maybe where to find the hat?
再说一遍。
Say that again.
他们想要帽子?
They want the hat?
他们今年打算在GopherCon上装扮成你。
They're they're looking to dress up as you this year at GopherCon.
好几个人都提到了这件事。
Several people have mentioned it.
玩这个游戏的成本太高了。
That is too expensive to be playing that game.
我要把这件事说出来。
I'm I'm gonna put this out there.
任何装扮成比尔·肯尼迪的人都可以免费喝啤酒。
Free beer for anybody who comes dressed as Bill Kennedy.
我的天。
My god.
我会来一场彻夜狂欢。
I will make an all-out night of it.
真有趣。
Funny.
我甚至会偷走那顶真正的帽子然后送给你,如果这正是你缺少的。
I will even steal the real hat and give it to you, if that's all you're missing.
哦,这太值了。
Oh, that's priceless.
我们知道你很忙,比尔。
So we know you're a busy guy, Bill.
你在GopherCon上有工作坊。
You've got workshops going on at GopherCon.
你还有书在进行中。
You've got the book going on.
除了你的培训之外,你还有什么想告诉我们关于你正在做的这些事情的吗?
Is there anything you wanna tell us about of any of those things that you've got going on other than your training?
我的意思是,这些培训总是非常令人兴奋。
The the the thing I mean, the trainings are always really exciting.
我非常期待在GopherCon第三天举办NATS工作坊,但目前让我特别兴奋的是,我和Carlicia通过GoBridge启动了一个远程线下活动平台,我们现在正在集结一批顶尖的演讲者,他们将在六月和七月开始演讲。
I'm really excited to be doing a NATS workshop on the third day of GopherCon, but I think one of the things I'm really excited about right now is that Carlicia and I, through GoBridge, started the remote meetup platform, and we're putting an all-star lineup of speakers together right now that will start speaking in June and July.
这将非常棒,因为无论你住在哪儿,每个人都能聚在一起,这个平台在促进协作方面真的非常出色。
It's going to be awesome because it doesn't matter where you live, everybody's going to be able to come together, and the platform, the BigMarker platform, is really amazing in terms of being able to have collaboration.
但我们的真正目标并不是让GoBridge拥有一个远程线下活动,而是让任何地方的人——无论他们住在哪个小镇或偏远地区——都能自己发起线下活动,找到他们感兴趣的主题和演讲者,即使他们是世界上某个小地方唯一的人也没关系。
But the real goal for us here is not for GoBridge to have a remote meetup, for anybody, no matter where they live, to be able to start their own meetup, to be able to find their speakers of the things that they're interested in and and have a meetup even if they're the only person that lives in this small town or remote area of the world.
发起一个线下活动,找到志同道合的人,寻找你自己的演讲者,然后开始交流。
Start a meetup, find people with similar interests, find your own speakers, and start to meet.
你知道的?
You know?
我真的很希望到今年年底,我们能看到再有十到十五个Go线下活动,都是围绕这个远程线下活动的理念开展的。
I'm I'm really hoping that we can see another 10 or 15 Go meetups by the end of the year all being driven around this idea of a remote meetup.
这是个很棒的想法。
That's a neat idea.
是的。
Yeah.
布莱恩和我通常去不了坦帕的那场活动。
Brian and I commonly don't make it to the Tampa one.
我的意思是,时间总是不等人。
I mean, time gets the better of you.
所以。
So
对。
Right.
我知道有很多人来找我,即使我在迈阿密,旧金山那边有我想听的演讲者举办的活动,我也去不了,有时候真让人沮丧。
I know so many people that come to me about this. It even gets me: when I'm in Miami and San Francisco's holding a meetup with people that I want to hear, and I can't get out there, it can be depressing sometimes.
但很棒的是,你可以自己发起一个线下活动,来自世界各地的演讲者都能参与,你就不会觉得自己错过了什么。
But what's great about this is you're going to be able to really start your own meetup, and speakers from all around the world can come in, and you don't have to feel like you're missing out.
我非常喜欢Go社区。
And I love the Go community.
我的意思是,任何人都可以联系Brian,问他:Brian,你能为一场线下聚会做个演讲吗?
I mean, anybody can reach out to Brian and say, Brian, can you give a talk for a meetup?
Brian一定会答应的。
And Brian's gonna say yes.
他会答应的。
He will say yes.
我也会答应的。
I will say yes.
有人会替他答应的。
Somebody will say yes on his behalf.
我没法让Eric答应,但总有一天我们会让他答应的。
I can't get Eric to say yes, but we're gonna get Eric to say yes one day too.
总有一天。
One day.
比尔,你能提一下已经同意举办聚会的那些人吗?
Bill, do you wanna mention some of the people that have already agreed to do a meetup?
是的。
Yeah.
我们有——希望我没念错她的姓氏——Dropbox的Tammy Butow,她已经安排好了演讲。
So we have, I hope I'm pronouncing her last name right, Tammy Butow from Dropbox, who has scheduled a talk.
我还没公布这个消息。
I haven't published this yet.
Kelsey Hightower也同意来做一次演讲。
Kelsey Hightower has agreed to give a talk too.
我会很快在那些日期公布出来。
I'll be publishing that very soon on the days that are there.
我们已经联系了更多人。
We've reached out to a few more people.
我还没收到确认回复,希望很快会有消息,我们会把信息发布在聚会页面上,也会在推特上发布,对此我们真的非常非常兴奋。
I haven't gotten confirmations yet, so hopefully they're going to be coming in soon and we'll publish that on our meetup page and, you know, we'll tweet that out and we're really, really excited about that.
这真是太棒了。
That's really awesome.
是的。
Yes.
我建议大家去注册。
And and I suggest to people to sign up.
参会人数限制为100人。
There is a limit of a 100 attendees.
所以当你看到推文发布时,赶紧去注册,别错过机会。
So when you see the tweet going out, just go and sign up so you don't get left out.
我得说一下,是Compose赞助了我们的Plus账户,让我们能容纳100人。
I have to say, Compose is sponsoring our Plus account that gives us the 100 people.
我们非常高兴他们伸出援手,支持Go社区。
So we're really excited that they stepped up and they're supporting the Go community.
当然。
Absolutely.
太棒了。
That's great.
他们也是 GopherCon 的赞助商,所以更要给他们点赞。
They're also GopherCon sponsors, so double props to them.
没错。
Yep.
所以现在我主要专注于这一点,利用我有限的时间,尽量安排足够的演讲者,真正向人们展示这个平台的强大功能,让其他人也能加入并创办自己的线下聚会。
So that's what I'm kind of focusing on now with what little time I have trying to get enough speakers set up and really show people the power of the platform so others will come in and start their own Meetups.
归根结底,这正是我最希望看到的。
The end of the day, that's what I would love to see.
不过我觉得你还不够忙。
I don't think you're quite busy enough, though.
你又不是经常出差什么的。
It's not like you travel or anything.
我的意思是
I mean
嗯,你知道的,我在飞机上有好多时间。
Well, you know, I have a lot of time on planes.
在飞机上睡觉。
Sleep on them.
哦,我做不到。
Oh, I can't.
我在车里睡不着。
I can't sleep in cars.
我在飞机上也睡不着。
I can't sleep on planes.
总的来说,我就是睡不着,我想。
I I just in general, I can't sleep, I guess.
所以,通常来说,我想我们时间快到了,但通常我们结束节目时,都会感谢那些让我们的生活变得更美好、更轻松的开源项目,以表达感激之情。
So, I guess we're running out of time here, but typically the way we close out the shows is we like to thank open source projects that have made our lives better and easier, just to show appreciation.
所以我们会快速地在虚拟房间里轮流发言,每个人都可以简单地致谢一下。
So we'll quickly go around the virtual room here, and everybody can give a quick shout out.
比尔,如果你有合适的项目,欢迎分享。
Bill, if you've got one handy, you're welcome to join.
我最近在用的一个项目是Anvil IO,因为我参与了Coral这个开源项目;Anvil IO提供身份验证和授权功能。
One that I've been working with, because I do some work on the Coral Project, which is an open source project, is Anvil IO, which provides authentication and authorization.
它全部用Node编写,但我们已经在客户端添加了对Go的支持,这是一个非常棒的平台。
It's all written in Node, but we've added some Go support on the client side, and it's a really cool platform.
太棒了。
Awesome.
布莱恩呢?
Brian?
这周我想特别提到的一个项目是GoValidator。
So, one of the projects that I wanted to shout out this week was GoValidator.
链接会在节目笔记中,但如果你曾经需要验证传入数据,就知道编写电子邮件验证或信用卡验证的正则表达式有多痛苦。
The link will be in the show notes, but if you've ever had to validate inbound data, you know how painful it is to write that regex for email validation or credit card validations.
这是由亚历克斯·萨斯科维奇开发的项目,收集了所有可能需要对传入数据进行的常用验证,简直就是绝佳验证工具的宝库。
This is a project by Alex Saskevich that collects all of the important validations that you might need to do for incoming data, and it's just a treasure trove of good validations.
你知道吗?
You know?
即使你反对依赖项,这个工具你也应该拥有,因为它提供了一个非常井然有序的验证项列表,用于验证你的数据。
Even if you're against dependencies, this is one you want to have, because it's a very nicely organized list of things to validate your data.
太棒了。
Excellent.
Karlicia?
Karlicia?
我想表扬一下乔·菲茨杰拉德。
I want to give a shout out to Joe Fitzgerald.
我发音不太准。
I can't pronounce it properly.
他是为Atom维护所有Go包的人,做得非常出色。
He is the one who does all the Go packages for Atom, and he does an amazing job.
他开发了Go Plus、Autocomplete Go、Meta Linter和Tester Go等多个包。
He has Go Plus and Autocomplete Go, Meta Linter and Tester Go, a bunch of packages.
我经常使用它们。
I use them all the time.
他太棒了。
He's amazing.
他经常在Gophers Slack的编辑器频道上,非常乐于助人。
He's frequently on the editor channel on the Gophers Slack and very helpful.
我很喜欢他为Atom做的一切。
Love the things that he is doing for Atom.
谢谢你,Joe。
Thank you, Joe.
我甚至不知道还有个编辑器频道。
I didn't even know there was an editor channel.
这些频道出现得太快了。
These these channels pop up too fast.
居然还有个这样的频道?
It's like, There's a channel for that?
有没有一个烧烤频道?
Some is there a barbecue channel?
现在有了。
There is now.
有一个烧烤频道。
There is a barbecue.
是的。
There is.
嗯。
Yeah.
这问题真傻。
That's a silly question.
所以这很有趣。
So this is funny.
这有点跑题了。
This is kinda sidelining here.
但有人提到需要一个烧烤Gopher。
But somebody made a comment about, like, needing a barbecue gopher.
他们说,我们应该看看阿登的人会不会为我们做一个。
They were like, we should totally see whether the Arden guys will create one for us.
但显然已经有一个了。
And there apparently already exists one.
已经有一个Gopher站在烤架旁——或者,我记不清它具体长什么样了。
There's already a gopher, like, standing at a grill, or... I forget what it looks like now.
他站在烤架旁。
He's standing at a grill.
他戴着一顶牛仔帽。
He's got a cowboy hat on.
他系着围裙,手里拿着烧烤夹。
He's got an apron, and he's got the barbecue tongs.
而且我可以告诉你,衬衫已经下单了。
And I can tell you that the shirts have already been ordered.
我的在哪?
Where's mine?
已经在路上了,埃里克。
It's in the mail, Eric.
太棒了。
Sweet.
所以,我要感谢 HashiCorp。
So for me, I'm gonna thank HashiCorp.
特别是这周我用到了他们提供的LRU缓存,之前也多次使用过Vagrant、Vault、Consul等众多其他工具,都非常有用。
Particularly, I'm using their LRU cache this week that they have available, but many times before, you know, Vagrant, Vault, Consul, just so many other tools are useful.
所以我至少要提一下HashiCorp。
So I'm just gonna at least say HashiCorp.
所以我们鼓励大家通过Twitter或其他社交媒体感谢他们最喜欢的开源项目。
So we encourage everybody else to thank their favorite open source projects through Twitter or any other social media.
正如Brian在第一集中提到的,主动联系通常是一件好事。你知道,有时候得到人们的评论真的能带来很大的不同。
Reaching out is often just a good thing, as Brian spoke to in, I think, episode one. You know, just getting that comment from people makes all the difference sometimes.
那么,说到这里,我想我们时间到了,很遗憾,但这一集确实很有趣。
So with that said, I think we are out of time, unfortunately, but it has been quite a fun episode.
而且,我们绝对要感谢Bill来参加我们的节目。
And, we definitely wanna thank Bill for coming on the show with us.
而且我知道我自己,我会去深入研究他培训材料里的更多内容,因为我也有大把的空闲时间。
And I know myself: I'm gonna be digging through more of the stuff he's got in the training material, because I've got tons of free time too.
对吧?
Right?
我们所有人都是。
All of us.
没错。
Exactly.
谢谢你们邀请我参加。
And thanks for having me on.
这真的非常有趣。
It's it's been a lot of fun.
我们一定要感谢比尔,哦不,是布莱恩和卡莉西亚参与了这次小组讨论。
Definitely wanna thank Bill... or, I'm sorry, Brian and Carlicia, for the panel.
我觉得这是我自己最喜爱的一集之一。
I think this has been one of my favorite episodes.
感谢所有收听的朋友们。
Thank everybody who's listening.
我觉得亚当告诉我们,这周有25多人在线收听。
I think Adam told us what there's, like, 25 plus people listening this week live.
这太疯狂了。
That's crazy.
太棒了。
It's great.
它在增长。
It's growing.
是的。
Yeah.
所以我们发布了第一集,这既好又糟糕。
So we also released our first episode, which is both good and terrible.
吓人。
Scary.
是的。
Yeah.
是的。
Yeah.
确实吓人。
Definitely scary.
但你可以得到它。
But you can get it.
因此,在CMS完成之前,GoTime FM 将重定向到 Changelog 网站,我们的第一集就托管在那里。
So GoTime FM will redirect to changelog site where our first episode is hosted while the CMS is completed.
应广大听众要求,我们在CMS完成之前就开始发布剧集了。
By popular demand, we have started releasing episodes before the CMS is completed.
你可以在那里找到它们。
So you'll find that there.
而且可能在接下来的一周内,还会发布更多剧集,给所有等不及的听众。
And probably within the next week, some more episodes will be dropping for everybody who's impatient.
我不确定注册通讯的入口是否在那个网站上,但如果现在不在,很快就会有。
I don't know whether the newsletter signup is on that site, but if it's not, it will be there soon.
所以请持续关注 GoTime FM,重新订阅。
So keep checking back to the GoTime FM to to sign back up.
iTunes 大概会在一周半后上线,差不多是那个时间,因为他们审核总是拖很久,除非他们因为某种原因不喜欢我们的节目。
iTunes will drop, I think, in about a week and a half, something like that, because they take forever to approve, unless they tell us for some reason they don't like our show.
未通过。
Not approved.
所以,我们在Twitter上是GoTime FM。
So, we are on Twitter at GoTime FM.
当你实时收听时,可以加入 Slack 上的 Gotime FM 频道。
When you're listening live, join the GoTime FM channel on Slack.
你也可以和我们互动。
You can also socialize with us.
我有没有漏掉什么?
And did I miss anything?
我们有没有都讲全了?
Did did we get it all?
没有。
No.
这是一期很忙碌的节目。
It was a busy episode.
好的。
Alright.
太棒了。
Awesome.
那么,感谢大家参加本期节目,我们下周再见。
So with that, thanks everybody for being on the show, and we'll see you next week.
再见。
Bye.
再见。
Goodbye.
拜拜。
Bye.