本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
准备好启程探索脑机接口的奇妙世界吧。
Get ready to blast off into the incredible world of brain computer interfaces.
《Neurocareers·挑战不可能》将带您结识那些勇敢突破可能边界的先驱者。
Neurocareers: Doing the Impossible is taking you on a journey to meet the fearless pioneers pushing the boundaries of what's possible.
在本系列特别节目中,我们将聚焦国际BCI大奖的提名者与获奖者——这是脑机接口领域最负盛名的奖项之一。
In this special series, we'll be shining a spotlight on the nominees and winners of the International BCI award, one of the biggest and most prestigious awards in the BCI world.
您将聆听BCI专家分享他们的革命性工作,并一窥获奖背后的故事。
You'll hear from BCI professionals as they share their revolutionary work and get a behind the scenes sneak peek at what it takes to be a winner.
系好安全带,备好零食,让我们共同见证脑机接口世界中将不可能变为现实的奇迹。
So buckle up, grab a snack, and get ready to be amazed as we explore the impossible becoming a reality in the world of BCIs.
但在启程前,我要特别感谢本期BCI大奖播客的联合主持人——克里斯托夫·古格(Christoph Guger)博士与GTEC医疗工程团队。
But before we blast off, I want to give a big thanks to the co-host of this BCI Award edition of our podcast, Dr. Christoph Guger and GTEC Medical Engineering.
当我在辛辛那提大学和辛辛那提儿童医院医学中心任教时,我的团队为需要癫痫手术的患者完成了一些非凡的工作。
When I was a faculty member at the University of Cincinnati and Cincinnati Children's Hospital Medical Center, my team and I did some pretty amazing stuff for patients needing epilepsy surgery.
我们会将特殊传感器(也称为电极网格)直接放置在大脑表面,以比常规方法更快更安全的方式为手术做准备并绘制脑图。
We would put special sensors also called grids directly on their brain to prepare for surgery and create maps of their brain in a much faster and safer way than is usually done.
这被称为高伽马映射技术,能在短短几分钟内定位出手术中必须保留的关键脑功能区。
It's called high gamma mapping and it lets us figure out in just a few minutes the essential parts of the brain that need to be spared during surgery.
你能想象吗?
Can you imagine?
转眼间就能绘制出语言、运动甚至数学处理的脑功能图谱。
You can create a map of language, motor, and even math processing in no time.
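示意:高伽马映射的核心思想可以用一个极简示例来勾勒(仅为说明性草图,并非临床流程;函数名、频段与阈值均为假设):将ECoG信号带通滤波至高伽马频段(约70–170 Hz),比较任务期与基线期的功率,功率显著升高的通道即可视为任务相关脑区。
As a sketch only (not the clinical pipeline; the function names, band edges, and threshold are assumptions), the core idea of high gamma mapping can be illustrated like this: band-pass each ECoG channel into the high-gamma range (roughly 70–170 Hz), compare task versus baseline power, and flag channels with a clear power increase as task-related.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def high_gamma_power(signal, fs, band=(70.0, 170.0)):
    """Band-pass one channel into the high-gamma range and return its mean power."""
    nyq = fs / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    filtered = filtfilt(b, a, signal)
    return float(np.mean(filtered ** 2))

def map_active_channels(task, baseline, fs, ratio=2.0):
    """Flag channels whose task high-gamma power exceeds `ratio` x baseline power."""
    return [
        ch for ch in range(task.shape[0])
        if high_gamma_power(task[ch], fs) > ratio * high_gamma_power(baseline[ch], fs)
    ]

# Toy demo: channel 0 carries a 100 Hz burst during the task, channel 1 does not.
fs = 1000
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, (2, t.size))
task = rng.normal(0, 1, (2, t.size))
task[0] += 5.0 * np.sin(2 * np.pi * 100 * t)  # simulated high-gamma activity

print(map_active_channels(task, baseline, fs))  # [0]
```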
这是辛辛那提儿童医院医疗中心首次应用这项创新技术。
It was the first time this innovative technology was used at Cincinnati Children's Hospital Medical Center.
当我转职佛罗里达后,在当地医院创建了首个功能性脑图谱与脑机接口项目。
When I moved to Florida, I established the first functional brain mapping and BCI program at Florida Hospital.
我继续运用高伽马映射技术,帮助癫痫患者在术后保留语言和运动能力。
I continued to use high gamma mapping to help epilepsy patients avoid losing their ability to speak or move after surgery.
但更酷的是,我设计的脑机接口研究能让患者实时用大脑控制外部设备。
But even cooler than that, I created brain computer interface studies that let patients control things with their brain in real time.
我们的患者甚至仅凭大脑就能以惊人的速度拼写单词。
Our patients could even spell words with incredible speed by just using their brains.
完全不需要用手。
No hands involved.
我还发现,脑机接口可以帮助中风多年后手部或腿部长期受损的患者活动肢体,而其他方法对此束手无策。
I also discovered that brain computer interfaces could help patients move their chronically impaired hands or legs years after a stroke when not much else could help them.
因此我开始与Advent Health University的教职人员合作,帮助这些患者恢复手部功能。
So I started working with the faculty from Advent Health University to help these patients restore their ability to use their hands.
我为此接受了专业培训,现在回想起来依然觉得这项技术酷得不可思议。
I received special training for it and it's still mind blowing how cool it is.
我最喜欢的工作环节是教学。
My favorite part of what I do is teaching.
在我创立的神经方法研究所里,我将所有经验和知识整合成了一门独特的脑机接口课程。
At the Institute of Neuroapproaches, which I established, I've integrated all my experience and knowledge into a unique course on brain computer interfaces.
我会指导神经生物学专业的学生,他们先学习理论知识,然后我会给他们配备BCI设备,让他们获得脑机接口的实操经验。
I would mentor neurobiology students who started with theory, and then I gave them the BCI equipment so they could have hands-on experience working with BCIs.
这显著改善了学生的学习方式,他们非常喜欢这种学习如何使用神经技术的实践环节。
It significantly improved the way students learn, and they absolutely love this practical part of learning how to use neurotechnology.
而这一切的实现都要归功于GTEC卓越的脑机接口技术。
And all of this was made possible thanks to GTEC's awesome brain computer interface technology.
他们拥有从高达1,024通道的高科技脑电图系统到可穿戴设备的一切,包括用于神经康复的工具(如医疗级recoveriX系统),以及配备Unicorn Hybrid Black的教育套件,用于学习如何运用脑机接口技术。
They have everything from high-tech EEG systems with up to 1,024 channels to wearables, tools for neurorehabilitation such as the medical-grade recoveriX system, and educational kits with the Unicorn Hybrid Black for learning how to work with BCI technology.
但最重要的是他们的支持与关怀。
But the most important part is their support and care.
我很享受与克里斯托夫·古格博士及GTEC员工超过十五年的合作共事,并期待未来更多成功的合作。
I have enjoyed working and collaborating with Dr. Christoph Guger and the GTEC employees for over fifteen years, and I hope for many more successful years ahead.
所以如果你对GTEC的脑机接口和神经技术感兴趣,可以访问他们的官网gtech.net。
So if you are interested in GTEC's brain computer interfaces and neurotechnologies, check out their website at gtech.net.
非常值得一试。
It's worth it.
你好,尼克。
Hello Nick.
你好,迈特蕾伊。
Hello, Maitreyee.
今天能邀请你参加我们的播客真是太棒了。
It's wonderful to have you today on our podcast.
非常感谢你的到来。
Thank you very much for coming.
你能向我们的听众介绍一下自己吗?
And can you please introduce yourself to our listeners?
谢谢邀请我们。
Thank you for having us.
我是迈特蕾伊(Maitreyee)。
I'm Maitreyee.
我是加州大学戴维斯分校神经假肢实验室的博士后研究员,目前从事语音神经假肢的研究工作。
I'm a postdoctoral researcher in the neuroprosthetics lab at UC Davis, and I'm currently working on speech neuroprostheses.
我的研究方向是从大脑皮层神经数据合成语音。
And my focus is to synthesize voice from intracortical neural data.
我是尼克·卡德。
And I'm Nick Card.
我也是加州大学戴维斯分校神经假体实验室的博士后,该实验室是BrainGate联盟的一部分。
I'm also a postdoc in the same neuroprosthetics lab at UC Davis, which is a part of the BrainGate consortium.
我的研究同样聚焦于语音神经假体,但更侧重于大脑到文本的流程,与迈特蕾伊的大脑到语音方向不同。
And my research is also focused on speech neuroprostheses, but more of a brain-to-text pipeline rather than brain-to-voice like Maitreyee's.
嗯。
Mhmm.
我们将在你们的研究中看到这两条路线,这非常令人着迷。
And we will see both of those lines in your study, which I think is fascinating.
所以你们的工作将形成互补。
So you will be complementing each other.
你们是如何开展这项绝妙工作的?
How did you get to do this absolutely amazing work?
也许你可以带我们回到过去,告诉听众你们是如何对神经科学和神经技术产生兴趣的,又是如何发展到能够进行如此惊人研究的阶段的?
Maybe you can bring us in time into the past and tell our listeners how did your interest in neuroscience, neurotech develop, and how did you get to this stage that you can conduct such amazing studies?
嗯,你是资深博士后。
Well, you're the senior postdoc.
没错。
Right.
我的旅程始于英国。
So my journey started in The UK.
我在英国完成了本科和博士学位。
So I did my undergrads and PhD in The UK.
我在高中时就对脑机接口产生了兴趣。
I got interested in brain computer interfaces when I was in high school.
我非常想进一步探索这个领域,于是找到了一所拥有优秀研究项目的大学,并在那里攻读本科。
So I really wanted to explore that further, and I found a university with a great research program, which I enrolled in for my undergrad.
我在雷丁大学学习,这让我在本科早期就有很多机会参与脑机接口项目。
So I studied at University of Reading, which gave me plenty of opportunities to get involved in BCI projects early on in my undergrad.
因此我参与了关于无创脑电图BCI的暑期项目。
So I did summer projects on noninvasive EEG based BCIs.
后来我继续攻读博士学位,研究无创VCI的运动系统。
Later on, I continued with my PhD studying motor systems with non invasive VCIs.
我致力于开发用于中风康复的VCI,通过脑电图研究运动生成的神经机制。
So I worked on VCIs for stroke rehabilitation, looking at neural mechanisms of movement generation through EEG.
这就是我开启VCI研究之旅的起点。
And that is how I started my VCI journey.
之后我在伦敦帝国理工学院做博士后研究,期间拓展了LittleBrit项目,研究用于帮助痴呆患者的对话代理,并探索情感机器人技术。
Later on, I did my postdoctoral research at Imperial College London, again in The UK, where I diversified LittleBrit and I worked on conversational agents to help people with dementia and looking at affective robotics.
这种结合语音相关技术与VCI技术的工作让我非常着迷,Sergei和David在UC Davis建立新实验室(作为BrainGate联盟的一部分)时提供的这个机会堪称完美。
So this combination of working on speech related technologies and VCI technologies really fascinated me, and this was a perfect opportunity that was presented to us by Sergei and David, who were starting their new lab at UC Davis as part of BrainGate Consortium.
他们的研究重点是言语神经生理学,以及开发用于生成语音的BCI,这让我十分着迷。
So their focus was studying speech neurophysiology and making BCIs for generating speech, which was really fascinating to me.
于是我便向他们表达了我的兴趣。
So I presented my interest to them.
然后,是的,我加入了实验室。
And then, yeah, I joined the lab.
非常感谢。
Thank you very much.
尼克呢?
And Nick?
好的。
Yeah.
我的经历有些相似,不过是在美国国内。我在匹兹堡大学读本科。
So I have sort of a similar story, but from within the United States. I went to undergrad at the University of Pittsburgh.
大一那年,我第一次听说了脑机接口,觉得这非常酷。
And in my freshman year, I heard about brain computer interfaces for the first time, and I thought that sounded pretty cool.
后来我成功以本科生研究员的身份加入Aaron Batista的团队,研究神经群体如何编码运动信息,通过灵长类动物和脑机接口进行实验。
And I managed to start as an undergraduate researcher with Aaron Batista there studying neural populations and how those neural populations encode movement and doing this with primates and using brain computer interfaces.
我在他的实验室做了三年本科生研究员,之后留在匹兹堡读研究生,虽然仍研究灵长类动物的运动系统,但离脑机接口领域稍微远了一些。
So I spent three years as an undergraduate researcher in his lab. And then I stayed at Pitt for graduate school, and I still studied the motor system in primates, but it was a little bit of a step away from BCIs.
这更像是迈向基础科学的一步,专注于追踪神经回路这类研究。
It's more of a step toward basic science and tracing circuits and that type of thing.
我知道这段经历对我理解这些系统如何运作非常宝贵,但我也清楚研究生毕业后,我想尝试重返脑机接口领域,利用这些知识产生影响。
And I knew that that experience would be very valuable for me for learning about how these systems work, but I also knew that after grad school, I wanted to try to go back to the BCI field and leverage that knowledge and make an impact.
后来我听说谢尔盖·斯塔维斯基正在这里作为BrainGate的一部分组建新实验室,就给他发了邮件,经过面试后一切顺利。
So I heard that Sergey Stavisky was starting a new lab out here as part of BrainGate, and I sent him an email, and he interviewed me, and it worked out from there.
谢尔盖和大卫作为实验室的两位联合首席研究员,你很难找到比他们潜力更大的组合了。
And between Sergei and David, who are the two co principal investigators of this lab, I mean, you couldn't ask for a duo with more potential.
我觉得这样的组合确实很难得。
It would be hard to find, I think.
而来到这里后,这点已经得到了充分验证。
And that's certainly proven true since being here.
我是在迈特蕾伊加入几个月后进入实验室的。
So I joined the lab just a few months after Maitreyee.
那大概是一年半前的事了。
And that was, what, like a year and a half ago, something like that.
因此我们拥有了加入全新实验室的独特机会。
So we had a unique opportunity to join a brand-new lab.
因此我们得以真正参与建设,从零开始搭建所有系统、所有设备。
So we got to really help and build it up, build everything, all our systems, all our devices from scratch.
这是一次非常独特的经历,同时也非常有成就感。
So it was a very unique experience and very rewarding as well.
当然。
Of course.
从零开始构建一切,我认为没有比这更好的机会了。
To start everything from the beginning, I think there is no better opportunity.
那么你们认为是什么特质使两位成为这个职位的合适人选?
And what do you think made you both suitable candidates for this position?
我相信还有其他申请者,但在这些申请者中你们被选中了。
I'm sure there were other applicants, but from those applicants, you were selected.
你们认为是哪些技能帮助你们获得这个机会?
What do you think were the skills that helped you to get here?
对我来说,谢尔盖提到过这点,他比我更了解情况。
I think for me, Sergey has mentioned this, and he would know better than I do.
但我想,从猴子研究转向人类脑机接口研究的人似乎形成了一种趋势。
But there's, I guess, somewhat of a trend of people going from monkey research to human BCI research.
在他看来,那些人都取得了成功。
And in his opinion, those people were successful.
所以我认为这可能在他们选择我时起到了作用。
So I think that probably helped me in their selection of me.
但归根结底,我认为这全靠勤奋工作、投入大量时间,并通过本科和研究生阶段的研究证明自己确实有能力、工作努力、发表优质论文、具备批判性思维这类素质。
But at the end of the day, I think it's all just hard work and putting a lot of time in and, you know, trying to demonstrate in your undergraduate and graduate research that you're, yeah, competent and hardworking, putting out good publications, thinking critically, that type of stuff.
当然其中也包含很多运气成分。
And then there's also a lot of luck involved too.
当我寻找博士后职位时,这个实验室刚成立正在招聘博士后。
When I was looking for a postdoc, this lab was starting and trying to hire a postdoc.
有很多非常优秀的候选人因为时机不够幸运而错过机会,而我能遇上这样的时机实在非常幸运。
And there's a lot of really great candidates whose timing does not work out quite so luckily, and I'm very fortunate that it did for me.
是的。
Yes.
是的。
Yes.
在正确的时间出现在正确的地点,并具备合适的技能,这非常重要。
It's very important to be at the right place at the right time and have the right skills.
是的。
Yes.
所以你全都具备了。
So you had it all together.
是的。
Yeah.
我很幸运能在本科和研究生期间从优秀的导师那里接受大量出色的培训,我认为这一点的重要性怎么强调都不为过。
I was lucky to benefit from a lot of great training during undergrad and graduate school from great advisers, and I think it really can't be overstated how important that is.
要成为一名优秀的研究人员,你必须接受那些既优秀又善良、耐心的研究人员的指导。
To become a good researcher, you have to be trained by good researchers who are also kind and patient and that type of thing.
是的。
Yeah.
绝对如此。
Absolutely.
完全正确。
Absolutely.
谢谢你,尼克。
Thank you, Nick.
你呢,迈特蕾伊?
What about you, Maitreyee?
是的。
Yeah.
正如尼克所说,关键在于在对的时间出现在对的地方——因为我是实验室最早招募的成员之一。
I think, as Nick said, it is being at the right place at the right time, because I was one of the first members of the lab to be hired.
我认为他们当时需要的是在转化研究领域具有更多元化经验的人。
And I think what they were looking for was someone with more kind of diverse experience in translational research.
我之前曾与不同类型的脑机接口合作过,主要是非侵入式的,还包括语音技术。
And I had worked with different types of BCIs before, mostly noninvasive and also speech technology.
所以我算是拥有这些不同背景的正确组合。
So I had kind of the right combination of these different backgrounds.
我还曾与多种神经系统疾病患者合作过,包括中风、痴呆症和帕金森病。
I also worked with several patients with different neurological conditions, such as stroke, dementia, and Parkinson's.
所以我在该领域的研究经验可能也有所帮助。
So probably my research experience in that area also helped.
我认为还有一点是,我将自己在无创BCI方面的研究应用到了Sergey学生时期生成的部分数据上,从而证明我的技能确实可以迁移。
I think it was also that I applied my research on noninvasive BCIs to some of the data that Sergey had generated while he was a student, and showed that, yes, my skills were transferable.
所以这可能也对我的录用起到了作用。
So that might have also contributed in my hiring here.
嗯。
Mhmm.
那么这些数据是公开可用的吗?是你自由获取并进行分析的吗?
So was this data available, freely available, and you analyzed it?
你用的就是这种方法吗?
Is that approach that you used?
是的。
Yes.
我对探索单神经元分辨率的数据非常着迷。
I was really fascinated to explore single neuron resolution data.
当我研究脑电图时,就想看看我开发的方法是否也能适用于我们从完全不同尺度记录的神经元数据。
When I was working with EEG, I wanted to see whether the methods that I developed could also work on the neuron data that we record at a very different scale.
这就是我开始探索那些数据的原因。
And that's why I started exploring that data.
事实证明,是的,我们开发的方法具有很好的可迁移性。
And it turns out, yes, the methods that we develop are quite transferable.
我们可以用这些方法研究不同类型的神经记录。
We can investigate different types of neural recordings using those methods.
所以这很有趣。
So that was interesting.
确实,开放数据集在我的本科、博士乃至今天的研究中都给予了极大帮助。
And, yes, open data sets have really helped me in my research throughout my undergrad and PhD and even today.
因此,数据共享和利用开放数据集对于原型验证和方法创新至关重要。
So data sharing, looking at open data sets is really important to prototype different things and drive different methods.
是的,完全同意。
Yeah, absolutely.
然后把你现在用的代码和解决方案都放进作品集里。
And then add today's code and the solution that you used into your portfolio.
对。
Yes.
确实如此。
So absolutely.
完全正确。
Absolutely.
刚才你提到了这个Brain Gate联盟。
Now you mentioned this Brain Gate Consortium.
是的。
Yes.
还有你来时刚刚成立的实验室。
And the lab that was just being established when you came.
确实,我们认识加州的一些实验室,比如Eddie Chang的,已经很多很多年了。
And indeed, we know some labs in California, Eddie Chang's, for many many years.
你的实验室听起来很新,不太熟悉。
Your lab sounds very new, not very familiar.
你能多介绍一下这个BrainGate联盟和实验室吗?
So can you tell a little bit more about this BrainGate consortium and the lab?
实验室的主要目标是什么?
What are the main goals of the lab?
正在进行哪些类型的工作?
What type of work is going on?
实验室的未来计划有哪些?
What are the future plans for the lab?
BrainGate目前由四个实验室组成。
BrainGate currently is a collection of four labs.
分别是布朗大学、斯坦福大学,以及最近加入的加州大学戴维斯分校和埃默里大学。
So there's Brown University, there's Stanford University, and now there's UC Davis and also Emory University.
很长一段时间里,只有布朗大学一个实验室。
And for a long time, it was just Brown.
后来斯坦福大学加入,戴维斯和埃默里是最新加入的成员。
And then Stanford joined in with them, and then Davis and Emory are the most recent additions.
每个实验室都有各自的重点研究方向。
And each lab kind of has its own thing that it's focusing on.
布朗大学历来专注于通过脑机接口实现精准光标控制。
So Brown historically has been really focused on getting really good cursor control out of brain computer interfaces.
他们也是联盟和现有临床试验的发起者,显然扮演着重要角色。
And they also started the consortium and the clinical trial as we know it, so they're obviously an important player.
斯坦福大学是其中历史第二悠久的实验室。
Then there's Stanford, which is the second-oldest lab in there.
斯坦福实验室由Jamie Henderson和已故的Krishna Shenoy领导。
So the Stanford lab is led by Jamie Henderson and the late Krishna Shenoy.
我认为至少最近他们在BrainGate项目上主要致力于开发通信速率不断提升的脑机接口。
And I think, at least recently, what they've really been focusing on as far as BrainGate goes has been creating BCIs with ever-increasing communication rates.
从基础光标控制发展到Frank Willett几年前发表的手写体解码技术。
So that went from basic cursor control to the handwriting decoding that Frank Willett published a couple years ago.
再到去年他们最先进的语音神经假体工作,这是闭环皮层内语音神经假体的首批重大展示之一。
And most recently, last year, their speech neural prosthesis work, which was one of the first great demonstrations of this closed loop intracortical speech neural prosthesis.
现在在戴维斯分校,我们显然专注于语音研究,高度专注;而埃默里大学则专注于光标控制,运用抽象机器学习技术处理这类数据,并开始涉足语音领域。
And now here at Davis, obviously, we're really focused on speech, laser focused, and Emory is also focused on cursor control and using some, like, abstract machine learning techniques to do interesting things with these types of data, and they're also now dipping their toe into speech as well.
这是个不断增长的趋势。
It's a growing trend.
我还想指出,我们实验室的联合首席研究员Sergey Stavisky曾在斯坦福BrainGate实验室做过博士后,之前还是布朗大学BrainGate实验室的研究工程师。
I also wanna note that Sergey Stavisky, one of the co-PIs of our lab, did a postdoc in the Stanford BrainGate lab, and he was also a research engineer before that at the Brown BrainGate lab.
因此这些实验室之间存在着某种科学传承关系。
So there's sort of a scientific tree that flows through these.
还有大卫·布兰德曼。
And David Brandman too.
是的。
Yes.
当然。
Of course.
大卫·布兰德曼在布朗大学获得博士学位。
David Brandman did his PhD at Brown.
对。
Yeah.
今年是BrainGate联盟成立20周年。
So this year, BrainGate is 20 years old, the consortium.
嗯。
Mhmm.
嗯。
Mhmm.
那么你们的实验室是何时成立的?
And when was your lab established?
大约两年前。
About two years ago.
Sergey和David于2021年以助理教授身份加入加州大学戴维斯分校,实验室随后于2022年启动。
So Sergey and David joined UC Davis as assistant professors in 2021, and then the lab started in 2022.
嗯。
Mhmm.
你刚才提到,联盟内已不止一个实验室——是的,已有两个实验室在研究语音神经假体。
And you mentioned that it's already not just one lab, yes, that within the consortium two labs are already working on speech neuroprostheses.
为什么对语音神经假体如此关注?
Why is such interest in speech neuroprosthesis?
它的价值有多大?
How valuable is it?
语音神经假体的需求量如何?
How much in demand are speech neuroprostheses?
嗯,这算是相对近期的趋势。
Well, it's sort of a relatively recent trend.
有趣的是,脑机接口(尤其是BrainGate项目长期专注于手臂和手部控制),无论是控制光标、机械臂还是类似功能。
Interestingly, BCIs, especially in BrainGate for a long time, have focused on arm and hand control, and whether that's controlling a cursor or a robotic arm or some version of that.
我认为该领域曾有一种观念,认为解码言语比处理手臂等低维度动作要困难得多,因为言语发音需要在高维空间中进行。
I think there was just sort of this conception in the field that decoding speech would be way harder than doing something lower dimensional like arm and hand, because there's this high dimensional space of speech articulators.
直到最近人们才开始真正尝试,特别是借助近年来兴起的机器学习和大型语言模型,这使该目标变得更可实现。
And only recently have people actually started to try it, and especially leveraging a lot of this machine learning and large language models that have been coming up in the last few years, that's been helping to make it a more attainable goal.
所以我认为人们一直都有这样的共识:这将会很棒。
So I think it's always sort of been on people's radars of like, this would be great.
我们很希望能实现它。
We'd love to do this.
而就在最近几年,这项技术才开始真正爆发式流行。
And only in the last few years has it really begun to explode in popularity.
其影响我认为是显而易见的,特别是对于构音障碍或构音不能患者而言。
And the impact, I would say, is obvious, especially for people who have dysarthria or anarthria.
我是说,无法与你周围的人、家人、爱人、同事或任何朋友交谈。
I mean, not being able to talk to the people around you, your family and loved ones, your coworkers, or any of your friends.
这简直是在生活的方方面面都让人丧失能力,无法沟通。
Like, it's debilitating in every sense of somebody's life to not be able to communicate.
而之前被认为恢复这种能力是痴人说梦,现在却触手可及,这一点正变得显而易见。
And what was previously considered to be a pipe dream of restoring that ability is now within reach and it's becoming obvious.
在我们实验室里,我们已经看到与我们合作的临床试验参与者产生了非常积极的影响。
Already in our lab, we've seen a really positive impact with the clinical trial participant that we've been working with.
那么全球范围内,你认为有多少患者需要这种类型的言语神经修复技术?
And how many patients across the world do we think are in need of this type of speech neuroprosthesis?
我相信全球有数百万人患有某种形式的构音障碍、严重构音障碍或完全失语症。
I believe that there are millions of people around the world with some form of dysarthria or severe dysarthria or anarthria, the complete inability to speak.
比如中风患者、患有ALS等神经退行性疾病的人,或创伤性脑损伤、布罗卡失语症患者等。
So people who've suffered from strokes, or people with neurodegenerative diseases like ALS, or people with traumatic brain injuries or Broca's aphasia.
有很多病症综合起来,意味着大量人群可以从这类技术中受益。
There's a lot of conditions that collectively add up to a lot of people that could benefit from this type of technology.
非常感谢。
Thank you so much.
你刚才提到,我认为现在有一个关于语音神经假体和语音解码研究的爆发式增长。
And you mentioned that there is, I would say, a boom, an explosion of studies related to speech neuroprostheses and speech decoding.
能否简要向我们的听众概述一下目前世界上已有的技术,以便与你们的新研究进行对比?
Can you maybe give our listeners a brief overview of what's already available in the world so that we can already compare it with your new study?
传统上,需要辅助技术进行交流的人会使用眼动追踪控制器、陀螺仪头部鼠标、拼写板等技术。
So traditionally, people in need of some assistive technology to communicate have used eye-tracker controllers or gyroscopic head mice or spelling boards and other kinds of technologies.
但这些技术使用起来非常繁琐,且交流速率极慢。
But those are very tedious to use and the communication rates are very slow.
他们必须逐个字符选择,或在屏幕上选择物体和图片。
So they have to select either character by character or select objects or pictures on the screen.
使用这些技术进行交流可能非常困难且受限。
And it can be very difficult and restrictive to communicate using those technologies.
因此我们开始探索脑机接口是否能填补这一空白,以及能否直接从神经信号解码语音,因为我们目标参与者的皮层功能是完好的。
So there was a drive to kind of explore whether BCIs can fill this gap and whether we can decode speech directly from neural signals because the participants that we are targeting have intact functioning in their cortex.
因此我们可以直接从他们的皮层记录神经信号,这些信号无法传递到发音肌肉,这就是他们无法说话的原因。
So we can record the neural signals from their cortex directly; the signals cannot be sent to their articulator muscles, which is why they're not able to speak.
但如果我们能在神经数据的源头直接截获这些信号,或许就能从中破译出语音。
But if we can intercept those signals directly at the source in the neural data, we can maybe decipher speech from that.
这就是探索脑机接口用于交流和言语背后的动机。
So that was the motivation behind exploring BCIs for communication and speech.
我们已经看到一些早期研究,实验室使用ECoG从神经数据中破译字符和有限词汇,这表明我们可以解码脑信号来识别言语标记。
So we have seen some early research where labs have used ECoG to decipher characters and limited vocabularies and words from neural data, and that was kind of an indicator that we can decode brain signals to identify speech markers.
在BrainGate联盟内部,我们能够获取来自犹他阵列(Utah array)的极高分辨率神经记录数据。
And then within the BrainGate Consortium, we have access to very high resolution neural recordings from Utah arrays.
这些是我们直接植入皮层的微电极。
So these are microelectrodes that we implant directly into the cortex.
因此它们能够记录单神经元分辨率的信息。
So they are able to record information at single-neuron resolution.
这能为我们提供关于言语关联和神经活动的高维度信息。
So that can give us high dimensional information about speech correlates and the neural activity.
我们发现,确实可以通过观察神经数据来可靠地解码语音。
So we found that, yes, it is possible to look at neural data and decode speech reliably.
这正是我们实验室背后的研究动机。
And that was the motivation behind our lab.
我们希望能做得更好。
We wanted to do it better.
我们想定位更能反映语音特征的大脑区域,并开发更好的解码器来从脑信号中解析语音。
We wanted to target better brain areas that encode speech features, and we wanted to develop better decoders to decode speech from brain signals.
说得好。
Well said.
你特别提到了现有技术的问题。
You asked specifically about available technologies.
正如迈特蕾伊所指出的,目前许多辅助沟通技术在速度、效率或使用自然度方面都非常有限。
And as Maitreyee pointed out, a lot of the available assistive communication technologies right now are very limited in their speed or their efficacy or how naturalistic they are to use.
而我们讨论的这类脑机接口技术目前仍主要局限于学术界的临床试验阶段。
And the type of brain computer interfaces that we're talking about are still mostly confined to these clinical trials within academia.
直到最近一两年,像Neuralink或Synchron这样的公司才开始探索将这些技术推广给更多人群。
And only in the last year or two have companies like Neuralink or Synchron started to explore extending these types of technologies out to more people.
Neuralink刚刚招募了他们的首位受试者,显然他们希望将这项技术推广给更多、更多的人。
So Neuralink has just introduced their first participant, and they're hoping, obviously, to extend their technology to many, many more people.
同样地,Synchron也已在少数临床试验参与者中进行了演示,目标同样是实现更广泛的应用转化。
And Synchron similarly has demonstrated with a handful of people as part of their clinical trial, again, with the goal of really translating beyond that.
可以说这类技术正处于面向需求人群开放的初期阶段,但我们的进展已经比历史上任何时候都更深入。
So we are sort of in the early days of this type of thing being available to anybody who needs it, but we're also further along that timeline than we ever have been.
过去一年里,我们朝着这个目标取得的进展比以往任何时候都要多。
And there's been more progress in the last year towards that goal than there ever has been.
所以未来是光明的。
So it's a bright future.
是的。
Yeah.
随着我们拥有能记录更优质脑神经信号的新设备——这些信号更可靠、信噪比更高——我们将能为有需要的用户提供更好的神经假体系统。
And as we have new devices to record better neural signals from the brain, more reliable signals with a higher signal-to-noise ratio, we will be able to deliver better neuroprosthetic systems to the users in need.
是的,完全正确。
Yes, absolutely.
我想讨论一下用于解码语音的不同方法。
And I would like to discuss a little bit different approaches that are being used to decode speech.
我知道有些研究在解码字母,比如单个字母、音素和单词。
I know that some of the studies, they are decoding letters, let's say, single letters, some phonemes and some single words.
实际上有研究——也是BCI奖提名者之一——正尝试从脑数据中解读语义。
There are studies, including one from another BCI Award nominee, that are trying to decipher semantics from brain data.
是吗?
Yes?
你能告诉听众这些方法的区别,以及你在研究中采用的是哪种吗?
So can you tell our listeners the differences between those and which one of those approaches you are using in your studies?
当你开始思考语音脑机接口时,直觉上会认为应该解码人们试图说出的单词。
When you start thinking about a speech BCI, it sounds really intuitive to think that you should decode words that people are trying to say.
这似乎是语言的基本单位。
That seems like the base unit of language.
问题在于,每个人掌握着3万到4万个不同的词汇,要全面采样并训练模型处理所有这些词汇非常困难。
The problem with that is that any given person knows 30,000 or 40,000 different words, and to adequately sample all of those and train a model to do that is very difficult.
所以你或许可以训练模型识别一千个单词,但要在此基础上进行泛化,我认为会变得极其困难。
So you could probably train it on a thousand words, but then generalizing beyond that, I think, would get really hard.
因此使用更小的语言单位(如你提到的字符或音素)的优势在于:英语只有26个字母,39个音素。
So the advantage to using smaller units of language like characters or phonemes, you mentioned, is that there's only 26 characters, or there's only 39 phonemes in the English language.
如果能教会模型预测这些单词的组成部分,就有望获取足够数据训练它,从而使其能泛化到任何你想说的单词——只要这些单词由音素和字符构成。
And if you can teach a model to predict these smaller parts of words, then, hopefully, you can get enough data and train it, and then it can generalize to any words that you might wanna say as long as it's built out of phonemes and characters.
对吧?
Right?
语义解码也很有意思,在理想的语音神经假体中可能有其优势。
And then semantics is also an interesting one, and would probably have a place in the ideal speech neural prosthesis.
但我们在脑文本转换研究中,至少我的重点是解码音素。
But for brain-to-text generation, we focus on, or at least I focus on, decoding phonemes.
正如我提到的,英语中约有39个构成整个语言体系的音素。
So as I mentioned, there's like 39 phonemes, which are sounds that make up the entire English language.
因此,任何你能想到的单词都只是这39个音素的某种组合序列。
So any word you can think of is just a sequence of some combination of these 39 phonemes.
如果你能训练一个解码器从神经活动中预测音素,那么理论上你就能说出任何想要的单词。
And if you can train a decoder to predict a phoneme from neural activity, then hopefully you can say whatever word that you want.
根据我们的经验,这一理论已被充分验证。
And in our experience, that's been very much true.
要知道,在12.5万词汇量中,我们能够相当准确地预测某人想说的内容,即使这个词是他们从未通过语音神经假体尝试说过的。
You know, out of a 125,000 word vocabulary, we can predict pretty accurately what somebody's trying to say, even if it's a word that they haven't ever tried to say with the speech neural prosthesis before.
这就是我们当前的研究方向。
That is our current approach.
我认为现实地说,正如我提到的,整合某种语义信息可能会很有用,但语言是所有这些要素的层级结构。
And I think realistically, like I mentioned, integrating some sort of semantic information would probably be useful, but language is a hierarchy of all these things.
除了音素、单词、句子和语义之外,还有许多其他层级可能都有必要纳入,现在只是开始探索这些领域的阶段。
And beyond phonemes and words and sentences and semantics, there's many other layers that would probably be advantageous to include, and it's just a matter of starting to explore those types of things.
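示意:下面用一个玩具示例说明"音素到单词"这一步(真实系统使用约39个英语音素、神经网络解码器以及覆盖约12.5万词词表的语言模型;此处的迷你词表、去重方式与函数名均为说明性假设):
As an illustrative sketch (a real system uses about 39 English phonemes, a neural network decoder, and a language model over a roughly 125,000-word vocabulary; the mini-lexicon, the CTC-style collapsing, and the function names here are assumptions), the phoneme-to-word step can be shown like this:

```python
# Hypothetical mini-lexicon mapping phoneme sequences to words.
LEXICON = {
    ("HH", "EH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
}

def collapse(frame_labels):
    """Collapse consecutive duplicate per-frame labels into one phoneme each."""
    out = []
    for p in frame_labels:
        if not out or out[-1] != p:
            out.append(p)
    return tuple(out)

def decode_word(frame_labels):
    """Map per-frame phoneme predictions to a word via the lexicon."""
    return LEXICON.get(collapse(frame_labels), "<unk>")

# A decoder emits one phoneme label per time frame; repeats are collapsed.
frames = ["HH", "HH", "EH", "L", "L", "OW", "OW"]
print(decode_word(frames))  # hello
```

Because the decoder predicts units from a closed set of phonemes rather than whole words, any word built from those phonemes can in principle be produced, which is what makes generalization to unseen words possible.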
我们还在语音BCI方面更进一步,尝试直接从神经活动解码语音。
And we are also taking a step further with speech BCIs and trying to decode voice directly from neural activity.
正如尼克提到的,我们能够准确解码这些声音的基本单位——音素,然后将它们按顺序串联起来形成单词,以文本形式呈现在屏幕上。
So as Nick mentioned, we are able to accurately decode these fundamental units of sounds, which are phonemes, and then string them together in the sequence to form words which appear as text on the screen.
但自然语音远不止于此。
But natural speech is much more than that.
对吧?
Right?
当我们说话时,会有语调的变化、节奏的起伏、音高的调整,以及融入语音的各种表情。
So when we speak, we have changes in our intonations, our cadence, our pitch, expressions that we put into speech.
因此,为了恢复自然语音,我们正在研究如何将神经活动直接解码为声音或语音,当参与者尝试说话时,我们能实时解码并生成声音回放,让他们能像我们现在这样进行实时对话。
So in order to restore naturalistic speech, we are exploring how we can decode this neural activity directly into sounds or voices that when a participant attempts to speak, we can decode that in real time, produce sounds, and play it back so they can have conversations in real time as we are having right now.
我们不需要等待参与者说完才在屏幕上显示内容,再通过那种方式进行交流。
And we don't have to wait for the participant to say something for it to appear on the screen and then the communication to happen that way.
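示意:实时"脑到语音"的流程可以抽象为一个因果流式循环(仅为结构性草图;真实系统中每帧神经数据由声码器解码为音频片段并立即播放;此处的解码与播放函数均为占位假设):
As a structural sketch only (in the real system each neural frame is decoded into an audio chunk by a vocoder and played immediately; the decode and playback functions here are placeholder assumptions), the real-time brain-to-voice loop can be abstracted like this:

```python
import queue

def stream_decode(frame_queue, decode_frame, play_chunk):
    """Causal streaming loop: decode each neural frame into an audio chunk as it
    arrives and play it right away, instead of waiting for the full utterance."""
    while True:
        frame = frame_queue.get()
        if frame is None:  # end-of-stream sentinel
            break
        play_chunk(decode_frame(frame))

# Toy demo with stand-in decode/playback functions.
q = queue.Queue()
for f in ["f1", "f2", "f3", None]:
    q.put(f)
played = []
stream_decode(q, decode_frame=lambda f: f.upper(), play_chunk=played.append)
print(played)  # ['F1', 'F2', 'F3']
```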
所以我们也在探索语音神经修复的下一阶段,这更具挑战性,因为声音没有离散的分类标准。
So we are also exploring kind of the next level of speech neuroprosthesis, which is much more challenging because, again, we do not have discrete classes for voice.
你知道,你可以用任何方式、任何节奏随心所欲地说话。
You know, you can speak in any way, in any manner you like, in any pace you like.
因此,以这种方式合成声音带来了额外的挑战。
So it becomes an additional challenge to synthesize voice that way.
但我们实验室也在探索这一途径。
But we are also exploring that avenue as well in our lab.
非常感谢。
Thank you very much.
现在让我们直接来看这项荣获2023年BCI大奖赛一等奖的研究。
And now let's get directly to the study that won first place in the 2023 BCI Award competition.
能否简要介绍一下这项研究,并说明你们在该研究中的主要目标是什么?
So can you just briefly introduce the study and tell what was the main goal for you in that study?
是的。
Yeah.
如我所言,我们属于这个开展临床试验的BrainGate联盟。
So as I mentioned, we're part of this BrainGate consortium, which runs a clinical trial.
作为BrainGate在加州大学戴维斯分校的分支,我们招募了一名45岁的ALS男性患者参与临床试验。
So at the UC Davis site of BrainGate, we recruited a 45-year-old man with ALS into our clinical trial.
他实际上从颈部以下都瘫痪了。
And he's effectively paralyzed from the neck down.
这是一种可怕的退行性疾病。
It's a terrible degenerative disease.
他患有严重的构音障碍,即使是专业听者也很难听懂他的话。
He's severely dysarthric, so it's very, very difficult to understand him, even for an expert listener.
因此在他的日常生活中,他的交流方式非常受限,要么只能与受过专业训练的翻译人员交谈,要么使用陀螺仪头部鼠标在电脑屏幕上移动光标,非常缓慢地打出句子。
So in his day to day life, his communication is very restricted to either speaking to a highly trained interpreter or using a gyroscopic head mouse to move a cursor around on a computer screen and very slowly type out sentences.
使用这两种方式,他每分钟只能交流六七个单词,速度非常慢。
So with either of those modalities, he can communicate at about six or seven words per minute, which is not very fast.
所以我们的目标是创建这种多模态语音神经假体,能够将他的神经活动转化为他想说的话,无论是转化为屏幕上的文字,还是直接转化为听起来像他声音的音频。
So our goal was to create this sort of multimodal speech neural prosthesis that could translate his neural activity into the words that he was trying to say, whether that's into text on a screen or directly into audio that sounds like his voice.
谢谢。
Thank you.
据我理解,你们每个人都负责这个项目的某一部分。
And as I understood, each of you was responsible for one part of this project.
是吗?
Yes?
那么,尼克,对你来说,是负责转换为文本的部分。
So, Nick, for you, it was conversion into text.
而梅特蕾伊,你负责的部分是直接转换为语音。
And, Maitreyi, for you, it was translation directly into voice.
能否详细介绍一下这两个部分,以及你们各自选择这种特定方法的原因?
Can you tell a little bit more about those two parts and why each of you chose that particular part of the approach.
是的。
Yeah.
这种转换为文本的技术,我们称之为'脑到文本',它并非全新概念。
So this conversion into text, we call it brain to text, and it's not necessarily a new concept.
去年《自然》杂志发表的两篇论文证明了这个可能性,他们的错误率在20%到25%之间,意味着每解码四五个单词中就有一个是错误的。
There were two Nature papers that came out last year demonstrating that you could do this, and they got somewhere between a 20 and 25% error rate, which means that one out of every four or five words that they decoded was wrong.
我刚说的这句话里错误可远不止这个比例。
I mean, the sentence I just said had a lot more than that.
想象一下每四个字就有一个错误,那将完全无法理解。
And if you imagine that one out of four words is wrong, it would have been totally incomprehensible.
但我们仍能从他们的经验和这些研究展示的结果中获益良多,尤其是与我们密切合作的斯坦福团队。
But we were still able to really benefit from their experience and the results that those studies demonstrated, especially Stanford, who's a close collaborator of ours.
因此,我们在‘脑转文字’项目中的具体目标是向大脑更高产区域植入更多电极记录阵列。
So our goal specifically in this brain to text was to implant more electrode recording arrays into higher yield parts of the brain.
最终目标实际上是精确解码参与者试图表达的内容。
Effectively, the eventual goal was to very accurately decode what the participant was trying to say.
所以在这个高度协作的项目中,我不想说只有我负责文字部分而她只负责语音。
So, you know, in this highly collaborative project, I don't wanna say, like, I'm only text and she's only voice.
我的意思是,两个项目都是高度协作的,但我们各自领导自己的项目。
I mean, both projects are highly collaborative, but we lead our respective projects.
但在文字项目中,我们得以在手术植入电极后的第25天就开始了工作,对象是一位ALS患者。
But in this text project, we went in on the first day, which was twenty-five days after the surgical implant into our participant, a man with ALS.
我们等待25天是为了让伤口愈合,同时让电极阵列在他大脑中稳定下来。
And so we wait twenty five days just for healing and so that the arrays could stabilize in his brain.
于是我们在第一个研究环节就开始了,收集了他大约半小时的公开说话数据,他尝试朗读提示句子。
So we came in on the very first research session, and we collected about a half an hour of data of him speaking overtly, attempting to speak prompted sentences.
这意味着屏幕上会出现一个句子,他会尝试大声朗读出来。
So what that means is a sentence would appear on the screen, and he would try to speak that aloud.
在他朗读的同时,我们记录着他的神经数据。
And while he's doing that, we're recording his neural data.
在收集足够多的这类句子后,我们训练了一个机器学习模型,试图将他的神经活动与这些句子中的音素对应起来。
And after so many of these sentences, we then trained a machine learning model to try and map his neural activity to the phonemes that he was saying in those sentences.
然后利用这个训练好的机器学习模型,我们又进行了另一个环节,让他尝试朗读提示句子。
And then using that trained machine learning model, we did another block where he's attempting to read prompted sentences.
但这次,我们使用那个机器学习模型实时预测他想说的话,并将这些单词显示在他面前的屏幕上。
But this time, we were predicting what he was trying to say in real time using that machine learning model, and we were showing those words on a screen in front of him.
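上述"采集配对数据、训练监督模型、实时预测"的流程可以用一个极简的Python草图来示意。以下内容纯属演示假设:分箱尺寸、通道数和最近模板分类器都不是该研究的真实设置(真实系统使用的是循环神经网络解码器)。
The collect-paired-data, train-a-supervised-model, predict-in-real-time loop described above can be sketched minimally in Python. Everything below is an illustrative assumption: the bin size, channel count, and nearest-template classifier are not the study's actual setup (the real systems use recurrent-network decoders).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each 20 ms bin of neural data is a 256-channel
# firing-rate vector, labeled with the phoneme being attempted in that bin.
n_bins, n_channels, n_phonemes = 600, 256, 5
true_centers = rng.normal(size=(n_phonemes, n_channels))
labels = rng.integers(0, n_phonemes, size=n_bins)
features = true_centers[labels] + 0.5 * rng.normal(size=(n_bins, n_channels))

# "Training": estimate one mean template per phoneme from the labeled bins,
# a stand-in for the recurrent-network decoders used in real systems.
templates = np.stack([features[labels == p].mean(axis=0)
                      for p in range(n_phonemes)])

def decode_bin(neural_bin):
    """Predict the attempted phoneme for one bin of neural activity."""
    distances = np.linalg.norm(templates - neural_bin, axis=1)
    return int(np.argmin(distances))

# "Real-time" use: classify a previously unseen bin of activity.
new_bin = true_centers[2] + 0.5 * rng.normal(size=n_channels)
predicted = decode_bin(new_bin)
```

In the real system the per-bin phoneme predictions are further combined with a language model to produce words, a step this sketch omits.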
所以在第一天,我们不知道效果会如何,为了简单起见,我们只使用了50个单词的词汇量。
And so on this very first day, we didn't know how good things were gonna be, so we limited ourselves to a vocabulary of only 50 words just to keep it simple.
有无数可能出错的情况:从解码器完全崩溃,到信号中只有电子噪声,甚至是龙卷风来袭之类的意外。
There was a million things that could have gone wrong from our decoder completely breaking to just having electrical noise in the signals or, you know, a tornado rolling through or something like that.
但我们非常高兴地发现,在这些最初的预测他试图说什么的区块中,我们几乎能以99.6%左右的准确率完美实现。
But we were really pleased to find out that in these very first blocks where we're predicting what he's trying to say, we were able to do it almost perfectly with 99.6% accuracy or so.
我们能够预测他按顺序说出的50个单词中的每一个。
We were able to predict which of the 50 words he was saying in sequence.
这是一个非常令人振奋的结果。
And that was a really exciting result.
我们都惊呆了。
We were stunned.
于是两天后我们进行了第二次研究会议。
So we came back two days later in the second research session.
这次我们没有局限于50个单词的词汇量,而是直接跃升至我们拥有的最大词汇量——12.5万个单词。
And instead of just doing a 50-word vocabulary, we jumped straight up to the biggest vocabulary that we had, which is 125,000 words.
同样地,我们收集了更多训练数据,向他展示句子并让他朗读出来。
And, again, we collected a little bit more of this training data where we show him sentences and he reads them aloud.
经过一两个小时的训练后,我们建立了一个新模型。
And after an hour or two of that, we trained a new model.
然后利用这个庞大的词汇库,我们能够以低于10%的词错误率预测他想表达的内容,这与当时刚发表的一些研究相比简直令人惊叹——那些研究只能达到约20%到25%的词错误率。
And then with this huge vocabulary, we were able to predict what he was trying to say with under a 10% word error rate, which is kind of amazing in contrast to some of those studies that were just coming out at that same time, which were only able to get to about 20 or 25% word error rates.
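这里反复引用的"词错误率"(word error rate)按惯例定义为词级编辑距离除以参考句的词数。下面是一个简短的参考实现:
The word error rates quoted here are conventionally the word-level edit (Levenshtein) distance divided by the number of words in the reference sentence. A short reference implementation:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the quick brown fox", "the quick brown box")` returns 0.25: one substitution out of four reference words, the "one out of every four or five words" regime described above.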
因此我们能在首次实验中就取得这样的成果,充分证明了这种方法蕴含着巨大潜力。
So the fact that we could do this in our very first session showed us that there was huge potential in this approach.
我们非常荣幸成为首批验证这种潜力确实存在的团队。
And we were really pleased to be the first people to demonstrate that this potential was there.
是的。
Yes.
完全同意。
Absolutely.
毫无疑问。
Absolutely.
梅特蕾伊,你对此有什么要补充的吗?
Maitreyi, do you have anything to add to that?
有的。
Yeah.
因此我们真正专注于研究语音神经假体的各种实现途径,探索如何优化这些途径,并深入研究我们能从脑信号中获取多少可转化为言语的信息。
So we are really focused on looking at all the avenues of speech neuroprosthesis and how we can make those better, and really exploring how much information we can get from the brain signals that can be translated into speech.
正如尼克提到的,我们已从斯坦福大学BrainGate的合作伙伴那里获得了一些证据,他们证明了文本解码的可能性,而我们在后续项目中做得更好。
So we already had some evidence, as Nick suggested, from our collaborators at Stanford from BrainGate who had demonstrated that it is possible to decode text, and we did it better with the next project.
所以我们想进一步研究的第二件事是,看看是否也能从这些相同的信号中解码出声音。
So the second thing we wanted to look at is to go a step further and see whether we can also decode voice from these same signals.
我们的参与者再次朗读提示句子时,我们想尝试能否在他试图说话时实时合成出他的声音。
So our participant was, again, speaking the prompted sentence, and we wanted to see if we can synthesize the voice in real time as he's attempting to speak.
这里面临的额外挑战是,由于他的发音严重不清,我们无法确定他如何说这句话或具体说了什么。
So the additional challenge we face here is that we don't know how he's saying the sentence or what he's saying, because his speech is really dysarthric.
我们训练这些脑机接口算法的方式是通过机器学习。
And the way we train these BCI algorithms is using machine learning.
具体而言,我们采用监督式机器学习,需要向其展示神经活动的样本及对应的言语样本,然后以此方式训练算法。
Specifically, we use supervised machine learning, which requires you to show it an example of neural activity and an example of the speech that is corresponding to this neural activity and then train that algorithm that way.
但我们没有他声音的样本,因为他无法清晰地说话。
But we did not have an example of his voice because he couldn't speak intelligibly.
因此,我们为合成其声音必须克服的主要挑战是开发新技术,以生成与其大脑活动同步的合成声音,从而能以极高分辨率估算他在特定时间点想表达的内容,并将其大脑活动与可能的声音效果相匹配。
And so the major challenge we had to overcome to kind of synthesize his voice is to develop new techniques to generate the synthetic voice that is aligned to his brain activity so we can estimate what he must be saying at this particular time point with very high resolution and kind of match his brain activity to what his voice might sound like.
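访谈中没有给出具体的对齐算法;作为这类"将合成参考语音与神经特征在时间上对齐"问题的一个通用示意,下面给出动态时间规整(DTW)的草图,序列数据为虚构示例。
The interview doesn't name the alignment method used; as a generic illustration of aligning a synthetic reference voice to neural-activity features in time, here is a dynamic time warping (DTW) sketch over made-up feature sequences.

```python
import numpy as np

def dtw_path(a, b):
    """Dynamic time warping between two feature sequences (frames x dims).
    Returns the total alignment cost and the frame-to-frame warping path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack to recover which frame of `a` maps to which frame of `b`.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path[::-1]
```

With identical sequences the cost is zero and the path is the diagonal; with sequences of different lengths, DTW stretches frames so that each neural frame gets a matching voice frame, which is the kind of correspondence the supervised training needs.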
在克服这一挑战后,我们还证明了可以合成近乎可辨识的声音。
And overcoming that challenge, we also demonstrated that we could synthesize voice, which is nearly intelligible.
虽然尚未达到100%完美,但我们在该领域已取得重大进展。
It's not 100% there, but we have made huge progress in that area.
因此他能够按自己想要的节奏表达想法,而我们听到的效果也是如此。
So he is able to say what he wants at the pace that he wants, and it sounds that way to us.
我们实际上能听到他的停顿、对不同词语的强调。
So we can actually hear his pauses, his emphasis on different words.
这是让这些脑机接口更具表现力、更自然,并使其更接近人类交流方式的重要一步。
And then this is a step towards making these BCIs more expressive, more natural, and bring them closer to more human communication.
我们也在第一天就尝试了这种方法。
So we also tried this on the first day.
我们收集了相关数据。
We collected the data.
现在我们收集的数据可以传递给不同的解码器,尝试从中识别不同的内容。
Now the data that we collect, we can pass it on to different decoders and try to identify different things from it.
也就是说,我们可以利用同一组数据来识别文本和单词。
So, like, use the same data to identify text and words from the same data.
我尝试合成语音。
I tried to synthesize voice.
我们在第一天就进行了尝试,从第一天开始,然后,是的,我们首次实时听到了由神经活动合成的语音。
We tried it on the first day, from first day, and then, yeah, we're able to hear for the first time the voice synthesized from the neural activity in real time.
这对我们所有人来说都非常令人兴奋。
And that was also very exciting for all of us.
我们相信,这两项应用都是BCI技术在首日即投入实际运行的首次展示。
We believe both of these applications are the first demonstrations of a BCI in action on the very first day.
这有助于我们预估,如果这项技术能惠及更多患者群体,我们需要投入多少时间才能使这些技术对他们产生实际效用。
So this kind of helps us to estimate that if this technology becomes available to larger populations of patients, how much time do we need to invest for these technologies to be useful to them?
最初的估计是,我们需要收集数日的数据才能让系统开始运作。
So initial estimate was that we have to collect days and days worth of data before they start working.
但现在我们知道,至少在最基础的功能上,它们可以在半小时或一小时后立即开始工作。
But now we know that, at least in a minimum capacity, they can start working immediately after half an hour or an hour.
嗯。
Mhmm.
非常感谢。
Thank you so much.
你认为是什么让你的研究取得了这些惊人成果,而这些成果在之前甚至最近的研究中都未能实现?
And what do you think enabled your study to produce these amazing results that were not possible in previous studies, even very recent ones?
是的。
Yes.
那些报告仍显示25%的错误率等等?
Which still reported twenty-five percent error rates and so on and so forth?
我认为有很多原因。
I think there's a lot of reasons.
其一是我们有幸看到其他研究尝试过什么方法以及那些方法的成功程度。
So one is we have the benefit of seeing what these other studies have tried and how successful those approaches were.
尽管他们真正开创性地进入这一领域,在未知会发现什么的情况下相应开发方法,而我们则有了这个起点可供参考,这是极大的帮助。
Whereas they really pioneered their way into this field, not knowing what they would find and developing the methods accordingly, we have that starting point to go from, which is a huge help.
我认为另一个主要因素还是运气。
And I think another major factor is just luck again.
就像我们在寻找博士后时提到过运气一样。
Like, we've mentioned luck before when looking for postdocs.
但这不仅仅是运气。
And it's not it's not only luck.
你明白吗?
You know?
这是通过技能实现的,但在临床试验、外科植入、首日实验中可能出错的事情太多了。
It's that's engineered by skill, but there's a lot of things that can go wrong in a clinical trial, in a surgical implant, in a day one experiment.
即使一切顺利,你也可能遇到一个不努力、不愿配合、不投入的参与者。但在每个环节上,事情都对我们非常顺利。
Even if everything goes right, you can have a participant that's just not a hard worker and doesn't really want to do it and isn't engaged. But in every facet of this, I think things have worked out really well for us.
这要从大卫·布兰德曼说起,他是实验室的联合首席研究员,也是神经外科专家。他首先精心筛选了这位ALS患者并纳入临床试验。
It starts with David Brandman, who's one of the co-PIs in this lab and an expert neurosurgeon. First of all, he identified this man with ALS and enrolled him into the clinical trial.
其次,他精心规划了阵列植入手术的位置和目标区域,完美执行了手术,并自此为参与者提供了卓越的定制化神经外科护理。
And second of all, he meticulously planned the array implant surgery and where we were going to target the arrays and executed it with perfection and has provided wonderful boutique neurosurgery care to the participants since then.
而在我们这边,我们一直在幕后开发实时数据采集系统,通过大量努力和协作,我们建立了一个非常优秀且灵活的平台,用于数据收集和实验运行。
And then, you know, on our ends, we are in the background developing our real time data collection rig, and through a lot of hard work and collaborative efforts, we have developed a really nice, flexible platform for us to collect data and run experiments with.
然后一切准备就绪,到了第一天你到场的时候。
Then it gets to the point where everything's ready, and you show up on the first day.
当你第一次接入这些阵列时,你并不确切知道会看到什么,因为全球植入这种设备的人寥寥无几,而且目前对神经记录质量存在的大量变异源尚未完全明了。
The first time you plug in these arrays, you don't know exactly what you're going to see, because there's only so many people in the world that have been implanted with these, and there's huge sources of variance in neural recording quality that aren't completely understood right now.
所以从这个角度看,你无法确切知道手术效果如何。
So you don't know how well the surgery went from that perspective.
可能房间角落里有台Xbox正在发出大量电噪声,这会干扰你的记录或类似情况。
There could be an Xbox in the corner of the room that's sending off a bunch of electrical noise, and it's gonna mess up your recordings or something like that.
所以影响因素不计其数。
So there's a million factors.
简而言之,我认为我们真的很幸运。
And in a nutshell, I think we got really lucky.
我们的辛勤工作在多个方面都得到了回报。
Our hard work paid off in many ways.
我们能够在前运动皮层植入的电极数量是斯坦福大学先前研究的两倍,并且我们针对了一些新区域,这些区域我们预计会对言语产生重要贡献,但由于之前从未记录过这些区域,我们并不确定。
We were able to implant twice as many electrodes into the premotor cortex as the previous Stanford study did, and we targeted some new areas there that we thought would contribute heavily to speech, but we weren't sure because they hadn't been recorded from before.
而它们确实如此。
And they do.
它们是非常好的贡献者。
They are very good contributors.
归根结底,正如我所说,这也取决于参与者本人。
And at the end of the day, it also, like I said, comes down to the participant.
我们的参与者是一位非常优秀且勤奋的人,他对科学充满好奇和奉献精神,非常投入。
And our participant is a wonderful person and a very hard worker, and he's curious and dedicated to the science and just very committed.
你找不到比他更好的合作对象了。
You couldn't ask for a better person to work with.
正如尼克所说,我认为从现有的科学文献和先前研究中学习,确实帮助我们明确了方向,特别是在阵列植入位置方面。
And as Nick said, I think learning from the scientific literature out there and previous studies has really helped us shape our direction, especially for array placement.
正如尼克所说,我们将电极数量增加了一倍,因为我们从先前研究中发现,他们使用的电极数量不足以实现精准的言语解码。
As Nick said, we have doubled the number of electrodes, because we have seen with previous studies that the number of electrodes they had was not sufficient to give us accurate decoding of speech.
因此,通过将电极数量翻倍并针对大脑中我们认为对言语产生起关键作用的更局部区域,这确实帮助我们改进了系统。
So doubling the number of electrodes and targeting more localized areas in the brain that we believe contribute towards speech production really helped us to improve our system.
同时我们也投入大量精力开发实时系统。
And we also focused a lot in developing real time systems.
所以我们的系统运行速度非常非常快。
So our system works really, really fast.
系统能以毫秒级分辨率工作——采集数据、过滤、处理,从而实现实时语音解码,并以极低延迟生成文本或声音。
It works at one-millisecond resolution, collecting the data, filtering it, and processing it, which enables us to decode speech in real time and produce text or sound with very small latencies.
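下面的草图示意这类实时流水线的骨架:毫秒级样本进入,经因果滤波后按固定时间窗输出特征向量。通道数、窗长和平滑系数均为演示假设,并非该系统的真实参数。
The sketch below illustrates the skeleton of such a real-time pipeline: millisecond samples come in, are causally filtered, and one feature vector is emitted per fixed bin. The channel count, bin length, and smoothing constant are illustrative assumptions, not the system's real parameters.

```python
import numpy as np

class StreamingDecoder:
    """Toy causal pipeline: 1 ms samples in, binned feature vectors out.
    The bin size and smoothing constant are illustrative, not the study's."""

    def __init__(self, n_channels=256, bin_ms=20, alpha=0.1):
        self.bin_ms = bin_ms
        self.alpha = alpha                    # exponential smoothing factor
        self.smoothed = np.zeros(n_channels)  # running causal estimate
        self.buffer = []                      # samples in the current bin

    def push_sample(self, sample):
        # Causal filtering: each 1 ms sample updates the running estimate;
        # no future samples are ever needed, so latency stays small.
        self.smoothed = (1 - self.alpha) * self.smoothed + self.alpha * sample
        self.buffer.append(self.smoothed.copy())
        if len(self.buffer) == self.bin_ms:
            # Emit one feature vector per bin for the decoder downstream.
            bin_features = np.mean(self.buffer, axis=0)
            self.buffer.clear()
            return bin_features
        return None
```

Each emitted feature vector would then feed the phoneme or voice decoder, so the end-to-end latency is bounded by roughly one bin plus the model's inference time.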
这太令人惊叹了。
That is amazing.
让我们更深入地探讨一下这个话题。
And let's talk a little bit more about that.
首先谈谈与Willett团队先前研究的区别。
About, first of all, the difference with the previous studies from Willett.
是犹他阵列的数量问题。
It was the number of Utah arrays.
对吗?
Yes?
我记得他们的研究中用了两个。
I think in their studies, it was two.
而在你们的研究中用了四个。
And in your study, four.
是这样吗?
Is that correct?
是的。
Yes.
差不多吧。
More or less.
斯坦福大学研究中对12号受试者植入了四个微电极阵列,其中两个植入到他们已知会产生口面部运动相关信号的前运动皮层。
So in the Stanford study with their participant T12, they did implant four microelectrode arrays, and they implanted two of them into premotor cortex, where they knew there would be these orofacial motor related signals.
他们将另外两个电极植入额下回,这是语言网络中的一个关键枢纽。
They implanted two more of their electrodes into the inferior frontal gyrus, which is a key hub in the language network.
在所有你能获取的与言语相关的功能磁共振成像中,看起来你能够从那里的信号解码出言语内容。
On every speech related FMRI you can take, it looks like you'd be able to decode speech from the signals there.
也就是布洛卡区。
That is, Broca's area.
是的。
Yes.
是的。
Yes.
但当他们构建了言语解码器并尝试解码言语时,他们从额下回区域获得的神经信号对此目的并不特别具有信息量。
But when they built their speech decoder and they tried to decode speech, the neural signals that they've gotten from that IFG area weren't particularly informative for that purpose.
所以在他们实际的言语解码研究中,他们只使用了位于前运动皮层的两个阵列。
So for their actual speech decoding study, they've only used the two arrays that are in their premotor cortex.
而现在在他们进行的超出该研究的基础科学工作中,他们正从这个其他脑区学到很多东西。
And then now in the basic science that they're doing beyond that study, they're learning a lot from this other brain area.
只是在我们实验室使用的特定语音解码器设计中,那个区域似乎贡献不大。
Just in the particular speech decoder design that our labs use, that area didn't seem to contribute very much.
于是你们得出结论要加倍数量,并将它们全部植入运动前区。
And you made conclusions to double the number and implant them all in the premotor area.
那么你们选择植入电极的依据是什么?
So what was your choice of implanting the electrodes?
你们针对的是哪些脑区?
What areas did you target?
布洛卡区本应是语言中枢,对语言的基本构成等方面有所贡献。
So Broca's area was supposed to be kind of language hub which would contribute towards the fundamental composition of language and things like that.
但我们从斯坦福的研究中发现事实并非如此。
But we have seen from Stanford study that it did not do so.
不过文献表明,中央前回中部还有个有趣的区域叫55b区,它也对语言产生和语言准备有贡献。
But literature suggested that there is another interesting area in kind of middle precentral gyrus, which is known as 55b, which also contributes towards more language production and language preparation.
所以我们认为靶向这个区域会非常有意思。
So we thought it would be really interesting to target that area.
我们从未在那个区域植入过阵列,因此这将是一个独特的机会来研究该区域的功能。
We had never implanted an array in that area, so that would also be a unique opportunity to study what that area does.
前运动皮层确实服务于言语产生,我们也从其他犹他阵列研究中看到它对手部运动控制的贡献。
And premotor cortex really functions for speech production, and we have seen from other Utah array studies how it contributes towards hand movement control.
因此,这是一个锁定中央前回中部新区域的独特机会。
So this was a unique opportunity to target that new area of middle precentral gyrus.
我们选择的阵列植入位置从言语热点区(即腹侧前运动皮层)出发,一直延伸到中部前运动皮层。
And we selected an array placement going from the speech hotspot, which is ventral premotor cortex, all the way towards middle premotor cortex.
我们不仅覆盖了更广的空间区域,还将电极数量增加了一倍。
And we are covering a wide spatial area as well as doubling the number of electrodes.
这让我们有机会观察不同脑区如何参与言语解码和产生,并为研究语言感知、语言形成及其他言语相关科学发现开辟了更多可能性。
So that gives us the opportunity to look at different brain areas and how they contribute towards speech decoding and the production of speech, and also opens up further opportunities to study language perception, language formation, and other speech-related scientific discoveries.
在电极植入前,总是需要进行脑部映射这一步骤。
And before the electrode implantation, there is always this procedure of mapping the brain.
是的。
Yes.
找到那个区域
Finding that area.
所以我对你们用于定位中央前回中部的方法非常好奇
So I'm very curious about what methods did you use, for that middle precentral gyrus.
是的
Yeah.
有一个名为人类连接组计划(HCP)的项目,我不确定它是一个团队还是一个项目
So there's this project that is called the Human Connectome Project, HCP, and this I don't know if they're a group or a project.
我确实不知道该如何称呼他们
I don't really know how to refer to them.
但参与这项研究的人员基本上对数百名医院患者或志愿者进行了功能磁共振扫描
But the people involved in this study, essentially they did fMRI scans of, I think, hundreds of patients in hospitals or just volunteers.
然后他们将所有大脑数据平均处理并映射到一个共同空间
Then they averaged all those brains together and mapped them to a common space.
HCP流程扫描包含多种不同的独立MRI扫描,其中一些是在被动阅读时进行脑部扫描,或尝试说话,或尝试移动双手等行为时进行的扫描
An HCP pipeline scan involves many different individual MRI scans, and some of them involve passively reading while your brain is scanned, or trying to say something, or trying to move your hands, or something.
通过这个非常复杂的流程(我简单总结一下),你基本上可以得到一个人脑的地图,并相当准确地估计出各个功能区的位置。
Through this very complicated pipeline, which I'll just summarize, you can essentially get a map of somebody's brain with pretty good estimates of what areas are where on it.
BrainGate最近采用了这个HCP流程,将其作为新临床试验参与者的标准程序,这样我们就能将电极阵列精准定位到我们认为最适合记录信号的位置。
BrainGate has recently adopted this HCP pipeline, making it standard for new participants that are enrolled into the clinical trial, so that we can best target our arrays to the spots that we think are going to be best to record from.
我认为斯坦福的参与者T12是第一个接受这种扫描的,在T12身上效果很好。
I think that Stanford's participant, T12, was the first one to undergo this scan, and it worked well in T12.
我们现在已经在T15植入前重复了这一流程。
We have now repeated that with T15 prior to the implant.
最终你会得到一张地图,每个像素点都有X%的置信度,表示你有多大把握认为这是目标区域而非其他部位。
You kind of end up getting a map with X percent confidence at each pixel of how confident you are that it's this area versus something else.
然后就需要神经外科专家将电脑上的扫描图像与实际开颅手术中看到的大脑结构进行对应匹配。
Then it's up to the expert neurosurgeon to map between that scan on a computer and the brain that he sees in the Operating Room when he opens the craniotomy.
有很多很酷的手术方法可以实现这一点,但我不够了解就不详细展开了。
And there's a lot of cool surgical approaches to accomplish that, but I don't know enough about them to go on about them here.
总之,这个HCP流程确实很成功,至少目前在BrainGate项目中对于电极阵列的定位非常有效。
Anyways, yeah, this HCP pipeline really has been successful, at least in BrainGate so far, for targeting where these arrays should go.
这是个非常有趣的方法。
That's a very interesting approach.
看来你们在了解先前方法后已经运用了几处显著差异。
So we already see several differences that you utilized when you learned about previous approaches.
你们在运动皮层使用了四组而非两组阵列,并额外覆盖或采样了中央前回中部区域。
And you, of course, used four versus two arrays in the motor cortex and additionally cover or sample from that middle precentral gyrus.
你们还在视频中将研究与Eddy Chang团队的先前研究进行了对比。
You also compare your study, at least in your video, with the previous study by the group of Eddy Chang.
他们采用了不同的信号采样方式。
And they use a different approach to sampling the signal.
他们使用的是置于皮层表面的网格电极。
They use grids that they place on the surface.
你认为这对研究结果会产生怎样的影响?
How do you think this affects the results of the studies?
你认为穿透式电极(如你们所用)与网格电极在采样上的主要区别是什么?
What do you think is the major difference between sampling with penetrating electrodes like you are using and grid electrodes?
我们容易给出有偏见的回答,但我会尽量保持中立,除非你想回答。
We're prone to giving a biased answer, but I'm gonna do my best to be neutral, unless you'd like to answer.
是的。
Yeah.
我认为这归根结底取决于我们使用这两种不同技术所能记录的信息分辨率。
I think it comes down to the resolution of information that we are able to record using these two different technologies.
通过使用犹他电极阵列的方法,我们能够真正定位单个神经元,除了观察单个神经元外,没有其他更基础的信息来源,而且这些神经元都位于大脑高度靶向的区域。
With our approach of using Utah arrays, we can really target single neurons; there is no more fundamental source of information than looking at individual neurons, and in really highly targeted areas of the brain, too.
因此我们针对的是那些我们认为参与言语控制的非常局部化的区域,即言语发音器官的控制区域,同时我们也在观察极高分辨率的信息。
So we are targeting very localized areas that we believe to be contributing to speech control, the control of the speech articulators, as well as we are looking at very high resolution of information.
这两个因素使我们采集的数据具有极高的信噪比,从而能够解码更复杂的言语相关神经关联。
So those two factors give us a very high signal-to-noise ratio in the data that we collect, which enables us to decode more intricate speech-related neural correlates.
而我认为当你观察其他研究中使用的网格电极时,你看到的是空间分布更广泛的信号,反映的可能是传感器下方数百上千个神经元的聚合活动。
And I think when you look at the grids that are used in other studies, you look at more spatially diverse signal with more aggregate activity of maybe hundreds and thousands of neurons that lie underneath those sensors.
所以这提供了不同类型的信息,可能不那么针对言语功能。
So that kind of gives you a different set of information, maybe not that targeted to speech.
因此我认为这是我们的记录系统能更好地解码语言的原因之一。
So I think that is one of the differences that may have helped us to decode language better from our recording system.
是的。
Yeah.
话虽如此,我是说,ChangLab采用的ECOG方法确实有一些优势。
And that being said, I mean, there are a few advantages to the ECOG approach that the Chang Lab uses.
比如,你不需要像我们刚才讨论HCP时提到的那些非常精确的定位操作。
And, I mean, one is that you don't have to do all this really meticulous targeting that we were just talking about with HCP.
因为只要你知道大致要记录的位置,就可以直接把记录网格覆盖在那个区域上。
Because as long as you know roughly where you want to record, you can put the recording grid on top of that.
另一个特点是,虽然不进入大脑记录有其利弊——正如Maitreyi提到的,弊端在于你不是直接记录神经元活动,而是在记录一个低分辨率的、远距离的群体信号。
And the other is that there are advantages and disadvantages to not going into the brain to record. A disadvantage is that, as Maitreyi mentioned, you're not directly recording the activity of neurons, but you're kind of recording an aggregate of many, with low resolution and from kind of far away.
但优势在于这只是一张覆盖在大脑表面的薄片,所以非常稳定。
But an advantage is that this is just a sheet that sits on top of the brain, so it's very stable.
它不会损伤大脑,而且随着时间的推移也不太会移位。
It doesn't injure the brain, and it doesn't really move that much over time.
因此他们在研究中证明,大约一百天左右,你可以在不重新校准解码器的情况下获得良好的解码效果。
So they have demonstrated in their studies that for a hundred days or something, you can get good decoding without recalibrating your decoder.
长期以来,这被认为是ECOG相对于犹他阵列的一个关键优势,但我们开始在自己的工作以及与合作伙伴的研究中展示,通过一些巧妙的机器学习重新校准技巧,可以克服犹他阵列植入物可能带来的任何信号不稳定性问题。
For a long time, I think that has been considered one of the key advantages of ECOG over Utah arrays, but we are beginning to show in our work and with our collaborators that with some clever machine learning recalibration tricks, we can overcome the instability that arises in the neural signals recorded from Utah array implants.
所以简而言之,我认为这两种方法都有很多值得学习的地方,而且通过更多的工程改进,或许能够克服各自的缺点。
So really, I think the short answer is that there's a lot to learn about both of these approaches, and probably the shortcomings of each might be able to be overcome with more engineering.
对于ECOG来说,他们正在不断使这些网格更加密集,从越来越多的部位进行记录,目前尚不清楚这种改进的极限在哪里。
So for ECOG, they're constantly making these grids denser and denser and recording from more and more sites, and it's not really known what the limit of that is gonna be.
到目前为止,每次他们这样做,信号质量都会变得更好。
So far, every time they've done it, signal's gotten better.
如果他们继续这样做,这种改善会持续下去吗?
So if they do it again, will it continue?
我们拭目以待。
We'll see.
是的。
Yeah.
我们在最近的研究中已经看到了这一点。
We've seen that in the recent study.
他们将电极数量翻倍后,解码准确率几乎等同于在相似区域放置两个犹他阵列的效果。
They doubled the number of electrodes, and their decoding accuracy was almost equivalent to that of two Utah arrays placed in similar areas.
所以,是的,我认为这确实取决于我们使用的技术及其随时间的演变,以及在特定时间点哪种技术能带来最佳效果。
So, yeah, I think it's really dependent on the technology that we use and how that evolves over time and which is the best technology to give best outcome at a given time point.
所以,是的,有很多选择需要做出。
So, yeah, lots of choices, yeah, to make.
是的。
Yes.
绝对如此。
Absolutely.
每个犹他阵列有多少个电极?
And how many electrodes does each Utah array have?
它们有几种不同型号,但我们使用的是8x8的网格阵列。
There's a few different models of them, but the ones that we use are eight by eight grids.
它们有64个电极,每个电极的尖端都有一个记录点。
They have 64 electrodes, and each of those has a recording site on the tip of it.
当植入大脑时,每个电极会深入皮层约1.5毫米,然后通过每个通道末端的接触点进行记录。
When this is implanted into the brain, each electrode goes about a millimeter and a half into cortex, and then you record from the one contact at the end of each channel.
不过也有更大或更小的阵列,或者电极上带有多个记录点的设计。
There's other arrays that are bigger or smaller, though, or have multiple recording points along the electrode.
而且,显然现在有各种新型记录技术正在开发中:Paradromics正在制造一种升级版犹他阵列,具有更多通道和更高密度,而Neuralink则采用另一种方法,将单个电极线植入大脑。
And then, you know, obviously, there's all of these new recording technologies being developed. Paradromics is making basically an upgraded Utah array with many more channels and higher density, whereas Neuralink is doing this other approach where they thread individual electrodes into the brain.
目前还很难说哪种技术会成为最佳标准,但看到这些发展确实令人兴奋。
It's exciting to see what's going to turn out to be the best, the standard.
这也取决于具体应用场景。
And it also depends on the application.
不同的应用可能需要不同类型的记录技术或神经接口,这些技术会更适合特定应用需求。
So different applications might require different types of recording technologies or neural interfaces, which would be more suitable to those applications.
是的。
Yes.
非常感谢。
Thank you very much.
许多听众对与犹他阵列相关的伪影感到好奇,因为人们正在将颅内记录与头皮表面记录(非侵入式记录)进行比较。
Many of our listeners are curious about the artifacts that are associated with the Utah arrays because people are comparing intracranial recordings with the recordings from the surface of the head, noninvasive recordings.
那么,在您的记录中看到的伪影与非侵入式记录有何不同?
So what is the difference between the artifacts that we can see in your recordings versus noninvasive recordings?
也许由梅特蕾伊来回答,因为你同时使用过这两种技术。
Maitreyi, maybe, because you worked with both technologies.
对于脑电图(EEG)来说,主要挑战是肌肉伪影。
So with EEG, the major challenge is muscular artifacts.
如果你眨眼,就会在脑电图上看到这个动作。
So if you blink, you see that in EEG.
如果你移动,很可能会立即反映在脑电图上。
If you move, most likely you see that immediately in EEG.
你的心跳也会出现在脑电图中。
Your heartbeat is also present in EEG.
用于消除这些伪迹的算法可能不够稳定,而且要实时应用于我们设想的BCI用途相当复杂。
And the algorithms that you use to remove those artifacts might not be that stable, and are quite complicated to apply in real time for the kind of BCI use that we have in mind.
但使用犹他阵列时,你不会看到这些肌肉伪迹。
But with Utah arrays, you don't see those muscular artifacts.
所以我们看到的非侵入式记录中的伪迹类型确实不同。
So really different kinds of artifacts that we see from noninvasive recordings.
使用犹他阵列时,我们能看到电源线噪声的干扰。
With Utah arrays, we could see interference from power line noise.
有时信号会出现这种非平稳性。
Sometimes we had this kind of non stationarity in the signal.
这可能是由移动造成的。
This could be because of movement.
当你呼吸时,你的大脑也会移动。
So when you breathe, your brain also moves.
如果犹他阵列没有与大脑紧密贴合、没有完全垂直于皮层植入,阵列可能出现微小移动,从而导致一些非平稳现象。
And if the Utah arrays are not implanted very snugly, going directly perpendicular into the cortex, you might see some micromovement in the arrays that might lead to some non-stationarities.
因此我认为主要的伪影来源是随时间变化的非平稳性。
So I think major source of artifact is non stationarity over time.
你的神经记录会随时间变化。
Your neural recordings change over time.
你的神经调谐也会随时间变化。
Your neural tuning also changes over time.
举个例子,如果有一群神经元对特定任务反应非常敏感,一个月后可能会看到另一群神经元对相同任务产生反应。
So for example, if you have a population of neurons that is really, really responsive to a particular task, after a month, you might see a different population of neuron responding to the same task.
所以,我认为主要挑战之一就是重新校准解码器,以适应可能发生在更长时间周期内的神经变化。
So, yeah, I think one of the main challenges is recalibrating our decoders to match that neural shifts that might happen over longer periods.
那么需要多久重新校准一次?
And how often would you need to recalibrate?
是每次会话、每天都需要,还是可以间隔更长时间才需要重新校准?
Is it with each session, each day, or there are more extended periods of time that you wouldn't need to do that recalibration?
通常是怎么操作的?
How does it usually work?
是的。
Yeah.
这个问题的答案可能因人而异,甚至同一参与者在不同日子也会有所不同。
The answer to that can vary participant to participant or even day to day within the same participant.
我们的解决方案是始终在后台进行重新校准。
Our solution has been to just always be recalibrating in the background.
所以对于脑到文本的解码,模型会不断用最新数据更新,以使其适应神经数据中调谐的最新特性。
So for brain to text decoding, there's the model is constantly being updated with recent data to attune it to whatever the most recent properties of the tuning in the neural data are.
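"始终在后台重新校准"的思路可以用一个玩具示例说明:用合成的、调谐不断漂移的数据,把每个新标注数据块混入类模板。以下全部是演示假设,并非BrainGate实际使用的算法。
The always-recalibrating-in-the-background idea can be illustrated with a toy example: synthetic data whose tuning keeps drifting, with each new labeled block blended into per-class templates. All of this is an illustrative assumption, not the algorithm BrainGate actually uses.

```python
import numpy as np

rng = np.random.default_rng(1)
n_channels, n_classes = 64, 2
centers = rng.normal(size=(n_classes, n_channels))  # "true" neural tuning

def run_session(centers, n=300):
    """One synthetic labeled session recorded under the current tuning."""
    y = rng.integers(0, n_classes, size=n)
    x = centers[y] + 0.5 * rng.normal(size=(n, n_channels))
    return x, y

# Day-one calibration: one mean template per class.
x, y = run_session(centers)
templates = np.stack([x[y == c].mean(axis=0) for c in range(n_classes)])

def decode(sample):
    """Nearest-template classification against the current templates."""
    return int(np.argmin(np.linalg.norm(templates - sample, axis=1)))

# Later sessions: the tuning drifts, and each new labeled block is
# blended into the templates in the background (continuous recalibration).
for _ in range(3):
    centers = centers + 0.3 * rng.normal(size=centers.shape)  # neural drift
    x, y = run_session(centers)
    for c in range(n_classes):
        templates[c] = 0.5 * templates[c] + 0.5 * x[y == c].mean(axis=0)

# Accuracy on the most recent (drifted) session with updated templates.
accuracy = np.mean([decode(s) == lab for s, lab in zip(x, y)])
```

Because the templates track the drift session by session, the decoder stays accurate even though the underlying tuning has moved well away from its day-one state.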
这种方法至少在脑到文本解码方面非常成功。
And this approach has been, at least for brain-to-text, very successful.
而且,正如我提到的,参与者之间确实存在一些差异。
And, you know, again, I mentioned there's kind of some differences, participants to participants.
有无数因素会影响这一点。
There's a million factors that can influence this.
所以有时候你会看到大量、大量的这种非平稳性,在一个人身上每分钟都在变化。
So sometimes you see lots and lots and lots of this non stationarity, you know, minute to minute within somebody.
解码器可能有效,也可能无效。
A decoder might work or not.
对于我们的参与者来说,大多数情况下情况相当稳定,我们认为这可能与我们记录的通道数量有关。
With our participant, things are pretty stable most of the time, and we think that may have something to do with the number of channels that we're recording from.
因此,如果在256个电极之间的信号中存在一定程度的冗余,那么当其中一个发生变化时,只要其他电极仍在捕捉到类似编码的信号,影响就不大。
So if there's a level of redundancy embedded in the signals across the 256 electrodes, it doesn't matter so much when one shifts as long as the others are still picking up something that's encoded similarly.
这就是我们对此的一个理论解释。
So that's one theory for it.
但真正的答案是,我们并不完全理解为什么存在这种非平稳性,或者是什么导致了它。
But the real answer is that it's not really completely understood why this nonstationarity is there or what is causing it.
但我们知道的是,我们可以实施软件解决方案来解决这个问题,就像我提到的,通过我们进行的持续微调。
But what we do know is that we can implement software solutions to solve it, like I mentioned, with the continuous fine tuning that we do.
是的。
Yes.
当然,最常见的问题是这种犹他电极阵列能在脑中植入多久?
And of course, the most common question is how long can this Utah array stay implanted in the brain?
就我们的参与者而言,目前植入手术已过去八个多月了。
So with our participant, we are more than eight months after the implant right now.
BrainGate项目的运作方式是,参与者一旦接受植入,就同意至少参与一年的研究。
The way it works in BrainGate is when a participant is implanted, they agree to be a part of the study for at least a year.
一年之后,他们可以选择继续参与,也可以选择停止数据收集。
Beyond a year, they can choose to continue, or they could choose to stop collecting data.
如果他们真的愿意,还可以选择将所有硬件移除。
Or if they really wanted, they could choose to have all the hardware explanted.
至今为止,还没有人选择将硬件移除。
No one's ever chosen to have the hardware explanted.
我认为绝大多数参与者确实从这些研究中获益良多,也为科学做出了贡献。
And I think the vast majority of participants really get a lot of value out of doing these studies and contributing to science.
所以他们通常会选择继续参与研究。
So they typically choose to stay enrolled within the study.
我们的参与者植入已超过八个月,而斯坦福和布朗大学各有一位参与者植入时间已超过五年,我记得是这样。
So our participant has been implanted for more than eight months, and there's a participant at Stanford and one at Brown who have each been implanted for more than five years, I think.
因此上限尚不明确。
So the upper limit is not really known.
原因有些令人不安——参与这些临床试验的患者本身健康状况就不佳。
The reason for that is a little bit morbid in that people that enroll in these clinical trials are not healthy people.
以ALS(肌萎缩侧索硬化症)为例,确诊后平均预期寿命可能只有两到四年。
So when you're diagnosed with ALS, for example, you have maybe two to four years to live on average.
所以看到远超这个时间范围的情况可能会有些非典型。
So seeing timescales much outside of that range might be a little bit atypical.
同理,那些瘫痪、中风或存在类似状况的参与者往往伴随多种健康并发症。
Or similarly, participants that have paralysis or had strokes or something, there's a lot of health complications.
我想说的是,我们尚未见过有参与者能保持植入设备长达十年。
I guess what I'm getting at is that we haven't seen a participant keep an implant for ten years yet.
另外补充一点:这些犹他电极阵列算是种老技术了。
And one more side point to that is that these Utah arrays are sort of an old technology.
自上世纪80年代获得FDA批准用于植入设备后虽经多次改进迭代,但基础技术原理始终未变。
They've been approved by the FDA for implants since the '80s, and they've been iterated upon a bit since then, but it's the same base technology.
我认为归根结底,这个领域都清楚当前的犹他电极阵列形式不会成为长期植入每个人的解决方案。
And I think at the end of the day, the field knows that the Utah array in its current form is not gonna be the long term solution of something that you're implanting into everybody.
而这个空白正是Neuralink、Paradromics和Synchron等公司试图填补的。
And that's the gap that companies like Neuralink and Paradromics and Synchron are trying to fill.
Neuralink投入了大量资金开发无线设备,无论其存在哪些优缺点,目标都是推向市场让任何愿意的人都能植入。
Neuralink has poured a ton of money into engineering something wireless and with whatever other advantages or disadvantages that it has for the goal of reaching a market where anybody can get this implanted if they want to.
这些公司进行了各种测试来证明理论上这些植入体可以持续数十年,但这是否属实还有待观察。
They and these other companies do all sorts of testing to show that theoretically these implants could last for decades, and it just remains to be seen if that's true or not.
是的。
Yes.
太感谢了。
Thank you so much.
我们已经讨论过你们研究在方法论、硬件和电极植入等方面与其他研究的不同之处。
So we discussed already what your study did differently in terms of methodology, the hardware, yes, implanting the electrodes when compared with other studies.
现在让我们深入信号处理部分,你也提到过你们在这方面采用了不同的处理方式。
Now let's get into the signal processing, which you also already mentioned that you did in a way differently.
正如Maitreyee所说,你们确实非常专注于实时应用。
You're really focused on real time applications, like Maitreyee said.
也许你可以再详细说明一下这一点。
Maybe you can elaborate a little bit more on that.
那么有哪些改进呢?
So what improvements?
你们在项目中引入了哪些创新?
So what innovations did you bring into your project?
我认为我们可以谈谈我们拥有的两项技术——Brain2Text和Brain2Voice。
So I think we'll speak about both the technologies that we have Brain2Text and Brain2Voice.
这确实是一种非常新颖的方法,我们此前从未见过这种脑到语音方法在言语神经假体中的应用演示。
So this is really very new approach, and we have really not seen previous demonstrations of this brain to voice approach for speech neural prosthesis.
因此我们面临了许多需要克服的挑战。
So we had lots of challenges to overcome.
我们的首要目标始终是以极低的延迟快速处理神经数据。
Our priority was always to get neural data processed very quickly with very little latency.
正如我之前提到的,我们在获取神经数据后的一毫秒内快速完成了所有信号处理和噪声消除。
So as I mentioned before, we did all the signal processing and noise removal very quickly, within one millisecond of acquisition of the neural data.
这使得我们能够将更多计算时间用于解码器,由它来实际解码这些数据。
So that allowed us to spend a little bit more compute time on the decoder, which would actually decode that data.
从神经数据解码语音有多种方法。
So there are different ways when you decode speech from neural data.
首先是数据采集。
The first one is data acquisition.
我们从硬件设备记录数据。
We record the data from the hardware.
数据传入计算机后进行降噪处理。
It goes into a computer where we denoise the data.
我们对其进行滤波,并从中提取有用特征。
We filter it, and we extract useful features from that data.
通过滤波去除环境噪声、低频噪声及其他伪迹后,再进行特征提取。
So we filter it to remove ambient noise, low frequency noise, other things, other artifacts, and then extract features.
通过单神经元分辨率的数据,我们可以获取动作电位,其表现形式就是尖峰信号。
So with single neuron resolution of data, we can get action potentials, and the proxy to that is spikes.
因此我们可以看到在特定时间区间内神经元放电的次数。
So we see how many times a neuron has fired in a particular time bin.
这些就是我们获取的尖峰信号。
So those are spikes that we get.
此外我们还会观察宽带活动,这能为我们提供所采集神经数据的更宏观视角。
And then we also look at the broadband activity, which gives us a broader view of the neural data that we have collected.
这些就是我们为解码器选择的特征。
So these are the features that we chose for our decoders.
在我们的处理流程中,所有这些处理都在1毫秒内完成。
And all of this processing was done within one millisecond in our pipeline.
然后我们将这些特征发送给预训练的大脑到语音和大脑到文本解码器,它们会实时合成语音或解码音素序列。
Then with these features, we send them to the pretrained decoders for brain to voice and brain to text, and then they synthesize either voice or decode phoneme sequences again in real time.
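The feature-extraction steps just described (filter out low-frequency noise, count spikes per time bin, measure broadband activity) can be sketched on synthetic data. The filter order, threshold rule, and amplitudes below are assumptions for illustration, not the study's actual parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 30_000                                   # a typical intracortical sampling rate
t = np.arange(0, 0.2, 1 / fs)                 # 200 ms of synthetic data
rng = np.random.default_rng(0)
raw = rng.normal(scale=10.0, size=t.size)     # background noise (arbitrary units)
raw[500::1000] -= 300.0                       # six injected "spikes"

# 1) Band-pass to the spike band (250-5000 Hz), removing low-frequency noise.
sos = butter(4, [250, 5000], btype="bandpass", fs=fs, output="sos")
spike_band = sosfiltfilt(sos, raw)

# 2) Spike feature: negative threshold crossings counted per 10 ms bin.
threshold = -4.5 * np.std(spike_band)
crossings = (spike_band[1:] < threshold) & (spike_band[:-1] >= threshold)
bin_size = int(0.010 * fs)                    # 10 ms bins
n_bins = crossings.size // bin_size
counts = crossings[: n_bins * bin_size].reshape(n_bins, bin_size).sum(axis=1)

# 3) Broadband feature: mean power of the band-passed signal per bin.
power = (spike_band[: n_bins * bin_size] ** 2).reshape(n_bins, bin_size).mean(axis=1)

# One (spike count, broadband power) pair per bin per channel is what the
# decoders would consume.
features = np.column_stack([counts, power])
assert features.shape == (19, 2)
assert counts.sum() >= 6                      # every injected spike is detected
```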
对于脑到语音转换,我们的挑战是以最小延迟获得听起来像你自己声音的输出。
So for brain to voice, our challenge was to get the output with minimum latency such that it sounds like your own voice.
因为当我们说话时,我们听到自己的声音几乎没有延迟。
Because when we speak, we hear our own voice with very little latency.
如果延迟增加,听起来就会不自然。
And if you have that increased delay, it won't sound natural to you.
这样无法形成闭环。
It kind of wouldn't close the loop.
因此我们专注于实现闭环,使实时说话和听觉反馈对话成为可能。
So we were focused on closing the loop and enabling this real time speaking and conversation with auditory feedback.
脑到语音的流程以十毫秒的分辨率运行。
So the brain to voice pipeline worked at a resolution of ten milliseconds.
每十毫秒就会产生一个十毫秒的语音窗口。
So every ten milliseconds, you would produce a ten millisecond window of voice.
所以语音的产生是即时的。
So it was instantaneous production of voice.
然后我们会将其发送到扬声器,扬声器会将其回放给你听。
And then we'd send that to the speaker, which would play it back to you.
因此这也会带来一点额外的延迟。
So that also incurred a little bit more delay.
举个例子,就像你对着麦克风说话时,房间后方的扬声器会有轻微延迟,但你说话时仍能听到回声。
So to give you an example, it sounds like when you speak on a microphone and you have a speaker at the back of the room: there's this little bit of delay, but when you speak, you still hear it back.
这就是我们系统中存在的那种延迟。
That is the kind of latency that we have in the system.
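The 10 ms streaming structure described above can be sketched as a frame-by-frame loop. The "synthesizer" here is a stand-in (noise scaled by a fake feature vector), purely to show how emitting one 10 ms chunk at a time keeps latency near one frame rather than one whole utterance.

```python
import numpy as np

fs_audio = 16_000
frame_ms = 10
frame_len = fs_audio * frame_ms // 1000       # 160 samples per 10 ms frame

def synthesize_frame(features, rng):
    # Stand-in "vocoder": noise whose amplitude follows the fake feature.
    return features.mean() * rng.normal(size=frame_len)

rng = np.random.default_rng(0)
playback_buffer = []
for step in range(100):                       # one second of streaming
    features = np.abs(rng.normal(size=256))   # one 10 ms frame of features
    playback_buffer.append(synthesize_frame(features, rng))
    # In the real system this chunk would go straight to the speaker here,
    # so the listener hears audio roughly one frame after it is decoded.

audio = np.concatenate(playback_buffer)
assert audio.size == fs_audio                 # 100 frames x 160 samples = 1 s
```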
而在脑到语音技术中,我们面临的一个重大挑战是没有真实语音基准(ground truth)来训练解码器。
And with brain to voice, one of the major, major challenges we have is not having the ground truth speech to train the decoder.
我们的受试者虽然能说话,但我们不知道他在说什么,每个词是如何发音的,音节之间如何间隔等等。
So our participant speaks, but we don't know what he's saying, how he's saying each word, how he's spacing his syllables, etcetera.
因此我们还开发了NOC算法来创建这种引导式语音,帮助训练解码器并实现实时语音生成。
So we also worked on the NOC algorithm to kind of create this guided speech, which would help to train this decoder and produce speech in real time.
我们还在研究语音的其他方面,比如如何从脑活动中解码音高、语调或音量,这将使未来的语音更加自然和富有表现力。
We are also investigating other aspects of speech, so how we can decode pitch from the brain activity or intonations or volumes which could make this even more natural and more expressive moving forward.
非常有趣。
Very interesting.
是什么帮助你们实现了这些成果,将延迟时间缩短到仅十毫秒?
And what helped you to achieve these results to minimize this delay time to basically ten milliseconds?
是的。
Yes.
是的,并且实现了如此高的分辨率。
Yes, and you achieved this very high resolution.
关键在于设计了整个信号采集处理系统。
It was designing the whole signal acquisition processing system.
我们有一台大型推车,上面装有四台不同的计算机,各自快速执行不同的任务。
So we have a big cart with four different computers doing four different tasks very quickly.
它们都配备了高性能处理器。
They have high compute processors on them.
这确实起到了关键作用。
So that definitely helped.
我们还采用了智能算法来最小化计算时间。
We also used intelligent algorithms to minimize the compute time.
正如计算机科学中所说,我们降低了算法的时间复杂度,并巧妙设计了系统,使这些解码器能够超高速运行。
So reduce the time complexity of the algorithms, as you call it in computer science, and really cleverly designed the system so that these decoders work super fast.
这也对解码器架构的设计提出了一些限制。
And that has also posed some limits on what the decoder architecture could be.
我们本可以获得更好的解码效果,但同时我们也希望速度足够快。
So we could get better decoding, but we wanted to get it fast enough as well.
因此我们始终考虑如何尽快获得输出结果,并努力在精度和速度两方面突破极限。
So we had those considerations in mind to get those outputs as quickly as we can and try to push this boundary with accuracy as well as speed.
我想补充一点,Maitreyee在优化她解码器中的信号处理流程并使其高速运行方面做得非常出色。
I just wanna add that Maitreyee was so good at optimizing the signal processing pipeline in her decoders and making them so fast.
她在这方面如此优秀,以至于延迟的最大障碍竟然是在声音生成后如何足够快地通过扬声器播放出来——这真是个令人啼笑皆非的问题。
She was so good at that, that the biggest hurdle in latency has just been playing the sound out of the speaker fast enough after it's generated, which is a really silly problem.
是啊。
Yeah.
这太不可思议了。
That's amazing.
Maitreyee,你还提到了信号去噪,也就是预处理的第一阶段,以及去除低频噪声。
And, Maitreyee, you also mentioned denoising the signal, you know, the first preprocessing stage, and you mentioned removing low frequencies.
那么你们后续分析主要关注哪些频段呢?
So what frequencies did you concentrate on for further analysis?
我们主要研究的是所谓的尖峰频带。
So what we are looking at is called spike band.
尖峰频带指的是250赫兹以上直至5000赫兹的高频段。
Spike band means the higher frequencies, above 250 hertz and up to 5,000 hertz.
我们可以观测该频段内的尖峰活动。
We can look at spiking activity in that band.
因此我们既关注尖峰活动,也关注宽带活动。
So we are looking at spiking activity as well as broadband activities.
尖峰活动高度局限于单个记录电极,而宽带活动能提供更强的鲁棒性和稳定性,因为它能反映该电极周围更综合的神经活动。
So spiking activity is really localized to individual electrode that we are recording from, but broadband activity gives us more robustness and more stability because it kind of looks at somewhat more aggregate activity around that particular electrode.
通过结合这两个特征,我们成功提升了解码器的性能。
So using the combination of these two features, we are able to enhance our decoders.
因此在进行解码时,我们只关注250赫兹以上的信号。
So when we're decoding, we're only looking at signals above 250 hertz.
但显然,许多领域能从比这更慢的神经信号中获取大量价值。
But, obviously, there's lots of fields that get a lot of value out of neural signals slower than that.
所以我们也会保存这些其他信号。
So we also save those other signals.
我们会保存整个宽带信号,之后可以离线分析。
We save the whole broadband thing, and then we, you know, can look at that offline.
或许这能让解码效果更好。
And maybe it'll make decoding better.
谁知道呢?
Who knows?
但目前我们使用的是250赫兹以上的信号。
But for now, above two fifty is what we use.
嗯。
Mhmm.
谢谢。
Thank you.
Maitreyee,你还提到了观察语音的其他方面,比如语调或音量。
And, Maitreyee, you also talked about looking at other aspects of speech such as intonation or volume.
你注意到什么特征了吗?
Did you notice any features?
你们能提取出与这些成分相关的特征吗?
Were you able to extract any features related to those components?
是的。
Yeah.
我们确实还处于研究这些副语言特征的早期阶段,但我们相信如果能够足够可靠地解码它们,这些特征必将有助于更好的语音合成,并极大提升使用这种BCI进行自然语音交流的整体体验。
We are really in early stages of looking at those paralinguistic features, but we believe that if we are able to decode them reliably enough, they would definitely contribute towards better speech synthesis and really improve the overall experience of using this BCI for naturalistic speech.
这太棒了。
That is amazing.
尼克,关于脑信号转文本的算法和信号分析部分,你有什么要补充的吗?
Nick, do you have anything to add for your part of brain to text in terms of algorithms and signal analysis?
是的。
Yeah.
我可以补充——实际上,我使用了Maitreyee描述的相同神经特征来进行脑到文本的解码流程。
I can. And so, really, I use the same neural features that Maitreyee described for the brain to text decoding pipeline.
但我还要指出,当然,脑到语音的延迟是一个大问题,实时播放音频会产生很大影响。
But I can add that, you know, of course, brain to voice latency is a huge deal, and playing that audio back in real time makes a big difference.
对于脑到文本,你在延迟方面有更多灵活性,这让你在模型架构和预测频率等方面有更多调整空间。
For brain to text, you have more flexibility with the latency, which, you know, gives you a lot more flexibility over the model architecture and how frequently you're gonna be making predictions and that type of thing.
我们仍希望保持相对较低的延迟,使文字能在他说话时显示在屏幕上,但这个延迟可能是几百毫秒而非她所说的十毫秒。
We still wanna keep the latency relatively low so that words are appearing on the screen as he's speaking. But, you know, that can be hundreds of milliseconds potentially, rather than ten milliseconds like she's describing.
因此,根据你要解码的内容以及它将如何影响闭环方案中的参与者,需要做出不同的考量。
So there are, you know, different considerations to make depending on what you're trying to decode and how it's going to affect the participant in this closed loop scheme.
我认为在脑到文本流程中,语言模型也有助于填补解码过程中出现的误差,从而提升性能并使系统对参与者更实用。
And I believe in the brain to text pipeline, language models also help in filling in the gaps or the errors that happen in the decoding, to really improve the performance and make the system usable for the participant.
是的。
Yeah.
是的。
Yeah.
正如Maitreyee之前提到的,她是从脑信号因果预测语音,而脑到文本转换中,我们拥有使用语言模型的巨大优势,因为英语具有统计结构。
So as Maitreyee alluded to earlier, she is causally predicting voice from brain signals, whereas with brain to text, we have this huge advantage of using language models because the English language has statistical structure.
当某人说了三个词时,你可以很好地猜测下一个词会是什么,或者你可以填补空白,诸如此类。
When somebody says three words, you can make a pretty good guess what the next word's going to be, or you can fill in gaps, or all types of things like that.
所以我们虽然实时预测文本并更新在屏幕上,但也会非因果性地回顾所有已预测单词之间的关系, 并用最佳猜测进行更新。
So although we are predicting text in real time and updating it on the screen, we're also non causally looking back at all the relationships between the words that have been predicted so far and updating with our best guess.
在一句话结束时,当我们确认完成且参与者按下完成按钮进行最终确认时,我们就有更多时间来处理,这时可以充分利用这些大型语言模型——这也是另一个快速发展的领域。
And then at the end of a sentence, when we know we're done and the participant presses the done button to finalize, we have more time to work with there, and we can really leverage these large language models, which is another exploding field.
我们很幸运这个领域与语音神经假体技术同步爆发,因为对于脑到文本而言,它们非常有用。
We're fortunate that it's exploding alongside speech neural prostheses because for brain to text, they're very useful.
所以我的意思是,在句子结束时可以进行这种更耗时的步骤——称为重评分步骤——根据解码内容重新分析参与者可能表达的所有可能性,然后得出最有价值的答案。
So what I'm saying is that at the end of a sentence, we can sort of take this more expensive step, called a rescoring step, to reanalyze all of the possible things that the participant may have said there based on what you decoded, and then come up with the most valuable answer.
这个步骤还可以根据参与者或对话中已有的上下文进行一定程度的个性化调整。
And that step can also be personalized a bit toward a participant or toward a conversation where there's existing context.
所以你三句话前说的内容可能会影响你现在想表达的内容,我们可以利用这些信息。
So what you said three sentences ago might be influencing what you're trying to say now, and we can leverage that information.
此外,参与者的一些个人情况,比如他们的日常工作、家庭成员名字或居住地等,也会极大影响他们可能谈论的话题。
And then also, some facts about the participant, like their day job, or their family members' names, or where they live, things like that, that can also heavily influence what they might be talking about.
因此整合所有这些上下文信息是这个文本解码方案的另一个巨大优势。
So incorporating all of this contextual information is another big advantage of this text decoding scheme.
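The sentence-final rescoring step described above can be illustrated with a toy example. Everything here, the candidate sentences, the bigram probabilities, and the weighting, is invented for illustration; real systems use much larger n-gram or neural language models. The idea is just that the language model's score can override a small decoder error.

```python
# Decoder output: candidate sentences with (made-up) neural log-probabilities.
candidates = {
    "i want to talk to my daughter": -12.0,
    "i want to walk to my daughter": -11.5,   # the decoder alone prefers this
}

# A tiny bigram "language model": log P(word | previous word).
bigram_logp = {
    ("i", "want"): -0.6, ("want", "to"): -0.4,
    ("to", "talk"): -1.2, ("to", "walk"): -4.0,
    ("talk", "to"): -0.5, ("to", "my"): -0.7, ("my", "daughter"): -1.0,
}
FLOOR = -6.0                                  # score for unseen word pairs

def lm_score(sentence):
    words = sentence.split()
    return sum(bigram_logp.get(pair, FLOOR) for pair in zip(words, words[1:]))

def rescore(cands, lm_weight=1.0):
    # Standard rescoring: combine decoder and language-model log scores.
    return max(cands, key=lambda s: cands[s] + lm_weight * lm_score(s))

best = rescore(candidates)
assert best == "i want to talk to my daughter"   # the LM fixes the error
```

Personalizing the system, as described, amounts to biasing these language-model scores toward words and names that are likely for this participant or this conversation.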
这就是为什么Maitreyee说得对,脑到语音转换要困难得多。
So this is why Maitreyee rightly said that brain to voice is much harder.
你没有任何这些上下文线索可以整合到解码流程中。
You don't have any of those contextual cues to incorporate into your decoding pipelines.
你只需要非常擅长解码神经信号。
You just have to be super good at decoding neural signals.
是的。
Yeah.
因为没有纠正机制——一旦声音播放出去,你就无法根据前后内容进行回溯修正使其更清晰。
There is no corrector mechanism because once you play the sound that's out there, you cannot go back and correct it based on what was said before or after and make it more intelligible.
这是我们正在攻克的主要挑战之一。
So that is one of the major challenges that we are working on.
我还想提到,先前的研究采用了某种声音离散化的方法。
I also wanted to mention that previous studies have used kind of discretization of voice.
比如Chang 2023年的研究,他们也尝试从脑信号合成语音,但他们的方法与我们的截然不同。
So for example, in the Chang 2023 study, they also tried to synthesize voice from brain signals, but their approach is quite different from ours.
他们将所有声音离散化为一些基本的声音单元,但这些仍是离散单元,通过串联序列来生成声音;而我们的方法是在连续空间直接输出声音。
So they're discretizing all the sounds into some fundamental units of sound, but they're again discrete units that they're stringing together the sequence to make the sound, whereas our approach is to output sound in continuous space.
受试者可以随心所欲地说任何话,我们希望能合成这种未经离散化的、不局限于特定单词的原始声音。
The participant can literally say anything, any way he likes, and would like to synthesize that voice without discretizing it to sound like a particular word or something else.
这也带来了额外的挑战,因为输出空间是无限的。
So this also poses additional challenges because your output space is infinite.
你可以发出无限种声音,也可以用无限种方式说同一个词。
There are infinite sounds that you can make and infinite ways in which you can say a word.
所以每次说同一个词,发音都会有所不同。
So every time you say a word, you'd say it differently.
这不会是相同的。
It's not going to be the same.
这些就是我们为了产生更自然的声音所面临的挑战。
So those are the kind of challenges that we are dealing with to produce more naturalistic sound.
非常感谢。
Thank you so much.
你们的参与者最想交流的内容是什么?
And what was your participant mostly interested in communicating?
他个人想通过你们提供的界面传达什么?
What did he want personally to communicate through the interface you provide?
我想我可以提供一些关于他日常生活的更多背景信息。
I think I can give you a little bit more context about his day to day life to inform.
对于ALS患者来说,他曾是个非常健谈、幽默的人。
So for ALS, he was a very talkative, funny person.
他说过他可以滔滔不绝地说上好几个小时,甚至完全不用思考。
He said that he could just speak for hours and hours and not think about it even at all.
他总有说不完的话。
He just had a lot to say.
自从ALS导致他出现构音障碍后,这种能力显然被剥夺了。他仍能与亲人交流,但需要付出大量努力、忍受挫折和保持耐心。
Since his dysarthria has set in with ALS, obviously that's been very much taken away from him. He can communicate with his loved ones, but only through a lot of effort and frustration and patience.
所以日常生活中,他的交流大多局限于事务性内容,比如'你能帮我做这个吗?'
So most of the time, day to day, what he communicates has been limited to transactional things like, can you do this for me?
或者'我不舒服'、'请做这个'。因此他基本无法进行超出基本需求之外的交流。
Or "I'm uncomfortable," or "please do this." So he's not really speaking for the sake of communicating beyond basic needs.
而通过语音神经假体技术参与这项研究时,他表现出了强烈的竞争意识。
So with a speech neural prosthesis, and when he decided to enroll in this study, he's also very competitive.
首先他想打破所有记录;其次他想恢复自己的声音和幽默感,想讲笑话,想和女儿聊天,想和家人交谈,想和同事交流。
Well, first of all, he wanted to break all the records, and second of all, he wanted to get his voice back and his sense of humor back and tell jokes, talk to his daughter, talk to his family members, talk to people at work.
他现在仍然全职工作。
I mean, he still works full time.
他希望能在工作环境中更轻松地与人沟通。
He wants to be able to more easily communicate with people in that environment.
他想,比如,在电子游戏里开别人玩笑之类的事情。
He wants to, like, you know, make fun of people on video games, things like that.
就像任何我们习以为常的日常交流小事,可以说他都希望能做到。
Like, just any any little thing that we take for granted with communication, I think you could say he wants to be able to do that.
所以
So
而且这个系统对我们来说也非常有用,因为他通过这个系统与我们交流,因为没有翻译我们听不懂他的话。
And it's it has been incredibly useful to us as well because he communicates with us using this system because we cannot understand his speech without an interpreter.
所以我们去参加疗程时,就直接打开这个系统。
So when we go to the sessions, we just turn on the system.
他跟我们说话。
He speaks to us.
我们跟他交谈。
We talk to him.
这确实也帮助了研究,并获取他对不同任务效果的反馈。
And that has really also helped the research and get feedback from him on how different tasks have worked for him.
所以,是的,这确实是一个他能够真正用来与人交流的不可思议的系统。
So, yeah, it's really an incredible system that he can really use to communicate with people.
那么他对这个过程以及你们所取得的结果有什么反馈?
So what was his feedback on the process and on the results that you were able to achieve?
我们问他是否知道或认为这个系统会运作得这么好。
Well, we asked him if he knew it was gonna work this well or if he thought it was gonna work this well.
他说当然。
And he said, of course.
是的。
Yes.
当然会。
Of course it would.
他就是他。
He's him.
他总是赢家,就是那种感觉。
He always wins, that type of thing.
所以,我的意思是,他当然很感激,而且他确实对这项技术感到惊叹。
So, I mean, of course, he's grateful, and he he really marvels at this technology.
我想说的是,他并没有把这视为理所当然。
He does not take it for granted, I guess, is what I'm trying to say.
我认为在某种程度上,他对效果如此之好感到惊讶,无论他是否愿意承认,就像我们所有人都感到惊讶一样。
I think he was, on some level, surprised at how well it worked, whether he wanted to admit it or not, as were we all surprised.
看到像这样的东西立即产生效用真是太神奇了。
Seeing immediate utility out of something like this was amazing.
我没有提到的是,在第二次研究会议中,他使用了这个非常大的词汇量来说出这些提示句子。
Something I didn't mention was in this second research session where he used this very large vocabulary to speak these prompted sentences.
那天结束时,我们还给了他机会尝试说出任何他想说的话,看看我们是否能解码,而不是让他从屏幕上读出内容。
At the end of that day, we also gave him the opportunity to just try to say whatever he wanted and see if we could decode that instead of him reading something off of the screen.
所以那天结束时我们只剩一点时间,但我们还是完成了10个句子。
So we only had a little bit of time left at the end of this day, but we were able to do 10 sentences.
他用了全部10个句子来和他的女儿交谈,这可能是多年来第一次真正意义上的交流,因为他一直无法说话,这是他唯一能和女儿沟通的方式。
And he used all 10 of them to talk to his daughter for probably the first time really in years because he hasn't been able to speak and is that that that's the only way he can communicate with his daughter.
于是他询问她今天过得如何,并说他一直期待能和她交谈,等待这一刻已经很久了,说了诸如此类的话。
So he asked her how her day was and said he was happy to talk to her and that he'd been waiting for this for a long time, and those types of things.
所以我认为这直接证明了这项技术对患有此类疾病的任何人——尤其是对他本人——的生活能有多么大的帮助。
So I think that was really an immediate demonstration of how useful something like this could be in anybody's life who has this type of disease, but also specifically in his.
是的。
Yes.
完全同意。
Absolutely.
那第二次实验之后发生了什么?
And what happened after that second session?
因为你主要提到了第一次和第二次实验。
Because you were referring mostly to first and second session.
之后的情况是怎样的?
What was going on after that?
你们进行了哪些类型的研究?
What type of studies were you doing?
你们探索了哪种沟通方式?
What type of communication did you explore?
效果如何?
And how did it work?
是的。
Yeah.
我们有几个正在进行中的项目。
So we have several ongoing projects.
我的意思是,显然在这两个解码项目中,我们持续收集了大量数据。
And, I mean, obviously, with both of these decoding projects, we we continued to collect a lot more data.
但对于BCI奖项,我们想仅聚焦于初期影响,展示这种即时效果。
But for the BCI award, we wanted to just limit ourselves to this initial impact and, you know, just show like, wow.
这太棒了。
This is great.
它能立即见效。
This can work immediately.
这对该领域来说意义重大。
That's huge for the field.
对吧?
Right?
关于脑到文本项目,我们目前正在准备发表相关成果。
So for the brain to text project, we are working on publishing it right now.
但我可以说,我们一直在持续收集大量这类数据。
But I can say that, you know, we've continued to collect a lot of this data.
我们持续进行语音解码,实际上他已经开始每周两天在日常工作中使用这套系统。
We've continued to decode speech, and he has actually began to use this system just for his day job, two days a week right now.
每周两天,我们为他初始化系统,他可以使用八小时——目前最长的连续解码记录应该是八小时。
So for two days a week, we initialize the system for him, and he can use it for eight hours or however long; I think the current longest continuous session right now is eight hours of decoding.
他整天都在使用它。
And he just uses it all day.
他用它进行Zoom会议和Slack通话。
He does Zoom calls and Slack calls.
他发送电子邮件。
He sends emails.
他与访客交谈。
He talks to visitors.
他会讲笑话,诸如此类的事情。
He cracks jokes, that type of thing.
所以这对我们和他来说当然都非常令人兴奋。
So that's been really exciting for us and for him, of course.
我们正在准备发表这些研究成果。
We're working on publishing those results right now.
我们还在研究关于语音如何在大脑这一区域编码的基础科学问题,以及我们能从中获取哪些其他信息。
We're also looking at different fundamental science questions about how speech is encoded in this part of the brain and then what other information we can get from it.
因此过去八个月我们一直非常忙碌。
So we have been very busy for the last eight months.
我们正在调查各类不同的问题。
And we're investigating different kinds of questions.
是的,希望我们能从这项研究中看到更多成果。
And, yeah, hopefully, we will get to see much more that comes out of the study.
是啊。
Yeah.
我还有个问题是,这个过程对他来说有多累?
And my question also is how tiring is this process for him?
你提到他可以工作八小时。
You mentioned that he can work for eight hours.
由此我猜,他感觉相当不错。
So from this, I would guess that, you know, he feels pretty good.
但他是否花了一段时间才适应这个系统?
But did it take time for him to get used to the system?
整个过程是如何发展直到他...嗯...
So how did it all develop until he yeah.
我认为随着时间的推移,他找到了一种对他消耗最小但仍能获得高质量解码结果的策略。
I think over time, he's found a strategy that is minimally exhausting for him, but still results in very good decoding quality.
例如,如果我们要求他大声持续地明确尝试说话,这可能会让他感到非常疲惫。
So for example, if we ask him to speak, or to attempt to speak overtly and with volume continuously, that can become very tiring for him.
但如果他使用非常低的音量,那种你甚至听不到他在说话的音量,他说他能听到自己在说话。
But if he uses a much lower volume, kind of a volume at which you can't even hear that he's speaking, he says he can hear himself speaking.
采用这种方法,他可以持续工作八小时甚至更久。
Using that approach, he can go for eight hours or more.
这八小时的限制并非来自他本身。
The limit in that eight hour case was not him.
只是因为一天结束了。
It was just the end of the day.
所以我认为对于每个参与者来说,这个问题的答案会根据他们的具体情况而大不相同。
So I think for every participant, though, the answer to that is gonna be very different based on their condition.
而在他这个案例中,目前的情况就是这样。
And in in his case, this is just what it is right now.
我认为对我们的参与者来说,这比用自己的语言交流更省力,因为正如尼克所说,即使是专业听众也很难听懂他的话。
And I think for our participant, this is less effortful than communicating using his own speech, because, as Nick said, it is really hard even for expert listeners to understand him.
因此,如果他要说些什么,就必须重复五到十遍同样的话,这对他来说非常耗费精力。
So if he has to say something, he has to repeat the same thing five times, 10 times, which can be exhausting for him.
所以这为他提供了一种快速高效的沟通途径。
So this gives him an avenue to communicate very quickly and very fast.
我还很好奇,从他刚到实验室开始,到能够通过系统进行交流,这个过程需要多长时间?
And I'm also curious how much time does it take from the beginning when he just gets to the lab until he can already communicate through the system?
首先我想说明,我们所有的研究数据收集工作都是在他家里完成的。
Well, so first, I wanna note that we do all of our research collection at his house.
我们在那里设有设备,具体时间会因当天情况有所不同,但通常很快就能开始。
So we have a setup there, and that's a little bit dependent on the day, but normally, the answer is very quickly.
有时我们会发现,刚连接好设备,解码效果就能达到最佳状态。
So we may find that we plug him in and instantly the decoding is as good as it's ever been.
而有些日子,就像我们讨论过的这种不可预测的非平稳性,可能需要稍长时间进行校准,但也不会太久。
And other days, like what we talked about with this unpredictable nonstationarity, it might just take a little bit longer for calibration to happen, but it's not that much longer.
通常在10到20句话之后,情况就会稳定下来,之后他就能全天使用这个系统了。
It's like within 10 or 20 sentences, things will begin to settle down, and then he can use it for the rest of the day.
但从我们到达那里开始,我们基本上只需要打开系统并完成设置。
But from when we get there, we just kinda have to, like, turn on our system and set it up.
我们需要将记录硬件连接到他的头部并接入系统。
We have to hook up the recording hardware to his head and plug him in.
然后还需要进行一些简短的初始化步骤才能让一切运转起来。
Then there's a few, like, short initialization steps that we have to do just to get things running.
但总的来说,我想说整个过程不超过十五到二十分钟左右。
But all in all, I'd say that takes under fifteen, twenty minutes, something like that.
嗯。
Yep.
谢谢。
Thank you.
你能总结一下这项初步研究的结果吗?
Can you summarize the results of this initial study?
我来总结脑转文字的部分,你可以总结脑转语音的部分。
I'll summarize it for brain to text, and you can summarize it for brain to voice.
在一名45岁的ALS患者中,我们将四组微电极阵列植入他的中央前回。
So in a 45 year old man with ALS, we implanted four microelectrode arrays into his precentral gyrus.
在记录他脑部信号的第一天,我们就能从一个50个单词的词汇表中几乎完美地解码出他想说的话。
And on the very first day of recording his brain signals, we were able to decode what he was trying to say out of a 50 word vocabulary almost perfectly.
第二天,当我们扩展到更大的词汇量时,能以超过90%的准确率解码他的意图表达,他甚至能用这个系统与女儿对话,说出多年来一直想对她说的自发语句。
And on the second day, when we expanded that to a very large vocabulary, we were able to decode what he was trying to say with more than 90% accuracy, and he was even able to use it to speak to his daughter and say unprompted sentences that he had wanted to say to her for years.
自那以后,我们又收集了大量数据。
And in the time since then, we've collected a lot more data.
我们已经优化了算法并进行了改进。
We've optimized our algorithms and made enhancements.
在准确性方面有所提升,他现在每周都会使用它多次来完成工作并与家人和爱人交流。
Things have gotten better accuracy wise, and he now uses it multiple days a week to do his job and talk to his family and his loved ones.
好的。
Okay.
因此,使用相同的电极和同一名参与者,我们还能够实时合成他的声音并提供实时反馈,这向恢复失去语言能力者的自然语音迈进了一步。
So with the same electrodes and with the same participant, we have also been able to synthesize his voice in real time with real time feedback, which is a step closer towards restoring naturalistic speech in a person who has lost their ability to speak.
我们不得不克服一些挑战,因为参与者无法清晰说话,所以没有真实语音作为基准。
We have had to overcome challenges because the participant cannot speak intelligibly, so there's no ground truth speech.
而在语音合成方面,延迟也是一个主要问题。
And then with voice synthesis, the latency is also a major issue.
通过克服这些挑战,我们已经能够合成出这种近乎可理解的语音。
So overcoming those challenges, we have been able to synthesize this voice, which is nearly intelligible.
非常感谢。
Thank you so much.
你认为你们研究的局限性有哪些?未来希望在哪些方面进行改进?
And what would you consider the limitations of your study upon which you want to improve in the future?
在这类研究中,我们显然总是受限于参与者的数量。
So in these types of studies, we're always limited by the number of participants, obviously.
所以这是一个单例研究,我们非常想看看这种高性能语音解码能否推广到其他参与者身上,无论是其他ALS患者还是其他病症患者。
So this is an n equals one study, and we're really interested to see if this high performance speech decoding can translate to other participants, either other participants with ALS or perhaps other conditions.
是的。
Yeah.
在语音合成方面,我们仍未实现完全清晰可懂,因此还有很大改进空间。
In terms of voice synthesis, we are still not fully intelligible, so there is a lot of progress to make.
我们已经展示了第一步,但仍需改进。
We have shown the first step, but it needs to improve.
而脑到文本已发展到可在他的个人生活中实际应用的先进阶段,这很棒。
Whereas brain to text is at an advanced stage where it can be usable in his personal life, which is great.
但正如尼克所说,我们需要在更多参与者身上复制研究和这些方法,以使其适用于更广泛人群。
But, again, as Nick said, we'll have to replicate the study and these approaches on multiple participants to kind of make it available to wider population.
我要指出的是,尽管脑到文本的表现非常出色,但总有改进的余地。
And I'll point out that although the brain to text performance is very good, there's still always room for improvement.
就像我们之前讨论过的语言模型,它们利用英语语言的统计结构进行预测,但在某些情况下处理专有名词时仍存在不足。
So I can say, we talked about language models earlier, which harness the statistical structure of the English language to make predictions; a failure point of that is proper nouns in some cases.
有很多句子可能会用到专有名词。
So there's many sentences where where you might put a proper noun.
首先,它可能是任何其他专有名词,也可能只是一个普通名词,例如。
First of all, it could be any other proper noun, or it could just be a normal noun, for example.
如果你问一个语言模型,那里最可能出现的词是什么?
And if you ask a language model, what's the most likely word that should go there?
它不会是一个具体的名字。
It's not gonna be a specific name.
有时会是其他词。
It's gonna be some other word sometimes.
所以这类解码总有一些可以改进的小细节。
So there are some small things like that that we can always improve for this type of decoding.
当然,当涉及到临床应用时,人们不会希望头上插着电线,也不会想要旁边放个衣柜大小的电脑推车来做解码。
And then, of course, when it comes to clinical translation, people are not going to want these wires poking out of their heads, and they're not going to want a closet-sized computer cart sitting next to them to do the decoding.
因此行业还需要做大量工作来完善这类记录系统,把它们变成小型嵌入式计算机。
So there's a lot of work for the industry to do to perfect these types of recording systems and make them into little embedded computers.
也许有一天它能在你的手机上运行。
Maybe it can run on your phone someday.
是的。
Yeah.
尽管我们已经进行了八个月的研究,但我们仍需观察这种解码器的长期稳定性和长期准确性。
And even though we have been conducting this study for the last eight months, we still have to see the long term stability and long term accuracy of this decoder.
而且一旦患者接受神经外科手术并植入这些电极,我们就期望该系统在未来数年内保持同样的准确度。
And once you undergo neurosurgery and implant these electrodes, the expectation is that the system will work with equal accuracy for several years to come.
这也将成为另一个挑战。
And that is also going to be another challenge.
而且疾病的发展进程因人而异。
And then there's the disease progression; it progresses differently in different people.
因此我们必须观察这种疾病进展如何影响我们建立的脑机接口。
So we have to see how that disease progression affects the brain computer interface that we have built.
最终,患有ALS或退行性疾病的患者会完全丧失语言能力或进入闭锁状态。
And eventually, people with ALS or degenerative diseases lose their ability to speak altogether or go into a locked-in state.
这也可能带来其他挑战。
So that might also pose other challenges.
因此我们还有许多待研究的开放性问题。
So there are many open questions that we want to investigate.
是的,希望通过开展这类试验,并在更长时间内纳入更多参与者,能帮助我们解答其中一些问题。
And, yeah, hopefully doing these kinds of trials with more participants over longer durations will help us answer some of those questions.
谢谢。
Thank you.
在这个项目中,你遇到的最大挑战是什么?又是如何解决的?
And what was the biggest challenge for you in this project and how did you solve it?
最大的挑战之一——我不确定是否算最大的挑战——可以说是福祸相依。
One of the biggest challenges, though I don't know if it was the biggest, was sort of a blessing and a curse.
当你加入一个全新实验室时,必须从零开始搭建一切。
When you join a brand new lab, you have to build everything from the ground up.
这样做的好处是你可以按照自己的意愿来打造它。
So the good part of that is you can make it the way that you want it.
而困难之处在于你真的必须亲力亲为。
And the hard part is that you actually have to do it.
我们很幸运,我记得我们俩加入实验室后,大约有一年时间才招募到第一位临床试验参与者。
We were fortunate enough that, I think, we both joined the lab and had about a year before we recruited our first clinical trial participant.