本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
这些是房间里的机器人吗?实验方面是什么?
Are these robots in a room? What is the experimental side?
是的,它们是房间里的机器人。它们是房间里的无实体机械臂。我们有一个系统。我们有一个自动化实验平台,如果你熟悉实验操作的话,它们通常在培养板上进行,比如96孔板或384孔板。这些培养板在我们拥有的平面电机系统上磁悬浮,并可以快速移动到这条大导轨旁边。
Yeah, they are robots in a room. They are disembodied robot arms in a room. We have a system. We have an automated experimental platform where, if you're familiar with how experiments work, they often run on plates, so either a 96-well plate or a 384-well plate. These plates magnetically levitate over this planar motor system that we have, and they can zip next to this big rail.
那里有放着实验设备的台子,机械臂会从导轨上拿起培养板,放入设备中,完成后放回导轨,然后培养板可以快速移动到下一站。所以我对这个的抽象理解是,实际上我们正在构建一种新型计算机,这个平面电机系统,这条导轨,本质上就像一个PCI总线,而我们所做的是在现实世界中将新设备连接到这个PCI总线上。这个想法不是拥有几个能完成人类工作的站点,而是拥有大量这样的站点,可以进行规模化实验,然后它真的开始感觉像是一种新型实验集群,我们将其与传统的GPU集群配对。
There are benches with experimental equipment on them, and the robot arm will pick the plate up off the rail, put it in the piece of equipment, and when it's done put it back on the rail, and then the plate can zip off to the next stop. So the abstraction that I have for this is that actually we're building this new kind of computer, and this planar motor system, this rail, is essentially like a PCI bus, and what we're doing is hooking new devices onto this PCI bus in the real world. And the idea is not to have a couple of these stations that can do what humans do, but to have buildings full of these stations that can do experimentation at scale, and then it really does start to feel like a new kind of experimental cluster that we pair with a traditional GPU cluster.
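The "rail as PCI bus" abstraction can be sketched as a toy model. Everything below (class names, station names, the plate dictionary) is invented for illustration; it is not LILA's actual software, just a sketch of the idea that instruments attach to a shared transport layer the way devices attach to a bus:

```python
# Toy sketch of the "rail as PCI bus" abstraction: stations register on a
# shared rail the way devices attach to a bus, and plates are routed from
# station to station. All names here are invented for illustration.

class Station:
    def __init__(self, name, operation):
        self.name = name
        self.operation = operation  # callable applied to a plate

    def process(self, plate):
        plate["history"].append(self.name)
        return self.operation(plate)

class Rail:
    """The shared transport layer: stations plug in like bus devices."""
    def __init__(self):
        self.stations = {}

    def attach(self, station):
        self.stations[station.name] = station  # hot-plug a new device

    def run(self, plate, route):
        for stop in route:
            plate = self.stations[stop].process(plate)
        return plate

rail = Rail()
rail.attach(Station("dispense", lambda p: {**p, "volume_ul": 50}))
rail.attach(Station("incubate", lambda p: {**p, "incubated": True}))
rail.attach(Station("read", lambda p: {**p, "signal": 0.87}))

plate = {"id": "plate-001", "history": []}
result = rail.run(plate, ["dispense", "incubate", "read"])
```

Adding a new instrument is then just another `attach` call, which is the point of the bus abstraction: capacity scales by plugging in devices, not by redesigning the workflow.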
大家好,欢迎来到另一期NEJM AI大查房。我是你们的联合主持人Raj Manrai。这一期我玩得很开心,因为我得以对我的好朋友兼联合主持人Andy Beam反客为主,他是今天节目的嘉宾。Andy和我认识很久了。我们在哈佛一起做博士后时,真的就坐在相邻的隔间里,然后我们差不多同时进入学术求职市场并建立了自己的实验室。直到去年,Andy还是哈佛公共卫生学院的教授,现在他是LILA Sciences的首席技术官,这是一家致力于科学超级智能的公司。
Hi, and welcome to another episode of NEJM AI Grand Rounds. I'm your cohost, Raj Manrai. And for this episode, I had a lot of fun because I got to turn the tables on my good friend and cohost, Andy Beam, who is the guest of today's episode. Andy and I have known each other for a long time. We were postdocs together at Harvard, literally sitting in cubicles next to each other, and then we went on the academic job market and started our labs around the same time. Until last year, Andy was a professor at the Harvard School of Public Health, and now he's the CTO of LILA Sciences, a company working on scientific superintelligence.
我非常了解Andy,但在这段对话中我仍然了解到了很多关于他的新事情,包括他早期在医疗保健领域的经历。他如何对医学产生兴趣、他对人工智能长达数十年的痴迷,以及他对AI在医学领域乃至更广泛领域的预测,都给我留下了深刻印象。
I know Andy really well, but I still learned a lot of new things about him during this conversation, including about his early experiences in health care. I was struck by how he got interested in medicine, his decades long fascination with artificial intelligence, and his predictions for AI both in medicine and more broadly.
NEJM AI大查房播客由Microsoft、VizAI、Lyric和Elevance Health赞助播出。我们感谢他们的支持。
The NEJM AI Grand Rounds podcast is brought to you by Microsoft, VizAI, Lyric, and Elevance Health. We thank them for their support.
那么,我很高兴为大家带来我与Andy Beam的对话。好了,Andy Beam,欢迎来到AI大查房。这次轮到我这么说了。所以,首先让我说,我真的很兴奋能第一次向你提出这个问题,考虑到过去十年我们在一起度过了那么多时间,我想我大概能相当好地模拟这个场景。
And with that, I'm delighted to bring you my conversation with Andy Beam. Alright. Andy Beam, welcome to AI Grand Rounds. I get to say that this time. So, let me first say I'm truly excited that I get to pose this question for the first time to you, and I think I could probably simulate this reasonably well given how much time we've spent together over the last decade.
但是安迪·B,你能告诉我们你自己神经网络的训练过程吗?你是如何对AI产生兴趣的,又是哪些数据和经历让你走到今天的?
But Andy B, could you please tell us about the training procedure for your own neural network? How did you get interested in AI and what data and experiences led you to where you are today?
是的,说起来挺有意思的。拉吉,我会尽量给你一些新信息,不过你大概能猜到很多这样的发展轨迹。我从小就是个工程书呆子,对医学真的没什么兴趣。
Yeah. It's funny. I'll try and give you some new information here, Raj, but you probably can predict a lot of this trajectory. So, I was always kind of like an engineering nerd as a kid. I wasn't really interested in medicine.
我妈妈讲过这个故事:幼儿园时他们搞了个实验性设置,有各种站点,孩子们可以轮流去阅读和算术。而我整个幼儿园一年都待在乐高站,以至于幼儿园毕业后我连自己的名字都不会写。所以我一直就对捣鼓、建造和工程感兴趣。真正开始大量思考计算机科学和计算机工程是在高中。我买了台戴尔电脑——伙计,你要有台戴尔了,还记得那些广告的人应该懂。
My mom tells this story that in kindergarten they were doing this new experimental setup where they had stations and the kids could rotate through: reading, arithmetic. And I literally spent my entire kindergarten year at the Lego station, to the point that I couldn't write my name after kindergarten. So I've always just been interested in tinkering and building and engineering. And it was really in high school when I started to think a lot about computer science and computer engineering. I got this Dell. "Dude, you're getting a Dell," for people who remember those ads.
哦,奔腾三,733兆赫兹。我就完全沉迷于电脑能做的所有事情。我稍微涉足了硬件黑客,所以在大学里搞了个副业,改装Xbox。我可以拿别人的Xbox,在主板上焊两个跳线点,就能刷BIOS,基本上把它变成一台通用电脑,可以安装更大的硬盘,还能在上面运行超级任天堂游戏。我大一大二时靠改装Xbox赚了不少啤酒钱。你走进我宿舍,会看到一堆Xbox从地板堆到天花板,因为宿舍里的人会拿来,我收他们大概50美元改装他们的Xbox。高中时我还上了第一门编程课,在当地社区学院学了QBasic,之后我就觉得计算机科学就是我想做的事情。
Oh, the Pentium III, like 733 megahertz. And I just completely got rabbit-holed by all the things that you could do on a computer. I got into hardware hacking a little bit, so I made a little side hustle in college modifying Xboxes. I could take someone's Xbox, solder two jumper points on the motherboard that would let you flash the BIOS, and essentially turn it into a general-purpose computer, so you could install a bigger hard drive, you could run Super Nintendo games on it. And I made a decent amount of beer money my freshman and sophomore year modifying Xboxes. You'd come into my dorm room and there would just be a stack of Xboxes, floor to ceiling, because people in my dorm would bring them by, and I'd charge them like $50 to modify their Xbox. In high school I also took my first programming class, a QBasic course at a local community college, and I think after that experience I was just like, computer science is what I want to do.
它触动了我感兴趣的那么多方面,我真的觉得那是一种指导原则。那是我最能共鸣的领域。我会继续讲我的轨迹,但我确实换过很多领域。虽然我仍然以计算机科学的视角看待大多数事情,但也受到我从事的其他一些工作的影响。我要说,我有一段童年时期影响深远的记忆,让我对医学与计算机科学的交叉产生了兴趣。
It spoke to so many of the things I was interested in, and really I think that's been a sort of guiding principle. That's the field that I most resonate with. I'll keep going through my trajectory here, but I really have changed fields a lot. And I think I still view most things through the prism of computer science, but informed by some of these other things that I've been working on. I will say I have this formative memory from my childhood that got me interested in medicine and its intersection with computer science.
实际上,我小时候经常生病,一年得四次链球菌性喉炎,还同时得过水痘和带状疱疹。所以我成长过程中经常进出医生办公室。还有一次在附近玩“抓野鹅”游戏时,一根棍子插进了我大腿三英寸深。
I was actually, I got sick a lot as a kid, had strep throat like four times a year, I had chicken pox and shingles at the same time. So I was like in and out of doctor's offices a lot growing up. I also got like a stick stuck three inches into my quad playing wild goose chase in my neighborhood.
所以我
So I
比如,我有很多这样的经历。
Like, I had lots of these.
我想我之前并不知道棍子插进大腿那件事。不知道。哇。是的。所以你从小就有这些非常、非常深刻的医疗系统经历,甚至在很小的时候。
I don't think I knew about the stick in the quad. No. Wow. Yeah. So you had these strong, strong experiences with the healthcare system even growing up very, very young.
是的。其中有一个经历后来在我生活中重现,让我更加确信AI在医学中的潜力。那是在六年级,也就是我初中的第一年,像大多数书呆子初中生一样,那年我去了太空营。实际上我还去了一个叫做天才营的地方,在北卡罗来纳州,那是暑假里学术天赋异禀的书呆子们去的夏令营。但从太空营回家后,我开始像狗一样吠叫。我有一种咳嗽,非常像狗叫。
Yeah. And there's one of these that I think came back to me later in life that reinforced the potential for AI in medicine. It was in sixth grade, so it was my first year of middle school, and like most nerdy middle schoolers I went to space camp that year. I actually also went to something called spec camp, which in North Carolina was the academically gifted nerd camp that you went to in the summer. But I got home from space camp and I started barking like a dog. I had this cough that was very much like a dog barking.
那是我妈妈听过的最奇怪的事情。当时我们在海滩,咳嗽持续不断,最终有一天晚上我咳得太厉害,导致窒息无法呼吸。最后甚至开始呕吐。抱歉,这故事有点恶心。
It was the strangest thing my mom had ever heard. We were at the beach and it continued and continued and eventually one night I coughed so much that I became asphyxiated and couldn't breathe. And it eventually just went into vomiting. Sorry that this is kind of a gross story.
天啊。
Oh my God.
这真的很创伤。非常创伤。于是我妈妈带我去急诊室,他们完全不知道是什么问题。第二天又带我去看儿科医生。在儿科医生办公室里,我又一次完全发作,剧烈咳嗽、支气管痉挛、呕吐,就发生在儿科医生面前,而儿科医生说'我觉得你可能是鼻窦感染'。
And it was just traumatic. Like, it was traumatic. And so my mom took me to the ER, and they had no idea what it was. They took me to the pediatrician the next day. And in the pediatrician's office, I went into the exact same full spell, like full coughing, bronchospasm, emesis, right in front of the pediatrician, and the pediatrician was like, you know, I think you have a sinus infection.
我妈妈说'这孩子绝对不是鼻窦感染,我不知道他得了什么,但绝对不是鼻窦感染'。这样又持续了好几天,我妈妈半夜醒来,突然想起她小时候的一件事。她记得和我祖父母,也就是她的父母一起在车里时,我祖母也发生过同样的事情。他们不得不把车停在路边,我祖母在路边呕吐,她得的是百日咳。所以我实际上呈现的是典型的百日咳症状,但那位儿科医生在整个执业生涯中从未见过这种病例。
My mom was like, this kid does not have a sinus infection; I don't know what he has, but he does not have a sinus infection. So I had this for several more days, and my mom woke up in the middle of the night and had this flashback to when she was a kid. She remembers being in the car with my grandparents, her mother and father, and the same thing happened to my grandmother. They had to pull the car over and my grandmother vomited on the side of the road, and my grandmother had whooping cough. So I was actually having a textbook presentation of whooping cough, but the pediatrician had never seen this during their entire practice.
他们这辈子都没见过这种病,百日咳基本上已经被根除了。于是第二天我妈妈打电话给儿科医生,问他:‘你觉得安德鲁会不会得了百日咳?’医生回答说:‘说来也巧,我们刚接到疾控中心的一个随机电话,说本县其他地区出现了几例百日咳确诊病例。’结果发现我确实得了百日咳,医生给我开了些巨大的难以下咽的马药丸般的药片,但总算治好了。我爸爸是牙医,疾控中心甚至来到他的诊所和我们家进行了全面排查,导致他的诊所停业了几周。但这件事让我意识到,我和大多数成长过程中的人一样,对医生怀有一种近乎神圣的敬畏——他们仿佛是神职人员与全知者的结合体,能够准确诊断一切疾病。
They had never seen this in their life; whooping cough had mostly been eradicated. And so the next day my mom called the pediatrician and was like, do you think Andrew might have whooping cough? And they were like, well, it's funny you say that, we just got this random call from the CDC and there have been a couple of documented cases of whooping cough in other parts of the county. So it turns out I did have whooping cough. I got these huge horse pills that were terrible to take but remedied it. My dad's a dentist, and they actually shut down his practice for a couple weeks; the CDC came to my dad's practice and came to our house and did a full canvass. But what this told me was that I had, like most people growing up, this sort of reverent view of physicians, that they are some mix of being members of the clergy but also having some non-trivial amount of omniscience and can correctly diagnose everything.
但我的儿科医生无法诊断的原因是他们从未见过这种病例。虽然我的症状完全是教科书式的典型表现。如果根据我的症状来评估患百日咳的条件概率,应该无限接近百分之百。但医生存在这种近期偏见,他们就是有个认知盲区。这是一种非常人性化、经过充分研究的认知偏差,就像近期偏见那样。
But the reason my pediatrician was unable to diagnose it is that they had never seen it, even though it was a textbook presentation. So if you're looking at the conditional probability of whooping cough given my symptoms, it would have been close to one. But there was this recency bias with the pediatrician; they just had a blind spot. And that's a very human, well-studied cognitive bias, recency bias.
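The conditional-probability point can be made concrete with a toy Bayes' rule calculation. All numbers below are invented for illustration (they are not clinical estimates); the sketch just shows how a near-perfect likelihood can be overwhelmed by an effectively zero prior, which is what recency bias amounts to:

```python
# Toy Bayes' rule illustration of the recency-bias point. All probabilities
# are invented for illustration, not clinical estimates.

def posterior(prior, likelihood, likelihood_other):
    """P(disease | symptoms) in a two-hypothesis world."""
    num = prior * likelihood
    return num / (num + (1 - prior) * likelihood_other)

# Paroxysmal "barking" cough with post-tussive emesis is textbook pertussis:
likelihood_pertussis = 0.9    # P(symptoms | pertussis), invented
likelihood_other = 0.001      # P(symptoms | anything else), invented

# With a reasonable population prior, the posterior is near one:
p_textbook = posterior(0.01, likelihood_pertussis, likelihood_other)

# With a recency-biased prior ("I've never seen a case"), it collapses:
p_biased = posterior(0.000001, likelihood_pertussis, likelihood_other)
```

With the invented numbers above, `p_textbook` lands above 0.9 while `p_biased` stays below 1%, even though the evidence term is identical in both calculations.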
这件事让我长期铭记于心——人们对诊断的思维方式存在这种缺陷。本科期间我学习计算机科学、计算机工程和电子工程,试图决定未来方向。我曾考虑成为网络工程师去思科工作,也曾在高通实习,为骁龙处理器做大型电路设计验证。
So that stuck with me for a long time, that there was this flaw in the way people think about diagnosis. And so as I moved through undergrad, I was studying computer science, computer engineering, electrical engineering, trying to decide what I was going to do. I thought about being a network engineer and going to work for Cisco. I was interning at Qualcomm doing very-large-scale circuit design verification for the Snapdragon processor.
这些当时都发生在北卡罗来纳州,对吧?
So this is all in North Carolina at the time, right?
都在北卡罗来纳州。是的,北卡罗来纳州立大学。我在北卡州立读的本科。当时我以为自己会从事这两条职业道路中的一条。后来我选修了一门人工智能课程——北卡州立的本科AI课程。
All in North Carolina. Yeah, NC State. So I went to undergrad at NC State. And so I thought that I was gonna do one of those two things. And then I took an AI class, the undergraduate AI course at NC State.
用的是Russell和Norvig那本绿色的现代AI经典教材。那完全是我见过最令人震撼的学科。我们讨论了忒修斯之船悖论,探讨了所有这类哲学问题——比如意识意味着什么?但也涉及非常实用的内容,比如如何用A*算法等工具在巨大空间中进行搜索?如何进行定理证明?
It was the green modern AI book from Russell and Norvig, the classic textbook. And it was just completely the most mind-blowing subject I had ever seen. We talked about the Ship of Theseus, we talked about all of these philosophical issues, like what does it mean to be conscious? But then also the very practical things, like how do you search over large spaces with things like A*? How do you do theorem proving?
这门课基本上融合了我所有超级感兴趣的学科,彻底改变了我的人生轨迹。于是我决定投身人工智能领域,放弃那些工程类工作。接着我尝试从这个认知出发逆向思考,探索AI领域中最令人兴奋的研究方向。这时我又回想起六年级那场百日咳的经历,意识到医学必须是最具影响力的研究方向之一,而且具备所有这些有趣特性。于是我咨询了许多教授,征求他们的建议。
And it was just essentially an amalgamation of all the subjects that I found super interesting, and it really was a hard fork in the trajectory of my life. So I decided that I wanted to do AI; I didn't want to do those engineering things. And so then I tried to work backwards from that realization and figure out what were the most exciting things I could work on in AI. And again, I had this flashback to the whooping cough episode in sixth grade and said, medicine has to be one of the most impactful things you could work on, and it also has all these interesting properties. So I then spoke to a lot of my professors and asked them for their advice.
我从教授这门课程的AI教授那里得到了非常有智慧的忠告。那是在2006或2007年左右,他说,你知道这个叫机器学习的东西看起来真的很重要,似乎正处于上升轨道。所以如果我是你,我会去深入理解概率论和不确定性下的推理,真正钻研进去。于是我听从了他的建议,最终留在北卡罗来纳州立大学,获得了统计学硕士学位。
I got very sage advice from the same AI professor who taught the course. This was in like 2006 or 2007, and he said, you know, this thing called machine learning really seems to be important. It seems to be on an upward trajectory. So if I were you, I would go really understand probability theory and reasoning under uncertainty and really get deep into that. And so I took his advice; I ended up staying at NC State and getting a master's in statistics.
同时还在EPA(美国环保局)做了一些研究,学到了很多关于概率论基础理论的知识,有很多超级有趣的内容。我在北卡罗来纳州立大学完成了生物信息学博士学业,研究贝叶斯神经网络用于全基因组关联分析。这是在AutoGrad出现之前的贝叶斯神经网络时代。GPU计算才刚刚起步,所以就像是手动编写CUDA内核,手动实现反向传播。
Also doing some research at the EPA, I learned a whole bunch about the foundational theory behind probability theory, lots of super interesting stuff. I finished my PhD at NC State in bioinformatics doing Bayesian neural nets for genome-wide association studies. This was Bayesian neural nets before autograd was a thing. GPU computing had just started. So it was writing CUDA kernels by hand, writing the backprop by hand.
那时候没有自动求导。每次做出改动,你都得回到你的……
There was no autograd. Anytime you made a change, you had to go back to your...
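For context on what "writing the backprop by hand" meant before autograd: you derived each gradient with the chain rule and coded it yourself. A minimal sketch for a single sigmoid neuron with squared-error loss (illustrative only, not the actual thesis code):

```python
import math

# Hand-written forward and backward pass for one sigmoid neuron, the kind
# of gradient bookkeeping autograd now does automatically.
# Loss: L = 0.5 * (y_hat - y)^2 with y_hat = sigmoid(w*x + b).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(w, b, x, y):
    z = w * x + b
    y_hat = sigmoid(z)
    loss = 0.5 * (y_hat - y) ** 2
    # Chain rule, derived by hand:
    dL_dyhat = y_hat - y
    dyhat_dz = y_hat * (1.0 - y_hat)   # derivative of the sigmoid
    dL_dz = dL_dyhat * dyhat_dz
    return loss, dL_dz * x, dL_dz      # loss, dL/dw, dL/db

# One gradient-descent step:
w, b = 0.5, 0.0
loss, dw, db = forward_backward(w, b, x=1.0, y=1.0)
w, b = w - 0.1 * dw, b - 0.1 * db
```

The point of autograd frameworks is that the three `dL_*` lines above are generated automatically; without them, any change to the forward pass forces you to re-derive those lines by hand.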
这真是一段很棒的"想当年"。
This is a really good "back in my day."
是的,确实是老派做法。因此,我再次学到了很多关于底层深度学习的知识,因为这些是贝叶斯神经网络。还有关于如何训练这些模型的许多细节问题。我当时面临的是双职工问题。节目的老听众都知道我娶了一位医生。
Yeah, it really is a "back in my day." And so again, I learned a lot about low-level deep learning because these were Bayesian neural nets, and just a lot of the nitty-gritty about how to train those models. I was part of a two-body problem. Longtime listeners of the show know that I'm married to a physician.
所以她去全国各地参加儿科住院医师面试时,我也跟着一起去,在她面试的地方寻找博士后机会。波士顿似乎在双职工问题优化方面显然是最佳选择。我的博士导师是约翰·多伊尔,他来自MIT,曾与一个叫扎克·科恩的人共事。所以在博士快结束时,我开始和约翰讨论谁在医学前沿工作,他说你应该去和我的朋友扎克谈谈。
And so she had gone out on residency interviews in pediatrics sort of all across the country, and I kind of tagged along and looked for postdocs at the places she interviewed. Boston seemed to be the clear winner in terms of the two-body-problem optimization. And my PhD advisor was someone named John Doyle who came from MIT and had worked with this guy named Zach Kohane. So towards the end of my PhD, I started talking to John about who's working on the frontier of medicine, and he was like, you should really go talk to my friend Zach.
所以当克里斯汀在这里面试时,我去了DBMI(当时还叫CBMI,生物医学信息学中心)和扎克见面,他真是太棒了。他正是我寻找的完美博士后导师,同时也完全理解双职工问题。他说,我明白这是怎么回事,我们很希望你能来。如果你们匹配到了波士顿,告诉我,我很乐意为你提供博士后职位。
So while Kristen was up here interviewing, I went by DBMI (it was actually CBMI at the time, the Center for Biomedical Informatics) and got to talk to Zach, and he was just awesome. He was exactly the perfect postdoc mentor I was looking for, and also completely understanding about the two-body problem. He was like, I understand how this works. We'd love to have you up here. If you guys match in Boston, let me know, and I'd love to have you for your postdoc.
所以我们去了Match Day(配对日),你知道,感觉真的像NFL选秀大会,我们实际上带了不同城市的帽子,她的家人在那里,我的家人也在那里,走上舞台,拿到了写着波士顿的信封,于是我立刻拿出手机告诉扎克我们要来了,来到波士顿,和扎克一起度过了三年精彩的博士后时光。那正是医学AI的早期,我记得我博士后第一天报到时就跟扎克说我们需要更多GPU。那是2014年左右,我说神经网络将彻底改变几乎所有事情。他问,我以前也训练过神经网络,现在有什么不同?我说,这些GPU每块的算力都比2000年代初的国家实验室还要强大,所以情况完全不同了。
So we went to Match Day, which really feels like the NFL draft. We actually brought hats from the different cities with us; her family was there, my family was there. We got up on stage, got the envelope that said Boston, and I immediately got on my phone and told Zach that we were coming. We came to Boston, and I spent three awesome years with Zach doing a postdoc. This was really the early days of medical AI. I remember I showed up on the first day of my postdoc and it was like, Zach, we need to get some more GPUs. This was like 2014, and I was like, neural nets are really going to change almost everything. He was like, I've trained neural nets before; what's different? I was like, well, each of these GPUs essentially has more computing power than the national labs had in the early 2000s, so it's just very different.
所以他立刻明白了,我们开始在这个领域做很多工作。在我不知道你准备了什么之前,也许再补充几点,我很期待看到你的问题。
So he immediately got it, and we started working on lots of stuff in the space. Maybe just a few more points. Actually, I don't know what you have in store for me, so I'm excited to see what questions you have.
是的,现在让我转个方向。因为我觉得接下来我们会讨论你博士后期间的很多工作。我有很多反应,首先是我觉得我可以模拟出相当一部分安迪的内容,但你确实加入了一些新东西。
Yeah. Let me redirect it now, 'cause I think we're going to talk about a lot of your work after your postdoc for the next part. So I have so many reactions. The first thing is I think I could simulate a decent amount of that, Andy, but you definitely put in some new content there.
我有很多反应,其中一个非常重要:你在讲述这个故事,你最终被诊断出患有百日咳,这段深刻的记忆一直伴随着你,引导你走向医学,并致力于医学AI问题的研究。这里面有很多层面,对吧?你妈妈在这件事中非常参与,她是患者的亲人。
And I have so many reactions. One of them that is really important: you're telling this story where you eventually would go on to be diagnosed with whooping cough, a very strong memory that stuck with you, that led you to medicine and led you to work on problems in medical AI. There are so many things about this, right? Your mom was very involved in this. It's a loved one of the person who's suffering.
这一直是一个持续的主题。然后你最初被误诊了,现在的你肯定在想,如果你还没做过,类似ChatGPT的模型会如何处理那种临床表现?我猜它在鉴别诊断中排名会相当高,但是。
This has been a persistent theme. And then you were misdiagnosed first, and now I'm sure the current version of you is wondering, if you haven't already done this, how would ChatGPT or similar models have done with that presentation? My guess is it would have been quite high on the differential diagnosis, but...
当然,我肯定想过。实际上,你知道,在我博士后期间我们做的一些早期模型,我们会逐字给出概率变化。百日咳一直是我用来测试它的案例之一。
For sure, I definitely have. And actually, you know, some of the early models that we did during my postdoc would give you the word-by-word change in probability. Whooping cough was always one that I would use to test it.
所以即使在当前大语言模型之前的原始时代,它就已经排名很高了。
So even in the primitive era before current large language models, it was high up there.
即使是基于少量数据训练的LSTM也能做对。
Even the LSTMs trained on small data could get it right.
是的,是的。所以这就是我接下来想深入探讨的。这是你的学术工作,我觉得你正要讲到这个。
Yeah. Yeah. So this is what I wanna dig into next. So this is your academic work. I think you're about to go here.
所以也许我会试着简单接续你刚才的话题。不得不说,我认为这个回答也非常精彩地解释了你对AI和医学产生兴趣的起源。你在Zach那里做了博士后,非常成功。实际上,也许我可以就此发表些评论。我们曾长时间共用相邻的隔间,以至于——可能我们在节目中提到过——我们太吵闹了,整天开玩笑、互相分心,玩得太开心,最后不得不被分开。
And so maybe I'll try to briefly pick up where you left off. And I do have to say, I think that was a fantastic answer for the genesis of your interest in AI and medicine as well. So you did a postdoc with Zach, very successful. Actually, maybe I'll just give my comments on this. We spent a lot of time together in neighboring cubicles, to the point that (maybe we've mentioned this on the show before) we were so loud and having so much fun joking and distracting each other during the day that we eventually did get separated.
确实如此。我的意思是,我们基本上就是把博士后期间的状态搬过来,放两个麦克风在面前,然后开始
We did. I mean, we essentially just took our postdoc and put two mics in front of us and started
做播客。没错,正是这样。所以现在就有了《AI大查房》,而且非常有趣。是的。
doing a podcast. Yeah. Exactly. So that is now AI Grand Rounds, and and it's a lot of fun. And so yeah.
所以,你知道,我们聊什么都特别开心,对吧?从勒布朗·詹姆斯到AI,再到
So we, you know, we had a lot of fun talking about everything. Right? From LeBron James to AI to
深度学习。
deep learning.
没错。扎克宾果游戏总是很有趣。嗯,我们有些听众可能不知道这是什么意思,但你想不想由你来告诉他们什么是扎克宾果?
Bingo. Zach bingo is always very fun. Well, some of our listeners don't know what that means, but do you wanna tell them what Zach bingo is?
扎克在谈话中经常使用很多固定短语。比如他会用Netflix对你无所不知,而医疗系统却对你一无所知来打比方。所以这会是宾果卡上的一个格子。信息论准则是另一个。所以我们为扎克的这些特色用语制作了一整套宾果卡。
Zach had a lot of phrases that he would commonly use in talks. One was an analogy of Netflix knowing everything about you while the healthcare system knows nothing. So that would be a square on the bingo card. "Information-theoretic criterion" was another one. So we had a whole bingo card for Zachisms.
哦,太棒了。你一下子把我带回了博士后时期。我想从你博士后的工作中提出的一点是,我认为你在这方面非常有先见之明,而且我觉得这成为了你后续学术工作的一条主线。你当时说的很多东西如今我们已视为理所当然,但你在很早之前就提出了,在不到十年前,甚至2017、2018年那会儿,你讲这些听起来更像科幻小说。我记得你当时试图解决的一个问题是设计一个能通过美国医师执照考试(USMLE)的神经网络。
Oh, amazing. You just took me straight back to postdoc. So one of the things that I wanted to bring up from your postdoc work: I think you were very prescient with this, and I think it's been a through line in your academic work afterwards. You were saying things that now we would take for granted, but you were saying them very early on, which I think felt much more like science fiction back when you were saying them, I don't know, less than ten years ago, even 2017, 2018. So you were trying to solve this problem, as I remember it, of designing a neural network to pass the USMLE.
当然我们现在知道所有大型语言模型都能做到这一点,但当时,你做这项工作的动机是什么?还有,当你告诉医生或机器学习研究人员时,他们对这些工作的反应如何?
And of course we know now that every other large language model can do this, but at the time, what was the sort of motivation, and also what were the reactions to some of this work when you would tell either doctors or machine learning researchers?
是的,回想我读研而克里斯汀读医学院的时候,我就觉得计算机显然会比人更擅长诊断。我会告诉她朋友们这个想法,结果在医学生的聚会上非常不受欢迎,因为我觉得自己像是末日使者——我敢说他们当时就是这么看我的。
Yeah, I mean, going back to when I was in grad school and Kristen was in medical school, it just felt obvious to me that computers were going to be able to do diagnosis better than people. And I would tell her friends that, and I was super unpopular at the med student parties because I felt like the bringer of the apocalypse, which is, I'm sure, how they viewed me.
为什么这对你来说是显而易见的?你认为是必然的深层原因是什么?
Why was it obvious to you? Like what are the sort of the deep reason that you saw it as inevitable?
嗯,你可以从第一性原理出发,就这么说:如果你认为认知不存在任何非物理组成部分,那么我们肯定能在计算机中重现这些过程,而且计算机具有人脑所没有的扩展性。所以问题不在于是否,而在于何时。我当时也读了很多未来主义文献,试图比那些内容稍微冷静一些。但如果你画出我们在2010年左右的处境,虽然我不知道具体时间,但计算机在这些推理类型上终将比人类更出色似乎是不可避免的。计算机不会疲倦,能阅读整个互联网,拥有完美的记忆力。
Well, you can go full first principles here and just say: if you think that there's no non-physical component to cognition, then surely we can recreate those processes in computers, and computers have scaling properties that human brains do not. So it was just a question of when, not if. I was also reading lots of futurist literature at the time; I was trying to be slightly more sober than what you would see in that. But if you plotted out where we were in like 2010, I didn't know exactly when, but it just seemed inevitable that computers were going to be better at doing these types of deductions than humans could ever be. Computers don't get tired, they can read the entire internet, they have perfect recall.
这看起来完全就是能力上的不匹配。所以那是我在2010年研究生早期阶段的核心信念之一。我认为这至今仍然是我主要的核心理念。博士后期间,Zach实际上非常支持这类半异端观点。Zach喜欢思考,他热爱这些想法,对吧?
It just seemed like a complete mismatch in terms of capability. So that was something that was core to my belief in early grad school, in like 2010. And I think that still is a core belief that I have. So then during the postdoc, Zach was actually very supportive of these types of semi-heretical views. Zach loves to think. He loves these ideas, right?
是的,他超爱。
Loves it, yeah.
于是我们开始合作这个项目,Kristen刚通过第一步考试,正在准备第三步,我记得那是住院医师第一年要考的。她在医学院结束时已经完成了第二步。所以我总是开玩笑说,这完全是因为我缺乏想象力——如果我想让计算机擅长医学,我就让它做她做过的事。参加考试。Raj,你也很好地阐述过这一点:第一步考试是成为医生的必要条件,但不是充分条件。
And so we started working on this project. Kristen had just passed Step 1 and was working on Step 3, which I think is what you take during your first year of residency. She had done Step 2 at the end of med school. And so I always joke that it was due to a complete lack of imagination that, if I wanted a computer to be good at medicine, I was just going to have it do what she did: take the exam. And Raj, you have done a good job of articulating this point too. Step 1 is a necessary but not sufficient condition to be a doctor.
我们经常遇到这类问题。比如第一步考试,我在2017年GTC大会上演讲时提出,第一步应该作为医疗AI的基准。第一步考试具备所有这些特性。人们当时非常热衷于让计算机进行鉴别诊断,但评估鉴别诊断非常困难,获取数据也非常棘手。
We would get these questions all the time. I gave a talk at GTC in 2017 arguing that Step 1 should be a benchmark for medical AI. Step 1 has all these properties. People were really interested in getting computers to do differential diagnosis, but it's very hard to grade a differential. It's very hard to get the data.
比如对于什么是正确的鉴别诊断会有分歧,但这些试题都有明确无误的标准答案,你可以获取大量人类在这项考试中的表现数据,这正是训练模型所需的数据类型。所以在我看来,第一步考试不仅是医疗AI,更是通用AI的一个明显基准。我在2017年GTC演讲中阐述了这一点,然后我们开始着手研究。我们获得了罗伯特·伍德·约翰逊基金会的一些资助,我之后会详细说,但我们遇到了'苦涩的教训'——这是我最早接触这个概念的经历之一。想法是:我们可以用从网上整理的数据训练LSTM模型,生成第一步考试样题,并训练它选择正确答案。这就是当时的计划。
There will be disagreement as to what the correct differential should be, but these questions are canned with unambiguously correct answers, there's a whole bunch of human performance data you can get as to how well humans do on this test, and it's exactly the kind of data you would need to train a model to do this. So from my perspective, Step 1 just felt like an obvious benchmark not only for medical AI, but for AI generally. So I kind of laid this out in a 2017 GTC talk and then we started working on it. We got some funding from the Robert Wood Johnson Foundation, and, I'll talk about this later, but we got bitter-lessoned; it was one of my first exposures to the bitter lesson. The idea was that we could train LSTMs on data we had curated from the internet, where you can make example Step 1 questions, and you train the model to select the correct answer. And that was the plan.
我们在Titan X上训练模型,按今天标准算是微型GPU。模型表现不差,能答对约40%的题目,据我所知当时这是该领域最强的结果之一。它们还会做一些有趣的事,比如逐词显示诊断概率。你可以输入一个病例,实时观察它的思考过程——我们常用川崎病为例。
And we were training models on Titan Xs, tiny GPUs by today's standards. And the models weren't bad. They were able to get 40-ish percent of these questions correct, which to my knowledge at the time was one of the strongest results anyone had had on that. They would do these cute things too, where they would give you the word-by-word probability of diagnosis. So you could feed in a patient case and kind of watch it think in real time. One of the ones that we always used was Kawasaki disease.
比如'一岁患儿发热四天,出现草莓舌体征'。一旦出现草莓舌,诊断分布的熵值就会骤降——川崎病的概率急剧上升,其他可能性下降。这样能窥见模型的推理过程很巧妙,但终究这些都是用有限数据训练的非常小的模型,因此性能上限很低。这段经历让我认识到:其一,我长期以来的医疗AI直觉是正确的。
And it would be like, a one-year-old patient has had a fever for four days, evidence of strawberry tongue. And as soon as strawberry tongue showed up, the differential collapsed in terms of entropy: Kawasaki would jump way up and all the other things would go down. So it was kind of neat that you could see how the model was reasoning, but ultimately these were very, very small models trained on limited amounts of data, and therefore the ceiling on them was pretty low. So I think what it taught me was, one, that the medical AI intuition I had had for a long time was right.
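The word-by-word probability behavior Andy describes can be sketched with a toy Bayesian update rather than an LSTM. The diagnoses and token likelihoods below are invented for illustration; the sketch just shows the entropy of the differential collapsing once a highly specific token like "strawberry tongue" arrives:

```python
import math

# Toy word-by-word differential: P(diagnosis) is updated after each token
# and the Shannon entropy of the distribution is tracked. All likelihoods
# are invented for illustration; this is not a trained model.

DIAGNOSES = ["kawasaki", "strep", "viral_uri"]

# P(token | diagnosis), made up for the sketch:
LIKELIHOOD = {
    "fever":             {"kawasaki": 0.6, "strep": 0.6, "viral_uri": 0.6},
    "four_days":         {"kawasaki": 0.5, "strep": 0.3, "viral_uri": 0.4},
    "strawberry_tongue": {"kawasaki": 0.7, "strep": 0.05, "viral_uri": 0.001},
}

def entropy(p):
    """Shannon entropy in bits of a distribution given as a dict."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def update(prior, token):
    """One Bayesian update step, renormalized over the differential."""
    post = {d: prior[d] * LIKELIHOOD[token][d] for d in DIAGNOSES}
    z = sum(post.values())
    return {d: post[d] / z for d in DIAGNOSES}

p = {d: 1 / len(DIAGNOSES) for d in DIAGNOSES}   # uniform prior
trace = [entropy(p)]
for token in ["fever", "four_days", "strawberry_tongue"]:
    p = update(p, token)
    trace.append(entropy(p))
# After "strawberry_tongue", the distribution concentrates on Kawasaki
# and the entropy drops sharply.
```

An uninformative token like "fever" leaves the entropy essentially unchanged, while the pathognomonic token drives most of the drop, which is the collapse Andy is describing.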
但同样地,在这个人工智能的新时代,你应该始终致力于解决最通用形式的问题。我原以为自己研究的是一个相对通用的版本,但实际上,事实证明更通用的版本是预测下一个标记。所以当GPT-3和GPT-4出现时,它们基本上就解决了这些问题。就像你提到的,现在人们对模型的能力已经完全不感到惊讶了。
But also that, in this new era of AI, you should always be working on the most general form of the problem. I thought that I was working on a relatively general version of the problem, but it turns out that the more general version of that problem is predicting the next token. So when GPT-3 and GPT-4 came out, they kind of solved these problems out of the box. Like you alluded to, people are now completely unimpressed when a model passes these exams.
过去几年里目标移动的速度真是令人难以置信,对吧?无论是对于计算机模型中所谓的‘智能’定义,还是对人类的意义而言。一旦人工智能轻松通过了这些测试,就连人类通过这些测试的意义的整个讨论都发生了变化。所以你当时在USMLE基准测试和其他测试上做了所有这些有趣的工作。
It's amazing how much the goalposts have moved just in the last few years, right? Both for what it means for the, quote unquote, intelligence that's in the computer models, but also for what it means for humans. Right? The whole conversation, even around the significance of a human passing these tests, has changed once AI cleared them with ease. So you were doing all this interesting work on the USMLE benchmark and on other tests.
我记得我们一起经历了这个过程,然后你从博士后阶段过渡到建立自己的实验室。于是你成为了哈佛大学公共卫生学院流行病学系的教授。你继续着方法论研究,但你也开始非常专注地工作。你可以告诉我们这方面的起源,虽然我能猜到一些——特别是在新生儿学领域的问题,对吧?将人工智能应用于新生儿学。
And I remember we went through this together; then you went from postdoc to starting your own lab. And so you became a professor in the Department of Epidemiology at the Harvard School of Public Health. And you were continuing, I think, your methodological work, but you also started to work with real focus, and you can tell us about the origin of this, although I can guess some of it, on problems specifically within neonatology, right? Applying AI to neonatology.
那么也许可以告诉我们关于BeamLab的情况,以及作为初级教职人员的生活,你是如何起步的,团队的核心理念是什么。
So maybe tell us about BeamLab and, you know, life as a junior faculty member, how you got it off the ground, what the philosophy was for the group.
是的,我很兴奋能加入流行病学系,这再次是出于对人工智能的动机。那是在2018、2019年左右,我于2019年7月1日启动了我的实验室,那时还非常早,是在GPT-3出现之前,当时AlexNet式监督学习、"一切都是ConvNet问题"的范式已经乏力。所以我们确实在寻找下一个范式,而哈佛流行病学系最擅长的几乎就是因果推断。我很兴奋能加入这个系,向Jamie Robins、Miguel Hernan等世界领先的因果推断专家学习,看看如何将这种因果推理引入人工智能系统。我们在那方面发表了几篇很棒的论文,其中一些在NeurIPS和ICML的研讨会上,探讨了将因果推断与深度学习结合。
Yeah, so I was excited to join the Department of Epidemiology, again motivated from the AI perspective. This was like 2018, 2019; I started my lab on July 1, 2019, and this was still very early. This was pre-GPT-3, when the AlexNet-era supervised learning, "everything is a ConvNet problem" paradigm had run out of steam. And so there really was a sense that we were looking for the next paradigm, and what the Department of Epidemiology at Harvard does better than almost anywhere else is causal inference. So I was excited to join the department to learn from folks like Jamie Robins and Miguel Hernan, who are world leaders in causal inference, to see how we could get some of that type of causal reasoning into AI systems. And we had a couple of really great papers on that, some of which were at NeurIPS and ICML workshops, about blending causal inference with deep learning.
我实验室的应用方面一直专注于新生儿围产医学。这同样要归因于我完全缺乏创意:Kristen后来成为了一名新生儿科医生,所以我们最终合作了很多。我认为你和她都对我的学术生涯产生了很大影响。我们在新生儿围产医学的框架下做了很多更传统的流行病学医疗数据科学类的工作。
The applied side of my lab has always been focused on neonatal perinatal medicine, again due to a complete lack of creativity on my part: Kristen went on to be a neonatologist, and so we've ended up working together a lot, collaborating a lot. And I think both you and she have been big influences on my academic career. We did a lot of more traditional epi healthcare data science kinds of things under this umbrella of neonatal perinatal medicine.
其中我特别自豪的一项工作是,我们研究了一种被认为可以预防早产的药物,发表了一系列论文。早产是指婴儿在妊娠37周前出生。在美国,大约十分之一的婴儿是早产儿。
One that I'm really proud of is a series of papers looking at a drug that's thought to prevent preterm birth. So preterm birth is babies born before thirty-seven weeks of gestation. About one in ten babies in the U.S. are born preterm.
这是新生儿发病率和死亡率的最大来源之一,早产是一个大问题。历史上只有一种药物可以用来治疗它,叫做17α-羟孕酮,简称17-OHP或17P。这种药物的疗效在2003年NICHD的一项试验中得到证实,我稍后可能会再提到,但那是一项NIH试验。这种药长期以来以药房配制药的形式使用,并成为有早产风险女性的标准护理。
It's one of the biggest sources of neonatal morbidity and mortality there is, so preterm birth is a big problem. And historically there's only been one drug that you can use to treat it. It's called seventeen-alpha-hydroxyprogesterone, or 17-OHP or 17P for short. The efficacy of this drug was demonstrated in a 2003 NICHD trial that maybe I'll come back to in a little bit, but it was an NIH trial. The drug was administered for a long time as a compounded medication and kind of became standard of care for women who are at risk for preterm birth.
所以适应症实际上是复发性单胎早产:如果你有早产史且当前怀的是单胎,就有资格使用这种药物。作为我在Zach实验室培养对AI和机器学习兴趣的一部分,我们可以访问一个惊人的临床保险数据库,其中包含4000万美国人八年的数据。当我们获得这些数据时,我想,我们要用机器学习彻底分析它,预测所有事情,并利用这个庞大数据库创建世界上最好的AI系统。后来认识到这种热情对这类数据来说是多么不切实际,对我很有启发。
So the indication is actually recurrent singleton preterm birth: if you have a history of preterm birth and you're currently carrying a singleton, you're eligible for the drug. As part of my interest in getting into AI and machine learning in Zach's lab, we had access to this amazing clinical insurance database that covered the lives of 40,000,000 Americans over eight years. When we got access to this data, I was like, we're gonna machine-learn the crap out of this, we're gonna predict all the things, and we're gonna create the world's best AI system using this huge database. So it was instructive to understand how misplaced that enthusiasm was for this kind of data.
在医疗保健中,你学到的一件事是,并非所有数据都能回答所有问题。所以我花了大约一年的时间深入挖掘,找出这些数据的所有缺陷,并试图弄清楚它能支持哪些类型的问题。我开始与Beth Israel的一位母胎医学医生合作,以获取对这些想法的一些临床反馈。她说,你知道,机器学习很棒,但围绕17P药物,在母胎医学中有一个我们不知道如何思考的问题。2011年,一家制药商获得了17P的权利,并开始以品牌名Makena重新销售,起初人们很兴奋,因为它会增加可及性,但他们开始对以前基本上免费的东西收取天价。
So one of the things that you learn in healthcare is that not all data are capable of answering all questions. And so I spent about a year of my life really going deep, figuring out where all the warts were on this data, and trying to figure out what types of questions it could support. I started collaborating with a maternal fetal medicine doc at Beth Israel just to try and get some clinical feedback on these ideas. She's like, you know, this machine learning thing is great, but really there's this question that we have no idea how to think about in maternal fetal medicine, around the 17P drug. In 2011 a drug manufacturer had acquired the rights to 17P and started reselling it under the branded name Makena, which at first people were pretty excited about because it would increase access, but they started essentially charging an arm and a leg for something that previously had been essentially free.
她说,我们很想了解更多关于这方面的经济影响,而且围绕这种药物是否有效存在很多争议。所以我们最终写了一系列论文。第一篇论文发表在JAMA Internal Medicine上,只是看看患者为这种药物支付了多少费用。我们发现,Makena每次妊娠的平均价格约为11,000美元,而复合版本每次妊娠的平均价格为200美元,所以价格增加了约5000%,但可能没有给患者带来有意义的益处。复合版本和品牌版本在结果上基本相同。
So she's like, we would love to understand more about the economic impact of this, and also there's a lot of controversy around whether this drug even works. So we ended up writing a series of papers. The first paper was in JAMA Internal Medicine, just looking at how much patients were being charged for this medication. We found that on average the price per pregnancy for Makena was something like $11,000, and on average the price per pregnancy for the compounded version was $200. So something like a 5,000% increase, with plausibly no meaningful benefit given to the patients: the differences in outcomes between the compounded and brand-name versions of the drug were essentially identical.
然后我们做了一篇后续论文,使用了因果推断的思想。这正是身处流行病学系特别有帮助的地方:做一种叫做目标试验模拟的事情。你像做RCT一样写下纳入标准和研究设计,然后使用观察数据尝试在你的数据集中模拟它。当时有一项平行的RCT正在进行,是制造商为了续批而必须做的。所以我们遵循了那项试验的纳入标准和研究设计,做了目标试验模拟,发现基本上没有获益的证据。
So then we did a follow-up paper where we used ideas from causal inference, and this is where it was super helpful to be in the epi department, to do something called target trial emulation. This is where you write down the inclusion criteria and the study design just like you're doing an RCT, then you use observational data to try and emulate that in your dataset. There was a parallel RCT going on at the time that the manufacturer had to run to be able to get the approval renewed. So we followed that trial's inclusion criteria, followed that study design, did the target trial emulation, and found essentially no evidence of benefit.
这是一个非常稳健的发现,经过多种不同的敏感性分析,感觉非常可靠,所以我们在一个围产期期刊上发表了它。后续的第二项Makena RCT结果实际上是阴性的,可能存在一些亚组效应。FDA审查后决定撤销该药物的上市许可,并引用我们的论文作为这一决定所依据的关键证据之一。
And this was a very robust finding across lots of different kinds of sensitivity analyses; it just felt very solid. So we published that in a perinatal journal. The subsequent, second RCT for Makena was actually negative, with maybe some subgroup effects. The FDA reviewed this and decided to remove authorization for the drug in the marketplace, and cited our paper as one of the key pieces of evidence that they used in this decision.
我对患者的治疗选择变少从来都高兴不起来,但我认为这是一个实例,表明我们可以用这些数据科学方法产生临床影响:因为如果一种药无效,第一,我们不应该为它支付10,000美元;第二,这类药物中很多也有明显的副作用。
So I'm never super excited about patients having fewer treatment options, but I think this was an instance where we could actually use some of these data science methods to have clinical impact, because if a drug doesn't work, one, we shouldn't be paying $10,000 for it, and two, a lot of these drugs have obvious side effects as well.
所以这里内容非常丰富。也许我希望你能更深入探讨的一点是,你知道,你提到过类似知道哪些数据能支持哪些问题,对吧?如何将不同的数据集与不同的问题相匹配。在某种意义上,我认为这确实是区分许多研究质量的关键,并不是说,当然有些数据集本身就更加优越、出色,并且对很多事情都很有用。但我认为,理解数据与问题之间的这种结合,也许我们还可以把计算能力也纳入其中,这确实是帮助学生取得成功的一种艺术,对吧?
So there's so much there. Maybe one of the things that I'd love for you to just dig into a little bit more is, you know, you said something along the lines of knowing what data can support what questions, right? How to align different datasets with different questions. And in some sense, I think this is what really separates the quality of a lot of research. Of course, there are datasets that just in general are superior and wonderful and useful for a lot of things. But I think knowing that marriage between the data and the question, and maybe we can also add the compute to the mix of this, is really the sort of art of setting up a student for success, right?
或者与学生合作,提出一个可能富有成果且有趣的创意。所以,你成功建立了实验室,在新生儿学领域发表了这些有趣的论文,继续你在因果推断和人工智能方面的方法论工作,并且还在扩大实验室规模、招募学生。也许你可以稍微反思一下。然后我想转向你现在的工作以及你最近的动态,但在此之前,你可以反思一下你是如何招募学生、指导他们,并为实验室的学生设计项目的。那么,你的理念是什么?
Or working with a student to come up with an idea that is likely to be fruitful and interesting. And so, you got your lab off the ground, you're publishing these interesting papers in neonatology, continuing your methodological work around causal inference and AI, and then growing your lab recruiting students. Maybe you can just reflect a little bit. Then I want to transition to your work now and what you're up to these days, but you can reflect just before that on how you approached recruiting students and then mentoring them and designing projects for students in your lab. So like, what was your philosophy?
我认为很多初级教职员工也对这类事情很感兴趣。
I think a lot of people who are, you know, junior faculty are interested in this kind of stuff as well.
是的,让我先说明一下,当我开始建立实验室时,无论是我个人生活还是整个世界都处于一个特别疯狂的时期。我于2019年7月1日开始,我们的第一个女儿在2019年7月25日出生,所以实验室刚成立整整25天。七个月后,新冠疫情爆发,日托中心关闭,完全一片混乱,我妻子被征召到麻省总医院和布莱根妇女医院承担大量ICU工作,因为她当时还是研究员,所以在儿科ICU医生被调往普通ICU时,她负责了很多儿科ICU的工作。所以,我对实验室头两年的记忆有些模糊和不完整,但让我试着给你讲讲我是怎么考虑的。我把我的实验室视为一个地方,让对医疗保健深感兴趣的计算机科学家可以来这里研究重要的临床问题。
Yeah, let me first qualify this and say that when I started my lab, it was a particularly crazy time in my life personally and in the world generally. I started on 07/01/2019, and we had our first daughter on 07/25/2019, so a full twenty-five days into starting my lab. Seven months later COVID happened, daycares shut down, complete insanity. My wife got conscripted into a lot of ICU service at MGH and Brigham because she was still a fellow, so she was covering a lot of the pediatric ICUs while the pediatric ICU docs got conscripted into the normal ICUs. So I have a partial and fuzzy recollection of essentially the first two years of my lab, but let me try and give you a sense of how I thought about it. I viewed my lab as a place where computer scientists who are deeply interested in healthcare could come and work on important clinical problems.
所以,再次强调,我认为17p项目就是一个很好的例子。那是由我小组的一名学生乔·哈基姆领导的,他是一名HST学生。HST在这个播客中已经多次被提到,他接受的是生物工程培训。所以他很有兴趣产生临床影响。他会去和MFM医生会面,真正深入地探讨如何将你们的临床定义映射到数据实际能回答的问题上?
So again, I think the 17P project is a good example of that. That was led by a student in my group, Joe Hakim, who is an HST student, and HST has been featured a lot on this podcast already, but a bioengineer by training. And so he was interested in making a clinical impact. He would go and meet with the MFM doctors and really dig deep into: how can I map your clinical definitions onto what the data can actually answer?
这通常是五五开的分工:既有来自纯计算背景的人,我也指导了很多住院医师、医学生之类的人。我确实认为,通常必须对人工智能和机器学习有非常真诚的兴趣,所以我会据此筛选人员。比如,我的实验室并没有做很多RNA测序分析。它必须是一个你可以用大型医疗数据集回答的临床问题,理想情况下还要用某种机器学习方法。
It was always kind of a fifty-fifty split: folks coming from a purely computational background, but I also supervised a lot of residents, medical students, and people like that. I do think that there often had to be a very sincere interest in AI and machine learning, so I would select for folks on that. We weren't doing a lot of, say, RNA-seq analysis in my lab. It really had to be a clinical question that you could answer with a large healthcare dataset, and ideally with some type of machine learning approach.
在日常工作方面,我倾向于相对放手。有些事情我们会以更有组织的方式来做。比如我们有一篇关于近端推断的NeurIPS论文,那是因果推断的一个子领域。我们完全以两周为一个冲刺周期来运行那个项目:接下来两周我们要做什么,然后回头检查。我认为那个项目进行得很好。
I tend to be relatively hands-off when it comes to day-to-day work. We would do some things in a much more structured kind of way. So we have a NeurIPS paper on something called proximal inference, which is a subset of causal inference, and we ran that very much in two-week sprints: here's what we're gonna do for the next two weeks, and then we'd check back in. That project, I think, went well.
机器学习会议对于激励那些冲刺项目很有帮助,对吧?
The machine learning conferences are great for encouraging those sprints, right?
是的,没错。但大多数情况下,我也倾向于让研究生从‘铲子就绪’项目开始,就像是:这是项目,这是成功的标准,去执行吧。第二个项目更像是:这是一个可能值得关注的大致主题。到了第三个项目,他们应该能够自己提出并回答他们感兴趣的问题。所以我确实尝试通过一开始给予更多结构,然后逐渐减少结构,来让人们慢慢适应研究。
Yeah, exactly. But for the most part, I tended to start graduate students on a shovel-ready project: here's the project, here's what success looks like, go and execute it. Project number two is much more like, here's a general theme of things that might be interesting to look at. And then the idea was that by the third project, they'd be able to ask and answer their own questions that they found interesting. So I did try to ease folks into research by giving them a little bit more structure in the beginning and then less structure toward the end.
是的,我认为你在这方面考虑得很周到。我们刚和现在在谷歌的Anil聊过——这集还没播出,但很快就会播。Anil是你的第一批博士生之一,对吧?我觉得你在他们职业发展轨迹上所投入的心思和关怀,虽然看似放手,但实际上是在让他们成长,培养独立性。
Yeah and I think you're very thoughtful about that and we just spoke with Anil who's now at Google. Actually this hasn't aired yet, but the episode will air soon. And Anil was one of your first PhD students. Right. And I think, the thought and the care that you put into sort of the arc of their career, while sort of being hands off, but also, so letting them grow, letting them develop independence.
但同时给予他们一些半监督的结构,以便他们能够成功,我认为这一点非常清晰。除了我们俩当然必要时调侃你之外,他的话也清楚地表明了这一点。好了,现在我想深入谈谈你在Lila的工作。让我试着来梳理一下。
But also giving them a little bit of structure, semi-supervised, so that they can succeed: I think that is very clear, and it was very clear in what he said, other than us both trolling you, of course, as necessary. All right, so I want to dig into your work now at Lila. Let me try to frame this.
去年,你从哈佛教授的职位上休假,去了一家新公司担任CTO。我想那家公司当时还处于秘密运营状态,现在公开了,叫Lila。也许我们可以从你做出这一转变的思考过程开始。考虑到哈佛目前的情况,有些人会说你看上去像个天才。我知道你的水晶球很灵,Andy,但我认为你的决定先于当前的资金危机。
So last year you went on leave from your Harvard professor job to become the CTO of a new company. I think the company was in stealth at the time; it's now out of stealth, called Lila. And maybe we can start with your thought process behind the move. Given what we're going through at Harvard right now, some would say you look like a genius. I know you have a very good crystal ball, Andy, but I think your decision to move preceded the current funding crisis.
你在学术上做得非常出色。你在指导学生,你在和卓越的Beam医生——你的妻子——一起围绕新生儿学的AI构建研究愿景。为什么要离开?为什么要离开哈佛?
You were doing great academic work. You were mentoring students, you were building a research vision around AI for neonatology with the superior Dr. Beam, your wife. Why leave? Why move from Harvard?
是的,这是个好问题。首先我要说明,这不是我第一次去创业。这又要归功于你——在我开始HSPH的教职工作之前,我休了一年假去帮助创办一家公司,这其实也是Raj的建议。我当时在面试教职,也收到了来自一家叫Flagship的公司的创业邀请。Flagship是一家风险投资公司,但他们不是向外投资,而是用资本来孵化和分拆公司。我曾作为顾问参与过Flagship的一个孵化过程。我提供咨询的那个项目从Flagship获得了资金,即将作为一家公司启动。
Yeah, it's a good question. Let me first preface this by saying that it wasn't my first time going to do a startup. Again, something that I owe to you: before I started my faculty job at HSPH, I took a year off to help start a company, and this was actually, again, advice from Raj. I had been interviewing for faculty jobs, and I had an offer to join a startup from a company called Flagship. Flagship is a venture capital firm that, instead of deploying capital in external companies, uses that capital to incubate and spin out companies. I'd been part of an incubation process at Flagship as a consultant, and the thing that I had been consulting on got funding from Flagship and was going to get started as a company.
它专注于利用机器学习进行蛋白质工程。我们能否使用机器学习模型,以更具针对性的方式,使蛋白质疗法变得更好、更快、更便宜?超级有趣,我之前从未考虑过蛋白质工程,但有机会涉足其中。所以我当时觉得这是个非常有趣的想法,但很难拒绝这份教职工作,而你很棒地提出了‘为什么不两者兼顾?’
It was centered on using machine learning for protein engineering: can we use machine learning models to make protein therapeutics better, faster, and cheaper, in a more targeted kind of way? Super interesting. I hadn't thought about protein engineering before, but I got to do that. So I was torn: this was a really interesting idea, but it was hard to turn down the faculty job. And to your credit, you were like, why not both?
为什么不两者兼顾?
Why not both?
为什么不两者兼顾?于是我实际上将教职工作的开始日期推迟了一年,并加入了现在被称为Generate Biomedicines的公司,担任机器学习创始负责人,帮助组建团队,构建了许多早期模型,制定了战略,全职在那里工作了一年,之后又以兼职身份待了四年,我想我的头衔像是驻校教授,所以我可以一周有一天做创业的事情,另外四天做教授的工作。Generate后来我认为相当成功。他们有300名员工,至今已筹集了约10亿美元。他们有两种药物进入临床试验,这对我来说是对技术最重要的验证,因为他们确实制造出了似乎有效的实物。
Why not both? And so I actually delayed the start date of my faculty job for a year and joined what is now known as Generate Biomedicines as the founding head of machine learning. I helped build the team, helped build a lot of the early models, helped build the strategy. I was there full-time for a year and remained in a part-time capacity for four years after; I think my title was something like professor in residence, so I got to do the fun startup thing for one day a week and then the professor thing for the other four. Generate has gone on to be, I think, pretty successful. They have 300 people, and they've raised something like a billion dollars to date. They have two drugs in clinical trials, and that to me is the most important validation of the technology: they actually have made real things that seem to work.
所以我有一段非常愉快的经历。我认为那段经历为我加入Lila降低了风险。有了这个前言,让我来回答你的问题。2024年2月,我正在休陪产假,相比之下,我们的第二个孩子出生要容易得多。没有新冠疫情,没有实验室启动的烦恼。
So I had a super pleasant experience, and I think that experience de-risked joining Lila for me. With that preface, let me answer your question. I was out on paternity leave in February 2024, and our second child was, by comparison, much easier. No COVID, no lab to start.
我们第一个孩子出生时,我还在帮助启动Generate。那实际上是我们生活中一段相对平静的时光。这给了我一个反思的机会。回到我们之前关于我动机的对话,人工智能一直是我感兴趣的东西,医疗保健一直是一个非常重要且有趣的领域,但始终像是我的沙盒,而不是我的主要动机。我之前提到过,我开始教职工作时还是GPT-3之前的世界,我们还没有看到规模化的好处。
I was also helping start Generate at the time our first was born. This was actually just kind of a peaceful time in our lives, and that gave me a chance to reflect. Going back to our earlier conversation about my motivations: AI has always been the thing I'm interested in, and healthcare has always been a super important and interesting domain, but it has always been the sandbox rather than my primary motivation. I mentioned before that when I started my faculty job it was a pre-GPT-3 world; we still hadn't seen the benefits of scale.
当时似乎仍然有可能在学术环境中进行前沿人工智能研究。而你知道,在2024年我反思时,很难再主张没有大量资源就能进行前沿人工智能研究。也可能是因为,我已经做了五年的教职工作,又开始有了创业的冲动。于是我开始四处打听。
It still seemed plausible that you could do frontier AI research in an academic setting. And, you know, in 2024, when I was reflecting it became hard to make that case that you could do frontier AI research without significant resources. It could also be that, you know, I had done the faculty thing for five years, and I was getting the startup itch again. And so I started to ask around.
我确实记得一些短信,大意是,我简直不敢相信能够实际编码并花些时间在上面是多么有趣。我想你又在做一些编码了,对吧?
I do remember some texts along the lines of, I can't believe how much fun it is to actually be able to code and to just spend some time on it. I think you're doing some coding again, right?
实际上那段时间,我也在做一些木工活。我现在用的桌子也是
Around that period I was actually doing some woodworking too. The desk that I have now is also...
太棒了。太棒了。太棒了。
Amazing. Amazing. Amazing.
是的。总之,我开始四处了解,发现有一家叫FL97的公司。Generate是FL57,意思就是旗舰公司给它们编的序列号。所以FL97是他们孵化的第九十七家公司。
Yeah. So anyway, I started looking around, and there was a company called FL97. Generate was FL57; they just give them serial numbers at Flagship. So FL97 is the ninety-seventh company that they've incubated.
所以Generate和Lila之间隔了40家公司。
So there were 40 in between Generate and Lila.
在五年时间里。对,对,对,没错。而且并非偶然,我的两个博士生就在FL97,我之前也给FL97做过一段时间顾问。所以我对它有所了解,但旗舰公司的进化轨迹非常有趣,它们从一个起点开始,随着时间的推移不断演变、调整和适应,最终可能会走向完全不同的方向。所以FL97——我之后就叫它Lila了——开始聚焦到一个让我觉得非常非常有趣、非常非常有吸引力的方向上。
Over a five-year period, yeah, exactly. And so, not accidentally, two of my PhD students were at FL97, and I had been advising FL97 for a little bit, so I kind of had an idea of what it was. But Flagship companies have these very interesting evolutionary trajectories, where they start in one place, and over time they tend to evolve, change, and adapt, and they end up somewhere potentially very different. So FL97, and I'll just call it Lila from here on out, started to converge on something that was really, really interesting and really compelling to me.
我想弄清楚的是:旗舰以创建生物技术公司闻名,这家公司会是另一家生物技术AI公司,还是真正以AI为核心的公司?意思是,这家公司的主要目标是创造AI,还是利用AI来创造资产、分子之类的东西。于是我去那里待了一段时间,见到了更多他们组建的团队成员,会见了领导团队,最终确信这是一家非常令人兴奋的AI公司。我会再谈谈Lila背后的理念。从资源角度来说,它的限制会更少。
And what I wanted to understand is: Flagship is known for making biotechs, so is this going to be another biotech AI company, or is this actually an AI-first kind of company? Meaning, is the primary goal of this company to create AI, or is it to use AI in service of creating an asset, a molecule, something like that? And so I got to go spend some time there, got to meet more of the team they had built, met the leadership team, and just became convinced that this was a really exciting AI company. I'll talk a little bit more about the thesis behind Lila; it was going to be less constrained from a resource perspective.
所以我们计划投入大量资源到GPU上,投入重金创建所需的数据来开发新型AI模型。这感觉像是我过去十年思考的许多不同方向的集大成之作。我常说,我有一份很棒的工作,严格来说现在还有——我正在休假,但做教授是一份很好的工作。尽管最近三个月发生了很多事(截至2024年3月),它依然是一份好工作,有非常支持的系里和学校,优秀的同事,世界级的学生。所以我并不是不开心,只是诚实地思考我想研究的那类问题在学术界是否能够实现。
So we were going to commit significant resources to GPUs, serious resources to creating the data that you need to create new kinds of AI models. And it felt like kind of the culmination of a lot of the different things I had been thinking about over the last ten years. So I always say that I had a fantastic job; I technically still do, I'm on leave, but being a professor is a great job. Notwithstanding what's happened in the last three months, as of March 2024 it was a great job, with a wonderfully supportive department, a wonderfully supportive school, great colleagues, and world-class students. So this wasn't that I was unhappy; it was just trying to be honest about whether the kinds of problems I wanted to work on were accessible to me in academia.
而且我认为,当我头脑清醒时,很难再辩称我能在学术环境中研究我真正想解决的问题。
And I think when I was clear eyed it just became hard to argue that I could work on the problems that I wanted to in an academic setting.
所以我听到的信息是:学术界留住安迪·比姆的能力与我们能获得的GPU数量成正比。这是关于人才的另一条缩放定律。好,从你的描述中我理解到,你专注于AI本身,而不是(或不仅仅是)AI的应用,而且你需要大量算力、大量GPU来完成你的使命。
So the message I'm hearing is that our ability in academia to retain Andy Beam scales with the number of GPUs that we have access to. It's another scaling law, for talent. Okay, so from that description, I understand that you're focused on AI first, which means not applications of AI, or not just applications of AI, but AI itself. And you need a lot of compute; you need a lot of GPUs to accomplish your mission.
或许你可以告诉我们这个使命是什么?你试图实现什么目标?目前进展如何?以及未来几年你如何看待这个方向的发展?
And maybe you can tell us what that mission is. Right? What are you trying to accomplish? Where are you and sort of where do you see this going for the next couple of years?
是的,先补充说明一下刚才的观点:学术界确实存在值得研究的有趣问题。每个人都有自己的效用函数,我并不是说学术界没有有趣的事情发生,只是恰好我感兴趣的问题在学术背景下难以开展研究。那么我们在LILA做什么呢?
Yeah, and just to preface or circle back on that last point, there are interesting problems that you can work on in academia. Everyone has their own utility function. So I'm not saying that there's nothing interesting happening in academia. It just happens to be the ones that I find interesting are hard to work on in an academic context. So then what are we doing at LILA?
我们认识到,过去五年的扩展范式取得了巨大成功,比如通过USMLE考试可以说是这些扩展范式的意外产物,但它们可能也正在饱和。我认为预训练的扩展定律可能依然成立,但幂律是很残酷的:要获得相同的收益,你必须将算力再扩大一个数量级。所以也许我们就是无法一直扩展到1000万块GPU。
We recognize that the scaling paradigms of the last five years have been enormously successful; again, passing the USMLE was sort of an accident of these scaling paradigms. But they're probably also saturating. I think the pre-training scaling laws may still hold, but power laws are kind of a hell of a thing: to get the same amount of benefit, you still have to scale the compute by an order of magnitude. So it might just be that we can't keep scaling it up to 10,000,000 GPUs.
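一个简单的示意:在幂律下,同等幅度的损失改进需要成倍增加的算力。下面的指数是示意性假设,并非任何模型的实测值。
A quick illustration of that point: under a power law, equal loss improvements require multiplicative jumps in compute. The exponent below is an illustrative assumption, not a measured value for any model.

```python
# Illustrative power-law scaling curve: loss falls as compute**(-alpha).
# alpha here is an assumed toy value, not an empirically fitted exponent.
alpha = 0.05

def loss(compute: float) -> float:
    return compute ** (-alpha)

# Each 10x jump in compute shrinks the loss by the same *ratio*, so to keep
# getting comparable improvements you need ever-larger compute multiples.
for c in [1e3, 1e4, 1e5, 1e6]:
    print(f"compute={c:.0e}  loss={loss(c):.4f}")
```

The constant per-decade ratio is the "hell of a thing": the tenth order of magnitude of compute buys no more loss reduction, proportionally, than the first.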
因此人们正在寻找新的扩展范式。我们认为模型需要具备自主生成标记的能力,能够提出并回答人类从未问过的问题。大语言模型可以理解为人类知识的绝佳索引——人类创造的所有内容都能被大语言模型获取。
And so people are looking for new scaling paradigms. We think that models need the ability to essentially generate their own tokens. And so the models need the ability to ask and answer questions that people have not asked before. Large language models, one way to think about them is that they are a wonderful index into human knowledge. So everything people have created is accessible to a large language model.
它们能以这种模糊的方式进行检索,擅长模糊模式匹配和获取人类既有知识。但戴上我的因果推断帽子来说,我们知道仅凭大量观测数据能做的事情有限。若要真正论证世界运行机制,基于观测数据的模型最多只能告诉你哪些假设与既有数据兼容。要从假设集中筛选,要么像因果推断那样做强假设,要么实际进行实验。这正是我们在Lila发展的核心洞察:如何将这些基于全网训练的强大语言模型,与可扩展的实验平台结合,让它们能够突破文献中的未决问题,提出前所未有的研究问题。
They're able to access it in this very fuzzy kind of way, where they can do fuzzy pattern matching, and they're really great at accessing human knowledge. Again, putting my causal inference hat back on, though, we know that there are limits to what you can do with what amounts to a big pile of observational data. If you actually want to make a claim about how the world works, the best thing that a model with observational data can do is tell you which hypotheses are compatible with the data it has seen before. And the only way to pick from a set of hypotheses is to either make strong assumptions, like we would do in causal inference, or actually do the experiment. That key insight is what we're developing at Lila: how do we take these very powerful large language models that have been trained on the entire internet and pair them with a scalable experimental platform that will let them break ties that exist in the literature and ask questions that have never been asked before?
所以再次强调,你我都很清楚这一点,你在博士后期间的工作确实专注于这个方向。科学文献并非事实的记录,而是在不同激励结构下进行辩论的记录。人们被激励去发表他们研究结果中最有利的版本,被激励去淡化那些与他们试图支持的假设不一致的内容,然后会有论文发表来反驳这些观点。因此,在我看来很明显,如果你只阅读那些论文,是无法推导出2050年的科学的——你必须进行增量实验步骤,以文献为基础逐步推进,但我们不会有什么神谕,GPT-6也不会成为能够仅凭当前科学知识就从第一性原理进行推理的神谕。所以,如果你接受这个基本前提,那么你的直接结论就是:我们如何将其与可扩展的实验平台连接起来,让模型能够超越我们当前所知。
So again, you and I both know this; what you did during your postdoc was really focused on this. The scientific literature is not a record of facts, it's a record of a debate under varying incentive structures. People are incentivized to publish the most charitable version of their findings, incentivized to downplay things that are inconsistent with the hypothesis they're trying to support, and then there will be papers published that rebut that. So it's obvious to me that you're not going to be able to derive the science of 2050 if you have just read those papers. You're going to have to take incremental experimental steps that build upon what has been done in the literature. We're not going to have some oracle; GPT-6 is not going to be some oracle that can just reason from first principles conditioned on what we currently know in science. And so if you buy that basic premise, then the immediate conclusion is: okay, how do we connect this with a scalable experimental platform so that the model can push beyond what we know now?
这本质上就是我们在Lila所构建的,其中一半团队专注于可扩展实验,另一半专注于人工智能。但再次强调,我们将实验平台视为我们训练模型的新令牌生成器。
So that's in essence what we're building at Lila, where half of the house is focused on scalable experiments, the other half of the house is focused on AI. But again, we view the experimental platform as a new token generator for the models that we're training.
所以这些是房间里的机器人吗?实验端具体是什么?
So are these robots in a room? What is the experimental side?
是的,它们是房间里的机器人。它们是房间中的无实体机械臂。我们有一个系统,一个自动化实验平台,如果你熟悉实验操作,它们通常在板上进行,比如96孔板或384孔板。这些板通过我们拥有的平面电机系统磁悬浮,并可以快速移动到这条大导轨旁边。
Yeah, they are robots in a room. They are disembodied robot arms in a room. We have a system, an automated experimental platform, where, if you're familiar with how experiments work, they often work on plates, either a 96-well plate or a 384-well plate. These plates magnetically levitate over this planar motor system that we have, and they can zip next to this big rail.
有放置实验设备的台面,机械臂会从导轨上拿起板子,放入设备中,完成后放回导轨,然后板子可以快速移动到下一站。我对这个的抽象理解是,我们实际上在构建一种新型计算机,这个平面电机系统、这条导轨本质上就像PCI总线,而我们所做的是在现实世界中将新设备连接到这个通用PCI总线上。想法不是拥有几个能完成人类工作的站点,而是拥有成栋的站点,能够大规模进行实验。然后它确实开始感觉像是一种新型实验集群,我们可以将其与传统GPU集群配对。
There are benches with experimental equipment on them, and the robot arm will pick the plate up off the rail, put it in the piece of equipment, and when it's done put it back on the rail, and then the plate can zip off to the next stop. So the abstraction that I have for this is that we're actually building a new kind of computer: this planar motor system, this rail, is essentially like a PCI bus, and what we're doing is hooking new devices onto this generalized PCI bus in the real world. And the idea is not to have a couple of these stations that can do what humans do; it's to have buildings full of these stations that can do experimentation at scale. And then it really does start to feel like a new kind of experimental cluster that we can pair with a traditional GPU cluster.
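下面是对这个"现实世界PCI总线"抽象的一个假设性草图:站点名称和路由逻辑纯属示意,并非Lila平台的真实接口。
Here is a hypothetical sketch of that "PCI bus in the real world" abstraction; the station names and routing logic are purely illustrative, not Lila's actual interfaces.

```python
# Toy model of the rail-as-bus abstraction: stations are hot-pluggable
# devices, and plates get routed from stop to stop along the rail.
# Everything here (names, scheduling) is an illustrative assumption.
from dataclasses import dataclass, field

@dataclass
class Plate:
    plate_id: str
    history: list = field(default_factory=list)

@dataclass
class Station:
    name: str
    def process(self, plate: Plate) -> None:
        plate.history.append(self.name)  # record each completed stop

class Rail:
    """Plays the role of a PCI bus: new instruments attach to it, and
    plates zip between whichever devices an experiment needs."""
    def __init__(self) -> None:
        self.stations: dict = {}

    def attach(self, station: Station) -> None:
        self.stations[station.name] = station  # "plug the device in"

    def route(self, plate: Plate, itinerary: list) -> Plate:
        for stop in itinerary:
            self.stations[stop].process(plate)
        return plate

rail = Rail()
for name in ["liquid_handler", "incubator", "plate_reader"]:
    rail.attach(Station(name))
done = rail.route(Plate("96-well-001"),
                  ["liquid_handler", "incubator", "plate_reader"])
print(done.history)
```

The design point the sketch captures is the same one the bus analogy makes: adding a new instrument is just another `attach`, and scale comes from attaching many devices, not from making any single station more human-like.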
你认为,实际上,你喜欢这种描述吗?因为我突然想到,这感觉像是在寻找一种新的缩放定律,或者你在探索一种新的缩放定律。你同意吗?你喜欢这种描述吗?这公平吗?
Do you think, actually, do you like the characterization? Because it occurs to me that it kind of feels like you're looking for a new scaling law or you're searching for a new scaling law. Do you agree with that? Do you like that characterization? Is that fair or not?
这正是我描述它的方式。
Literally exactly how I describe it.
好的,太棒了。我可能听过你对我这么说,我只是在复述而已。
Okay, amazing. I've probably heard you say that to me and I'm just regurgitating it.
是的,再说一次,就像它是基于我们过去三个月在大语言模型中经常看到的现象建立的。它们依赖验证器,而对于某些类别的验证任务,自然必须是验证者。因此,我们正在构建一个大型可扩展的基于自然的验证器,以便这些模型能够学习对我们尚未真正理解的事物进行假设和推理。我们认为这将开启一种新的扩展范式,就像纯粹基于互联网数据训练的计算开启了第一种扩展范式一样。再重新表述一下,科学也受制于苦涩的教训,我们正试图弄清楚它在哪些方面受制于苦涩的教训。
Yeah, again, it's just built on the recognition, and we've seen this a lot in large language models over the last three months, that they rely on verifiers, and for some class of verification tasks, nature has to be the verifier. And so we're building a big, scalable, nature-based verifier so that these models can learn to hypothesize and reason about things that we don't really understand yet. And we think that will unlock a new scaling paradigm, in the same way that pure compute trained on internet data unlocked the first scaling paradigm. Just to rephrase: science is subject to the bitter lesson, and we're trying to figure out in what ways it is subject to the bitter lesson.
所以你说的另一件事,以及你做这件事的动机之一是,我们现有的范式,现有的大语言模型,它们能做很多事情,对吧?很多事情,我想你用的词可能是副产品。这些自动完成模型的创造者并没有意图让它们能够解决非常棘手的鉴别诊断或通过USMLE考试或其他事情。这只是从计算规模加上应用于模型的其他训练中涌现出来的。但在描述那个现有范式时,我想你说我们正在饱和或者说它正在变得饱和。
So one of the other things you said, sort of the motivation for what you're doing, is that our existing paradigm, existing large language models, can do so many things, right? So many things, and I think the word you used, or might have used, was byproducts. There was no intent by the creators of these auto-complete models that they'd be able to solve very tricky differential diagnoses or pass the USMLE or other things. This just emerged from the scale of compute plus the other training that was applied to the models. But in describing that existing paradigm, I think you said that we are saturating, or that it's getting saturated.
我想知道,这是你对这些模型随时间性能表现的经验观察吗?还是更多是一种基于模型训练方式及其创建过程的必然性第一原理推论?是它们正在某种程度上饱和,比如它们在基准测试上不再变得更好。它只能变得好这么多。就像那种‘LLMs将无法做X、Y、Z’的最初火花是从哪里来的?
And I wonder, is that an empirical observation that you have about the performance of these models over time? Or is it more of a first-principles deduction about inevitability that you're making from the way the models are trained and the procedure that goes into creating them? Is it that they're saturating, like they're not getting better at the benchmarks, that they can only get so much better? Where is that initial spark of "LLMs will not be able to do X, Y, Z" coming from?
它们在需要长期推理和规划的那类基准测试上表现非常差,例如解决复杂的数学问题、解决复杂的编程问题。大体上,仅仅经过预训练的模型在这些任务上从来不是最顶尖的;具备人们所说的测试时计算或推理能力的模型已经占据了领先。
They were very bad at classes of benchmarks that required long-term reasoning and planning. Examples of this are solving complicated math problems and solving complicated programming problems. By and large, simply pre-trained models are never best in class at those things; models that have what people call test-time compute, or reasoning capabilities, have taken the lead.
就像是O系列,对吧?GPT的O系列或类似的东西。
It's like the O series, right? The O series from GPT, or equivalents.
再次,戴上我的主持人帽子,向一些技术背景不强的听众解释一下:预训练只是预测下一个词,或者说下一个标记,你可以用非结构化数据来做这件事。推理模型则是通过提供反馈来训练的,反馈表明它们的解决方案有多好。在某种意义上,预训练模型被训练来预测平均响应,而推理模型被训练来产生最佳响应。
Again, putting my host hat back on to explain for some folks who aren't as technical on this: pre-training is simply predicting the next word, or the next token, and you can do this with unstructured data. Reasoning models are trained by giving feedback that indicates how good their solution was. In some sense, pre-trained models are trained to predict the average response, while reasoning models are trained to produce the best response.
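下面是一个极简的玩具示意(并非任何真实的训练栈),用来对比上面说的两种目标:"预训练"即从原始文本中统计下一个词,"推理式"训练则依赖验证器给出的反馈。其中的语料、候选答案和奖励函数都是虚构的。
A minimal toy sketch, not any real training stack, contrasting the two objectives described above: "pre-training" as next-token statistics over raw text, and "reasoning-style" training as preferring whole solutions that a verifier scores highest. The corpus, candidates, and reward function below are all invented for illustration.

```python
from collections import Counter

# --- "Pre-training" stand-in: learn next-token statistics from raw text. ---
corpus = "the cat sat on the mat the cat ate the snack".split()

def next_token_counts(tokens):
    """Count which token follows each token in the text."""
    counts = {}
    for prev, nxt in zip(tokens, tokens[1:]):
        counts.setdefault(prev, Counter())[nxt] += 1
    return counts

model = next_token_counts(corpus)
# A pre-trained model predicts the average (most frequent) continuation:
print(model["the"].most_common(1)[0][0])  # prints cat

# --- "Reasoning" stand-in: score whole candidate solutions with feedback. ---
def best_by_reward(candidates, reward):
    """Prefer the candidate that a verifier scores highest."""
    return max(candidates, key=reward)

answers = ["2 + 2 = 5", "2 + 2 = 4"]
verifier = lambda a: 1.0 if a.endswith("= 4") else 0.0  # toy verifier
print(best_by_reward(answers, verifier))  # prints 2 + 2 = 4
```

The contrast is the point: the first half only ever imitates the most frequent continuation, while the second half optimizes against an external check, which is the "nature as verifier" idea from earlier in the conversation.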
所以,我们已经从预训练的一种范式转向推理测试时间计算,我认为这是一个很好的基础案例,说明预训练已经饱和了。
And so the fact that we've already shifted from one paradigm, pre-training, to reasoning and test-time compute, I think, is a good base case for saying that pre-training has saturated.
好的。然后你提到的另一点,我也想让你多谈一点,我认为这反映或类似于我们几集前与Vijaya的对话。你知道,他有着非常成功的学术生涯,然后转行到工业界和风险投资。对吧。我认为他提出了一个非常有说服力的理由,尽管他自己离开了,但为什么学术界有些问题可能只有在学术界才能解决。
All right. And then one of the other points that you brought up that I'd also like you to talk about a little bit more kind of reflects, I think, or resembles some of our conversation with Vijaya a couple of episodes ago. You know, he had this very successful academic career and then transitioned to industry and to venture capital. Right. And I think he made a very compelling case, despite himself moving, for why there are certain problems in academia that are likely only solvable within academia.
所以,也许我对你的挑战,Andy,就像是,尽管你自己已经休假并去了工业界,在Lila工作,你能为留在学术界的情况做一个钢铁侠式的辩护吗?有哪些类型的问题你应该留在学术界解决,尽管当前有资金危机?
And so maybe my challenge for you, Andy, is: despite having yourself gone on leave and into industry at Lila, can you steelman the case for staying in academia? What are the types of problems, the current funding crisis notwithstanding, that you should stay in academia to solve?
是的,我认为有几个答案。一个是经典的,即那些没有立即或明显商业价值的问题。所以像AI现在有点相反,这就是为什么它如此资源密集,因为有一股将AI所有事物商业化的淘金热。所以,那些没有立即商业价值且更长期视野的问题完全在范围内,这将包括很多理论,无论是机器学习还是其他类型的理论。我认为在医疗AI领域,学术界特别适合的一个位置是评估。
Yeah, I think there's a couple of answers to this. One is the classic: problems that have no immediate or obvious commercial value. AI is kind of the opposite of that right now, which is why it's so resource-intensive; there's a gold rush to commercialize all things AI. So classes of problems that have no immediate commercial value and are more long-term-horizon things are totally in scope, and that would include a lot of theory, both machine learning and other kinds of theory. I think one place in medical AI specifically where academics are uniquely well positioned is evaluation.
所以,就像实际进行评估,看看AI是否带来患者益处,在NEJM AI,你和Zach显然一直处于前沿。我认为一旦离开学术界,这方面就会有很多不正当的激励。所以,拥有可信的审计员,能够知道技术是否真的有效,显然也是学术界应该努力做的事情,这对公共健康和患者益处有着巨大的影响。
So, like actually doing the evaluation to see if AI results in patient benefit; at NEJM AI, you and Zach have been at the forefront of this, obviously. I think that there are a lot of perverse incentives around that once you get outside of academia. And so having trusted auditors who can know whether or not the technology actually works is also obviously a great thing for academics to be working on, one that has huge public health and patient benefits that go along with it.
好的。在进入闪电回合之前,也许最后一个问题,我对此非常兴奋,你描述Lila时提到它有两种不同的组成部分,对吧?比如有一个实验性的方面,机器人通过磁力移动这些板子,那是什么
All right. And maybe one last question before the lightning round, which I am so excited about, is: you described Lila as having these sort of two different components, right? Like there's an experimental side, robots that are moving plates around on these magnetic, what are the things?
平面电机系统。
Planar motor systems.
那是平面电机系统。然后你们还有一个机器学习方面的部分,对吧?那是在开发模型、进行训练和计算工作。你认为在Lila已经面临的最大挑战是什么?以及未来一两年内,让你夜不能寐、需要专注的关键任务是什么,以推动Lila发展并实现你们的愿景。
That's it, planar motor systems. And then you have a sort of machine learning side, right? That is developing models, training, and doing computational work. What do you see as the biggest challenges that you've faced already at Lila? And what is the key task, the thing that's keeping you up at night, maybe, to focus on for the next year or two in growing Lila and achieving your vision?
在现实世界中构建东西很难,比如实际构建硬件。我的意思是,这可以追溯到我早年担任电气工程师的时候,真正让东西在现实世界中运作是很困难的。而且有各种边缘情况,比如移动这些板子,它们里面有液体,这意味着液体会晃动,可能导致位置稍有偏差。所以当机械臂去抓取时,它可能处于稍微不同的位置,因此有成千上万个类似的‘最后一英里’挑战需要我们去解决。不过,我认为哲学上的挑战在于,所有的自动化实际上都是为人而创造的。
Building stuff in the real world is hard, like actually building hardware. And I mean, this goes back to the early days of my life when I was an electrical engineer: actually getting stuff to work in the real world is hard. And there are all these edge cases. Like, moving these plates around, they have liquid in them, which means they slosh, which means they could be slightly off. So when the robot arm goes to pick one up, it's in a slightly different position, and there are thousands of last-mile challenges like that that we're solving. I think the philosophical challenge, though, is that all of automation is actually created for people.
因此,我们真正关注的一个问题是,当没有人在回路中时,自动化实验会是什么样子?我之前提到我们在平面电机系统旁边放置了工作台,这暗示这些系统实际上仍然是为人类设计的,因为人们需要一个站立的地方。他们需要一个大约肩高的位置。所以,实际上还有第二层次的挑战,即如果我们只是让云端的人工智能来运行这些实验流程,而实验室里实际上不需要有人站在那里,我们该如何重构这些实验流程。这可能是最大的挑战,我们在这方面已经取得了很大进展。
And so one of the things that we're really focusing on is like what does automated experimentation look like when there are no people in the loop? The fact that I said that we put benches next to this planar motor system is a hint that these were actually still designed for people because people need a place to stand. They need a place that needs to be approximately like shoulder height. And so really there's like a second order set of challenges about how do we actually refactor a lot of these experimental workflows if they're just gonna be run by an AI in the cloud and you actually don't have to have humans in the lab standing there. That's probably the biggest challenge and we're making lots of progress on that.
我们花了很多时间思考这个问题。但如果我思考最核心的挑战,那会是,鉴于我们在做一些前所未有的事情,我们如何从第一性原理重新思考这些问题。在人工智能方面,都是传统的问题,比如我们不再使用O2(针对哈佛大学使用计算集群的人),我们不再使用Slurm。我们在Kubernetes集群上进行非常复杂的训练流程,这些集群有各种编排功能。我们现在正在扩展到数千个GPU,大规模训练本身就非常困难。
We spent a lot of time thinking about it. But if I was thinking about the real core challenge, it would be: how do we rethink these things from first principles, given that we're doing something that really hasn't been done before? On the AI side of things, it's all the traditional challenges. We're not on O2 anymore, for those folks at Harvard who use the computing cluster there, and we're not using Slurm. We're doing very complicated training flows on Kubernetes clusters with all these orchestration pieces. We're scaling up to thousands of GPUs now, and just training at scale is very difficult.
我们正在构建一套独特的训练能力,让模型能够使用广泛的工具,而实际协调所有这些也是非常具有挑战性的。相对于现实世界带来的挑战,我对人工智能方面的挑战感觉相对好一些,但我有信心我们能够解决这两类问题。
We are building a unique set of training capabilities that gives the model access to a wide set of tools to use, and actually orchestrating all of that together is also pretty challenging. I feel relatively better about the AI challenges versus the challenges posed by the real world, but I'm confident that we'll be able to solve both sets.
你是否发现你招聘的人员或招聘方式与你在学术实验室时非常非常不同,还是有一些相似之处?
Are you finding that the folks you recruit, or the way in which you recruit, are very, very different than in your academic lab, or are there some similarities?
有相似之处。我认为科学人工智能的使命本身就为我做了很多招聘工作。当我谈到我们试图开发一种能够运行整个科学循环的人工智能,提出假设、测试它们,然后根据结果更新其理解时,这对很多人来说是一个相当有吸引力的信息。我们还用行业资源和薪酬方案进行招聘,而不是学术薪酬方案,这也让事情变得更容易。
There are similarities. I would say that the mission of AI for science does a lot of the recruiting for me. When I talk about how we are trying to get an AI that can run the entire wheel of science, coming up with hypotheses, testing them, and then updating its understanding based on the results, that's a pretty compelling message to a lot of people. We're also recruiting with industry resources and compensation packages rather than academic compensation packages, which also makes things easier.
但我仍然认为我们招聘到了很多相同类型的人,他们既接受过硬科学或医学科学的交叉培训,又在人工智能技术方面非常深入,能够真正实现——再次强调,这个播客中反复出现的主题是让多种专业知识共存于同一个大脑中。我们发现这种类型的人对我们也很有利,并且也觉得这个使命相当有吸引力。
I still think that we get a lot of the same phenotypes, though: people who are cross-trained in either some hard science or medical science and who are also very deep on the technical side of AI. Again, the recurring theme on this podcast is having multiple sets of expertise live in the same brain. And we've found that that phenotype has also been good for us and also finds the mission pretty attractive.
好的,我认为这是一个绝佳的时机,可以过渡到我超级、超级兴奋的环节,那就是闪电回合。
All right, I think that's a great moment to transition to what I am super, super excited for, which is the lightning round.
天啊。
Oh my god.
所以安迪,你知道所有规则,让我们开始吧。你准备好了吗?
So Andy, you know all the rules, so let's dive in. Are you ready for this?
我没准备好,但我们开始吧。好吧。
I'm not, but let's do it. Alright.
所以第一个问题是关于你兄弟们的,安迪。谁是GOAT?我看这个问题已经让你紧张了。谁是篮球史上最伟大的球员:勒布朗·詹姆斯还是迈克尔·乔丹?
So this first one is for your brothers, Andy. Who is the GOAT? I can tell that one already got you scared. Who is the greatest of all time in basketball: LeBron James or Michael Jordan?
哦,老兄。我我觉得当你说史上最伟大时,这不只是统计上的考量,更是文化影响力的考量。从这个角度,我不得不选MJ。我认为MJ在全球和美国改变了篮球的方式,而勒布朗虽然统计数据上可以称得上史上最伟大,但我觉得他没有乔丹那样的文化影响力。
Oh, man. I feel like when you say greatest of all time, this is not just a statistical consideration, it's a cultural-impact consideration. And by that measure, I'm gonna have to go MJ. I think that MJ changed basketball both globally and in the US in a way that LeBron, while having a statistical claim to greatest of all time, I feel like doesn't match; he doesn't have the cultural impact that Jordan had.
我不同意你的观点,但这没关系。我们可以继续下一个问题。阻碍大型语言模型成为临床医学中值得信赖的前线决策支持工具的最大障碍是什么?
I'm gonna disagree with you, but that's fine. We can move on to the next question. What is the single biggest barrier preventing large language models from becoming trusted frontline decision support tools in clinical medicine?
我认为是可靠性的混合问题,比如明显的幻觉问题,而且它们仍然只是部分解决方案,不像人类那样全面。这方面正在改善,但它们还不能使用工具,比如如果需要打电话给某人,它们仍然做不到。所以还存在一些与准确性和可靠性无关的能力差距,我认为这些差距需要填补,才能完全取代许多前线决策服务。
I'm gonna say that it's a mix: reliability, so the obvious problems with hallucination, and that they still only represent a partial solution in a way that a person does not. This is getting better, but they can't use tools; if they have to pick up a phone and call someone, they still can't do that. So there are still capability gaps, unrelated to accuracy and reliability, that I think need to be filled before they could totally replace a lot of frontline decision-making services.
好的。我们的下一个问题。哪个是最难的工作?这是我最喜欢的问题之一,因为我们问过扎克,我想也问过拉里·萨默斯。但哪个是最难的工作?
Alright. Our next question. Which is the hardest job? And this is one of my favorite ones to ask now, since we've done it to Zach and, I think, also to Larry Summers. But which is the hardest job?
哈佛大学的终身教职教师、NEJM AI的创始副主编,还是Lila的首席技术官?
Being tenure-track faculty at Harvard, founding deputy editor of NEJM AI, or CTO of Lila?
哦,老兄。你这是想让我惹麻烦啊,拉吉。我选,我觉得,终身教职教师,因为这不仅是你自己抱负的重压,你还在人们职业生涯中非常脆弱的阶段遇到他们,我总是觉得自己内化了太多这些。如果论文被拒,无所谓,我有论文,但对学生来说,这些感觉像是里程碑式的决定,所以我觉得因为这个原因,拒绝对我的打击比另外两个工作中的日常挑战更大。
Oh, man. You're trying to get me in trouble here, Raj. I'm gonna go with, I think, tenure-track faculty, just because it's not only the weight of your own ambitions; you're meeting people at this very vulnerable stage in their career, and I always felt like I internalized a lot of that. If a paper gets rejected, whatever, I have papers, but for students those feel like very monumental decisions, and so I feel like the rejection hit me harder for that reason than the day-to-day challenges in the other two jobs you mentioned.
很棒的回答。而且我认为,再次体现了你作为导师的深思熟虑,你能将自己的视角与学生的视角区分开来。我完全同意。这是一个非常非常重要的时期,每件事、每个结果都感觉极其重要。因此,这确实很有挑战性。
Great answer. And I think, again, reflecting how thoughtful you are as a mentor, too, that you can separate your perspective from your students'. And I totally agree. It's a very, very important time, and each thing, each outcome, feels very, very important. So that is challenging to navigate.
好的。如果你不从事人工智能,你会做什么工作?在这里跳出思维定式想一想。
All right. If you weren't in AI, what job would you be doing? Think outside the box here.
嗯,我可以告诉你我在幼儿园时说的话。在幼儿园时,我告诉我妈妈,我要么想当捉鬼敢死队员,要么想当垃圾清运工,两者都是高尚的职业,但我想我现在不会这么回答了。如果我不在AI领域,我其实觉得可能会是某种作家。我在本科时一直很喜欢写作,一直很喜欢写论文。
Well, so I can tell you what I said in kindergarten. And in kindergarten I told my mom that I either wanted to be a ghostbuster or a trash man, both noble professions, but I don't think that's what I would answer now. If I wasn't in AI, I actually think some kind of writer. I always liked writing in undergrad. I always liked writing essays.
我在博士后期间也写过一点博客。我觉得某种像Substack作家之类的工作,会是我自然而然喜欢做的事情。
I blogged a little bit during my postdoc. I think some type of like substack writer or something like that would be something that I would naturally enjoy.
是的。不错。我觉得我猜不到这个。所以我…我喜欢这个答案。非常棒的回答。
Yeah. Nice. I don't think I would have guessed that, so I like it. A very great answer.
或者说,准确来说是职业Smash选手。
Or, to be correct, professional Smash player.
那就是我本来会猜的。是的。好了。下一个,也是我们的最后一个问题。如果你能和一个人(已故或在世)共进晚餐,那会是谁?
That's what I would have guessed. Yeah. Alright. Next, and our last question: if you could have dinner with one person, dead or alive, who would it be?
我也思考过这个问题,我有两个答案,因为我知道这个问题迟早会来。第一个是偏理智的答案,我认为会是戴维·福斯特·华莱士。除了《苍白的国王》,他写的每本书我都读过。他的许多作品我反复阅读过,我真的很想知道他对未来的看法,因为他在许多小说和非虚构作品中都很大程度上预测了未来。所以我觉得这会是我的选择。
I've also thought about this, and I have two answers, because I knew this one was coming. The first one is just an intellectual one, and I think it would be David Foster Wallace. I've read every book he's ever written except for The Pale King. I've read lots of his stuff over and over again, and I would just be dying to know what he thinks of the future that he largely predicted in a lot of his fiction and nonfiction. So I think that would be it.
情感上的答案是我的祖母,我的外婆。我妈妈的妈妈是我们家族的大家长。她大约十五年前去世了,她总是那种会直言不讳告诉你事情真相的人。当你得到外婆的认可时,那就像是最好的认可,因为她是一位坚强的女性,是大萧条时期的孩子,经历了两次世界大战,在那个很多女性都不上大学的年代上了大学,她就像是我们家族的基石。所以我真的很想和她共进晚餐,然后问她,你觉得呢,戴娜?
The sentimental answer is my grandmother, my nana. My mom's mom was the matriarch of our family. She died about fifteen years ago and was always the one that would tell you exactly how it was. And when you got Nana's approval, that was the best approval you could get, because she was a tough lady, a child of the Depression, lived through two world wars, went to college at a time when a lot of women weren't going to college, and was just sort of the bedrock of our family. And so I would just love to have dinner with her and kind of be like, so what do you think, Dana?
然后她会告诉我她的真实想法。
And then she would tell me exactly what she thought.
好吧,恭喜你,安迪·比姆,你成功通过了闪电问答环节。表现非常出色。干得漂亮。好了,安迪,我这里可能还有最后一两个问题。
Well, congratulations, Andy Beam, you have survived the lightning round. Passed it with flying colors. Great job. All right. So Andy, I just have maybe one or two last questions here.
更宏观一些,一些总结性的想法,也许你可以留给我们一些智慧的话语。首先,我们在播客中经常讨论这个问题,听众们知道我们喜欢引用规模假说来思考大型语言模型。我们已经在LLMs的背景下讨论过它,但也在你在Lila所做的工作背景下讨论过。也许为了这个问题,我可以把范围限制在医学和语言模型在医学中的应用上。所以,这就是模型当前的状态,对吧?
More big picture, kind of some concluding thoughts, some words of wisdom that you can leave us with, maybe. The first is, we talk a lot about this on the podcast, and listeners will know that we like to invoke the scale hypothesis as a way to think about large language models. We've already talked about it in the context of LLMs, but also in the context of the work that you're doing at Lila. And maybe, for the sake of this question, I can restrict it to medicine and applications of language models in medicine. So there's the current state of the models, right?
就像如果我们冻结时间,冻结模型的技术能力,然后问它们能做什么,它们在医学领域将能够做什么。我们都有关于它们在诊断、治疗和医学其他应用方面能做到何种程度的预测。然后还有这个问题的另一个版本,那就是这些模型将如何继续进化?它们会继续进化吗?它们会变得好多少?
Like if we were to just freeze time, freeze the technical capabilities of the models, and ask what they can do, what they'll be able to do within medicine. We all have predictions for where they are with respect to the things that we have to do in diagnosis and treatment and other applications in medicine. And then there's another version of this, which is: how will these models continue to evolve? Will they continue to evolve? How much better will they get?
我的问题是,你能再次为我们打开你的水晶球,在医学领域内,根据你需要的地方引用规模假说,预测一下未来几年LLMs在医学领域将会发生什么吗?
And my question is, can you open up your crystal ball for us again, within medicine, and just forecast, invoking the scale hypothesis where you need to, what is going to happen with LLMs in medicine over the next few years?
是的,所以再次回到原点,我认为克里斯汀在医学院和博士后期间我讨论的那类问题已经解决了。也就是说,即使症状表达不完整、部分呈现,甚至需要患者逐一列出,估算出给定症状下疾病的正确条件概率这个问题大体上已经解决了。我认为在思考医疗领域的规模假设时,我们应该考虑两类问题:一是自动化我们已经知道如何做的事情,二是完成我们还不知道如何做的事情。所以我会再次说,诊断属于自动化我们已经知道如何做的事情。
Yeah, so again, just to come full circle, I consider the class of problems that I was talking about when Kristen was in med school and during my postdoc to be solved. So estimating the correct conditional probability of disease given symptoms, even if the symptoms are expressed partially, you know, in an incomplete way, even if they need to be elicited from the patient, I consider that problem to be largely solved. I'm gonna put forward two classes of problems that we should think about when thinking about the scale hypothesis for healthcare: automating what we already know how to do, and then doing things that we don't know how to do yet. So I would again say diagnosis is automating things that we already know how to do.
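这里说的"给定症状下疾病的正确条件概率"本质上就是贝叶斯公式;下面是一个用虚构数字做的极简示意(并非临床数值):
The "correct conditional probability of disease given symptoms" framing is essentially Bayes' rule; here is a minimal sketch with invented numbers (not clinical values):

```python
def posterior(prior, sensitivity, false_pos_rate):
    """P(disease | positive finding) via Bayes' rule."""
    numerator = sensitivity * prior
    return numerator / (numerator + false_pos_rate * (1.0 - prior))

# Hypothetical numbers: 1% prior, 90% sensitivity, 5% false-positive rate.
print(round(posterior(0.01, 0.90, 0.05), 3))  # prints 0.154
```

Note that even a 90%-sensitive finding yields only about a 15% posterior at a 1% prior; that base-rate arithmetic is exactly the kind of conditional-probability estimation being called "largely solved" here.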
就像我的儿科医生错过了百日咳诊断那样,其实有人知道怎么做,他只是碰巧搞错了。未来一到三年内改变医疗保健的最大因素将是通用计算机使用。我们已经从OpenAI的Operator、Claude智能体等工具中看到了这类技术,但AI能够可靠地使用鼠标和键盘,可能解决了AI领域剩余未解决问题的90%,因为你可以让它坐在工作站前输入医嘱。你之前问过是什么阻止它成为一线决策工具,我认为当AI能可靠地使用鼠标和键盘执行长期任务时,这个问题就解决了。所以,AI目前无法完成的医疗和保健运营方面的问题,我认为会通过持续扩展它们目前在计算机使用方面的能力来解决。
My pediatrician missing the whooping cough thing: someone knew how to do that, he just happened to get it wrong. The big thing that will change healthcare over the next one to three years is generalized computer use. We've seen tools like this already, from Operator from OpenAI to Claude agents, but the ability for AI to reliably use a mouse and keyboard solves probably 90% of the remaining unsolved problems in AI, because you can just have it sit at a workstation and enter orders. And the question that you asked a while back about what stops it from being a frontline decision tool: I think that's solved when AI can use a mouse and a keyboard reliably to do long-time-horizon tasks. So the operational aspects of medicine and healthcare that AI can't currently do, I think, will be solved by continuing to scale what they're currently doing with computer use.
请继续。
Go ahead.
你会包括能够操作学术系统吗?
Would you include being able to operate a scholar one?
那可能属于AGI级别的难题。就像NP难一样。是的。
That might be AGI-hard. Like NP-hard. Yeah.
最后的边疆。最后的……
The final frontier. The final...
那是过时的网络软件,是九十年代某个周末写出来的。所以,是的,我认为通用计算机使用可能还差一两个数量级的计算能力才能变得可靠,但我猜明年内会解决。当通用计算机使用问题解决时,就像第一步作为副产品被解决一样,许多其他操作任务也会作为副产品被解决。我能想象到那一幕,而且我知道很多前沿实验室正在大力推动这一点。然后还有那些未知的未知问题。
It's out-of-date web software that was written in the nineties over a weekend. So yeah, I think for generalized computer use, we're probably one or two orders of magnitude of computing power away from making it reliable, but I'm guessing that will be solved over the next year. And when generalized computer use is solved, just like step one was solved as a by-product, many of these other operational tasks will also be solved as by-products. So I imagine that happening, and I know a lot of the frontier labs are pushing pretty hard on it. Then there are the unknown unknowns.
所以,就像有些疾病我们不知道如何诊断,不知道如何治疗,甚至不知道如何分类。公共卫生和医学领域有很多东西就像是暗物质一样。我认为这些问题的解决不是靠规模化,而是需要类似我们在Lila和其他人正在做的事情——利用AI来加速科学进程。我觉得这可能需要五到十年的时间。我们将需要新的测量设备。
So there are some diseases that we don't know how to diagnose, that we don't know how to treat, that we don't even actually know how to classify. There are many things in public health and medicine generally that are just kind of like dark matter. I think those are gonna have to be unlocked not by scaling but by something more akin to what we're doing at Lila and what other people are doing, where we're actually just using AI to make science go faster. I think that is gonna be more of a five-to-ten-year time horizon. We're gonna need new measurement devices.
所以我知道你深有体会,Raj,但从电子健康记录中获取的患者生理数据分辨率就像1940年代的黑白电视一样。我们真正需要的是像100英尺8K画质那样的高分辨率。现在我们还没有这样的技术。因此,我们没有高分辨率的患者生理特征描述,我们需要能够实现这一目标的新设备。我常常想起我们在实验室里研究但未完成的一项工作——通过可见光光谱进行无创测量的能力。
So I know that you know this deeply, Raj, but the resolution that you have on a patient's physiology from the electronic health record is like a black-and-white television from the 1940s. What we actually need is like a 100-foot 8K picture. Right now, we just don't have that. So we don't have high-resolution characterizations of patient physiology, and we'll need new devices that will enable that. I always think about one of the things that we worked on but never finished in the lab: the ability to do non-invasive measurements via visible-light spectroscopy.
也许这不是正确的技术,但某种其他形式的大规模患者生理特征描述,然后将其输入到当前规模化范式下训练的AI中,感觉像是下一个重大突破。AI加速科学进程将在五到十年内间接改善医学和医疗保健,但很难确切知道这将如何展开。
So maybe that's not the right technology, but some other sort of mass characterization of patient physiology that you can then feed to the AIs being trained in the current scaling paradigm feels like the next big unlock. AI making science go faster will indirectly make medicine and healthcare better over a five-to-ten-year period, but it's hard to know exactly how that's gonna play out.
回到你之前的一个回答,你认为评估在未来几年内是否会仍然是关键的前沿学术任务?
Going back to one of your earlier answers, do you think that evaluation is gonna remain sort of the critical frontier, critical academic task for the next few years?
是的,我认为如此。我认为必须这样,因为将会有很多东西上线。整合和实施科学也是如此。比如如何将这些技术改造到Epic系统中,或者进行彻底改造,以便能够引入这类技术,这也感觉是非常必要的事情。
Yeah, I think so. I think it has to, just because there's gonna be a lot of stuff coming online. Integration and implementation science, too. Like, how do you either retrofit Epic with this stuff or do a gut reno so that you can get this type of technology in? That also feels like a super necessary thing to have happen.
我认为另一个领域也将起飞。这当然是一个非常古老的学科,但我认为人机交互以及人类和机器如何协同工作,也将在未来几年内对AI医学和将这些工具安全有效地引入临床变得非常重要。好了,最后一个问题,Andy。我们都在各种学术大查房等场合向医生听众做很多关于AI的演讲。
Another area that I think is also going to take off, though it's of course a very old discipline, is human-computer interaction. How humans and machines will work together is also poised to become very, very important for AI medicine and for actually getting these tools safely and effectively into the clinic in the next couple of years. All right. Last question, Andy. We both give a lot of talks about AI to doctor audiences, in various academic grand rounds kinds of settings.
在这些场合,我被问得最多的问题之一是,医生们事后走过来对我说,老兄,这发展得太快了,听了你的演讲很棒,但是,我应该学习什么?我可以用什么武装自己?我应该学习什么以便在未来几年做好准备?然后还有另一种怀疑态度,我其实很喜欢这种态度,就像,好吧,你展示的一些东西很酷,但有很多炒作,也有很多,你知道的,虚假的东西。我如何分辨什么是真实的?
And one of the questions I get asked the most in these settings is, physicians come up afterwards and they say: man, this is moving so fast, you know, great to hear your talk, but what should I study? What can I arm myself with? What can I learn so that I'm ready for this in the next couple of years? And then there's this other sort of skepticism, which I honestly really like: okay, some of the stuff you showed was cool, but there's a lot of hype and there's a lot of, you know, BS out there. How do I tell what's real?
什么是不真实的。所以也许考虑到医生、临床医生以及听众们,除了收听这个播客之外,你们有什么建议来保持与时俱进,特别是针对医疗服务提供者和临床医生,如何跟上人工智能的发展。
And what's not real? So, maybe thinking about the physician and clinician listeners in the audience, what's your advice for staying up to date with AI, other than listening to this podcast, for providers and for clinicians specifically?
选择一个前沿模型,并在你的日常生活中每天都使用它。所以花20美元订阅ChatGPT,花20美元订阅Claude,选一个,或者两个都选,但要用它来完成各种任务,看看它在哪些地方会出错。比如你要做墨西哥卷饼食谱,就问ChatGPT要一个食谱。如果你在计划下次度假的活动,就问模型该怎么安排。如果你想为演讲生成一张图片,就用这些模型的图像生成功能来实现。
Pick one of the frontier models and use it every day of your life. So pay the $20 for ChatGPT, pay the $20 for Claude, pick one, pick both, but use it for tasks and see where it breaks. So if you're gonna make a taco recipe, ask ChatGPT for a taco recipe. If you're looking for things to do on your next vacation, ask the model how you would do that. If you're trying to generate an image for a talk, use the image generation capabilities in these models to do it.
我认为这里没有单一的教学资源会很有帮助,因为技术发展太快了,你只有通过日常使用它完成任务,形成肌肉记忆,才能了解它能做什么、不能做什么。所以有朋友问我这个问题时,我就说,直接用ChatGPT吧。如果你觉得你不会用,就试着用它来完成你想做的任务,要么你会发现它确实能做到,要么你会了解到这些模型存在盲点。你会学会识别它何时产生幻觉,何时可以信任它,何时不能信任它。然后,是的,你会对模型的工作原理以及它们何时失效有一种直观的感觉。
I think that there's no single source of pedagogy that's gonna be helpful here, because the technology moves so fast, and you're gonna get a sense of what it can and can't do just by building the muscle memory of using it for tasks in your everyday life. So I've had friends ask me this, and I'm like, just use ChatGPT. If you think you can't use it, try to use it for the task that you wanna do, and either you'll learn that it actually can do that, or you'll learn, okay, here's a blind spot in these models. You'll learn when it hallucinates, you'll learn when to trust it and when not to trust it. And then, yeah, you'll get a sort of intuitive sense of how the models work and when they don't.
我认为花时间去读《注意力就是一切》论文或者RLHF论文可能不是最好的利用时间的方式。我觉得更好的方式是拥有一种直观的、通俗的理解,了解这些模型是如何工作的。而最好的方法就是每天练习使用它们,看看它们何时会出错。
I think it's probably not the best use of time to go read the "Attention Is All You Need" paper or, you know, the RLHF papers. I think it's much better to have an intuitive, folk understanding of how the models work. And the best way to do that is just to practice with them every day and see when they break.
太棒了。好的。我觉得这是一个很好的结束点。安迪,我得说这次访谈真是太精彩了。虽然我已经对你很了解,但在这期节目中我还是学到了很多。
Amazing. Alright. I think that's a great note to end on. Andy, I just got to say this was fantastic. I know a lot about you already, but I learned a lot more on this episode.
非常感谢你参加AI Grand Rounds。
Thanks so much for coming on AI Grand Rounds.
这是我职业生涯的亮点。谢谢你邀请我,拉吉。
Highlight of my career. Thanks for having me on, Raj.
本版权播客内容来自马萨诸塞州医学会,未经马萨诸塞州医学会事先书面许可,不得复制、分发或用于商业目的。如需重复使用新英格兰医学杂志集团播客,请访问NEJM网站上的许可与授权页面。
This copyrighted podcast from the Massachusetts Medical Society may not be reproduced, distributed, or used for commercial purposes without prior written permission of the Massachusetts Medical Society. For information on reusing NEJM Group podcasts, please visit the Licensing and Permissions page at the NEJM website.
关于 Bayt 播客
Bayt 提供中文+原文双语音频和字幕,帮助你打破语言障碍,轻松听懂全球优质播客。