The Stack Overflow Podcast

Making computer science more humane at Carnegie Mellon

Episode Summary

On this episode of the podcast, Ben and Ryan chat with Martial Hebert, dean of the School of Computer Science at Ryan’s alma mater, Carnegie Mellon University. They talk about the changing landscape of computer science education, Martial’s almost 40 years of research in computer vision and robotics, and the importance of crossing disciplines in computer science.

Episode Notes

While he’s been the dean of the School of Computer Science since 2019, Martial started his career at Carnegie Mellon University way back in 1984. 

Ben covered lidar maker Velodyne while at The Verge; Martial, meanwhile, has lidar’s ancestor, the laser rangefinder, which was state of the art in 1986.

Martial’s area of research is in computer vision and perception for autonomous systems. Since 1985, he’s been a part of 388 publications.

Congrats to Lifeboat winner mx0 for their answer to the question “How to use a reserved keyword in pydantic model?”
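
For anyone who wants the gist of that answer: reserved Python keywords like "from" or "class" can't be used directly as field names, so the usual approach is to declare a safe name and map it to the reserved key with an alias. A minimal sketch, assuming pydantic v1-style syntax (the model and field names here are our own illustration, not taken from the answer):

    from pydantic import BaseModel, Field

    class Record(BaseModel):
        # "from" is a reserved keyword, so it can't be a field name
        # directly; declare a safe name and alias it to the real key.
        from_: str = Field(alias="from")

        class Config:
            # Also allow constructing the model by field name.
            allow_population_by_field_name = True

    record = Record.parse_obj({"from": "sensor-42"})
    print(record.from_)  # -> sensor-42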

Episode Transcription

[intro music plays]

Ben Popper Attention, developers. This summer, join the worldwide developer community in Berlin for the WeAreDevelopers World Congress. From July 26th to 28th, experience over 300 speakers on 12 stages, outdoor activities, parties, and more. Use discount code Stack Overflow for 20% off your ticket at worldcongress.dev.

BP Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I am Ben Popper, Director of Content here at Stack Overflow, joined by my colleague and collaborator, Ryan Donovan, Editor of our blog, maestro of our newsletter, and Carnegie Mellon University alumnus. Ryan, you recently went to a big anniversary, didn't you? 

Ryan Donovan Just went to my 25th Reunion where I got to connect with some folks, including today's guest, Martial Hebert, Dean of the School of Computer Science at Carnegie Mellon. 

BP Very cool. Well Martial, I'm glad you and Ryan connected, and welcome to the program.

Martial Hebert Well, thank you for having me. Very nice to meet you, Ben. I met Ryan at the celebration that he was at, and it's very nice to see you today. 

BP So take us back a little bit. What was your entry into the world of computer science? 

MH Well, I actually started in math originally. As you can tell, I'm not from Pittsburgh originally, so I started in math somewhere else. And then I got into computer science for my PhD, my doctorate, looking at research in computer vision– in fact, one of the first systems for 3D object recognition. And then I came here as a postdoc a long time ago, in ‘84, and in fact that was as part of the very first program that DARPA sponsored on what is now self-driving. It was not called self-driving at the time, of course– it was autonomous vehicles. The name of the program was ALV: Autonomous Land Vehicles. At CMU we had built a very large Chevy truck equipped with cameras, onboard computers, and all this. And you can imagine, given the date, how clunky the whole thing was, but it was the early research in those topics. And then I stayed in the Robotics Institute all my career after that, working in computer vision. 

BP Very cool. Ryan obviously went to CMU, but I think my path has crossed with that world in some ways too. When I was a reporter at The Verge, I did a big story about Velodyne and their lidar, which was born out of the DARPA challenge they won one year– that success went on to create an entire lidar company, and it became kind of integral to computer vision for autonomous vehicles. And then in my first job after journalism, I worked at the drone company DJI, whose drones obviously have a lot of computer vision and obstacle avoidance. One of the people I worked with there ended up going to CMU– I think he was getting his PhD there after joining DJI in China. So it's definitely a very well-known school, especially in those areas, and something I came across in my career as well. 

MH Yeah, in fact, as a side comment since you were interested in Velodyne: if you come to CMU, I will show you a historical piece, which is what we call the rangefinder– the very first laser-type sensor, the ancestor of what you see on the Velodyne and other similar sensors. And that was in ‘86. It was measuring 64 rows by 256 columns, or about 16,000 points, twice a second. Now it's millions of points, of course, but remember, that was 1986. So that was the first sensor. There were only two copies of that sensor– one at CMU, one at what is now Lockheed Martin, and I think we have the only one left. 
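
[Editor's note: a quick back-of-the-envelope check of those numbers– our own illustration, not from the episode:]

    # 64 rows x 256 columns per scan, two scans per second (1986).
    rows, cols, scans_per_sec = 64, 256, 2
    points_per_scan = rows * cols                      # 16,384 -- "16,000 points"
    points_per_sec = points_per_scan * scans_per_sec   # 32,768 points per second

    # A modern lidar producing on the order of a million points per
    # second runs at roughly 30x that rate -- "now it's millions of points."
    print(points_per_scan, points_per_sec)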

BP Oh, very cool. I'm actually out in California. I was here for Apple's WWDC event and they're ready for the next era which apparently is spatial computing. I guess we'll see. But their headset has lidar on the bottom and it's able to see your hands and kind of understand where you are in space, so that stuff is becoming more and more mainstream, or more and more integrated into consumer technologies, I should say. 

MH Yeah. And in fact the 3D processing is kind of what I worked on, at least at the beginning of when I was at CMU. So that's what I did as a researcher, and then I became head of the Robotics Institute, which is one of our seven departments in the school, and finally dean. Somebody has to do it, I guess.

RD Right, might as well be you. So I took a few computer science classes back in my day– Fundamentals 1 and 2 and AI– and those were taught in Java. I think it was really, really early, maybe among the first taught in Java. Can you talk about how the computer science program has changed over the years? 

MH Yeah, it has changed, first of all, in the languages that are used and taught, including Python and other languages. But it has evolved to still include the fundamentals. In other words, the classes that you mentioned have of course evolved, but the spirit is still there: strong fundamentals in the theoretical underpinnings of computer science, which we believe are essential to move forward with the more applied aspects. There are a lot of new courses, a lot of new offerings in disciplines that have been at the edge of computer science– AI, for example, but also machine learning, human-computer interaction, and many other disciplines that have evolved over time around computer science. So a much more diverse set of offerings than when you were there. 

RD You have the Software Engineering Institute now as well. Does that have a different mission or domain from the computer science school? 

MH Yeah, the SEI has a completely different mission. It's what is called an FFRDC– a Federally Funded Research and Development Center. I remember all those acronyms. And yes, it's a completely different entity, basically a standalone research organization separate from the School of Computer Science. Now, we do have joint projects and activities, in particular in AI and cybersecurity, but it is a separate organization at CMU.

BP I saw an article recently saying that in the United States at least, among students who are applying to college and going on to maybe a master's or a PhD, increasingly they're opting for the sciences and for technology over humanities. In the time that you've been working at CMU, have you noticed a shift in what students are interested in? And when they arrive, do you find that they are particularly conversant or interested in pursuing certain technologies having grown up more as digital natives?

MH Well, certainly. In fact, this is reflected in the increase in the amount of teaching that we do at the School of Computer Science across campus. We basically serve the entire campus, and the amount of teaching has increased dramatically. So that's an illustration of what you just said– the interest. And more recently, a lot of interest– none of that is really surprising– in AI and machine learning. And again, from across campus, not just students in the most direct STEM fields. 

BP Right. Well, the English and History professors want to know how they can detect which essays are written by AI. Are you helping them with that? 

MH Yeah. In fact, there's a lot of experimentation going on on campus on how to deal with those new technologies. Not necessarily in the sense of detecting and prohibiting and that kind of thing, but in terms of using it and maybe seeing in a more positive fashion how we can embrace it and use it in a positive way. Now this is very much a work in progress like everywhere else, but that's definitely a direction that we are going. 

RD Are there any courses that are new in the past few years– ones that didn't exist before and have emerged with the changing technology?

MH Well, one set of courses– it's not just one course, but a set of courses– that has certainly become prevalent over the past 10 years or so, and is certainly accelerating, is courses that have to do with the ethics, impact, and alignment of technology– all of those topics having to do not with the technology itself but with its impact. This is now a requirement in all of our programs, so this is another difference, by the way, from when you were there: those ethics-related courses are now required across the board. So that's one area that is certainly becoming much more important, and that's good, of course.

BP I think that's really interesting that you point that out. There's obviously been a lot of discussion in the news about the potential impact of AI and to what degree it should be regulated, and it's striking to me that those companies often began with, or have long been working with, people who do ethics, impact, and bias testing and red teaming. When I was in college and social media was becoming the dominant emerging technology, nobody was talking about that stuff. It was just, “Here it is, out in the wild, and it's going to grow.” So it definitely feels as though people now understand that with technology at internet scale and speed, that's something you need to have baked in. So it’s interesting that you also have that baked into the curriculum. 

MH Yeah. And it's not just in the educational curriculum, it's also in the research. We have several important centers and initiatives across campus– responsible AI and other related initiatives. And the important thing, and the really big challenge to me, is not just to look in general terms at the impact and think about regulation, guardrails, and things like this. The real challenge is to look much more deeply into the technical aspects. What does it mean, really, in terms of the design of an architecture or a training schedule, setting parameters, and so forth? Can we say things that are much more formal? Can we develop things that have some theoretical grounding? Let me give you one example. One of our faculty members looks at one particular type of AI system: the ones that use human-generated evaluation data as input. So think about ratings, think about peer review, think about, even in the medical domain, medical questionnaires– things like this. Those are all human-generated evaluation data. And the problem with that data is that it's very biased, because we're all biased, of course. It's imperfect; it has all kinds of defects. And this has to be taken into account explicitly in the design of machine learning systems. It's not something you can do after the fact by trying to test it and figure out the impact. So the interesting question is, can we formalize that? Can we put some kind of formalism around it– mathematical, algorithmic, et cetera? And those are very difficult directions of research, because they involve not just traditional computing work but also concepts from social science, because it has to do with human behavior as well. 
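
[Editor's note: to make the idea concrete, here is a toy sketch of handling rater bias explicitly in the pipeline rather than testing for it after the fact– normalizing each rater's scores against their own baseline. This is our own minimal illustration, not the faculty member's actual method:]

    from collections import defaultdict

    # Made-up ratings: (rater, item, score on a 1-5 scale). A harsh rater
    # and a generous rater may give different raw scores even when they
    # agree on an item's relative quality.
    ratings = [
        ("alice", "paper_1", 5), ("alice", "paper_2", 4),
        ("bob",   "paper_1", 3), ("bob",   "paper_2", 1),
    ]

    # Estimate each rater's personal baseline (their mean score)...
    by_rater = defaultdict(list)
    for rater, _, score in ratings:
        by_rater[rater].append(score)
    baseline = {r: sum(s) / len(s) for r, s in by_rater.items()}

    # ...then re-express every score as a deviation from that baseline,
    # so downstream training sees relative judgments, not rater bias.
    adjusted = [(r, item, s - baseline[r]) for r, item, s in ratings]
    print(adjusted)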

BP Right.

RD I mean, I think one of Carnegie Mellon's strengths was a lot of the cross-department collaboration– HCI came out of that, I believe. And we talked with a group recently about academic papers, kind of treating them as a humanities subject– in fact, all three of those programmers came out of the humanities originally and were delving into the history. Do you think that approach is needed more in computer science, to understand the history behind it and the history of thought?

MH Yeah. So in fact, it is so important that we have a pretty major university-wide project on archiving, curating, and studying the history of the development of those technologies. Things like the robotics archive and the oral history of computer science– those are all different threads of that idea. This is curated and coordinated by the libraries at CMU, but of course with all of us in the School of Computer Science and others. And you're right, it's very important to understand the path that those ideas followed and to learn from that. So that's definitely something that we're doing now.

BP And for those who are listening, HCI– Ryan, that's Human Computer Interface, right? 

RD Human Computer Interaction Institute. 

MH Yeah, Human-Computer Interaction is one of our seven departments. If you really want to know the six others– and I'm going to say it anyway– one is computer science, which looks at the fundamentals of computer science: programming languages, systems, theory, et cetera. HCI, as we just mentioned: human-computer interaction. Machine learning, which is its own department. Language technologies, which nowadays, of course, is particularly important. Software and societal systems, which covers software engineering, cybersecurity, et cetera. Robotics. And computational biology, which is our most recent department.

RD So talk about the most recent one– computational biology. That's something I think we've touched on a couple times in blog posts, but we haven't seen a lot of that. 

MH So computational biology was created about 15 years ago, first as a research center. It's basically the idea of using AI and machine learning in the context of biology research– things like genomics, precision medicine, single-cell modeling, cancer research; a lot of applications there. The important view of that department is that it is not just taking AI and machine learning tools and applying them to biology-inspired data. It's actually looking at how we can rethink the way we approach some of the key biology research challenges using those ideas. And in fact, CMU is putting a major emphasis on joining forces between the sciences and AI and machine learning, and basically transforming how we do scientific research. We are even building a new sciences building, which will house in the same space biology and chemistry– traditional sciences, if you will– along with machine learning, computational biology, and language technologies, basically taking that collaboration to the next level. 

BP One of the most inspiring pieces of research to come out in the last few years, for me, was DeepMind’s AlphaFold, where they were able to use these AI techniques to correctly predict the shapes of all of these proteins– which, according to what I read (I'm no expert), would've taken a hundred years with traditional techniques. And now, having all of these shapes visualized in these amazing 3D models, scientists can go out and do all kinds of drug discovery or, like you said, targeted genetic research. That seems like one of those cases where the brute-force capabilities of these large learning systems have been applied to science, and biology especially, in an interesting way. 

MH Yeah, there may be a little bit more to say about that, but we are very impressed right now by the potential applications of generative AI and the kind of revolution they can bring. But I think we will see in the future an even much, much larger transformation based on this intersection of the sciences and AI/ML. What you mentioned is one example of that, but it's just a very tiny starting point. I think we will be surprised by the extent to which the world will change. 

BP Uh-oh. It's changing too fast for me already. I don't know if I like the sound of that. I'm barely holding on as it is. 

RD Yeah, and in terms of that, I spoke with somebody from IBM Research doing quantum computing around material states and simulation of nature. Does CMU do any quantum computing research? 

MH Well, in three different places, basically: the School of Computer Science, the College of Engineering, and in the sciences, in physics. In the School of Computer Science, we look more at the higher levels of the stack when we talk about quantum computing– theory, algorithms, complexity, programming languages, things like this. So we do have activities around that. There are also large activities in the College of Engineering looking more at the physical aspects of quantum computing. So yes, we do have that. To be fair, going to the next stage, if you will, requires a very large investment that CMU on its own does not have at the moment. 

BP So in your particular domain of robotics, CMU has quite a history. Have there been things within the last few years– things you or folks at your school are working on now– that you're excited about in that domain? I recently saw an amazing demonstration of two small, very simple robots that had been trained in an unsupervised way to play soccer. Just by watching agents play soccer, the system had figured it out, and then they translated that not just into a digital system but into real-world robots that could move and block and pass and shoot with the intent of scoring a goal. What have you been seeing that's been inspiring for you recently? 

MH It's along a similar line to what you were saying, which is basically the idea of being able to train systems that can operate across a very wide variety of environments and tasks, and to train them with minimal or no supervision. That's really kind of the holy grail, because if we cannot do that, we are limited by having to supervise and curate data, and that basically limits the adaptability of the system. So I would mention, for example, the work of Deepak Pathak, a professor in the Robotics Institute, who has shown how to do learning and adaptation with very little data and to adapt to a wide range of conditions. So basically going in that direction– I don't want to use the term 'foundation model' in this context quite yet, but in that general direction of having general models, as opposed to very specific models for different tasks, environments, and training data. 

BP Yeah, what you say makes a lot of sense. When I was reporting on AI, from roughly 2015 to 2020, what was always very interesting was that the models could obviously achieve amazing results, like AlphaGo, but they were narrow and brittle. It only played Go– it couldn't play checkers, and if you changed the rules of Go a little bit, it would be completely lost. And now it's the GPT systems that are astounding us, because from that very wide general domain they're able to do a lot of amazing things. So it would be quite interesting, as you said, if we could start to do that in the physical world and see what comes out of it. 

MH Yeah. So that's the general direction we see now that is the most interesting. The other piece has to do not with the robot itself but with the interaction with humans– and again, the same theme of limiting the amount of supervision needed to learn that interaction. So again from Deepak’s lab, I'd point you to one of his most recent results, where the system learns to let a person control the robot with just one camera. No fancy gloves, no motion capture, no 3D cameras, no gizmos– just a simple camera. And the interesting thing is that this is learned entirely from completely unannotated, raw YouTube videos– basically watching people and learning from that how to translate a human motion into a robot motion, which I find extraordinary, by the way, that we can now do. And this is what I meant by the holy grail: being able to learn very complex tasks– converting my hand motion into actual joint motions– without ever using the robot for training, just by observing human motion.

BP Yeah, if we can use all the world's YouTube videos as training data, then we've put a lot of work in already.

RD There you go. Just get all the robots doing yoga. 

BP Exactly. Cooking and doing yoga. 

RD I mean, I think that's interesting, because one of the criticisms I've seen leveled against AI is that it's based on a lot of manual labor in labeling the data. Do you think we're close to a point where we don't need to label data, or is that more of a distant future? 

MH No, I think we are close to that. And in fact, that's a large portion of the work both in robotics and in large language models– all those different areas point in that direction. And again, the progress in that sense is remarkable. There is a major drawback to this, though, which is that in the old days– if I go back to the olden days, when I used to do computer vision– we used to have those very carefully annotated datasets. With those, you knew exactly what data you were training on. You knew exactly how the data was created, annotated, and supervised. You had total control over that. Now, by definition, we don't have that. It's a good thing, because it allows us to adapt to new tasks and new environments. But the flip side is that because we don't have all this information and control, we have a lot less understanding of what the models are and a lot less understanding of their behavior. So we need to think about new ways to characterize the behavior of those models.

BP Yeah, I think that's fascinating. There was a paper put out recently by the folks at OpenAI in which they asked models, as they went through a problem, to explain their reasoning step by step. In doing so, the models were able to vastly improve their scores on math and coding exams. So it’s interesting that with self-reflection they get better, but it's also interesting, potentially from the perspective of alignment, that they're willing to explain how they work step by step. And as far as we know, they're not hiding anything from us. 
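
[Editor's note: a minimal sketch of the step-by-step prompting idea– our own illustration, not the exact setup from the paper:]

    # Two ways to pose the same question to a language model. The only
    # difference is asking the model to show its work, which tends to
    # improve accuracy on math and coding problems and makes the
    # reasoning inspectable. No specific API is assumed; pass these
    # strings to whatever chat model you use.
    prompt_direct = "What is 17 * 24?"

    prompt_step_by_step = (
        "What is 17 * 24? Think through the problem step by step, "
        "showing each intermediate step, then give the final answer."
    )

    # A step-by-step response typically looks like:
    #   17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408
    # The intermediate steps are what make the answer checkable.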

MH This is another aspect that I think is very interesting, and we're just scratching the surface here too: this idea of interaction between the AI system and the user. By interaction, I don't mean just that I have my data, my input, I use the AI system, and I get the output– that's the traditional, simple interaction. If we look at the generative AI systems now out there– ChatGPT, et cetera– the interaction is much, much closer. In fact, it's so close that there is continuous feedback between the two, and they work together in a way that was never done before– never this close and never this complex. So I think an entirely new area– discipline, call it what you want– will emerge around understanding this level of interaction: optimizing, training, and modeling for it. I don't think we really have the tools yet to describe what that is and to formalize it. That's an interesting direction. 

RD Yeah. It's almost applying educational pedagogy to the AI. Because in school you have to show your work, and if you get the AI to show its work, it learns better. 

MH And it's interesting how people don't necessarily realize how much they interact with the system– and, in fact, how much they drive the system toward the problem they're actually trying to solve. So there's this whole– I don't want to use words like 'psychology' when we talk about AI– this whole way of interacting that's completely new and not yet fully understood.

[music plays]

BP All right, everybody. It is that time of the show. We want to shout out someone who came on Stack Overflow and saved a little knowledge from the dustbin of history. Awarded May 31st to mx0 for their answer to “How do I use a reserved keyword in a pydantic model?” If you've had this question, they've got an answer for you. They earned themselves a Lifeboat Badge and helped over 2,000 people by sharing a little knowledge on Stack Overflow, so we really appreciate it. I'm Ben Popper. I'm the Director of Content here at Stack Overflow. You can always find me on Twitter @BenPopper. Email us with questions or suggestions for the show at podcast@stackoverflow.com. And if you like what you hear, leave us a rating and a review. It really helps. 

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. And you can reach out to me on Twitter @RThorDonovan. 

MH And I'm Martial Hebert, the Dean of the School of Computer Science and I would like to first thank you for hosting me on this podcast. If you want more details about the School of Computer Science, please visit www.cs.cmu.edu.

BP Awesome, we'll put that link in the show notes. Thanks for listening, everybody, and we will talk to you soon.

[outro music plays]