Ryan talks with Greg Fallon, CEO of Geminus, about the intersection of AI and physical infrastructure, the evolution of simulation technology, the role of synthetic data in machine learning, and the importance of building trust in AI systems. Their conversation also touches on automation, security concerns inherent in AI-driven infrastructure, and AI’s potential to revolutionize how complex infrastructure systems are managed.
The Geminus platform is built to automatically integrate data, physics, and computation for autonomous control of complex systems. Explore the platform or get in touch.
Find Greg on LinkedIn.
[intro music plays]
Ryan Donovan Hello everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm Ryan Donovan, your humble host, and today we are talking about AI and physical infrastructure and machinery. My guest is Greg Fallon, CEO of Geminus. Welcome to the show, Greg.
Greg Fallon Thanks, Ryan. Good to be here.
RD Top of the show, we like to get to know our guests, how you got into software and technology.
GF I got into software early in my career. I was getting my PhD in mechanical engineering in the mid-nineties, and I fell in love with some new technology that was emerging at the time, made possible by compute power becoming cheaper and cheaper. So I left my PhD program to work at a startup where we were making high-accuracy simulators for engineers. The company later became Ansys, a publicly traded company that has taken that software and technology global, and it's now virtually the standard in every industry around the world, at least in terms of these high-accuracy simulators. That was my first foray into the industry. We built the company, and that was very exciting. The biggest challenges there were partly technical, but more on the go-to-market side. We were trying to sell software that disrupted the workflow of a very conservative user base: hardware engineers, mechanical engineers, chemical engineers. We had to change the way they work. Basically, we took a workflow that was very manual, where an engineer would come up with an idea, go to the laboratory, test it, and go back and forth, to one where the engineer would create a simulation for that idea and work within the simulation to refine it. You basically replace a lot of the laboratory experiments, and the simulations provide a lot of rich data the engineer can also draw ideas from for improving the design.

That industry was really exciting to be in and very disruptive at the time, but it largely stopped evolving after the advent of Linux clusters. Those brought hardware to bear on the problem: you could run massively parallel jobs, so simulations that once took weeks to solve came down to something more reasonable. After that, things stabilized a little, and I started to get interested in deep learning in the 2010 timeframe. Simultaneously, my co-founder, Karthik Duraisamy, who's a professor at Michigan, and before that was at Stanford, was working on blending machine learning and deep learning methods with these high-fidelity simulators. The benefit of doing that was to solve some of the problems companies were having with machine learning and artificial intelligence in the industrial environment, and by that I mean making machines operate better: control systems, those types of things.

In that environment, the standard approach was to add sensors to a machine, collect lots of data, and create a model, whether you were creating static models using TensorFlow or using reinforcement learning. That meant taking data off the machines, and the problem is that most big machines don't have a lot of sensors. Sensors aren't very trustworthy: they tend to drift, they can fail, and so no one really knows whether the data is any good. So you had this problem where you would create really sophisticated models with a lot of promise, but the unit economics didn't add up. It would take a very long time to make a model, 24 months or more, and the model would be substandard in its performance. Then, to boot, you had the big challenge that the operators of these machines weren't going to trust a black box model.
These are people dealing in high-risk environments who spend decades becoming qualified to run these big machines, and they just weren't going to take advice from a black box. So we started the company to fill that gap. The methods we use are tailored training methods for machine learning and AI coupled with optimization routines, and the training data we use is primarily synthetic data from these high-fidelity simulations, which operators tend to trust in part because they're accustomed to that engineering process. We're using data from their own engineering processes, but we also use multiple data sets. One of our key training algorithms takes data sets of multiple fidelities and blends them together, and we built in uncertainty bounds: we use a probabilistic approach so the operator can see the uncertainty around every prediction. That built this trust environment. The other thing is that we took what was a 12- or 24-month process and compressed it into days, and that's really the big impact on industry. So that's my path into this. In between, I spent time at Autodesk leading their products for mechanical engineering, where we launched a product called Generative Design that used cloud-based algorithms and AI to generate physics-based designs from people's ideas.
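Geminus's actual training algorithms are proprietary, but the general multi-fidelity blending idea can be sketched with a standard additive-correction scheme: fit a surrogate to plentiful low-fidelity simulation output, learn a correction from scarce high-fidelity data, and expose uncertainty bounds to the operator. Everything below, the data, the physics, and the model choices, is invented for illustration.

```python
# Minimal multi-fidelity sketch: low-fidelity surrogate + learned correction,
# with uncertainty bounds the operator can see. All data here is made up.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Plentiful low-fidelity simulation runs (fast, approximate physics).
X_lo = rng.uniform(0, 10, size=(200, 1))
y_lo = np.sin(X_lo).ravel() + 0.3              # biased low-fidelity output

# Scarce high-fidelity runs (slow, trusted physics).
X_hi = rng.uniform(0, 10, size=(15, 1))
y_hi = np.sin(X_hi).ravel()

# Step 1: surrogate of the low-fidelity simulator (alpha adds jitter).
gp_lo = GaussianProcessRegressor(alpha=1e-6).fit(X_lo, y_lo)

# Step 2: learn the discrepancy between fidelities at the high-fidelity points.
residual = y_hi - gp_lo.predict(X_hi)
gp_delta = GaussianProcessRegressor(alpha=1e-6).fit(X_hi, residual)

# Blended prediction with uncertainty bounds for the operator.
X_new = np.linspace(0, 10, 5).reshape(-1, 1)
mu_lo, sd_lo = gp_lo.predict(X_new, return_std=True)
mu_d, sd_d = gp_delta.predict(X_new, return_std=True)
mean = mu_lo + mu_d
band = 2 * np.sqrt(sd_lo**2 + sd_d**2)         # roughly 95% bounds
for x, m, b in zip(X_new.ravel(), mean, band):
    print(f"x={x:4.1f}  prediction={m:+.2f} ± {b:.2f}")
```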
RD I'm interested in the synthetic data. That's something we've looked at, not necessarily for simulations, but for building machine learning models on it. Now, it sounds like those simulations are based on real, lived data. How do you gather that data?
GF That data is actually a little easier to gather. Two things go into those simulations. There's the spatial data, the physical data from CAD and mechanical drawings, which describes the physical infrastructure, and then there are the operating conditions, which are generally known by the process engineers. You can usually get most of that information. You can add sensors, but this is general, well-known information, so it's not very difficult to create a high-fidelity simulation if you have the tools. That tends to be the easy part, and blending in the other data sets to build the model becomes the secret sauce. The hard part is really the lifecycle: once you have a model, how do you maintain it? That's the whole MLOps, now becoming AIOps, family of workflows.
RD The MLOps pipeline. And I assume your ML models are more traditional ML, right?
GF Yes, we're using traditional neural nets. As I mentioned before, it's really the training of those nets that's the key part. Sometimes we'll use neural nets, other times different types of operators. The sophisticated part is when we start to weave those together, because we often take a systems approach to modeling very large systems, so you might have ML models talking to one another and interacting in tandem. The AI part comes when we adapt that to reasoning tools like optimizers, and now we're using tools to coordinate and manage the different models.
RD So you get the real, lived data into the simulation and the synthetic data out. How do you then use that to optimize the actual machine? Is that an operator process or an automated process?
GF It's pretty automated. In essence, you're creating a high-fidelity digital twin of the machine that works almost infinitely fast: something that can vary the machine's behavior and explore different operating conditions. These machines have all sorts of switches, pumps, valves, and gears, each with its own control setting, and identifying the optimum mix of control settings to drive performance is the key. If you can create a digital twin that can say, “Okay, what's the current state of the machine? Given that state, let me run through millions of scenarios in milliseconds or seconds, and then deliver back the right control settings to hit optimal performance,” that's how it works.
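That loop, scan candidate control settings through a fast surrogate and keep the best, can be pictured roughly as below. The surrogate physics, setting ranges, and scoring function are invented stand-ins, not Geminus's models.

```python
# Hedged illustration: given the machine's current state, sweep candidate
# control settings through a fast surrogate and return the best combination.
import itertools
import numpy as np

def surrogate_performance(state, valve, pump_rpm, gear):
    """Stand-in digital twin: maps state + control settings to a score."""
    return np.sin(valve) + 0.001 * pump_rpm - 0.1 * (gear - state) ** 2

def optimize_controls(state):
    valves = np.linspace(0.0, 3.0, 50)      # valve openings
    pumps = np.arange(500, 3000, 100)       # pump speeds (rpm)
    gears = [1, 2, 3, 4]                    # discrete gear settings
    best = max(
        itertools.product(valves, pumps, gears),
        key=lambda c: surrogate_performance(state, *c),
    )
    return dict(zip(("valve", "pump_rpm", "gear"), best))

print(optimize_controls(state=2.5))  # evaluates 5,000 scenarios in milliseconds
```

Because the surrogate is fast, brute-force enumeration works here; a real system with millions of scenarios would swap in a proper optimizer, which is the "reasoning tools" role Greg mentions.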
RD I'm sure that working on these machines is a very blue-collar job, and you said you built trust. Have you gotten any pushback on introducing the AI? I know you said you built trust with the simulations, but I imagine a lot of these operators have a gut feeling they'd miss relying on. How do you address those concerns?
GF Yeah, that's a very real problem. The biggest issue arises when the AI agent tells you to do something that's outside your experience. You've been doing this for 50 years, and now it's recommending an action that's counterintuitive. That's a pretty complex scenario, and the way we build confidence around those scenarios is to have the agent show the operator small examples: if it made a little change, here's what happens to the outcome. The operator is usually familiar with the impact of small changes, so when you show them that the answer shifts in a sensible way as those small changes are made, that builds confidence. It's actually very complicated under the hood, but it's a simple demonstration, and typically it gets us over the hump. The honest answer, though, is that you build trust by getting the operator to use it for a while. They have to see the impact. In all the cases we've worked through so far, the impact on performance is usually pretty dramatic, so you get this ‘how did it do that’ reaction, and then you couple that with saying, “Look, we're not going to replace you. We're working with you, and you still have to use your intuition.” There's a special sauce in there that seems to help.
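That small-change demonstration can be thought of as a sensitivity probe around the current operating point. The finite-difference sketch below is illustrative only; the toy model and setting names are invented.

```python
# Show the operator that small, familiar control changes move the model's
# prediction in a sensible direction (a trust-building probe, not Geminus's tool).
def show_sensitivities(model, settings, step=0.01):
    base = model(settings)
    for name, value in settings.items():
        nudged = dict(settings, **{name: value * (1 + step)})
        delta = model(nudged) - base
        print(f"+1% {name:<10} -> output changes by {delta:+.4f}")

# Toy surrogate standing in for the trained model.
model = lambda s: 2.0 * s["valve"] - 0.5 * s["pump_rpm"] / 1000
show_sensitivities(model, {"valve": 1.2, "pump_rpm": 1800})
```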
RD Yeah, and generative models are known to be nondeterministic. Are the simulations also probabilistic? For any of the ML models, if you run it once with the same settings, does it operate the same way and give you the same output every time?
GF The answer is yes. The machine learning models are probabilistic, but they're deterministic in the sense that they give the exact same answer to the exact same question every time. The probabilistic attribute comes in because we tune them to be probabilistic: we want them to cover a range of operating conditions to mimic physical reality, because in physical reality there is no single deterministic answer, so we build that in. And what you just said is the exact reason why no one is using LLMs to make recommendations for big machines that could hurt you if they break.
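That distinction, probabilistic outputs from a deterministic model, fits in a few lines: the model predicts a mean and a spread rather than a single value, but because nothing is sampled at inference time, the same input always produces the same answer. The weights below are toy values.

```python
# "Probabilistic but deterministic": predict a distribution, sample nothing.
import numpy as np

def predict(x, w=np.array([1.8, 0.4])):
    mean = w[0] * x                 # predicted operating value
    std = abs(w[1] * x) + 0.05      # predicted uncertainty, grows with load
    return mean, std

for _ in range(3):                  # identical output every call
    m, s = predict(2.0)
    print(f"prediction = {m:.2f} ± {2 * s:.2f}")
```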
RD Yeah, and that's the safety issue I thought of when I first heard about this. But I imagine these machines, these dams, these power grids have been highly automated for a long time. What does the automation look like without the AI/ML adjustments?
GF The state today is that you'll have a direct control system with a control strategy built in, which is essentially a logic workflow, a bit model-predictive in nature. It'll say, “If this happens, do this.” It also has guardrails, which is nice. We don't replace that. In fact, one of the things we feel is very important is that we don't want to be in the control system business. First of all, there's trust that's been built up over 50 years with these systems, and it would be almost impossible to rebuild that trust quickly. Second, there are advantages to working through the control systems because of those guardrails. So when our AI agents are directly connected to machines, which is not all the time, often there's a human in the loop, they essentially give the control system advice. The control system says, “This is what's happening, what should I do?” and the agent says, “Go ahead and do that,” and the control system says, “If that's within the guardrails, sure, I can do that. I'm not going to blow anything up.” There's some leverage there, which also, by the way, eases the operator's worries: “Okay, nothing is going to break here if I do this.” That's a critical piece.
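The advisory pattern might look roughly like the sketch below: the agent only proposes setpoints, and the existing control system enforces its own guardrails before anything touches the machine. The limits and setting names are invented for illustration.

```python
# The AI suggests; the 50-year-old trusted layer decides.
GUARDRAILS = {"valve": (0.0, 2.5), "pump_rpm": (600, 2800)}

def control_system_apply(advice):
    """Clamp the advisory layer: reject any out-of-bounds recommendation."""
    applied = {}
    for name, value in advice.items():
        lo, hi = GUARDRAILS[name]
        if not (lo <= value <= hi):
            print(f"rejecting {name}={value}: outside guardrails [{lo}, {hi}]")
            continue
        applied[name] = value
    return applied

ai_advice = {"valve": 2.1, "pump_rpm": 3100}   # agent recommendation
print("applied:", control_system_apply(ai_advice))
```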
RD With the AI as advice giver, is there a risk of malicious actors accessing that AI, or that pathway for giving advice, either through onsite social engineering or over the network?
GF Yeah, I have to imagine there is a risk, and it's handled at multiple levels. All of this critical infrastructure has multiple layers of security. First, there's the control system, which has guardrails built in. Those systems are either air-gapped or protected in many different ways, and in critical systems they're most often air-gapped. When we're talking to an air-gapped system, we're air-gapped as well. We're native on Azure, but a lot of our inferencing engines are taken off the cloud, which is another challenge for us, because a lot of the hardware that sits next to, say, a power plant could be chips that are 10 to 15 years old. We have to make sure we integrate well within that environment, so that's a challenge.
RD So for that, do you have to have variously quantized, smaller models? And is there a noticeable difference in data processing speed, that sort of thing?
GF Good questions. Yes, the models have to be in a small enough package to fit there, and depending on how many models are talking to one another, that can be a hurdle. We can simplify the model to get it there. In terms of processing speed, things slow down a little on the older hardware, but it's usually fast enough that we can make a meaningful impact: instead of milliseconds, it might be seconds.
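As a rough illustration of what shrinking a model for decade-old hardware involves, here's the core of post-training int8 quantization in plain NumPy. Real deployments would typically go through a toolchain such as TensorFlow Lite or ONNX Runtime; nothing here is specific to Geminus's stack.

```python
# Shrink float32 weights to int8 and measure the size and accuracy cost.
import numpy as np

w = np.random.default_rng(1).normal(size=(256, 256)).astype(np.float32)

scale = np.abs(w).max() / 127.0          # map the weight range onto int8
w_int8 = np.round(w / scale).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale

print(f"size: {w.nbytes / 1024:.0f} KiB -> {w_int8.nbytes / 1024:.0f} KiB")
print(f"max abs error: {np.abs(w - w_restored).max():.4f}")
```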
RD And I'm curious about the simulation. Initially, I think of something you can see on a screen, the machine running on the screen, but that's not how it works. I bet it's a bunch of math happening. What's the stack that runs that?
GF The simulations are usually third-party tools. Sometimes they're made by the customers, sometimes by third parties, companies like Ansys. We often close the loop with them when they're streaming data to us, so we just see text output from them. The way those simulators work varies a bit. They're based on the first principles of physics, and most of them are differential equation solvers. The math varies dramatically depending on the type of physics you're dealing with: some problems are ordinary differential equations, which are relatively simple to solve, and some are partial differential equations, so the mathematical algorithms behind the solvers vary from solver to solver. The good news is that we don't have to get involved with that. We deal with the output, and the output becomes a data stream for us. We often compare that data stream to two or three others, and that's where we start to tag each stream based on its fidelity relative to the others and its cost to create. That gets into the training algorithms we use.
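The solvers themselves are out of Geminus's scope, but for context, a first-principles simulator boils down to integrating differential equations, and its output becomes a training data stream. Here's a toy ordinary differential equation solved with SciPy, assuming Newton's law of cooling as stand-in physics:

```python
# A toy first-principles "simulator": integrate an ODE and emit a data stream.
import numpy as np
from scipy.integrate import solve_ivp

def cooling(t, T, k=0.3, T_ambient=20.0):
    return -k * (T - T_ambient)        # Newton's law of cooling

sol = solve_ivp(cooling, t_span=(0, 10), y0=[95.0],
                t_eval=np.linspace(0, 10, 6))
for t, T in zip(sol.t, sol.y[0]):
    print(f"t={t:4.1f}  T={T:6.2f}")   # this stream is what the ML model sees
```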
RD So when you're training the models, are you using the same simulators to train, or do you have some sort of higher-order simulation for it?
GF No, we try to use off-the-shelf simulators; the idea is to fit into the customer's process. The simulators they have are standard to them, they're trusted, and they're often validated on the specific machines they're interested in. We usually don't have to change a thing; we take the simulators they have. Luckily, they're fairly ubiquitous in most engineering disciplines, not all, but most. And a lot of the work we're doing right now focuses on applications or use cases where simulations are already present, or where we can get simulations deployed very quickly.
RD And in terms of the infrastructure and machinery pieces, what are the hardest to simulate and optimize? Which ones made you say, “We had to develop a whole special thing for fluid dynamics,” or whatever?
GF We usually don't have a hard time with the physics; that tends to come naturally to us. Our biggest challenges are often about size. Depending on the infrastructure, you could have a single asset, like a piece of an oil refinery, that alone is very complicated. From a physics perspective, fine, we can deal with that. It gets more challenging when you have dozens or hundreds or thousands of interconnected things, because an asset can be made of thousands of subcomponents, each of which has to be simulated separately and can act dynamically. Take a field of natural gas wells. When natural gas is taken from the ground, it's not one hole in the ground; it's often many holes, sometimes hundreds, and in the case of oil fields, sometimes thousands. Each of those wells has its own control strategy, often with a pump or a valve sitting at the bottom of or inside the well. The problem is that if you have to train all of these assets together in a single model, you could be talking about training times that make the process untenable, like a year. So we've developed techniques to get beyond that and compress it down to a few hours or a few days.
RD Does that involve separate models or are you still training this massive multi-piece simulation?
GF We have a proprietary algorithm where we use separate models that are connected together.
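The connect-the-models idea can be pictured as small per-asset surrogates coupled through a shared constraint rather than one giant model. The sketch below is only a guess at the general shape, with invented well physics and a greedy coordinator standing in for Geminus's proprietary algorithm.

```python
# Divide and connect: one tiny surrogate per well, coupled by a shared limit.
import numpy as np

class WellModel:
    """Per-asset surrogate: flow as a function of its valve setting."""
    def __init__(self, capacity):
        self.capacity = capacity
    def flow(self, valve):                     # valve in [0, 1]
        return self.capacity * np.sqrt(valve)

wells = [WellModel(c) for c in (40.0, 55.0, 25.0)]
PIPELINE_LIMIT = 90.0

# Coordinate the separate models: open valves round-robin until the shared
# pipeline constraint binds.
valves = [0.0] * len(wells)
for step in range(100):
    total = sum(w.flow(v) for w, v in zip(wells, valves))
    if total >= PIPELINE_LIMIT:
        break
    i = step % len(wells)
    valves[i] = min(1.0, valves[i] + 0.05)

print("valve settings:", [round(v, 2) for v in valves])
print("total flow:", round(sum(w.flow(v) for w, v in zip(wells, valves)), 1))
```

Training each small model separately and coordinating them at their interfaces is what turns an untenable single training run into hours or days.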
RD Is there something I didn't touch on that you wanted to talk about?
GF There are a few things. I'll talk about the future, because what we're doing today is important, but I think we're just scratching the surface of the opportunity. We're bringing artificial intelligence to one of the most conservative industries, or groups of industries, out there, which has its own challenges, and we're dealing with larger and larger pieces of infrastructure. The future is being able to connect all the pieces of infrastructure together, so you can have, say, an entire energy network optimized and operated from a digital twin perspective. For example, say you have an entire country's infrastructure running in one of our models. The AI-driven digital twin runs extremely fast, not infinitely fast when it's an entire country's infrastructure, but pretty darn fast. That allows you to centralize decisions, but more importantly, you can start to think about planning. If you have a full network of infrastructure, you can tackle problems like planning the electrical grid, which is massive. If we're going to continue to electrify at the pace we'd like to in order to decarbonize, the global electrical grid has to almost double in size in 20 years, and it took us 130 years to get here. Electrical grids are very sensitive; the infrastructure can be disrupted very easily by attaching, say, a new car charging station or a new power generation station. Being able to tackle those problems at scale is a really exciting and big problem, and we've built a lot of the core underlying infrastructure to do it.

Now what we're starting to do is blend in generative AI techniques. At this point, we're using traditional LLMs as agents, so we have agents directing the data science and some of the simulation work we're doing, what we call the computational science. We're evolving that, and it's going to let us build bigger models faster and derive more data from them, which is very exciting. Then there's the next generation, the next, let's say, 5 to 10 years beyond where we are today. The simulators themselves are very rigorous. They cover an almost infinite number of scenarios, and they're very difficult to replace with an AI model. Someone might ask, “Can't you just have an AI model learn fluid flow?” Today, that's not possible; you can only learn it in a very constrained arena. There are new techniques getting to the point where you can start to replace the simulators with large models themselves. That's very exciting and an active area of research, my co-founder works in that area, but we're pretty far away from it happening. So that gives us a 10-year plan. And then you throw quantum computing into the mix, which gets exciting for us, because then you can start to train models faster with lots and lots of simulations running in a quantum environment.
RD Are we close to quantum computing? I've talked to quantum computing researchers, and it always seems like a super complex thing with multiple layers of redundancy, and they've got 400 qubits now. Is that something you're planning for?
GF We're not planning for it yet. We want it; we have friends in the space. Everyone says we're 10 years away. Now, I don't know if that's the ‘we're 10 years away’ answer people used to give about nuclear fusion 40 years ago, or if it's really 10 years away. I don't think we're anywhere close. Like you said, we're talking about hundreds of qubits, and we need tens of thousands of qubits.
RD Yeah, maybe that's the limit of the future anyone can conceive of there. That's the limit: 10 years.
GF Yeah.
[music plays]
RD Well, thank you for listening, ladies and gentlemen. I have been Ryan Donovan, host of the podcast here at Stack Overflow. If you want to give us feedback, ratings, reviews, comments, or suggestions, email us at podcast@stackoverflow.com. And if you want to reach out to me, you can find me on LinkedIn.
GF This is Greg Fallon from Geminus. Our website is Geminus.ai. You can reach out to us at info@geminus.ai. We're also available on LinkedIn.
RD Thank you very much everyone for listening, and we'll talk to you next time.
[outro music plays]