Job interviews are stressful. This one is not.
Hello everyone and welcome to the very first episode of We'll Be in Touch, a new podcast series from Stack Overflow.
This show will explore the world of job interviews, career development, and software engineering. Each episode, we'll sit down with folks working in software development to hear their stories, dive into their latest projects, learn about tricky bugs they've tackled, and discuss the tricks they use to keep up with all the latest languages and frameworks.
Your host, Kyle Mitofsky, is a Senior Software Developer here at Stack Overflow. With over a decade of experience as an individual contributor, manager, and team leader, he's interviewed a wide range of people and is excited to share these revealing and engaging conversations, WITHOUT the pressure of an actual job interview.
Whether you're an aspiring developer or a seasoned professional, join us as we delve into meaningful discussions that can help shape your career. We're kicking off the series by chatting with a former colleague of Kyle's, Yaakov Ellis, a longtime Stack Overflow community member and employee who is currently a Staff Software Engineer at Intuit.
Yaakov Ellis We like to think of our minds as being multi-threaded, but I think really they're more single-threaded. You always work better when you're focusing on one thing. You might focus on 10 different things in sequence, but you're going to be less efficient that way.
[intro music plays]
Kyle Mitofsky Hello, everyone, and welcome to the very first episode of We'll Be In Touch, the podcast where we explore the world of job interviews, career development, and software engineering. Each episode, we'll sit down with folks working in software development to hear their stories, dive into their latest tech projects, learn about tricky bugs they've tackled, and discuss the tricks they use to keep up to date with all the latest languages and frameworks. I am your host, Kyle Mitofsky, a Senior Software Developer here at Stack Overflow. In my various roles, I've interviewed a wide range of people, and I've always genuinely enjoyed learning more about their background and what they're up to. I'm excited to be able to share those types of conversations with a wider audience, without the pressure of an actual job interview. Whether you're an aspiring developer or a seasoned professional, join us as we delve into meaningful discussions that can help shape your career. Today, we're kicking off the series by chatting with a former colleague of mine, Yaakov Ellis, currently a Staff Software Engineer at Intuit, to explore their unique experiences in tech. Well, welcome, Yaakov. To get into it, can you just start by telling me more about yourself and your journey into tech?
YE Thank you for having me. We were on the same team for, what is it, at least a couple of years there. So I have been a developer for about 20 years. I spent eight years at a company called Verndale, a web development company in Boston, and I was at Stack Overflow for a full decade plus a few weeks, where I worked as the tech lead for the internal development team for five years. And then I was on the public platform team for five years, under different names. I think the team went through at least three or four different names while I was there, and I was there until last December. Since then, I left the company and started work at Intuit. I live in Israel, I've lived here for 18 years, so now I've switched to working in the Israeli branch of Intuit, where there are around 400 people. So I'm still working as a developer, but I have completely switched basically everything about my professional life. I had worked for 18 years as a remote employee for American companies, and now I've switched to working in a local office at least a couple of days a week. I've switched technology stacks from C# monolithic applications, like we have at Stack Overflow, to cloud-based microservices on AWS with Java at Intuit. And I'm having to speak Hebrew on a regular basis, and the company size is very different, so I've had a shift in my professional life, and I've been there for about 9-10 months now.
KM Can you tell me more about that? And maybe we can talk about it more later too, but I think it's always really interesting that part of being a developer is this constant growth and relearning how to do things. We do a lot of .NET stuff, and between the original .NET Framework and where we are today with .NET Core, those are two totally different technologies. There are some similar threads, but as somebody who spent 20+ years developing software and then made a pivot into a different tech stack, are there throughlines that let you still piece together what's going on, even while you're learning new technologies? How has that transition been for you?
YE In some ways it's been very tough, in some ways less so. I think the thing that I've learned, and that I think people learn the further they get into their careers, is that the most important skills are, first, the ability to learn new things, and second, the habits and the soft skills and all of the things that go into being a developer other than writing code. Because writing code, up to a certain level, is writing code. At Stack Overflow, and at every single one of my jobs including the current one, I'm not on the cutting edge of research technology. I'm not discovering new cures for things. I'm not inventing new things. They're all different spins on the same thing. You're connecting different systems, you're interpreting data, you're storing data, you're providing a good user experience, you're thinking about the platform. The packaging changes, but the habits and the things that go into being a good teammate and contributing and the intangibles are the things that stay the same. That said, it was something that really brought my imposter syndrome up to a boil to have so many different transitions. I've always been the person that's comfortable with the things that I know well, and very often in my career, I've had teammates that always wanted to try the new things. And I was always the person saying, “Well, is that really the best decision for us? Are we just changing because it's new and flashy? How much is it going to contribute?” Part of that is because those are pragmatically good questions, and part of it is that I was just comfortable. I had been in .NET with C# since .NET 1.x.
KM Early days.
YE So I've been through so many different pieces of evolution there. Very often the platform changes, but the job doesn't change too much, because you get gradual changes in the new things, or there are other people taking care of the shift into .NET Core. In my case, it was just a tremendous shift in all ways. And I think the important thing that I've learned to keep in mind is that even when you're on the platform you're used to, you're never going to know everything. Moving to a new platform, even more so. You're not going to know everything, but it's not your job to know everything. I guess for myself, and maybe for other people, sometime around when you're 18 you think you know everything, and then basically every year since then I've realized that I know less and less. Even as the quantity of what I know increases, the awareness of how much I don't know increases even more. And that's the same thing I've been learning. The last time I did anything with Java before joining my current job was in university back in 2001-2002. I was on Java 1.2, I believe. Java is in the 20s right now. Then there are all the frameworks that have built up around Java; my team is using Spring Boot. I had to look up what Spring Boot is before starting my job. I didn't even know what it was, which can be surprising if you're in the Java world, because everybody uses Spring Boot. It's so ubiquitous, and it's been humbling to join the way I've sometimes seen new teammates join. I'm at the staff level, so there are three or four levels beneath me on the IC scale, the individual contributor scale, and there are people who are one year out of university who have so much more familiarity with Spring Boot and just what it does, which I've had to learn. Now I can pride myself on saying that even if there are a few things I have to learn multiple times, I am able to learn them and then apply them, and I think that's the goal. Before I joined Intuit, I also took an AWS certification for solutions architect, just to get familiar with the cloud platform. I was at Stack Overflow for 10 years, and at least until the point when I left, Stack Overflow was a big monolithic application. The main public platform was not cloud-based. Everything was within one code base. Now that I've moved to a microservices platform, I've learned many new things and I have a bigger awareness of things that exist. I still, almost on a daily basis, encounter new pieces of technology, new techniques, new things that other people take for granted because they've been focused on them for their entire careers, but I'm no longer intimidated by it. Compared to where I was a year ago, I think I'm much more comfortable with being uncomfortable professionally.
KM I think that's a great way to put it: being comfortable with being uncomfortable. And that's maybe the only way to combat imposter syndrome, which of course can flare up at any level of tenure or length of career. There are so many new things to learn that it can be overwhelming, but you've just got to know that there's a path towards getting there. Okay, so you are a staff-level engineer at Intuit. Intuit, of course, brings us stuff like QuickBooks, TurboTax, Credit Karma, MailChimp, stuff like that. That's the big organization. Can you tell me more about your team or a project that you're currently working on, what the scope of that project is, and what your role on that project is?
YE So the team that I was hired into works on a tool that gives employees of the company the ability to manually evaluate cases where a system has triggered a warning that a transaction might be risky. It's within the greater QuickBooks realm, and QuickBooks handles not just accounting; it now handles payroll, invoicing, payments, money transfers, loans. There are many, many different financial vehicles being handled, and whenever you have financial vehicles being handled, you have the potential for fraud: either people trying to pose as a party whose credentials they stole, or people trying to, let's say, take advantage of a line of credit that they fraudulently acquired, all different things like that. Whenever you have transactions with money, you're going to have that potential, especially at a huge company like Intuit, where we have literally billions of dollars flowing through our system. So a very small percentage of those different transactions, of which there are many types, are flagged by different systems as being suspicious. Some of them are obviously wrong, and those are stopped right away. Most of them are just good and they go through. And there's a very, very small number, which is still in the thousands per day, where we think there might be something wrong but somebody has to look into it more. That's where the tool that my team provides comes in. There are employees in the company who are trained in how to evaluate these cases, and there are many different ways in which they'll evaluate them, based on the type of transaction, the type of risk, the type of suspicion. We consolidate the many tools that are used internally at Intuit, as well as third-party tools, and we help those evaluations happen in as timely and accurate a manner as possible.
KM Are there real-time demands on that system? I know when I spend 2,000 bucks on home decor or something, my credit card company says, “You don't normally do that,” and tries to halt the transaction immediately, doesn't let it go through, sends me a little text alert, which I'm always appreciative of, and I say, “Yes, that was me,” or “No, that wasn't.” Do you have to do it in real time, or are these mostly auditing systems where you can put it on a queue and let it resolve at some point?
YE So everything that's happening gets real-time evaluation. Now, an interesting difference between what I went through at Stack Overflow versus Intuit is that when I joined Stack Overflow, it was less than a hundred people, and for most of the last few years we were 300, 400, 500 people, which is a large size for a startup, but compared to Intuit, where there are 18,000+ people, it's very small. At Stack, the situation very often, especially when it came to things like platform and technology, was, “There is some initiative that hopefully next year we can have somebody spend a week on to think of a way we can improve it.” And you might have 50 things like that that might get taken care of by somebody, or might get a proof of concept that's maybe considered for a budget. At Intuit, each one of those has a team of 10 people working on it. So we have lots of tools in the system. For the tools that deal with all the real-time evaluation, there are dozens of teams and hundreds if not thousands of people, some of whom may work in my office or in other offices around the world, who provide those services for all different types of transactions to plug into. So yes, those things happen. They don't have to do with my team, except that we get our caseload as a result of the decisions of that system. When that system says, “I'm 99% sure that this is fraudulent,” it just cancels the transaction. If it says, “I'm 0.1% sure it's fraudulent, I'm 99.9% sure it's good,” it probably lets it through. It depends on what the scenario is. Maybe it lets it through for $5, but not for $50,000. But there's some gray area for every transaction type and every amount and every profile where it says, “We need somebody to look at that,” and that's where systems like my team's come in, or in the case of your credit card, where you get an SMS saying, “Please call the bank, somebody wants to talk to you about this,” while it's on hold. There's also a different technology area where I just started working as well, which is related to my team but also to QuickBooks and the FinTech platform as a whole. We have a new division-wide initiative to improve the health of all the services in the division, and we're talking about more than a hundred different services spanning every aspect of QuickBooks and things related to it. There we're talking about things related to the health of the system. There's something we call three nines, four nines, five nines, which stands for 99.9%, 99.99%, or 99.999% uptime, and improving the speed at which we can detect incidents and the speed at which we can repair them, and addressing some issues on a platform-wide basis. So I'm part of a team of people pulled from across the division to approach things of that nature across all of the services and try to improve those statistics division-wide.
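To make that routing concrete, here is a minimal Java sketch of the kind of decision logic Yaakov describes: near-certain fraud is blocked, near-certain good is approved, and the gray area is queued for manual review. Every name and threshold here is an illustrative assumption, not Intuit's actual system.

```java
// Hypothetical sketch of risk-score routing: block obvious fraud, approve
// obvious good, and send the gray area to the manual-review case tool.
// Names and thresholds are illustrative only.
public class RiskRouter {

    enum Decision { APPROVE, BLOCK, MANUAL_REVIEW }

    private static final double BLOCK_THRESHOLD = 0.99;    // "99% sure it's fraud"
    private static final double APPROVE_THRESHOLD = 0.001; // "99.9% sure it's good"

    Decision route(double fraudScore, double amountUsd) {
        if (fraudScore >= BLOCK_THRESHOLD) {
            return Decision.BLOCK; // near-certain fraud: cancel immediately
        }
        // A tiny risk may be acceptable for $5 but not for $50,000, so the
        // approval threshold tightens as the amount at stake grows.
        double approveThreshold = amountUsd < 100 ? 0.01 : APPROVE_THRESHOLD;
        if (fraudScore <= approveThreshold) {
            return Decision.APPROVE; // near-certain good: let it through
        }
        return Decision.MANUAL_REVIEW; // gray area: a trained employee looks at it
    }
}
```

In a real system the thresholds would vary by transaction type, amount, and profile, which is exactly why the gray area, and the team reviewing it, exists.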
KM I saw also on LinkedIn that in that pursuit you talked about golden signals. Can you talk about what those are and how you might use them to understand uptime and reliability and other service metrics?
YE So golden signals, as they're used at least at Intuit (I'm not sure how much of an industry-wide term it is), are about looking at the health of an individual service. Every service at Intuit is hosted in Kubernetes and Docker environments on AWS in some form, the requests are going through an API gateway in some form, and there are lots of logs. In our case, they're going into Splunk, millions and billions of logs going in, and the golden signal is trying to say when we should automatically alert the team that something is wrong. There are many ways that teams can be alerted, and there's a whole team of people at Intuit that's available 24/7, as in probably any company of its size, especially in finance, who are in charge when there's an incident. Something went wrong; how do we fix it as soon as possible? So I have an app on my phone called PagerDuty. When it's my turn, I can get a call at any time saying that there is an incident somewhere, and maybe it's with my service or maybe a service that uses mine, but they need your help. The incidents can be triggered in any number of ways, but it's important that when there is something worthy of an incident, we find out as quickly as possible. That's where golden signals come in. A golden signal would be: if the health of the service drops below a certain threshold with a certain amount of activity, then we're automatically going to make an incident, because something's wrong. A default might be that when you have more than, let's say, 60 requests a minute and more than 2% of your requests are failing or giving 500s, I don't care what it is, if you have that percentage of requests failing, we're making an incident. We're getting on-call activated and we have to look into it. The numbers can change from service to service based on specifics, but that's the general idea. And again, we also have many other ways of triggering incidents. You could have one specific metric where even a single occurrence triggers an incident. Golden signals are not the only way, but they're a kind of catch-all: if the health of the service is getting below a certain level, then right away, get people talking about it. Now at Stack Overflow, the golden signal was, “Are people complaining about Stack Overflow being down on Twitter?” which was remarkably fast. The MTTD, the mean time to detect, on that was about 20 seconds. It was faster than any system that anybody could have. But when you're talking about an ecosystem where we have literally a hundred or more services where these things can happen, it's important to have some standard for how we're going to approach them. Within Intuit itself, there's a big ecosystem around the way the platform works, with automated tools to create a new site or new service, and a problem we have, as probably any company with so many different services being created does, is that some services were created a week ago, in which case they're using the current best practice and would already have the golden signals integration built in automatically. But you might have some services that somebody made six years ago that have been migrated or upgraded only partially, and maybe there are reasons why they've never been brought up to the new standards. And standards are always evolving, golden signals and many other things included. Part of the job that we were doing was to ensure 100% adoption of golden signals across services. And then we're proceeding on to other metrics and other tools that can also help to improve the detection and repair of incidents, raising the floor across all services as much as possible.
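As a concrete illustration, here's a toy Java sketch of the default rule Yaakov mentions: with enough traffic, an error rate above 2% automatically opens an incident. The class and method names are hypothetical, and a real system would evaluate this against aggregated metrics rather than in-process counters.

```java
// Toy illustration of the default golden-signal rule described above:
// with more than 60 requests/minute, an error rate above 2% pages on-call.
// Names are hypothetical; real thresholds vary per service.
public class GoldenSignalCheck {

    private static final int MIN_REQUESTS_PER_MINUTE = 60;
    private static final double MAX_ERROR_RATE = 0.02; // 2% failing or returning 500s

    /** Returns true if the last minute of traffic should open an incident. */
    boolean shouldOpenIncident(int requestsLastMinute, int errorsLastMinute) {
        if (requestsLastMinute <= MIN_REQUESTS_PER_MINUTE) {
            // Too little traffic for the error rate to be statistically meaningful.
            return false;
        }
        double errorRate = (double) errorsLastMinute / requestsLastMinute;
        return errorRate > MAX_ERROR_RATE;
    }
}
```

The minimum-traffic guard is the interesting design choice: without it, one failed request out of ten during a quiet hour would page someone at 3 AM.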
KM There's always this tradeoff between monolith and microservices, where if you have 100 different services, that's 100 different things to keep up to date with best practices. If you want to adopt a new observability or learning mechanism, those services may each have to opt into it, or maybe there's some standardization you get to take advantage of across all of them. But there are still at least builds that have to go out for a lot of different things.
YE The transition to microservices has been a huge thing for me. I was both spoiled and held back by being on a monolith for so long, because whenever we needed to do something, you're in the same solution; you add it in. Now I'm in an environment that, I guess, adheres much more to the way the industry works, where for most things you need to do, it makes the most sense for there to be a specific service run by a specific team, because it might need to be used by a dozen other teams and we don't want to be repeating the work and spreading out the expertise. But that means that where in a monolith I could add one file and just do it, now I have to add a connection to the other service, get authorization for my service to talk to the other service, add additional integration tests, figure out where the API documentation was off a little bit, figure out why things aren't working, deal with the other team's time zone issues. There are so many other things. It's necessary for a company to scale beyond a certain size, but you can incur such a cost in terms of how quickly you can implement this technology, this connection, that it can sometimes be very frustrating, because I could say, “Well, I used to be able to do this in 45 minutes and now it takes this much time.” And I guess part of the challenge, especially when you get to a higher career level like staff, is this: we exist within a microservices environment, and we need it because of our scale and because so many different services are connected. So how can we lower the amount of time needed to build things? How can we make it easier to onboard to new services? How can we improve the level of documentation across many services, the experience for developers, and the standardization of logs and things like that, so that we take advantage of services being so spread out and allow a company of this size to still be very efficient, while improving quality and efficiency as much as we can?
KM I want to go a different direction here and laser in on one tiny thing. Really, we're not talking about any big-picture architecture. We're just asking: what is one thing, like a bug or a ticket that you've worked on in the last sprint, that you can tell us about? What was the challenge, what were you working on, and how'd you solve it? Or maybe some of this is just wiring up service metrics between two different services. Tell us about something that happened recently and what you were working on.
YE Well, I'm going to answer in a way that you're not anticipating and say that a big problem I had to work with recently was cross-team communication. There was an issue where we knew something wasn't working right, but we were unclear about which team was responsible for it. In this case, Team A provided a service that provides a gateway to the technology run by Team C, which has input from Team D, and it was unclear at which point something was failing. We were sending something through a chain of different services. So that itself was a problem that held us up for a few weeks of trying to track it down, longer than it takes to solve any coding challenge, by the way. Any coding problem we had with it, actually writing the code, is not so hard. It's finding out who's the right person to talk to, what's the right format, what that error code means, whether there's something wrong with the infrastructure or the setup. So that was the challenge, and that's happened a number of times. And I think the way to solve it is to be politely and annoyingly persistent. What I do often is, if I'm sending a Slack message to somebody on another team and I'm blocked on it, I'll set myself a reminder on that message in Slack to remind me tomorrow. And then if I didn't get an answer, we either need to set up a meeting with that person or figure out from them who's the next person along the chain. You almost have to be a PM for yourself to figure out the connections to unblock. Because the answer is there somewhere, but again, the challenge isn't one of technology. The challenge is one of documentation and communication, and it can go way beyond that: you have schedules slipping, you have dependencies that aren't accounted for. It's always something to work on. And then, as well, you try to keep the awareness of: do I just solve my local problem, move on, and say, “Okay. Well, that other team is missing something with the documentation,” or do we try to make things better for the next person? I'm a big subscriber to the broken window theory of software development. The term originated from an observation or study many years ago about how a building or a car that was left in good shape tends to stay that way, but once it had a broken window, other people would start to vandalize it. I think especially in big organizations and big platforms and services, you very often have broken windows that crop up, and the more senior you are, the more responsibility you have to try to always leave things in a better place when you can. Intuit happens to also have Stack Overflow Enterprise. We're a client of Stack Overflow, so we have that as an internal tool, as well as a whole ecosystem of ways of discovering answers to questions and leaving artifacts for the next person. We have a tool called ‘Insights’ that can basically combine knowledge from Google Docs, from Stack Overflow Enterprise, from other wiki software that we have, from Confluence, from GitHub in different places, and from other tools, and you can put in a question and get an answer related to what you're looking for across the whole ecosystem. So if you can just take a few more minutes and leave an artifact for somebody else to find, you've helped to mend the window, maybe, and you're helping to improve the ecosystem. I think that's an important thing to do as a personal practice and also to encourage others to do as well.
KM I think we also call that the ‘clean beach rule.’ You're on the beach, you see some trash, and you're headed out: just grab the trash. It takes extra steps, but it leaves a nicer place for everyone. I also wanted to comment on something in the first part of your answer. I really like the idea of setting a reminder for yourself when you're blocked on something from someone else. Sometimes I think I volley the thing over the net and I say, “It's in their court, I'll just wait till I hear back from them,” and it kind of drops off my active radar because I'm now working on other things. They are the known next step. But what ends up happening is I probably wait longer than I should to do the subsequent follow-up. So I like the reminder to say, “Well, if it's been two days and I haven't heard back from them, I should ping them again at that point.”
YE I think we always have the challenge that we like to think of our minds as multi-threaded, but I think really they're more single-threaded. You always work better when you're focusing on one thing. You might focus on 10 different things in sequence, but you're going to be less efficient that way; the more you're switching contexts, the worse it gets. So I think we tend to say, like you just explained, “I reached out to them, it's in their court,” and then you forget about it. I'll do that also, which is why I'll just make the last action be, “And remind me tomorrow.” Ultimately, if they don't respond in a week, it's on them, but I also get delayed, and the people who are relying on me get delayed. In the end, it's not about whose fault it is. It's about getting things done in an efficient way, in a productive way. So in Slack, I use and abuse reminders. It's the same thing if I look at a message and I know I'm not in a place to respond to it right away: automatically, ‘remind me tomorrow morning at 9 AM,’ because I'm not going to remember that stuff.
KM One thing I like to ask people is to tell me about a time when you had to learn a new technology on the job. We've kind of covered that because you're learning plenty of things, but I want to drill into it even further. As a staff engineer, there might be this expectation that you know everything and don't have to spend time learning. Of course, that's not actually the case, but the expectation might be there. How do you stay intentional about carving out time to learn a new technology? Is that something you feel like you can highlight to your team, that you're spending a particular amount of time on it? How do you make sure you have that space to learn? It's definitely been something that you've been doing a lot over the last year. How intentional can you be about that?
YE So at Intuit, they have a pretty comprehensive set of expectations about what people are going to be doing at different levels in different roles. Whether you're a data scientist versus a product manager versus a front-end engineer versus a back-end engineer, whether you're a developer one or two or senior or staff or higher, there's a very, very comprehensive rubric of what your professional goals should be and what it means to be meeting goals, exceeding goals, super-exceeding goals, or needing improvement, and what that means at all levels, which I don't take for granted, because it is not something that exists in every company, speaking from personal experience. There are also company values posted in the hallways on every floor of the office. Part of what we're encouraged to do is to work on our craft skills, to work on ourselves professionally, and there are a lot of tools provided. I know that the company uses Degreed, where there are custom learning tracks set up and aggregated for different disciplines. And then we have a system of defining your goals and tracking them across the span of a year, and one of the areas your primary goals are supposed to cover is how you are improving yourself professionally. So that is part of the goals I get evaluated on, and it's not something you have to sneak in on the side. I'm very lucky in that way.
KM And going from the idea of learning in general, can you tell me just a super tiny thing that you've learned recently? It could be a new API in Java. What's one little tidbit of something that was like, “Oh, that's how that works”?
YE Right now, the team I'm working on relating to service-wide reliability has an initiative to add tracing to every service. Tracing uses OpenTelemetry, so I just spent some time learning about that, and it's really a fascinating platform. Once you have it set up for a service, but especially once you have it set up for all services, then for any request where there is an exception, or for a sample of the requests that execute successfully, you can actually see the steps it went through and the amount of time each took, not only within your service, but in the service that it called, and the service that that service called, as many services as you need down the line. In the case of the service that my team works with, there could be a request that takes six seconds, but it might have gone to four other services during that time. Where was it spending the extra time? Without tracing, that's very, very hard to know, other than asking the first service to look at it and then having them ask the next service. So OpenTelemetry has been something that I've learned recently, and we're adding it on to dozens and dozens of services. It's pretty fascinating how it works, because back at Stack Overflow we used MiniProfiler to give us the same insight for one request, and now this is working even in production for millions and millions of users, letting you see how requests go through and the impact they have. There are also metrics that get aggregated on the client, sent through to an aggregation server, and ultimately displayed in Wavefront, which is a tool to view time-series-based metrics. That gives really big insights into performance and shows how things all interconnect, both as a tool for improving performance and for diagnosing issues. It's very, very interesting.
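For readers who haven't used it, here's a minimal hand-instrumented span using the OpenTelemetry Java API. The service and span names are hypothetical, and in practice a Spring Boot service would often get spans like this automatically from the OpenTelemetry Java agent rather than by hand:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class CaseLookupHandler {
    // Obtain a tracer from whatever OpenTelemetry SDK was configured at startup.
    private static final Tracer tracer =
            GlobalOpenTelemetry.getTracer("case-lookup-service");

    void handle(String caseId) {
        // Start a span for this unit of work. Calls made while the span is
        // "current" become its children, so the whole request path across
        // services shows up as a single trace.
        Span span = tracer.spanBuilder("lookupCase").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            span.setAttribute("case.id", caseId);
            // ... call downstream services here; with context propagation
            // enabled, the trace ID travels with the outgoing requests ...
        } catch (RuntimeException e) {
            span.recordException(e); // exceptions are attached to the trace
            throw e;
        } finally {
            span.end(); // the span's duration is what appears in the flame graph
        }
    }
}
```

The key idea is the parent-child relationship: because each service continues the same trace, a six-second request can be broken down span by span without asking each team to investigate separately.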
KM We've actually also adopted OpenTelemetry, and I love that little flame graph of, “Here's one request's lifecycle, here are all the different pit stops it made along the way, and here are all the SQL calls that it made.” It helps you figure out where the bottleneck in that one request is without interrogating each individual service, which may have its own bell curve of latency. I want to look at this one request: what happened?
YE And especially, I think, in an environment where you have so many different teams and services, if you have something that's relying on other services down a chain, it's so easy to say, “Well, I'm okay. I don't have to feel bad. The user might wait for a long time, but what else can I do? Am I going to spend all my time going from service to service to trace this one edge case?” But with a tool that gives you insights, you can have a big impact not just for your users. Let's say three services down the line they're missing an index, or they have a scaling issue, or whatever it is. You're not the only one feeling pain from that. With microservices in such a big organization, there might be a dozen teams that are all feeling pain from that one missing index for services down the line. And you could send somebody a message on Slack saying, “Here's a trace identifier where I see this, and I think you might want to look at the index.” You saw that in tracing, you reach out, you take five minutes, and they fix it. That might improve the performance of many services and thousands or tens of thousands of users, for something that had been a black box before. So tools like that can be very powerful when you use them the right way.
[music plays]
KM It's been so fun talking to you. Are there any links to socials or anything that people who want to find you can go explore?
YE I'm @Yaakov on Twitter. On LinkedIn, you can find me by just looking for my name, and you can read about some of the other ideas I have, especially related to developer productivity and best habits.
KM Awesome. Well, it was great getting to talk to you today. Thanks for your time and we'll be in touch.
YE Thank you, Kyle. Good to see you.
[music ends]