The Stack Overflow Podcast

Say goodbye to "junior" engineering roles

Episode Summary

On today’s episode we chat with Kirimgeray Kirimli, president of Flatiron Software and CEO of Snapshot Reviews, a tool that measures developer productivity based on activity from GitHub, Jira, standups, and more. Kirimli explains how Snapshot Reviews tries to measure a developer's true impact, not just the volume of their activity. He also speaks to the growing power of AI coding assistants and suggests that "junior engineer" is unlikely to be a job available to humans for much longer.

Episode Notes

How would all this work in practice? Of course, any metric you set out can easily become a target that developers look to game. With Snapshot Reviews, the goal is to get a high-level overview of a software team’s total activity and then use AI to measure the complexity of its tasks and output.

If a pull request attached to a Jira ticket is evaluated as simple by the system, for example, and a programmer takes weeks to finish it, then their productivity would be scored poorly. If a coder pushes code changes only once or twice a week, but the system rates them as complex and useful, then a high score would be awarded. 
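
To make that rule concrete, here is a minimal sketch of how a complexity-versus-time score could be combined. The function name, baseline, and weights are hypothetical illustrations, not Snapshot Reviews' actual model:

```python
# Hypothetical impact score that weighs AI-rated complexity against how long a
# ticket stayed open. The names, baseline, and cap are illustrative only.

def impact_score(complexity: int, days_open: float) -> float:
    """complexity: AI-assigned rating on a Fibonacci-style scale (1, 2, 3, 5, 8, 13).
    days_open: days between the first commit and the merge."""
    expected_days = complexity * 1.5              # assumed baseline: ~1.5 days per point
    pace = expected_days / max(days_open, 0.5)    # >1 means faster than expected
    return round(complexity * min(pace, 2.0), 2)  # cap the speed bonus at 2x

# A trivial change that lingered for three weeks scores low...
print(impact_score(complexity=1, days_open=21))   # ~0.07
# ...while a complex change shipped in a few days scores high.
print(impact_score(complexity=8, days_open=4))    # 16.0
```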

You can learn more about Snapshot Reviews here.

You can learn more about Flatiron Software here.

Connect with Kirim on LinkedIn here.

Congrats to Stack Overflow user Cherry who earned a great question badge for asking: Is it safe to use ALGORITHM=INPLACE for MySQL?

Episode Transcription

[intro music plays]

Ben Popper Level up your Gen AI skills with Neo4j Graph Academy online courses. Learn to ground LLMs with a knowledge graph for accuracy and build a reliable chatbot. Start today at neo4j.com/LLMs. 

BP Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow, joined as I often am by my colleague, Ryan Donovan– blog editor supreme, newsletter maestro, technical writer par excellence. Ryan, today we're going to be talking about two things near and dear to our heart. One is how do you evaluate a software engineering team? How do you know if they're being productive? How do you know if they could be more productive? How do you know if something has gone awry that could be addressed? Maybe it relates to developer happiness, maybe it relates to the tooling, who knows. We've talked a ton on this podcast about DORA metrics and other ways of doing that. To do that, we're going to be chatting with Kirim, who is president over at Flatiron Software, and we might dip into a few other related topics connected to what they do– for example, helping companies get an engineering team together through nearshoring for a project that they have, and what's going on in the world of AI-assisted development and how that might be impacting the market overall. So, Kirim, without further ado, welcome to the Stack Overflow Podcast.

Kirimgeray Kirimli Nice to be here. Thank you very much for having me. 

BP So for our listeners, do a quick flyover. How did you get into the world of software and development and what led you to the role you're at today? 

KK I've been in software development for almost 20 years now. I studied computer science, then did software development as an engineer for many years, and most recently, before I started working with Flatiron Software, I was a VP of Engineering at a large media company. I've done pretty much all the roles in software engineering, and now at Flatiron I'm working to provide software engineering for companies that need help with product development, anywhere from small startups to large Fortune 500 companies. And while doing that, we've noticed there's a big need in the world of software engineering for evaluating good software performance. We ran into this because we believe we're a great company, but we've always struggled to quantify why we're better than other companies. Why would you choose us versus the next company that has a bunch of engineers? And as we were trying to quantify our success, we ended up developing a tool that helps us measure the efficiency of our teams, that helps us quantify why we think we can do a better job, why we think we're better at delivering products, and that tool is called Snapshot Reviews. It started off as a 360 review platform that allowed engineers to enter their 360 review data. Then we continued by adding their sprint retrospective data, eventually added a standup module that allows engineers to add their daily updates, then we started pulling in metrics from GitHub– and not just metrics, but the actual codebase and the pull request changes. And that's where it started getting interesting, because at that point we had all the updates coming in, and we also had all the GitHub data. Still not very helpful, because the activity you have in GitHub without context is not really indicative of anything. So we started pulling in Jira tickets. That gave us some idea, but still, without any deeper understanding, these metrics can be gamed, and they're not necessarily indicative of performance or better work, for that matter. Then eventually, as AI became more popular and more accessible, we started taking a new approach, which is essentially what transformed the product and made it something really useful. We added a feature that lets you train an LLM on all of this data, and the LLM actually has access to the codebase you've worked on, the updates you've given, the tickets and what each ticket asks for, the code review comments you've submitted and received, the 360 review feedback you've had from your peers every six months, and finally the sprint retrospective information. And this gives the LLM a lot of visibility into what's happening, so you can see a full picture. We have dashboards that let you see static data, like how many pull requests you've submitted versus the rest of your team, how many reviews you've done, and how often you code. Those are all fairly standard metrics that you can find across the internet, though we do think we go a little bit further with them. I think our bread and butter is actually the AI portion, where we feed that information into the AI, and the AI reviews the code and can tell you the complexity of the work you've done. So I might have five one-liner pull requests, and on a standard graph you won't be able to see much and it doesn't mean anything. 
But what we do is ask the AI to create a difficulty and complexity score looking at both the ticket and the pull request, which gives you a better understanding. Then we use that to build some graphs. We are working on a chatbot feature that essentially allows you to ask, “Why has this ticket taken so long?” or things like, “What are some of the issues that you've noticed here?” or, “Given the past five tickets of a certain person, what is your feedback? What is your point of view, given that you have access to all of this other information that I've just mentioned?”
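
[For readers who want to picture the scoring step Kirim describes, here is a rough, hypothetical sketch of prompting a model with a ticket plus a diff. The prompt, the scale, and the call_llm stand-in are assumptions, not Snapshot Reviews' actual implementation.]

```python
# Hypothetical sketch: ask an LLM to rate a pull request's complexity from the
# Jira ticket and the diff, on a Fibonacci scale. call_llm stands in for
# whatever model client you use (it takes a prompt string and returns a string).
import json
from typing import Callable

def rate_pull_request(call_llm: Callable[[str], str],
                      ticket_text: str, diff_text: str) -> dict:
    prompt = (
        "You are reviewing a pull request. Using the Jira ticket and the diff below, "
        "rate the complexity of the work on a Fibonacci scale (1, 2, 3, 5, 8, 13), "
        "the way a scrum team would estimate it, and explain briefly. "
        'Reply only with JSON: {"complexity": <int>, "rationale": "<one sentence>"}\n\n'
        f"TICKET:\n{ticket_text}\n\nDIFF:\n{diff_text}"
    )
    return json.loads(call_llm(prompt))  # e.g. {"complexity": 5, "rationale": "..."}
```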

Ryan Donovan So we've talked and written a lot about productivity, and there's been some question about whether you can even measure productivity for developers and teams. Obviously, like you said, just counting lines of code and number of commits gives you the wrong sort of impression. Any metric you use to measure becomes a target and it'll skew how the team works. How do you avoid these sorts of quantitative skews while still providing a knowable, usable metric to understand productivity? 

KK The short answer is, at the end of the day, if you really want to game the system, you can. With the AI, we're improving and we're trying to solve that, but if an engineer really wanted to game the system, they could. But here's where it gets a bit more interesting. This is mainly made for engineers working remotely. So even if I wanted to game the system, I work within a team. Some information will surface in my standup updates or the tickets I'm working on, which is where the context actually comes into play, because we're not just looking at the GitHub data. We're not just looking at the number of pull requests I open; there's also a lot of feedback from the team in the sprint retrospective and in the standup updates that I provide. And if I really wanted to game the system so that my standup updates and everything else within that ecosystem hold up, I kind of end up doing my job anyway. Yes, obviously there might still be some edge cases, but the idea is that for 80 to 90 percent of engineers, you'll ensure that there's at least some productivity going on while they're working on these tickets. The quality of the tickets themselves isn't something we're able to answer at the moment, because that's potentially a next feature for us, but our focus is on the engineers themselves and the work that the team does, assuming the team is guided as well as possible. I believe we're doing a good job there, so that with a remote setup we're able to at least ensure that the engineers are doing their best, that they're putting their best work out there, and, with fairly high accuracy, that they're not gaming the system. A certain metric could be skewed a little, but if you have data coming in from five major sources, then it becomes a lot more difficult to game that data. There are only so many one-liner pull requests I can open to game the system when the AI is looking at my standup data and sees that I've been working for the past three weeks on the same ticket, one that clearly seems fairly simple looking at the Jira context.
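
[As a rough illustration of the cross-checking Kirim describes, where no single metric is judged in isolation, here is a hypothetical sketch. The signal names and thresholds are invented for the example, not Snapshot Reviews' rules.]

```python
# Hypothetical red-flag check that combines signals from several sources so that
# gaming one metric (say, lots of tiny PRs) shows up in the others.
from dataclasses import dataclass

@dataclass
class WeeklySignals:
    prs_merged: int                    # from GitHub
    ai_complexity_total: int           # summed AI complexity ratings of merged PRs
    tickets_closed: int                # from Jira
    days_on_same_simple_ticket: int    # from standup updates plus Jira context

def looks_gamed(s: WeeklySignals) -> bool:
    # Many merged PRs with almost no rated complexity suggests one-liner churn...
    trivial_churn = s.prs_merged >= 10 and s.ai_complexity_total <= s.prs_merged
    # ...and weeks of standups on a ticket the AI rates as simple is also a flag.
    stuck_on_simple = s.days_on_same_simple_ticket >= 15 and s.tickets_closed == 0
    return trivial_churn or stuck_on_simple

# Example: 12 one-line PRs rated 1 point each, nothing closed, three weeks on one ticket.
print(looks_gamed(WeeklySignals(12, 12, 0, 21)))  # True
```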

BP So what makes you confident, as you were saying before, that an AI system grounded in an LLM as its key technology is going to be able to assess quality? There might be ways of looking at how quickly or how fluidly you move from standup to ticket to pull request to production– that might be a metric. But I've played around with this on my side as a writer. I've said, “Can you write this for me in a way that explains it to a five year old? Now can you write this for me as a college essay? Now can you write this for me in the style of The New Yorker?” And it's very good at adapting to that and understanding the complexity of language. Are you saying that it has similar skills for code, and if so, how did you assess that? How do you benchmark that? 

KK There's no straight-up benchmarking. We're taking an approach that is very similar to the Fibonacci scoring that a lot of scrum teams use. So we're having the AI use Fibonacci numbers, and we're actually asking the AI, “Can you rate the complexity of the work the way a scrum team would rate the complexity of a pull request?” And maybe we've actually chosen the wrong word to describe this. Whenever I say ‘complexity,’ a lot of people think about nested if statements, cyclomatic complexity– it's not that at all. If you put six nested statements inside, that's not the complexity metric we're looking for. In fact, the end result might be a lot simpler, so that does not necessarily mean it is a difficult ticket. And individual tickets can still be misjudged, so we can still get some skewed results on an individual ticket. But if you do this on a large scale, if you do this on six different teams over the course of several months, you'll get an idea. An individual engineer might have a period of a few days where they're not actually that productive, and that's acceptable and that's normal because we're all humans. But the idea of this tool is to provide a picture that gives you a mid to long term view of how your team is working. And while you might miss individual metrics, individual tickets and pull requests, we're trying to get a good understanding over the mid to long run, where we average those out. So we're trying to make sure that a team performs well within a certain amount of time. If I look at any of my teams and their metrics for the next sprint, some of the best engineers might not necessarily have the best metrics, but over time, when I look at them, they all do, and their metrics are always better in the sense that they're able to deliver more complex tasks, they're more consistent with their work, they're present, and they're actually able to deliver what is considered more complex pieces of work. And mind you, we're assessing the complexity of the ticket and the work that's been done independently from the score the team gives for what that ticket is going to be. So we have two different measurements of that complexity, and you can put them side by side. 
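
[A toy sketch of the side-by-side comparison Kirim mentions, putting the team's scrum estimate next to the AI's independent Fibonacci rating. The helper names and the drift rule are hypothetical.]

```python
# Hypothetical comparison of the team's estimate with the AI's independent rating.
FIB_SCALE = (1, 2, 3, 5, 8, 13)

def nearest_fib(raw: float) -> int:
    """Snap a raw complexity estimate onto the Fibonacci scale."""
    return min(FIB_SCALE, key=lambda f: abs(f - raw))

def estimate_drift(team_points: int, ai_points: int) -> str:
    """Flag when the team's score and the AI's score diverge by more than one point."""
    if abs(team_points - ai_points) <= 1:
        return "in agreement"
    return "team estimated higher" if team_points > ai_points else "team estimated lower"

print(nearest_fib(6.2))        # 5
print(estimate_drift(8, 3))    # "team estimated higher"
```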

RD So I read an article a while back about somebody talking about why they couldn't fire their worst engineer, the worst engineer by the metrics. And this was somebody whose contribution was a lot of knowledge sharing, a lot of mentorship, they made the team better by their presence but they didn't get a lot of lines of code in and they didn't get a lot of tickets cleared. How do you assess the value of somebody like that and sort of differentiate them from somebody who actually isn't pulling their weight?

KK Somebody like that who's acting as a mentor, I would expect them to be more active on the pull requests. I would expect them to be more active and to explain that behavior in their standups. And I would expect the team to provide that feedback in the retrospectives. So going back to the earlier statement, if you're looking just at your GitHub data, this information will get lost, but we're trying to incorporate a wider set of data points. I would expect that person to be more active in writing Jira tickets, commenting on them, or editing them directly if they're actually helping the team get through them, or potentially creating them in the first place. They might not necessarily implement them. And to answer your question, I think it comes down to having that information next to the GitHub data, and that's also why I think this is worth developing a tool for, because if we solely relied on GitHub data, what are we? GitHub could just slap an LLM on their data and do much better than we do. But what we try to do is essentially combine independent data sources, as well as create our own data entry methods, to enrich what we have in there, and that allows you to see a clearer picture that you otherwise wouldn't be able to see. 

BP Let's move on for a second. You're using AI here in a way to help evaluate folks, and a lot of it, like you said, is based on human input through a lot of very human activities. How do you talk in your standups? How do you talk in your reviews? It's not like the AI is judging solely on its own through some abstract process; it's got a lot of human feedback that it's working from as its data source. But one question that comes to my mind is, if folks are beginning to lean more and more on AI-assisted development, and if there might be an AI in the loop writing code or debugging or doing various things, how can you adapt to that, and what impact would that have on evaluating software teams? That's my first question, and then more broadly, since a lot of what Flatiron does, as you said, is help companies that need it build a software team to get a great project done, what is your recommendation to them in terms of adopting code gen tools?

KK Given the current state of AI, there's a lot of potential for AI to aid coding. We actually developed an AI-based code review tool, just like every other company, that helps give us ideas. I don't think AI, at the moment, is mature enough to write an entire piece of software, but I'm sure within a few years it will be able to replace junior engineers. My guess is that you should be able to give AI a detailed enough Jira ticket, your codebase, and the previous pull requests, and it should be able to write the code. That could be on Snapshot’s product roadmap, given the information it already has. However, at the moment, if you're a junior engineer working on a ticket, having AI help you will actually clear out a lot of the problems in your pull request that would otherwise be caught by a senior engineer. AI might be able to point out a lot of the best practices and essentially help the team produce higher quality work overall. That's the impression we've had, that's the experience we've had with AI. However, although I'm sure within a few years it's going to be possible, at the moment I wouldn't trust AI to go develop something entirely from scratch. Honestly, the number of prompts you'd need to give the AI to correct it might be more work than developing it yourself. 

BP What do you say, then, to somebody who just graduated high school, is starting college, and has always been drawn to a computer science degree because a role as a software developer is high paying and often in demand? To say that in three or four years the AI could be the junior developer might be a frightening thing to folks who need an entry point into the industry. So how do you respond to that? 

KK First of all, if you're interested in being a developer, I feel like there's still going to be a market for it. It's just that it'll be AI-aided development. When I started in development, I was just a junior developer, but if I had had an AI as a junior developer, I might actually have been able to produce things a lot more productively than I otherwise would have. So my advice to anyone who's in high school now looking to get into development is that they should still do some development. It's good to understand how it works, there are a lot of best practices, and you're still there to ensure that the AI did the right thing, because it might not. A lot of the AI-generated content I see looks right, but sometimes it isn't. So the junior developer’s job there is going to be to verify that information. The AI might be able to write way more efficient code than they otherwise would have written, but the junior developer still needs to be able to verify it, and that requires computer science knowledge. 

BP I like what you're saying. It's sort of like saying that nobody will come in as a junior developer anymore, you'll come in as a new developer on the team working almost at an intermediate level because you'll have this AI assistant empowering you. 

RD I think that's an interesting idea, but I think the junior developer doesn't always have the context for good software engineering practices to sort of evaluate the code coming in. We have an upcoming blog post about how software engineering is basically a mentorship program. You come in as a junior dev, go through the battle trenches, and you come out a seasoned intermediate dev. How do we get those entry level junior devs to move up to senior devs if we don't have them writing the bad code that gets kind of kicked back in peer review?

KK When I say AI-aided code, it's not always AI aiding you at the stage of writing. Sometimes it's AI aiding you at the stage of code review. And we're talking about a few years from now, not the immediate present. Even the code review tools we have right now are fairly decent, so I'd hope that by then the code review tools themselves will be a little bit more advanced. So this is how I envision the process: junior developers are going to sit in front of their computer, look at their ticket, maybe get advice from AI on how to get started, look at the Copilot suggestions, write something up, the AI is going to suggest how to improve it, and so on. Then you're going to open a pull request, and an AI code review bot is going to go through your commits and actually give you feedback, hopefully beyond individual function scopes– I'm talking about a scope that covers the entire pull request– and then you follow the regular code review flow, where a senior engineer can still come in and give you feedback. And by the way, for me at least, when I was a junior engineer, the most learning I ever did was when a senior engineer gave me coding feedback. So we can still keep that, and the AI-aided bit doesn't stop it from happening, but rather than a senior engineer telling me to remove print statements or not to build a massive function, the AI can say that, and the senior engineer can give me feedback with more insight into the larger project. 

RD The senior engineer is no guarantee. I've definitely heard of senior engineers creating very strange function definitions where all the parameters are concatenated into a single parameter.

KK I've also had my time with senior engineers that are not necessarily giving the best advice, but in that case, AI could even work as an arbiter between the two parties, maybe even help the senior engineer a bit too. 

RD There you go. 

BP I'm very curious. One thing we haven't touched on is the other thing Flatiron does, which is helping companies find a great software team to help them with a project. You mentioned nearshoring, which I take to mean hiring in nearby countries and time zones as opposed to going fully offshore. How do you view the job market at the moment? Something that's come up multiple times on the Stack Overflow Podcast over the last few weeks and months is that for the first time since 2010, developers who lose their jobs are having difficulty finding new jobs. They're not constantly being hounded by recruiters, and the slimming down that happened at the large tech companies, as well as some of the licks that are being taken by startups, means that the availability of software jobs, or the demand for software engineers, is at a low point not seen since maybe 2008 or the dot-com crisis of 2001. Do you have a perspective on the labor market, given that you help companies find talent? 

KK What we've seen is that the demand for development hasn't gone away; however, it's become incredibly difficult to actually hire developers onshore. Again, this suits my business plan perfectly well, which I'll get to, but comparing the amount of money I made when I started as an individual contributor with starting salaries now, they've increased two to three fold. It's become very expensive to get a developer onshore, and companies like mine– and we're not even one of the bigger fish in the sea; there are a lot of companies that are much bigger than us– have all made it easier to get developers who do similar or comparable work, if not better, for a lot cheaper. Not to mention that the rise of remote working has opened the American job market to the rest of the world, in terms of software engineering, more than anything else has. If you want to get an onshore engineer, it's a lot more difficult to get them a visa, and that's also made things more expensive on the engineering side. But at the same time, you don't necessarily need to do that, because most of these software engineers are not even going to come to the office; they're going to work remotely. And if you're going to have someone work remotely, why not pick someone who's a fraction of the cost? That's why I think companies like mine and our competitors have been so successful lately. We're seeing it in other companies, and we're hearing in the industry, that there's huge growth for companies that provide quality nearshore software development. And I could not emphasize quality more, because there are a lot of companies that do not provide it, and there might not be growth for them, but for companies that provide premium nearshore experiences– nearshore engineers, staff augmentation models– the growth has been incredible. And there's only more that we can provide. 

BP It's interesting to think that maybe what software developers are going to have to do is start accepting lower salaries, because the times have changed and the salaries were extremely high and grew extremely fast. Another thing that's come up is that, when you talk about nearshoring, folks can move to areas where the cost of living is a lot less, and they can still get a remote job, and then a lower salary goes a lot further. And I've heard some interesting stories about places like Boise, Idaho becoming mini tech hubs during the pandemic as people realized, “Well, my job is all remote at this point, so why don't I just live somewhere where I can really lower my cost basis for these other things?” But Ryan, let me pass to you. 

RD I think definitely the cost of software engineers has gone up a ton. What do you think the reason is that the salaries have tripled in the last 20 years?

KK There are obvious economic reasons that are probably above my pay grade, but inflation is one. Another spike I saw was around 2016/2017, when at some point I believe it became very difficult to sponsor engineers; I remember seeing a spike then. And then immediately after COVID there was another spike where everybody wanted software engineers. However, and this actually goes back to Ben's point, the salaries for an engineer in San Francisco or New York have gone up, but those engineers stopped living there and started living in Idaho or Wisconsin. And when you're living in Wisconsin, you're opening the market up for an engineer in Wisconsin as well, and that engineer might not necessarily need as much as someone who's living in San Francisco, and that actually changed the dynamics. So now you have an engineer who used to make $300-400K at Facebook who gets laid off, and they're not able to find a comparable job because someone else is going to do that work for $200K onshore or less than $100K nearshore. And I'm sure they're a great engineer, but as the company, why would I pay almost double what I could pay? That's where I think the main change has happened. I should probably mention that a lot of companies have received incredible investments over the years from VCs, and that investment immediately translates into hiring more engineers, which created a vicious cycle that until recently was only increasing engineering compensation.

[music plays]

BP All right, everybody. It is that time of the show. Let's thank a user who came on Stack Overflow and shared a little knowledge or expressed a little curiosity that helps everybody out, because when a question and answer pop up, everybody gets to learn. A Great Question Badge, awarded to user3077466: “How do I unzip a file in WinSCP script with SSIS execution script task?” All right, another one here, maybe a little nicer. Awarded to Cherry 53 minutes ago: “Is it safe to use ALGORITHM=INPLACE for MySQL?” Well, if you wanted to know, Cherry asked, earned a Great Question Badge, and helped 10,000 other folks who had the same question. So Cherry, we appreciate your curiosity. As always, I am Ben Popper. I'm the Director of Content here at Stack Overflow. You can find me on X @BenPopper. We've had so many great guests recently and talked about really interesting things based on listeners who wrote in, so hit us up at podcast@stackoverflow.com. Maybe you can join the show or at least give a suggestion for a topic. And last but not least, if you enjoyed today's program, the nicest thing you could do to help us out would be to leave us a rating and a review. 

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog, and if you want to reach out to me for any reason, you can find me on X @RThorDonovan.

KK My name is Kirimgeray Kirimli, President at Flatiron Software and CEO at Snapshot Reviews. You can find me on GitHub by looking for Kirim, or on Twitter or Instagram by looking for @Kirimgeray, and similarly on LinkedIn. I'd love to hear from you. For Snapshot Reviews, please go to snapshot.reviews. 

BP Wonderful. Thanks for listening, and we will talk to you soon.

[outro music plays]