Aishwarya Naresh Reganti on Making AI Work in Production – O’Reilly

As the founder and CEO of LevelUp Labs, Aishwarya Naresh Reganti helps organizations “really grapple with AI,” and through her teaching, she guides individuals who are doing the same. Aishwarya joined Ben to share her experience as a forward-deployed expert supporting companies that are putting AI into production. Listen in to learn the value all roles—from data folks and developers to SMEs like marketers—bring to the table when launching products; how AI flips the 80-20 rule on its head; the problem with evals (or at least, the term “evals”); enterprise versus consumer use cases; and when humans need to be part of the loop. “LLMs are super powerful,” Aishwarya explains. “So I think you need to really identify where to use that power versus where humans should be making decisions.” Watch now.

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2026, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Check out other episodes of this podcast on the O’Reilly learning platform or follow us on YouTube, Spotify, Apple, or wherever you get your podcasts.

Transcript

This transcript was created with the help of AI and has been lightly edited for clarity.

00.58
All right. So today we have Aishwarya Reganti, founder and CEO of LevelUp Labs. Their tagline is “Forward-deployed AI experts at your service.” So with that, welcome to the podcast.

01.13
Thank you, Ben. Super excited to be here.

01.16
All right. So for our listeners, “forward-deployed”—that’s a term that I think first entered the lexicon mainly through Palantir: forward-deployed engineers. So that communicates that Aishwarya and team are very much at the forefront of helping companies really grapple with AI and getting it to work. So, first question is, we’re two years into these AI demos. What actually separates a real AI product from a good demo at this point?

01.53
Yeah, very timely question. And yeah, we are a team of forward-deployed experts. A bit of background to also tell you why we’ve probably seen quite a few demos fail: We work with enterprises to build a prototype for them and educate them about how to improve that prototype over time. I think one of the biggest things that differentiates a good AI product is how much effort a team is spending on calibrating it. I typically call this the 80-20 flip. 

A lot of the folks who are building AI products as of today come from a traditional software engineering background. And when you’re building a traditional product, a software product, you spend 80% of the time on building and 20% of the time on what happens after building, right? You’re probably seeing a bunch of bugs, you’re resolving them, etc. 

But in AI, that kind of gets flipped. You spend 20% of the time maybe building, especially with all of the AI assistants and all of that. And you spend 80% of the time on what I call “calibration,” which is identifying how your users behave with the product [and] how well the product is doing, and incorporating that as a flywheel so that you can continue to improve it, right? 

03.11
And why does that happen? Because with AI products, the interface is very natural, which means that you’re pretty much speaking with these products, or you’re using some form of natural language communication, which means there are tons of ways users could talk and approach your product versus just clicking buttons and all of that, where workflows are so deterministic—which is why you open up a larger surface area for errors. 

And you will only understand how your users are behaving with the system as you give them more access to it, right? Think of anything as mainstream as ChatGPT. How users interact with ChatGPT today is so different from how they did, say, three years ago, when it was released in November 2022. So what differentiates a good product is that idea of constant calibration to make sure that it’s getting aligned with the users and also with changing models and stuff like that. So the 80-20 flip I think is what differentiates a good product from just a prototype.

04.14
So actually this is an important point in the sense that the persona has changed as to who’s building these data and AI products, because if you rewind five years ago, you had people with some knowledge of data science, ML, and now because it’s so accessible, developers—actually even nondevelopers, vibe coders—can start building. So with that said, Aishwarya, what do these kinds of nondata and AI people still consistently get wrong when they move from that traditional mindset of building software to now AI applications?

05.05
For one, I truly am one of those people who believes that AI should be for everyone. Even if you’re coming from a traditional machine learning background, there’s so much to catch up on. I moved from a team at AWS in 2023 where I was working with traditional natural language processing models—I was part of the Alexa team—into an org called the GenAI Innovation Center, where we were building generative AI solutions for customers. And I feel like there was so much to learn for me as well. 

But if there’s one thing that most people get wrong and maybe AI and traditional ML folks get right, it’s to look at your data, right? When you’re building all of these products, people just assume, “Oh, I’ve tested this for a few use cases and it seems to work fine,” and they don’t pay much attention to the kind of data distribution that they would get from their users. And given this obsession with automating everything, people go, “OK, I can maybe ask an LLM to identify what kind of user patterns I’m seeing, build evals for itself, and update itself.” It doesn’t work that way. You really need to spend the time to understand workflows very well, understand context, understand all this data, pretty much. . . 

I think just taking the time to manually do some of the setting up work for your agents so that they can perform at their maximum is super underrated. Traditional ML folks tend to understand that a little better because most of the time we’ve been doing that. We’ve been curating data for training our machine learning models even after they go into production. There’s all of this identifying outliers and updating and stuff. But yeah, if there’s one single takeaway for anybody building AI products: Take the time to look at your data. That’s the most important foundation for building them.

07.01
I’ll flip this a little bit and give props to the traditional developers. What do they get right? In other words, traditional developers write code; some of them write tests, run unit tests [and] integration tests. So they had something to build on that maybe the data scientists who were not writing production code were not used to doing. So what do the traditional developers bring to the table that the data and ML people can learn from?

07.40
That’s an interesting question because I don’t come from a software background, and I just feel traditional developers have very good design thinking: How do you design architectures so that they can scale? I was so used to writing in notebooks and kind of just focusing so much on the model, but traditional developers treat the model as an API and they build everything very well around it, right? They think about security. They think about what kind of design makes sense at scale and all of that. And even today I feel like so much of AI engineering is traditional software engineering—but with all of the caveats that you need to be looking at your data. You need to be building evals, which look very different. But if you kind of zoom out and see, it’s pretty much the same process, and everything that you do around the model (assuming that the model is just a nondeterministic API), I think traditional software engineers get it like bang on.

08.36
You recently wrote a post about evals, which was quite interesting actually, [arguing] that it’s a bit of an overused and poorly defined term. I agree with the thesis of the post, but were you getting frustrated? Is that the reason why you wrote the post? [laughs] What was the genesis of the post? 

09.03
The baseline is that most of my posts come out of frustration and noise in this space. It just feels like, if you kind of see the trajectory. . . In November 2022, ChatGPT was out, and [everybody was] like, “Oh, chat interfaces are all you need.” And then there was this concept of retrieval-augmented generation, and people went, “Oh, RAG is all you need. Chat just doesn’t work.” And then there was this concept of agents, and it was “Agents are all you need; evals are all you need.” So it just gets super annoying when people hang on to these concepts and don’t really understand the depth of them. 

Even now I think there are tons of people who go, “Oh, RAG is dead. It’s not going to be used” and stuff, and there’s so much nuance to it. And with evals as well. I teach a lot of courses: I teach at universities; I also have my own courses. I feel like people just stuck to the term, and they were like, “Oh, there is this use case I’m building. I need hundreds of evals in order to make sure that it’s tested very well.” They just heard that “evals are what you need to do differently for AI products” and really didn’t understand in depth what evals mean—how you need to build a flywheel around it, and the entire act of building a product, calibrating it, building a set of evaluations, and also doing some A/B testing online to understand how your users are behaving with it. All of that just went into one term, “evals,” and people just throw it around everywhere, right?

10.35
And there’s also this confusion around model evals versus product evals: All of these frontier companies build evals on their models to make sure that they understand where they are on the leaderboard. And I was speaking to someone one day, and they went, “Oh, GPT-5 point something has been tested on a particular eval dataset, which means it’s the best for my use case, so I’m going to be using it.” And I’m like, “That’s not the evals that you should be worrying about, right?” So just overloading so much into a term and hyping it up is kind of what I felt was annoying. And I wanted to write a post to say that evals is a process. It’s a long process. It’s pretty much the process of building something and calibrating it over time. And there are tons of components to it, so don’t try to stuff everything into a word and confuse people. 

I’ve also seen people who do things like, “Oh, I’m going to build hundreds of evals” and maybe 10 of them are actionable. Evals also need to be super actionable: What is the information you can get from them, and how can you act on that? So I kind of stuffed all of that frustration into the post to kind of say it’s a longer process. There’s so much nuance in it. Don’t try to water that down.

11.48
So it seems like this is an area where the people that were from the prior era—the people building ML and data science products—maybe could bring something to the table, right? Because they had experience, I don’t know, shipping recommendation engines and things like that. They have some prior notion of what continuous evaluation and rigorous evaluation brings to the table. 

Actually I was talking to someone about this a few weeks ago in the sense that maybe the data scientists actually have a growing employment opportunity here because basically what they bring to the table seems increasingly important to me. Given that code is essentially free and discardable, it seems like someone with a more rigorous background in stats and ML might be able to distinguish themselves. What do you think?

12.56
Yes and no, because it’s true that machine learning and data science folks understand data very well, but the way you build evals for these products is so different from how you would build, say, your typical metrics (accuracy, F-score, and all of that) that it takes quite some thinking to extend that, and also some learning to do. . .

13.21
But at least you might actually go in there knowing that you need it.

13.27
That is true, but I don’t think that’s a super. . . I’ve seen very good engineers pick that up as well because they understand at a design level “What are the metrics I need to be measuring?” So they’re very outcome focused and kind of enter with that. So one: I think everybody has to be more coachable—not really depend on things that they learned like X years ago, because things are changing so quickly. But I also believe that whenever you’re building a product, it’s not really one set of folks that have the edge. 

Another distribution that is maybe completely different is just subject-matter experts, right? When you’re building evals, you need to be writing rubrics for your LLM judges. Simple example: Let’s say you’re building a marketing pipeline for your company, and you need to write copy—marketing emails or something like that. Now even if I come from a data science background, if I were thrown at that problem, I just don’t understand what to look for and how to get closer to a brand voice that my company would be satisfied with. I really need a marketing expert to tell me, “This is the brand voice we use, and these are the evals that we can build, or this is what the rubric should look like.” So it should almost be a cross-functional thing. I feel like each of us has different pieces to that puzzle, and we need to work together. 

14.42
That kind of also brings me to this other thing of collaborating in a much tighter manner [than] before. Before it was like, “OK, machine learning folks get data; they build models; and then there is a separate testing team; there is a separate SME team that’s going to look at how this product is behaving.” And now you cannot do that. You need to be optimizing for the same feedback loop. You need to be talking a lot more with all of the stakeholders because even when building, you want to understand their perspective.

15.14
So it seems also the case that as more people build these things, they realize that actually. . . You know, sometimes I struggle with the word “eval” in the sense that maybe the right word is “optimize,” because basically what you really want is to understand “What am I optimizing for?” Obviously reliability is one of them, but latency and cost are also important factors, right? So it’s a discussion that you’re increasingly coming across, and people are recognizing that there are trade-offs and they have to balance a bunch of things.

15.57
Yes, definitely. I don’t see it being discussed heavily in the mainstream. But whenever I approach a problem, it’s always that, right? It’s performance, effort, cost, and latency. And all of these four things are kind of. . . You’re trying to balance each of them and trade off each of them. And I always say, start off with something that’s very low effort so that you kind of have an upper ceiling on what can be achieved. Then optimize for performance. 

Again, don’t optimize for cost and latency when you get started because you just want to see the realm of possible to make sure that you can build a product and it can work fine. And cost and latency [are] something that ought to be optimized for—even when building for enterprises—after we’ve had a decent prototype that can do well on evals. Right now, if I built something with, say, a good mid-tier model and it can hit all of my eval datasets, then I know that this is possible, and now I can optimize for the latency and cost based on the constraints. But always follow that pyramid, right? Go with [the] lowest effort. Try to optimize for performance. And then cost and latency is something that. . . There are tons of tricks you can do. There’s caching; there’s using smaller models and all of that. That’s kind of a framework that I typically use.
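The performance-first, cost-later framework Aishwarya describes can be sketched in a few lines. This is an illustrative decision helper, not code from LevelUp Labs; the model names, eval scores, and prices are made up:

```python
# Hypothetical sketch: once some model clears your eval threshold,
# you know the product is possible; then pick the cheapest candidate
# that still clears the bar.

def pick_model(candidates, eval_threshold):
    """Return the cheapest candidate whose eval score meets the threshold."""
    viable = [c for c in candidates if c["eval_score"] >= eval_threshold]
    if not viable:
        # No model clears the bar yet; keep iterating on the prototype
        # before worrying about cost or latency.
        return None
    return min(viable, key=lambda c: c["cost_per_1k_tokens"])

candidates = [
    {"name": "frontier-large", "eval_score": 0.96, "cost_per_1k_tokens": 0.015},
    {"name": "mid-tier",       "eval_score": 0.93, "cost_per_1k_tokens": 0.003},
    {"name": "small-fast",     "eval_score": 0.71, "cost_per_1k_tokens": 0.0004},
]

choice = pick_model(candidates, eval_threshold=0.90)  # selects "mid-tier"
```

Caching and swapping in smaller models, which Aishwarya mentions next, are further levers once the eval threshold is met.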

17.08
In prior generations of machine learning, I think a lot of focus was on accuracy to some extent. But now increasingly, because we’re in this kind of generative AI world, it’s more likely that people are interested in reliability and predictability in the following sense: Even if I’m only 10% accurate, as long as I know what that 10% is, I would prefer that [to] a model that’s more accurate but I don’t know when it’s accurate. Right?

17.47
Right. That’s kind of the boon and bane of generative AI models. I guess the fact that they can generalize is amazing, but sometimes they end up generalizing in ways that you wouldn’t want them to. And whenever we work on enterprise use cases, something that I always want to tell myself is: If this can be a workflow, if it can solve the problem with a simple LLM call and you can audit decisions, don’t make it autonomous. For instance, let’s say we’re building a customer support agent. You could literally build it in five minutes: You can throw SOPs at your customer support agent and say “OK, pick up the right resolution, talk to the user, and that’s it.” Building is very cheap today. I can literally have Claude Code build it up in a few minutes. 

But something that you want to be more intentional about is “What happens if things go wrong? When should I escalate to humans?” And that’s where I would just break this into a workflow. First, identify the intent of the human and then give me a draft—almost be a copilot for me, where I can collaborate. And then if that draft looks good, a human should approve it so that it goes further. 

Right now, you’re introducing auditability at each point so that you as a human can make decisions before, you know, an agent goes up and messes up things for you. And that’s also where your design decisions should really take over. Like I could build anything today, but how much thinking am I doing before that building so that there’s reliability, there’s auditability, and all of those things. LLMs are super powerful. So I think you need to really identify where to use that power versus where humans should be making decisions.
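The draft-then-approve workflow described above can be sketched as follows. The function names, `Ticket` fields, and intent logic are illustrative stand-ins for real LLM calls, not any specific system:

```python
# Sketch of a human-in-the-loop support workflow: classify intent,
# draft a reply, and gate sending on human approval, logging each
# step so decisions stay auditable.
import dataclasses

@dataclasses.dataclass
class Ticket:
    text: str
    intent: str = ""
    draft: str = ""
    status: str = "new"  # new -> drafted -> approved | escalated
    audit_log: list = dataclasses.field(default_factory=list)

def classify_intent(text):
    # Stand-in for an LLM call; a real system would return richer labels.
    return "refund" if "refund" in text.lower() else "general"

def draft_reply(intent):
    # Stand-in for an LLM drafting step.
    return f"[draft for intent={intent}] Thanks for reaching out..."

def handle(ticket, human_approves):
    ticket.intent = classify_intent(ticket.text)
    ticket.audit_log.append(("intent", ticket.intent))
    ticket.draft = draft_reply(ticket.intent)
    ticket.status = "drafted"
    ticket.audit_log.append(("draft", ticket.draft))
    # A human makes the final call before anything reaches the customer.
    ticket.status = "approved" if human_approves(ticket) else "escalated"
    ticket.audit_log.append(("decision", ticket.status))
    return ticket
```

The point of the structure is the audit log: every automated decision is recorded before a human signs off, so nothing ships without a checkpoint.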

19.28
And you touched on the notion of human auditors or humans in the loop. So obviously people also try to balance LLM as judge versus human in the loop, right? Obviously there’s no one piece of advice, but what are some best practices around how you demarcate between when to use a human and when you’re comfortable using another model as a judge?

20.04
A lot of this usually depends on how much data you have to train your judge, right? I feel humans have this problem, which is: Sometimes you can do a task but you can’t explain why you arrived at that decision in a very structured format. I can today take a look at an article and tell you. . . Especially, I write a lot on Substack and LinkedIn; this is a very personal use case. If you give me an article and ask me, “Ash, will this go viral on LinkedIn?” I can tell you yes or no for my profile, right, because I’ve done it for so many years. But if you ask me, “How did you make that decision?” I probably cannot codify it and write it down as a bunch of rubrics. Which is, again, when you translate this to an LLM judge: “Can I build an LLM that can tell me if a post will go viral or not?” Maybe not, because I just don’t have all the constraints that I use as a human when I make decisions. 

Now, take this to more production-like use cases or enterprise-like use cases. You want to have a human judge until you can codify or you can create a framework of how to evaluate something and you can write that out in natural language. And what that means is you maybe want to take 100 or 200 utterances and say, “OK, does this make sense? What’s the reasoning behind why I graded it a certain way?” And you can feed all of that information into your LLM judge to finally give it a set of rubrics and build your evals. But that’s kind of how you make a decision, which is “Do we have enough information to provide to an LLM judge that it can replace human judgment?” 

But otherwise don’t do it—if you have very vague, high-level ideas of what good looks like, you probably don’t want to go to an LLM judge. Even when building your systems, I would always recommend that your first pass when you’re doing your evals should be judged by a human, and you should also ask them to give you the reasoning for how they judged it, because that reasoning is so important for training your LLM judges.
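The process of distilling human grades and reasoning into an LLM-judge prompt might look like the sketch below. The grading fields and prompt wording are illustrative, not a fixed API:

```python
# Sketch: compose an LLM-judge prompt from a first pass of human
# judging (response, grade, reasoning) plus distilled rubric points.

def build_judge_prompt(graded_examples, rubric_points):
    """Build a judge prompt from human-graded examples and a rubric.

    graded_examples: list of (response, grade, reasoning) tuples
    collected from human judges, as described above.
    """
    lines = ["You are grading customer-support responses.", "Rubric:"]
    lines += [f"- {point}" for point in rubric_points]
    lines.append("Worked examples from human graders:")
    for response, grade, reasoning in graded_examples:
        # The human reasoning is the key signal the judge learns from.
        lines.append(f'Response: "{response}" -> {grade} because {reasoning}')
    lines.append("Grade the next response as pass or fail, with reasoning.")
    return "\n".join(lines)
```

In practice you would send this prompt, plus the response under test, to whatever model serves as the judge; the point is that the rubric and examples come from the 100–200 human-graded utterances, not from the model itself.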

21.58
What are some signals that you look for when one of these AI applications or systems goes live? What are some of the signals you look for that [show] maybe the quality is degrading or breaking down?

22.18
It really depends on the use case, but there are a lot of subtle signals that users will give you, and you can log them, right? Things like “Are users swearing at your product?” That’s something we always use, right? “What kind of words are they using? How many conversation turns, if it’s a chatbot?” Usually when you’re building your chatbot, you identify that the average number of turns is 10, but it turns out that customers are having only two turns of conversation. That kind of means that they’re not interested in talking to your chatbot. Or sometimes they’re having 20 turns, which means they’re probably annoyed, which is why the conversations are running longer. 

There are typical things: You know, ask your user to give a thumbs up or thumbs down and all of that, but we know that feedback kind of doesn’t. . . People don’t give feedback unless they’re annoyed at something. So you can have those as well. If you’re building something like a coding agent like Claude Code etc., a very obvious thing you can log is “Did the user go and change the code that it generated?”—which means it’s wrong. So it’s very specific to your context, but really think of ways you can log all of this behavior; you can log anomalies. 

Sometimes it’s just getting all of these logs and doing some topic clustering, which is “What are our users typically talking about, and do any of those show signs of frustration? Do they show signs of being annoyed with the system?” and things like that. You really need to understand your workflows very well so that you can design these monitoring strategies.
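A minimal version of this kind of session monitoring, with hypothetical thresholds and frustration markers, might look like:

```python
# Sketch: flag chat sessions whose turn counts fall outside the
# expected band, or that contain frustration language. The word list
# and turn band are illustrative, not tuned values.

FRUSTRATION_MARKERS = {"useless", "terrible", "wtf", "human", "agent"}

def flag_session(turns, expected_turns=(5, 15)):
    """Return warning labels for one session (a list of user turns)."""
    flags = []
    lo, hi = expected_turns
    if len(turns) < lo:
        flags.append("too_few_turns")   # users may be giving up quickly
    elif len(turns) > hi:
        flags.append("too_many_turns")  # users may be going in circles
    words = {w.strip(".,!?").lower() for t in turns for w in t.split()}
    if words & FRUSTRATION_MARKERS:
        flags.append("frustration_language")
    return flags
```

A real pipeline would run something like this over logged sessions and then cluster the flagged ones by topic, per the clustering idea above.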

23.50
Yeah, it’s interesting because I was just on a chatbot for an airline, and I was surprised how bad it was, in the sense that it felt like a chatbot of the pre-LLM era. So give us kind of your sense: Are these chatbots now really being powered by foundation models, or. . .? I mean, I was just shocked, Aishwarya, about how bad it was, you know? So what’s your sense, as far as you know: Are enterprises really deploying these generative AI foundation models in consumer-facing apps?

24.41
Very few. To just give you a quick stat that might not be super correct: 70% to 80% of the engagements that we take up at LevelUp Labs happen to be productivity and ops focused rather than customer focused. And the biggest blocker for that has always been trust and reliability, because if you build these customer-facing agents [and] they make one mistake, it’s enough to put you on news media or enough to put you in bad PR. 

But I think what good companies are doing as of today is doing a phased approach, which is they have already identified buckets that can be completely autonomous versus buckets that would require humans to navigate, right? Like this example that you gave me, as soon as a user comes up with a query, they have a triaging system that would determine if it should go to an AI agent versus a human, depending on the history of the user, depending on the kind of query. (Is it complicated enough?) Right? Let’s say Ben has this history of. . .

25.44
Hey, hey, I had great status on this airline.

25.47
[laughs] Yeah. So it’s probably not you, but just the kind of query you’re coming up with and all of that. So they’ve identified buckets where automation is possible, and they’re doing it, and they’ve done that because of past behavior data, right? What are low-hanging fruits that we could automate versus escalate to humans. I have not seen a lot of these chat systems that are completely taken over by agents. There’s always some human oversight and very good orchestration mechanisms to make sure that customers are not affected.
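The triage step described here can be sketched as a simple router. The query markers and user fields below are hypothetical placeholders for what, in practice, would be a classifier trained on past behavior data:

```python
# Sketch: route an incoming support query to an AI agent or a human,
# based on query complexity and user history. Heuristics are
# illustrative stand-ins for a learned triage model.

def route(query, user):
    """Return "agent" or "human" for an incoming support query.

    `user` is a dict with hypothetical fields: prior_escalations,
    is_high_value.
    """
    complex_markers = ("legal", "complaint", "compensation", "injury")
    if any(m in query.lower() for m in complex_markers):
        return "human"  # complicated or high-risk queries go to people
    if user.get("is_high_value") or user.get("prior_escalations", 0) > 2:
        return "human"  # history suggests this user needs oversight
    return "agent"      # low-risk bucket identified from past behavior
```

The interesting design work is in identifying which buckets are safe to automate at all, which is what the phased approach above is about.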

26.16
So you mentioned that you mostly are in the technical and ops application areas, but I’ll ask you this question anyway. To what extent do legal things come up? In other words, I’m about to deploy this model. I know I have guardrails, but honestly, just between you and me, I haven’t gone through the proper legal evaluation, you know? [laughs] So in other words, legality or compliance—anything to do with laws—do they come up at all in your discussions with companies?

26.59
As an external implementation team, I think one thing that we do with most companies is give them a high-level overview of the architecture we’ll be building and the requirements, and ask them to do a security and legal review so that they’re okay with it, because we’ve had experiences in the past where we pretty much built out everything and then you have your CISO come in and say, “OK, this doesn’t fall into what we could deploy.” So many companies make the mistake of not really involving their governance and compliance folks in the beginning and then end up scrapping entire projects. 

I am not an expert who knows all of these rules and legalities, but we always make sure that they understand: “Where is the data coming from? Do we have any issues productionizing this?” and all of that, but we haven’t really worked. . . I mean I don’t have a lot of background on how to do this. We’re mostly engineering folks, but we make sure that we have a sign-off so that we are not kind of landing in surprises.

28.07
Yeah, the reason I bring it up is obviously, now that everything is much more democratized, more people can build—so in reality people can literally move fast and break things, right? So I just wonder if there’s any discussion at all. It sounds like you are proactive, but mostly out of experience; I wonder if regular teams are talking about this. 

Speaking of which, you brought up earlier leaderboards—obviously I’m guilty of this too: “I’m about to build something. OK, let me look at a leaderboard.” But, you know, I’m not literally going to take the leaderboard’s advice, right? I’m going to still kick the tires on the specific application and use case. But I’m sure though, in your conversations, people tell you all sorts of things like, “Hey, we should use this because I saw somewhere that this is ranked number one,” right? So is this still a frustration on your end, or are people much more savvy now?

29.19
For one, I want to quickly clarify that it’s not wrong to look at a leaderboard. It’s always. . . You know, you get a high-level idea of “Who are your best competitors at this point?” But what I have a problem with is being so obsessed with just that leaderboard that you don’t build evals for yourself.

29.34
In my experience, when we work with a lot of these companies, I think over the past two years the discussion has really shifted away from the model for two reasons: One is that most companies already have existing partnerships. They’re working with a major model provider and they’re OK doing that now, just because all of these model providers are racing toward feature parity, leaderboard success, and all of that. If Anthropic has something, you know, if their model is performing well on a leaderboard today, Gemini and OpenAI will probably be there in a week. So people are not too concerned about model performance. They know that in a couple of weeks, that will kind of be built into other models. So they’re not worried about that. 

And two is companies are also thinking much more about the application layer right now. There’s so much discussion around all of these harnesses like Claude Code, OpenClaw, and stuff like that. So I’ve not seen a lot of complaints on “Oh, this is the model that we should be using.” It seems like they have a shared understanding of how models perform. They want to optimize the harness and the application layer much more.

30.48
Yeah. Yeah. Obviously another one of these buzzwords is “harness engineering,” and whatever you think about it, the one good thing is it really elevates the notion that you should worry about the things around the model rather than the model itself. 

But speaking of. . . I guess I’m kind of old school in the sense that I want to still make sure that I can swap models out, not necessarily because I believe one model is better than the other but one model may be cheaper than the other, right? 

And at least up until recently—I haven’t had this conversation in a while—it seemed to me that people got stuck on a model because their prompts were so specific for a model that porting to another model seemed like a lot of work. But nowadays though you have tools like DSPy and GEPA that it seems like you can do that more easily. So what’s your sense of model portability as a design principle—model neutrality?

32.06
For one, I think the gap between models is much more exaggerated for consumer use cases just because people care quite a bit about the personality, about how the model…

32.22
No, I care about latency and cost.

32.24
Yeah. In terms of latency and cost, right, most of the model providers pretty much are competing to make sure they are in the market. I don’t know. Do you think that there are models. . .

32.35
Well, I think that you can still get good deals with Gemini. [laughs]

32.40
Interesting.

32.41
But honestly, I use OpenRouter and OpenCode. So I’m much more. . . I don’t want to get locked into a single [model]. When I build something, I want to make sure that I build in a way that I can move to a different model provider if I have to. But it doesn’t sound like you think that this is something that people worry about right now. They’re just worried about building something usable, and then we can worry about that later.

33.12
Yes. And again, I come from a very enterprise point of view, like “What are companies thinking about this?” And like I said, I’m not seeing a lot of concern about model neutrality, because these companies have deals with vendors and they’re okay sticking with the same model provider. 

Now, when it comes to consumers, if you’re building something for the kind of use cases that you were describing, Ben, I feel that, like I said, personality is super important for consumer builders. And I still think we’re not at a point where you can easily swap out models and be like, “OK, this is going to work as well as before,” just because you have over time learned how the model behaves. So you’ve kind of gotten calibrated with these models, and these models also have very specific personalities. So there’s a lot of, you know, reengineering that you have to do.

34.07
And when I say reengineering, it just might mean changing the way your prompts are written and stuff like that. It will still functionally work, which is why I say that enterprises don’t care about this much because the kind of use cases I see are like document processing or code generation, in which case functionality is of much more importance than personality. But for consumer use cases, I don’t think we’re at a point—to your point on building with OpenRouter, you can do that, but I think it’s a lot of overhead given that you’ll have to write specific prompts for all of these models depending on your use case. 

I recently ported my OpenClaw from Anthropic to OpenAI because of all of the recent things, and I had to change all of my SOUL.md files, USER.md files, so that I could kind of set the behavior. And it [took] quite some time to do it, and I’m still getting used to interacting with OpenClaw using OpenAI because it seems like it makes different mistakes than what Anthropic would do. 

35.03
So hopefully at some point [the] personalities of these models will converge, but I do not think so, because this is not a capability problem. It's more about design choices that these model providers have made while building these models. So I don't see a time where. . . We're already at a point where, capability-wise, most models are getting closer, but personality-wise I don't think model vendors would prefer to converge them, because these are kind of your spiky edges which will make people with a certain personality gravitate towards your models. You don't want to be making it like an average.

35.38
So in closing, you do a bit of teaching as well, right? One of the things I’ve really paid attention to is, in my conversations with people who are very, very early in their career, maybe still looking for the first job, literally, there’s a lot of worry out there. I mean, not necessarily if you’re a developer and you have a job—as long as you embrace the AI tools, you’re probably going to be fine. It’s just getting to that first job is getting harder and harder for people. 

And unfortunately, you need that first job to burnish your credentials and your résumé. And honestly, I think companies also neglect the fact that this is their pipeline for talent within the company as well: You have to have the top of the funnel of your talent pipeline. So what advice do you give to people who are literally still trying to get to that first job?

36.51
For one, I have had a lot of success with hiring young folks because I think they are very agent native. I call them agent-native operators. If you've been working in software, in IT, for about 10 years or so, like me, you've gotten used to certain workflows without using AI. I feel like we're so stuck in that old mindset that I really need someone who's agent native to come and tell me, "Hey, you could literally ask Claude Code to do this." So I've had a lot of luck hiring folks who are early career because, one, they are very coachable, and two, they just understand how to be agent native.

So my suggestion would still be around that: Be a tinkerer. Try to find out what you can do with these tools, how you can automate your workflows with them, and be extremely obsessed with designing and thinking and not really execution, right? Execution is kind of being taken over by agents.

So how do you really think about "What can I delegate?" versus "What can I augment?"—really sitting in the position of almost being an agent manager and thinking, "How can I set up processes so that I can make end-to-end impact?" So just thinking a lot along those lines—and those are the kind of people that we'd like to hire as well.

And if you look at a lot of these latest job roles, you'll also see roles blurring, right? People who are product managers are expected to also do GTM, also do a bit of engineering, and all of that. So really understand the stack end to end. And the best way to do it, I feel, is to build a product of your own [and] try to sell it. You'll get to see the whole thing. [That] doesn't mean "Oh, stop looking for jobs—go become an entrepreneur," but really understanding workflows end to end, making that impact, and sitting at the design layer will be super valued, is what I think.

38.34
Yeah, the other thing I tell people is: You have interests, so go deep in your interests and build something in whatever you're interested in. Domain knowledge is going to be valuable moving forward, but also you end up building something that you would want to use yourself, and you learn a lot of things along the way, and then maybe that's how you get your name out there, right?

38.59
Exactly. Solving your own problem is the best advice: Try to build something that solves your own pain point. Try to also advocate for it. I feel like social media and all of this is so good at this point that you can really make a mark in nontraditional ways. You probably don't even have to submit a job application. You can have a GitHub repository that gets a lot of stars—that might land you a job. So think of all of these ways to bring yourself more visibility as you build, so that you don't have to go through the typical job queue.

39.30
And with that, thank you, Aishwarya.

39.32
Thank you.
