Adam Cheyer is a pioneering AI technologist whose innovations have fundamentally shaped today’s intelligent interfaces. As co-founder of Siri Inc. (acquired by Apple), he served as a Director of Engineering in Apple’s iOS group; he later co-founded Viv Labs (acquired by Samsung) and Sentient Technologies, and played a founding role in Change.org.
Adam Cheyer was Chief Architect of CALO, one of DARPA’s largest AI projects. He has authored over 60 publications and holds more than 25 patents.
In recognition of these achievements, he received his alma mater Brandeis University’s 2024 Alumni Achievement Award for transforming a long-standing AI vision into everyday tools used by hundreds of millions.
Now represented by Champions Speakers Agency, he continues to speak globally on how organisations can harness AI with responsibility, scale, and impact.
Q1. How do you see the role of data management in enabling AI capabilities and bringing data to life for organisations?
Adam Cheyer: “AI systems are built on two foundations: algorithms and data. The algorithms themselves are well established, but without high-quality, well-organised data, they can’t deliver real value. Data is the fuel that powers every AI application, and managing it effectively is now a mission-critical skill for any organisation developing AI.
“With the rapid acceleration of AI in recent years — especially in the past six months — the ability to handle, refine, and govern data has shifted from being a technical advantage to an essential requirement across industries.”
Q2. What challenges have you faced when managing large data sets?
Adam Cheyer: “I’ve been building AI systems for over 30 years, so it’s changed a little bit over time. Clearly, the first issue is just the storage, management, and processing of the data, because the data now is so large. Back in the 80s and 90s that wasn’t quite as essential – the data sets were smaller – but today the data sets are huge.
“So, you need a system that can store it efficiently in a distributed way, and we’ve used various systems over the years to do that. You need a system that can process this huge amount of data in parallel at scale.
“One of the key areas in data management for me is data quality. Even if you work with data companies – and when we were a start-up, and then even at Apple for instance – many of the data sources come from other places, other vendors, and surprisingly the data is not always in perfect clean form.
“So, you need to have a process and tools and a pipeline that goes through and takes that data, cleanses it, adapts it, and often if you have multiple sources you need to integrate data together, and that can be a real challenge.
“There are standard systems, ETL systems etc., but sometimes you need proprietary algorithms. As an example, with Siri, when we were a start-up, you would get millions and millions of restaurant name data and business name data.
“If you had something like Joe’s Restaurant and Joe’s Bar and Grill – are they the same or not? That’s a real problem. Joe’s – probably you’d say yes, but Joe’s Pizzeria and Joe’s Grill maybe not, right? And so, how do you know?
“There’s a lot of work that goes into cleansing, integrating data.
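To make the kind of name-matching problem he describes concrete, here is a minimal, illustrative sketch in Python using only the standard library. It is not the proprietary algorithm Cheyer refers to; the stop-word list, threshold, and examples are assumptions chosen purely for demonstration.

```python
# Illustrative sketch of fuzzy business-name matching; not Siri's proprietary algorithm.
from difflib import SequenceMatcher

STOP_WORDS = {"restaurant", "bar", "grill", "and", "&", "pizzeria", "cafe"}  # assumed list

def normalise(name: str) -> str:
    """Lower-case the name and strip generic business words before comparing."""
    tokens = [t for t in name.lower().replace("'", "").split() if t not in STOP_WORDS]
    return " ".join(tokens) or name.lower()

def likely_same_business(a: str, b: str, threshold: float = 0.85) -> bool:
    """Heuristic: two names probably refer to the same business if their
    normalised forms are nearly identical."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio() >= threshold

print(likely_same_business("Joe's Restaurant", "Joe's Bar and Grill"))  # True
# The harder case from the interview: both of these also collapse to "joes" here,
# so string similarity alone says True, which may be wrong. Real pipelines add
# signals such as address, phone number, or geolocation to decide.
print(likely_same_business("Joe's Pizzeria", "Joe's Grill"))            # True (possibly wrong)
```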
“And then the final thing I’ll mention, which is a big topic in data management, is privacy and security. Once you have data coming in from users, there are standards, issues, and regulations that mean you need to ensure that the data you hold is accessible only by the right people, that it is secured and protected, and that privacy is preserved as much as possible, in a standardised way.
“At Apple, we had a number of techniques and teams, and there’s a lot that goes into that. So, you need good systems, good processes, and to set up your organisation to be able to handle all of these challenges.”
Q3. How do you manage data privacy when building large AI systems?
Adam Cheyer: “Absolutely, so it is a challenge. Your first tendency is, well, we just record everything, but I think that’s the wrong approach. You really need to be thoughtful about what is saved, especially if it involves personal information, and what is not saved.
“Step one for me is: it’s only an issue for privacy if it’s stored somewhere. Make sure you’re only storing things that you’re actually going to use and actually need to provide value to your customer. So that, for me, is step one – really analysing the data, what you have and why you need it.
“That gets into a principle that I’ve used all the way through, which is transparency and control by the user. When storing data, it shouldn’t just be buried in some terms and conditions somewhere. Users should know what is stored and why.
“I’ll give you a very simple example. With Siri, you could say, “Find restaurants near my house.” Siri would come back and say, “Where do you live?” You would say, “I live here.” Then it would say, “Do you want me to remember that?” The user could have a decision.
“If I say yes, it’ll be stored – some privacy risk – but there’s also convenience. Now I can say, “Find movies near my house.” So being able to have that choice of what gets stored as much as possible, and then to give them the control – maybe they stored it but want to change their mind later. At every point, when it’s stored, it’s stored in a way that the user can see – transparent – and also control.
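As a rough illustration of the transparency-and-control pattern described here (not how Siri actually implements it), a consent-gated store might look like the following sketch; the class and method names are invented for this example.

```python
# Minimal sketch of consent-gated, user-visible storage; all names are illustrative.
from dataclasses import dataclass, field

@dataclass
class UserMemory:
    """Holds only the facts a user has explicitly agreed to store."""
    facts: dict = field(default_factory=dict)

    def remember(self, key: str, value: str, user_consented: bool) -> bool:
        if not user_consented:        # store nothing without an explicit "yes"
            return False
        self.facts[key] = value
        return True

    def show_all(self) -> dict:
        return dict(self.facts)       # transparency: the user can see what is stored

    def forget(self, key: str) -> None:
        self.facts.pop(key, None)     # control: the user can change their mind later

memory = UserMemory()
memory.remember("home_address", "123 Main St", user_consented=True)
print(memory.show_all())              # {'home_address': '123 Main St'}
memory.forget("home_address")         # the convenience is gone, but so is the risk
```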
“That trade-off of value versus privacy – I always try to weigh that. Obviously, you also need best practices around data storage, doing it in a secure way, and having audits.
“We had a whole team at Apple, separate from the engineering and product teams, whose job was to almost go in as hackers and validate every assumption, so that if Apple ever gets pulled into Congress or something, they can say with certainty that this data has been secured in the way we said it would be.
“So, having the organisational aspect – for me those are the ideas: get just the data you need, make it transparent, give users control as much as possible to show the value, use best practices on the system side, storage etc., but also on the organisational side, have audit teams and really keep checking that compliance is being met.”
Q4. How do you manage bias when building AI models?
Adam Cheyer: “Algorithms aren’t biased, they’re just pattern-matching, pattern-seeking devices, and they’re looking for patterns – that’s their job. They’re looking for useful patterns in data, that’s how machine learning works. But the data that you give will have patterns, and some of them are going to be desirable and some of them are not desirable.
“Society and morality and all of this goes into it. Machines don’t know about any of that. They just take the data you have.
“So, when you are going through this process of data collection, the data that you’re going to feed to an AI to try to discover hopefully useful patterns, there are a number of techniques you need to consider to, as best as possible, ensure that your data sufficiently represents the customer base you’re trying to meet.
“If you get a discrepancy between those – if the data is not representative – it’s a problem.
“So you need to have, first of all, teams who are looking and working explicitly on bias. They know there’s going to be some patterns in there – is it well represented according to our morals, our values, what we want to be able to give to the customer? If you just take random data and throw it in, it’s not enough. You need to have active investigation, active tools, and people to do that within your organisation.
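One simple example of such an “active tool” is a representation check that compares how often each group appears in the training data against its share of the intended customer base. The sketch below is illustrative only; the groups, target shares, and tolerance are invented for demonstration.

```python
# Illustrative representation check; groups, shares, and tolerance are assumptions.
from collections import Counter

def representation_gaps(samples, target_shares, tolerance=0.05):
    """Return groups whose share of the data differs from the target share
    by more than the tolerance (positive = over-represented)."""
    counts = Counter(samples)
    total = len(samples)
    gaps = {}
    for group, target in target_shares.items():
        actual = counts.get(group, 0) / total
        if abs(actual - target) > tolerance:
            gaps[group] = round(actual - target, 3)
    return gaps

training_groups = ["group_a"] * 800 + ["group_b"] * 150 + ["group_c"] * 50
print(representation_gaps(training_groups,
                          {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2}))
# {'group_a': 0.3, 'group_b': -0.15, 'group_c': -0.15}
```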
“Of course, diversity is often one of the areas where bias is an issue. So, have a very diverse team, and ideally a team trained to understand what to look for.
“The other thing – even if you have an organisation trained to try to reduce problems of data bias, you may not capture everything. So you need to listen to and be responsive to end-user feedback – if users raise an issue, to say, “We hear you, we care, and we’re doing something about it.” That will get you a better solution, and you’ll have more perspectives.
“The last thing for me, when I deal with AI, is that there are bias issues but also safety issues. The same topics apply, especially with these large language models – there are things you don’t want an AI system to say for safety reasons, not for bias reasons.
“For instance, you don’t want to be able to ask a question like, “How can I create a bomb to eradicate the human race?” and have it say, “Here, happy to give you the recipe.” Maybe not the best thing from a safety perspective.
“So, the similar techniques you use for bias, you also need an organisation using the best tools, best practices, to try to protect against safety issues.”
Q5. Do you believe AI will replace humans or rather assist us?
Adam Cheyer: “AI is a tool to help humanity solve big problems and achieve more with our lives. AI does not have the ethics and morals and values that we do, so you need people to teach the machine what those are, so that when it reflects back what we’ve given it, we don’t get offended.”
Q6. Can you share some examples of projects you have worked on and what this has taught you?
Adam Cheyer: “I’ve been doing AI and machine learning in many forms and many techniques over time. The way we use LLMs is different than the way we used some of the models earlier on.
“But often, when you’re working with data and these algorithms, there are many things you can try to do to lift the performance and accuracy of the models you’re building.
“For instance, with LLMs specifically, they’re usually based on a very large pre-trained model – trained over all sorts of text and data from the web, books, and elsewhere – and it knows a lot, and it’s great. But think of that as background knowledge.
“To solve your problem for your customers or your organisation, a general model will only get you so far. But being able to use transfer learning – fine-tuning the pre-trained model using data that is more adapted and specific to your use cases – will make it better, improving performance and accuracy.
“In fact, for many organisations out there thinking about how to get value from AI, the first question I would ask is: do you have proprietary data? Do you have data that no one else has? Because if you do, you can use that to create an AI system that no one else can, and that will give you an incredible advantage in the industry.
“That can come both from fine-tuning data – documents or files – and from what’s known as human-in-the-loop feedback: just being able to grade that, for my use case and my customer base, this answer is better than that one. That feedback can be fed back into the LLM and improve its performance.
“Some would say that, for example, the OpenAI models – ChatGPT models – are more advanced than, say, Google Bard. Google has an incredible amount of data but hasn’t done as much work on that tuning, human-in-the-loop feedback, so they’re a little bit behind.
“So, being able to adapt a model is extremely important, both using data you have – proprietary data – and human training data.
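As a rough sketch of the fine-tuning step he describes, the snippet below adapts a small pre-trained language model to a proprietary text corpus using the Hugging Face libraries. The base model, file path, and hyperparameters are placeholder assumptions, not a recipe from the interview.

```python
# Minimal fine-tuning sketch; model name, data path, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"  # stand-in for whatever pre-trained model you start from
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Proprietary data: one text example per line in a local file (hypothetical path).
data = load_dataset("text", data_files={"train": "proprietary_corpus.txt"})
tokenized = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                     batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the adapted model now reflects your domain, not just general web text
```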
“Previously, before LLMs, we used many other techniques, such as data augmentation. At one point, we would take data from real use cases and augment it by substituting in different synonyms and phrases, as a way of creating more data. As you mentioned earlier, more data, as long as it’s clean and representative, can typically improve the model.
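A toy version of that synonym-substitution idea might look like the sketch below; the synonym table and example utterance are invented for illustration, and real systems would use far larger lexicons or paraphrase models.

```python
# Toy data-augmentation sketch via synonym substitution; the lexicon is illustrative.
import random

SYNONYMS = {
    "find": ["locate", "look for", "search for"],
    "restaurants": ["places to eat", "eateries"],
    "near": ["close to", "around"],
}

def augment(utterance, n_variants=3):
    """Create extra training examples by randomly swapping words for known synonyms."""
    variants = []
    for _ in range(n_variants):
        words = [random.choice([w] + SYNONYMS.get(w, [])) for w in utterance.lower().split()]
        variants.append(" ".join(words))
    return variants

print(augment("Find restaurants near my house"))
# e.g. ['locate places to eat around my house', 'find eateries close to my house', ...]
```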
“We also used techniques such as ensemble learning, where you have multiple models working together to solve a problem. Often, when we were at Samsung – I sold a company to Samsung, and we were doing a lot of AI work there – we used ensemble models as a way to lift performance.
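For the ensemble idea, a minimal scikit-learn sketch is shown below: several different models vote on each prediction, which often lifts accuracy over any single one. The dataset is a bundled toy set, used purely for illustration.

```python
# Minimal ensemble-learning sketch with scikit-learn; dataset and models are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=5000)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("forest", RandomForestClassifier(n_estimators=100)),
    ],
    voting="soft",  # average predicted probabilities rather than taking hard votes
)

print("ensemble accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```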
“So, those are some things that I would try when wrestling either with LLMs or with more standard machine learning models, as a way to squeeze out the best accuracy and performance out of AI.”