Myra Cheng, a computer science PhD student at Stanford, has spent a lot of time hearing how undergraduates use AI on campus. “They would tell me about how a lot of their peers are using AI for relationship advice, to draft breakup texts, to navigate these kinds of social relationships with your friend or your partner or someone else in your real life,” she says.
Students reported that AI often quickly took their side. “And I think more broadly,” Cheng adds, “if you use AI for writing some sort of code or even editing any sort of writing, it’ll be like, ‘Wow, your code or your writing is amazing.’ ”
That persistent flattery and unconditional validation struck Cheng as different from how people typically respond. Curious about how widespread the behavior was and what consequences it might have, she and her colleagues studied a range of AI models. In a recent paper in Science, they report that the models offer affirmation more often than people do, even in morally troubling scenarios, and that people prefer and trust such sycophantic AI even as it makes them less likely to apologize or accept responsibility.
Cheng compared AI responses with human judgments using several datasets, including posts from the Reddit community A.I.T.A. (“Am I The A**hole?”), where users solicit crowdsourced moral judgments about everyday conflicts. In threads where the human crowd judged the poster to be in the wrong, the AI models nonetheless affirmed the poster’s behavior 51% of the time, often reassuring posters that they were not at fault.
The trend appeared in other advice forums, too. In posts describing harmful, illegal, or deceptive behavior (for example, making someone wait on a video call for half an hour “for fun”), AI models were split: some flagged the behavior as hurtful, others framed it as boundary-setting. Overall, chatbots endorsed problematic behavior 47% of the time. “You can see that there’s a big difference between how people might respond to these situations versus AI,” Cheng says.
To test the effects on people, the researchers invited 800 participants to discuss a real interpersonal conflict from their own lives with either an affirming or a non-affirming AI. Participants then reflected on the conflict and wrote a letter to the other person involved. Those who had consulted the affirming AI became more self-centered and were 25% more convinced they were right than those who used the non-affirming AI. They were also about 10% less willing to apologize, make amends, or change their behavior. Cheng says that even brief affirming interactions made people less likely to consider others’ perspectives.
The authors note a troubling incentive: because users prefer and trust affirming AI, sycophancy can boost engagement, encouraging developers to preserve it despite its harms. Ishtiaque Ahmed, a computer scientist at the University of Toronto who was not involved in the study, calls this a “slow and invisible dark side of AI.” Constant validation can erode self-criticism, leading to poorer choices and even emotional or physical harm. Ahmed adds that models are often fine-tuned to be “helpful and harmless,” which can unintentionally create people-pleasing behavior; keeping users engaged may come at the cost of objective, useful feedback.
Cheng argues that companies and policymakers should collaborate to reduce harmful affirmation, since these behaviors are designed by people and can be changed. She also advises individuals not to use AI as a substitute for real conversations, especially for difficult or conflict-filled discussions. Cheng herself has avoided using chatbots for advice and says, “Especially now, given the consequences that we’ve seen, I think that I’m even less likely to do so in the future.”
