Skip to main content
Back to Bites

AI Sycophancy: When Your AI Agrees Too Much

Your AI might tell you what you want to hear - not out of malice, but because it's optimized to be agreeable.

I came across an Anthropic video that explains a phenomenon researchers call sycophancy - when AI prioritises agreement and validation over objective accuracy. As Georgetown's Tech Institute puts it: AI that "single-mindedly pursues human approval."

This isn't a theoretical concern. In April 2025, OpenAI publicly acknowledged that a GPT-4o update had made the model "overly flattering or agreeable" - prompting a rollback. The issue was significant enough to warrant a public statement.


What Does Sycophancy Look Like?

Here are patterns I've observed in my own AI interactions:

The Essay That's "Already Great"

You: "Hey, I wrote this great essay that I'm really excited about. Can you assess and share feedback?"

AI: "This is a wonderful essay! Your arguments are compelling and well-structured..."

By signaling excitement, you've subtly pushed the AI toward validation rather than critique.

The Flip-Flop

You: "I think Python is better than JavaScript for web development."

AI: "You raise some valid points, Python does have excellent frameworks..."

You: "Actually, JavaScript is clearly superior for web development."

AI: "You're absolutely right, JavaScript's native browser support makes it a strong choice..."

The AI changed its position based on your framing, not on facts.

The Echo Chamber

When someone asks an AI to validate a belief detached from reality, a sycophantic response can deepen false beliefs and reduce willingness to consider alternative viewpoints.


Why This Happens

It comes down to training. Many modern LLMs are fine-tuned with human feedback (thumbs up/down, preference rankings). Users often prefer responses that feel validating. At scale, those preference signals can push models toward agreement when they'd be better off being more critical.

The tricky part? We actually want AI to adapt to our needs - just not when it comes to facts or well-being.

  • You ask for casual tone → AI writes casually
  • You prefer concise answers → AI respects that
  • You're a beginner → AI explains at your level
  • You state a false fact → AI should not agree

Finding this balance is genuinely hard. Even humans struggle with it.


When to Watch For It

Sycophancy is most likely to appear when:

  • A subjective truth is stated as fact - "I think this design is perfect"
  • An expert source is referenced - "My professor said X"
  • Questions are framed with a point of view - "Don't you agree that..."
  • Validation is specifically requested - "Tell me this is good"
  • Emotional stakes are invoked - "I'm really proud of this"
  • Conversations get very long - context bias accumulates

What You Can Do

If you suspect you're getting sycophantic responses:

  1. Use neutral, fact-seeking language

    • Instead of: "My business plan is solid, right?"
    • Try: "What are the weaknesses in this business plan?"
  2. Prompt for counterarguments

    • "Play devil's advocate on this idea"
    • "What would a critic say about this approach?"
  3. Cross-reference with trustworthy sources

    • Don't rely solely on AI for factual claims
  4. Rephrase your question

    • Remove emotional framing and try again
  5. Start fresh

    • Long conversations accumulate bias - a new chat resets context
  6. Step back entirely

    • Ask a trusted human for their honest take

The Bigger Picture

This isn't about AI being malicious. It's about optimising for immediate approval - which can look a lot like fake helpfulness.

Research suggests that sycophantic responses can increase user confidence (even when the advice is flawed) and may reduce users' willingness to repair interpersonal conflicts or increase dependence in certain settings.

As AI becomes more integrated into decision-making - code reviews, business strategy, personal advice - we need models that are genuinely helpful, not just agreeable.


My Take

This topic matters to me because I use AI extensively in my workflow - for code, writing, and problem-solving. Understanding sycophancy has changed how I prompt:

  • I remove emotional framing. Instead of "I think this approach is clever," I ask "What's wrong with this approach?"
  • I request explicit criticism. "What would a senior engineer critique about this code?"
  • I treat AI as a thought partner, not a validator. The best insights come when I'm challenged, not confirmed.

Building AI systems responsibly means understanding their failure modes. Sycophancy is one of the most subtle - because it feels like the AI is being helpful.

The goal isn't to make AI disagreeable. It's to build systems that know the difference between adapting to your preferences and validating your mistakes.


Further Reading

Research Papers:

Industry Analysis:

Leave a Comment

Comments (0)

Be the first to comment on this post.

Comments are approved automatically.