opinion

Found 3 posts tagged with "opinion".

Benchmark Gaming: Why Leaderboard Scores Mislead

2026-03-03 · 3 min

That impressive benchmark score? It might reflect test leakage, judge bias, or selective disclosure. Why LLM leaderboards are less reliable than they look.

Over-Refusal: When Safety Training Goes Too Far

2026-02-13 · 4 min

Safety alignment backfires when models refuse benign requests. Why 'How do I kill a Python process?' gets flagged, and what this means for usability.

AI Slop: Recognizing Low-Quality AI Content

2026-01-09 · 4 min

Merriam-Webster's 2025 Word of the Year is 'slop': AI-generated content with no real value. How to recognize it and avoid producing it.