opinion
Found 3 posts tagged with "opinion".
Benchmark Gaming: Why Leaderboard Scores Mislead
2026-03-03 · 3 min
That impressive benchmark score? It might reflect test leakage, judge bias, or selective disclosure. Why LLM leaderboards are less reliable than they look.
Over-Refusal: When Safety Training Goes Too Far
2026-02-13 · 4 min
Safety alignment backfires when models refuse benign requests. Why 'How do I kill a Python process?' gets flagged, and what this means for usability.
AI Slop: Recognizing Low-Quality AI Content
2026-01-09 · 4 min
Merriam-Webster's 2025 Word of the Year is 'slop': AI-generated content with no real value. How to recognize it and avoid producing it.