Is AI Any Good?

So far, we’ve attempted to answer that question through benchmarks: a model is given a fixed set of questions and graded on how many it gets right. But just like exams, these benchmarks don’t always reflect deeper abilities. Lately, it seems as if a new AI model is released every week, and each time a company introduces one, it comes with fresh scores showing it surpassing its predecessors.

AI research is hypercompetitive, but at heart it is an infinite game. An infinite game is open-ended: the goal is simply to keep playing. A finite game, by contrast, is played to be won and to end. Lately, the field has behaved more like the latter: a dominant player produces a significant result, triggering a wave of follow-up papers that chase the same narrow topic. This race-to-publish culture puts enormous pressure on researchers, rewarding speed over depth and short-term wins over long-term insight. If academia chooses to play a finite game, it will lose.
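To make the benchmark mechanic concrete, here is a minimal sketch of what “grading a model on a fixed set of questions” amounts to. Everything in it is invented for illustration: `ask_model` stands in for whatever system is under evaluation, and the question set is a toy example.

```python
# Minimal sketch of a fixed-question benchmark.
# `ask_model` is a hypothetical stand-in for the system being evaluated;
# the questions and reference answers are invented for illustration.

def ask_model(question: str) -> str:
    """Hypothetical model call; a real harness would query an actual model."""
    canned = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return canned.get(question, "I don't know")

BENCHMARK = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
    ("Author of 'Hamlet'?", "Shakespeare"),
]

def score(benchmark) -> float:
    """Accuracy: the fraction of fixed questions the model answers exactly right."""
    correct = sum(
        ask_model(q).strip().lower() == a.strip().lower() for q, a in benchmark
    )
    return correct / len(benchmark)

if __name__ == "__main__":
    print(f"Benchmark accuracy: {score(BENCHMARK):.0%}")  # here: 67%
```

Real benchmarks are far larger, and their grading is often fuzzier (multiple choice, partial credit, human or model judges), but the output is the same kind of thing: a single number that says nothing about abilities the question set never probes.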

This “finite vs. infinite game” framework also applies to benchmarks. So, do we have a comprehensive scoreboard for judging the true quality of a model? Not really. Many dimensions, social, emotional, and interdisciplinary among them, still evade assessment. But the wave of new benchmarks hints at a shift. As the field evolves, a bit of skepticism is probably healthy.