Anthropic has put its latest AI model, Claude 3.7 Sonnet, to the test in an unconventional way – using Pokémon Red. In a recent blog post, the company revealed that it equipped Claude 3.7 with basic memory, screen pixel input, and button-pressing capabilities, allowing it to play Pokémon continuously.
One notable feature of Claude 3.7 is its “extended thinking” ability. Like other advanced AI models, it can tackle complex problems by devoting more computing power and time to them. This proved beneficial in Pokémon Red, where the model progressed further than its predecessor, Claude 3.0 Sonnet.
Claude 3.7 successfully defeated three gym leaders and earned their badges, demonstrating the model’s ability to handle challenges that require sustained reasoning. While Anthropic did not disclose the exact computing resources or time these milestones required, it noted that the model performed 35,000 actions to reach the third gym leader, Lt. Surge.
Although Pokémon Red may seem like an unusual choice for an AI benchmark, games have long served this purpose. Recent months have seen the emergence of platforms and apps designed to test models’ game-playing abilities in titles ranging from Street Fighter to Pictionary, underscoring the value of games for assessing AI models’ problem-solving, learning, and strategy capabilities.
Original source: Read the full article on TechCrunch