Alibaba Unveils AI Models with the Power to Control PCs and Phones

Alibaba is not letting its competitors pull ahead in AI. Its Qwen Team has launched a new family of AI models, Qwen2.5-VL, that can handle a broad range of text and image analysis tasks.

The abilities of the Qwen2.5-VL models don’t end there. They can also understand videos, count objects in images and control PCs. In that respect, they resemble the model that underpins Operator, the agent recently launched by OpenAI.

According to the Qwen Team’s own tests, the best Qwen2.5-VL model outperforms leading rivals such as OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet and Google’s Gemini 2.0 Flash. The models were evaluated on document analysis, question answering, math and video understanding.

The Qwen2.5-VL models can analyze complex visual material, such as charts and graphics, and extract specific data from scanned documents like invoices and forms. They can also comprehend long videos. In addition, they can recognize intellectual property from various TV series and films, which suggests they may have been trained on copyrighted works.

However, because of Alibaba’s Chinese roots, there are limits to the topics these models will discuss in Qwen Chat. Even so, the standout feature of the Qwen2.5-VL models is their ability to interact with software installed on PCs and mobile devices.

The release underlines Alibaba’s commitment to keeping pace with competitors in a fast-moving AI field.

Original source: Read the full article on TechCrunch
