Hugging Face, the AI development platform, has released SmolVLM-256M and SmolVLM-500M, which it claims are the smallest AI models yet that can analyse images, short videos, and text.
Built for “constrained devices”, such as laptops with limited RAM, the models are pitched as a cheap and efficient way to process data. At 256 million and 500 million parameters respectively, they handle tasks such as describing images or short video clips and extracting information from PDFs, including charts and scanned text.
Both models were trained on The Cauldron, a curated collection of high-quality image and text datasets, and Docmatix, a set of file scans paired with detailed captions. The two datasets were created by Hugging Face’s M4 team, which focuses on multimodal AI.
Despite their size, SmolVLM-256M and SmolVLM-500M outperform the much larger Idefics 80B on benchmarks such as AI2D, which tests a model’s ability to analyse grade-school-level science diagrams. Both models are available on the web and can be downloaded from Hugging Face under an Apache 2.0 license, which allows unrestricted use.
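As a rough illustration of what local use might look like, the sketch below loads the smaller model with the Hugging Face transformers library and asks it to describe an image. The checkpoint name (HuggingFaceTB/SmolVLM-256M-Instruct), the example file path, and the generation settings are assumptions for illustration, not details taken from the article.

```python
# Minimal sketch: prompting a small vision-language model locally with transformers.
# Checkpoint name, image path, and generation settings are assumed, not confirmed.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("scanned_chart.png")  # hypothetical local file

# Build a chat-style prompt with one image and one text turn.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe the chart in this scan."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generate a short description; a model this small can run on CPU.
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```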
There is a caveat, however: a recent study by researchers at Google DeepMind, Microsoft Research, and the Mila research institute in Quebec found that small models can underperform on complex reasoning tasks. The researchers suggested this may be because smaller models tend to pick up on superficial patterns in data but struggle to apply that knowledge in novel situations.
Original source: Read the full article on TechCrunch