News
Hugging Face Releases SmolVLA, a Compact Open-Source Robotics Model
- By John K. Waters
- 06/10/2025
Hugging Face has introduced SmolVLA, a lightweight open-source Vision-Language-Action (VLA) model for robotics that operates on consumer-grade hardware and is trained entirely on community-contributed data. At 450 million parameters, SmolVLA aims to offer efficient, reproducible performance for robotic tasks without reliance on proprietary datasets or expensive infrastructure.
Generalist Performance, Minimal Resources
SmolVLA-450M is designed to perform general-purpose manipulation tasks from visual and language cues. The model pairs a compact vision-language backbone with a flow-matching transformer that predicts chunks of robot actions. Despite its small size and modest training data (fewer than 30,000 episodes), SmolVLA matches or outperforms models such as ACT on both simulated (LIBERO, Meta-World) and real-world (SO100, SO101) benchmarks.
SmolVLA’s architecture includes a modified SmolVLM2 encoder-decoder stack, reduced visual token use, and selectively truncated layers during inference. These optimizations cut response time by up to 30 percent and double task throughput in real-world settings.
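To make the flow-matching action expert concrete, here is a minimal sketch of the general technique: a network learns a velocity field, and sampling integrates that field from Gaussian noise to a chunk of actions conditioned on backbone features. All module names, dimensions, and the Euler schedule below are illustrative assumptions, not SmolVLA's actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative sizes; SmolVLA's real dimensions differ.
ACTION_DIM, CHUNK_LEN, COND_DIM, STEPS = 6, 50, 512, 10

class ActionExpert(nn.Module):
    """Toy velocity-field network: (noisy action chunk, conditioning, time) -> velocity."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CHUNK_LEN * ACTION_DIM + COND_DIM + 1, 1024),
            nn.GELU(),
            nn.Linear(1024, CHUNK_LEN * ACTION_DIM),
        )

    def forward(self, noisy_chunk, cond, t):
        flat = noisy_chunk.flatten(1)                      # (B, CHUNK_LEN * ACTION_DIM)
        x = torch.cat([flat, cond, t], dim=1)
        return self.net(x).view(-1, CHUNK_LEN, ACTION_DIM)

@torch.no_grad()
def sample_action_chunk(expert, cond):
    """Euler-integrate the learned velocity field from noise to an action chunk."""
    x = torch.randn(cond.shape[0], CHUNK_LEN, ACTION_DIM)  # start from Gaussian noise
    for i in range(STEPS):
        t = torch.full((cond.shape[0], 1), i / STEPS)
        x = x + expert(x, cond, t) / STEPS                 # x <- x + v(x, t) * dt
    return x

cond = torch.randn(1, COND_DIM)   # stand-in for features from the truncated VLM
chunk = sample_action_chunk(ActionExpert(), cond)
print(chunk.shape)                # torch.Size([1, 50, 6])
```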
Asynchronous Inference and Efficient Control
The model's asynchronous inference capability is key to its performance. Unlike synchronous modes that pause between predictions, SmolVLA pipelines execution and inference, allowing robots to request the next action chunk while performing the current one. This results in greater responsiveness and smoother task execution.
Inference can also be offloaded to a remote policy server, enabling real-time deployment even on low-cost consumer devices. Benchmarks show that asynchronous inference allows SmolVLA-equipped robots to complete 2× more tasks within fixed time constraints compared to synchronous setups.
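In rough terms, the pipelining works by requesting the next chunk before the current one finishes executing. The sketch below illustrates the idea with a background thread standing in for the (possibly remote) policy server; request_next_chunk and the timing constants are hypothetical, not SmolVLA's actual API.

```python
import queue
import threading
import time

CHUNK_LEN = 50

observations = queue.Queue(maxsize=1)   # latest observation sent to the policy
chunks = queue.Queue(maxsize=1)         # predicted action chunks coming back

def request_next_chunk(observation):
    """Stand-in for a call to the policy server (hypothetical API)."""
    time.sleep(0.3)                     # simulated inference + network latency
    return [f"action_{i}" for i in range(CHUNK_LEN)]

def inference_worker():
    while True:
        chunks.put(request_next_chunk(observations.get()))

threading.Thread(target=inference_worker, daemon=True).start()

observations.put("initial_observation")
current = chunks.get()                       # only the first chunk requires a full wait
for step in range(3):
    observations.put(f"observation_{step}")  # ask for the next chunk up front...
    for action in current:
        time.sleep(0.01)                     # ...while executing the current one
    current = chunks.get()                   # next chunk should already be waiting
print("finished without pausing between chunks")
```

A synchronous loop would instead block for the full 0.3 s of inference between every chunk; here that latency is hidden behind the 0.5 s it takes to execute the current chunk.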
Community Data, Real-World Variability
SmolVLA is pretrained on 10 million frames curated from 487 community datasets tagged under “lerobot” on Hugging Face. These datasets span a variety of environments — from labs to living rooms — and were selected for diversity over size. Unlike benchmark datasets, these include noisy labels, inconsistent camera views, and suboptimal demonstrations, mimicking real-world complexity.
To standardize this data, the team remapped inconsistent camera views to a common naming scheme and used Qwen2.5-VL-3B-Instruct to automatically refine task instructions, rewriting labels to be short and clear for more consistent training.
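A minimal sketch of that kind of rewriting pass is below, assuming a text-only prompt per label; the team's exact prompt, and whether they also conditioned on video frames, are not specified here.

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

def refine_instruction(raw_label: str) -> str:
    """Ask the VLM to rewrite a noisy task label as one short, clear instruction."""
    messages = [{
        "role": "user",
        "content": [{
            "type": "text",
            "text": f"Rewrite this robot task label as one short, clear instruction: {raw_label}",
        }],
    }]
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[prompt], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    # Decode only the newly generated tokens.
    return processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0].strip()

print(refine_instruction("task: pick red block tabletop cam2 retry"))
```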
Pretraining on this dataset raised SmolVLA’s success rate on the SO100 task suite from 51.7% to 78.3%. Further multitask finetuning improved generalization on unseen object configurations and control setups.
Training and Deployment
SmolVLA is released with a complete training and deployment stack. Users can fine-tune the model with the LeRobot framework or build on its architecture-level components. The model runs on a single consumer GPU and even on CPU, including on MacBooks.
Training from scratch or fine-tuning from the base checkpoint is supported using simple commands from the lerobot repository.
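As a rough illustration of the deployment path, the snippet below loads the released base checkpoint through LeRobot's Python API. The import path and observation keys are assumptions that may differ across lerobot versions, and the exact training flags are documented in the lerobot repository.

```python
# A minimal sketch, assuming lerobot's SmolVLA policy API (paths/keys may vary by version).
import torch
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
policy.eval()

# Hypothetical observation batch: one camera frame, robot state, and a language goal.
observation = {
    "observation.images.top": torch.rand(1, 3, 256, 256),
    "observation.state": torch.zeros(1, 6),
    "task": ["pick up the red cube"],
}
with torch.no_grad():
    action = policy.select_action(observation)
print(action.shape)

# Fine-tuning is launched from the repository's training script, e.g. (flags approximate;
# see the lerobot docs):
#   python lerobot/scripts/train.py --policy.path=lerobot/smolvla_base \
#       --dataset.repo_id=<your_dataset>
```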
A Step Toward Open Robotics
With SmolVLA, Hugging Face continues its push for open and reproducible AI tools. By releasing a performant robotics model built entirely on decentralized data and low-cost hardware, the company hopes to lower the barrier to generalist robotics research.
SmolVLA and its datasets are available on GitHub and the Hugging Face Hub. More technical details are included in the accompanying report and documentation.
About the Author
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].