Why Robots Are Still Dumb While Chatbots Get All the Glory
You can scrape the web for words. You can’t scrape it for folding laundry.
Large language models are growing at a breakneck pace. GPT-4 is widely rumored, though never officially confirmed, to use well over a trillion parameters. The largest Llama 3 release, Llama 3.1 405B, weighs in at 405 billion. Claude 4's size isn't disclosed at all; outside estimates land in the hundreds of billions. These numbers are mind-bending, and they reflect how far we've come in just a few years.
But here’s the contrast nobody talks about.
Robotic foundation models? They're tiny in comparison. PaLM-E pairs a pretrained language model with a 22-billion-parameter vision encoder. RT-2's largest variant has 55 billion. OpenVLA is just 7 billion. Octo-Base? A mere 93 million.
What’s going on here? Why are robots being left behind?
Why Robotic Models Lag So Far Behind
The main bottleneck is not compute or model design. It’s data.
LLMs feast on text. And we have a firehose of it — books, forums, Wikipedia, Reddit, StackOverflow. There are entire companies whose only job is to clean and curate this ocean of text.
Robotic models need something else: embodied experience. They need to see, touch, move, and act in the real (or realistically simulated) world. You can’t just scrape YouTube and call it a day.
The comparison gets even harder when you realize that LLMs are general-purpose. Trained once, they can answer trivia, write poetry, generate code. But robotic models? Every new task — folding laundry, opening a door, pouring coffee — needs its own set of demonstrations. The data problem scales with the number of tasks.
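To get a feel for how quickly that scales, here's a rough back-of-envelope sketch. Every number in it is an illustrative assumption, not a measurement from any real data-collection effort.

```python
# Back-of-envelope scaling of demonstration collection.
# All figures below are illustrative assumptions, not measured data.
tasks = 500                # distinct manipulation tasks to cover
demos_per_task = 1_000     # demonstrations needed per task
minutes_per_demo = 2       # average episode length, including resets

total_minutes = tasks * demos_per_task * minutes_per_demo
robot_years = total_minutes / 60 / 24 / 365

print(f"{total_minutes:,} minutes of robot time")                   # 1,000,000 minutes
print(f"about {robot_years:.1f} robot-years of nonstop operation")  # about 1.9
```

Under these made-up numbers, that's roughly two years of continuous operation for a single robot, before you vary objects, lighting, or hardware.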
The Real Bottleneck: Demonstration at Scale
To train a robotic foundation model, you need massive numbers of demonstrations. That means robots — real or simulated — performing tasks again and again. And not just the successful trials. Failure data matters too.
There are two main ways to get this data:
Teleoperation, where a human controls the robot and shows it how to do the task
Algorithmic execution, where traditional planning or control algorithms do the task, and that data is logged for training
Both approaches are in play. Neither is easy to scale.
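To make the teleoperation path concrete, here's a minimal sketch of what logging a single demonstration might look like. The `robot` and `teleop` interfaces and all field names are hypothetical, not taken from any particular framework; the point is that each demonstration is a stream of synchronized observations and actions plus an outcome label, and that failed episodes are worth keeping too.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One demonstration: synchronized observations, actions, and an outcome."""
    task: str
    observations: list = field(default_factory=list)  # e.g. camera frames, joint states
    actions: list = field(default_factory=list)       # e.g. end-effector deltas, gripper commands
    success: bool = False                              # keep failures too; they carry signal

def record_teleop_episode(robot, teleop, task: str, hz: float = 10.0) -> Episode:
    """Log a human-teleoperated demonstration at a fixed control rate.

    `robot` and `teleop` are hypothetical interfaces: `robot` exposes the
    current observation and executes actions, `teleop` streams the operator's commands.
    """
    ep = Episode(task=task)
    period = 1.0 / hz
    while not teleop.done():
        obs = robot.get_observation()   # images plus proprioception
        act = teleop.get_command()      # the operator's commanded action
        robot.apply_action(act)         # execute on the real or simulated robot
        ep.observations.append(obs)
        ep.actions.append(act)
        time.sleep(period)              # crude fixed-rate control loop
    ep.success = teleop.mark_success()  # operator labels the outcome
    return ep
```

Note the success flag at the end: a corpus made only of clean successes hides exactly the situations a policy most needs to learn from.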
This is not like training a vision model on ImageNet. You don't just collect and label. You need physical interaction, repeated again and again, across diverse environments and embodiments.
That’s why robotic models are small. Not because we don’t know how to build bigger ones. But because we don’t have the data to feed them.
The Next Leap Will Belong to the Data Makers
Here’s the hard truth: the size of the next robotic model will depend on who can generate the most useful data at scale.
This is the race that matters.
The real winners in robotic AI won’t be the ones with the biggest GPUs or the cleverest architectures. They’ll be the ones who figure out how to make thousands — maybe millions — of robots learn from experience, together.
Whether it’s through simulation, shared learning, or fleets of real-world robots, the future belongs to those who build the dataset that powers general-purpose manipulation.
That’s the frontier.
And it’s wide open.
If this post helped clarify the robotics vs LLM gap, consider subscribing to BuildRobotz. I write regularly about robotics, AI, and what it takes to build machines that move.