This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.
The holy grail of robotics since the field’s beginning has been to build a robot that can do our housework. But for a long time, that has just been a dream. While roboticists have been able to get robots to do impressive things in the lab, such as parkour, this usually requires meticulous planning in a tightly controlled setting. This makes it hard for robots to work reliably in homes around children and pets. Homes have wildly varying floor plans and contain all sorts of mess.
There’s a well-known observation among roboticists called Moravec’s paradox: What is hard for humans is easy for machines, and what is easy for humans is hard for machines. Thanks to AI, this is now changing. Robots are starting to become capable of doing tasks such as folding laundry, cooking, and unloading shopping baskets, which not too long ago were seen as almost impossible.
In our most recent cover story for the MIT Technology Review print magazine, I looked at how robotics as a field is at an inflection point. You can read more here. A really exciting mix of things is converging in robotics research, which could usher in robots that might, just might, make it out of the lab and into our homes.
Here are three reasons why robotics is on the brink of having its own “ChatGPT moment.”
1. Cheap hardware makes research more accessible
Robots are expensive. Highly sophisticated robots can easily cost hundreds of thousands of dollars, which makes them inaccessible for most researchers. For example, the PR2, one of the earliest iterations of home robots, weighed 450 pounds (200 kilograms) and cost $400,000.
But new, cheaper robots are allowing more researchers to do cool stuff. A new robot called Stretch, developed by startup Hello Robot, launched during the pandemic with a much more reasonable price tag of around $18,000 and a weight of 50 pounds. It has a small mobile base, a stick with a camera dangling off it, and an adjustable arm featuring a gripper with suction cups at the ends. It can be controlled with a console controller.
Meanwhile, a team at Stanford has built a system called Mobile ALOHA (a loose acronym for “a low-cost open-source hardware teleoperation system”) that learned to cook shrimp with the help of just 20 human demonstrations and data from other tasks. They used off-the-shelf components to cobble together robots with more reasonable price tags in the tens, not hundreds, of thousands of dollars.
2. AI is helping us build “robotic brains”
What separates this new crop of robots is their software. Thanks to the AI boom, the focus is now shifting from feats of physical dexterity achieved by expensive robots to building “general-purpose robot brains” in the form of neural networks. Instead of the traditional painstaking planning and training, roboticists have started using deep learning and neural networks to create systems that learn from their environment on the go and adjust their behavior accordingly.
Last summer, Google launched a vision-language-action model called RT-2. This model gets its general understanding of the world from the online text and images it has been trained on, as well as its own interactions. It translates that data into robotic actions.
And researchers at the Toyota Research Institute, Columbia University, and MIT have been able to quickly teach robots to do many new tasks with the help of an AI learning technique called imitation learning, plus generative AI. They believe they have found a way to extend the technology propelling generative AI from the realm of text, images, and videos into the domain of robot movements.
Many others have taken advantage of generative AI as well. Covariant, a robotics startup that spun off from OpenAI’s now-shuttered robotics research unit, has built a multimodal model called RFM-1. It can accept prompts in the form of text, image, video, robot instructions, or measurements. Generative AI allows the robot to both understand instructions and generate images or videos relating to those tasks.
3. More data allows robots to learn more skills
The power of large AI models such as GPT-4 lies in the reams and reams of data hoovered from the internet. But that doesn’t really work for robots, which need data that has been specifically collected for robots. They need physical demonstrations of how washing machines and fridges are opened, dishes picked up, or laundry folded. Right now that data is very scarce, and it takes a long time for humans to collect.
A new initiative kick-started by Google DeepMind, called the Open X-Embodiment Collaboration, aims to change that. Last year, the company partnered with 34 research labs and about 150 researchers to collect data from 22 different robots, including Hello Robot’s Stretch. The resulting data set, which was published in October 2023, consists of robots demonstrating 527 skills, such as picking, pushing, and moving.
Early signs show that more data is leading to smarter robots. The researchers built two versions of a model for robots, called RT-X, that could be either run locally on individual labs’ computers or accessed via the web. The larger, web-accessible model was pretrained with internet data to develop a “visual common sense,” or a baseline understanding of the world, from the large language and image models. When the researchers ran the RT-X model on many different robots, they discovered that the robots were able to learn skills 50% more successfully than in the systems each individual lab was developing.
Now read the rest of The Algorithm
Deeper Learning
Generative AI can turn your most precious memories into photos that never existed
Maria grew up in Barcelona, Spain, in the 1940s. Her first memories of her father are vivid. As a six-year-old, Maria would visit a neighbor’s apartment in her building when she wanted to see him. From there, she could peer through the railings of a balcony into the prison below and try to catch a glimpse of him through the small window of his cell, where he was locked up for opposing the dictatorship of Francisco Franco. There is no photo of Maria on that balcony. But she can now hold something like it: a fake photo—or memory-based reconstruction.
Remember this: Dozens of people have now had their memories turned into images in this way via Synthetic Memories, a project run by Barcelona-based design studio Domestic Data Streamers. Read this story by my colleague Will Douglas Heaven to find out more.
Bits and Bytes
Why the Chinese government is sparing AI from harsh regulations—for now
The way China regulates its tech industry can seem highly unpredictable. The government can celebrate the achievements of Chinese tech companies one day and then turn against them the next. But there are patterns in China’s approach, and they indicate how it’ll regulate AI. (MIT Technology Review)
AI could make better beer. Here’s how.
New AI models can accurately identify not only how tasty consumers will deem beers, but also what kinds of compounds brewers should be adding to make them taste better, according to research. (MIT Technology Review)
OpenAI’s legal troubles are mounting
OpenAI is lawyering up as it faces a deluge of lawsuits both at home and abroad. The company has hired about two dozen in-house lawyers since last spring to work on copyright claims, and is also hiring an antitrust lawyer. The company’s new strategy is to try to position itself as America’s bulwark against China. (The Washington Post)
Did Google’s AI actually discover millions of new materials?
Late last year, Google DeepMind claimed it had discovered millions of new materials using deep learning. But researchers who analyzed a subset of DeepMind’s work found that the company’s claims may have been overhyped, and that the company hadn’t found materials that were useful or credible. (404 Media)
OpenAI and Meta are building new AI models capable of “reasoning”
The next generation of powerful AI models from OpenAI and Meta will be able to do more complex tasks, such as reason, plan, and retain more information. This, tech companies believe, will allow them to be more reliable and not make the kind of silly mistakes that this generation of language models is so prone to. (The Financial Times)