Aided by A.I. Language Models, Google’s Robots Are Getting Smart

admin 28 July 2023

A one-armed robot stood in front of a table. On the table sat three plastic figurines: a lion, a whale and a dinosaur.

An engineer gave the robot an instruction: “Pick up the extinct animal.”

The robot whirred for a moment, then its arm extended and its claw opened and descended. It grabbed the dinosaur.

Until very recently, this demonstration, which I witnessed during a podcast interview at Google’s robotics division in Mountain View, Calif., last week, would have been impossible. Robots weren’t able to reliably manipulate objects they had never seen before, and they certainly weren’t capable of making the logical leap from “extinct animal” to “plastic dinosaur.”

Google’s robot being prompted to pick up the extinct animal. Credit: Video by Kelsey McClellan for The New York Times

But a quiet revolution is underway in robotics, one that piggybacks on recent advances in so-called large language models — the same type of artificial intelligence system that powers ChatGPT, Bard and other chatbots.

Google has recently begun plugging state-of-the-art language models into its robots, giving them the equivalent of artificial brains. The secretive project has made the robots far smarter and given them new powers of understanding and problem-solving.

I got a glimpse of that progress during a private demonstration of Google’s latest robotics model, called RT-2. The model, which is being unveiled on Friday, amounts to a first step toward what Google executives described as a major leap in the way robots are built and programmed.

“We’ve had to reconsider our entire research program as a result of this change,” said Vincent Vanhoucke, Google DeepMind’s head of robotics. “A lot of the things that we were working on before have been entirely invalidated.”

“A lot of the things that we were working on before have been entirely invalidated,” said Vincent Vanhoucke, head of robotics at Google DeepMind. Credit: Kelsey McClellan for The New York Times

Robots still fall short of human-level dexterity and fail at some basic tasks, but Google’s use of A.I. language models to give robots new skills of reasoning and improvisation represents a promising breakthrough, said Ken Goldberg, a robotics professor at the University of California, Berkeley.

“What’s very impressive is how it links semantics with robots,” he said. “That’s very exciting for robotics.”

To understand the magnitude of this, it helps to know a little about how robots have conventionally been built.

For years, the way engineers at Google and other companies trained robots to do a mechanical task — flipping a burger, for example — was by programming them with a specific list of instructions. (Lower the spatula 6.5 inches, slide it forward until it encounters resistance, raise it 4.2 inches, rotate it 180 degrees, and so on.) Robots would then practice the task again and again, with engineers tweaking the instructions each time until they got it right.

This approach worked for certain, limited uses. But training robots this way is slow and labor-intensive. It requires collecting lots of data from real-world tests. And if you wanted to teach a robot to do something new — to flip a pancake instead of a burger, say — you usually had to reprogram it from scratch.

Partly because of these limitations, hardware robots have improved less quickly than their software-based siblings. OpenAI, the maker of ChatGPT, disbanded its robotics team in 2021, citing slow progress and a lack of high-quality training data. In 2017, Google’s parent company, Alphabet, sold Boston Dynamics, a robotics company it had acquired, to the Japanese tech conglomerate SoftBank. (Boston Dynamics is now owned by Hyundai and seems to exist mainly to produce viral videos of humanoid robots performing terrifying feats of agility.)

Google engineers with robots where work on RT-2 has taken place. Credit: Kelsey McClellan for The New York Times

A closer look at Google’s robot, which has new capabilities from a large language model. Credit: Kelsey McClellan for The New York Times

In recent years, researchers at Google had an idea. What if, instead of being programmed for specific tasks one by one, robots could use an A.I. language model — one that had been trained on vast swaths of internet text — to learn new skills for themselves?

“We started playing with these language models around two years ago, and then we realized that they have a lot of knowledge in them,” said Karol Hausman, a Google research scientist. “So we started connecting them to robots.”

Google’s first attempt to join language models and physical robots was a research project called PaLM-SayCan, which was revealed last year. It drew some attention, but its usefulness was limited. The robots lacked the ability to interpret images — a crucial skill, if you want them to be able to navigate the world. They could write out step-by-step instructions for different tasks, but they couldn’t turn those steps into actions.

Google’s new robotics model, RT-2, can do just that. It’s what the company calls a “vision-language-action” model, or an A.I. system that has the ability not just to see and analyze the world around it, but to tell a robot how to move.

It does so by translating the robot’s movements into a series of numbers — a process called tokenizing — and incorporating those tokens into the same training data as the language model. Eventually, just as ChatGPT or Bard learns to guess what words should come next in a poem or a history essay, RT-2 can learn to guess how a robot’s arm should move to pick up a ball or throw an empty soda can into the recycling bin.

“In other words, this model can learn to speak robot,” Mr. Hausman said.
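For readers who want a concrete picture of what “tokenizing” a movement could look like, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Google’s method: the article does not describe how RT-2 encodes actions, so the 256-bin vocabulary, the normalized range of -1 to 1, and the function names `tokenize_action` and `detokenize_action` are all hypothetical.

```python
from typing import List, Sequence

# Assumption for illustration only: each action dimension is split into 256 bins.
NUM_BINS = 256


def tokenize_action(action: Sequence[float], low: float = -1.0,
                    high: float = 1.0, num_bins: int = NUM_BINS) -> List[int]:
    """Map each continuous action value to an integer token in [0, num_bins - 1]."""
    tokens = []
    for value in action:
        # Clamp out-of-range values so they fall in the first or last bin.
        clipped = min(max(value, low), high)
        fraction = (clipped - low) / (high - low)
        # The top edge (fraction == 1.0) would index one past the end, so cap it.
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def detokenize_action(tokens: Sequence[int], low: float = -1.0,
                      high: float = 1.0, num_bins: int = NUM_BINS) -> List[float]:
    """Map integer tokens back to the center value of their bins."""
    width = (high - low) / num_bins
    return [low + (token + 0.5) * width for token in tokens]


# Hypothetical arm command: three normalized displacement values.
command = [-1.0, 0.0, 1.0]
tokens = tokenize_action(command)
recovered = detokenize_action(tokens)
```

Once movements are integers like these, they can sit in the same sequence as word tokens, which is what lets a language model predict them the way it predicts the next word. The round trip loses a little precision, at most half a bin width per value.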

In an hourlong demonstration, which took place in a Google office kitchen littered with objects from a dollar store, my podcast co-host and I saw RT-2 perform a number of impressive tasks. One was successfully following complex instructions like “move the Volkswagen to the German flag,” which RT-2 did by finding and snagging a model VW Bus and setting it down on a miniature German flag several feet away.

Two Google engineers, Ryan Julian, left, and Quan Vuong, successfully instructed RT-2 to “move the Volkswagen to the German flag.” Credit: Kelsey McClellan for The New York Times

It also proved capable of following instructions in languages other than English, and even making abstract connections between related concepts. Once, when I wanted RT-2 to pick up a soccer ball, I instructed it to “pick up Lionel Messi.” RT-2 got it right on the first try.

The robot wasn’t perfect. It incorrectly identified the flavor of a can of LaCroix placed on the table in front of it. (The can was lemon; RT-2 guessed orange.) Another time, when it was asked what kind of fruit was on a table, the robot simply answered “white.” (It was a banana.) A Google spokeswoman said the robot had used a cached answer to a previous tester’s question because its Wi-Fi had briefly gone out.

RT-2 can learn to guess how a robot’s arm should move to pick up an empty soda can. Credit: Video by Kelsey McClellan for The New York Times

Google has no immediate plans to sell RT-2 robots or release them more widely, but its researchers believe these new language-equipped machines will eventually be useful for more than just parlor tricks. Robots with built-in language models could be put into warehouses, used in medicine or even deployed as household assistants — folding laundry, unloading the dishwasher, picking up around the house, they said.

“This really opens up using robots in environments where people are,” Mr. Vanhoucke said. “In office environments, in home environments, in all the places where there are a lot of physical tasks that need to be done.”

Of course, moving objects around in the messy, chaotic physical world is harder than doing it in a controlled lab. And given that A.I. language models frequently make mistakes or invent nonsensical answers — which researchers call hallucination or confabulation — using them as the brains of robots could introduce new risks.

But Mr. Goldberg, the Berkeley robotics professor, said those risks were still remote.

“We’re not talking about letting these things run loose,” he said. “In these lab environments, they’re just trying to push some objects around on a table.”

Google has recently begun plugging state-of-the-art language models into its hardware robots, giving them the equivalent of artificial brains. Credit: Video by Kelsey McClellan for The New York Times

Google, for its part, said RT-2 was equipped with plenty of safety features. In addition to a big red button on the back of every robot — which stops the robot in its tracks when pressed — the system uses sensors to avoid bumping into people or objects.

The A.I. software built into RT-2 has its own safeguards, which it can use to prevent the robot from doing anything harmful. One benign example: Google’s robots can be trained not to pick up containers with water in them, because water can damage their hardware if it spills.

If you’re the kind of person who worries about A.I. going rogue — and Hollywood has given us plenty of reasons to fear that scenario, from the original “Terminator” to last year’s “M3gan” — the idea of making robots that can reason, plan and improvise on the fly probably strikes you as a terrible idea.

But at Google, it’s the kind of idea researchers are celebrating. After years in the wilderness, hardware robots are back — and they have their chatbot brains to thank.
