NVIDIA Researcher and Three Young Roboticists on the Future of Embodied AI

Dr. Jim Fan, senior research manager at NVIDIA, led an engaging panel discussion at the GenAI Summit 2024 in San Francisco this week, featuring three young scientists making significant strides in robotics. The conversation focused on the future of embodied AI, which involves integrating AI into physical entities, such as robots.

Dr. Fan, who leads the AI Agents Initiative at NVIDIA, emphasized how hard it is to give robots human-level control. Large language models such as ChatGPT can now handle natural language tasks at close to a human level, he noted, but controlling robots remains difficult. "The reason is that you can download lots and lots of text and data from the internet, but you cannot download the control data for robotics," he explained, referencing Moravec's paradox: the observation that high-level reasoning is comparatively easy for computers, while the sensorimotor skills humans perform effortlessly are hard to reproduce.

Tony Zhao: From Mechanical Engineering to Imitation Learning
Tony Zhao, a third-year PhD student at Stanford, shared his journey from mechanical engineering to robotics and machine learning. Initially focused on control theory for mechanical systems, Zhao moved to reinforcement learning and later to imitation learning. The projects he has worked on, including the ALOHA series, demonstrate how scaling data collection and training large generative models can enable robots to perform complex tasks that were previously thought to be extremely difficult. Stanford's Mobile ALOHA is a low-cost AI robot with a "whole-body" teleoperation system. Mobile ALOHA has autonomously performed intricate daily tasks, including tying shoelaces, cracking eggs, cooking shrimp, cleaning wine spills, and rinsing pans. Zhao worked on the project with fellow panelist Zipeng Fu, and they were advised by Stanford computer science professor Chelsea Finn.
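Imitation learning in its simplest form, behavior cloning, treats each recorded teleoperation step as a supervised (observation, action) example and fits a policy that maps observations to the actions the human operator took. The sketch below is a deliberately tiny illustration under stated assumptions: a one-dimensional observation (a hypothetical distance to a target, in meters), a one-dimensional action (a hypothetical commanded velocity), a linear policy, and made-up demonstration data. It is not how ALOHA or Mobile ALOHA is implemented; the projects described here train large generative models on far richer sensor data.

```python
from typing import List, Tuple

# Hypothetical teleoperation log: (observation, action) pairs, where the
# observation is the gripper's distance to a target (m) and the action is
# the velocity (m/s) the human operator commanded at that moment.
DEMOS: List[Tuple[float, float]] = [
    (0.4, -0.2),
    (0.2, -0.1),
    (0.1, -0.05),
    (0.0, 0.0),
]


def fit_linear_policy(demos: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Behavior cloning by ordinary least squares: fit action = w * obs + b."""
    if len(demos) < 2:
        raise ValueError("need at least two demonstrations")
    n = len(demos)
    mean_obs = sum(o for o, _ in demos) / n
    mean_act = sum(a for _, a in demos) / n
    cov = sum((o - mean_obs) * (a - mean_act) for o, a in demos)
    var = sum((o - mean_obs) ** 2 for o, _ in demos)
    if var == 0:
        raise ValueError("observations must vary to fit a slope")
    w = cov / var
    b = mean_act - w * mean_obs
    return w, b


def act(w: float, b: float, obs: float) -> float:
    """Run the cloned policy on a new observation."""
    return w * obs + b


if __name__ == "__main__":
    w, b = fit_linear_policy(DEMOS)
    # The operator slowed down in proportion to the remaining distance; the
    # cloned policy reproduces that behavior on a state it never saw (0.3 m).
    print(f"w={w:.3f}, b={b:.3f}, action at 0.3 m: {act(w, b, 0.3):.3f}")
```

The design choice here is the point of behavior cloning: no reward function is written by hand, and the policy simply imitates whatever the demonstrations show, which is why the quantity and diversity of collected data matter so much.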

Cheng Chi: Innovating with Diffusion Models and Real-World Data Collection
Cheng Chi, who recently defended his PhD at Columbia University, detailed his work on diffusion models for real-time robotic inference and control. His project, Diffusion Policy, uses imitation learning to enable robots to perform tasks learned from data collected through manual teleoperation. Chi emphasized the importance of data collection in robotics, noting that tools like the Universal Manipulation Interface (UMI) can significantly increase the availability of real-world data for training robust robotic systems. UMI is a data collection and policy learning framework that allows direct skill transfer from in-the-wild human demonstrations to deployable robot policies.

"Teleoperation," the technical term for operating a machine, system, or robot from a distance, is also known as remote control or "telerobotics." The prefix "tele-" means "at a distance," and teleoperation can span distances from centimeters to millions of kilometers.

Zipeng Fu: Advancing Robotics with Learning-Based Approaches
Zipeng Fu, also a Stanford PhD student, discussed his work on using machine learning and imitation learning to train deployable robot systems. Fu's research leverages simulation-generated data to teach robots tasks such as parkour, which traditionally required extensive human engineering. He highlighted the shift in robotics toward manipulation tasks that involve contact with soft objects and fluids, and the use of advances in imitation learning to improve efficiency.

"Simulation-generated data" is a form of "synthetic data," which refers to artificially generated datasets that replicate real-world data patterns, distributions, and correlations. Such data can be created by generating random numbers drawn from a chosen probability distribution, by running a stochastic (randomly determined) process, or, in robotics, by having virtual robots perform tasks inside a physics simulator.

Future Prospects and Hardware Evolution
Beyond software challenges and advances, the panelists underscored how the evolution of hardware shapes robotics, debating the merits of minimalist versus maximalist approaches. Dr. Fan shared insights into NVIDIA's Project GR00T, aimed at building AI brains for humanoid robots, while Zhao advocated for simpler, more intelligent hardware solutions. The discussion also touched on the potential of diverse, low-cost sensors to enhance robotic capabilities.

Timeline for Generalist Robots
Addressing the timeline for achieving generalist robots, the panelists were cautiously optimistic. They predicted that within the next few years, significant breakthroughs would enable robots to perform multiple tasks reliably. However, they acknowledged that widespread deployment would require overcoming production, societal, ethical, and safety challenges.

"Generalist robots," as the name implies, are designed to perform a broad array of tasks rather than a single specialized function. Because one machine can cover many jobs without substantial investment in task-specific equipment, they can make robotics accessible to many businesses previously priced out of the market. With advances in robotics technology, these robots have become increasingly versatile, capable of performing tasks beyond the scope of earlier generations. In addition to automating repetitive and dangerous tasks, the new generation of generalist robots can work alongside human operators in new ways.

Dr. Fan concluded the discussion by expressing confidence in near-term advances in robotics research. "I believe a breakthrough will be within the next three years, if not sooner," he said, while noting the complexities of integrating robots into daily life.

The panel discussion highlighted the dynamic interplay of data, algorithms, and hardware in advancing embodied AI, setting the stage for future innovations in robotics.

This year's GenAI Summit, underway this week in San Francisco's Palace of Fine Arts, is the second annual event organized by GPT Dao, a global generative AI community. According to event organizers, this year's summit drew an estimated 10,000 attendees and 300 exhibitors. The list of exhibitors at this year's conference includes Microsoft, IBM, and Amazon. (A complete list is available on the conference website.)

(The "Dao" in GPT Dao refers to a DAO, or Decentralized Autonomous Organization. DAOs operate based on smart contracts executed on a decentralized network, typically a blockchain, which allows for decentralized decision-making and control over the organization’s assets and operations.)

GPT Dao provides a range of services to its community, including Web3 and AI project incubation, GPT investment research and education, and AI infrastructure services. It also offers a community governance platform designed to let members propose, discuss, and vote on changes, new features, and initiatives, as well as a decentralized crowdfunding platform that allows anyone to invest in or contribute to a project, regardless of location or financial status.


About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at