NVIDIA has unveiled an intriguing new AI system called Eureka that leverages large language models (LLMs) to automate and enhance robot reward function design for complex task learning. Using OpenAI’s GPT-4 as its backbone, Eureka has taught robots to master roughly 30 difficult skills, including pen spinning, drawer opening, ball passing, and more.
Developed primarily by NVIDIA Research, Eureka interfaces with the company’s Isaac Gym physics simulation software to enable powerful reinforcement learning capabilities. It represents a pioneering approach to combining generative AI with robotic reinforcement learning.
According to Anima Anandkumar, NVIDIA’s Senior Director of AI Research, reward function design remains an ongoing challenge in reinforcement learning and currently involves a great deal of manual trial and error. Eureka aims to streamline this by having GPT-4 generate reward formulations that align with developer intentions.
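To make the trial-and-error problem concrete, here is a minimal hand-written reward of the kind Eureka is meant to replace. The task, function name, and weight constants below are all hypothetical illustrations, not taken from Eureka's codebase; the point is that every constant is something a developer would otherwise tune by hand.

```python
import math

# Hypothetical hand-written reward for a "reach the target" task.
# These weights are exactly the constants a developer must tune by
# trial and error -- the manual step Eureka automates with GPT-4.
DIST_WEIGHT = 1.0
ACTION_PENALTY = 0.01

def manual_reward(dist_to_target: float, action_magnitude: float) -> float:
    # Dense shaping term: reward shrinking the distance to the target.
    reach_term = math.exp(-DIST_WEIGHT * dist_to_target)
    # Penalize large actions to encourage smooth motion.
    effort_term = ACTION_PENALTY * action_magnitude ** 2
    return reach_term - effort_term

print(manual_reward(0.0, 0.0))  # at the target with no effort -> 1.0
```

Small changes to either weight can drastically change the learned behavior, which is why getting such functions right usually takes many training runs.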
In testing, Eureka’s AI-designed rewards improved training performance by over 50% on average compared with expert-written rewards, and enabled solving tasks that would otherwise demand expert-level manual reward engineering. The system can generate tailored rewards for a wide range of robot types, including quadrupeds, bipedal walkers, drones, and robotic arms.
Eureka removes the need for predefined reward templates or additional task prompts. The developer simply specifies the desired skill, such as ball throwing. GPT-4 then formulates a custom reward function optimized for that task and robot configuration, cutting out significant manual effort.
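The workflow above can be sketched as follows. This is an illustrative mock, not Eureka's actual API: `query_llm` stands in for a GPT-4 call and returns a canned response here so the sketch runs offline, and the generated reward's signature and weights are invented for the example.

```python
# Hypothetical sketch of zero-shot reward generation, Eureka-style.
# query_llm() is a stand-in for a GPT-4 API call; in Eureka the model
# sees the environment source code plus a task description and replies
# with executable reward code.
def query_llm(prompt: str) -> str:
    return (
        "def reward(ball_velocity, target_hit):\n"
        "    # Encourage fast throws and strongly reward hitting the target.\n"
        "    return 0.1 * ball_velocity + (10.0 if target_hit else 0.0)\n"
    )

task = "ball throwing"
prompt = f"Write a Python reward function for the task: {task}"

namespace = {}
exec(query_llm(prompt), namespace)   # compile the generated reward code
reward_fn = namespace["reward"]

print(reward_fn(5.0, True))   # -> 10.5
```

Because the reward arrives as plain source code, it can be inspected, edited, or re-prompted before any training run begins.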
By integrating with Isaac Gym’s GPU-accelerated simulation environment, Eureka can rapidly iterate and statistically evaluate large batches of reward function candidates in parallel. This allows systematically refining the formula to maximize training sample efficiency.
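A toy version of that refinement loop might look like the sketch below. It is a simplification under stated assumptions: `train_and_score` is a stand-in for a full Isaac Gym RL run, and candidates are generated by perturbing a single scalar weight, whereas Eureka has GPT-4 mutate entire reward programs.

```python
import random

random.seed(0)

# Stand-in for training a policy under a candidate reward and
# measuring task success; hypothetically, the optimum is weight == 2.0.
def train_and_score(weight: float) -> float:
    return -(weight - 2.0) ** 2

best_weight, best_score = None, float("-inf")
for generation in range(3):
    # Eureka mutates the best reward's code via GPT-4; here we just
    # perturb a scalar weight to keep the sketch self-contained.
    center = best_weight if best_weight is not None else 0.0
    candidates = [center + random.uniform(-1.0, 1.0) for _ in range(16)]
    for w in candidates:   # evaluated in parallel on GPU in practice
        score = train_and_score(w)
        if score > best_score:
            best_weight, best_score = w, score

print(best_weight, best_score)
```

Each generation keeps the best-performing candidate and searches around it, which is the "evaluate a batch, refine, repeat" pattern the GPU-accelerated simulator makes fast.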
In benchmarks spanning 29 tasks and 10 robots, Eureka’s AI-designed rewards outperformed human expert-written ones 83% of the time, demonstrating the advantages of automated reward optimization powered by large language models. The results included mastering extremely difficult skills such as pen spinning, which requires accurately simulating complex contact physics.
Eureka also enables a new form of interactive reward tuning. Developers can provide natural language feedback to steer the system, allowing collaborative improvement of the reward function in real-time. This makes Eureka a powerful co-pilot for interactively designing robot behaviors.
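The feedback loop can be sketched as a growing prompt history that conditions each regeneration. Everything below is an illustrative mock: `respond` stands in for GPT-4, and the drawer task, feedback string, and reward signatures are invented for the example.

```python
# Sketch of Eureka-style interactive reward tuning: each round of
# natural-language feedback is appended to the prompt history, and
# the reward function is regenerated with that feedback in context.
def respond(history: list[str]) -> str:
    # Hypothetical "model": adds a jerk penalty once smoothness is requested.
    if any("smoother" in msg for msg in history):
        return "def reward(progress, jerk): return progress - 0.5 * jerk"
    return "def reward(progress, jerk): return progress"

history = ["Task: open the drawer"]
code = respond(history)                      # initial reward from the task prompt

history.append("Make the motion smoother")   # developer feedback in plain English
code = respond(history)                      # regenerated reward reflects it

namespace = {}
exec(code, namespace)
print(namespace["reward"](1.0, 1.0))  # -> 0.5: progress minus the new jerk penalty
```

The developer never edits reward code directly; steering happens entirely through conversational feedback.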
With Eureka, NVIDIA delivers an impressive demonstration of using AI to automate and enhance key elements of robotic reinforcement learning. By tapping into generative models and simulation, the complexity of reward design shifts from manual to ML-driven. Eureka provides a glimpse into a future where robots learn skills with greater speed, ease, and performance – unlocking their true potential.
Jim Fan, Senior AI Scientist at NVIDIA, also shared in a post on X (formerly Twitter): “As usual, we open-source everything! Welcome you all to check out our video gallery and try the codebase today: http://eureka-research.github.io”