FactorSim: Generative Simulation via Factorized Representation

Stanford University, NVIDIA



* Game sprites are generated with DALL-E 3.



Can we "solve" a simple RL benchmark just by reading its documentation?

FACTORSIM takes language documentation as input, uses Chain-of-Thought to derive a series of steps to be implemented, adopts a Factored POMDP representation to facilitate efficient context selection during each generation step, trains agents on the generated simulations, and tests the resulting policy on previously unseen RL environments.


Abstract

Generating simulations to train intelligent agents in game-playing and robotics from natural language input (e.g., user input or task documentation) remains an open-ended challenge. Existing approaches address only parts of this task, such as generating task hyperparameters, specifying reward functions, or populating the environment with assets, while omitting key elements of the simulation logic and dynamics (e.g., game mechanics). Unlike previous work, we aim to generate full simulations: we introduce FACTORSIM, which generates simulations of up to hundreds of lines of code for training RL agents. We recognize that coded simulations can be modeled as factored Partially Observable Markov Decision Processes (POMDPs). Leveraging this factorized representation and the model-view-controller paradigm, FACTORSIM decomposes generation into a series of steps, giving the Language Model only the minimal necessary context at each step and thereby reducing each step's reasoning complexity. For evaluation, we adopt a Reinforcement Learning benchmark and derive language descriptions of simulations from its documentation. We demonstrate that FACTORSIM outperforms many competitive methods in generating full game simulation code from scratch, both qualitatively and quantitatively, and achieves superior zero-shot transfer results after training on the generated simulations.
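For readers unfamiliar with the term, the textbook notion of a factored transition model may help. The notation below is the standard factored-MDP form and is an assumption here, not necessarily the exact formulation FACTORSIM uses: the state is split into n variables, and the next value of variable i depends only on a subset Z_i of the current variables (its scope) and on the action.

```latex
T(s' \mid s, a) \;=\; \prod_{i=1}^{n} T_i\!\left(s'[i] \,\middle|\, s[Z_i],\, a\right),
\qquad Z_i \subseteq \{1, \dots, n\}
```

When each scope Z_i is small, updating one variable requires looking at only a few others, which is the property that makes it plausible to give a language model a small context at each generation step.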

Zero-shot transfer results

Zero-shot performance on the original RL environments after training a PPO agent on generated environments.

FACTORSIM could help with generating robotic tasks.

FACTORSIM modularizes the code generation process into subtasks and has LLMs generate each subtask using only a set of necessary global states as context.
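The idea of restricting each subtask to the global states it needs can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the `StateVar` and `Step` classes, the `select_context` and `build_prompt` helpers, and the Flappy-Bird-like state variables are assumptions, not FACTORSIM's actual code or prompt format.

```python
from dataclasses import dataclass, field


@dataclass
class StateVar:
    """A global state variable of the simulation (hypothetical structure)."""
    name: str
    description: str


@dataclass
class Step:
    """One generation subtask plus the state variables it depends on."""
    instruction: str
    reads: set = field(default_factory=set)


def select_context(step, state_vars):
    """Return only the state variables the step declares as dependencies."""
    return [state_vars[name] for name in sorted(step.reads) if name in state_vars]


def build_prompt(step, state_vars):
    """Assemble a prompt that contains the selected variables and nothing else."""
    lines = [f"Task: {step.instruction}", "Relevant state variables:"]
    for var in select_context(step, state_vars):
        lines.append(f"- {var.name}: {var.description}")
    return "\n".join(lines)


# Assumed state of a Flappy-Bird-like game, used only as an example.
STATE_VARS = {
    "bird_y": StateVar("bird_y", "vertical position of the bird"),
    "bird_velocity": StateVar("bird_velocity", "vertical speed of the bird"),
    "pipe_x": StateVar("pipe_x", "horizontal position of the next pipe"),
    "score": StateVar("score", "number of pipes passed"),
}

gravity_step = Step(
    instruction="Apply gravity to the bird every frame.",
    reads={"bird_y", "bird_velocity"},
)

prompt = build_prompt(gravity_step, STATE_VARS)
```

In this sketch the gravity subtask sees only `bird_y` and `bird_velocity`; `pipe_x` and `score` never enter the prompt. In a real system the dependency set would presumably be chosen by the model or derived from the factored representation rather than written by hand as it is here.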