FactorSim

* Game sprites are generated with DALL-E 3.


Can we "solve" classic RL environments just by reading their documentation?

How do we generate prompt-aligned simulations?

FACTORSIM takes language documentation as input, uses Chain-of-Thought to derive a series of steps to be implemented, adopts a Factored POMDP representation to facilitate efficient context selection during each generation step, trains agents on the generated simulations, and tests the resulting policy on previously unseen RL environments.
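As a point of reference, the factorization that a factored (PO)MDP relies on can be written in its standard textbook form. The notation below is an assumption for illustration and may differ from the paper's own: the state is split into variables s[1], ..., s[n], and Z_i denotes the (hypothetical) set of variables that the i-th variable's update depends on.

```latex
% Standard factored transition model (illustrative notation, not taken from the source)
\[
  T(s' \mid s, a) \;=\; \prod_{i=1}^{n} T_i\bigl(s'[i] \,\big|\, s[Z_i],\, a\bigr),
  \qquad Z_i \subseteq \{1, \dots, n\}.
\]
```

When each Z_i is small, the code that updates s[i] only needs to see the variables in Z_i, which is the property that makes it possible to pass a reduced context to each generation step.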

Abstract

Generating simulations to train intelligent agents in game-playing and robotics from natural language input, whether user input or task documentation, remains an open-ended challenge. Existing approaches focus on parts of this challenge, such as generating reward functions or task hyperparameters. Unlike previous work, we introduce FACTORSIM, which generates full simulations in code from language input that can be used to train agents. Exploiting the structural modularity specific to coded simulations, we propose to use a factored partially observable Markov decision process (POMDP) representation that allows us to reduce context dependence during each step of the generation. For evaluation, we introduce a generative simulation benchmark that assesses the generated simulation code's accuracy and its effectiveness in facilitating zero-shot transfer in reinforcement learning settings. We show that FACTORSIM outperforms existing methods in generating simulations regarding prompt alignment (e.g., accuracy), zero-shot transfer abilities, and human evaluation. We also demonstrate its effectiveness in generating robotic tasks.

Zero-shot transfer results

Zero-shot performance on the original RL environments after training a PPO agent on generated environments.
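The evaluation protocol behind this figure can be sketched in a few lines: a policy is fixed after training and then scored on an environment it has never seen, with no further learning. The sketch below is only an illustration under stated assumptions. `ToyCorridor`, `evaluate_zero_shot`, and `trained_policy` are hypothetical names, the toy environment stands in for an original RL environment, and a hand-written policy stands in for a PPO agent trained on generated environments.

```python
from typing import Callable


class ToyCorridor:
    """Hypothetical stand-in environment: walk right to reach the goal."""

    def __init__(self, length: int, max_steps: int = 50):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self) -> int:
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action: int) -> tuple[int, float, bool]:
        # Action 1 moves right; any other action moves left (floored at 0).
        self.pos = self.pos + 1 if action == 1 else max(0, self.pos - 1)
        self.t += 1
        reached = self.pos >= self.length
        done = reached or self.t >= self.max_steps
        return self.pos, (1.0 if reached else 0.0), done


def evaluate_zero_shot(policy: Callable[[int], int], env: ToyCorridor,
                       episodes: int = 10) -> float:
    """Mean episodic return of a fixed policy; no learning happens here."""
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
    return total / episodes


# Placeholder for a policy trained elsewhere (assumption: always move right).
trained_policy = lambda obs: 1
original_env = ToyCorridor(length=8)
score = evaluate_zero_shot(trained_policy, original_env, episodes=5)
```

In a real experiment the policy would come from an RL library and the environment from the original benchmark; only the shape of the evaluation loop carries over.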

FactorSim can better generate robotics tasks:

FactorSim modularizes the code generation process into subtasks and has LLMs generate each subtask using only a set of necessary global states as context.
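The context-reduction idea described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than FactorSim's actual code: `Subtask`, `select_context`, and `build_prompt` are hypothetical names, the state variables are made up for a Flappy Bird-like game, and the relevant variables are listed by hand because the text above does not specify how they are chosen.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    """One generation step (hypothetical structure)."""
    description: str
    reads: set[str]                                 # variables the step must see
    writes: set[str] = field(default_factory=set)   # variables it may modify


def select_context(global_state: dict[str, str], subtask: Subtask) -> dict[str, str]:
    """Keep only the state-variable definitions this subtask depends on."""
    needed = subtask.reads | subtask.writes
    return {name: decl for name, decl in global_state.items() if name in needed}


def build_prompt(global_state: dict[str, str], subtask: Subtask) -> str:
    """Assemble an LLM prompt that contains the reduced context only."""
    context = select_context(global_state, subtask)
    lines = [f"{name} = {decl}" for name, decl in sorted(context.items())]
    return "Relevant state:\n" + "\n".join(lines) + f"\n\nTask: {subtask.description}"


# Illustrative global state (made-up variables and initial values).
global_state = {
    "bird_y": "250",
    "bird_velocity": "0.0",
    "pipe_x": "400",
    "pipe_gap_y": "200",
    "score": "0",
}

flap = Subtask(
    description="When the player presses space, set the bird's upward velocity.",
    reads={"bird_velocity"},
    writes={"bird_velocity"},
)
prompt = build_prompt(global_state, flap)
```

Here the prompt for the flap subtask mentions `bird_velocity` only, so unrelated variables such as `pipe_x` and `score` never enter the model's context for that step.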

BibTeX


@inproceedings{sun2024factorsim,
  title={FactorSim: Generative Simulation via Factorized Representation},
  author={Sun, Fan-Yun and Harini, SI and Yi, Angela and Zhou, Yihan and Zook, Alex and Tremblay, Jonathan and Cross, Logan and Wu, Jiajun and Haber, Nick},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024}
}

FactorSim: Generative Simulation via Factorized Representation

NeurIPS 2024
