ICCV 2025
We present a novel approach to text-to-3D generation that repurposes pretrained 2D diffusion models.
Our approach is motivated by the fact that high-quality 3D data is far scarcer than 2D data. Consequently, 2D generative models are much better trained than their 3D counterparts. We introduce an approach that leverages the knowledge learned by pretrained 2D diffusion models to improve generalization in 3D content creation.
We evaluate our method on zero-shot text-to-3D object generation tasks and demonstrate that it outperforms other 3D Gaussian generation methods in terms of generation quality and semantic alignment with text prompts. Additionally, we provide an in-depth analysis and ablation studies to verify that the knowledge learned from 2D generation can be effectively transferred to 3D generation tasks.
We introduce a large-scale dataset of high-quality 3D Gaussian splats. To construct this dataset, we sampled a diverse collection of over 200K 3D objects from SketchFab and applied Scaffold-GS with our proposed visibility ranking strategy to obtain per-object 3D Gaussian fittings. To ensure higher-quality fittings, we employed extensive computation with a combination of optimization objectives, running until full convergence. As a result, the dataset construction required approximately 3.8 A100 GPU years of computation.
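To put the compute figure in perspective, the short calculation below converts it into an average per-object budget. It is only a back-of-the-envelope sketch: it assumes exactly 200K objects (the text says "over 200K", so the true average is somewhat lower) and a 365-day year, and the variable names are illustrative.

```python
# Rough per-object fitting budget implied by the reported totals.
# Assumptions (not from the source): exactly 200,000 objects, 365-day year.
HOURS_PER_YEAR = 365 * 24

gpu_years = 3.8          # reported A100 GPU years
num_objects = 200_000    # lower bound on the dataset size

gpu_hours = gpu_years * HOURS_PER_YEAR
minutes_per_object = gpu_hours * 60 / num_objects

print(f"total: {gpu_hours:,.0f} A100 GPU hours")
print(f"average: {minutes_per_object:.1f} GPU minutes per object")
```

Under these assumptions the budget works out to roughly 10 A100 GPU minutes per object on average.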
The key to repurposing 2D diffusion models for 3D generation is the introduction of GaussianAtlas, a 2D representation of 3D Gaussians. The transformation consists of three steps:
Sphere Offsetting: We first map unorganized 3D Gaussians onto the surface of a unit sphere using Optimal Transport (OT), ensuring a structured layout while improving efficiency.
Projection to 2D: The positioned 3D Gaussians are then flattened using equirectangular projection, converting them into 2D coordinates while preserving spatial consistency.
Plane Offsetting: Finally, another OT step maps the 2D coordinates onto a structured square grid, reducing sparsity. The deterministic mapping allows reuse across different 3D objects.
The final structured 2D representation, GaussianAtlas, organizes 3D Gaussians into a compact grid while retaining all key attributes.
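The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the number of Gaussians equals the number of grid cells, in which case OT with uniform weights reduces to a one-to-one assignment solved here with SciPy's `linear_sum_assignment`; it assumes a Fibonacci lattice for the sphere points, squared Euclidean cost, and simple centering and scaling of the Gaussian positions. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fibonacci_sphere(n):
    """Return n roughly uniform points on the unit sphere (assumed layout)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)        # angle measured from +z
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i    # golden-angle spiral
    return np.stack([np.sin(polar) * np.cos(azimuth),
                     np.sin(polar) * np.sin(azimuth),
                     np.cos(polar)], axis=1)


def assign(sources, targets):
    """One-to-one matching that minimizes the total squared distance.

    Returns idx such that sources[i] is matched to targets[idx[i]].
    """
    cost = ((sources[:, None, :] - targets[None, :, :]) ** 2).sum(-1)
    _, cols = linear_sum_assignment(cost)  # rows are 0..n-1 for square costs
    return cols


def equirectangular(points):
    """Project unit-sphere points to (u, v) coordinates in [0, 1]^2."""
    lon = np.arctan2(points[:, 1], points[:, 0])        # [-pi, pi]
    lat = np.arcsin(np.clip(points[:, 2], -1.0, 1.0))   # [-pi/2, pi/2]
    return np.stack([(lon + np.pi) / (2 * np.pi),
                     (lat + np.pi / 2) / np.pi], axis=1)


def square_grid(side):
    """Cell centres of a side x side grid in [0, 1]^2, in row-major order."""
    c = (np.arange(side) + 0.5) / side
    u, v = np.meshgrid(c, c, indexing="xy")
    return np.stack([u.ravel(), v.ravel()], axis=1)


def build_sphere_to_grid(side):
    """Steps 2 and 3: computed once and reused for every object."""
    sphere = fibonacci_sphere(side * side)
    uv = equirectangular(sphere)                   # projection to 2D
    pixel_of_sphere = assign(uv, square_grid(side))  # plane offsetting
    return sphere, pixel_of_sphere


def gaussians_to_atlas(positions, attributes, sphere, pixel_of_sphere, side):
    """Step 1 per object, then scatter attributes into the 2D grid."""
    centred = positions - positions.mean(axis=0)
    centred = centred / np.linalg.norm(centred, axis=1).max()  # fit unit ball
    sphere_of_gaussian = assign(centred, sphere)   # sphere offsetting
    pixel = pixel_of_sphere[sphere_of_gaussian]
    atlas = np.zeros((side * side, attributes.shape[1]))
    atlas[pixel] = attributes
    return atlas.reshape(side, side, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    side = 8
    sphere, pixel_of_sphere = build_sphere_to_grid(side)
    positions = rng.normal(size=(side * side, 3))
    atlas = gaussians_to_atlas(positions, positions, sphere,
                               pixel_of_sphere, side)
    print(atlas.shape)
```

Because the sphere-to-grid mapping depends only on the grid size, `build_sphere_to_grid` runs once, and only the first assignment is solved per object. The dense assignment used here scales poorly with the number of points, so it is suitable only for small illustrative inputs.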
(The generated Gaussian atlases are shown in the top row, in this order: 3D coordinate, albedo, opacity, scale, and 3D rotation.)
a bag of chips
a basil plant
a beautiful dress made out of garbage bags, on a mannequin
a beautifully carved wooden knight chess piece
a beige bunk bed
a broken statue
a carved stone
a ceramic teapot
a checkered chessboard
a leather book
a light blue sport car
a magical shell
a metallic helmet
a puffy cloud
a puffy sofa
a robot dinosaur
a stack of delivery boxes
a striped shirt
a tall lighthouse
a treasure chest
a yellow schoolbus
an apple made of crystal
coffee cup with many holes
white chandelier with frosted glass shades
@misc{xiang2025repurposing2ddiffusionmodels,
title={Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation},
author={Tiange Xiang and Kai Li and Chengjiang Long and Christian Häne and Peihong Guo and Scott Delp and Ehsan Adeli and Li Fei-Fei},
year={2025},
eprint={2503.15877},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.15877},
}