SynCity: Training-Free Generation of 3D Worlds

Visual Geometry Group, University of Oxford

* denotes equal contribution

  @inproceedings{ 
    engstler2025syncity,
    title={SynCity: Training-Free Generation of 3D Worlds},
    author={Paul Engstler and Aleksandar Shtedritski and Iro Laina and Christian Rupprecht and Andrea Vedaldi},
    year={2025},
    booktitle={arXiv}
}

Summary

SynCity generates complex and immersive 3D worlds from text prompts without requiring any training or optimization. It leverages the pretrained 2D image generator Flux (for artistic diversity and contextual coherence) and the 3D generator TRELLIS (for accurate geometry). We build scenes incrementally on a grid, tile by tile: first, we generate each tile as a 2D image, using context from adjacent tiles to keep it consistent with them. Then, we convert the tile image into a 3D model. Finally, we blend adjacent tiles seamlessly into a coherent, navigable 3D environment.
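The tile-by-tile loop described above can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' code: the raster (row-by-row) generation order, the restriction of context to the four edge-adjacent tiles, and the callables `generate_tile_image`, `image_to_3d` and `blend` are all assumptions standing in for Flux, TRELLIS and the blending step.

```python
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, int]

# Assumed: context comes from the four edge-adjacent neighbours (up, down, left, right).
NEIGHBOUR_OFFSETS: List[Coord] = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def generate_world(
    rows: int,
    cols: int,
    generate_tile_image: Callable[[Coord, Dict[Coord, object]], object],
    image_to_3d: Callable[[object], object],
    blend: Callable[[object, Dict[Coord, object]], object],
) -> Dict[Coord, object]:
    """Build a rows x cols world tile by tile in (assumed) raster order.

    The three callables are hypothetical stand-ins for the 2D image
    generator, the image-to-3D model and the 3D blending step.
    """
    world: Dict[Coord, object] = {}
    for r in range(rows):
        for c in range(cols):
            # Context: only neighbours that have already been generated.
            context = {
                (r + dr, c + dc): world[(r + dr, c + dc)]
                for dr, dc in NEIGHBOUR_OFFSETS
                if (r + dr, c + dc) in world
            }
            image = generate_tile_image((r, c), context)  # 2D prompting
            tile = image_to_3d(image)                     # 3D prompting
            world[(r, c)] = blend(tile, context)          # 3D blending
    return world
```

Because tiles are visited in raster order in this sketch, each new tile sees at most its upper and left neighbours as context; a different visiting order would change which neighbours are available.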

Overview of SynCity. 2D prompting: To generate a new tile, we first render a view of where that tile should be placed, including context from neighbouring tiles. 3D prompting: We extract the new tile image and construct an image prompt for TRELLIS by adding a wider base under the tile. 3D blending: The 3D model that TRELLIS outputs is usually not well blended with the rest of the scene. To address this, we render a view of the new tile next to each neighbouring tile and inpaint the region between the two with an image inpainting model. Next, we use that well-blended view as conditioning to refine the region between the two 3D tiles. Finally, the new, blended tile is added to the world.
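The 3D blending step pairs the new tile with each already-generated neighbour and inpaints the region between them. The sketch below, again hypothetical and not SynCity's implementation, enumerates those pairs and builds a boolean inpainting mask for a vertical seam band, assuming each pair is rendered side by side in an image `2 * tile_px` pixels wide; the band width `band_px` and the direction labels are illustrative choices, not values from the paper.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]

# Hypothetical labels for the side of the new tile on which a neighbour sits.
DIRECTIONS: Dict[Coord, str] = {
    (-1, 0): "north",
    (1, 0): "south",
    (0, -1): "west",
    (0, 1): "east",
}


def blending_jobs(new: Coord, existing: Set[Coord]) -> List[Tuple[Coord, str]]:
    """Return every already-generated edge neighbour of `new` with its direction."""
    r, c = new
    return [
        ((r + dr, c + dc), label)
        for (dr, dc), label in DIRECTIONS.items()
        if (r + dr, c + dc) in existing
    ]


def seam_band(tile_px: int, band_px: int) -> Tuple[int, int]:
    """Column interval [start, end) around the shared edge of two tiles
    rendered side by side in an image 2 * tile_px pixels wide."""
    if band_px <= 0 or band_px > 2 * tile_px:
        raise ValueError("band_px must be positive and fit inside the render")
    start = tile_px - band_px // 2
    return start, start + band_px


def seam_mask(height: int, tile_px: int, band_px: int) -> List[List[bool]]:
    """Boolean inpainting mask: True marks pixels the inpainting model may repaint."""
    start, end = seam_band(tile_px, band_px)
    row = [start <= x < end for x in range(2 * tile_px)]
    return [list(row) for _ in range(height)]
```

With `tile_px=512` and `band_px=64`, `seam_band` returns columns 480 to 544, so under these assumptions only a 64-pixel strip straddling the shared edge would be regenerated by the inpainting model.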

Result Gallery

Exploring Generated Worlds

The 3D worlds generated by SynCity can be fully explored. Here, we show some example trajectories that demonstrate the rich detail and immersive nature of the generations. A skybox has been added for visual effect.

Interactive Demo

Explore a generated world in your browser! This interactive demo allows you to navigate through the generated scene and experience the immersive environment yourself.

Use W/A/S/D to move and Q/E to raise or lower the camera.

Please note that the generated world has been compressed to enable a smooth in-browser experience. Thus, the visual quality is slightly degraded compared to the videos above. On devices without a dedicated GPU, the demo might appear choppy.

Acknowledgements

The authors of this work are supported by ERC 101001212-UNION, AIMS EP/S024050/1, and Meta Research. The interactive demo is powered by the PlayCanvas WebGL Engine.