This bulletin records feedback on each presentation and the discussion of the paper presented at the lab seminar.

SemCity: Semantic Scene Generation with Triplane Diffusion

Paper

  • SemCity: Semantic Scene Generation with Triplane Diffusion
  • CVPR 2024


Summary

SemCity is a framework for generating real-world outdoor semantic scenes. Instead of training a generative model directly on 3D data, it represents each scene as a triplane, a set of three orthogonal 2D feature planes, and runs diffusion on that representation. Because an implicit decoder maps triplane features at a queried location to a semantic class, the model can synthesize scenes at a higher resolution than the original training data. The framework also proposes a method for manipulating triplane features during the diffusion process, which enables scene editing applications such as inpainting and outpainting.


Questions

Q: Why does the use of an implicit decoder allow for the generation of 3D scenes at various resolutions?

A: Because the decoder predicts the semantic class of a specific location based on continuous coordinates rather than a fixed voxel grid. By simply adjusting the density of the coordinate grid during inference, you can control and scale the resolution of the generated 3D scene as needed.
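
The answer above can be illustrated with a minimal sketch. It is not the paper's implementation: the random feature planes, the sum aggregation of the three plane features, the single linear layer standing in for the decoder, and all names (bilinear_sample, decode_scene, planes, weights) are assumptions made for illustration. The point it shows is that the same triplane can be decoded on an 8x8x8 or a 32x32x32 grid just by changing how densely the continuous coordinates are sampled.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Sample a (C, H, W) feature plane at continuous coords u, v in [0, 1]."""
    _, H, W = plane.shape
    x = u * (W - 1)
    y = v * (H - 1)
    # Index of the top-left neighbour, clipped so that +1 stays in bounds.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    tx = x - x0
    ty = y - y0
    top = plane[:, y0, x0] * (1 - tx) + plane[:, y0, x0 + 1] * tx
    bottom = plane[:, y0 + 1, x0] * (1 - tx) + plane[:, y0 + 1, x0 + 1] * tx
    return top * (1 - ty) + bottom * ty  # shape (C, N)

def decode_scene(planes, weights, n):
    """Decode an n x n x n semantic grid from a triplane (toy linear decoder)."""
    axis = (np.arange(n) + 0.5) / n  # voxel centres in [0, 1]
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    x, y, z = x.ravel(), y.ravel(), z.ravel()
    # Assumption: features from the three planes are aggregated by summation.
    feat = (bilinear_sample(planes["xy"], x, y)
            + bilinear_sample(planes["xz"], x, z)
            + bilinear_sample(planes["yz"], y, z))
    logits = weights @ feat  # shape (num_classes, N)
    return logits.argmax(axis=0).reshape(n, n, n)

rng = np.random.default_rng(0)
channels, plane_res, num_classes = 4, 16, 3
planes = {k: rng.normal(size=(channels, plane_res, plane_res))
          for k in ("xy", "xz", "yz")}
weights = rng.normal(size=(num_classes, channels))

coarse = decode_scene(planes, weights, 8)   # same triplane, sparse query grid
fine = decode_scene(planes, weights, 32)    # same triplane, denser query grid
print(coarse.shape, fine.shape)
```

Only the query grid changes between the two calls; the triplane and the decoder weights are untouched, which is why resolution becomes an inference-time choice.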


Q: Was the proposed method designed with the class imbalance of real-world outdoor datasets in mind?

A: Yes. To learn from data distributions where "air" (empty space) is overwhelmingly dominant, the framework uses a weighted cross-entropy loss. The weighting keeps minority classes from being drowned out by empty voxels, so the model still learns to represent objects that occupy only a very small portion of the scene.
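
A minimal sketch of how a weighted cross-entropy behaves under this kind of imbalance. The discussion does not state which weighting scheme the paper uses, so the inverse-frequency weights below are an assumption, as are the function name weighted_cross_entropy and the toy data. The loss is normalized by the sum of the per-sample weights.

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Class-weighted cross-entropy for logits (N, K) and integer labels (N,)."""
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    w = class_weights[labels]  # per-voxel weight looked up from its class
    return float((w * nll).sum() / w.sum())

# Toy scene: 95 "empty" voxels (class 0) and 5 object voxels (class 1).
labels = np.array([0] * 95 + [1] * 5)

# A degenerate model that confidently predicts "empty" everywhere.
logits = np.tile(np.array([4.0, 0.0]), (len(labels), 1))

# Assumption: inverse-frequency class weights.
freq = np.bincount(labels, minlength=2) / len(labels)
inv_freq_weights = 1.0 / freq

plain = weighted_cross_entropy(logits, labels, np.ones(2))
weighted = weighted_cross_entropy(logits, labels, inv_freq_weights)
print(f"unweighted: {plain:.4f}  weighted: {weighted:.4f}")
```

With uniform weights, the "always empty" predictor gets a small loss because 95% of voxels are correct. With inverse-frequency weights, the five misclassified object voxels count as much as the 95 empty ones, so the same predictor is penalized far more heavily.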

