← ML Research Wiki / 2505.21041

CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians

(2025)

Paper Information
arXiv ID
2505.21041
Venue
arXiv.org

Abstract

Figure 1. We present CityGo, an explicit and efficient framework for high-fidelity rendering of large-scale urban scenes. By combining proxy buildings, residual Gaussians, and surrounding Gaussians, we enable efficient, high-quality urban scene rendering on lightweight devices for applications such as in-vehicle navigation and aerial perception.

Summary

CityGo is a framework for efficient, high-fidelity rendering of urban scenes using proxy buildings and residual Gaussians. It addresses the limitations of traditional methods such as Structure-from-Motion (SfM) and Neural Radiance Fields (NeRF), which struggle with scalability, training time, and rendering quality. CityGo builds proxy meshes from multi-view stereo data and complements them with 3D Gaussian Splatting for texture representation and occlusion handling. Residual Gaussians are added selectively, only in regions where the proxy rendering diverges from the original images, to refine detail. The framework targets practical deployment in real-time applications such as AR navigation and UAV inspection, and reports improvements in efficiency, rendering speed, and model size over existing techniques.
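The selective placement of residual Gaussians can be illustrated with a small sketch. This is not the authors' implementation: the function name `residual_mask`, the mean-absolute-error criterion, and the `threshold` value are all assumptions chosen for illustration. It only shows the general idea of flagging pixels where a proxy render disagrees with the reference image.

```python
import numpy as np

def residual_mask(proxy_render, reference, threshold=0.1):
    """Return a boolean (H, W) mask of pixels where the proxy render diverges.

    Hypothetical helper: the error measure and threshold are illustrative
    assumptions, not values taken from the CityGo paper.
    """
    proxy = np.asarray(proxy_render, dtype=np.float64)
    ref = np.asarray(reference, dtype=np.float64)
    if proxy.shape != ref.shape:
        raise ValueError("images must have the same shape")
    # Per-pixel mean absolute error across the colour channels.
    error = np.abs(proxy - ref).mean(axis=-1)
    return error > threshold

# Toy 4x4 RGB images in [0, 1]: the proxy render is wrong at one pixel.
reference = np.zeros((4, 4, 3))
proxy = np.zeros((4, 4, 3))
proxy[1, 2] = [0.5, 0.5, 0.5]

mask = residual_mask(proxy, reference)
flagged = np.argwhere(mask)  # pixel coordinates that would receive refinement
```

In a pipeline of this kind, only the flagged regions would be candidates for extra Gaussians, which is what keeps the added representation small.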

Methods

This paper employs the following methods:

  • 3D Gaussian Splatting
  • Structure-from-Motion
  • Multi-View Stereo
  • Neural Radiance Fields

Models Used

  • None specified

Datasets

The following datasets were used in this research:

  • Area-H
  • Area-L
  • UrbanBIS

Evaluation Metrics

  • PSNR
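For reference, PSNR (peak signal-to-noise ratio) is computed from the mean squared error between a rendered image and its ground-truth image. The sketch below uses the standard definition; the function name `psnr` and the assumption that pixel values lie in [0, 1] (`max_value=1.0`) are illustrative choices, not details from the paper.

```python
import numpy as np

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    rendered = np.asarray(rendered, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if rendered.shape != reference.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# A uniform error of 0.1 on every pixel gives an MSE of 0.01.
reference = np.zeros((8, 8, 3))
rendered = np.full((8, 8, 3), 0.1)
value = psnr(rendered, reference)
```

With a uniform error of 0.1 on a [0, 1] scale, the result is about 20 dB; higher values indicate a render closer to the reference.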

Results

  • CityGo achieves real-time rendering of city-scale scenes on mobile GPUs.
  • CityGo reduces model size to approximately 1/8 that of 3DGS while preserving visual detail.
  • CityGo supports efficient, high-quality rendering on resource-limited devices.

Limitations

The authors identified the following limitations:

  • Challenges with misclassification of non-building structures
  • Dependency on accurate proxy geometry leading to potential artifacts

Technical Requirements

  • Number of GPUs: 1
  • GPU Type: NVIDIA RTX A6000
  • Compute Requirements: 100K training iterations, with the spherical-harmonics (SH) degree set to 0 to accelerate convergence
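To give a sense of what an SH setting of 0 means, the sketch below counts the floats stored per Gaussian as a function of spherical-harmonics degree. It assumes the parameter layout of standard 3D Gaussian Splatting (position, scale, rotation quaternion, opacity, and RGB SH coefficients); the source does not state CityGo's exact per-Gaussian layout, so the helper names and totals are illustrative only.

```python
def sh_floats_per_gaussian(degree, channels=3):
    """Number of colour coefficients: (degree + 1)^2 per channel."""
    if degree < 0:
        raise ValueError("degree must be non-negative")
    return channels * (degree + 1) ** 2

def floats_per_gaussian(degree):
    """Total floats per Gaussian under the assumed standard 3DGS layout."""
    # position (3) + scale (3) + rotation quaternion (4) + opacity (1)
    geometry_and_opacity = 3 + 3 + 4 + 1
    return geometry_and_opacity + sh_floats_per_gaussian(degree)

degree0_total = floats_per_gaussian(0)  # view-independent colour only
degree3_total = floats_per_gaussian(3)  # common default in 3DGS code bases
```

Under these assumptions, degree 0 stores a single RGB colour per Gaussian (14 floats in total) versus 59 floats at degree 3, so there are far fewer colour parameters to optimize.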
