RaySt3R

Abstract

3D shape completion has broad applications in robotics, digital twin reconstruction, and extended reality (XR). Although recent advances in 3D object and scene completion have achieved impressive results, existing methods lack 3D consistency, are computationally expensive, and struggle to capture sharp object boundaries. Our work (RaySt3R) addresses these limitations by recasting 3D shape completion as a novel view synthesis problem. Specifically, given a single RGB-D image, a foreground object mask, and a novel viewpoint (encoded as a collection of query rays), we train a feedforward transformer to predict depth maps, object masks, and per-pixel confidence scores for those query rays. RaySt3R fuses these predictions across multiple query views using a confidence- and occlusion-aware merging algorithm to reconstruct complete 3D shapes. We evaluate RaySt3R on synthetic and real-world datasets and observe that it achieves state-of-the-art performance, outperforming the baselines on all datasets by up to 44% in 3D chamfer distance.
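
To make the fusion step more concrete, the following is a minimal NumPy sketch of how per-view depth, mask, and confidence predictions could be merged into one point cloud. It is an illustration under stated assumptions, not the RaySt3R implementation: the function names (`backproject`, `merge_views`), the thresholds `conf_thresh` and `eps`, the shared pinhole intrinsics `K`, and the specific occlusion rule (drop predicted points that fall in free space already observed by the input depth map) are all hypothetical choices made for this example.

```python
import numpy as np

def backproject(depth, K, cam_to_world):
    """Lift a depth map (H, W) to world-space points (H*W, 3) with pinhole intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T              # camera-frame rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)        # scale each ray by its depth
    return pts_cam @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]

def merge_views(views, input_depth, K, input_cam_to_world, conf_thresh=0.5, eps=0.01):
    """Fuse query-view predictions into one point cloud.

    views: iterable of (depth, mask, conf, cam_to_world) tuples, one per query view.
    ASSUMPTION: a point is rejected when it projects into the input view and lies
    in front of the observed input depth, i.e. in space the input camera saw as empty.
    """
    world_to_input = np.linalg.inv(input_cam_to_world)
    h, w = input_depth.shape
    kept = []
    for depth, mask, conf, cam_to_world in views:
        pts = backproject(depth, K, cam_to_world)
        # Confidence- and mask-based filtering of the predicted pixels.
        keep = (mask.reshape(-1) > 0) & (conf.reshape(-1) >= conf_thresh) & (depth.reshape(-1) > 0)
        pts = pts[keep]

        # Project the surviving points into the input camera.
        pc = pts @ world_to_input[:3, :3].T + world_to_input[:3, 3]
        z = pc[:, 2]
        valid_z = z > 1e-9
        safe_z = np.where(valid_z, z, 1.0)       # avoid division by zero
        proj = pc @ K.T
        u = np.round(proj[:, 0] / safe_z).astype(int)
        v = np.round(proj[:, 1] / safe_z).astype(int)
        inside = valid_z & (u >= 0) & (u < w) & (v >= 0) & (v < h)

        # Look up the observed input depth where the projection lands (0 = no observation).
        observed = np.zeros(len(pts))
        observed[inside] = input_depth[v[inside], u[inside]]

        # Occlusion check (assumed rule): discard points in observed free space.
        in_free_space = inside & (observed > 0) & (z < observed - eps)
        kept.append(pts[~in_free_space])
    return np.concatenate(kept, axis=0) if kept else np.zeros((0, 3))
```

In this sketch, points behind the observed surface are kept, because they lie in the occluded region that completion is meant to fill. Only points that contradict the input observation are removed.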

Results

Qualitative comparison

*No camera intrinsics prediction; using ours instead.

Acknowledgements

This work was generously supported by the Center for Machine Learning and Health (CMLH) at CMU, the NVIDIA Academic Grant Program, and the Pittsburgh Supercomputing Center. The authors would like to thank Mandi Zhao, Shun Iwase, Balázs Gyenes, Gerhard Neumann, Jeff Tan, and all members of the Momentum Robotics lab at CMU for providing useful feedback.

BibTeX

@misc{rayst3r,
  title={RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion},
  author={Bardienus P. Duisterhof and Jan Oberst and Bowen Wen and Stan Birchfield and Deva Ramanan and Jeffrey Ichnowski},
  year={2025},
  eprint={2506.05285},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.05285},
}

RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion

RaySt3R turns a single masked RGB-D image into complete 3D shapes.
