UPST-NeRF: Universal Photorealistic Style Transfer of Neural Radiance Fields for 3D Scene

Yaosen Chen¹

Qi Yuan¹

Zhiqiang Li¹

Yuegen Liu¹,³

Wei Wang*¹,²

Chaoping Xie¹,²

Xuming Wen¹,²

Qien Yu⁴

¹Media Intelligence Laboratory, ChengDu Sobey Digital Technology Co., Ltd

²Peng Cheng Laboratory

³Southwest Jiaotong University

⁴Sichuan University

*Corresponding author: Wei Wang.


Paper


arXiv version

Transferring photorealistic style with a style image in the 3D scene. Given a set of multi-view images of a 3D scene (a) and a style image (b), our model is capable of rendering photorealistic stylized novel views (c) with a consistent appearance at various view angles in 3D space.

Abstract

Photorealistic stylization of 3D scenes aims to generate photorealistic images from arbitrary novel views according to a given style image while ensuring consistency when rendering from different viewpoints. Some existing stylization methods based on neural radiance fields can effectively predict stylized scenes by combining the features of the style image with multi-view images to train 3D scenes. However, these methods generate novel view images that contain objectionable artifacts. Moreover, they cannot achieve universal photorealistic stylization for a 3D scene: each new style image requires retraining the 3D scene representation network based on a neural radiance field. We propose a novel 3D scene photorealistic style transfer framework to address these issues. It can realize photorealistic 3D scene style transfer with a 2D style image. We first pre-train a 2D photorealistic style transfer network, which can perform photorealistic style transfer between any given content image and style image. Then, we use voxel features to optimize a 3D scene and obtain the geometric representation of the scene. Finally, we jointly optimize a hyper network to realize photorealistic style transfer of the scene for arbitrary style images. In the transfer stage, we use the pre-trained 2D photorealistic network to constrain the photorealistic style of different views and different style images in the 3D scene. The experimental results show that our method not only realizes 3D photorealistic style transfer for arbitrary style images but also outperforms existing methods in terms of visual quality and consistency. Project page: https://semchan.github.io/UPST_NeRF/.

Approach

Overview of Universal Photorealistic Style Transfer of Neural Radiance Fields. In our framework, training for photorealistic style transfer in 3D scenes is divided into two stages. The first stage is geometric training for a single scene. We use a density voxel grid and a feature voxel grid to represent the scene directly: the density voxel grid outputs density, and the feature voxel grid, together with a shallow MLP (RGBNet), is used to predict color. The second stage is style training. The parameters of the density voxel grid and feature voxel grid are frozen, and the features of a reference style image are fed to the hyper network, which controls the RGBNet's input. Thus, we jointly optimize the hyper network to realize photorealistic style transfer of the scene with arbitrary style images.
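To make the first stage more concrete, the sketch below shows one way a density voxel grid could be queried at a continuous 3D point. It assumes trilinear interpolation between the eight surrounding voxels, which is common for voxel-grid radiance fields but is not stated in the text above; the function name `trilinear_query` and the nested-list grid layout are hypothetical and chosen only for illustration.

```python
import math


def trilinear_query(grid, x, y, z):
    """Interpolate a scalar voxel grid at continuous index coordinates.

    Assumptions (not from the paper): `grid` is a nested list indexed as
    grid[i][j][k] with at least 2 voxels per axis, and (x, y, z) are given
    in voxel index units. Out-of-range points are clamped to the grid.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])

    # Clamp the query point so the 2x2x2 neighbourhood stays inside the grid.
    x = min(max(x, 0.0), nx - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    z = min(max(z, 0.0), nz - 1.0)

    # Lower corner of the cell containing the point.
    i0 = min(int(math.floor(x)), nx - 2)
    j0 = min(int(math.floor(y)), ny - 2)
    k0 = min(int(math.floor(z)), nz - 2)

    # Fractional position inside the cell along each axis.
    tx, ty, tz = x - i0, y - j0, z - k0

    # Weighted sum over the eight cell corners.
    value = 0.0
    for di, wx in ((0, 1.0 - tx), (1, tx)):
        for dj, wy in ((0, 1.0 - ty), (1, ty)):
            for dk, wz in ((0, 1.0 - tz), (1, tz)):
                value += wx * wy * wz * grid[i0 + di][j0 + dj][k0 + dk]
    return value
```

A feature voxel grid could be queried the same way, one channel at a time, before the interpolated feature is passed to a small color network.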

The architecture of YUVStyleNet. We designed a framework for 2D photorealistic style transfer that takes a full-resolution style image and a full-resolution content image as input and transfers the style of the style image to the content image photorealistically. In this framework, we transform the images into YUV channels. The final photorealistic stylized image combines the UV channels of the generated stylized image with a Y channel obtained by fusing the Y channel of the stylized image with that of the original content image.
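The YUV fusion idea can be illustrated per pixel as follows. This is a minimal sketch, not the network itself: it assumes the analog BT.601 RGB/YUV coefficients and replaces whatever fusion YUVStyleNet actually performs on the Y channel with a fixed linear blend. The parameter `y_weight` and all function names are hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to YUV using BT.601 weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Approximate inverse of rgb_to_yuv."""
    r = y + 1.13983 * v
    g = y - 0.39465 * u - 0.58060 * v
    b = y + 2.03211 * u
    return r, g, b


def fuse_pixel(content_rgb, stylized_rgb, y_weight=0.5):
    """Keep the stylized UV channels and blend the two Y channels.

    `y_weight` is an assumed fixed blend factor standing in for the
    fusion step described in the text.
    """
    y_content, _, _ = rgb_to_yuv(*content_rgb)
    y_styled, u_styled, v_styled = rgb_to_yuv(*stylized_rgb)
    y_fused = y_weight * y_styled + (1.0 - y_weight) * y_content
    return yuv_to_rgb(y_fused, u_styled, v_styled)
```

With `y_weight=0.0` the output keeps the luminance of the content image and takes only the chrominance from the stylized image, which is one intuition for why this kind of fusion preserves structural detail.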

Qualitative comparisons with artistic style images

Qualitative comparisons with photorealistic style images

Consistency comparisons

Short-range and long-range consistency. We use every pair of adjacent novel views ($O_{i},O_{i+1}$) for the short-range consistency calculation and view pairs with a gap of 5 ($O_{i},O_{i+5}$) for the long-range consistency calculation. The comparisons of short-range and long-range consistency are shown in Tab. 1 and Tab. 2, respectively. Our method outperforms other methods by a significant margin.
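The pairing scheme above can be sketched as follows. The helper names are hypothetical, and the error function is a plain RMSE over flattened pixel values used only as a stand-in: the text does not specify the metric, and a real evaluation would typically align the two views (for example by warping one onto the other) before comparing them.

```python
import math


def view_pairs(num_views, gap):
    """Index pairs (i, i + gap) over an ordered sequence of novel views."""
    return [(i, i + gap) for i in range(num_views - gap)]


def rmse(a, b):
    """Root-mean-square error between two equally sized flat pixel lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def mean_pair_error(views, gap):
    """Average stand-in error over all view pairs with the given gap.

    gap=1 corresponds to short-range pairs (O_i, O_{i+1});
    gap=5 corresponds to long-range pairs (O_i, O_{i+5}).
    """
    pairs = view_pairs(len(views), gap)
    return sum(rmse(views[i], views[j]) for i, j in pairs) / len(pairs)
```

For a sequence of 7 views, `view_pairs(7, 1)` yields six short-range pairs and `view_pairs(7, 5)` yields the two long-range pairs `(0, 5)` and `(1, 6)`.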

User study

User study. We record user preferences in the form of a boxplot. Our results are preferred more often in terms of both photorealistic stylization and consistency quality.

Appendix

On NeRF-Synthetic datasets

On Local Light Field Fusion (LLFF) datasets

Citation


        @inproceedings{chen2022upstnerf,
        title = {UPST-NeRF: Universal Photorealistic Style Transfer of Neural Radiance Fields for 3D Scene},
        author = {Yaosen Chen and Qi Yuan and Zhiqiang Li and Yuegen Liu and Wei Wang and Chaoping Xie and Xuming Wen and Qien Yu},
        year = {2022},
        booktitle = {arxiv}
        }