UPST-NeRF: Universal Photorealistic Style Transfer of Neural Radiance Fields for 3D Scene

Qi Yuan¹
Zhiqiang Li¹
Yuegen Liu¹,³
Chaoping Xie¹,²
Xuming Wen¹,²
Qien Yu⁴
¹Media Intelligence Laboratory, ChengDu Sobey Digital Technology Co., Ltd
²Peng Cheng Laboratory
³Southwest Jiaotong University
⁴Sichuan University

*Corresponding author: Wei Wang.

Paper



Yaosen Chen, Qi Yuan, Zhiqiang Li, Yuegen Liu, Wei Wang, Chaoping Xie, Xuming Wen, Qien Yu

arXiv version

Transferring photorealistic style to a 3D scene with a style image. Given multi-view images of a 3D scene (a) and a style image (b), our model renders photorealistic stylized novel views (c) with a consistent appearance across viewing angles in 3D space.

Abstract


Photorealistic stylization of 3D scenes aims to generate photorealistic images from arbitrary novel views according to a given style image, while ensuring consistency when rendering from different viewpoints. Some existing stylization methods based on neural radiance fields can effectively predict stylized scenes by combining the features of the style image with multi-view images during training. However, these methods generate novel-view images that contain objectionable artifacts. Moreover, they cannot achieve universal photorealistic stylization of a 3D scene: every new style image requires retraining the neural-radiance-field-based scene representation network. We propose a novel 3D scene photorealistic style transfer framework to address these issues. It realizes photorealistic 3D scene style transfer from a single 2D style image. We first pre-train a 2D photorealistic style transfer network, which can perform photorealistic style transfer between any given content image and style image. Then, we use voxel features to optimize a 3D scene and obtain its geometric representation. Finally, we jointly optimize a hyper network to realize photorealistic style transfer of the scene for arbitrary style images. In the transfer stage, the pre-trained 2D photorealistic network constrains the photorealistic style across different views and different style images in the 3D scene. Experimental results show that our method not only realizes 3D photorealistic style transfer for arbitrary style images but also outperforms existing methods in terms of visual quality and consistency. Project page: https://semchan.github.io/UPST_NeRF/.

Approach


Overview of Universal Photorealistic Style Transfer of Neural Radiance Fields. In our framework, training for photorealistic style transfer in 3D scenes is divided into two stages. The first stage is geometric training for a single scene: we represent the scene directly with a density voxel grid and a feature voxel grid, where the density voxel grid outputs density and the feature voxel grid, together with a shallow MLP (RGBNet), predicts color. The second stage is style training: the parameters of the density voxel grid and feature voxel grid are frozen, and the features of a reference style image are fed into the hyper network, which controls RGBNet's input. We thus jointly optimize the hyper network to realize photorealistic style transfer of the scene with arbitrary style images.
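To make the two-stage design more concrete, below is a minimal PyTorch-style sketch of the components described above: a density voxel grid, a feature voxel grid, a shallow RGBNet, and a style encoder standing in for the hyper network that conditions RGBNet's input. All class names and parameters (VoxelGrid, RGBNet, HyperStyleEncoder, style_dim, grid resolution) are illustrative assumptions, not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelGrid(nn.Module):
    """Dense voxel grid queried by trilinear interpolation."""
    def __init__(self, channels, resolution=128):
        super().__init__()
        self.grid = nn.Parameter(torch.zeros(1, channels, resolution, resolution, resolution))

    def forward(self, xyz):                       # xyz in [-1, 1], shape (N, 3)
        pts = xyz.view(1, -1, 1, 1, 3)
        out = F.grid_sample(self.grid, pts, align_corners=True)   # (1, C, N, 1, 1)
        return out.view(self.grid.shape[1], -1).t()               # (N, C)

class RGBNet(nn.Module):
    """Shallow MLP mapping interpolated voxel features, modulated by a style code, to RGB."""
    def __init__(self, feat_dim=12, style_dim=32, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + style_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, feat, style_code):
        feat = torch.cat([feat, style_code.expand(feat.shape[0], -1)], dim=-1)
        return self.mlp(feat)

class HyperStyleEncoder(nn.Module):
    """Maps a reference style image to a compact code that conditions RGBNet's input."""
    def __init__(self, style_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, style_dim))

    def forward(self, style_img):                 # (1, 3, H, W)
        return self.net(style_img)                # (1, style_dim)

# Stage 1: optimize density_grid, feature_grid, and RGBNet on the multi-view photos.
# Stage 2: freeze both grids, then jointly optimize the style encoder (hyper network)
#          under the 2D photorealistic style-transfer constraint.
density_grid = VoxelGrid(channels=1)
feature_grid = VoxelGrid(channels=12)
rgb_net      = RGBNet(feat_dim=12, style_dim=32)
style_enc    = HyperStyleEncoder(style_dim=32)

xyz   = torch.rand(1024, 3) * 2 - 1               # sample points along camera rays
sigma = F.softplus(density_grid(xyz))             # density from the density voxel grid
code  = style_enc(torch.rand(1, 3, 256, 256))     # style code from a reference style image
rgb   = rgb_net(feature_grid(xyz), code)          # style-conditioned color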




The architecture of YUVStyleNet. We design a framework for 2D photorealistic style transfer that takes a full-resolution style image and a full-resolution content image as input and transfers the photorealistic style from the style image to the content image. In this framework, we convert the images into YUV channels. The final fusion keeps the UV channels of the generated stylized image and blends its Y channel with that of the original content image, yielding the final photorealistic stylized image.
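As a rough illustration of this Y/UV fusion step, the sketch below converts RGB images to YUV with the BT.601 matrices, keeps the UV channels of the stylized image, and blends its Y channel with that of the content image. The blending weight alpha and the function names are assumptions for illustration, not the paper's exact formulation.

import torch

# ITU-R BT.601 RGB <-> YUV conversion matrices.
_RGB2YUV = torch.tensor([[ 0.299,  0.587,  0.114],
                         [-0.147, -0.289,  0.436],
                         [ 0.615, -0.515, -0.100]])
_YUV2RGB = torch.tensor([[1.0,  0.000,  1.140],
                         [1.0, -0.395, -0.581],
                         [1.0,  2.032,  0.000]])

def rgb_to_yuv(img):                   # img: (3, H, W) in [0, 1]
    return torch.einsum('ck,khw->chw', _RGB2YUV, img)

def yuv_to_rgb(img):
    return torch.einsum('ck,khw->chw', _YUV2RGB, img)

def fuse_yuv(stylized_rgb, content_rgb, alpha=0.5):
    """Keep UV from the stylized image; blend its Y channel with the content's."""
    s_yuv = rgb_to_yuv(stylized_rgb)
    c_yuv = rgb_to_yuv(content_rgb)
    y  = alpha * s_yuv[:1] + (1 - alpha) * c_yuv[:1]   # blended luminance
    uv = s_yuv[1:]                                      # stylized chrominance
    return yuv_to_rgb(torch.cat([y, uv], dim=0)).clamp(0, 1)

out = fuse_yuv(torch.rand(3, 256, 256), torch.rand(3, 256, 256))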

Qualitative comparisons with artistic style images




Qualitative comparisons with photorealistic style images


Consistency comparisons

Short-range and long-range consistency. We use every pair of adjacent novel views ($O_{i}, O_{i+1}$) and view pairs with a gap of 5 ($O_{i}, O_{i+5}$) to compute short-range and long-range consistency, respectively. The comparisons of short-range and long-range consistency are shown in Tab. 1 and Tab. 2, respectively. Our method outperforms other methods by a significant margin.
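For reference, a warped-error consistency measure between a view pair can be sketched as below, assuming a precomputed optical flow and occlusion mask between the two views (e.g. from an off-the-shelf flow estimator). This is a generic illustration of such metrics, not necessarily the exact script used for Tab. 1 and Tab. 2.

import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (1, 3, H, W) with flow (1, 2, H, W) given in pixels."""
    _, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack([xs, ys], dim=0).float().unsqueeze(0) + flow
    grid[:, 0] = 2 * grid[:, 0] / (w - 1) - 1           # normalize x to [-1, 1]
    grid[:, 1] = 2 * grid[:, 1] / (h - 1) - 1           # normalize y to [-1, 1]
    return F.grid_sample(img, grid.permute(0, 2, 3, 1), align_corners=True)

def consistency_rmse(view_a, view_b, flow_ab, mask):
    """RMSE between view_a and view_b warped into view_a, over non-occluded pixels.

    mask: (1, 1, H, W) with 1 for valid (non-occluded) pixels.
    """
    warped_b = warp(view_b, flow_ab)
    err = (view_a - warped_b) ** 2 * mask
    return torch.sqrt(err.sum() / (3 * mask.sum() + 1e-8))

# Short-range: pairs (O_i, O_{i+1}); long-range: pairs (O_i, O_{i+5}).
h, w = 120, 160
score = consistency_rmse(torch.rand(1, 3, h, w), torch.rand(1, 3, h, w),
                         torch.zeros(1, 2, h, w), torch.ones(1, 1, h, w))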


User study

User study. We report user preferences as boxplots. Our results win more preferences in both photorealistic stylization quality and consistency.


Appendix


Results on the NeRF-Synthetic dataset


Results on the Local Light Field Fusion (LLFF) dataset


Citation



@inproceedings{chen2022upstnerf,
  title     = {UPST-NeRF: Universal Photorealistic Style Transfer of Neural Radiance Fields for 3D Scene},
  author    = {Yaosen Chen and Qi Yuan and Zhiqiang Li and Yuegen Liu and Wei Wang and Chaoping Xie and Xuming Wen and Qien Yu},
  year      = {2022},
  booktitle = {arXiv}
}