Single-view based methods

Layer-structured 3D Scene Inference via View Synthesis (ECCV 2018)


Infer a layer-structured 3D representation of a scene from a single input image.


Use view synthesis as a proxy task: we enforce that our representation (inferred from a single image), when rendered from a novel perspective, matches the true observed image.


Geometry-aware Deep Network for Single-Image Novel View Synthesis (CVPR 2018)


We propose to exploit the 3D geometry of the scene to synthesize a novel view. Specifically, we approximate a real-world scene by a fixed number of planes, and learn to predict a set of homographies and their corresponding region masks to transform the input image into a novel view.
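The idea of approximating a scene by a few planes can be sketched as follows. This is a minimal NumPy illustration, not the paper's network: `warp_homography` and `compose_view` are hypothetical helper names, the warping uses nearest-neighbour sampling for brevity, and the per-plane homographies and soft region masks are assumed to be given (in the paper they are predicted by the network).

```python
import numpy as np

def warp_homography(img, H):
    """Inverse-warp img by homography H (nearest-neighbour; hypothetical helper)."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # target pixel coords
    src = np.linalg.inv(H) @ pts                              # back-project to source
    src = src[:2] / src[2]                                    # dehomogenise
    xi = np.clip(np.round(src[0]).astype(int), 0, w - 1)
    yi = np.clip(np.round(src[1]).astype(int), 0, h - 1)
    return img[yi, xi].reshape(img.shape)

def compose_view(img, homographies, masks):
    """Blend per-plane warps with soft region masks (masks sum to 1 per pixel)."""
    warped = [warp_homography(img, H) for H in homographies]
    return sum(m[..., None] * w for m, w in zip(masks, warped))
```

With an identity homography and an all-ones mask, `compose_view` reproduces the input image, which is a useful sanity check for the warping direction.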


Structure-Preserving Stereoscopic View Synthesis With Multi-Scale Adversarial Correlation Matching (CVPR 2019)


Stereoscopic view synthesis from a single image.


The proposed Multi-Scale Adversarial Correlation Matching (MS-ACM) framework does not assume any costly supervision signal of scene structures such as depth. Instead, it models structures as self-correlation coefficients extracted from multi-scale feature maps in transformed spaces. In training, the feature space attempts to push the correlation distances between the synthesized and target images far apart, thus amplifying inconsistent structures. At the same time, the view synthesis network minimizes such correlation distances by fixing mistakes it makes. With such adversarial training, structural errors of different scales and levels are iteratively discovered and reduced, preserving both global layouts and fine-grained details.
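A rough sketch of the structural signal MS-ACM matches: self-correlation coefficients between each feature vector and its spatial neighbours. The function names and the cosine-similarity form below are illustrative assumptions, not the paper's exact formulation, and a real setup would compute this over multi-scale feature maps from a learned (adversarially trained) transform.

```python
import numpy as np

def self_correlation(feat, shift=(0, 1)):
    """Cosine similarity between each feature vector and a shifted neighbour.

    feat: (H, W, C) feature map; shift: (dy, dx) neighbour offset.
    Structural boundaries show up as dips in the resulting (H, W) map.
    """
    dy, dx = shift
    neigh = np.roll(feat, (dy, dx), axis=(0, 1))
    num = (feat * neigh).sum(-1)
    den = np.linalg.norm(feat, axis=-1) * np.linalg.norm(neigh, axis=-1) + 1e-8
    return num / den

def correlation_distance(feat_a, feat_b, shifts=((0, 1), (1, 0))):
    """L1 distance between self-correlation maps of two feature maps."""
    return sum(np.abs(self_correlation(feat_a, s) - self_correlation(feat_b, s)).mean()
               for s in shifts)
```

In the adversarial game, the feature extractor would be trained to maximize `correlation_distance` between synthesized and target images, while the synthesis network minimizes it.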


View Independent Generative Adversarial Network for Novel View Synthesis (ICCV 2019)


Our method lets the network, after seeing many images of objects of the same category from different views, acquire essential knowledge of the objects' intrinsic properties. An encoder extracts a view-independent feature that characterizes the intrinsic properties of the input image, including 3D structure, color, texture, etc., and the decoder hallucinates an image of a novel view from the extracted feature and an arbitrary user-specified camera pose.


SynSin: End-to-end View Synthesis from a Single Image (2019)


We propose a novel end-to-end model for this task; it is trained on real images without any ground-truth 3D information. To this end, we introduce a novel differentiable point cloud renderer that is used to transform a latent 3D point cloud of features into the target view. The projected features are decoded by our refinement network to inpaint missing regions and generate a realistic output image.


DeepVoxels: Learning Persistent 3D Feature Embeddings (CVPR 2019)


To this end, we propose DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry. At its core, our approach is based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure.
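The core data structure is simple to sketch: a Cartesian grid of learned feature vectors that can be queried at 3D world points. The helper below is a hypothetical illustration using nearest-neighbour lookup (DeepVoxels itself uses differentiable, projection-based feature lifting and rendering, which is well beyond this sketch); `origin` and `voxel_size` define the assumed world-to-grid mapping.

```python
import numpy as np

def sample_voxel_feature(grid, point, origin, voxel_size):
    """Nearest-neighbour lookup of a persistent feature at a 3D world point.

    grid:  (X, Y, Z, C) learned feature volume
    point: (3,) world-space coordinates
    """
    idx = np.round((point - origin) / voxel_size).astype(int)
    idx = np.clip(idx, 0, np.array(grid.shape[:3]) - 1)  # clamp to grid bounds
    return grid[idx[0], idx[1], idx[2]]
```

Because the features are tied to fixed 3D locations, the same embedding is reused across all rendered viewpoints, which is what makes the representation "persistent".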


Multi-view based methods

DeepStereo: Learning to Predict New Views from the World’s Imagery (CVPR 2016)


In this work, we present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches, which consist of multiple complex stages of processing, each of which requires careful tuning and can fail in unexpected ways, our system is trained end-to-end. The pixels from neighboring views of a scene are presented to the network, which then directly produces the pixels of the unseen view.

View Synthesis by Appearance Flow (ECCV 2016)


Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows – 2D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view.

What is appearance flow?

For each pixel $i$ in the target view, the appearance flow vector $f(i) \in \mathbb{R}^2$ specifies the coordinate in the input view where the pixel value is sampled to reconstruct pixel $i$.
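The sampling step defined by $f(i)$ can be written as differentiable bilinear sampling. A minimal NumPy sketch (the function name is illustrative; the original work implements this as a differentiable sampling layer inside a CNN, and here flow values are taken as absolute source coordinates rather than offsets):

```python
import numpy as np

def apply_appearance_flow(src, flow):
    """Reconstruct the target view by bilinearly sampling src at flow coordinates.

    src:  (H, W, C) input view
    flow: (H, W, 2) absolute (x, y) sampling coordinates in src, per target pixel
    """
    h, w = src.shape[:2]
    x, y = flow[..., 0], flow[..., 1]
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)   # left neighbour column
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)   # top neighbour row
    wx = (x - x0)[..., None]                          # horizontal blend weight
    wy = (y - y0)[..., None]                          # vertical blend weight
    top = (1 - wx) * src[y0, x0] + wx * src[y0, x0 + 1]
    bot = (1 - wx) * src[y0 + 1, x0] + wx * src[y0 + 1, x0 + 1]
    return (1 - wy) * top + wy * bot
```

An identity flow (each target pixel samples its own coordinate) reproduces the input view exactly, which is the standard sanity check for this kind of sampler.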



Soft 3D Reconstruction for View Synthesis (TOG 2017)


Our main contribution is the formulation of a soft 3D representation that preserves depth uncertainty through each stage of 3D reconstruction and rendering.


Deep Blending for Free-Viewpoint Image-Based Rendering (TOG 2018)


We present a new deep learning approach to blending for IBR, in which we use held-out real image data to learn blending weights to combine input photo contributions.
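The blending step itself reduces to a per-pixel weighted sum of warped input-photo contributions. A hypothetical sketch, assuming the network emits raw per-pixel logits that are softmax-normalised into blending weights (the function name and the softmax form are illustrative, not the paper's exact parameterisation):

```python
import numpy as np

def blend(contributions, logits):
    """Blend warped input-photo contributions with softmax-normalised weights.

    contributions: (N, H, W, C) warped candidate images
    logits:        (N, H, W) raw per-pixel blending scores
    """
    w = np.exp(logits - logits.max(0))  # subtract max for numerical stability
    w = w / w.sum(0)                    # weights sum to 1 at every pixel
    return (w[..., None] * contributions).sum(0)
```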


Extreme View Synthesis (ICCV 2019)


We follow the traditional paradigm of performing depth-based warping and refinement, with a few key improvements. First, we estimate a depth probability volume, rather than just a single depth value for each pixel of the novel view. This allows us to leverage depth uncertainty in challenging regions, such as depth discontinuities. After using it to get an initial estimate of the novel view, we explicitly combine learned image priors and the depth uncertainty to synthesize a refined image with fewer artifacts.


PerspectiveNet: A Scene-consistent Image Generator for New View Synthesis in Real Indoor Environments (NeurIPS 2019)


Our main contribution is a novel scene-level multi-camera optimization scheme termed PerspectiveNet. The crux of the method lies in regarding the joint set of reference and generated test views as a calibrated optical system.


IGNOR: Image-guided Neural Object Rendering (ICLR 2020)


Synthesize the view-dependent appearance of an object from an RGB video of the object.


We propose EffectsNet, a deep neural network that predicts view-dependent effects. Based on these estimations, we are able to convert observed images to diffuse images. These diffuse images can be projected into other views. In the target view, our pipeline reinserts the new view-dependent effects. To composite multiple reprojected images into a final output, we learn a composition network that outputs photo-realistic results.


Last modification: March 13th, 2020 at 09:11 am