Learning Scene Illumination by Pairwise Photos from Rear and Front Mobile Cameras
Dachuan Cheng
State Key Laboratory of Computer Science, Institute of Software, University of Chinese Academy of Sciences
Jian Shi
Institute of Automation, University of Chinese Academy of Sciences
Yanyun Chen
State Key Laboratory of Computer Science, Institute of Software, University of Chinese Academy of Sciences
Xiaoming Deng
Beijing Key Laboratory of Human Computer Interactions, Institute of Software, Chinese Academy of Sciences
Xiaopeng Zhang
Institute of Automation, University of Chinese Academy of Sciences
Abstract
Illumination estimation is an essential problem in computer vision, graphics, and augmented reality. In this paper, we propose a learning-based method to recover low-frequency scene illumination, represented by spherical harmonic (SH) functions, from paired photos taken by the rear and front cameras of a mobile device. An end-to-end deep convolutional neural network (CNN) is designed to process the two images, which observe the scene from symmetric views, and to predict the SH coefficients. We introduce a novel Render Loss that improves the rendering quality of the predicted illumination. A high-quality high dynamic range (HDR) panoramic image dataset was built for training and evaluation. Experiments show that our model produces visually and quantitatively superior results compared to state-of-the-art methods. Moreover, our method is practical for mobile-based applications.
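For readers who want a concrete picture of a render-based loss on SH lighting, the sketch below shows one way such a loss can be computed: a diffuse sphere is shaded with the Ramamoorthi-Hanrahan irradiance formula under both the predicted and the ground-truth second-order SH coefficients, and the squared difference between the two renderings is averaged. This is a minimal NumPy illustration under our own assumptions (function names, a 64x64 sphere, plain L2), not the paper's implementation.

```python
import numpy as np

# Ramamoorthi-Hanrahan constants for irradiance from second-order SH.
C = np.array([0.429043, 0.511664, 0.743125, 0.886227, 0.247708])

def irradiance(sh, n):
    """Irradiance at surface normal n = (x, y, z) from the 9 SH
    coefficients of one color channel (Ramamoorthi & Hanrahan, 2001)."""
    x, y, z = n
    L00, L1m1, L10, L11, L2m2, L2m1, L20, L21, L22 = sh
    return (C[3] * L00
            + 2.0 * C[1] * (L1m1 * y + L10 * z + L11 * x)
            + C[2] * L20 * z * z - C[4] * L20
            + C[0] * L22 * (x * x - y * y)
            + 2.0 * C[0] * (L2m2 * x * y + L2m1 * y * z + L21 * x * z))

def render_sphere(sh_rgb, res=64):
    """Render a diffuse unit sphere lit by 9x3 SH coefficients."""
    img = np.zeros((res, res, 3))
    u, v = np.meshgrid(np.linspace(-1, 1, res), np.linspace(-1, 1, res))
    mask = u ** 2 + v ** 2 <= 1.0                       # sphere silhouette
    w = np.sqrt(np.clip(1.0 - u ** 2 - v ** 2, 0.0, None))  # z toward viewer
    for c in range(3):
        img[..., c] = np.where(mask, irradiance(sh_rgb[:, c], (u, v, w)), 0.0)
    return img

def render_loss(sh_pred, sh_gt):
    """Mean squared difference between spheres rendered with the
    predicted and the ground-truth SH coefficients."""
    return np.mean((render_sphere(sh_pred) - render_sphere(sh_gt)) ** 2)
```

Because the rendering is linear, and hence differentiable, in the SH coefficients, the same construction can be reproduced inside a deep learning framework and back-propagated through during training.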