Procedural Modeling of a Building from a Single Image
Abstract
Virtual cities are in demand for computer games, movies, and urban planning, but creating numerous 3D building models takes a lot of time. Procedural modeling has become popular in recent years as a way to overcome this issue, but writing a grammar that produces a desired output is difficult and time-consuming, even for expert users. In this paper, we present an interactive tool that automatically generates such a grammar from a single image of a building. The user selects a photograph and highlights the silhouette of the target building as input to our method. Our pipeline then automatically generates the building components, from the large-scale building mass to the fine-scale window and door geometry. Each stage of the pipeline combines convolutional neural networks (CNNs) and optimization to select and parameterize procedural grammars that reproduce the building elements in the picture. In the first stage, our method jointly estimates the camera parameters and the building mass shape. Once known, the building mass enables rectification of the façades, which serve as input to the second stage, where the façade layout is recovered. This layout allows us to extract individual windows and doors, which are then fed to the last stage of the pipeline, where procedural grammars for windows and doors are selected. Finally, the grammars are combined to generate a complete procedural building as output. We devise a common methodology to make each stage of this pipeline tractable. This methodology consists of simplifying the input image to match the visual appearance of synthetic training data and using optimization to refine the parameters estimated by the CNNs. We used our method to generate a variety of procedural models of buildings from existing photographs.
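The "estimate, then refine" idea in the methodology can be illustrated with a toy sketch. This is not the authors' implementation: the box-shaped "building mass", the `render_mask`, `iou`, and `refine` helpers, the hard-coded coarse estimate standing in for a CNN prediction, and the simple coordinate search used as the optimizer are all assumptions made for illustration only.

```python
def render_mask(width, height, grid=32):
    """Rasterize a toy 'building mass': an axis-aligned box anchored at the origin."""
    return [[1 if (x < width and y < height) else 0 for x in range(grid)]
            for y in range(grid)]


def iou(mask_a, mask_b):
    """Intersection over union of two binary masks of equal size."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a & b
            union += a | b
    return inter / union if union else 1.0


def refine(initial, target_mask, grid=32, max_iters=100):
    """Coordinate search: nudge one parameter at a time while the IoU improves."""
    params = list(initial)
    best = iou(render_mask(*params, grid=grid), target_mask)
    for _ in range(max_iters):
        improved = False
        for i in range(len(params)):
            for step in (-1, 1):
                trial = list(params)
                # Keep each parameter inside the valid range [1, grid].
                trial[i] = min(grid, max(1, trial[i] + step))
                score = iou(render_mask(*trial, grid=grid), target_mask)
                if score > best:
                    params, best, improved = trial, score, True
        if not improved:
            break  # no single-step change helps any more
    return tuple(params), best


# Silhouette highlighted by the user (here synthesized from known parameters).
target = render_mask(20, 12)
# Hypothetical coarse estimate, standing in for a CNN prediction.
coarse_estimate = (16, 15)
refined, score = refine(coarse_estimate, target)
print(refined, score)
```

In this toy setting the search starts from the coarse estimate and recovers the parameters used to synthesize the target silhouette. A real pipeline would render an actual procedural grammar and would likely use a more capable derivative-free optimizer.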
Supporting Information
Filename | Size | Description |
---|---|---|
cgf13372-sup-0001-S1.zip | 52.4 MB | Supporting Information |
cgf13372-sup-0003-S1.pdf | 2 MB | Supporting Information |