Virtual cities are in demand for computer games, movies, and urban planning, but creating the many 3D building models they require is time-consuming. Procedural modeling has become a popular way to address this issue, but writing a grammar that yields a desired output is difficult and time-consuming, even for expert users. In this paper, we present an interactive tool that allows users to automatically generate such a grammar from a single image of a building. The user selects a photograph and highlights the silhouette of the target building as input to our method. Our pipeline then automatically generates the building components, from the large-scale building mass to fine-scale window and door geometry. Each stage of our pipeline combines convolutional neural networks (CNNs) and optimization to select and parameterize procedural grammars that reproduce the building elements of the picture. In the first stage, our method jointly estimates the camera parameters and the building mass shape. Once the building mass is known, the façades can be rectified and passed to the second stage, which recovers the façade layout. This layout allows us to extract individual windows and doors, which are fed to the last stage of the pipeline, where procedural grammars for windows and doors are selected. Finally, the grammars are combined to generate a complete procedural building as output. We devise a common methodology to make each stage of this pipeline tractable. This methodology consists of simplifying the input image to match the visual appearance of synthetic training data, and of using optimization to refine the parameters estimated by the CNNs. We used our method to generate a variety of procedural models of buildings from existing photographs.
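
The "estimate, then refine" idea described above can be illustrated with a toy sketch. It is not the authors' implementation: the rectangle "grammar", the function names `render_mask`, `iou`, and `refine`, and the greedy coordinate search are all assumptions made for illustration, and a hard-coded initial guess stands in for a CNN prediction. The sketch refines the parameters of a simple shape so that its rendered silhouette overlaps a target silhouette as much as possible.

```python
# Toy illustration only: a coarse initial guess (standing in for a CNN estimate)
# is refined by a derivative-free search that maximizes silhouette overlap.
# All names and the rectangle model are hypothetical, not taken from the paper.

def render_mask(params, width, height):
    """Rasterize an axis-aligned rectangle (x0, y0, x1, y1) into a set of pixels."""
    x0, y0, x1, y1 = params
    return {(x, y)
            for x in range(max(0, x0), min(width, x1))
            for y in range(max(0, y0), min(height, y1))}

def iou(a, b):
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def refine(initial, target, width, height, max_rounds=50):
    """Greedy coordinate search: nudge one parameter by +/-1 while IoU improves."""
    best = tuple(initial)
    best_score = iou(render_mask(best, width, height), target)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best)):
            for step in (-1, 1):
                cand = list(best)
                cand[i] += step
                cand = tuple(cand)
                score = iou(render_mask(cand, width, height), target)
                if score > best_score:
                    best, best_score = cand, score
                    improved = True
        if not improved:
            break  # no single-parameter nudge helps any more
    return best, best_score

WIDTH, HEIGHT = 32, 32
target = render_mask((8, 5, 24, 28), WIDTH, HEIGHT)  # stands in for the user-highlighted silhouette
initial_guess = (10, 7, 21, 25)                      # stands in for a CNN estimate
initial_score = iou(render_mask(initial_guess, WIDTH, HEIGHT), target)
refined, score = refine(initial_guess, target, WIDTH, HEIGHT)
```

In this toy case the search recovers the target rectangle from the coarse guess. A greedy search like this can stall in local optima on harder objectives, which is one reason a good initial estimate matters before refinement.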

Supporting Information


Volume 37, Issue 2

May 2018

Pages 415-429

Filename	Description
cgf13372-sup-0001-S1.zip (52.4 MB)	Supporting Information
cgf13372-sup-0003-S1.pdf (2 MB)	Supporting Information

Procedural Modeling of a Building from a Single Image
