Automatic Mechanism Modeling from a Single Image with CNNs
Abstract
This paper presents a novel system that enables fully automatic modeling of both the 3D geometry and the functionality of a mechanism assembly from a single RGB image. The resulting 3D mechanism model closely resembles the one in the input image, with the geometry, mechanical attributes, connectivity, and functionality of all mechanical parts prescribed in a physically valid way. This challenging task is realized by combining several deep convolutional neural networks that provide high-quality, automatic part detection, segmentation, camera pose estimation, and mechanical attribute retrieval for each individual part. On top of this, we use a local/global optimization algorithm to establish geometric interdependencies among all the parts while retaining their desired spatial arrangement. We use an interaction graph to abstract the inter-part connections in the resulting mechanism system. If an isolated component is identified in the graph, our system enumerates all possible solutions that restore the graph connectivity and outputs the one with the smallest residual error. We have extensively tested our system on a wide range of classic mechanism photos, and experimental results show that the proposed system is able to build high-quality 3D mechanism models without user guidance.
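The connectivity-repair step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the part names, the 1D `positions`, and the distance-based `residual` function are hypothetical placeholders, and the sketch simply attaches each isolated component to the largest component through the candidate connection with the lowest residual.

```python
from itertools import product


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def restore_connectivity(nodes, edges, residual):
    """Enumerate candidate connections for every isolated component and keep
    the one with the smallest residual (assumed scoring function)."""
    components = connected_components(nodes, edges)
    if len(components) == 1:
        return []  # graph is already connected, nothing to repair
    main = max(components, key=len)
    added = []
    for component in components:
        if component is main:
            continue
        # All possible single connections between this component and the main one.
        candidates = product(sorted(component), sorted(main))
        added.append(min(candidates, key=lambda pair: residual(*pair)))
    return added


# Hypothetical example: four parts, with "rod" left unconnected.
nodes = ["gear_a", "gear_b", "crank", "rod"]
edges = [("gear_a", "gear_b"), ("gear_b", "crank")]
positions = {"gear_a": 0.0, "gear_b": 1.0, "crank": 2.0, "rod": 2.4}


def residual(a, b):
    # Placeholder residual: distance between part positions along one axis.
    return abs(positions[a] - positions[b])


added = restore_connectivity(nodes, edges, residual)
print(added)
```

In this toy setting the residual is a plain distance; in a real system it would presumably come from the geometric optimization mentioned above, which the abstract does not specify in detail.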
Supporting Information
Filename | Description |
---|---|
cgf13572-sup-0001-S1.mp4 (24.2 MB) | Supplement Material |