This paper presents a novel system that enables fully automatic modeling of both the 3D geometry and the functionality of a mechanism assembly from a single RGB image. The resulting 3D mechanism model closely resembles the one in the input image, with the geometry, mechanical attributes, connectivity, and functionality of all mechanical parts prescribed in a physically valid way. We realize this challenging task by combining several deep convolutional neural networks that provide high-quality, automatic part detection, segmentation, camera pose estimation, and mechanical attribute retrieval for each individual part. On top of this, we use a local/global optimization algorithm to establish geometric interdependencies among all the parts while retaining their desired spatial arrangement. We use an interaction graph to abstract the inter-part connections in the resulting mechanism system. If an isolated component is identified in the graph, our system enumerates all possible solutions that restore the graph connectivity and outputs the one with the smallest residual error. We have extensively tested our system on a wide range of classic mechanism photos, and experimental results show that the proposed system is able to build high-quality 3D mechanism models without user guidance.
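The connectivity-repair step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the part representation (a circle given as center x, center y, radius), the `residual` function (the gap between two circles), and all names are assumptions chosen only to show the idea of enumerating candidate connections for an isolated part and keeping the one with the smallest residual.

```python
from math import hypot

def isolated_parts(parts, edges):
    """Return the parts that appear in no edge of the interaction graph."""
    connected = {p for edge in edges for p in edge}
    return [p for p in parts if p not in connected]

def residual(a, b, parts):
    """Hypothetical residual: gap between two circular parts (0 when they touch)."""
    (ax, ay, ar), (bx, by, br) = parts[a], parts[b]
    return abs(hypot(ax - bx, ay - by) - (ar + br))

def restore_connectivity(parts, edges):
    """Attach each isolated part to the candidate partner with the smallest residual."""
    new_edges = list(edges)
    for p in isolated_parts(parts, edges):
        # Enumerate every other part as a possible connection partner.
        candidates = [q for q in parts if q != p]
        if not candidates:
            continue  # a single-part assembly has nothing to connect to
        best = min(candidates, key=lambda q: residual(p, q, parts))
        new_edges.append((p, best))
    return new_edges

# Illustrative data only: three circular parts as (center_x, center_y, radius).
parts = {
    "gear_a": (0.0, 0.0, 1.0),
    "gear_b": (2.0, 0.0, 1.0),
    "gear_c": (5.0, 0.0, 1.9),
}
edges = [("gear_a", "gear_b")]
repaired = restore_connectivity(parts, edges)
```

Here `gear_c` has no edge in the graph, so both remaining parts are tried as partners and the one leaving the smaller gap is chosen. A real system would use a residual derived from its own geometric optimization rather than this simple distance measure.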

Volume 37, Issue 7

October 2018

Pages 337-348

Automatic Mechanism Modeling from a Single Image with CNNs
