Self-Supervised Learning of Part Mobility from Point Cloud Sequence
Yahao Shi
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China
Xinyu Cao
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China
Bin Zhou (Corresponding Author)
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China

Abstract
Part mobility analysis is a key aspect of achieving a functional understanding of 3D objects, and it is natural to obtain part mobility from the continuous part motion of 3D objects. In this study, we introduce a self-supervised method for segmenting motion parts and predicting their motion attributes from a point cloud sequence representing a dynamic object. To fully exploit the spatiotemporal information in the sequence, we generate trajectories by using correlations among successive frames instead of directly processing the point clouds. We propose a novel neural network architecture, called PointRNN, to learn feature representations of trajectories along with their part rigid motions. We evaluate our method on various tasks, including motion part segmentation, motion axis prediction, and motion range estimation. The results demonstrate that our method outperforms previous techniques on both synthetic and real datasets. Moreover, our method generalizes to new and unseen objects, and it requires no prior knowledge of shape structure, shape category, or shape orientation. To the best of our knowledge, this is the first deep-learning study to extract part mobility from the point cloud sequence of a dynamic object.
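The abstract mentions generating per-point trajectories from correlations among successive frames of the sequence. As a rough illustration of that idea only (the paper's actual correlation scheme is not specified here), the sketch below chains trajectories by nearest-neighbor correspondence between consecutive frames; the function name and the simple squared-distance matching are assumptions for illustration, not the authors' method.

```python
import numpy as np

def build_trajectories(frames):
    """Chain a 3D trajectory for each point of the first frame across a
    point cloud sequence, by nearest-neighbor matching between
    successive frames. `frames` is a list of (N, 3) arrays.

    Illustrative sketch only; a robust method would use richer
    spatiotemporal correlations than plain nearest neighbors.
    """
    trajs = [frames[0]]          # each trajectory starts at a point of frame 0
    current = frames[0]
    for nxt in frames[1:]:
        # squared distances between every tracked point and every point of
        # the next frame, via broadcasting: shape (N, N)
        d2 = ((current[:, None, :] - nxt[None, :, :]) ** 2).sum(-1)
        # follow each tracked point to its nearest neighbor in the next frame
        current = nxt[d2.argmin(axis=1)]
        trajs.append(current)
    # stack to (N, T, 3): one trajectory per point of the first frame
    return np.stack(trajs, axis=1)
```

Such trajectories, rather than the raw point clouds, would then be the input from which per-trajectory features and part rigid motions are learned.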