Online visual tracking via cross-similarity-based siamese network
Luyao Wang
Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
Corresponding Author
Huchuan Lu
Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
Huchuan Lu, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China.
Email: [email protected]
Pingping Zhang
Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
Summary
Among deep-learning-based trackers, siamese-based methods have inspired many researchers because of their effectiveness and simplicity. However, the traditional siamese tracker has not achieved satisfactory performance because of its limited representation ability and its lack of an appropriate model update strategy. To address these shortcomings, we propose a cross-similarity-based siamese network with three contributions. First, we introduce a novel cross similarity module into the SiameseFC framework, which can improve the matching ability of fully convolutional networks during tracking. Second, we propose a novel attention weighting layer that emphasizes the different contributions of matching scores at different positions. This adaptive attention weighting scheme helps our tracker adapt to appearance changes caused by pose variation, partial occlusion, and similar factors. Third, we develop a simple yet effective model update strategy, which uses an independent classification model to trigger the model fine-tuning process. Experimental results on the standard tracking benchmark show that our tracker performs much better than the baseline SiameseFC method and also achieves promising results in comparison with other state-of-the-art algorithms.
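As a rough illustration of the ideas named in the summary, the sketch below computes a SiameseFC-style score map by sliding template features over search-region features, with an optional per-position weight on the template. It is not the paper's implementation: the function names, the weight layout, and the confidence-threshold update trigger are all assumptions made for the example, and the plain-Python loops stand in for what would normally be a convolution layer.

```python
from typing import List, Optional, Tuple

Feature = List[List[List[float]]]  # channels x height x width


def score_map(template: Feature, search: Feature,
              weights: Optional[List[List[float]]] = None) -> List[List[float]]:
    """Slide the template over the search features and return matching scores.

    With weights=None this is plain cross-correlation. The optional weights
    (template height x width) rescale each template position's contribution;
    this is a hypothetical stand-in for an attention weighting, not the
    layer described in the paper.
    """
    channels = len(template)
    th, tw = len(template[0]), len(template[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    if weights is None:
        weights = [[1.0] * tw for _ in range(th)]
    scores = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            total = 0.0
            for c in range(channels):
                for i in range(th):
                    for j in range(tw):
                        total += (weights[i][j] * template[c][i][j]
                                  * search[c][y + i][x + j])
            row.append(total)
        scores.append(row)
    return scores


def best_position(scores: List[List[float]]) -> Tuple[int, int]:
    """Return (row, col) of the first maximum in the score map."""
    best, best_val = (0, 0), scores[0][0]
    for y, row in enumerate(scores):
        for x, value in enumerate(row):
            if value > best_val:
                best, best_val = (y, x), value
    return best


def needs_update(classifier_confidence: float, threshold: float = 0.5) -> bool:
    """Assumed trigger: fine-tune when a separate classifier is unsure.

    The threshold value and the direction of the test are illustrative only.
    """
    return classifier_confidence < threshold


# Tiny example: a 2x2 diagonal template inside a 3x3 search region.
template = [[[1.0, 0.0], [0.0, 1.0]]]
search = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]
plain = score_map(template, search)
weighted = score_map(template, search, weights=[[1.0, 0.0], [0.0, 0.0]])
```

In this toy case the unweighted map peaks where the template's diagonal lines up with the search region, and the weighted map shows how down-weighting some template positions changes the scores without moving the peak.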