Distilled Collections from Textual Image Queries
Corresponding Author
Yunhai Wang
Shenzhen VisuCA Key Lab/SIAT
Memorial University
Corresponding author: Yunhai Wang ([email protected])
Abstract
We present a distillation algorithm that operates on a large, unstructured, and noisy collection of internet images returned from an online object query. We introduce the notion of a distilled set, which is a clean, coherent, and structured subset of inlier images. In addition, the object of interest is properly segmented out throughout the distilled set. Our approach is unsupervised, built on a novel clustering scheme, and solves the distillation and object segmentation problems simultaneously. In essence, instead of distilling the collection of images, we distill a collection of loosely cut-out foreground “shapes”, which may or may not contain the queried object. Our key observation, which motivated our clustering scheme, is that outlier shapes are expected to be random in nature, whereas inlier shapes, which do tightly enclose the object of interest, tend to be well supported by similar shapes captured in similar views. We analyze the commonalities among candidate foreground segments not by interpreting their semantics, but simply by clustering similar shapes and considering only the most significant clusters that represent non-trivial shapes. We show that when tuned conservatively, our distillation algorithm is able to extract a near-perfect subset of true inliers. Furthermore, we show that our technique scales well in the sense that the precision rate remains high as the collection grows. We demonstrate the utility of our distillation results with a number of interesting graphics applications.
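The key observation in the abstract, that inlier shapes are supported by similar shapes while outlier shapes are scattered, can be illustrated with a toy sketch. This is not the paper's clustering scheme: it assumes each candidate shape has already been reduced to a small numeric descriptor (here, made-up 2D points), links shapes whose descriptors lie within a hypothetical `radius`, and keeps only clusters with at least `min_support` members. The function name `distill` and both parameters are illustrative assumptions.

```python
from itertools import combinations
from math import dist


def distill(descriptors, radius, min_support):
    """Return indices of shapes lying in well-supported clusters.

    descriptors: list of equal-length numeric tuples, one per candidate shape
                 (a hypothetical stand-in for a real shape descriptor).
    radius:      two shapes count as similar when their Euclidean distance
                 is at most this value.
    min_support: smallest cluster size that is still kept.
    """
    n = len(descriptors)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of similar shapes (single-linkage clustering).
    for i, j in combinations(range(n), 2):
        if dist(descriptors[i], descriptors[j]) <= radius:
            parent[find(i)] = find(j)

    # Group shape indices by cluster root.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Keep only clusters with enough mutual support; isolated shapes drop out.
    kept = [i for members in clusters.values()
            if len(members) >= min_support for i in members]
    return sorted(kept)


# Made-up descriptors: two tight groups plus three scattered points.
shapes = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # tight group
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),              # second tight group
    (2.0, 9.0), (9.0, 1.0), (-4.0, 7.0),             # scattered points
]
inliers = distill(shapes, radius=0.2, min_support=3)
print(inliers)  # prints [0, 1, 2, 3, 4, 5, 6]
```

In this sketch, raising `min_support` plays the role of a more conservative tuning: with `min_support=4` only the first group survives, trading recall for precision.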
Supporting Information
Filename | Description |
---|---|
cgf12547-sup-0001-S1.zip (11.3 MB) | Supporting Information |