We present a distillation algorithm that operates on a large, unstructured, and noisy collection of internet images returned from an online object query. We introduce the notion of a distilled set: a clean, coherent, and structured subset of inlier images in which the object of interest is properly segmented out. Our approach is unsupervised, is built on a novel clustering scheme, and solves the distillation and object segmentation problems simultaneously. In essence, instead of distilling the collection of images, we distill a collection of loosely cut-out foreground “shapes”, which may or may not contain the queried object. Our key observation, which motivated our clustering scheme, is that outlier shapes are expected to be random in nature, whereas inlier shapes, which tightly enclose the object of interest, tend to be well supported by similar shapes captured in similar views. We analyze the commonalities among candidate foreground segments without attempting to analyze their semantics, simply by clustering similar shapes and considering only the most significant clusters that represent non-trivial shapes. We show that, when tuned conservatively, our distillation algorithm is able to extract a near-perfect subset of true inliers. Furthermore, we show that our technique scales well, in the sense that the precision rate remains high as the collection grows. We demonstrate the utility of our distillation results with a number of graphics applications.
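The key observation above, that inlier shapes are supported by similar shapes while outlier shapes are not, can be illustrated with a minimal sketch. This is not the paper's clustering scheme. It is a generic mutual nearest-neighbor grouping, assuming each candidate shape has already been reduced to a numeric descriptor. The function name `distill`, the parameters `k`, `radius`, and `min_support`, and the toy descriptors are all hypothetical.

```python
from math import dist


def distill(descriptors, k=2, radius=1.0, min_support=3):
    """Group shape descriptors by mutual support and keep significant groups.

    Hypothetical sketch: two shapes are linked when each is among the other's
    k nearest neighbours and they lie within `radius` of each other. Groups
    with fewer than `min_support` members are treated as outliers.
    """
    n = len(descriptors)

    # For every shape, collect up to k nearest neighbours that are close enough.
    neighbours = []
    for i in range(n):
        ranked = sorted(
            (dist(descriptors[i], descriptors[j]), j) for j in range(n) if j != i
        )
        neighbours.append({j for d, j in ranked[:k] if d <= radius})

    # Connected components over mutual-neighbour links (depth-first search).
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in neighbours[i]:
                if i in neighbours[j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        # Keep only well-supported groups; isolated shapes are discarded.
        if len(component) >= min_support:
            clusters.append(sorted(component))
    return clusters


# Toy data: four mutually similar descriptors and three scattered ones.
shapes = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # similar shapes
    (5.0, 5.0), (-4.0, 7.0), (9.0, -3.0),            # unsupported shapes
]
print(distill(shapes))
```

Raising `min_support` or lowering `radius` makes the selection more conservative in this sketch: fewer shapes survive, but the ones that do have stronger support from their neighbors.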


Volume 34, Issue 2

May 2015

Pages 131-142

Distilled Collections from Textual Image Queries
