3D Content Creation

With 3D TVs widely deployed in homes but 3D content still severely lacking on the market, 3D content creation techniques have attracted increasing attention. This chapter walks through the whole process of 3D content creation, from modeling and representation, through capturing and 2D-to-3D conversion, to 3D multi-view generation. It presents three practical examples adopted in the industrial 3D creation process to give a clear picture of how the pieces work together in a real 3D creation system. The chapter showcases a real-time 3D capturing system built on a monoscopic mobile phone. It focuses on the visual data management and semantic rule collection parts. The chapter also discusses how to derive multi-view 3D content from a stereo image pair, since future glasses-free displays require multi-view content.

Controlled Vocabulary Terms

mobile handsets; visual databases

3D Visual Communications