Efficient 2D to 3D video conversion implemented on DSP
Eduardo Ramos-Diaz^{1}, Victor Kravchenko^{2} and Volodymyr Ponomaryov^{1}
DOI: 10.1186/1687-6180-2011-106
© Ramos-Diaz et al; licensee Springer. 2011
Received: 3 June 2011
Accepted: 18 November 2011
Published: 18 November 2011
Abstract
An efficient algorithm to generate three-dimensional (3D) video sequences is presented in this work. The algorithm is based on a disparity map computation and an anaglyph synthesis. The disparity map is first estimated by employing the wavelet atomic function technique at several decomposition levels in processing a 2D video sequence. Then, an anaglyph synthesis applies the disparity map in the 3D video sequence reconstruction. Compared with other disparity map computation techniques, such as optical flow, stereo matching, and wavelets, the proposed approach performs better according to the commonly used metrics (structural similarity and quantity of bad pixels). A hardware implementation of the proposed algorithm and the other techniques is also presented to justify the possibility of real-time visualization of 3D color video sequences.
Keywords
disparity map; multiwavelets; anaglyph; 3D video sequences; quality criteria; atomic function; DSP

1. Introduction
Conversion of available 2D content for release in three-dimensional (3D) form is a hot topic for content providers and for the success of 3D video in general. It relies entirely on the virtual view synthesis of a second view given the original 2D video [1]. 3DTV channels, mobile phones, laptops, personal digital assistants, and similar devices represent hardware on which 3D video content can be displayed.
There are several techniques to visualize 3D objects, such as polarized lenses, active vision, and anaglyphs. Some of those techniques have certain drawbacks, mainly special hardware requirements, such as the special display used with synchronized glasses in the case of active vision, or the polarized display in the case of polarized lenses. In contrast, the anaglyph technique only requires a pair of spectacles constructed with red and blue filters, where the red filter is placed over the left eye, producing a visual effect of 3D perception. Anaglyph synthesis is a simple process in which the red channel of the second image (frame) replaces the red channel of the first image (frame) [2]. In the literature, several methods to compute anaglyphs have been described. One of them is the original Photoshop algorithm [3], where the red channel of the left eye becomes the red channel of the anaglyph, and the green and blue channels are taken from the right eye. Dubois [4] suggested the least-squares projection of the color components (R, G, B) from the R^{6} space to a 3D subspace. Two principal drawbacks of these algorithms are the presence of ghosting and the loss of color [5].
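The channel-replacement scheme described above can be sketched as follows (an illustrative fragment, not the authors' implementation; `left` and `right` are assumed to be H × W × 3 RGB arrays):

```python
import numpy as np

def simple_anaglyph(left, right):
    """Naive red-cyan anaglyph: take the red channel from the left
    view and the green/blue channels from the right view."""
    anaglyph = right.copy()          # keep G and B of the right frame
    anaglyph[..., 0] = left[..., 0]  # red channel comes from the left frame
    return anaglyph
```

Viewing the result through red/blue spectacles separates the two views again, which yields the 3D perception described in the text.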
In the 2D to 3D conversion, depth cues are needed to generate a novel stereoscopic view for each frame of an input sequence. The simplest way to obtain 3D information is the use of motion vectors directly from compressed data. However, this technique can only recover the relative depth accurately, if the motion of all scene objects is directly proportional to their distance from the camera [1].
In [6], motion vector maps, which are obtained from the MPEG-4 compression standard, are used to construct the depth map of a stereo pair. The main idea here is to avoid the disparity map stage because it requires extremely computationally intensive operations and cannot suitably estimate high-resolution depth maps in video sequence applications. In paper [7], a real-time algorithm for use in 3DTV sets is developed, where the general method to perform the 2D to 3D conversion consists of the following stages: geometric analysis, static cue extraction, motion analysis, depth assignment, depth control, and depth-image-based rendering. One drawback of this algorithm is that it also requires extremely computationally intensive operations.
There are several algorithms to estimate the disparity map (DM), such as the optical flow differential methods designed by Lucas and Kanade (L&K) and by Horn and Schunck [8, 9], where some restrictions on the motion map model are employed. Other techniques are based on disparity estimation, where the best match between pixels in a stereo pair or in neighboring frames is found by employing a similarity measure, for example, the normalized cross-correlation (NCC) function or the sum of squared differences (SSD) between the matched images or frames [10]. A recent approach called region-based stereo matching (RBSM) is presented in [11], where the block matching technique with various window sizes is computed. Another promising framework consists of stereo correspondence estimation based on wavelets and multiwavelets [12], in which the wavelet transform modulus (WTM) is employed in the DM estimation. The WTM is calculated from the vertical and horizontal detail components, and the approximation component is employed to normalize the estimation. Finally, the cross-correlation in wavelet transform space is applied as the similarity measure.
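For reference, the SSD-based block matching used by the baseline techniques mentioned above can be sketched as follows (a minimal, unoptimized illustration; window size and search range are assumptions, not values from the original):

```python
import numpy as np

def ssd_disparity(left, right, block=5, max_disp=16):
    """Minimal SSD block matching: for each pixel, slide a block along
    the same row of the right image and keep the horizontal shift with
    the smallest sum of squared differences."""
    h, w = left.shape
    r = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)
            best, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                ssd = np.sum((patch - cand) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            disp[y, x] = best_d
    return disp
```

The exhaustive per-pixel search is what makes such matching computationally expensive, which motivates the wavelet-domain estimation adopted in this work.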
In this article, we propose an efficient algorithm to generate a 3D video sequence from a 2D video sequence acquired by a moving camera. The framework uses wavelet atomic functions (WAF) for the disparity map estimation. Then, the anaglyph synthesis is implemented for the visualization of the 3D color video sequence on a standard display. Additionally, we demonstrate the DSP implementation of the proposed algorithm for different sizes of 2D video sequences.
The main difference from other algorithms presented in the literature is that the proposed framework, while producing sufficiently good depth and spatial perception in the 3D video sequences, does not require intensive computational operations and can generate 3D videos practically in real-time mode.
In the present approach, we employ the WAFs because they have already demonstrated successful performance in medical image recognition, speech recognition, image processing, and other technologies [13–15].
The article is organized as follows: Section 2 presents the proposed framework, Section 3 contains the simulation results, and Section 4 concludes the article.
2. The proposed algorithm
2.1. Disparity map computation
Stereo correspondence estimation based on the multiwavelet atomic function (MWAF) technique is proposed to obtain the disparity map. The stereo correspondence procedure consists of two stages: the WAF implementation and the WTM computation.
Here, we present a novel type of wavelets known as WAFs, first introducing the basic atomic functions (up, fup_{n}, π_{n}) used as the mother functions in wavelet construction. The definition of AFs is connected with a mathematical problem: finding a function whose derivatives have maxima and minima similar to those of the initial function. Solving this problem requires an infinitely differentiable solution of differential equations with a shifted argument [15]. It has been shown that AFs fall into an intermediate category between splines and classical polynomials: like B-splines, AFs are compactly supported, and like polynomials, they are universal in terms of their approximation properties.
The detailed definitions and properties of these functions can be found in [15].
The coefficients {h_{k}} should satisfy the normalization condition $\frac{1}{\sqrt{2}}\sum_{k}h_{k}=H_{0}(0)=1$. Finally, the wavelets of decomposition and reconstruction are employed in the form $\tilde{\psi}_{i,k}=2^{i/2}\,\tilde{\psi}(x/2^{i}-k)$ and $\psi_{i,k}=2^{i/2}\,\psi(x/2^{i}-k)$, respectively, where i and k are the indexes of scale and translation [16].
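The normalization condition can be checked numerically for any valid scaling filter; the sketch below uses the standard Haar and Daubechies db2 filters for illustration (not the WAF coefficients themselves):

```python
import math

# Haar scaling filter: h = [1/sqrt(2), 1/sqrt(2)]
haar = [1 / math.sqrt(2)] * 2

# Daubechies-2 (db2) scaling filter, in closed form
s3 = math.sqrt(3)
db2 = [(1 + s3) / (4 * math.sqrt(2)), (3 + s3) / (4 * math.sqrt(2)),
       (3 - s3) / (4 * math.sqrt(2)), (1 - s3) / (4 * math.sqrt(2))]

def h0_at_zero(h):
    """Evaluate H_0(0) = (1/sqrt(2)) * sum_k h_k."""
    return sum(h) / math.sqrt(2)
```

For both filters, H_0(0) equals 1 to machine precision, as the normalization condition requires.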
Table 1. Filter coefficients {h_{k}} for the scale function φ(x) generated from different WAFs based on up, fup_{4}, and π_{6}.
k  up  fup_{4}  π_{6}
0  0.757698251288  0.751690134933  0.7835967912 
1  0.438708321041  0.441222946160  0.4233724330 
2  0.047099287129  0.041796290935  0.0666415128 
3  0.118027008279  0.124987992607  0.0793267472 
4  0.037706980974  0.034309220121  0.0420426990 
5  0.043603935723  0.053432685600  0.0008988715 
6  0.025214528289  0.024353106483  0.0144489586 
7  0.011459893503  0.022045882572  0.0211760726 
8  0.013002207742  0.014555894480  0.0046781803 
9  0.001878954975  0.007442614689  0.0141324153 
10  0.003758906625  0.006923189587  0.0104455879 
11  0.005085949920  0.001611566664  0.0003223058 
12  0.001349824585  0.002253528579  0.0059986067 
13  0.003639380570  0.000052445920  0.0075295865 
14  0.002763059895  0.000189566204  0.0011585840 
15  0.001188712844  0.000032923756  0.0064315112 
16  0.001940226446  0.000258206216  0.0047891344 
Once W_{s} is computed for each image stereo pair, or for neighboring frames of a video, the disparity map for each level of decomposition can be formed using the cross-correlation function in wavelet transform space:
$Cor_{(\mathrm{L\_R}),s}(x,y)=\frac{\sum_{(i,j)\in P}W_{\mathrm{L}}(i,j)\cdot W_{\mathrm{R}}(x+i,y+j)}{\sqrt{\sum_{(i,j)\in P}W_{\mathrm{L}}^{2}(i,j)\cdot \sum_{(i,j)\in P}W_{\mathrm{R}}^{2}(x+i,y+j)}}$, (11)
where W_{L} and W_{R} are the wavelet transforms of the left and right images at each decomposition level s, and P is the sliding processing window. Finally, the disparity map for each level of decomposition is computed by applying the nearest-neighbor interpolation (NNI) technique. In this work, we propose using four levels of decomposition in the DWT.
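For a single window position, Equation (11) reduces to the normalized cross-correlation of two equally sized patches, which can be sketched as follows (patch extraction and the search over candidate shifts are omitted; variable names are illustrative):

```python
import numpy as np

def ncc(patch_left, patch_right):
    """Normalized cross-correlation of two same-sized windows, as in
    Eq. (11): sum(WL*WR) / sqrt(sum(WL^2) * sum(WR^2))."""
    num = np.sum(patch_left * patch_right)
    den = np.sqrt(np.sum(patch_left ** 2) * np.sum(patch_right ** 2))
    return num / den
```

The measure equals 1 for identical (or proportional) windows and is near 0 for uncorrelated ones, so the disparity at a pixel is taken as the shift that maximizes Eq. (11).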
2.2. Disparity map improvement and anaglyph synthesis
where D_{new} is the new disparity map pixel value, 0 < a < 1 is a normalizing constant, and 0 < P < 1.
At the final stage, the anaglyph synthesis is performed using the improved disparity map. To generate an anaglyph, the neighboring frames are resampled onto a grid dictated by the disparity map. In numerous simulations, the bilinear, sinc, and nearest-neighbor interpolations were implemented to find the anaglyph with the best 3D perception. The NNI showed the best performance and was sufficiently fast in comparison with the other investigated interpolations, so it was chosen to create the required anaglyph in this application. The NNI is performed for each pair of neighboring frames in the video sequence. The NNI [19] used in this framework assigns to each pixel the value of its closest neighbor. To perform the NNI at the current decomposition level and form the resulting disparity map, the intensity of each pixel is updated: the new value is determined by comparing a pixel of the low-resolution disparity map from the i-th decomposition level with the closest pixel value in the actual disparity map from the (i - 1)-th decomposition level.
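The cross-level NNI step can be sketched as a nearest-neighbor upsampling of the coarse disparity map (the factor-of-2 scale between successive DWT levels is an assumption of this sketch):

```python
import numpy as np

def nni_upsample(disp, factor=2):
    """Nearest-neighbor upsampling of a coarse disparity map: each
    output pixel copies the value of its closest source pixel."""
    h, w = disp.shape
    rows = np.arange(h * factor) // factor  # source row for each output row
    cols = np.arange(w * factor) // factor  # source column for each output column
    return disp[np.ix_(rows, cols)]
```

Because each output pixel simply copies its nearest source pixel, the discrete disparity values are preserved and no intermediate values are introduced, unlike with bilinear or sinc interpolation.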
2.3. DSP implementation
3. Simulation results
where N is the total number of pixels in the input image, and d_{E} and d_{G} are the estimated and the ground truth disparities, respectively.
In Equations (15) to (17), X is the estimated image, Y is the ground truth image, μ and σ are the mean value and standard deviation for the X or Y images, and C_{1} = C_{2} = C_{3} = 1.
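The QBP equation itself is not reproduced in this excerpt; a common bad-pixel formulation consistent with the description of N, d_{E}, and d_{G} above (the 1-pixel error threshold is an assumption) is:

```python
import numpy as np

def qbp(d_est, d_gt, threshold=1.0):
    """Quantity of bad pixels: fraction of the N pixels whose estimated
    disparity d_E differs from the ground truth d_G by more than the
    threshold."""
    bad = np.abs(d_est.astype(np.float64) - d_gt) > threshold
    return bad.mean()  # mean of the boolean mask = (1/N) * count of bad pixels
```

Lower QBP values indicate better disparity maps, which is how the entries of Table 2 should be read.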
Table 2. SSIM and QBP for the proposed and existing algorithms on different test images.
Image  L&K  SSD  GEEMSF  WF Bio6.8  WF Coiflet2  WF Haar  WAF π_{6}  MWF Coiflet2  MWAF π_{6} 

Aloe  
SSIM  0.3983  0.6166  0.3017  0.9267  0.5826  0.5776  0.9232  0.5826  0.9232 
QBP  0.1121  0.4722  0.9190  0.0297  0.4517  0.4420  0.0130  0.4490  0.0111 
Venus  
SSIM  0.1990  0.4320  0.2145  0.5979  0.4530  0.4472  0.4604  0.4530  0.6947 
QBP  0.3084  0.1428  0.2013  0.1694  0.5014  0.5010  0.1930  0.5011  0.1091 
Lampshade1  
SSIM  0.0861  0.6320  0.3124  0.7061  0.7061  0.7081  0.6897  0.7061  0.7619 
QBP  0.2430  0.2800  0.3410  0.2072  0.2071  0.2071  0.2017  0.2071  0.1426 
Wood1  
SSIM  0.1089  0.7142  0.7051  0.9367  0.7096  0.7072  0.9448  0.7096  0.9448 
QBP  0.1316  0.2376  0.2100  0.1258  0.2400  0.2402  0.1180  0.2400  0.0919 
Bowling1  
SSIM  0.1118  0.6925  0.7081  0.8828  0.6690  0.6672  0.9084  0.6690  0.9084 
QBP  0.1720  0.1885  0.0645  0.0555  0.2010  0.2011  0.0119  0.2010  0.0165 
Reindeer  
SSIM  0.1557  0.7460  0.7143  0.7393  0.7321  0.7308  0.6819  0.7321  0.7001 
QBP  0.3910  0.1250  0.2810  0.1418  0.1565  0.1570  0.1513  0.1520  0.1680 
Based on the objective quantity metrics and the subjective results presented in Figure 4, MWAF π_{ 6 } has been selected as the technique to estimate the disparity map for video sequence visualization.
Table 3. Processing times for the different algorithms.
Algorithm  Matlab Time/frame, s (240 × 360)  Matlab Time/frame, s (480 × 720)  Serial Processing in DSP Time/frame, s (240 × 360)  Serial Processing in DSP Time/frame, s (480 × 720) 

Classic wavelet families (Coif2, Db6.8, Haar)  4.20  6.16  0.0314  0.0713 
Wavelet atomic functions (up, fup_{ 4 }, π_{ 6 })  4.23  6.19  0.0312  0.0715 
MWAF (up, fup_{ 4 }, π_{ 6 })  4.84  6.77  0.0489  0.081 
M-classic wavelet families (Coif2, Db6.8, Haar)  4.85  6.76  0.0480  0.080 
The processing time values were measured from the moment the sequence was acquired by the DSP until the anaglyph was displayed on a regular monitor.
The processing times in Table 3 lead to the conclusion that the DSP algorithm can process up to 20 frames per second for a frame of 240 × 360 pixels in RGB format. Additionally, the DSP algorithm can process up to 12 frames per second for a frame of 480 × 720 pixels in RGB format. The processing time values for the L&K and SSD algorithms implemented in Matlab were 22.59 and 16.26 s, respectively, because they require extremely computationally intensive operations.
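The frame-rate figures follow directly from the per-frame DSP times reported in Table 3 for the MWAF path:

```python
# Frame rates implied by the per-frame DSP processing times (Table 3, MWAF row)
fps_small = 1 / 0.0489  # 240 x 360 frames: about 20 frames/s
fps_large = 1 / 0.081   # 480 x 720 frames: about 12 frames/s
```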
4. Conclusion
This study analyzed the performance of various 3D reconstruction methods. The proposed framework based on MWAFs is the most effective of the analyzed methods for reconstructing the disparity map of 3D video sequences with different types of movements. The framework produces the best depth and spatial perception in synthesized 3D video sequences among the analyzed algorithms, as confirmed by numerous simulations on different initial 2D color video sequences. The MWAF algorithm can be applied to any type of color video sequence without additional information. The performance of the DSP implementation shows that the proposed algorithm can visualize the final 3D color video sequence practically in real-time mode. In future work, we plan to optimize the proposed algorithm in order to increase the processing speed up to film rate.
List of abbreviations
CCS: Code Composer Studio
DM: disparity map
DSP: digital signal processor
DWT: discrete wavelet transform
3D: three-dimensional
LP: low pass
MW: multiple decomposition levels
MWAF: multiwavelet atomic functions
NCC: normalized cross-correlation
NNI: nearest-neighbor interpolation
QBD: quantity of bad disparities
QBP: quantity of bad pixels
RBSM: region-based stereo matching
SSD: sum of squared differences
SSIM: structural similarity index
WAF: wavelet atomic functions
WTM: wavelet transform modulus
References
[1] Smolic A, Kauff P, Knorr S, Hornung A, Kunter M, Müller M, Lang M: Three-dimensional video postproduction and processing. Proc IEEE 2011, 99(4):607-625.
[2] Ideses I, Yaroslavsky L: New methods to produce high quality color anaglyphs for 3D visualization. In ICIAR, Lecture Notes in Computer Science, vol 3212. Springer-Verlag, Germany; 2004:273-280. doi:10.1007/978-3-540-30126-4_34
[3] Sanders W, McAllister D: Producing anaglyphs from synthetic images. Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems X 2003, 5006:348-358.
[4] Dubois E: A projection method to generate anaglyph stereo images. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 3. Salt Lake City, USA; 2001:1661-1664.
[5] Woods A, Rourke T: Ghosting in anaglyphic stereoscopic images. Stereoscopic Displays and Applications XV, Proceedings of SPIE-IS&T Electronic Imaging 2004, 5291:354-365.
[6] Ideses I, Yaroslavsky L, Fishbain B: 3D from compressed video. In Stereoscopic Displays and Virtual Reality Systems XIV, Proc SPIE 2007, 6490:64901C.
[7] Caviedes J, Villegas J: Real time 2D to 3D conversion: technical and visual quality requirements. International Conference on Consumer Electronics (ICCE), IEEE 2011:897-898.
[8] Fleet DJ: Measurement of Image Velocity. Kluwer Academic Publishers, Massachusetts; 1992.
[9] Beauchemin SS, Barron JL: The computation of optical flow. ACM Comput Surv 1995, 27(3):433-465. doi:10.1145/212094.212141
[10] Bovik A: Handbook of Image and Video Processing. Academic Press, USA; 2000.
[11] Alagoz BB: Obtaining depth maps from color images by region based stereo matching algorithms. OncuBilim Algorithm and Systems Labs 2008, 08(4):1-12.
[12] Bhatti A, Nahavandi S: Stereo Vision, Chap 6. I-Tech, Vienna; 2008:27-48.
[13] Gulyaev YuV, Kravchenko VF, Pustovoit VI: A new class of WA-systems of Kravchenko-Rvachev functions. Doklady Mathematics 2007, 75(2):325-332.
[14] Juarez C, Ponomaryov V, Sanchez J, Kravchenko V: Wavelets based on atomic function used in detection and classification of masses in mammography. Lecture Notes in Artificial Intelligence 2008, 5317:295-304.
[15] Kravchenko V, Meana H, Ponomaryov V: Adaptive Digital Processing of Multidimensional Signals with Applications. FizMatLit, Moscow; 2009. http://www.posgrados.esimecu.ipn.mx/
[16] Meyer Y: Ondelettes. Hermann, Paris; 1991.
[17] Kravchenko VF, Yurin AV: New class of wavelet functions in digital processing of signals and images. J Success Mod Radio Electron, Moscow, Edit Radioteknika 2008, 5:3123.
[18] Ideses I, Yaroslavsky L: Three methods that improve the visual quality of colour anaglyphs. J Opt A: Pure Appl Opt 2005, 7:755-762. doi:10.1088/1464-4258/7/12/008
[19] Goshtasby A: 2D and 3D Image Registration. Wiley Publishers, USA; 2005.
[20] Texas Instruments: TMS320DM642 Evaluation Module with TVP Video Encoders. Technical Reference 5073450001 Rev B; 2004.
[21] Malpica WS, Bovik AC: Range image quality assessment by structural similarity. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2009:1149-1152.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.