Introduction

While objective and subjective quality assessment of 2D video has been an active research topic in recent years, emerging 3D technologies require new quality metrics and methodologies that take into account the fundamental differences in human visual perception and the typical distortions of stereoscopic content. Therefore, we have developed a comprehensive stereoscopic video database that contains a large variety of scenes captured using a stereoscopic camera setup consisting of two HD camcorders with different capture parameters. In addition to the videos, the database also provides subjective quality scores obtained using an adapted single stimulus continuous quality scale (SSCQS) method. The resulting mean opinion scores can be used to evaluate and compare the performance of existing visual quality metrics and to design new objective quality metrics.

Video database

To acquire high-quality stereoscopic videos, the following aspects have to be considered:

Matching cameras
Matching optics
Matching geometry
Matching photography
Synchronization

Acquisition

Considering the different aspects mentioned above, we have built the stereo camera setup shown in the figure below, which consists of two identical HD camcorders (Canon HG-20) and an adjustable stereo mount.

The mount ensures that the optical axes of the cameras are parallel and supports continuous adjustment of the camera distance in the range 7-50 cm. To ensure matching focal lengths, the wide-angle end of the zoom lens, with a focal length of 43 mm, has been used. To match the cameras with each other, the focal length, white balance and shutter speed have been set manually. Synchronized operation of the two camcorders is ensured through the use of a single remote control. The camcorders capture videos with a resolution of 1920×1080 pixels and a frame rate of 25 fps. The videos are stored in AVCHD format, compressed with MPEG-4 AVC/H.264 at 24 Mbps.

Processing

In a stereoscopic camera setup, spatial distortions may be caused within the individual cameras (e.g. barrel/pincushion distortion) or by the camera setup and calibration (e.g. relative positions). The goal of the spatial alignment is to compensate for small vertical disparities caused by the camera setup and to adjust the depth position to avoid stereo window violations. This is achieved by applying a relative vertical and horizontal translation between the two videos of a pair, based on point correspondences. For a reliable adjustment of the depth position, the control points for the nearest object are selected manually.
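The translation-based alignment described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of the median, the sign convention, and the choice to place the nearest object at zero disparity (on the screen plane) are not taken from the original work.

```python
from statistics import median

def estimate_translation(left_pts, right_pts, nearest_idx):
    """Return the (dx, dy) shift to apply to the right view (hypothetical helper).

    left_pts, right_pts: lists of matching (x, y) pixel coordinates.
    nearest_idx: indices of the correspondences that lie on the nearest object.
    """
    # Vertical shift: median of the vertical disparities over all correspondences.
    dy = median(l[1] - r[1] for l, r in zip(left_pts, right_pts))
    # Horizontal shift (assumption): bring the nearest object to zero disparity,
    # so that the rest of the scene appears behind the stereo window.
    dx = median(left_pts[i][0] - right_pts[i][0] for i in nearest_idx)
    return dx, dy

# Made-up correspondences, for illustration only.
left = [(100, 50), (400, 200), (800, 300)]
right = [(80, 47), (390, 197), (795, 297)]
shift = estimate_translation(left, right, nearest_idx=[0])
print(shift)
```

The median is used here because it tolerates a few bad correspondences; a mean or a least-squares fit would be equally valid choices.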

In a stereoscopic camera setup, temporal mismatch may occur if the cameras are not shutter-synchronized. In that case the left and the right frames are not captured simultaneously but are slightly shifted in time. To limit the temporal mismatch, a single remote control has been used to control both camcorders simultaneously. The synchronization accuracy has been checked manually and was typically below 40 ms which is equal to 2 frames. To eliminate the influence of the remaining temporal mismatch on the subjective quality, the videos were manually aligned according to the temporal offset.

Even with manual control of white balance and exposure, luminance and chrominance components may vary globally between the different views. These discrepancies may originate from the use of heterogeneous cameras, calibration errors and appearance changes due to the different viewing angles. The goal of the color adjustment step is to correct these color differences between the two stereo images. Histogram matching is used to adapt the right camera view to the left camera view.
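A minimal sketch of histogram matching is given below. It assumes 8-bit frames held as NumPy arrays and matches each color channel independently; the source does not state the color space or whether channels were treated separately, so both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap one uint8 channel of `source` so its histogram resembles `reference`."""
    # Cumulative distribution functions over the 256 intensity levels.
    src_cdf = np.cumsum(np.bincount(source.ravel(), minlength=256)) / source.size
    ref_cdf = np.cumsum(np.bincount(reference.ravel(), minlength=256)) / reference.size
    # For each source level, take the first reference level whose CDF
    # reaches the source CDF value, and store it in a lookup table.
    lut = np.searchsorted(ref_cdf, src_cdf, side="left").clip(0, 255).astype(np.uint8)
    return lut[source]

def match_view(right, left):
    """Adapt the right view to the left view, channel by channel (assumption)."""
    channels = [match_histogram(right[..., c], left[..., c])
                for c in range(right.shape[-1])]
    return np.stack(channels, axis=-1)

# Tiny made-up example: a 2x2 image with three identical channels.
left_view = np.stack([np.array([[10, 10], [200, 200]], dtype=np.uint8)] * 3, axis=-1)
right_view = np.stack([np.array([[20, 20], [180, 180]], dtype=np.uint8)] * 3, axis=-1)
adjusted = match_view(right_view, left_view)
```

In this toy case the adjusted right view takes exactly the intensity levels of the left view; on real footage the match is only approximate because the two views do not show identical content.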

Description

The proposed database contains stereoscopic videos with a resolution of 1920×1080 pixels and a frame rate of 25 fps. Various indoor and outdoor scenes with a large variety of colors, textures, moving objects and depth structures have been captured. Each scene has been captured with a static camera setup at different camera distances in the range 10-50 cm. Since the acquisition was done sequentially, the content of a single scene may vary slightly across the different camera distances. However, the general 2D (color, texture, motion) and 3D (depth) characteristics are preserved. The database contains 6 scenes, shown in the figure below, with different characteristics.

The following table provides an overview of the selected scenes together with the 3D characteristics such as near distance and far distance, and the maximum permissible camera distance. The latter can be theoretically computed based on a simplified Bercovitz equation.

Id	Title	Near (m)	Far (m)	Max. camera distance (cm)
1	sofa	3	6	17
2	bike	10	150	30
6	feet	2	4	11
8	hallway	2	20	6
11	notebook	3	10	12
12	car	8	120	24
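The maximum permissible camera distance can be illustrated with one common simplified form of the Bercovitz equation, B = (P / F) · (L · N) / (L − N), where N and L are the near and far distances, F is the focal length and P is the permissible on-film parallax. The sketch below assumes F = 43 mm (the focal length stated in the acquisition section) and P = 1.2 mm; the value of P and the exact form of the equation are assumptions and are not given in the text. With these assumed values, the rounded results agree with the table above.

```python
def max_camera_distance_cm(near_m, far_m, focal_mm=43.0, parallax_mm=1.2):
    """Simplified Bercovitz base, B = (P / F) * (L * N) / (L - N), in cm.

    parallax_mm = 1.2 is an assumed value, not taken from the source.
    """
    base_m = (parallax_mm / focal_mm) * (far_m * near_m) / (far_m - near_m)
    return base_m * 100.0

# Near and far distances in metres, copied from the table.
scenes = {
    "sofa": (3, 6),
    "bike": (10, 150),
    "feet": (2, 4),
    "hallway": (2, 20),
    "notebook": (3, 10),
    "car": (8, 120),
}

computed = {title: round(max_camera_distance_cm(near, far))
            for title, (near, far) in scenes.items()}
for title, distance in computed.items():
    print(f"{title}: {distance} cm")
```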

Subjective quality

Equipment

The subjective test campaign was conducted at the Multimedia Signal Processing Group (MMSPG) quality test laboratory at EPFL (shown in the figure below), which is compliant with the recommendations for subjective evaluation of visual data issued in ITU-R BT.500-11. A 46” polarized stereoscopic display (Hyundai S465D) with a native resolution of 1920×1080 pixels has been used to display the test stimuli. Only one subject per session assessed the test material. The subject was seated in line with the center of the monitor, at a distance of approximately 2 m, which corresponds to three times the height of the screen.

Observers

20 subjects (6 female, 16 male) participated in the test. All of them were non-expert viewers with marginal experience of 3D image and video viewing. The ages ranged from 24 to 37, with an average of 27 years.

Stimuli

For the subjective evaluation, the stereoscopic video database has been split into a training set with 1 scene (stairs) and a testing set with the 6 scenes mentioned above. For each scene, 5 different stimuli have been considered, corresponding to different camera distances (10, 20, 30, 40, 50 cm), which gives 30 test stimuli in total.

Procedure

Since the optimal acquisition settings for 3D content may vary depending on the scene, the display and the observer, it is difficult to select one of the stimuli as a reference. Therefore, a single stimulus (SS) method has been adopted for the subjective quality evaluation. To determine the influence of the camera distance on the 3D quality, a continuous quality scale with 5 levels (excellent, good, fair, poor, bad), as described in ITU-R BT.500-11, has been used.

Processing

The screening of subjects was performed according to the guidelines described in ITU-R BT.500-11. The outlier detection identified 3 of the 20 subjects as outliers, and their scores have been discarded. Thus the statistical analysis is based on the scores from 17 subjects.

After the outlier removal, the mean opinion score is computed for each test condition. The relationship between the estimated mean values based on a sample of the population (i.e. the subjects who took part in our experiments) and the true mean values of the entire population is given by the confidence interval of the estimated mean.
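As a rough illustration of this step, the sketch below computes the mean opinion score and the half-width of a 95% confidence interval for one test condition. It assumes the normal approximation 1.96 · s / √N described in ITU-R BT.500; with only 17 subjects a Student's t quantile would give a slightly wider interval. The function name and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mos_and_ci(scores, z=1.96):
    """Return (MOS, half-width of the 95% confidence interval) for one condition."""
    n = len(scores)
    mos = mean(scores)
    # z * s / sqrt(N), with s the sample standard deviation (N - 1 in the denominator).
    ci = z * stdev(scores) / sqrt(n)
    return mos, ci

# Made-up scores of three subjects for a single stimulus.
example_scores = [60, 70, 80]
mos, ci = mos_and_ci(example_scores)
print(f"MOS = {mos:.1f} +/- {ci:.2f}")
```

Applied to the score matrix from the download section, the function would be called once per stimulus, using only the columns of the subjects retained after screening.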

Copyright

Permission is hereby granted, without written agreement and without license or royalty fees, to use, copy, modify, and distribute the data provided and its documentation for research purposes only. The data provided may not be commercially distributed. In no event shall the Ecole Polytechnique Fédérale de Lausanne (EPFL) be liable to any party for direct, indirect, special, incidental, or consequential damages arising out of the use of the data and its documentation. The Ecole Polytechnique Fédérale de Lausanne (EPFL) specifically disclaims any warranties. The data provided hereunder is on an “as is” basis and the Ecole Polytechnique Fédérale de Lausanne (EPFL) has no obligation to provide maintenance, support, updates, enhancements, or modifications.

If you use this database in your research we kindly ask you to reference this website and the paper below:

Lutz Goldmann, Francesca De Simone, Touradj Ebrahimi: “A Comprehensive Database and Subjective Evaluation Methodology for Quality of Experience in Stereoscopic Video”, Electronic Imaging (EI), 3D Image Processing (3DIP) and Applications, San Jose, USA, 2010.

Download

The whole database is split into several archives:

Processed and encoded stereo videos: LR video pairs stored as individual AVI files.
Raw subjective quality scores: List of 30 videos and the 30×20 score matrix as CSV files.
Mean opinion scores and confidence intervals: 30×1 mean opinion score and confidence intervals as CSV files.

The dataset can be downloaded as a zip file from the following FTP server using a dedicated FTP client such as FileZilla or FireFTP (we recommend FileZilla):

Protocol: FTP
FTP address: tremplin.epfl.ch
Username: [email protected]
Password: ohsh9jah4T
FTP port: 21

After you connect, choose the 3DVQA folder from the remote site, and download the relevant material. The total size of the provided data is ~958 MB.

Contact

If you have any questions regarding this research please contact Philippe Hanhart ([email protected])