Research and technological challenges

The Montreux Jazz Festival Digital Archive Project is a cutting-edge undertaking and a first attempt at creating a high-quality audiovisual digital archive. As such, it raises a number of scientific and technological challenges, which are summarized below.

Enhancement of audiovisual content

Thanks to progress over the past 50 years in electronics, computing, and algorithms, the quality of capture and display devices has increased impressively, in step with the well-known Moore’s law. From the black-and-white, low-quality, low-resolution images of early analog television, which offered only a few hundred interlaced lines per picture, consumers can now produce and access High Definition content with as many as 1080 lines in progressive mode. In professional environments such as digital cinema, picture dimensions have at least doubled compared to HD content, pictures are displayed in 3D, and color capture and rendition have greatly improved. It is only a matter of a few years before consumers have access to cheaper capture and display devices of roughly 2000 by 4000 pixels, with 3D support, which are already available to professionals. Displaying Standard Definition or even High Definition content unprocessed on such devices results in a poor quality of experience for viewers. Scientists and technologists in the field of video have recognized this, and several techniques are under development that aim to enhance the resolution and color rendition of lower-quality content for display on higher-resolution devices, and to convert 2D video into 3D.

The same phenomenon can be observed for audio. Since the early days of mono, low-quality audio capture and rendition, standard solutions for audio recording have moved to stereo and surround sound. Here again, technologies are under development to render mono and stereo audio as surround sound. EPFL has several experts in image and audio processing with state-of-the-art solutions for converting lower-quality audiovisual material to high-definition video and surround-sound audio. Their know-how, together with further research on the content of the Montreux Jazz Festival, offers a unique opportunity to advance the state of the art in this exciting and highly sought-after field.
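As a point of reference for what resolution enhancement means in its simplest form, the sketch below upscales a small grayscale image by bilinear interpolation. This is only a textbook baseline, not one of the EPFL techniques mentioned above; the function name `upscale_bilinear` and the list-of-rows image representation are assumptions made for illustration.

```python
def upscale_bilinear(image, factor):
    """Upscale a grayscale image (list of rows of numbers) by an integer factor."""
    src_h, src_w = len(image), len(image[0])
    out = []
    for y in range(src_h * factor):
        # Map the destination row back to a fractional source row, clamped to the edge.
        sy = min(y / factor, src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        wy = sy - y0
        row = []
        for x in range(src_w * factor):
            sx = min(x / factor, src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            wx = sx - x0
            # Interpolate horizontally on the two neighbouring rows, then vertically.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


low_res = [[0, 100], [100, 200]]
high_res = upscale_bilinear(low_res, 2)  # 2x2 image becomes 4x4
```

Interpolation of this kind only smooths between existing samples; it cannot recover detail that was never captured, which is why more advanced enhancement methods remain an active research topic.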


Storing several thousand hours of high-resolution audio and video is a unique challenge in itself. To ensure the life span and reliability of such a system, adequate hybrid hardware solutions are required, based on the latest state-of-the-art data storage technologies. The master archive will be stored in a robotic mass tape storage system such as those offered by companies like IBM and HP. Such systems continuously monitor the status of the data stored on tape and, as soon as the first signs of deterioration are detected, automatically transfer the content to a new tape that replaces the old one. Robotic tape storage systems, however, have an important drawback: access to any specific piece of data is slow. For this reason, they are often complemented by a disk-based storage system that mirrors part or all of the content, often more compactly thanks to lossy compression of the multimedia content. The disk-based system gives faster access to a specific portion of the content and is managed through a file system similar to those found in a typical computer.

In a first step of the project, a requirements specification (‘cahier des charges’) will be drawn up to identify the most adequate hardware solution for both the master archive and the secondary archive. In parallel, the software architecture of the archive will be designed and implemented. Several issues have to be tackled carefully, for instance the choice of the lossy compression format for the audiovisual content in the secondary archive, and of compression parameters adapted to the quality of the original content, so as to maximize compactness while keeping the loss in quality imperceptible. Standard solutions such as the JPEG 2000 file format, already used and under standardization for the archival of image and video databases, are among the potential candidates. EPFL already has excellent know-how and expertise in compression, media security, and standardization activities in these areas, and is uniquely positioned to play a key role in this field.
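The monitor-and-migrate behaviour described for the master archive can be illustrated with a checksum-based integrity audit. The sketch below is a simplified model only: real tape libraries rely on their own hardware-level error monitoring, and the in-memory dictionaries, item names, and functions `audit` and `migrate` are hypothetical stand-ins.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest used as a fixity value for one archived item."""
    return hashlib.sha256(data).hexdigest()


def audit(archive: dict, manifest: dict) -> list:
    """Return names of items whose current checksum no longer matches the manifest."""
    return [name for name, data in archive.items()
            if checksum(data) != manifest.get(name)]


def migrate(damaged: list, archive: dict, mirror: dict, manifest: dict) -> list:
    """Replace damaged items with a mirror copy, if that copy still verifies."""
    restored = []
    for name in damaged:
        copy = mirror.get(name)
        if copy is not None and checksum(copy) == manifest.get(name):
            archive[name] = copy
            restored.append(name)
    return restored


# Hypothetical two-item archive with a verified second copy.
archive = {"concert_a": b"audio-video-a", "concert_b": b"audio-video-b"}
manifest = {name: checksum(data) for name, data in archive.items()}
mirror = dict(archive)

archive["concert_b"] = b"audio-video-X"  # simulate deterioration of one item
damaged = audit(archive, manifest)
restored = migrate(damaged, archive, mirror, manifest)
remaining = audit(archive, manifest)
```

Note that this model restores from a lossless second copy. A lossy disk-based mirror, as described above, could not serve that purpose, since its content would never match the master checksum.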


Annotation is an important issue in any database. It essentially consists of attaching metadata to the audiovisual content in order to identify and describe that content. Metadata is defined as data about data, and is used to facilitate the understanding, use, and management of data. Manual, semi-automatic, and automatic annotation techniques will all be considered, although priority will be given to automatic methods. Annotation results will be stored in MPEG-7 format. Tools for populating the metadata will be developed through various research projects at EPFL or in collaboration with other institutions. The analysis tools include speech recognition, music genre classification, scene cut and shot detection, video summarization, and emotion and social tagging.
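To give a concrete flavour of automatic annotation, the sketch below shows one classic approach to scene cut detection: comparing intensity histograms of consecutive frames and flagging a cut when they differ strongly. It is a minimal illustration under simplifying assumptions (frames as flat lists of 8-bit values, a fixed threshold, hypothetical function names), not the project's actual tooling.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // max_value, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def detect_cuts(frames, threshold=0.5):
    """Return the indices of frames that start a new shot."""
    cuts = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # L1 distance between normalized histograms lies between 0 and 2.
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Synthetic example: two dark frames, two bright frames, then a dark one.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
cuts = detect_cuts([dark, dark, bright, bright, dark])
```

The resulting cut positions are exactly the kind of machine-generated metadata that would then be attached to the content as annotations.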

Search and retrieval

Searching and managing a large collection of audiovisual content is a considerable challenge. The field of research that studies systems for indexing, searching, and retrieving data is known as information retrieval. Search essentially consists of matching a user query to objects stored in a database. Mirroring well-known document search and retrieval systems, a video search and retrieval system may simply associate metadata, in the form of textual annotations, with each video sequence. In this case, search is done simply by keywords. Keyword-based systems are known to have a number of limitations, from both practical and theoretical viewpoints. An alternative that mitigates these limitations is to make use of basic audio and visual features, together with higher-level semantics. Content-Based Image Retrieval (CBIR) performs video search and retrieval using the content of the videos themselves.
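Keyword search over textual annotations is commonly implemented with an inverted index, which maps each word to the set of items annotated with it. The following sketch assumes hypothetical video identifiers and free-text annotations; it also makes one of the practical limitations visible, since a query only matches words that appear literally in the annotation.

```python
from collections import defaultdict


def build_index(annotations):
    """Map each lowercase keyword to the set of video ids annotated with it."""
    index = defaultdict(set)
    for video_id, text in annotations.items():
        for word in text.lower().split():
            index[word].add(video_id)
    return index


def search(index, query):
    """Return ids of videos whose annotation contains every query keyword."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical annotations, for illustration only.
annotations = {
    "v1": "piano trio live",
    "v2": "blues guitar solo",
    "v3": "piano solo encore",
}
index = build_index(annotations)
piano_hits = search(index, "piano")
piano_solo_hits = search(index, "piano solo")
keyboard_hits = search(index, "keyboard")  # synonym of "piano", yet no match
```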

More specifically, relevant and discriminating features are extracted from each image. Commonly, CBIR relies on low-level features such as color, shape, and texture, but it can also use higher-level features such as faces obtained by face detection and recognition.
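A low-level color feature of the kind mentioned above can be sketched as a coarse RGB histogram, with histogram intersection as the similarity score used to rank stored images against a query. The image names, the two-bins-per-channel quantization, and the function names are assumptions chosen to keep the example small.

```python
def color_histogram(pixels, bins=2):
    """Normalized joint RGB histogram; pixels is a list of (r, g, b) tuples in 0..255."""
    counts = [0] * (bins ** 3)
    for r, g, b in pixels:
        rb, gb, bb = (min(c * bins // 256, bins - 1) for c in (r, g, b))
        counts[(rb * bins + gb) * bins + bb] += 1
    return [c / len(pixels) for c in counts]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank(query_pixels, collection):
    """Return image names ordered from most to least similar to the query."""
    q = color_histogram(query_pixels)
    scored = [(intersection(q, color_histogram(p)), name)
              for name, p in collection.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical tiny images: a mostly red stage and an all-blue stage.
collection = {
    "red_stage": [(250, 10, 10)] * 3 + [(10, 10, 10)],
    "blue_stage": [(10, 10, 250)] * 4,
}
query = [(240, 20, 20)] * 4  # a reddish query image
ranking = rank(query, collection)
```

Color alone is a weak descriptor, which is why such features are usually combined with shape, texture, and higher-level cues.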

The audiovisual material under consideration is of very high value. Furthermore, the copyright terms governing its use can differ from case to case. Given how easily digital data can be copied and distributed at negligible cost, security is a crucial issue in this project. Security of the content will be guaranteed through a multi-layered approach encompassing both hardware and software measures. Digital Rights Management (DRM) refers to technologies that control access to digital data or manage its usage according to the restrictions associated with a specific instance of a digital work. In this project, advanced DRM techniques will be adapted to the requirements of the application scenario, and new systems will be developed where appropriate. Solutions will build upon the MPEG Intellectual Property Management and Protection (IPMP) standard and Secure JPEG 2000 (JPSEC). These solutions will make extensive use of advanced encryption technologies and digital watermarking, so as not only to guarantee conditional access to various resolutions and qualities of the content, but also to facilitate tracking and monitoring of the content if it is distributed.
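The idea of conditional access to different qualities of the content can be sketched with a signed token that names the highest quality tier a user may request. This is a deliberately simplified illustration and does not implement MPEG IPMP or JPSEC; the tier names, token layout, and function names are all hypothetical, and user names are assumed not to contain a colon.

```python
import hashlib
import hmac

TIERS = ["preview", "sd", "hd"]  # hypothetical quality tiers, lowest to highest


def issue_token(secret: bytes, user: str, tier: str) -> str:
    """Create a token granting `user` access up to `tier`, signed with HMAC-SHA256."""
    message = f"{user}:{tier}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{user}:{tier}:{signature}"


def may_access(secret: bytes, token: str, requested_tier: str) -> bool:
    """Check the signature, then allow only tiers at or below the granted one."""
    try:
        user, tier, signature = token.split(":")
    except ValueError:
        return False
    expected = hmac.new(secret, f"{user}:{tier}".encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    if tier not in TIERS or requested_tier not in TIERS:
        return False
    return TIERS.index(requested_tier) <= TIERS.index(tier)


secret = b"demo-secret"  # placeholder key for the example only
token = issue_token(secret, "alice", "sd")
can_preview = may_access(secret, token, "preview")
can_hd = may_access(secret, token, "hd")
forged = token.replace(":sd:", ":hd:")  # tampering invalidates the signature
forged_ok = may_access(secret, forged, "hd")
```

A token of this kind only gates access; protecting the content itself would still require encryption, and tracing distributed copies would require watermarking, as described above.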

Visualization and interaction with the content

An added value of this project is the invention and implementation of truly innovative and efficient mechanisms and protocols for visualizing and interacting with Montreux Jazz Festival content. This will be achieved through close collaboration between technologists and designers.

Large databases of audiovisual content call for entirely new approaches to enable efficient browsing and presentation. Research topics include the development of new metrics for video and music similarity, the exploitation of these metrics to automatically structure large collections according to musical criteria, and the development of new user interfaces for straightforward and intuitive interaction and visualization.
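How a similarity metric can be used to structure a collection automatically is sketched below with cosine similarity over feature vectors and a simple greedy grouping. The three-dimensional feature vectors, track names, and threshold are invented for illustration; defining features that actually capture musical similarity is precisely the research question raised above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_similarity(features, threshold=0.9):
    """Greedy grouping: an item joins the first group whose first member is similar enough."""
    groups = []
    for name, vector in features.items():
        for group in groups:
            if cosine_similarity(vector, features[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups


# Hypothetical feature vectors, e.g. (tempo, rhythm, brightness) scaled to 0..1.
features = {
    "ballad_1": [0.9, 0.1, 0.2],
    "ballad_2": [0.8, 0.2, 0.2],
    "funk_1": [0.1, 0.9, 0.7],
}
groups = group_by_similarity(features)
```

Groups produced this way could then drive a browsing interface, for example by placing similar recordings next to each other.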