Tuesday, April 26, 2011

MPEG-4 - multimedia over the Internet

Media streaming has emerged as an important service offered over the Internet. In response, the Moving Picture Experts Group (MPEG) has introduced the MPEG-4 multimedia standard, which encompasses a wide range of tools and technologies for delivering multimedia content. Unlike its predecessors (MPEG-1 and MPEG-2), the MPEG-4 standard does not focus only on the media compression aspect of multimedia technology. It also considers the media packaging and delivery components, and many of the mechanisms that future multimedia applications might need. Beyond being a comprehensive multimedia standard, MPEG-4 has a more interesting feature: it is an object-based multimedia technology. An MPEG-4 presentation is composed of units of aural, visual or audiovisual content, called media objects. These objects can be of natural origin, such as camera-generated images, or of synthetic origin, such as computer-generated images or voice. Being object-based, MPEG-4 enables many new applications. Interactive multimedia, scene manipulation and control of individual objects are just some of the features that MPEG-4 provides. Figure 1 depicts an example of an MPEG-4 audiovisual scene.

Coding, composition and streaming

Media objects in an MPEG-4 audiovisual scene can be video objects (for example, a person talking without the background scenery), still images (a fixed background), audio objects (the voice stream associated with the talking person) and so on. In addition to multimedia objects, MPEG-4 defines the coded representation of objects such as text and graphics, and of talking synthetic heads together with the associated text used to synthesize the speech and animate the head. Figure 1 shows how MPEG-4 describes an audiovisual scene as a composition of individual objects, using a tree to represent the interactive scene description hierarchy. The leaves of the tree are the most basic elements of a scene and are called primitive media objects. A combination of two or more primitive objects forms a compound media object. A talking person is an example of a compound media object whose primitive objects are a voice and a visual object. Primitive media objects in MPEG-4 are delivered in elementary streams.
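
The tree structure described above can be illustrated with a short Python sketch. It is only a toy model of the idea: the class names (PrimitiveObject, CompoundObject) and the helper primitive_leaves are hypothetical and are not taken from the MPEG-4 scene description format itself.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class PrimitiveObject:
    """A leaf of the scene tree (hypothetical model, not the MPEG-4 syntax)."""
    name: str
    kind: str  # for example "audio", "video" or "image"


@dataclass
class CompoundObject:
    """An inner node that groups two or more media objects."""
    name: str
    children: List["SceneNode"] = field(default_factory=list)


SceneNode = Union[PrimitiveObject, CompoundObject]


def primitive_leaves(node: SceneNode) -> List[PrimitiveObject]:
    """Collect the primitive objects (tree leaves) under a node, left to right."""
    if isinstance(node, PrimitiveObject):
        return [node]
    leaves: List[PrimitiveObject] = []
    for child in node.children:
        leaves.extend(primitive_leaves(child))
    return leaves


# The talking person from the text: a voice plus a visual object.
person = CompoundObject("talking person", [
    PrimitiveObject("voice", "audio"),
    PrimitiveObject("person video", "video"),
])
# The whole scene: the compound person in front of a fixed background image.
scene = CompoundObject("scene", [person, PrimitiveObject("background", "image")])

print([leaf.name for leaf in primitive_leaves(scene)])
```

In this model each leaf returned by primitive_leaves corresponds to one primitive media object, and therefore to one elementary stream that has to be delivered.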

Data access in MPEG-4 is always viewed as delivering, storing or accessing elementary streams. MPEG-4 has dedicated a part of the standard to delivery issues by defining a generic platform for the delivery of multimedia information. This platform is called the Delivery Multimedia Integration Framework (DMIF). DMIF is part 6 of the MPEG-4 standard and addresses three different content access scenarios: local file access, broadcast service and remote interactive service. The remote interactive scenario, which includes streaming MPEG-4 content across the Internet, is by far the most complicated of the three.

Challenge of MPEG-4 streaming

MPEG-4 traffic is an aggregation of its elementary streams (objects). The MPEG-4 standard supports many tools and encoding techniques. For example, MPEG-2 or H.263 video streams may be used in an MPEG-4 presentation as well as MPEG-4’s own encoded video. Each one is treated as a separate object. With an MPEG-4 presentation likely to possess a large number of these objects (different types of audio, video, animation, images and so on), the aggregated traffic pattern becomes very diversified and difficult to model.

Another problem arises when we consider the interactivity that MPEG-4 makes possible compared with traditional non-object-oriented media formats. User interaction is one of the most important facilities now available thanks to the object-based nature of MPEG-4. In addition to the traditional play/pause/stop interaction, MPEG-4 users can interact with the scene, add objects, remove them, or alter their traffic and rendering specifications. Adding and removing objects changes the traffic pattern and may render previous bandwidth allocations useless. Therefore, dynamic resource management tools and techniques (such as quality-of-service, or QoS, renegotiation mechanisms) are required. (For this reason, many of the existing commercial streaming systems are designed for single-stream, non-interactive multimedia applications and generally rely on proprietary delivery mechanisms.)
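
To make the resource problem concrete, here is a minimal Python sketch of how adding an object can push the aggregate traffic past an earlier bandwidth reservation. The StreamingSession class, its methods and all bit rates are assumptions chosen for illustration; a real system would also have to deal with variable bit rates and with an actual QoS signalling protocol.

```python
class StreamingSession:
    """Hypothetical bookkeeping for one presentation's elementary streams."""

    def __init__(self, reserved_kbps: int) -> None:
        self.reserved_kbps = reserved_kbps  # bandwidth currently allocated
        self.streams = {}  # object name -> nominal rate in kbit/s

    def aggregate_kbps(self) -> int:
        """Total nominal rate of all objects currently in the scene."""
        return sum(self.streams.values())

    def needs_renegotiation(self) -> bool:
        """True when the aggregate traffic exceeds the current reservation."""
        return self.aggregate_kbps() > self.reserved_kbps

    def add_object(self, name: str, kbps: int) -> bool:
        """Add an object and report whether QoS must be renegotiated."""
        self.streams[name] = kbps
        return self.needs_renegotiation()

    def remove_object(self, name: str) -> bool:
        """Remove an object and report whether QoS must be renegotiated."""
        self.streams.pop(name, None)
        return self.needs_renegotiation()


# Illustrative numbers only: a 500 kbit/s reservation for video plus audio.
session = StreamingSession(reserved_kbps=500)
session.add_object("speaker video", 384)
session.add_object("speaker audio", 64)
print(session.aggregate_kbps(), session.needs_renegotiation())

# The user adds a second video object to the scene.
print(session.add_object("inset video", 256))
```

With these made-up rates the initial scene fits inside the reservation, while the user's extra video object does not, which is the point at which a renegotiation mechanism would have to step in.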

Considering the difficulties of realizing object-based multimedia streaming, it is clear that additional measures must be taken to enable MPEG-4 streaming over the Internet. In our opinion, the best approach to tackling the challenges of object-based media streaming is through the use of DMIF.

DMIF overview

The generic architecture of an MPEG-4 terminal consists of three main layers: compression, synchronization and delivery. The compression layer performs media encoding and decoding of the elementary streams. The synchronization, or sync, layer manages elementary streams and their synchronization and hierarchical relations. The delivery layer, which is of special interest here, ensures transparent access to content regardless of the delivery technology in use. The delivery layer provides a method to retrieve MPEG-4 elementary streams. This layer also provides an abstraction layer between the core MPEG-4 systems components and the retrieval method. So why do we need to define another framework for the delivery of MPEG-4 instead of just using existing protocols and methods? The answer becomes apparent when reviewing DMIF’s goals and objectives. DMIF is not intended to replace the existing protocols and methods; instead, it is designed to unify media access through these methods. DMIF essentially hides the delivery technology details from applications that rely on DMIF for communications and ensures interoperability between end systems.
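
The idea of hiding the delivery technology behind one interface can be sketched in Python as follows. This is not the DMIF interface defined by the standard: DeliveryBackend, its open_stream method and the three backend classes are hypothetical names, used only to show how an application could stay unchanged across the local file, broadcast and remote interactive scenarios.

```python
from abc import ABC, abstractmethod
from typing import List


class DeliveryBackend(ABC):
    """Hypothetical common interface; not the actual DMIF API."""

    @abstractmethod
    def open_stream(self, stream_id: str) -> str:
        """Return a description of how the elementary stream is obtained."""


class LocalFileBackend(DeliveryBackend):
    def open_stream(self, stream_id: str) -> str:
        return f"read '{stream_id}' from a local file"


class BroadcastBackend(DeliveryBackend):
    def open_stream(self, stream_id: str) -> str:
        return f"tune to '{stream_id}' on a broadcast channel"


class RemoteInteractiveBackend(DeliveryBackend):
    def open_stream(self, stream_id: str) -> str:
        return f"request '{stream_id}' from a remote server"


def start_presentation(backend: DeliveryBackend, stream_ids: List[str]) -> List[str]:
    """Application code: it never needs to know which backend it is using."""
    return [backend.open_stream(stream_id) for stream_id in stream_ids]


# The same application call works for all three access scenarios.
for backend in (LocalFileBackend(), BroadcastBackend(), RemoteInteractiveBackend()):
    print(start_presentation(backend, ["voice", "person video"]))
```

The design choice shown here is that start_presentation depends only on the abstract interface, so a new delivery technology can be supported by adding a backend without touching the application.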

Streaming multimedia over the Internet
Yaser Pourmohammadi-Fallah, Kambiz Asrar-Haghighi, Hussein M. Alnuweiri