
An Introductory Guide to Audio and Video Encoding

By David Johns - October 2001

In a follow-up to last issue’s Streaming Video articles, David Johns of Culturejam Limited [1], a company that specialises in optimising video and audio for the Web, introduces the art of encoding.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The Business Rationale behind Encoding

The world is becoming a digital environment. The records we create are all stored as millions of "ones" and "zeros". Radio, TV, home entertainment, even the telephone - they're all digital now.

Yet there is confusion in the marketplace about what "being digital" actually means. Many seem to believe that once something is converted to digital it can be conveyed via any digital transmission or delivery medium with no further work.

Whilst this is true in theory, in practice it is far from that simple: the different distribution and viewing media - Internet, digital TV, DVD, mobile devices and so on - all require content to be individually optimised for their specific characteristics. So even newly created digital material usually requires adjustment for its intended purpose.

Meanwhile, what about the vast array of legacy material that was both created and stored in older formats? Long-playing vinyl records? Analogue video and audio tapes? Quite simply, they’re destined for a slow decay into oblivion, taking their precious value with them.

There is, however, a means to prevent this: conversion into a more stable form, one able to withstand future copying without any loss of quality. Unsurprisingly, that means digital.

News footage, corporate communication libraries, video archives, TV and radio commercials, sound effects, stock footage, showreels - all can have their value preserved for the future via digitising. To be done properly, this process requires substantial investment in professional equipment and skills.

This article describes those processes and, hopefully, explains why good-quality digitising and encoding is not just "something which anyone can do" (a popular myth, thanks to the widespread availability of low-cost, one-button encoding software). For further explanation of all the terms used in this article, see the Culturejam glossary [2].

The Culturejam Web site

Overview of the Process

First of all, the audio-visual (A/V) content has to be brought into a computer system. This is called digitising. Then it's edited - partly according to how you want it to look, partly to optimise it for the encoding process - and finally it's encoded. After this last step, the encoded media can be stored on CD-R or DVD, or even copied onto a server which will transmit the material across the Internet.

Digitising and Encoding - In Detail

Digitising

Once the A/V source material has been received it needs to be copied onto the computer system so that it can be edited and/or encoded. In the dim, dark days of analogue tape, this required a conversion from analogue to digital and hence the process became known as "digitising". Today, even though many modern tape formats already store the audio and video digitally, the process of bringing it into a computer is still commonly referred to as digitising.

Digitising generally happens in "real time", which is to say it takes as long to do as the material lasts. One hour of material takes one hour to digitise, ten hours takes ten hours, and so on.

However, you also have to allow a little time for each item that requires digitising, because the machine operator has to open the tape, put it in the machine, give the machine a name to store the file under, find and mark the start and end points, start the computer digitising and, at the end, pull the tape out and re-file it in the right place. Even if all the items are on the same tape, there's a certain amount of stop-starting so that each clip gets its own filename on the computer system.
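A rough way to budget for this is to add a fixed handling allowance per clip on top of the real-time capture. The sketch below assumes a hypothetical five minutes of handling per clip; that figure is illustrative only and will vary with the operator, the equipment and the material.

```python
# Hypothetical handling allowance per clip, in minutes (loading the tape,
# naming the file, marking start/end points, re-filing). Illustrative only.
HANDLING_MINUTES_PER_CLIP = 5


def digitising_minutes(clip_lengths_min, handling_per_clip=HANDLING_MINUTES_PER_CLIP):
    """Estimate operator time: real-time capture plus per-clip handling."""
    capture = sum(clip_lengths_min)                    # digitising runs in real time
    handling = handling_per_clip * len(clip_lengths_min)
    return capture + handling


# Ten 6-minute clips are one hour of material...
print(digitising_minutes([6] * 10))  # 110 minutes of work under this assumption
```

Under this assumption, many short clips cost proportionally more time than a few long ones, which is the stop-starting effect described above.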

Some digitising systems bring the material in 'uncompressed', that is, unaltered from its original form. For video, this requires substantial and expensive hardware due to the sheer volume of data involved (see later). Other systems apply a mild form of data compression and encoding straight away, in order to make the amount of data more manageable.
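To give a feel for that "sheer volume of data", the sketch below multiplies out the full-frame figures the article works through later (768 x 576 pixels, 24 bits per pixel, 25 frames per second) for one hour of video. It ignores audio and file overheads, and real broadcast capture formats, which often store colour differently, will give somewhat different totals; the function name is hypothetical.

```python
def uncompressed_bytes(width, height, bits_per_pixel, fps, seconds):
    """Raw video size in bytes: no audio, no compression, no file overhead."""
    bits_per_second = width * height * bits_per_pixel * fps
    return bits_per_second * seconds // 8  # 8 bits per byte


one_hour = uncompressed_bytes(768, 576, 24, 25, 3600)
print(f"{one_hour:,} bytes (about {one_hour / 1e9:.0f} GB) per hour")
```

On these assumptions, a single hour comes to roughly 119 GB, which is why uncompressed capture needs such substantial storage.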

From an operational point of view, encoding at this stage is effectively invisible to the user, but it does have a bearing on the final encoding quality, so the less compression you can get away with when digitising, the better.

Editing

Editing for multi-media playback comprises two elements. First there's traditional editing - cutting the source material into something well-produced that tells the story you wish to tell.

For traditional broadcast, that would be the end of the story, but for multi-media applications there's an extra step: modifying the material so that, once it's encoded, it is presented in the most appropriate manner possible.

Captions are a good example. A typical TV item may well have name captions strapped across the bottom of the screen to identify the speaker. Yet if that same picture is encoded for Internet transmission, the caption will probably be unreadable because of the small size of Internet video windows (see 'Encoding' below for why Internet video is generally smaller than full-screen TV quality). Hence it makes sense to place a much larger, Web-friendly caption over the video when creating a Web-edit version.
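The shrinking effect is simple proportion: everything in the picture scales by the same factor as the frame height. The sketch below assumes a hypothetical 24-pixel-high caption on a 576-line TV picture, shown in the 192 x 144 Internet window mentioned later in the article, and a hypothetical 12-pixel minimum for readable text; the function names are illustrative.

```python
def scaled_height(source_px, source_frame_height, target_frame_height):
    """Height of an on-screen element after the whole picture is scaled."""
    return source_px * target_frame_height / source_frame_height


def required_source_height(min_target_px, source_frame_height, target_frame_height):
    """How tall an element must be before scaling to end up min_target_px tall."""
    return min_target_px * source_frame_height / target_frame_height


# Hypothetical 24-pixel caption on a 576-line picture, shown in a 144-line window:
print(scaled_height(24, 576, 144))           # 6.0 pixels: far too small to read
print(required_source_height(12, 576, 144))  # 48.0 pixels needed for 12-pixel text
```

Under these assumptions, a Web-edit caption would need to be about twice the height of a typical TV caption to survive the reduction.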

An editor at his Media100 system

There are many similar techniques that may be required, from re-drawing on-screen graphics and diagrams to zooming in on parts of the video screen to prevent items being miniaturised to invisibility after encoding.

Encoding

Encoding is a two-stage process: processing, then the encoding itself.

Processing

This overlaps somewhat with the second editing stage; that is, some things can be done either in the edit or as a separate processing step.

Processing includes colour, contrast and brightness modification, audio level adjustment and EQ (bass and treble, in simple terms) plus other similar tweaks. The aim is to sweeten the A/V such that once it's been through the encoding process it looks and sounds as good as it can.
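As a minimal sketch of what two of these tweaks do to the raw numbers, the functions below apply a simple linear brightness/contrast change to an 8-bit colour value and a decibel gain to 16-bit audio samples. These are common textbook simplifications, not the methods used by any particular editing or encoding tool; EQ is not covered, and the function names are hypothetical.

```python
def adjust_pixel(value, brightness=0, contrast=1.0):
    """Simple linear brightness/contrast tweak for one 8-bit colour value.

    Contrast scales the value around mid-grey (128); brightness is added
    afterwards. The result is clamped to the valid 0-255 range.
    """
    adjusted = (value - 128) * contrast + 128 + brightness
    return max(0, min(255, round(adjusted)))


def apply_gain_db(samples, gain_db):
    """Raise or lower the level of 16-bit audio samples by gain_db decibels.

    Samples that would exceed the 16-bit range are clipped, which is the
    kind of distortion careful level adjustment tries to avoid.
    """
    factor = 10 ** (gain_db / 20)  # amplitude ratio for a gain in dB
    return [max(-32768, min(32767, round(s * factor))) for s in samples]


print(adjust_pixel(200, contrast=1.5))   # 236
print(apply_gain_db([1000, -1000], 20))  # [10000, -10000]
```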

Unfortunately, this step may have to be done more than once, because each type of encoding may require slightly different processing to achieve the best result.

Encoding

Here's where it gets complicated. Audio and video can be encoded into numerous formats; some suit particular applications, while others are simply competing formats in the same field. The table below lists the most common encoding formats and codecs.

NOTE: there is a difference between a format and a codec, although sometimes the two inherently go hand in hand. A format is a particular way of storing the encoded data; a codec ("compressor/decompressor") is a computer program that implements a mathematical method for converting data from one form to another, often reducing the amount of data needed to represent the original information.

Encoded data has to be stored in one format or another; it doesn't stand alone. Sometimes, though, the format and codec are so inextricably linked that you can refer to both by the same term, and neither would ever be associated with anything else. See "Real Video" and "Windows Media" below for examples.

Term | Format or codec? | Typical use
Real Video or Real Audio | Both. The Real software includes multiple codecs, all of which are proprietary to Real, hence they go hand in hand with the Real format. | Internet video and audio
Windows Media | Both, for the same reasons as Real, above. | Internet video and audio
QuickTime | Format: it can store data encoded with a variety of codecs. There is only one codec generally used for QuickTime video streaming, though, and that's Sorenson. | From Internet video up to broadcast-quality video editing, depending on the codec chosen
AVI | Format, but never used for streaming. For that, Microsoft, who invented AVI (it's part of Windows), came up with Windows Media instead. | From CD-ROM video up to broadcast-quality video editing, depending on the codec chosen
M-JPEG | Codec | Broadcast-quality video editing
Sorenson | Codec, universally used with the QuickTime format | Internet video and CD-ROM video
Cinepak | Codec, used for AVI and QuickTime | CD-ROM video
Indeo | Codec, used for AVI and QuickTime | CD-ROM video
On2 VPx (eg VP4) | Codec | Internet video
ZyGo Video | Codec, used in QuickTime | Internet video
MPEG-1 | Both | Some parts of digital TV, also Video CDs and Internet video downloads
MPEG-2 | Both | DVD video and digital television
MPEG-4 | Both | Any device, from mobile phone handsets to broadcast-quality TV. Not in widespread use yet.
mp3 | Both. Note that this is not the same as MPEG-3 (there is no MPEG-3). It actually stands for "MPEG-1 Audio Layer 3". | Audio only; widely used for distribution of music on the Internet
Ogg Vorbis | Both | Audio only. A new (and free!) codec trying to rival mp3 for music distribution.
QDesign Music | Codec | Audio only. Used with QuickTime (and often paired with Sorenson on the video side).
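The format/codec distinction can be pictured as a container that records how the data is stored, holding streams that each name the codec used to compress them. This is a purely illustrative model, not the internal layout of any real file format; the class and field names are hypothetical, and the QuickTime + Sorenson + QDesign Music pairing is taken from the table above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stream:
    kind: str   # "video" or "audio"
    codec: str  # the compressor/decompressor used for this stream's data


@dataclass
class MediaFile:
    format: str                                      # how the encoded data is stored
    streams: List[Stream] = field(default_factory=list)

    def codecs(self) -> List[str]:
        """List the codecs used inside this file, in stream order."""
        return [s.codec for s in self.streams]


# One QuickTime file: a single format containing two different codecs.
clip = MediaFile("QuickTime", [Stream("video", "Sorenson"), Stream("audio", "QDesign Music")])
print(clip.format, clip.codecs())  # QuickTime ['Sorenson', 'QDesign Music']
```

Swapping the video codec for, say, Cinepak would change a Stream entry but not the format, which is the distinction the NOTE above draws.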

The next technical thing to bear in mind is bitrate. The bitrate is the rate at which bits of data (the ones and zeros that represent all things digital) can be sent or received down a particular communications link, or read from a given storage device.

For example, a typical home PC user has a 56k modem. What this means is that the theoretical maximum rate at which that modem can send or receive data is 56,000 bits per second (aka 56kbps). Contrast this with even a lowly single-speed CD-ROM, which provides a data-reading rate of 1,200,000 bits per second (aka 1.2 Mbps).

(Note: in truth, a 56k modem cannot achieve 56kbps except in a lab; in reality, 56k modems usually max out at 40-45kbps when receiving. They also can't transmit at 56kbps: sending is limited to about 33kbps.)

Why is this important? Because for the viewer to be able to watch the content, it has to be encoded at a bitrate no greater than the maximum rate the viewer's storage device or communications link can provide. It's like trying to empty Lake Geneva: if you've only got a 1-inch hosepipe to drain it with, you're going to be waiting a long, long time. If you build 60-metre diameter drainpipes, it'll empty a lot faster.
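The rule can be checked with simple arithmetic: a clip of a given length and bitrate needs length x bitrate / link speed seconds to arrive, and it only plays smoothly as it arrives if that is no longer than the clip itself. The sketch assumes a hypothetical 60-second clip encoded at 300 kbps and uses the realistic 45kbps 56k-modem figure from the note above; the function names are illustrative.

```python
def transfer_seconds(duration_s, encoded_kbps, link_kbps):
    """Time needed to move a clip of a given length and bitrate over a link."""
    return duration_s * encoded_kbps / link_kbps


def plays_in_real_time(encoded_kbps, link_kbps):
    """True if each second of media can arrive within one second."""
    return encoded_kbps <= link_kbps


# Hypothetical 60-second clip encoded at 300 kbps, over a 56k modem
# delivering a realistic 45 kbps:
print(plays_in_real_time(300, 45))    # False
print(transfer_seconds(60, 300, 45))  # 400.0 seconds to deliver 60 seconds of video
```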

Of course, not everyone uses a 56k modem. Some people use ISDN, others have ADSL, and many companies have leased lines. Some content will be stored on CD-ROM, some on DVD. In short, each possibility has to be taken into account and the encoding performed accordingly. Note that although many new PCs now come equipped with high-speed (40x) CD-ROM drives, older PCs won't have them, so you really need to encode for the lowest common denominator.

This table describes the various data rates most commonly encountered:

Connection | Theoretical speed | Realistic speed | Typical user
28k modem | 28 kbps | 25 kbps | Home or home worker
33k modem | 33 kbps | 29 kbps | Home or home worker
56k modem | 56 kbps | 45 kbps | Home or home worker
ISDN | 64 kbps or 128 kbps | 64 kbps or 128 kbps | Enthusiastic home user; more likely home worker or small business
ADSL | From 512 kbps to 2 Mbps | Impossible to say! (see below) | Home user or small business
Leased line | Various types available, from 64 kbps up to 622 Mbps | Exactly what it says on the tin | Corporate
Ethernet LAN | Various, usually 10 Mbps or 100 Mbps | 7 Mbps / 70 Mbps | Within offices
CD-ROM (single speed) | 1.2 Mbps | 1.2 Mbps | Anyone
DVD-ROM (single speed) | 11 Mbps | 11 Mbps | Anyone with a modern PC

(ADSL note: with ADSL, everyone gets UP TO the maximum rate, depending on how many others are trying to use it at the same time - at the line exchange, not in the same office. Hence you might get 512kbps out of a 512kbps line, but if 10 people are all trying to watch material via the same line at the exchange, they'll each get 51.2kbps. This is referred to as 'contention'.)
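The contention arithmetic in the ADSL note is a simple division. The sketch below assumes the line is shared equally between everyone actively using it, which is a simplification: real traffic is bursty, so actual shares will vary. The function name is hypothetical.

```python
def contended_share_kbps(line_kbps, active_users):
    """Per-user rate if a contended line is shared equally between active users."""
    return line_kbps / max(1, active_users)  # treat "nobody else" as one user


print(contended_share_kbps(512, 1))   # 512.0
print(contended_share_kbps(512, 10))  # 51.2
```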

It's also important to note that for the communications lines, the quoted "realistic" speeds are point-to-point figures (ie the speed from one end of the line to the other). The other end of the line connects to the tangled mess that is the Internet, so the data rate that actually manages to flow across the Internet and then down that connection may be much less than you expect. This has to be taken into account when encoding too.

For some of the formats/codecs, bitrate is not an issue because it's standardised. For example, MPEG-1 Video CDs use 1.2Mbps. DVD MPEG-2 ranges from 4 to 9Mbps, but the rate usually depends on the material being encoded, not on the end user.

Bitrate is critical for Internet downloads and streaming. It is common to encode the same file at three or so different bitrates so that it suits a variety of audiences, but it's still important to know who the likely audience is, so that you can make the best judgement about what those bitrates should be.
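Once several versions exist, serving one comes down to picking the highest bitrate that fits the viewer's connection, with some margin for the Internet itself being slower than the local line. The sketch below is a minimal illustration: the three bitrates (20, 34 and 100 kbps) and the 80% headroom are hypothetical choices, not recommendations, while the link speeds are the realistic figures from the table above.

```python
# Hypothetical set of three encodings of the same clip, in kbps.
ENCODED_VERSIONS_KBPS = [20, 34, 100]


def pick_version(realistic_link_kbps, versions=ENCODED_VERSIONS_KBPS, headroom_percent=80):
    """Choose the highest-bitrate version that fits the viewer's connection.

    headroom_percent leaves spare capacity for the Internet being slower
    than the local line; 80% is an assumed figure, not a standard.
    Returns None if even the lowest version will not fit.
    """
    usable_kbps = realistic_link_kbps * headroom_percent // 100
    fitting = [v for v in sorted(versions) if v <= usable_kbps]
    return fitting[-1] if fitting else None


# Realistic speeds from the table: 28k modem, 56k modem, 128k ISDN.
for link in (25, 45, 128):
    print(link, "kbps ->", pick_version(link), "kbps version")
```

Under these assumptions the 28k modem gets the 20 kbps version, the 56k modem the 34 kbps version and 128k ISDN the 100 kbps version.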

The key thing to remember is that the lower the bitrate, the harder it is for the codec to represent the audio or video in the given number of bits each second. If you try to encode a full-screen video at 25 frames per second (normal TV rate) such that it would play down a 28kbps line - well, it won't work. Here's why:

Full frame video is 768 pixels (dots) wide by 576 high. Each pixel takes 24 bits to describe its colour (8 for red, 8 for green, 8 for blue - all other colours are combinations of those three). There are 25 full frames each second. That's 768 x 576 x 24 x 25 = 265,420,800 bits per second.

So to send that amount of information down a line with a capacity of 28,000 bits per second (and in reality, no more than 25,000 bits per second), the video information would have to be squeezed by a factor of roughly 10,617 (265,420,800 / 25,000).

That's a lot of compression!

To achieve this, video codecs take some shortcuts. First, the video is resized to something more manageable (for example, Internet transmission often uses 192 x 144, a sixteenth of full size). Immediately, the compression required drops (in that same example, to about 664 times). Next, the frame rate is dropped, typically by half to 12.5 frames per second. Yes, this makes the video look a bit jerky, but you can't have everything. The compression required for Internet video is then about 332 times.
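The arithmetic in this worked example can be checked in a few lines. The sketch reproduces each step, using only the figures given in the text: the full-frame data rate, the effect of shrinking the picture to 192 x 144, and the effect of halving the frame rate, each divided by the realistic 25,000 bits per second of a 28k modem. The function name is illustrative.

```python
def raw_bits_per_second(width, height, bits_per_pixel, fps):
    """Uncompressed video data rate."""
    return width * height * bits_per_pixel * fps


LINK_BPS = 25_000  # realistic throughput of a 28k modem, as in the text

scenarios = [
    ("768 x 576 at 25 fps", raw_bits_per_second(768, 576, 24, 25)),
    ("192 x 144 at 25 fps", raw_bits_per_second(192, 144, 24, 25)),
    ("192 x 144 at 12.5 fps", raw_bits_per_second(192, 144, 24, 12.5)),
]

for label, bps in scenarios:
    # Compression factor = bits the video produces / bits the line can carry
    print(f"{label}: {bps:,.0f} bit/s -> compress about {bps / LINK_BPS:,.1f} times")
```

This prints factors of about 10,616.8, 663.6 and 331.8 respectively, matching the rounded figures above.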

At this point, the codec starts to do its work by using some very complicated mathematics which tries to represent that data in an even more compact way; codecs usually do this (in simple terms) by comparing frames and only storing the differences between them rather than all the information shown. Lo and behold, you end up with sufficiently little data that you can send it down a modem and reconstruct the video at the other end!
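The "only store the differences" idea can be shown with a toy example: keep the first frame whole, then record just the pixels that change from one frame to the next. This is a deliberately simplified, lossless illustration, not how any real codec works; codecs such as MPEG use far more elaborate techniques, including motion estimation and lossy transforms. The function names are hypothetical.

```python
def encode_differences(frames):
    """Toy inter-frame coding: keep the first frame, then only the changes.

    Each frame is a flat list of pixel values. Every later frame is stored as
    a list of (pixel_index, new_value) pairs where it differs from the frame
    before it.
    """
    key_frame = list(frames[0])
    deltas = []
    for previous, current in zip(frames, frames[1:]):
        changes = [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]
        deltas.append(changes)
    return key_frame, deltas


def decode_differences(key_frame, deltas):
    """Rebuild every frame by applying the stored changes in order."""
    frame = list(key_frame)
    frames = [list(frame)]
    for changes in deltas:
        for i, new in changes:
            frame[i] = new
        frames.append(list(frame))
    return frames


# Three tiny 6-pixel "frames" in which only one pixel changes each time.
video = [[0, 0, 0, 0, 0, 0], [0, 9, 0, 0, 0, 0], [0, 9, 0, 0, 5, 0]]
key, deltas = encode_differences(video)
print(deltas)                                    # [[(1, 9)], [(4, 5)]]
print(decode_differences(key, deltas) == video)  # True
```

When little moves between frames, the change lists stay short; in a fast-moving scene nearly every pixel changes, which is why such scenes are harder to compress.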

The drawback of the maths is that some of the detail is lost, and the video can look 'blocky' and 'blotchy'. Codecs also tend to have trouble with fast-moving scenes, where there is a lot of movement between one frame and the next.

Of course, the higher the bitrate, the lower the amount of squishing that has to be done, so the better the quality. Also, each new generation of codec results in noticeable improvements in quality for a given bitrate, thanks to legions of boffins whose purpose in life is to invent better mathematics for us.

Conclusion

The appetite for digital information is never going to decline, and neither is the number of formats in which digital content will be viewed. With the world itself being an analogue place, there will always be a need for digitisation.

Equally, it is human nature to demand more for less, such as more data in less space. Hence parallel to the demand for digitisation will be a continual demand for encoding. As the technology progresses, so the perceived quality of the encoding, and the efficiency of any associated compression, will increase.

Knowledge of the processes and technology involved will help anyone working in the field of digital information to ensure they are getting the best results, whether from their own efforts or from those of their suppliers.

Note about Internet Broadcasting

To view audio and video over the Internet, encoded files can either be downloaded or streamed [3]. If a file is downloaded (copied onto the hard disc), the viewer has to wait until the whole file has come down, but can then play it as often as they like. If it is streamed, it is watched as it is transmitted, but no local copy is stored.

Streaming provides practically immediate access - you see it as soon as you've clicked on it but you don't usually get to keep a copy. The quality is also restricted by your connection speed (inherently, you can't receive more data than you've got the connection for).

Downloading means you have to wait, but you get to keep the file. And since you're not trying to watch it as it comes down, it can be encoded at a higher rate than your connection speed, giving better quality.

To confuse matters, there's also a half-way house known as "progressive downloading" whereby the file is copied to your hard disc but will start playing back as soon as enough has come down for the rest to have been downloaded by the time you get to the end.
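For a progressive download, the waiting time before playback can start follows from two quantities: how long the whole file takes to download, and how long it takes to play. Assuming a constant encoded bitrate and a steady connection, playback can begin as soon as the remaining download time is no longer than the clip itself. The sketch below uses a hypothetical 3-minute clip encoded at 90 kbps over a 45 kbps link; the function names are illustrative.

```python
def download_seconds(duration_s, encoded_kbps, link_kbps):
    """Time to download the whole file over a steady link."""
    return duration_s * encoded_kbps / link_kbps


def progressive_start_delay(duration_s, encoded_kbps, link_kbps):
    """Seconds to wait before playback so the download never runs out.

    Assumes a constant encoded bitrate and constant link throughput. The
    tightest moment is the very end of the clip, so playback can start once
    the remaining download time is no longer than the clip's duration.
    """
    if encoded_kbps <= link_kbps:
        return 0.0  # data arrives at least as fast as it is played
    return download_seconds(duration_s, encoded_kbps, link_kbps) - duration_s


# Hypothetical 3-minute clip encoded at 90 kbps, viewed over a 45 kbps link.
print(download_seconds(180, 90, 45))         # 360.0: full download takes 6 minutes
print(progressive_start_delay(180, 90, 45))  # 180.0: playback can start after 3 minutes
```

Under these assumptions, a file encoded at twice the connection speed can start playing halfway through its download.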

Culturejam

Culturejam specialises in making Internet video look great. We know that video and audio are the most compelling forms of communication available and that the Web is the most ubiquitous and interactive medium known to man. However, we also see that there is little synergy between them at present, largely due to a lack of expertise in Web-oriented video origination, digitisation, post-production and encoding. Having assembled a unique team bearing considerable skills in those areas, Culturejam is therefore positioned as the premier creator of Web-ready audio and video for all markets such as archiving, marketing and training.

References

  1. Culturejam
    URL: <http://www.Culturejam.tv/>
  2. Culturejam Glossary
    URL: <http://www.Culturejam.tv/glossary.htm>
  3. There were 2 articles on Streaming Video in the last issue of Cultivate Interactive.
    Cunningham, D and Francis, N. (2001) An Introduction to Streaming Video, Cultivate Interactive, issue 4, 7 May 2001.
    URL: <http://www.cultivate-int.org/issue4/video/>
    Strom, J. (2001) Streaming Video: A Look Behind The Scenes, Cultivate Interactive, issue 4, 7 May 2001.
    URL: <http://www.cultivate-int.org/issue4/scenes/>

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Author Details

David Johns
Culturejam limited
One Minster Gardens
West Molesey
Surrey KT8 2ER

Email: djohns@Culturejam.tv
<http://www.Culturejam.tv/>

Phone: +44 (0) 20 8979 7600
Fax: +44 (0) 20 8979 8140

David Johns has worked in radio, television and computing, in both technical and creative arenas. His IT skills were honed at IBM and Logica; his media experience stems from working for broadcasters such as Virgin Radio, the BBC and local commercial radio. A regular user of the Internet since 1987, he witnessed the birth of Web radio and TV and thereafter focused his career on this arena.


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For citation purposes:
Johns, D. "An Introductory Guide to Audio and Video Encoding", Cultivate Interactive, issue 5, 1 October 2001
URL: <http://www.cultivate-int.org/issue5/jam/>

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
