By David Johns - October 2001
In a follow-up to last issue's Streaming Video articles, David Johns of Culturejam limited [1], a company that specialises in optimising video and audio for the Web, introduces the art of encoding.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
The world is becoming a digital environment. The records we create are all stored as millions of "ones" and "zeros". Radio, TV, home entertainment, even the telephone - they're all digital now.
Yet there is confusion in the marketplace about what "being digital" actually means. Many seem to believe that once something is converted to digital it can be conveyed via any digital transmission or delivery medium with no further work.
Whilst this is true in theory, in practice it is far from being so simple; the different distribution and viewing media - Internet, digital TV, DVD, mobile devices and so on - all require content to be individually optimised for their specific characteristics. So even newly-created digital material usually requires adjustment for its intended purpose.
Meanwhile, what about the vast array of legacy material that was both created and stored in older formats? Long-playing vinyl records? Analogue video and audio tapes? Quite simply, they're destined for a slow decay into oblivion, and their precious value with them.
There is, however, a means to prevent this: conversion into a more stable form, a form able to withstand future copying without any loss of quality. Unsurprisingly, that means digital.
News footage, corporate communication libraries, video archives, TV and radio commercials, sound effects, stock footage, showreels - all can have their value preserved for the future via digitising. To be done properly, this process requires substantial investment in professional equipment and skills.
This article describes those processes and hopefully explains why good quality digitising and encoding is not just "something which anyone can do" (a popular myth, thanks to the widespread availability of low cost one-button encoding software). For further explanation of all terms used in this article see the Culturejam glossary [2].
[Image: The Culturejam Web site]
First of all, the audio-visual (A/V) content has to be brought into a computer system. This is called digitising. Then it's edited - partly according to how you want it to look, partly to optimise it for the encoding process - and finally it's encoded. After this last step, the encoded media can be stored on CD-R or DVD, or even copied onto a server which will transmit the material across the Internet.
Once the A/V source material has been received it needs to be copied onto the computer system so that it can be edited and/or encoded. In the dim, dark days of analogue tape, this required a conversion from analogue to digital and hence the process became known as "digitising". Today, even though many modern tape formats already store the audio and video digitally, the process of bringing it into a computer is still commonly referred to as digitising.
Digitising generally happens in "real time", which is to say it takes as long as the material lasts. One hour of material takes one hour to digitise, ten hours takes ten hours, and so on.
However, you also have to allow a bit of time for each item that requires digitising, because the machine operator has to open the tape, put it in the machine, give the machine a name to store the file under, find and mark the start and end points, start the computer digitising and, at the end, pull the tape out and re-file it in the right place. Even if the items are all on the same tape, there's a certain amount of stop-starting so as to give each clip its own filename on the computer system.
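As a rough planning aid, the real-time rule plus per-item handling can be turned into simple arithmetic. This is a minimal sketch: the 5-minute handling overhead per clip (`HANDLING_MINUTES_PER_CLIP`) and the helper name are hypothetical, not figures from this article.

```python
# Hypothetical per-clip handling overhead (loading the tape, naming the file,
# marking start/end points, re-filing). Not a figure from the article.
HANDLING_MINUTES_PER_CLIP = 5

def digitising_minutes(clip_durations_min, handling_min=HANDLING_MINUTES_PER_CLIP):
    """Estimate total digitising time in minutes.

    Capture runs in real time, so each clip costs its own duration,
    plus a fixed handling overhead for every clip.
    """
    real_time = sum(clip_durations_min)
    overhead = handling_min * len(clip_durations_min)
    return real_time + overhead

# One hour of material delivered as twelve 5-minute clips:
clips = [5] * 12
print(digitising_minutes(clips))  # 60 min capture + 12 x 5 min handling = 120
```

Splitting the same hour into more clips adds handling time, which is why the clip count matters as much as the running time when quoting for digitising work.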
Some digitising systems bring the material in 'uncompressed', that is, unaltered from its original form. For video, this requires substantial and expensive hardware due to the sheer volume of data involved (see later). Other systems apply a mild form of data compression and encoding straight away, in order to make the amount of data more manageable.
Operationally, compressing at this stage is effectively transparent to the user, but it does have a bearing on the final encoding quality, so the less compression you can get away with when digitising, the better.
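To give a feel for the data volumes involved, the sketch below estimates the storage needed for one hour of video, using the full-frame figures this article works with (768 x 576 pixels, 24 bits per pixel, 25 frames per second). The 4:1 "mild" compression ratio is a hypothetical example, not a property of any particular digitising system.

```python
WIDTH, HEIGHT = 768, 576   # full-frame picture size used in the article
BITS_PER_PIXEL = 24        # 8 bits each for red, green and blue
FRAMES_PER_SECOND = 25

def storage_gb_per_hour(compression_ratio=1.0):
    """Decimal gigabytes needed to store one hour of video at a given ratio."""
    bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * FRAMES_PER_SECOND
    bytes_per_hour = bits_per_second * 3600 / 8
    return bytes_per_hour / compression_ratio / 1e9

print(f"uncompressed: {storage_gb_per_hour():.1f} GB/hour")     # 119.4
# 4:1 is a hypothetical 'mild' capture ratio, not a figure from the article
print(f"4:1 compressed: {storage_gb_per_hour(4):.1f} GB/hour")  # 29.9
```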
Editing for multi-media playback comprises two elements. First there's traditional editing - cutting the source material into something well-produced that tells the story you wish to tell.
For traditional broadcast, that would be the end of the story but for multi-media applications there's an extra step - modifying the material so that after it's encoded, it is presented in the most appropriate manner possible.
Captions are a good example. A typical TV item may well have name captions strapped across the bottom of the screen to identify the speaker. Yet if that same picture is encoded for Internet transmission, the caption will probably be unreadable due to the small size of Internet video windows (the discussion of bitrates below explains why Internet video is generally smaller than full-screen TV quality). Hence it makes sense to place a much larger, Web-friendly caption over the video when creating a Web-edit version.
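The caption problem is simple scaling arithmetic. A minimal sketch, assuming the 768 x 576 full frame and the 192 x 144 Internet window size used elsewhere in this article, and a hypothetical 24-pixel-high name caption:

```python
FULL_HEIGHT = 576   # full-frame TV picture height, in pixels
WEB_HEIGHT = 144    # example Internet window height from the article

def scaled_caption_height(caption_px, full_h=FULL_HEIGHT, web_h=WEB_HEIGHT):
    """Height in pixels a caption ends up at once the whole frame is shrunk."""
    return caption_px * web_h / full_h

# A hypothetical 24-pixel-high caption on the TV picture:
print(scaled_caption_height(24))  # 6.0 pixels, likely too small to read
```

Because the frame shrinks by a factor of four in each direction, any on-screen text shrinks by the same factor, which is why a Web edit uses much larger captions.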
[Image: An editor at his Media100 system]
There are many similar techniques that may be required, from re-drawing on-screen graphics and diagrams to zooming in on parts of the video screen to prevent items being miniaturised to invisibility after encoding.
Encoding is a two-stage process, comprising processing and then the encoding itself.
Processing overlaps somewhat with the second editing stage; that is, some adjustments can be done either in the edit or as a separate processing step.
Processing includes colour, contrast and brightness modification, audio level adjustment and EQ (bass and treble, in simple terms) plus other similar tweaks. The aim is to sweeten the A/V such that once it's been through the encoding process it looks and sounds as good as it can.
Unfortunately, this step may have to be done more than once, because each type of encoding may require slightly different processing in order to achieve the best result.
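As an illustration of the kind of tweak processing involves, here is a generic linear brightness/contrast adjustment on 8-bit pixel values and a decibel gain on 16-bit audio samples. This is a simplified sketch; real processing tools use their own (often more sophisticated) curves, and the function names and parameter values here are illustrative only.

```python
def adjust_pixel(value, contrast=1.0, brightness=0):
    """Apply a simple linear contrast/brightness tweak to one 8-bit channel.

    Contrast scales the value around mid-grey (128); brightness shifts it.
    The result is clamped to the valid 0-255 range.
    """
    adjusted = (value - 128) * contrast + 128 + brightness
    return max(0, min(255, round(adjusted)))

def apply_gain(samples, gain_db):
    """Scale 16-bit audio samples by a gain in decibels, clamping to avoid wrap-around."""
    factor = 10 ** (gain_db / 20)
    return [max(-32768, min(32767, round(s * factor))) for s in samples]

print([adjust_pixel(v, contrast=1.2, brightness=10) for v in (0, 64, 128, 200, 255)])
# [0, 61, 138, 224, 255]
print(apply_gain([1000, -1000, 30000], gain_db=6))
# [1995, -1995, 32767]  (the loudest sample is clipped)
```

The clamping shows why such tweaks need care: pushing contrast or gain too far crushes detail at the extremes, and different codecs may favour slightly different settings.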
Then comes the encoding itself, and here's where it gets complicated. There are numerous formats into which audio and video files can be encoded, with different ones suited to different applications or simply competing in the same field. The table below lists the most common encoding formats and codecs.
NOTE: there is a difference between a format and a codec - but sometimes they go hand in hand inherently. A format refers to a particular way of storing the encoded data; a codec ("compressor / de-compressor") refers to a computer program which implements a mathematical formula for converting the data from one type to another, often with a reduction in the amount of data needed to represent the original information.
Encoded data has to be stored in one format or another; it doesn't stand alone. Sometimes, though, the format and codec are so inextricably linked that you can refer to both by the same term, and they would never be associated with anything else. See "Real Video" and "Windows Media", below, for examples.
| Term | Format or codec? | Typical use |
| --- | --- | --- |
| Real Video or Real Audio | Both. The Real software includes multiple codecs, all of which are proprietary to Real, hence they go hand-in-hand with the Real format. | Internet video and audio |
| Windows Media | Both, for the same reasons as Real, above. | Internet video and audio |
| QuickTime | Format - it can store data encoded with a variety of codecs. There is only one codec generally used for QuickTime video streaming though, and that's Sorenson. | From Internet video up to broadcast-quality video editing, depending on the codec chosen. |
| AVI | Format - but never used for streaming. For that, Microsoft, who invented AVI (it's part of Windows), came up with Windows Media instead. | From CD-ROM video up to broadcast-quality video editing, depending on the codec chosen. |
| M-JPEG | Codec | Broadcast-quality video editing. |
| Sorenson | Codec - universally used with the QuickTime format. | Internet video and CD-ROM video |
| Cinepak | Codec, used for AVI and QuickTime | CD-ROM video |
| Indeo | Codec, used for AVI and QuickTime | CD-ROM video |
| On2 VPx (eg VP4) | Codec | Internet video |
| ZyGo Video | Codec, used in QuickTime | Internet video |
| MPEG-1 | Both | Some parts of digital TV, also video CDs and Internet video downloads |
| MPEG-2 | Both | DVD video & digital television |
| MPEG-4 | Both | Any device, from mobile phone handsets to broadcast-quality TV. Not in widespread use yet. |
| mp3 | Both. Note that this is not the same as MPEG-3 (there is no MPEG 3). It actually stands for "MPEG-1 Audio Layer 3" | Audio only, widely used for distribution of music on the Internet. |
| Ogg Vorbis | Both. | Audio only. A new (and free!) codec trying to rival mp3 for music distribution. |
| QDesign Music | Codec. | Audio only. Used with QuickTime (and often paired with Sorenson on the video side). |
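The format/codec distinction can be illustrated with a toy example: a "codec" transforms the data, while a "format" wraps the result with enough information (here, the codec's name) for a player to decode it. Everything in this sketch is hypothetical: the container layout and codec names are invented, and the run-length codec is lossless, unlike the lossy video codecs in the table.

```python
import json

# --- Codecs: transform the data (toy, lossless schemes) ---
def rle_encode(data: bytes) -> bytes:
    """Run-length encode: each run becomes (count, value), count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    for count, value in zip(data[0::2], data[1::2]):
        out += bytes([value]) * count
    return bytes(out)

CODECS = {
    "raw": (lambda d: d, lambda d: d),
    "rle": (rle_encode, rle_decode),
}

# --- Format: a container storing a header that names the codec, then the payload ---
def write_container(data: bytes, codec: str) -> bytes:
    encode, _ = CODECS[codec]
    payload = encode(data)
    header = json.dumps({"codec": codec, "length": len(payload)}).encode()
    return len(header).to_bytes(4, "big") + header + payload

def read_container(blob: bytes) -> bytes:
    header_len = int.from_bytes(blob[:4], "big")
    header = json.loads(blob[4:4 + header_len])
    _, decode = CODECS[header["codec"]]
    return decode(blob[4 + header_len:])

frame = bytes([0] * 50 + [255] * 14)   # a tiny, very uniform "image row"
for codec in ("raw", "rle"):
    blob = write_container(frame, codec)
    assert read_container(blob) == frame
    print(codec, len(blob), "bytes")
```

The same container "format" carries either codec; the player reads the header to know which decoder to use, much as QuickTime or AVI files can hold data from different codecs.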
The next technical point to bear in mind is bitrate. The bitrate is the rate at which bits of data (the ones and zeros that represent all things digital) can be sent or received down a particular communications link, or read from a given storage device.
For example, a typical home PC user has a 56k modem. What this means is that the theoretical maximum rate at which that modem can send or receive data is 56,000 bits per second (aka 56kbps). Contrast this with even a lowly single-speed CD-ROM, which provides a data-reading rate of 1,200,000 bits per second (aka 1.2 Mbps).
(Note: in truth, a 56k modem cannot achieve 56kbps except in a lab. In reality, 56k modems usually max out at 40-45kbps when receiving. They also can't transmit at 56kbps: uploads are limited to roughly 33kbps.)
Why is this important? Because in order for the viewer to be able to watch the content, it has to be encoded at no greater than the fastest rate the viewer's storage device or communications link can provide. It's like trying to empty Lake Geneva; if you've only got a 1-inch hosepipe to drain it with, you're going to be waiting a long, long time. If you build 60-metre diameter drainpipes, it'll empty a lot faster.
Of course, not everyone uses 56k modems. Some people use ISDN, others have ADSL, many companies have leased lines. Some content will be stored on CD-ROM, some on DVD. In short, each possibility has to be taken into account and encoding performed accordingly. Note that although many new PCs now come equipped with high speed (x40) CD-ROM drives, older PCs won't have them so you really need to encode for the lowest common denominator.
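Encoding for the lowest common denominator amounts to taking the minimum of your audience's realistic speeds. A minimal sketch using the realistic speeds from the table that follows; the optional `headroom` factor, allowing for Internet congestion, is an assumption rather than a figure from the article.

```python
# Realistic speeds (kbps) taken from the connection table in this article.
REALISTIC_KBPS = {
    "28k modem": 25,
    "33k modem": 29,
    "56k modem": 45,
    "ISDN (1 channel)": 64,
    "ISDN (2 channels)": 128,
}

def lowest_common_bitrate(audience, headroom=1.0):
    """Highest total bitrate (kbps) every listed connection can sustain.

    headroom is a hypothetical safety factor (e.g. 0.8 = use 80% of the
    slowest link); the article does not specify one.
    """
    slowest = min(REALISTIC_KBPS[c] for c in audience)
    return slowest * headroom

print(lowest_common_bitrate(["56k modem", "ISDN (2 channels)"]))             # 45.0
print(lowest_common_bitrate(["28k modem", "56k modem"], headroom=0.8))       # 20.0
```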
This table describes the various data rates most commonly encountered:
| Connection | Theoretical speed | Realistic speed | Typical user |
| --- | --- | --- | --- |
| 28k modem | 28 kbps | 25 kbps | Home or home worker |
| 33k modem | 33 kbps | 29 kbps | Home or home worker |
| 56k modem | 56 kbps | 45 kbps | Home or home worker |
| ISDN | 64 kbps or 128 kbps | 64 kbps or 128 kbps | Enthusiastic home user; more likely home worker or small business |
| ADSL | From 512 kbps to 2 Mbps | Impossible to say! (see below) | Home user or small business |
| Leased line | Various types available from 64 kbps up to 622 Mbps | Exactly what it says on the tin. | Corporate. |
| Ethernet LAN | Various, usually 10 Mbps or 100 Mbps | 7 Mbps / 70 Mbps | Within offices |
| CD-ROM (single speed) | 1.2 Mbps | 1.2 Mbps | Anyone |
| DVD-ROM (single speed) | 11 Mbps | 11 Mbps | Anyone with a modern PC |
(ADSL note: with ADSL, everyone gets UP TO the maximum rate, depending on how many others are trying to use it at the same time; the sharing happens at the line exchange, not within your office. Hence you might get 512kbps out of a 512kbps line, but if 10 people are all trying to watch material via the same line at the exchange, they'll each get 51.2kbps. This is referred to as 'contention'.)
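The contention arithmetic is a simple division. This sketch assumes the capacity is split evenly among simultaneous users, which is a simplification of how an exchange actually shares bandwidth; the helper name is hypothetical.

```python
def per_user_kbps(line_kbps, active_users):
    """Share a contended line's capacity equally among simultaneous users.

    Simplification: assumes an even split between everyone active at once.
    """
    return line_kbps / max(1, active_users)

print(per_user_kbps(512, 1))   # 512.0: nobody else online
print(per_user_kbps(512, 10))  # 51.2: ten viewers sharing the same line
```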
It's also important to note that, for the communications lines, the quoted "realistic" speeds are "point-to-point" figures (ie the speed from one end of the line to the other). The other end of the line connects to the tangled mess that is the Internet, so the actual data rate that manages to flow across the Internet and then down that connection may be much less than you expect. This has to be taken into account when encoding too.
For some of the formats/codecs, bitrate is not an issue because it's standardised. For example, MPEG-1 VideoCDs use 1.2Mbps. DVD MPEG-2 ranges from 4-9Mbps, but the rate usually depends on the material being encoded, not on the end-user.
Bitrate is critical for Internet downloads and streaming. It is common to encode the same file at three or so different bitrates so that it suits a variety of audiences, but it's still important to know the likely audience in order to judge what those bitrates should be.
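When the same clip is offered at several bitrates, each viewer should get the highest one their connection can carry. A minimal sketch; the three bitrates in `ENCODED_KBPS` are hypothetical examples, as the article does not prescribe specific values.

```python
# Hypothetical encodings of the same clip, in kbps (roughly aimed at
# 28k modem, 56k modem and dual-channel ISDN audiences).
ENCODED_KBPS = [20, 34, 100]

def pick_stream(connection_kbps, encodings=ENCODED_KBPS):
    """Choose the highest-bitrate encoding the connection can carry, or None."""
    playable = [rate for rate in encodings if rate <= connection_kbps]
    return max(playable) if playable else None

for speed in (25, 45, 128, 10):
    print(speed, "kbps ->", pick_stream(speed))
# 25 -> 20, 45 -> 34, 128 -> 100, 10 -> None (too slow for any version)
```

If the chosen bitrates don't match the real audience, many viewers fall between encodings and get a lower quality than their connection could handle, which is why knowing the audience matters.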
The key thing to remember is that the lower the bitrate, the harder it is for the codec to represent the audio or video in the given number of bits each second. If you try to encode a full-screen video at 25 frames per second (normal TV rate) such that it would play down a 28kbps line - well, it won't work. Here's why:
Full frame video is 768 pixels (dots) wide by 576 high. Each pixel takes 24 bits to describe its colour (8 for red, 8 for green, 8 for blue - all other colours are combinations of those three). There are 25 full frames each second. That's 768 x 576 x 24 x 25 = 265,420,800 bits per second.
So to send that amount of information down a line with a capacity of 28,000 bits per second (and in reality, no more than 25,000 bits per second), the video information will have to be squeezed by a factor of 265,420,800 / 25,000 ≈ 10,617.
That's a lot of compression!
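The arithmetic above can be checked in a few lines, using the article's own figures (768 x 576 pixels, 24 bits per pixel, 25 frames per second, and 25,000 bits per second of realistic modem throughput):

```python
width, height = 768, 576   # full-frame video, in pixels
bits_per_pixel = 24        # 8 bits each for red, green and blue
fps = 25                   # normal TV frame rate used in the article
line_bps = 25_000          # realistic throughput of a 28k modem

raw_bps = width * height * bits_per_pixel * fps
factor = raw_bps / line_bps

print(f"{raw_bps:,} bits per second")        # 265,420,800 bits per second
print(f"compression needed: {factor:,.0f}x")  # compression needed: 10,617x
```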
To achieve this, video codecs take some shortcuts. First, the video is resized to a more manageable size (for example, Internet transmission often uses 192 x 144, a sixteenth of full size). Immediately, the compression required drops (using that same example, to about 664 times). Next, the frame rate is dropped, typically by half to 12.5 frames each second. Yes, this makes the video look a bit jerky, but you can't have everything. The compression required for Internet video is then only about 332 times.
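The same calculation can be repeated for each shortcut, first resizing and then halving the frame rate, to show how the required compression falls. A sketch using the article's example figures; the helper name is illustrative.

```python
def compression_needed(width, height, fps, line_bps=25_000, bits_per_pixel=24):
    """How many times the raw video must be squeezed to fit down the line."""
    return width * height * bits_per_pixel * fps / line_bps

steps = [
    ("full frame, 25 fps", 768, 576, 25),
    ("resized to 192 x 144", 192, 144, 25),
    ("and frame rate halved", 192, 144, 12.5),
]
for label, w, h, fps in steps:
    print(f"{label}: about {compression_needed(w, h, fps):,.0f}x")
# full frame, 25 fps: about 10,617x
# resized to 192 x 144: about 664x
# and frame rate halved: about 332x
```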
At this point, the codec starts to do its work by using some very complicated mathematics which tries to represent that data in an even more compact way; codecs usually do this (in simple terms) by comparing frames and only storing the differences between them rather than all the information shown. Lo and behold, you end up with sufficiently little data that you can send it down a modem and reconstruct the video at the other end!
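The frame-differencing idea can be sketched on two tiny "frames" of greyscale pixel values. This is a deliberately simplified, lossless illustration with hypothetical data; real codecs add motion compensation and lossy mathematics on top, and it is those lossy steps that can introduce visible artefacts.

```python
def frame_diff(previous, current):
    """Return only the pixels that changed, as (index, new_value) pairs."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]

def apply_diff(previous, changes):
    """Rebuild the current frame from the previous frame plus its changes."""
    frame = list(previous)
    for i, value in changes:
        frame[i] = value
    return frame

# Two hypothetical 8-pixel frames: a bright dot moves one pixel to the right.
frame1 = [10, 10, 10, 200, 10, 10, 10, 10]
frame2 = [10, 10, 10, 10, 200, 10, 10, 10]

changes = frame_diff(frame1, frame2)
print(changes)                                # [(3, 10), (4, 200)]
assert apply_diff(frame1, changes) == frame2  # receiver reconstructs frame 2
```

Only two values need sending instead of eight; with a fast-moving scene, far more pixels change between frames, so the saving shrinks, which is why such scenes are harder to encode.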
The drawback of the maths part is that some of the detail is lost and the video can look 'blocky' and 'blotchy'. It tends to have trouble with fast-moving scenes (when there is a lot of difference in movement between the frames).
Of course, the higher the bitrate, the lower the amount of squishing that has to be done, so the better the quality. Also, each new generation of codec results in noticeable improvements in quality for a given bitrate, thanks to legions of boffins whose purpose in life is to invent better mathematics for us.
The appetite for digital information is never going to decline and neither is the number of formats in which digital content will be viewed. With the world being an analogue place, there will inherently be a continual need for digitisation to take place.
Equally, it is human nature to demand more for less, such as more data in less space. Hence parallel to the demand for digitisation will be a continual demand for encoding. As the technology progresses, so the perceived quality of the encoding, and the efficiency of any associated compression, will increase.
A knowledge of the processes and technology involved will help anyone working in the field of digital information to ensure they are receiving the best results, whether from their own efforts or from those of their suppliers.
Note about Internet Broadcasting

To view audio and video over the Internet, encoded files can either be downloaded (copied onto the hard disc) by the viewer, in which case they have to wait until the whole file has come down but can then play it as often as they like, or they can be streamed, which means they are watched as they are transmitted but no local copy is stored [3]. Streaming provides practically immediate access: you see it as soon as you've clicked on it, but you don't usually get to keep a copy. The quality is also restricted by your connection speed (inherently, you can't receive more data than your connection can carry). Downloading means you have to wait, but you get to keep the file and, since you're not trying to watch it as it comes down, it can be encoded at a higher rate than your connection speed, thus giving better quality. To confuse matters, there's also a half-way house known as "progressive downloading", whereby the file is copied to your hard disc but starts playing back as soon as enough has come down for the rest to arrive by the time playback reaches the end.
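For progressive downloading, the wait before playback can start can be estimated from the clip's duration, its encoded bitrate and the connection speed. A minimal sketch assuming a constant link speed and a constant-bitrate file; the clip length and bitrates in the usage lines are hypothetical.

```python
def progressive_start_delay(duration_s, file_kbps, link_kbps):
    """Seconds to wait before playback so the download finishes as playback ends.

    Assumes a constant link speed and a constant-bitrate file, and ignores
    the small initial buffer a real player would also want.
    """
    download_s = duration_s * file_kbps / link_kbps
    return max(0.0, download_s - duration_s)

# A hypothetical 60-second clip over a 45 kbps modem connection:
print(progressive_start_delay(60, 90, 45))  # 60.0: encoded at 90 kbps, wait a minute
print(progressive_start_delay(60, 34, 45))  # 0.0: encoded at 34 kbps, plays at once
```

The higher the encoding rate relative to the connection, the longer the wait, which is the trade-off between quality and immediacy described above.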
Culturejam specialises in making Internet video look great. We know that video and audio are the most compelling forms of communication available and that the Web is the most ubiquitous and interactive medium known to man. However, we also see that there is little synergy between them at present, largely due to a lack of expertise in Web-oriented video origination, digitisation, post-production and encoding. Having assembled a unique team bearing considerable skills in those areas, Culturejam is therefore positioned as the premier creator of Web-ready audio and video for all markets such as archiving, marketing and training.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
David Johns
Culturejam limited
One Minster Gardens
West Molesey
Surrey KT8 2ER
Email: djohns@Culturejam.tv
<http://www.Culturejam.tv/>
Phone: +44 (0) 20 8979 7600
Fax: +44 (0) 20 8979 8140
David Johns has worked in radio, television and computing, in both technical and creative arenas. His IT skills were honed at IBM and Logica; his media experience stems from working for broadcasters such as Virgin Radio, the BBC and local commercial radio. A regular user of the Internet since 1987, he witnessed the birth of Web radio and TV and thereafter focused his career onto this arena.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
For citation purposes:
Johns, D. "An Introductory Guide to Audio and Video Encoding", Cultivate Interactive, issue 5, 1 October 2001
URL: <http://www.cultivate-int.org/issue5/jam/>
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Copyright ©2000 - 2001 Cultivate. | Published by UKOLN | Design by ILRT | Contact Us |