



Introduction to MP3

Foreword

This document is an introduction to the mp3 format and an explanation of the technology behind lossy compression in mp3 files.

Basic Concept

The goal of audio compression is to reduce the space needed to store audio on digital media. On a CD, one second of audio needs (44100 samples per second * 2 channels * 16 bits per sample) about 176 kB of space, which is roughly 1411 kbit/s. Sending an uncompressed song over the internet (say, with a 56k modem) would take fairly long (and for the impatient, almost forever).
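
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the 4-minute song length and the ideal 56 kbit/s link speed are assumptions chosen for illustration, and all names are made up for this example.

```python
# Uncompressed CD audio data rate, using the figures quoted above.
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
BITS_PER_SAMPLE = 16

bits_per_second = SAMPLE_RATE_HZ * CHANNELS * BITS_PER_SAMPLE
bytes_per_second = bits_per_second // 8

# Assumptions for illustration only: a 4-minute song and an ideal 56 kbit/s modem link.
SONG_SECONDS = 4 * 60
MODEM_BITS_PER_SECOND = 56_000

song_bytes = bytes_per_second * SONG_SECONDS
transfer_seconds = song_bytes * 8 / MODEM_BITS_PER_SECOND

print(bits_per_second)                    # 1411200
print(bytes_per_second)                   # 176400
print(round(transfer_seconds / 3600, 2))  # 1.68 (hours)
```

Under these assumptions, a single uncompressed song would need well over an hour to transfer.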

There are several methods to compress audio data. Lossless compression methods compact the data in such a way that it can be restored bit-identical to the original. That is what e.g. the zip format does (even though it wasn't designed for audio). There are also lossy compression methods. These take advantage of signals in the audio that cannot be heard by humans and can thus safely be discarded. When the audio is reconstructed, the inaudible signals are left out, but because we cannot hear them, their absence is not noticeable.

Unfortunately, lossy encoding methods can leave slight distortions in the audio signal which can sometimes be audible, mostly to people with "golden ears". The loss of audible signals can be kept within limits, though.

Technical Info

Technically, audio information in an mp3 file is stored in frames. An mp3 frame is a set of bytes that describes audio information. Frames can have different bitrates, which means they can have different lengths. However, every mp3 frame decodes to 1152 audio samples (for MPEG-1 Layer III). At a sample rate of 44100 Hz, a frame holds 26.1 milliseconds of audio data. When using variable bitrate, the encoder can decide how many bits to spend on each part of the audio, giving more to the more complex parts.
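
The frame duration follows directly from the fixed sample count per frame. A minimal sketch, assuming MPEG-1 Layer III as in the text (the function name is made up for this example):

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III, as stated above


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Playing time of one frame in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME / sample_rate_hz


print(round(frame_duration_ms(44_100), 2))  # 26.12
print(frame_duration_ms(48_000))            # 24.0
print(44_100 / SAMPLES_PER_FRAME)           # 38.28125 frames per second
```

Note that the duration depends only on the sample rate, not on the bitrate: a higher bitrate makes each frame larger in bytes, not longer in time.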

The audio data to compress is transformed into the frequency domain with a method called MDCT (modified discrete cosine transform). From there, the encoder decides (using a specific psychoacoustic model) how the transformed spectral values are stored within a frame. The psychoacoustic model determines which values are needed for hearing and which ones can be dropped because they would be inaudible to the human ear. By optimizing the psychoacoustic model used, quality can be increased even when using the same technology.

There are different layers in the MPEG audio standard that describe slightly different frame formats and sample rates. Layer I is the simplest layer, whereas Layer III is the most complex. Because of this, Layer III is the most widely used layer and is called mp3 for short.

History

The mp3 standard, also known as MPEG-1 Layer III, was first developed by the Fraunhofer Institute for Integrated Circuits (FhG IIS-A) in 1987. It was primarily developed for digital audio broadcasting (DAB). Two years later, FhG patented the technologies behind the format. Later it was submitted to the ISO (International Organization for Standardization) and added to the MPEG-1 video standard as an audio compression scheme.

A nice explanation of the mp3 format can also be found in the HowStuffWorks article "How MP3 Files Work".

........................................................................

MPEG Audio Compression

The following document explains the MPEG Audio Compression format.

The Basics

This is one of many methods of compressing digital audio that try to consume as little space as possible while keeping audio quality as good as possible. MPEG compression has proved to be one of the best achievements in this area.

This is a lossy compression, which means you will certainly lose some audio information when you use this compression method. However, this loss can hardly be noticed because the compression method tries to control it. By using several quite complicated and demanding mathematical algorithms, it loses only those parts of the sound that are hard to hear even in the original form. This leaves more space for the information that is important. This way you can compress audio up to 12 times (you may choose the compression ratio), which is really significant. Due to its quality, MPEG audio became very popular.
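
To put the 12:1 figure in perspective, the sketch below relates compression ratio and bitrate. It assumes uncompressed CD audio (44100 Hz, 2 channels, 16 bits per sample) as the reference; the function names are made up for this example.

```python
CD_BITS_PER_SECOND = 44_100 * 2 * 16  # uncompressed CD audio: 1411200 bit/s


def compression_ratio(bitrate_bps: int) -> float:
    """How many times smaller the stream is than CD audio at this bitrate."""
    return CD_BITS_PER_SECOND / bitrate_bps


def bitrate_for_ratio(ratio: float) -> float:
    """Bitrate in bit/s needed to reach the given compression ratio."""
    return CD_BITS_PER_SECOND / ratio


print(bitrate_for_ratio(12))       # 117600.0
print(compression_ratio(128_000))  # 11.025
```

So a ratio of about 12:1 corresponds to a bitrate a little below 128 kbps when the source is CD audio.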

The MPEG standards MPEG-1, MPEG-2 and MPEG-4 are known, but this document covers only the first two. There is also an unofficial MPEG-2.5, which is rarely used. It is covered as well.

MPEG-1 Audio (described in ISO/IEC 11172-3) describes three Layers of audio coding with the following properties:

  • one or two audio channels
  • sample rate 32kHz, 44.1kHz or 48kHz
  • bit rates from 32kbps up to 448kbps

MPEG-2 Audio (described in ISO/IEC 13818-3) has two extensions to MPEG-1, usually referred to as MPEG-2/LSF and MPEG-2/Multichannel.

MPEG-2/LSF has the following properties:

  • one or two audio channels
  • sample rates half those of MPEG-1
  • bit rates from 8 kbps up to 256kbps

MPEG-2/Multichannel has the following properties:

  • up to 5 full range audio channels and an LFE channel (Low Frequency Enhancement, i.e. the subwoofer channel)
  • sample rates the same as those of MPEG-1
  • highest possible bitrate goes up to about 1 Mbps for 5.1

MPEG Audio Frame Header

An MPEG audio file is built up from smaller parts called frames. Generally, frames are independent items. Each frame has its own header and audio information. There is no file header. Therefore, you can cut out any part of an MPEG file and play it correctly (this should be done on frame boundaries, but most applications will handle incorrect headers). For Layer III, this is not 100% correct. Due to the internal data organization in MPEG version 1 Layer III files, frames often depend on each other and cannot be cut off just like that.

When you want to read info about an MPEG file, it is usually enough to find the first frame, read its header and assume that the other frames are the same. This is not always the case. Variable bitrate MPEG files may use so-called bitrate switching, which means that the bitrate changes according to the content of each frame. This way lower bitrates may be used in frames where they will not reduce sound quality. This allows better compression while keeping high sound quality.

The frame header consists of the very first four bytes (32 bits) of a frame. The first eleven bits (or first twelve bits, see below about frame sync) of a frame header are always set and are called "frame sync". Therefore, you can search through the file for the first occurrence of frame sync (meaning that you have to find a byte with a value of 255, followed by a byte with its three (or four) most significant bits set). Then you read the whole header and check whether the values are correct. The following table shows the exact meaning of each bit in the header and which values may be checked for validity. Each value that is specified as reserved, invalid, bad, or not allowed should indicate an invalid header. Remember, this is not enough: frame sync can be easily (and very frequently) found in any binary file. It is also likely that an MPEG file contains garbage at its beginning, which may also contain a false sync. Thus, you have to check two or more frames in a row to be sure you are really dealing with an MPEG audio file.
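
The search described above can be sketched as follows. To stay short, this example makes a strong assumption: it accepts only MPEG-1 Layer III headers and treats free format as invalid. It uses the bitrate and sampling rate values from the header table and the frame size formula given later in this document. The function names are hypothetical and not part of any library.

```python
# Bitrates (kbps) for MPEG-1 Layer III, indexed by the 4-bit bitrate index.
# Index 0 (free format) and 15 (not allowed) are treated as invalid in this sketch.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]  # MPEG-1; index 3 is reserved


def frame_length(header: bytes):
    """Frame length in bytes for an MPEG-1 Layer III header, or None if invalid."""
    if len(header) < 4:
        return None
    b0, b1, b2, _ = header[:4]
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return None                   # no frame sync (eleven set bits)
    if (b1 >> 3) & 0x03 != 0b11:      # version bits: MPEG-1 only in this sketch
        return None
    if (b1 >> 1) & 0x03 != 0b01:      # layer bits: Layer III only in this sketch
        return None
    bitrate = BITRATES_KBPS[b2 >> 4]
    sample_rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        return None
    padding = (b2 >> 1) & 0x01
    return 144 * bitrate * 1000 // sample_rate + padding


def find_first_frame(data: bytes, frames_to_check: int = 3):
    """Offset of the first position where several valid headers follow in a row."""
    for start in range(len(data) - 3):
        pos = start
        for _ in range(frames_to_check):
            length = frame_length(data[pos:pos + 4])
            if length is None:
                break                 # false sync or broken chain, try next offset
            pos += length
        else:
            return start              # every checked header was valid
    return None


# Synthetic test data: a false sync in leading garbage, then three real frames.
frame = bytes([0xFF, 0xFB, 0x90, 0x00]) + bytes(413)  # 128 kbps, 44100 Hz, 417 bytes
data = bytes([0xFF, 0xFB, 0x90, 0x00]) + bytes(10) + frame * 3
print(find_first_frame(data))  # 14
```

The header at offset 0 looks valid on its own, but no valid header follows at the position it points to, so it is rejected. This is why checking several frames in a row matters.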

Frames may have a CRC check. The CRC is 16 bits long and, if it exists, it follows the frame header. After the CRC comes the audio data. You may calculate the length of the frame and use it if you need to read other headers too or just want to calculate the CRC of the frame, to compare it with the one you read from the file. This is actually a very good method to check the MPEG header validity.

Here is a "graphical" presentation of the header content. Characters from A to M are used to indicate the different fields. In the table, you can see details about the content of each field.

AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM

Sign  Length (bits)  Position (bits)  Description

A 11 (31-21) Frame sync (all bits set)
B 2 (20,19) MPEG Audio version
00 - MPEG Version 2.5
01 - reserved
10 - MPEG Version 2 (ISO/IEC 13818-3)
11 - MPEG Version 1 (ISO/IEC 11172-3)

Note: MPEG Version 2.5 is not an official standard. Bit No 20 in the frame header is used to indicate version 2.5. Applications that do not support this MPEG version expect this bit always to be set, meaning that frame sync (A) is twelve bits long, not eleven as stated here. Accordingly, B is one bit long (it represents only bit No 19). I recommend using the methodology presented here, since it allows you to distinguish all three versions and keep full compatibility.

C 2 (18,17) Layer description
00 - reserved
01 - Layer III
10 - Layer II
11 - Layer I
D 1 (16) Protection bit
0 - Protected by CRC (16bit CRC follows header)
1 - Not protected
E 4 (15-12) Bitrate index

bits  MPEG-1                        MPEG-2 and 2.5
      Layer I  Layer II  Layer III  Layer I  Layer II and III
0000  free format bitrate (all columns)
0001  32       32        32         32       8
0010  64       48        40         48       16
0011  96       56        48         56       24
0100  128      64        56         64       32
0101  160      80        64         80       40
0110  192      96        80         96       48
0111  224      112       96         112      56
1000  256      128       112        128      64
1001  288      160       128        144      80
1010  320      192       160        160      96
1011  352      224       192        176      112
1100  384      256       224        192      128
1101  416      320       256        224      144
1110  448      384       320        256      160
1111  not an allowed value (all columns)

NOTES: All values are in kbps.
Free format bitrate: if the correct fixed bitrate (such files cannot use variable bitrate) is different from those presented in the table above, it must be determined by the application. This may be implemented only for internal purposes, since third party applications have no simple means to find out the correct bitrate. However, this is not impossible to do, but it demands a lot of effort.

MPEG files may have variable bitrate (VBR). This means that the bitrate in the file may change. I know of two methods in use:

  • bitrate switching. Each frame may be created with a different bitrate. It may be used in all layers. Layer III decoders must support this method. Layer I & II decoders may support it.
  • bit reservoir. Bitrate may be borrowed (within limits) from previous frames in order to provide more bits to demanding parts of the input signal. As a consequence, however, the frames are no longer independent, which means you should not cut these files. This is supported only in Layer III.

For Layer II there are some combinations of bitrate and mode which are not allowed. Here is a list of allowed combinations:

Bitrate (kbps)                    Allowed modes
free, 64, 96, 112, 128, 160, 192  all
32, 48, 56, 80                    single channel
224, 256, 320, 384                stereo, intensity stereo, dual channel
F 2 (11,10) Sampling rate frequency index (values are in Hz)

bits  MPEG-1  MPEG-2  MPEG-2.5
00    44100   22050   11025
01    48000   24000   12000
10    32000   16000   8000
11    reserved (all versions)
G 1 (9) Padding bit
0 - frame is not padded
1 - frame is padded with one extra slot

Padding is used to fit the bit rate exactly. For example, 128 kbps 44.1 kHz Layer II uses many frames that are 418 bytes long and some that are 417 bytes long to get the exact 128 kbps bitrate. For Layer I a slot is 32 bits long; for Layer II and Layer III a slot is 8 bits long.
H 1 (8) Private bit. It may be freely used for the specific needs of an application, e.g. if it has to trigger some application specific events.
I 2 (7,6) Channel Mode
00 - Stereo
01 - Joint stereo (Stereo)
10 - Dual channel (two independent mono channels)
11 - Single channel (Mono)
J 2 (5,4) Mode extension (only used in joint stereo)

Mode extension is used to join information that is of no use for the stereo effect, thus reducing the resources needed. These bits are dynamically determined by an encoder in joint stereo mode.

The complete frequency range of an MPEG file is divided into 32 subbands. For Layer I & II these two bits determine the frequency range (bands) where intensity stereo is applied. For Layer III these two bits determine which type of joint stereo is used (intensity stereo or mid/side stereo). The frequency range is determined within the decompression algorithm.

bits  Layer I and II          Layer III
      Intensity stereo bands  Intensity stereo  MS stereo
00    bands 4 to 31           off               off
01    bands 8 to 31           on                off
10    bands 12 to 31          off               on
11    bands 16 to 31          on                on
K 1 (3) Copyright
0 - Audio is not copyrighted
1 - Audio is copyrighted
L 1 (2) Original
0 - Copy of original media
1 - Original media
M 2 (1,0) Emphasis
00 - none
01 - 50/15 µs
10 - reserved
11 - CCITT J.17

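The bit layout AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM can be unpacked with shifts and masks. The following is a minimal sketch that returns the raw field values and applies the validity rule stated above (reserved or not allowed values indicate an invalid header). The function names and dictionary keys are hypothetical and chosen for this example only.

```python
def parse_header(header: bytes) -> dict:
    """Split the 4 header bytes into the raw fields A to M of the table above."""
    if len(header) < 4:
        raise ValueError("an MPEG audio frame header is 4 bytes long")
    word = int.from_bytes(header[:4], "big")  # bit 31 is the first bit of the file
    return {
        "frame_sync": (word >> 21) & 0x7FF,      # A, bits 31-21
        "version": (word >> 19) & 0x3,           # B, bits 20-19
        "layer": (word >> 17) & 0x3,             # C, bits 18-17
        "protection": (word >> 16) & 0x1,        # D, bit 16
        "bitrate_index": (word >> 12) & 0xF,     # E, bits 15-12
        "sample_rate_index": (word >> 10) & 0x3, # F, bits 11-10
        "padding": (word >> 9) & 0x1,            # G, bit 9
        "private": (word >> 8) & 0x1,            # H, bit 8
        "channel_mode": (word >> 6) & 0x3,       # I, bits 7-6
        "mode_extension": (word >> 4) & 0x3,     # J, bits 5-4
        "copyright": (word >> 3) & 0x1,          # K, bit 3
        "original": (word >> 2) & 0x1,           # L, bit 2
        "emphasis": word & 0x3,                  # M, bits 1-0
    }


def looks_valid(fields: dict) -> bool:
    """Reject headers that use a reserved or not allowed value."""
    return (
        fields["frame_sync"] == 0x7FF
        and fields["version"] != 0b01            # reserved version
        and fields["layer"] != 0b00              # reserved layer
        and fields["bitrate_index"] != 0b1111    # not an allowed value
        and fields["sample_rate_index"] != 0b11  # reserved
        and fields["emphasis"] != 0b10           # reserved
    )


fields = parse_header(bytes([0xFF, 0xFB, 0x90, 0x64]))
print(fields["version"], fields["layer"], fields["bitrate_index"])  # 3 1 9
print(fields["channel_mode"], fields["mode_extension"])             # 1 2
print(looks_valid(fields))                                          # True
```

In this example header, version 3 (binary 11) is MPEG Version 1, layer 1 (binary 01) is Layer III, and bitrate index 9 (binary 1001) is 128 kbps in the MPEG-1 Layer III column.
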
How to calculate frame size

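Read the BitRate, SampleRate and Padding (as a value of one or zero) from the frame header and use the formula:

FrameSize = 144 * BitRate / SampleRate + Padding

The result of the division is truncated to a whole number of bytes before the padding is added. The constant 144 is 1152 samples per frame divided by 8 bits per byte, so the formula applies to frames that hold 1152 samples, such as MPEG-1 Layer II and Layer III.

Example: BitRate = 128000, SampleRate = 44100, Padding = 0  ==> FrameSize = 417 bytes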
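
A short sketch of the formula, assuming integer (truncating) division. It also shows why the padding bit exists: the exact frame length at 128 kbps and 44.1 kHz is not a whole number of bytes, so most frames need one extra byte to keep the average bitrate exact. The function name is made up for this example.

```python
def frame_size(bitrate_bps: int, sample_rate_hz: int, padding: int) -> int:
    """Frame size in bytes: 144 * BitRate / SampleRate + Padding, truncated."""
    return 144 * bitrate_bps // sample_rate_hz + padding


exact = 144 * 128_000 / 44_100  # about 417.96 bytes per frame
padded_share = exact - frame_size(128_000, 44_100, 0)

print(frame_size(128_000, 44_100, 0))  # 417
print(frame_size(128_000, 44_100, 1))  # 418
print(round(padded_share, 3))          # 0.959
```

The last value means that roughly 96% of the frames must be padded to 418 bytes, which matches the padding example in the header table (many frames of 418 bytes and some of 417).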

MPEG Audio Tag ID3v1

The TAG is used to describe the MPEG Audio file. It contains information about artist, title, album, publishing year and genre. There is some extra space for comments. It is exactly 128 bytes long and is located at the very end of the audio data. You can get it by reading the last 128 bytes of the MPEG audio file.
AAABBBBB BBBBBBBB BBBBBBBB BBBBBBBB
BCCCCCCC CCCCCCCC CCCCCCCC CCCCCCCD
DDDDDDDD DDDDDDDD DDDDDDDD DDDDDEEE
EFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFG

Sign  Length (bytes)  Position (bytes)  Description

A 3 (0-2) Tag identification. Must contain 'TAG' if the tag exists and is correct.
B 30 (3-32) Title
C 30 (33-62) Artist
D 30 (63-92) Album
E 4 (93-96) Year
F 30 (97-126) Comment
G 1 (127) Genre

The specification asks for all fields to be padded with null character (ASCII 0). However, not all applications respect this (an example is Winamp which pads fields with <space>, ASCII 32).

A small change was proposed in the ID3v1.1 structure. The last byte of the Comment field may be used to specify the track number of a song in an album. It should contain a null character (ASCII 0) if the information is unknown.
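
Reading the tag can be sketched as below. Several points are assumptions rather than part of the description above: text is decoded as Latin-1 (the tag format does not name an encoding), both null and space padding are stripped, and a track number is only recognized when the byte before it is zero, which is the usual ID3v1.1 convention. The function name is hypothetical.

```python
def parse_id3v1(tag: bytes):
    """Return the tag fields as a dict, or None if no valid tag is present."""
    if len(tag) != 128 or tag[0:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Cut at the first null, then drop space padding used by some applications.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip(" ")

    comment = tag[97:127]
    track = None
    # Assumed ID3v1.1 convention: a zero byte followed by a non-zero track number.
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(comment),
        "track": track,
        "genre": tag[127],  # numeric code, see the genre table
    }


# Synthetic 128-byte tag: the artist is space padded, the rest is null padded.
example = (b"TAG" + b"Song".ljust(30, b"\x00") + b"Band".ljust(30, b" ")
           + b"Album".ljust(30, b"\x00") + b"1999"
           + b"hello".ljust(28, b"\x00") + b"\x00" + bytes([7]) + bytes([17]))
info = parse_id3v1(example)
print(info["title"], info["artist"], info["track"], info["genre"])  # Song Band 7 17
```

With a real file, the 128 bytes would come from the end of the file, for example by opening it in binary mode, calling `f.seek(-128, 2)` and then `f.read(128)`.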

Genre is a numeric field which may have one of the following values:

0 'Blues' 20 'Alternative' 40 'AlternRock' 60 'Top 40'
1 'Classic Rock' 21 'Ska' 41 'Bass' 61 'Christian Rap'
2 'Country' 22 'Death Metal' 42 'Soul' 62 'Pop/Funk'
3 'Dance' 23 'Pranks' 43 'Punk' 63 'Jungle'
4 'Disco' 24 'Soundtrack' 44 'Space' 64 'Native American'
5 'Funk' 25 'Euro-Techno' 45 'Meditative' 65 'Cabaret'
6 'Grunge' 26 'Ambient' 46 'Instrumental Pop' 66 'New Wave'
7 'Hip-Hop' 27 'Trip-Hop' 47 'Instrumental Rock' 67 'Psychadelic'
8 'Jazz' 28 'Vocal' 48 'Ethnic' 68 'Rave'
9 'Metal' 29 'Jazz+Funk' 49 'Gothic' 69 'Showtunes'
10 'New Age' 30 'Fusion' 50 'Darkwave' 70 'Trailer'
11 'Oldies' 31 'Trance' 51 'Techno-Industrial' 71 'Lo-Fi'
12 'Other' 32 'Classical' 52 'Electronic' 72 'Tribal'
13 'Pop' 33 'Instrumental' 53 'Pop-Folk' 73 'Acid Punk'
14 'R&B' 34 'Acid' 54 'Eurodance' 74 'Acid Jazz'
15 'Rap' 35 'House' 55 'Dream' 75 'Polka'
16 'Reggae' 36 'Game' 56 'Southern Rock' 76 'Retro'
17 'Rock' 37 'Sound Clip' 57 'Comedy' 77 'Musical'
18 'Techno' 38 'Gospel' 58 'Cult' 78 'Rock & Roll'
19 'Industrial' 39 'Noise' 59 'Gangsta' 79 'Hard Rock'

Winamp expanded this table with these next codes:

80 'Folk' 92 'Progressive Rock' 104 'Chamber Music' 116 'Ballad'
81 'Folk-Rock' 93 'Psychedelic Rock' 105 'Sonata' 117 'Power Ballad'
82 'National Folk' 94 'Symphonic Rock' 106 'Symphony' 118 'Rhytmic Soul'
83 'Swing' 95 'Slow Rock' 107 'Booty Brass' 119 'Freestyle'
84 'Fast Fusion' 96 'Big Band' 108 'Primus' 120 'Duet'
85 'Bebob' 97 'Chorus' 109 'Porn Groove' 121 'Punk Rock'
86 'Latin' 98 'Easy Listening' 110 'Satire' 122 'Drum Solo'
87 'Revival' 99 'Acoustic' 111 'Slow Jam' 123 'A Capella'
88 'Celtic' 100 'Humour' 112 'Club' 124 'Euro-House'
89 'Bluegrass' 101 'Speech' 113 'Tango' 125 'Dance Hall'
90 'Avantgarde' 102 'Chanson' 114 'Samba'  
91 'Gothic Rock' 103 'Opera' 115 'Folklore'

Any other value should be considered as 'Unknown'

Credits

Created by Predrag Supurovic.
Thanks to Jean for debugging and polishing of this document, 
Peter Luijer, Guwani, Rob Leslie and Franc Zijderveld for valuable comments and corrections.

© 1998, 1999 Copyright by DataVoyage

This document may be changed. 

Check http://www.dv.co.yu/mpgscript/mpeghdr.htm for updates.




© 2008 LizardLoungeMusic.com