PPC Guide to Compression
Iain Laskey begins a series explaining the ins
and outs of compression, what it is, how to use it and where to find
it.
Whilst modern PCs tend to come equipped with pretty
sizable hard drives, it is still all too easy to run out of room.
Those of us with smaller drives often have an even harder time
juggling files to maximise the free space. A further problem arises
when you send or receive files via the Internet: it can take a long,
expensive call to send a big file to someone. So what can be done to
help? The answer is to compress the files so they take up less room.
There are two basic types of compression: lossy and lossless.
Loser!
Lossy compression shrinks files by throwing away
bits of data that hopefully won’t be noticed. MP3 is one such
system. It relies on the psycho-acoustic way the brain interprets
audio and uses various tricks to produce something which sounds
almost the same but is actually missing as much as 90% of the data.
Another lossy system is JPEG (usually seen as JPG), which is designed
to provide high compression on photographic images.
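
To make ‘throwing away bits’ concrete, here is a minimal Python sketch. It is far cruder than anything MP3 or JPEG does and is not how either works: it simply keeps the top four bits of each 8-bit sample, which halves the storage at the cost of a small error. The function names and sample values are made up for illustration.

```python
def quantise(samples, keep_bits=4):
    """Keep only the top `keep_bits` bits of each 8-bit sample."""
    shift = 8 - keep_bits
    return [s >> shift for s in samples]


def restore(codes, keep_bits=4):
    """Rebuild approximate samples; the discarded bits are gone for good."""
    shift = 8 - keep_bits
    half = (1 << shift) // 2  # aim for the middle of each lost range
    return [(c << shift) + half for c in codes]


samples = [0, 17, 100, 128, 200, 255]   # made-up 8-bit values
codes = quantise(samples)               # each code now fits in 4 bits
approx = restore(codes)                 # close to, but not equal to, samples
```

The restored values are near the originals but not identical, and no amount of cleverness at the other end can get the missing bits back. That is the trade lossy compression makes.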
Not Such a Loser
The other type of compression is lossless, where the
file is made smaller but can then be restored to its original form
with no effect on the data. This seemingly impossible task relies on
the fact that most files contain large amounts of empty space or
repetitive data. As an example, a Word document unsurprisingly
contains words. In this article, the word ‘compression’ appears
over and over again, each one taking 11 bytes of storage. A
compression system could note this and, after the first occurrence,
store a one-byte indicator to say it is a repeat word plus a second
byte to indicate which word it is, rather than the word itself.
The result is that each occurrence of ‘compression’ now needs 2
bytes not 11, a saving of 9 bytes and over 80% for that word. A
single byte can pick out any one of 256 different words, so if you
repeat that process for the 256 most common words, you can make
quite a difference to the size of the file. When you decompress the
file, the decompression program finds these codes for repeated words
and restores the full words in their place, thus restoring the
document to its original size and content.
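
For readers who like to see an idea in action, here is a minimal Python sketch of the word-substitution trick. It is an illustration only, not how any real compression program works. The marker byte, the function names and the decision to hand the word table to the decompressor separately (a real format would have to store it in the file) are all assumptions made to keep the example short. It also swaps every occurrence of a common word, including the first.

```python
import re
from collections import Counter

MARKER = b"\x00"  # assumed never to appear in ordinary text


def build_table(text, limit=256):
    """Pick up to `limit` of the most common words longer than 2 bytes."""
    words = re.findall(rb"[A-Za-z]+", text)
    counts = Counter(w for w in words if len(w) > 2)
    return [w for w, _ in counts.most_common(limit)]


def compress(text, table):
    """Swap each word in the table for MARKER plus a one-byte word number."""
    if MARKER in text:
        raise ValueError("text already contains the marker byte")
    index = {w: i for i, w in enumerate(table)}

    def swap(match):
        word = match.group(0)
        if word in index:
            return MARKER + bytes([index[word]])
        return word

    return re.sub(rb"[A-Za-z]+", swap, text)


def decompress(data, table):
    """Put the full words back wherever a marker is found."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == MARKER[0]:
            out += table[data[i + 1]]  # the byte after the marker is the word number
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


sample = b"compression is fun; compression saves space; compression works"
table = build_table(sample)
packed = compress(sample, table)
```

Running `decompress(packed, table)` gives back `sample` exactly, which is what makes the scheme lossless.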
Another example is pictures of charts and graphs.
Large portions of the chart will be the same colour, perhaps whole
lines. Take an entire row of perhaps 800 white pixels, each needing
two bytes to store the colour (allowing a maximum of 65,536 possible
colours). Stored as it stands, that would take 2 x 800 or 1600
bytes. Instead, you could store two bytes for the colour, a code
byte that means ‘repeat this many times’ and another two to store
the 800. That ends up as 5 bytes to store what used to be 1600, a
huge saving. I won’t go into too much detail about why two bytes can
hold numbers up to 65535 as this is only of interest to programmers
and may confuse the issue slightly.
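
The row-of-pixels example can be sketched in a few lines of Python. This is a toy, not any real graphics format: the `REPEAT` code value and the function names are invented for the illustration. Counting the pieces, the record is 2 bytes of colour, 1 code byte and 2 bytes of count, which makes 5 bytes in all.

```python
import struct

REPEAT = 0xFF    # hypothetical code byte meaning "repeat this many times"
WHITE = 0xFFFF   # a two-byte colour value standing in for white


def encode_run(colour, count):
    """Pack one run as: 2-byte colour, 1 code byte, 2-byte count."""
    return struct.pack(">HBH", colour, REPEAT, count)


def decode_run(record):
    """Expand a run record back into the full list of pixel colours."""
    colour, code, count = struct.unpack(">HBH", record)
    if code != REPEAT:
        raise ValueError("not a repeat record")
    return [colour] * count


row = [WHITE] * 800
raw = struct.pack(">800H", *row)        # every pixel stored in full
packed = encode_run(WHITE, len(row))    # the same row as a single run

print(len(raw), "bytes raw,", len(packed), "bytes packed")
```

A row with several colours would need one record per run, so the saving depends entirely on how long the runs are.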
[Figure: Graphs contain lots of repetitive colour data]
How Can I Use Compression?
One way is to use programs that are designed to
compress and decompress files. Once compressed, files cannot
generally be used until they are decompressed again, so
compression is best suited to archiving or emailing. I tend to
compress files before burning them to a CD-R to maximise the use of
space there. In the next article I will look at ZIP files, a common
standard for compressing files. In many cases, programs and files
you download from the Internet will be in ZIP format, which means
you have to ‘UnZIP’ them before you can use them.
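
If you are comfortable with a little programming, the zip-then-unzip round trip can be tried out with Python's standard `zipfile` module. This is only a sketch: the file name `notes.txt` and its contents are made up, and the archive is built in memory rather than on disk.

```python
import io
import zipfile

original = b"compression " * 500   # made-up, highly repetitive content

# Write a ZIP archive holding one file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes.txt", original)

zipped_size = len(buffer.getvalue())

# "UnZIP" it again and check nothing was lost.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    restored = archive.read("notes.txt")

print(len(original), "bytes before,", zipped_size, "bytes zipped")
```

The restored bytes match the original exactly, while the archive itself is a small fraction of the size because the content is so repetitive.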
Compression is also used in many cases without you
knowing. Your modem uses a form of compression when it sends and
receives data. You may have noticed that even if you are connected
at 33K bits per second, which ought to limit download speeds to
around 3.5K bytes a second, you often see double that speed when
downloading text and other highly compressible files.
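
Why text in particular? A quick experiment shows how much the kind of data matters. The sketch below uses Python's `zlib` module purely as a stand-in (it is not what a modem runs) and compares repetitive text with random bytes of the same length. The sample sentence is made up.

```python
import os
import zlib

text = b"compression makes files smaller. " * 200   # repetitive text
noise = os.urandom(len(text))                       # random bytes, no patterns

# Ratio of original size to compressed size: bigger means more saving.
text_ratio = len(text) / len(zlib.compress(text))
noise_ratio = len(noise) / len(zlib.compress(noise))

print("text compresses about", round(text_ratio), "to 1")
print("random data compresses about", round(noise_ratio, 2), "to 1")
```

The text shrinks dramatically while the random data barely changes, which is why files that are already compressed see little or no extra benefit on the way down the line.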
Another place it happens transparently is with
graphics files. Get a JPG file and open it using a graphics program
such as Paint Shop Pro. Now save it as a TIF file. As an example,
I loaded a 60K JPG file and, when saved as a TIF, it grew to
802K in size purely because the JPG is stored in a compressed format
whereas the TIF file is essentially uncompressed. Different graphics
formats have different effects on the size and appearance of a file.
As a general rule, JPG is good for photos, GIF is good for graphs
and charts and TIF will produce the best results but at a huge
premium in size.