This WWW page presents the powerful wavelet-based image compression method called Set Partitioning in Hierarchical Trees (SPIHT). This award-winning method has received worldwide acclaim and attention since its introduction via this web site in 1995. Thousands of people, researchers and practitioners alike, have now tested and used SPIHT. It has become the benchmark state-of-the-art algorithm for image compression.
The SPIHT method is not a simple extension of traditional methods for image compression, and represents an important advance in the field. The method deserves special attention because it provides the following:
Each of these properties is discussed below. Note that different compression methods were developed specifically to achieve at least one of those objectives. What makes SPIHT really outstanding is that it yields all those qualities simultaneously. So, if in the future you find one method that claims to be superior to SPIHT in one evaluation parameter (like PSNR), remember to see who wins in the remaining criteria.
Extensive research has shown that the images obtained with wavelet-based methods yield very good visual quality. At first it was shown that even simple coding methods produced good results when combined with wavelets. SPIHT belongs to the next generation of wavelet encoders, employing more sophisticated coding. In fact, SPIHT exploits the properties of the wavelet-transformed images to increase its efficiency.
Many researchers now believe that encoders that use wavelets are superior to those that use DCT or fractals. We will not discuss the matter of taste in the evaluation of low quality images, but we do want to say that SPIHT wins in the test of finding the minimum rate required to obtain a reproduction indistinguishable from the original. The SPIHT advantage is even more pronounced in encoding color images, because the bits are allocated automatically for local optimality among the color components, unlike other algorithms that encode the color components separately based on global statistics of the individual components. You will be amazed to see that visually lossless color compression is obtained with some images at compression ratios from 100-200:1.
If, after what we said, you are still not certain that you should believe us (because in the past you heard claims like that and then were deeply disappointed), we understand your point of view.
There are three things you can do to convince yourself:
While using the programs please recall that:
In some systems with progressive image transmission (like WWW browsers) the quality of the displayed images follows the sequence: (a) weird abstract art; (b) you begin to believe that it is an image of something; (c) CGA-like quality; (d) lossless recovery. With very fast links the transition from (a) to (d) can be so fast that you will never notice. With slow links (how "slow" depends on the image size, colors, etc.) the time from one stage to the next grows exponentially, and it may take hours to download a large image. Considering that it may be possible to recover an excellent-quality image using 10-20 times fewer bits, it is easy to see the inefficiency. Furthermore, those systems are not efficient even for lossless transmission.
The problem is that such widely used schemes employ a very primitive progressive image transmission method. At the other extreme, SPIHT is a state-of-the-art method that was designed for optimal progressive transmission (and still beats most non-progressive methods!). It does so by producing a fully embedded coded file (see below), so that at any moment the quality of the displayed image is the best available for the number of bits received up to that moment.
So, SPIHT can be very useful for applications where the user can quickly inspect the image and decide if it really should be downloaded, is good enough to be saved, or needs refinement.
A strict definition of the embedded coding scheme is: if two files produced by the encoder have size M and N bits, with M > N, then the file with size N is identical to the first N bits of the file with size M.
Let's see how this abstract definition is used in practice. Suppose you need to compress an image for three remote users. Each one has different requirements for image reproduction quality, and you find that those qualities can be obtained with the image compressed to at least 8 Kb, 30 Kb, and 80 Kb, respectively. If you use a non-embedded encoder (like JPEG), to save in transmission costs (or time) you must prepare one file for each user. On the other hand, if you use an embedded encoder (like SPIHT) then you can compress the image to a single 80 Kb file, and then send the first 8 Kb of the file to the first user, the first 30 Kb to the second user, and the whole file to the third user.
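As a toy illustration of this scenario (the file name below is hypothetical, and the Kb figures from the example are treated as kilobytes just for concreteness), the following Python sketch shows how a single embedded file serves all three users simply by sending prefixes of it:

    # Serve three users with different rate budgets from ONE embedded file.
    # "image.spiht" and the byte budgets are hypothetical; with a non-embedded
    # codec, each budget would require a separately encoded file.
    budgets = {"user_a": 8 * 1024, "user_b": 30 * 1024, "user_c": 80 * 1024}

    with open("image.spiht", "rb") as f:
        full_stream = f.read()          # encoded once, at the highest rate

    for user, n_bytes in budgets.items():
        # The embedding property: a prefix of the stream is itself a valid,
        # lower-rate compressed version of the same image.
        prefix = full_stream[:n_bytes]
        with open(f"{user}.spiht", "wb") as out:
            out.write(prefix)

No re-encoding is needed; truncation alone produces the lower-rate files.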
But what is the price to pay for this "amenity"? Surprisingly, with SPIHT all three users would get (for the same file size) an image quality comparable or superior to the most sophisticated non-embedded encoders available today. SPIHT achieves this feat by optimizing the embedded coding process and always coding the most important information first.
An even more important application is for progressive image transmission, where the user can decide at which point the image quality satisfies his needs, or abort the transmission after a quick inspection, etc.
A preprint of the original journal article describing SPIHT is available via download. Here we take the opportunity to comment on how it differs from other approaches.
SPIHT represents a small "revolution" in image compression because it broke the trend to more complex (in both the theoretical and the computational senses) compression schemes. While researchers had been trying to improve previous schemes for image coding using very sophisticated vector quantization, SPIHT achieved superior results using the simplest method: uniform scalar quantization. Thus, it is much easier to design fast SPIHT codecs.
Actually, we do expect better compression results from vector quantizers in the future (someday, somewhere, as prophesied by Shannon), but it is unknown if their speed will justify the gains.
The SPIHT process represents a very effective form of entropy coding. This is shown by the demo programs, which use two forms of coding: binary uncoded (extremely simple) and context-based adaptive arithmetic coding (sophisticated). Surprisingly, the difference in compression is small, showing that it is not necessary to use slow methods (and also pay royalties for them!). A fast version using Huffman codes was also successfully tested, but it is not publicly available.
A straightforward consequence of the compression simplicity is the greater coding/decoding speed. The SPIHT algorithm is nearly symmetric, i.e., the time to encode is nearly equal to the time to decode. (Complex compression algorithms tend to have encoding times much larger than the decoding times.)
Some of our demo programs use floating-point operations extensively, and can be slower on some CPUs (floating point is better when people want to test the programs with strange 16 bpp images). However, this problem can be easily solved: try the lossless version to see an example. Similarly, the use for progressive transmission requires a somewhat more complex and slower algorithm. Some shortcuts can be used if progressive transmission is not necessary.
When measuring speed please remember that the demo programs were written for academic studies only, and were not fully optimized.
SPIHT exploits properties that are present in a wide variety of images. It has been successfully tested on natural (portraits, landscapes, weddings, etc.) and medical (X-ray, CT, etc.) images. Furthermore, its embedded coding process proved to be effective in a broad range of reconstruction qualities. For instance, it can code fair-quality portraits and high-quality medical images equally well (as compared with other methods under the same conditions).
SPIHT has also been tested for some less usual purposes, like the compression of elevation maps, scientific data, and others. (If you have found a new application, please tell us.)
SPIHT codes the individual bits of the image wavelet transform coefficients following a bit-plane sequence. Thus, it is capable of recovering the image perfectly (every single bit of it) by coding all bits of the transform. However, the wavelet transform yields perfect reconstruction only if its numbers are stored as infinite-precision numbers. In practice it is frequently possible to recover the image perfectly using rounding after recovery, but this is not the most efficient approach.
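To give a feel for bit-plane coding, here is a stripped-down sketch (illustration only: it omits SPIHT's set partitioning, significance trees, and sign coding) of how integer coefficients can be emitted one magnitude bit plane at a time:

    import numpy as np

    def bitplane_passes(coeffs):
        """Emit magnitude bits of integer coefficients, most significant
        bit plane first. Real SPIHT also sorts coefficients with set
        partitioning so significant ones are located cheaply, and sends
        a sign bit when a coefficient first becomes significant."""
        mags = np.abs(coeffs).astype(np.int64)
        top = int(np.floor(np.log2(mags.max()))) if mags.max() > 0 else 0
        for plane in range(top, -1, -1):         # from MSB plane down to 0
            bits = ((mags & (1 << plane)) != 0).astype(int)
            yield plane, bits                    # decoder refines estimates

    # Stopping after any plane gives coefficients quantized to that
    # precision; keeping every plane recovers the integers exactly.
    coeffs = np.array([37, -5, 12, 0, -22])
    for plane, bits in bitplane_passes(coeffs):
        print(plane, bits.tolist())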
For lossless compression we proposed an integer multiresolution transformation, similar to the wavelet transform, which we called S+P transform (see papers). It solves the finite-precision problem by carefully truncating the transform coefficients during the transformation (instead of after).
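For intuition, here is a sketch of the integer averaging/differencing ("S") step on which such a transform is built; the actual S+P transform adds a prediction step to the high-pass values (see the papers for details):

    def s_forward(a, b):
        """Integer average/difference of a pair of samples.
        Exactly invertible in integer arithmetic, so no rounding
        error accumulates across the multiresolution levels."""
        low = (a + b) >> 1        # floor((a + b) / 2)
        high = a - b
        return low, high

    def s_inverse(low, high):
        a = low + ((high + 1) >> 1)
        b = a - high
        return a, b

    # Round-trip check: every pair is recovered bit for bit.
    for a, b in [(3, 2), (-7, 10), (255, 0)]:
        assert s_inverse(*s_forward(a, b)) == (a, b)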
A codec that uses this transformation to yield efficient progressive transmission up to lossless recovery is among the SPIHT demo programs. A surprising result obtained with this codec is that for lossless compression it is as efficient as the most effective lossless encoders (lossless JPEG is definitely not among them). In other words, the property that SPIHT yields progressive transmission with practically no penalty in compression efficiency applies to lossless compression too.
Almost all image compression methods developed so far do not have precise rate control. For some methods you specify a target rate, and the program tries to give something that is not too far from what you wanted. For others you specify a "quality factor" and wait to see if the size of the file fits your needs. (If not, just keep trying...)
The embedded coding property of SPIHT allows exact bit rate control, without any penalty in performance (no bits wasted with padding, or whatever). The same property also allows exact mean squared-error (MSE) distortion control. Even though the MSE is not the best measure of image quality, it is far superior to other criteria used for quality specification.
Errors in the compressed file cause havoc for practically all important image compression methods. This is not exactly related to variable-length entropy coding, but to the necessity of using context generation for efficient compression. For instance, Huffman codes have the ability to quickly recover after an error. However, if they are used to code run-lengths, then that property is useless because all runs after an error would be shifted.
SPIHT is not an exception to this rule. One difference, however, is that due to SPIHT's embedded coding property, it is much easier to design efficient error-resilient schemes.
This happens because with embedded coding the information is sorted according to its importance, and the requirement for powerful error correction codes decreases from the beginning to the end of the compressed file. If an error is detected, but not corrected, the decoder can discard the data after that point and still display the image obtained with the bits received before the error. Also, with bit-plane coding the effects of an error are confined to the bit planes below those already coded.
Another reason is that SPIHT generates two types of data. The first is sorting information, which needs error protection as explained above. The second consists of uncompressed sign and refinement bits, which do not need special protection because they affect only one pixel.
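A sketch of this idea follows; the segment sizes and code rates are purely illustrative (they are not taken from any published scheme). The embedded stream is cut into segments, and the earlier, more important segments receive the stronger (lower-rate) channel code:

    # Hypothetical unequal error protection of an embedded stream.
    # Earlier segments carry the sorting information, so they get the
    # strongest protection; late refinement bits get the weakest.
    segments = [
        ("head",   10_000, 1 / 2),   # strongest protection, first bits
        ("middle", 30_000, 2 / 3),
        ("tail",   40_000, 5 / 6),   # weakest protection, refinement bits
    ]

    def protection_budget(stream: bytes):
        offset = 0
        for name, size, code_rate in segments:
            payload = stream[offset:offset + size]
            offset += size
            channel_bytes = int(len(payload) / code_rate)  # size after FEC
            yield name, payload, channel_bytes
        # If an uncorrected error is detected in some segment, the decoder
        # drops that segment and everything after it, and still displays the
        # image reconstructed from the error-free prefix.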
While SPIHT can yield gains like 3 dB PSNR over methods like JPEG, its use in noisy channels, combined with error protection as explained above, leads to much larger gains, like 6-12 dB. (Such high coding gains are frequently viewed with skepticism, but they do make sense for combined source-channel coding schemes.)
SPIHT uses wavelets designed for natural images. It was not developed for artificially generated graphical images that have very wide areas of the same color (like this), charts, some cartoons, etc.
Note that we are not referring to computer-generated images that are meant to look natural, with lots of shading. For compression purposes they are just like natural images.
Even though there are methods that try to compress both graphic and natural images efficiently, the best results for graphics have been obtained with methods like the Lempel-Ziv algorithm (used in the zip programs; see also the FAQ). Actually, graphics can be compressed much more effectively using the rules that generated them. For instance, this page was very efficiently coded using HTML. Guess how many bytes it would need to be sent as a color fax...
So, because there is still no "universal compression" scheme, future documents will make more extensive use of what WWW browsers already do: one decoder for text, another for sound, another for natural images (how about SPIHT?), another for video, etc.