
Data Compression Methods

Data compression is a common solution to two problems: (1) the need to decrease the storage space required for spatial data and (2) the need to minimize transmission times. Extensive exploratory research has been carried out, and many algorithms have been implemented for the compression of image and video data. Several industry standards, such as JPEG 2000 and MPEG-4, have been approved as well. Three factors have to be considered carefully when selecting or implementing a data compression method: the degree of compression, the amount of information lost, and the time required to compress and decompress the data.

Data compression methods can be classified as "lossy" or "lossless." Lossless compression methods (e.g., Huffman coding, arithmetic coding) first estimate the probability of each symbol and then encode the symbols according to a coding algorithm; a matching decoding algorithm reconstructs the original data from the compressed data. The reconstructed data are identical to the original.
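As a minimal illustration of the lossless round trip, the sketch below implements run-length encoding (one of the methods listed in the next paragraph) in Python; the function names are illustrative, and decoding reproduces the input exactly.

```python
from itertools import groupby

def rle_encode(data: str) -> list[tuple[str, int]]:
    """Encode a string as (symbol, run_length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Reconstruct the original string from (symbol, run_length) pairs."""
    return "".join(symbol * count for symbol, count in pairs)

original = "AAAABBBCCD"
encoded = rle_encode(original)          # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
assert rle_decode(encoded) == original  # lossless: reconstruction is identical
```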

Several methods can be applied directly for lossless compression, such as run-length encoding, dictionary coders (e.g., LZW), the Burrows-Wheeler transform, context mixing, and Slepian-Wolf coding. For example, Huffman coding-based algorithms create a Huffman tree according to the frequency or probability of each symbol's appearance and encode high-probability symbols with fewer bits. Arithmetic coding-based algorithms represent each possible sequence of n symbols by a separate subinterval of the number line between 0 and 1. In contrast with lossless compression, lossy compression is possible when some loss of fidelity is acceptable. Lossy compression methods include the discrete cosine transform, fractal compression, wavelet compression, vector quantization, linear predictive coding, Modulo-N coding for correlated data, A-law and μ-law companding, and Wyner-Ziv coding.
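A minimal sketch of Huffman coding in Python, assuming symbol frequencies are counted directly from the message; the helper name huffman_codes is hypothetical. The tree is built bottom-up with a priority queue, so high-probability symbols end up with shorter bit strings.

```python
import heapq
from collections import Counter

def huffman_codes(message: str) -> dict[str, str]:
    """Build a Huffman code: frequent symbols receive shorter bit strings."""
    freq = Counter(message)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # degenerate case: one distinct symbol
    # Heap entries are (frequency, tie_breaker, {symbol: code_suffix});
    # the unique tie_breaker keeps the (unorderable) dicts out of comparisons.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prepend one bit as the two subtrees are merged under a new parent.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")                # 'a' receives the shortest code
encoded = "".join(codes[s] for s in "aaaabbc")  # variable-length bit string
```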

Generally, the design of a lossy compression method is guided by research on how people perceive the data in question. Compression methods are commonly used in conjunction with two representations of spatial data: raster data and vector data. Various sophisticated algorithms have been implemented for the compression of raster data (e.g., remote sensing imagery, digital elevation models), such as wavelet-based compression algorithms, and for the compression of triangulated irregular network (TIN)-based models (e.g., terrains). Nevertheless, algorithms for compressing raster data and TIN-based models cannot be applied directly to vector data because of the intrinsic complexity of the topology within vector data. Because a consistent topology has to be maintained, viable solutions for the compression of vector data must be chosen carefully. The time performance of decompression is also an important factor.

Vector quantization (VQ) is a lossy data compression method based on the principle of block coding; it is a fixed-to-fixed-length algorithm. The main principle of VQ is as follows: Given a vector source with known statistical properties, a distortion measure, and the number of codevectors, find a dictionary (codebook) that yields the smallest average distortion. The procedure of VQ is therefore to create a dictionary with S' symbols from a message with S symbols (S' < S) and to build a mapping between S' and S. Among spatial data, VQ-based algorithms are most suitable for compressing the point features of geographic data layers.
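A minimal sketch of VQ in Python (NumPy assumed), using plain k-means (Lloyd's algorithm) to learn the codebook; train_codebook is an illustrative name, not a standard API. Each point is then stored as the index of its nearest codevector, which is where the loss of fidelity comes from.

```python
import numpy as np

def train_codebook(points: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Learn a codebook of k codevectors by plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(0)
    codebook = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest codevector (squared Euclidean distortion).
        dists = ((points[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each codevector to the centroid of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)
    return codebook

points = np.random.default_rng(1).random((1000, 2))  # e.g., 2-D point coordinates
codebook = train_codebook(points, k=16)              # S' = 16 codevectors
# Lossy compression: store only each point's nearest codebook index.
dists = ((points[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
indices = dists.argmin(axis=1)
reconstructed = codebook[indices]  # an approximation, not identical to the input
```

Increasing k reduces the average distortion but enlarges the codebook and the per-point index, so the choice of k is the compression-versus-fidelity trade-off mentioned above.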

...
