RasterLite2 reference: Benchmarks (2019 update)
Intended scope
In recent years, new and innovative lossless compression algorithms have been developed. This benchmark is intended to check and verify, through practical tests, how these new compression methods actually perform under the most common conditions.
More specifically, it compares the relative performance of newer and older lossless compression methods.
The contenders
The following general-purpose lossless compression methods will be systematically compared:

- DEFLATE (aka Zip)
  This is the most classic and almost universally adopted lossless compression method. It was introduced about 30 years ago (in 1991), so it can be considered the venerable doyen of them all.
- LZMA (aka 7-Zip)
  This is a well-known and widely adopted lossless compression method. It's younger than DEFLATE, having been introduced about 20 years ago (in 1998). LZMA is an extreme implementation of lossless compression: it is usually able to achieve very impressive compression ratios (far better than DEFLATE), but at a severe cost in compression speed. LZMA can easily be deadly slow.
- LZ4
  This is a more modern algorithm, introduced less than 10 years ago (in 2011), so its usage is still rather limited. LZ4 is also an extreme implementation of lossless compression, but in the completely opposite direction to LZMA: it has been heavily optimized to be extremely fast, at the cost of a lower compression ratio.
- ZSTD (aka Zstandard)
  This is a very recently introduced algorithm (2015), and its adoption is still rather limited. Curiously enough, both LZ4 and ZSTD are developed and maintained by the same author (Yann Collet). ZSTD is a well-balanced algorithm intended to become a modern replacement for DEFLATE, being faster and/or able to achieve better compression ratios.
A few technical details about the most relevant innovations introduced by ZSTD (see the sketch after this list):

- The old DEFLATE was designed to require a very limited amount of memory, and this impaired its efficiency in many ways. Modern hardware can easily supply plenty of memory, so ZSTD has borrowed a few ideas from LZMA about using less constrained, more efficient memory allocation. More specifically, DEFLATE is based on a 32KB moving data window; both LZMA and ZSTD adopt a more generous size (1MB) for their moving window.
- Both DEFLATE and ZSTD adopt classic Huffman coding for reducing information entropy, but ZSTD can also support a further advanced mechanism based on Finite State Entropy, a very recent technique which is much faster.
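As a quick, hedged illustration (assuming the third-party `zstandard` Python bindings; the input file name is a placeholder), ZSTD exposes a wide range of compression levels that slide along the speed/ratio spectrum:

```python
# A minimal sketch, not part of the benchmark itself: measure how ZSTD's
# level parameter trades speed for compression ratio.
import time
import zstandard

with open("sample.tar", "rb") as f:   # hypothetical sample file
    data = f.read()

for level in (1, 3, 19):             # fast / default / strong
    t0 = time.perf_counter()
    packed = zstandard.ZstdCompressor(level=level).compress(data)
    elapsed = time.perf_counter() - t0
    print(f"level {level:2d}: ratio={len(data) / len(packed):5.2f} "
          f"time={elapsed:.3f}s")
```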
Whenever possible and appropriate, the following lossless compression methods specifically intended for images / rasters will be tested as well:

- PNG
  This is a very popular format supporting RGB and Grayscale images (with or without Alpha transparency). PNG fully depends on DEFLATE for data compression.
- CharLS
  This is an image format (RGB and Grayscale) with limited diffusion, but rather popular for storing medical imagery. CharLS is based on Lossless JPEG, a genuinely lossless image compression scheme, not to be confused with plain JPEG (which is the most classic example of lossy compression).
- Jpeg2000
  This is intended to be a more advanced replacement for JPEG, but it is not yet as widely supported as its ancestor. Jpeg2000 is an inherently lossy compression, but under special settings it can effectively support a genuine lossless compression mode.
- WebP
  This too is an innovative image format intended to be a better replacement for JPEG. WebP images are expected to offer the same visual quality as JPEG while requiring significantly less storage space. Just like Jpeg2000, WebP is an inherently lossy compression, but under special settings it can effectively support a genuine lossless compression mode.
Note: both LZ4 and ZSTD are used internally by the most recent versions of the Linux Kernel, which can be taken as a sign of high-quality and very stable code. The most recent versions of the very popular libtiff already support LZMA, WEBP and ZSTD compression, and the same support is available in GDAL.
Testing generic datasets
We'll start by testing several generic datasets, so as to stress all compression methods under the most common conditions. The same dataset will be compressed and then decompressed using each method, so as to gather information about:

- the size of the resulting compressed file. The ratio between the uncompressed and compressed sizes is the compression ratio.
- the time required to compress the original dataset.
- the time required to decompress the compressed file so as to recover the initial uncompressed dataset.
Note: compressing is a harder operation than decompressing, and will always require more time. The speed differences between the various compression algorithms are most noticeable during compression, but the differences in decompression speed (although less impressive) also deserve careful evaluation.

- For any compression algorithm, being slow (or even very slow) during compression can easily be considered a trivial and forgivable issue. Compression usually happens only once in the lifetime of a compressed dataset, and there are many ways to minimize the adverse effects of intrinsic slowness. For example, you could compress your files in batch mode, possibly during off-peak hours; the higher compression ratio achieved would justify the longer processing time. You could also enable (if possible) a multithreaded compression approach (parallel processing), so as to significantly reduce the required time.
- A more serious issue is being slow during decompression, because decompression will happen more frequently; very frequently in many scenarios. A certain degree of slowness during decompression could easily become a serious bottleneck, severely limiting the overall performance of your system.

A minimal sketch of this measurement procedure appears below.
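This rough Python illustration is not the exact tooling used for these tests; it assumes the third-party `zstandard` and `lz4` packages (zlib and lzma ship with the standard library), and the input file name is a placeholder:

```python
# Round-trip each algorithm over the same data, recording the compression
# ratio and the wall-clock time in each direction.
import time
import zlib
import lzma

import lz4.frame
import zstandard

def benchmark(name, compress, decompress, data):
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    assert restored == data          # lossless: the round trip must be exact
    print(f"{name:8s} ratio={len(data) / len(packed):6.2f} "
          f"compress={t1 - t0:8.3f}s decompress={t2 - t1:8.3f}s")

with open("sample.tar", "rb") as f:  # hypothetical dataset
    data = f.read()

benchmark("LZ4", lz4.frame.compress, lz4.frame.decompress, data)
benchmark("DEFLATE", lambda d: zlib.compress(d, 6), zlib.decompress, data)
benchmark("ZSTD", zstandard.ZstdCompressor().compress,
          zstandard.ZstdDecompressor().decompress, data)
benchmark("LZMA", lzma.compress, lzma.decompress, data)
```

For the multithreaded approach mentioned above, `zstandard.ZstdCompressor(threads=-1)` spreads compression across all available CPU cores.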
Test #1 - compression of many CSV files
| Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|---|
| 0.97 GB | LZ4 | 289 MB | 3.46 | 6.550 sec | 2.256 sec |
| 0.97 GB | DEFLATE | 155 MB | 6.44 | 33.079 sec | 2.159 sec |
| 0.97 GB | ZSTD | 110 MB | 9.09 | 2.924 sec | 1.313 sec |
| 0.97 GB | LZMA | 47 MB | 21.42 | 1220.329 sec | 10.179 sec |
Quick assessment:
- The sample was a tarball containing a whole GTFS dataset.
- Text files are usually expected to be highly compressible (thanks to the many repetitions of the same words and values), and this test confirms these expectations.
- LZ4 is very fast both when compressing and decompressing, but its compression ratio is rather disappointing.
- DEFLATE is a very effective and well-balanced compromise between speed and effectiveness: it achieves a decent compression ratio and is fast enough during both compression and decompression.
- ZSTD clearly wins this first match hands down; it's impressively fast (in both directions) and achieves a very good compression ratio.
- LZMA achieves a really impressive compression ratio, but it's deadly slow when compressing (more than 10 times slower than DEFLATE). What's really bad is that it's also slow during decompression (about 5 times slower than DEFLATE).
Test #2 - compressing a SQLite database file
| Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|---|
| 1.13 GB | LZ4 | 508 MB | 2.29 | 10.333 sec | 2.123 sec |
| 1.13 GB | DEFLATE | 323 MB | 3.60 | 54.343 sec | 3.173 sec |
| 1.13 GB | ZSTD | 219 MB | 5.31 | 4.331 sec | 1.522 sec |
| 1.13 GB | LZMA | 82 MB | 14.26 | 646.670 sec | 17.930 sec |
Quick assessment:
- The sample used was a SQLite/SpatiaLite database containing the same GTFS dataset used in the previous test.
- Databases are usually expected to be strongly compressible (thanks to the many repetitions of ZERO, SPACE and NULL values), and this test confirms these expectations.
- LZ4 tends to be very fast but not very effective in its compression ratio.
- DEFLATE still brings good results, despite its venerable age.
- ZSTD is once more the winner of this test, being both fast and effective.
- LZMA is unbeatable with its very high compression ratio, but unhappily it suffers a barely tolerable slowness during both compression and decompression.
Test #3 - compressing many Shapefiles
| Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|---|
| 1.19 GB | LZ4 | 0.99 GB | 1.20 | 6.413 sec | 0.893 sec |
| 1.19 GB | DEFLATE | 870 MB | 1.40 | 48.004 sec | 4.553 sec |
| 1.19 GB | ZSTD | 880 MB | 1.39 | 5.416 sec | 1.292 sec |
| 1.19 GB | LZMA | 682 MB | 1.79 | 740.077 sec | 45.624 sec |
Quick assessment:
- The sample was a tarball containing several Shapefiles (Road Network and Administrative Boundaries of Tuscany).
- Shapefiles mainly contain raw binary data, so it is difficult to reach a high compression ratio. This explains why, in this specific test, the compression ratios are not very good.
- LZ4 tends to be very fast but not very effective in its compression ratio.
- DEFLATE still brings good results, despite its venerable age.
- ZSTD is once more the winner of this test, being noticeably faster than DEFLATE. But it's worth noting that in this specific test it's unable to reach a better compression ratio than DEFLATE.
- LZMA is unbeatable with its very high compression ratio, but unhappily it suffers a barely tolerable slowness during both compression and decompression.
Test #4 - compressing a Landsat 8 scene (satellite imagery)
| Uncompressed Size | Algorithm | Compressed Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|---|
| 1.78 GB | LZ4 | 1.07 GB | 1.65 | 5.104 sec | 1.285 sec |
| 1.78 GB | DEFLATE | 928 MB | 1.97 | 56.643 sec | 7.176 sec |
| 1.78 GB | ZSTD | 929 MB | 1.96 | 7.261 sec | 2.329 sec |
| 1.78 GB | LZMA | 798 MB | 2.29 | 957.182 sec | 95.288 sec |
Quick assessment:
- The sample was a tarball containing a Landsat 8 scene.
- Satellite imagery mainly contains raw binary data, so it is difficult to reach a high compression ratio. This explains why, in this specific test, the compression ratios are not very good.
- LZ4 tends to be very fast but not very effective in its compression ratio.
- DEFLATE still brings good results, despite its venerable age.
- ZSTD is once more the winner of this test, being noticeably faster than DEFLATE. But it's worth noting that in this specific test it's unable to reach a better compression ratio than DEFLATE.
- LZMA is unbeatable with its very high compression ratio, but unhappily it suffers a barely tolerable slowness during both compression and decompression.
Final assessment (and lessons learned)
- The efficiency of any lossless compression algorithm strongly depends on the type of data (the internal data distribution) within the sample:
  - samples presenting a very regular and easily predictable internal distribution have a low information entropy, and can be strongly compressed. A typical example: text files, written in any language, based on the same alphabet and thus forming regular patterns.
  - samples presenting an irregular and random internal distribution have a high information entropy, and can be only moderately compressed. A typical example: any kind of binary file. Note: a binary file presenting a perfectly random internal distribution of values cannot be compressed at all, due to the lack of regular patterns (the snippet below makes this concrete).
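A tiny standard-library-only experiment illustrates the point (the sample inputs are, of course, arbitrary):

```python
# Low-entropy input compresses dramatically; maximum-entropy input does not
# compress at all (the output may even be slightly larger than the input).
import os
import zlib

text = b"stop_id,stop_name,lat,lon\n" * 50_000  # regular, predictable pattern
rand = os.urandom(len(text))                    # perfectly random bytes

print(f"text ratio: {len(text) / len(zlib.compress(text)):.2f}")  # very high
print(f"rand ratio: {len(rand) / len(zlib.compress(rand)):.2f}")  # about 1.00
```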
- Any lossless compression strategy implies a trade-off between speed and compression ratio:
  - you can optimize for speed, but in that case the compression ratio will be worse (often due to the lack of time to search for recurring patterns). This is the choice adopted by LZ4.
  - at the opposite end of the spectrum, you can optimize for high compression ratios, but in that case you must allow the time (low speed) needed to find the recurring patterns. This is the choice adopted by LZMA.
  - the wisest approach falls somewhere in between: a well-balanced mix (a reasonable compromise) between speed and compression ratio. This is the choice of both DEFLATE and ZSTD.
- The very recently introduced ZSTD is clearly a superior alternative to the old DEFLATE:
  - ZSTD is always noticeably faster than DEFLATE, both during compression and decompression.
  - ZSTD is not always capable of reaching better compression ratios than DEFLATE (it depends on the sample's information entropy). Often ZSTD will easily outperform DEFLATE's compression ratios; even when it doesn't, it is still capable of achieving the same compression ratios as DEFLATE in a swifter timeframe.
- LZ4 is not really interesting (at least for normal scenarios). It's surely very fast, but not impressively faster than ZSTD, and its compression ratios are not very spectacular.
- LZMA offers the best method when a high compression ratio is an absolute must. But its dreadful speed (during both compression and decompression) must be taken into account, as it can easily become a severe bottleneck.
- DEFLATE isn't dead yet; despite its rather venerable age it is still an efficient performer. And considering how widely it is supported, it will surely survive for many years to come.
Testing Raster Coverages
This second group of tests will concentrate on comparing the various lossless compression methods, as implemented by RasterLite2, for the encoding and decoding of Raster Coverage tiles.

- Several distinct RasterLite2 databases will be created and fully populated by importing the same image sample, applying a different compression method for each database.
- The compression ratios will then be computed from the size of the uncompressed database (method NONE) and the sizes of the other databases based on the same sample.
- The compression time will be the time (as reported by rl2tool) required to create and fully populate each database.
- The decompression time will be the time (as reported by the spatialite CLI) required to execute a SQL script containing 256 SELECT RL2_GetMapImageFromRaster() statements. All requested images will be 1000x1000 pixels at full resolution, centered on different locations and adopting various SLD/SE styles. This is assumed to be a realistic evaluation, because it basically corresponds to the typical workload of a hypothetical WMS server (a sketch of this measurement follows the list).
- Note: the measured timings will not always correspond to the intrinsic speed of each compression method, since other factors must be taken into account, one major factor being the cost of I/O operations while reading the image data, which may also have to be decoded. However, since the operational sequence is the same for all tests based on the same sample, any difference in timing will be caused by the compression method.
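As a hedged sketch of how such a decompression timing could be reproduced from Python rather than from the spatialite CLI; the loadable-module names and the script file name are assumptions about the local setup, and the exact SQL is documented in the RasterLite2 SQL reference:

```python
# A sketch of the decompression-timing procedure, not the exact commands used
# for these benchmarks. The extension module names and all file names are
# assumptions about the local setup.
import sqlite3
import time

conn = sqlite3.connect("coverage_zstd.sqlite")  # hypothetical database name
conn.enable_load_extension(True)
conn.load_extension("mod_spatialite")    # SpatiaLite core (assumed installed)
conn.load_extension("mod_rasterlite2")   # RasterLite2 SQL functions (assumed)

with open("render_256_maps.sql") as f:   # 256 RL2_GetMapImageFromRaster() calls
    # naive split: assumes no ';' occurs inside string literals
    statements = [s for s in f.read().split(";") if s.strip()]

start = time.perf_counter()
for stmt in statements:
    conn.execute(stmt).fetchall()        # forces tile decoding and rendering
print(f"decompression time: {time.perf_counter() - start:.1f} sec")
```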
Test #5 - Grayscale Raster Coverage
| Compression Method | DB Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|
| NONE (no compression) | 481 MB | 1.00 | 54 sec | 1 min 44 sec |
| LZ4 (very fast compression) | 416 MB | 1.16 | 59 sec | 1 min 48 sec |
| DEFLATE (zip compression) | 349 MB | 1.38 | 1 min 5 sec | 1 min 44 sec |
| ZSTD (fast compression) | 346 MB | 1.39 | 1 min 0 sec | 1 min 54 sec |
| LZMA (7-zip compression) | 345 MB | 1.40 | 3 min 2 sec | 2 min 3 sec |
| PNG (lossless image format) | 346 MB | 1.39 | 1 min 8 sec | 1 min 41 sec |
| LL_WEBP (lossless WebP) | 320 MB | 1.50 | 4 min 27 sec | 2 min 2 sec |
| LL_JP2 (lossless Jpeg2000) | 323 MB | 1.49 | 4 min 26 sec | 2 min 21 sec |
| CHARLS (lossless JPEG) | 339 MB | 1.42 | 2 min 38 sec | 2 min 6 sec |
Quick assessment:
- This test is based on a sample of 25 B&W TIFF+TFW sections (forming a 5x5 square) centered on the city of Florence. The original dataset is the orthophoto imagery (year 1978; scale 1:10000) published by Tuscany.
- As expected from the previous tests, a lossless compression with a high compression ratio is very difficult to achieve for photographic images.
- For this specific test DEFLATE, ZSTD and PNG achieved more or less equivalent compression ratios, with similar compression and decompression timings. It's worth noting that the decompression times for DEFLATE, ZSTD and PNG are more or less the same as with NONE (uncompressed), so they cause no rendering bottleneck.
- As expected, LZ4 is fast but unable to achieve a decent compression ratio.
- LZMA continues to be very slow during both compression and decompression.
- The real disappointment comes from LL_WEBP, LL_JP2 and CHARLS. These algorithms are specifically designed for compressing photographic imagery, yet they are unable to outperform the other generic multipurpose compression algorithms: they achieve a marginally better compression ratio, but are otherwise deadly slow. Truly not worth the effort.
Test #6 - RGB Raster Coverage
| Compression Method | DB Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|
| NONE (no compression) | 1.51 GB | 1.00 | 1 min 17 sec | 1 min 51 sec |
| LZ4 (very fast compression) | 1.21 GB | 1.25 | 1 min 31 sec | 1 min 47 sec |
| DEFLATE (zip compression) | 800 MB | 1.94 | 1 min 56 sec | 1 min 40 sec |
| ZSTD (fast compression) | 816 MB | 1.90 | 1 min 29 sec | 1 min 37 sec |
| LZMA (7-zip compression) | 710 MB | 2.18 | 7 min 23 sec | 2 min 11 sec |
| PNG (lossless image format) | 830 MB | 1.86 | 2 min 29 sec | 1 min 49 sec |
| LL_WEBP (lossless WebP) | 525 MB | 2.95 | 7 min 18 sec | 1 min 48 sec |
| LL_JP2 (lossless Jpeg2000) | 802 MB | 1.92 | 11 min 31 sec | 3 min 16 sec |
| CHARLS (lossless JPEG) | 912 MB | 1.70 | 7 min 54 sec | 2 min 47 sec |
Quick assessment:
- This test is based on a sample of 9 RGB TIFF+TFW sections (forming a 3x3 square) centered on the town of San Giovanni Valdarno. The same original dataset will also be used in the following test, but in this case the Near Infrared spectral band was completely removed.
- This test confirms the results of the previous test (Grayscale).
- The unique exception is LL_WEBP, which in this case achieves the best compression ratio of them all, along with a fairly good decompression time.
Test #7 - Multispectral (4-bands) Raster Coverage
| Compression Method | DB Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|
| NONE (no compression) | 2.01 GB | 1.00 | 3 min 18 sec | 1 min 55 sec |
| LZ4 (very fast compression) | 1.61 GB | 1.24 | 3 min 41 sec | 1 min 48 sec |
| DEFLATE (zip compression) | 1.02 GB | 1.97 | 5 min 5 sec | 1 min 42 sec |
| ZSTD (fast compression) | 1.07 GB | 1.87 | 3 min 35 sec | 1 min 46 sec |
| LZMA (7-zip compression) | 882 MB | 2.34 | 11 min 7 sec | 2 min 15 sec |
| PNG (lossless image format) | 1.08 GB | 1.85 | 4 min 43 sec | 1 min 47 sec |
| LL_WEBP (lossless WebP) | 758 MB | 2.72 | 9 min 36 sec | 1 min 51 sec |
| LL_JP2 (lossless Jpeg2000) | 1.05 GB | 1.92 | 16 min 23 sec | 3 min 53 sec |
Quick assessment:
- This test is based on a sample of 9 4-band (RGB + Near Infrared) TIFF+TFW sections (forming a 3x3 square) centered on the town of San Giovanni Valdarno. The original dataset is the orthophoto imagery (year 2013; scale 1:2000) published by Tuscany.
- This test confirms the results of the two previous tests (Grayscale and RGB).
- Here too, LL_WEBP scores the best compression ratio of them all, and also achieves a fairly good decompression time.
Test #8 - Datagrid Raster Coverage (ASCII Grid - floating point single precision)
| Compression Method | DB Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|
| NONE (no compression) | 2.01 GB | 1.00 | 6 min 30 sec | 2 min 6 sec |
| LZ4 (very fast compression) | 845 MB | 2.45 | 6 min 36 sec | 2 min 9 sec |
| DEFLATE (zip compression) | 623 MB | 3.32 | 7 min 2 sec | 2 min 6 sec |
| ZSTD (fast compression) | 614 MB | 3.36 | 6 min 26 sec | 1 min 55 sec |
| LZMA (7-zip compression) | 513 MB | 4.03 | 11 min 20 sec | 3 min 5 sec |
Quick assessment:
- This test is based on a huge ASCII Grid (DTM, 10m x 10m cell size). The original dataset is the orographic DTM 10x10 published by Tuscany.
- This specific test shows a slight superiority of ZSTD over DEFLATE: it achieves a better compression ratio and is faster during both compression and decompression.
- LZ4 is confirmed to be fast but unable to score a good compression ratio.
- LZMA continues to achieve an impressive compression ratio, but still with a barely tolerable slowness during both compression and decompression.
Test #9 - Datagrid Raster Coverage (TIFF - INT16)
| Compression Method | DB Size | Compression Ratio | Compression Time | Decompression Time |
|---|---|---|---|---|
| NONE (no compression) | 480 MB | 1.00 | 17 sec | 1 min 39 sec |
| LZ4 (very fast compression) | 317 MB | 1.51 | 21 sec | 1 min 48 sec |
| DEFLATE (zip compression) | 205 MB | 2.34 | 28 sec | 1 min 39 sec |
| ZSTD (fast compression) | 207 MB | 2.32 | 20 sec | 1 min 42 sec |
| LZMA (7-zip compression) | 168 MB | 2.86 | 2 min 0 sec | 2 min 3 sec |
Quick assessment:
- This test is based on the very popular ETOPO1 global relief model of the Earth's surface published by NOAA.
- For this test, ZSTD and DEFLATE are basically on par with each other, each achieving better results in different areas.
- LZ4 is confirmed to be fast but unable to score a good compression ratio.
- LZMA continues to achieve an impressive compression ratio, but still with a barely tolerable slowness during both compression and decompression.
Conclusions
- For Raster Coverages, general-purpose lossless compression algorithms can be successfully deployed. They are never able to achieve an impressive compression ratio (for that, a lossy compression is required), but they can effectively ensure a valuable reduction of the required storage space without imposing any loss of information. And some of them are fast enough not to impose any overhead:
  - LZ4 doesn't seem advisable for the compression of raster data. It is true that it tends to be very fast, but it is not effective enough in its compression ratio to be really interesting. It can be considered more of a research/academic tool that is difficult to use in production environments.
  - DEFLATE and ZSTD are almost on par for use with raster data; both are well balanced and effectively useful. ZSTD often shows a clear superiority in speed and/or compression ratio, but the results are practically unpredictable, due to the unknown factor of recurring patterns in the source data. A careful evaluation, through case-by-case practical tests, is always required; assume that there is no reliable general rule.
  - LZMA is unbeatable for its compression ratio, but so deadly slow during decompression that its adoption in production environments is discouraged. Its best use is probably the long-term storage of really huge raster coverages, where a high compression ratio can easily overcome any consideration about speed.
- Lossless compression algorithms specifically designed for raster images do not necessarily bring better results. They are too often unable to achieve a better compression ratio than the general-purpose algorithms, and as a general rule they are unacceptably slow during decompression. Being too sophisticated and complex doesn't always pay off, and this seems to be true in this case:
  - PNG's results are clearly the best of this group. It's a true image format, but it has (more or less) the same compression ratio and speeds as DEFLATE or ZSTD; not at all surprising, considering that PNG is based on DEFLATE.
  - WebP (in its lossless mode) is only marginally interesting. Sometimes (but not always) it can achieve a better compression ratio than DEFLATE, ZSTD and PNG without imposing any noticeable bottleneck.
  - CharLS (aka Lossless JPEG) and Jpeg2000 (in its lossless mode) are definitely not interesting at all. Both are unable to achieve any interesting compression ratio, and both are deadly slow (most noticeably LL_JP2). LZMA can easily achieve a better compression ratio at more or less the same level of speed (i.e. slow).
- Note: DEFLATE, ZSTD and PNG require about the same decompression time as NONE (no compression at all), and sometimes they are even marginally faster. This is a very relevant finding, because it confirms that a compressed Raster Coverage can be deployed without any danger of causing a performance bottleneck.
  - A short rationale: every decompression requires extra CPU cycles, but compression always causes fewer I/O operations. On modern hardware this is a beneficial trade-off, so processing compressed data will usually be on par with (or even faster than) processing uncompressed data.