pngwolf PNG optimizer

This is the homepage of `pngwolf`, an open source utility to optimize the file size of PNG images. The PNG format allows encoders to apply a filter to each row of an image; a filter transforms the bytes that represent the pixels by mathematically relating them to bytes of neighbouring pixels. As an example, a gradient may be represented by a sequence of numbers like 1, 2, 3, 4, and a filter transforms it into a sequence like 1, add 1, add 1, add 1, which usually compresses better. For most images it is infeasible to analyze all possible combinations of filters, so `pngwolf` uses a genetic algorithm to gradually evolve a selection of filters towards the best it can find within boundaries such as the time a user wishes the tool to run. The image data is then compressed using the Deflate implementation from Igor Pavlov's 7-Zip compression toolkit, which, at the right settings, usually compresses at a higher ratio than other encoders, at the expense of compression time.
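As a small sketch of how such a filter works, here is the PNG "Sub" filter in Python (simplified to one byte per pixel; this is an illustration, not `pngwolf`'s actual code, which operates on the bytes of each pixel offset by the number of bytes per pixel):

```python
def sub_filter(scanline, bpp=1):
    """Apply the PNG 'Sub' filter: each byte becomes its difference
    (mod 256) from the corresponding byte bpp positions earlier."""
    return bytes((scanline[i] - (scanline[i - bpp] if i >= bpp else 0)) % 256
                 for i in range(len(scanline)))

# The gradient 1, 2, 3, 4 turns into 1, +1, +1, +1:
print(list(sub_filter(bytes([1, 2, 3, 4]))))  # → [1, 1, 1, 1]
```

The filtered row is mostly runs of identical bytes, which is exactly what Deflate's match finder thrives on.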

The source code of `pngwolf` is available on GitHub. See the README file there for further details.

Unlike other tools (see Cosmin Truţa's "A guide to PNG optimization" for an overview), `pngwolf` does not attempt to change the way image data is stored, such as turning an image that uses only a few colors into an image using a color palette, and it does not remove invisible meta data that may be stored in an image. Where such optimizations are possible, a different tool may produce better results. Generally, `pngwolf` works best on RGB and RGBA images. It does not currently support interlaced images, but should handle all other options in the PNG format fine. Beyond the genetic filter selection, `pngwolf` lets users specify how much time to spend on an image, allows aborting an optimization run without losing any improvement found so far, reports progress in an optionally verbose, machine-readable format based on YAML, and offers many options to tune various aspects of its behaviour.

Normalizing Transparency

When reviewing some test samples in March 2011, I noticed that many RGBA images use fully transparent white pixels, which at eight bit depth are represented as 0xFFFFFF00, while transparent black is 0x00000000, which would usually seem to compress better. Many images even mix the two or use a whole range of fully transparent colors. Most likely these are left over from the editing process, which is partially revealed by inverting the alpha channel or ignoring the alpha values altogether.

[Figure: sample image (SHA-1 323e5e218cfb2265b9b4d27bf7f4caf75a40fea4), shown normal, with the alpha channel inverted, and with the alpha channel dropped]

To make these I used ImageMagick: `convert in.png -channel A -negate out.png` for the alpha channel inversion, and `convert in.png -alpha off out.png` to drop the alpha channel. The image also has a tEXt chunk noting that "Adobe ImageReady" was used to make it. Apparently the Web export function in Photoshop does not remove these bits, which is a defensible choice: sometimes the color information of fully transparent pixels is actually used for something, but that is unusual on the Web, so this isn't the best default. As noted below, `pngwolf` reduced the image data in this image to 1748 bytes. Asked to make fully transparent pixels black, and using the same settings as in the test below, the size goes down to 1240 bytes, saving 29%. The `pngwolf` option for this is `--normalize-alpha`. It is not enabled by default as it is not a lossless transformation.
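The transformation itself is simple. A minimal sketch for 8-bit RGBA pixel data (the function name and data representation are illustrative, not `pngwolf`'s actual implementation):

```python
def normalize_alpha(rgba):
    """Turn every fully transparent pixel into transparent black, so the
    color bytes of invisible pixels are uniform and compress better."""
    out = bytearray(rgba)
    for i in range(0, len(out), 4):     # pixels as R, G, B, A byte quads
        if out[i + 3] == 0:             # alpha == 0: pixel is invisible
            out[i:i + 3] = b'\x00\x00\x00'
    return bytes(out)

# 0xFFFFFF00 (transparent white) becomes 0x00000000 (transparent black):
print(normalize_alpha(bytes([0xFF, 0xFF, 0xFF, 0x00])).hex())  # → 00000000
```

Pixels with any non-zero alpha are left untouched, which is why the transformation is only lossy with respect to invisible color information.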


To study the effectiveness of this approach I used CutyCapt to download the Alexa 1000 web sites and captured all images that were downloaded when loading them into a browser. I then filtered out any image that was not a properly formed, non-interlaced RGB or RGBA PNG image, and any image that one of the other tools optimized by changing its color type or bit depth, as `pngwolf` does not make such changes. The options for `pngwolf` were as follows, specified here including some of the defaults in case I change them later:

  % pngwolf --max-stagnate-time=10 --7zip-mpass=15
      --7zip-mmc=258 --7zip-mfb=258 --zlib-level=9

For all the images in the sample this took several hours to run. With lower settings the process would complete much sooner; in fact, for almost all images in the set, simply re-compressing them without looking for better filter combinations significantly reduces the file size. The table contains the size, in bytes, of the coalesced data in all IDAT chunks (which contain the deflated image data) that each of the tools generated, along with the SHA-1 sum of the original image (for `pngwolf` it shows the input size if the best result was worse than the input). The last column indicates how `pngwolf` fared against the best of the others, including the original.

The following image summarizes the results, giving the number of times each of the tools achieved a certain rank among the tools tested. Note that this is only the rank, it does not account for differences in size beyond that.

Comparison by rank

Benchmark results
SHA-1 pngcrush -b optipng -o7 advpng -z4 pngout pngwolf % vs best
0e6fc948522c9674e06670398e91248735ec9c8f 5089 5089 5089 5089 5089 100%
3b4b7bf1e2888dfe8b303afe169dc3427ad8b059 10870 10870 10870 10870 10870 100%
5a228ff87464d3f840843f662aea632d0e438793 4975 4975 4975 4975 4975 100%
7d391b5e9e77cfb529b564e7ff208c95a561ce16 82605 82605 82605 82605 82605 100%
815362177047993ec4d00e05ce241ce231b0c7dc 42508 42508 42508 42508 42508 100%
914c0061d910a0328475e9b8f7a4bdb230f16fde 3419 3419 3419 3419 3419 100%
9ea99aa7cb1885cde6b4582a4b3346b5cbe84915 3834 3834 3834 3834 3834 100%
a3f6ffd2b0d9fa626fc1b31819383b944c3de5d6 8553 8553 8553 8553 8553 100%
e6d187daec39c3564ee17b123922778901fc2daf 7549 7549 7549 7549 7549 100%
ad8d70e0ec8ce5310b19fd7b1228be275be2c7fc 4822 4103 4822 4485 4285 104%
9177d18377e12f226dd2bd363bb00e9675ae815e 7321 7174 7321 7199 7255 101%
ac1d94449a70d48a6f37da81c044775cc311a67e 19934 19147 19934 19934 19204 100%
3be4e67752e4e79b48e7a05610641af1cb03ebd3 21671 20293 21671 21671 20312 100%
dfe2d1e6f0bfd91edd1dc443883bf5ed405ac4f4 5201 5198 5201 5261 5201 100%
021d641cba539505bd0cabc5a18bf3772d4c8430 2996 2792 2722 2913 2722 100%
30feb83c04e70b6da0e875a44ea332577ee9e6cf 11206 9687 9405 10069 9605 102%
962d99f011e310f1ffe5296261fbec640e625e19 103322 87598 86347 101257 87300 101%
e17c407eec97036e92cfb30f4a2fcc3a6a077304 17309 13792 12575 15147 12676 101%
20c89f4f0376474c7c21371110c3322f7a602c92 5451 4465 4248 4982 4278 101%
2780b3842643fd9ec73ba73eead44f132f0e3fad 11279 11279 10544 11279 10596 100%
d6cdfb27d26287364b7fc59440728caa06942496 3889 3244 3083 3680 3089 100%
3fc3fb4fe8a4581b9156861fa6ce02cf1b8fb841 62648 40604 38478 58537 38534 100%
adc6578f31c5f072cfd223b0b95cec4e9bb62337 3847 3077 2946 3572 2948 100%
0bfaca3712a35651634324a6aebc07ead19dfb6c 30433 30433 29580 30433 29598 100%
507d18ea6cd80972d516c704158a76b559b8d68c 2763 2531 2457 2656 2458 100%
d1851f7ffd710aa714696635e5713bf0331d6be1 16007 10765 10359 16007 10363 100%


Notes on settings

One of the more complicated things with `pngwolf` is knowing when to stop trying to find better filter combinations. For the benchmark above I picked two minutes of total time, and a maximum stagnation period of thirty seconds. I captured all the improvements and how long it took to find them relative to the previous improvement.

Diagram showing relationship between time taken and improvements found

Note the logarithmic scale: almost all improvements are found within a couple of seconds, and it is rare that `pngwolf` runs for a long while and then suddenly picks up an improvement (although that certainly does happen with some samples). So I will probably set the default so that `pngwolf` gives up after finding no improvement for five to ten seconds.

Another problem is the estimator. At the time of writing, `pngwolf` uses a fast zlib setting (simply level five in the benchmark) to estimate how well 7-Zip might compress the image data, as running 7-Zip at its best-compressing settings on each genome is infeasible for most images. This zlib setting is not always the best. I re-ran the benchmark with three changed settings: the maximum stagnation time (previously 30 seconds, now 10 seconds), the maximum total time (previously 120 seconds, now unbounded), and the zlib level (changed from five to six).

Results were somewhat mixed. In 35% of the cases this resulted in smaller files and in 27% of the cases the size was the same; `pngwolf` won against `advpng` significantly more often, but in turn lost more often to `pngout`. This is likely due to both of the changed settings and to random chance. I take from the comparison that I should ensure `pngwolf` terminates using a low maximum stagnation time rather than a long total running time. Ideally I would find a better estimator that runs much faster, like a Deflate implementation that can store partial analysis results so that minor changes in the filter combinations cause only minor delays.

Notes on filtering heuristics

The PNG specification suggests a simple heuristic to find a good combination of filters; `pngwolf` computes it and in verbose mode prints it out, alongside the size the deflated image data requires (as estimated by zlib). The heuristic simply looks at all the possible filters for each scanline and computes, after applying each filter, the sum of the absolute values of the bytes in the scanline, treating each byte as a signed integer. The filter that gives the smallest sum is then chosen. This requires computing the effect of applying the various filters to each scanline, so `pngwolf` reports the sizes for those as well.
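A minimal sketch of that heuristic in Python (illustrative only; it assumes the candidate filtered versions of one scanline have already been computed):

```python
def msad_choice(filtered_rows):
    """Pick, for one scanline, the filter whose output minimizes the sum
    of absolute values of the bytes interpreted as signed integers, as
    the PNG specification suggests. Returns the index of the filter."""
    def cost(row):
        # A byte b stands for the signed value b - 256 when b >= 128,
        # so its absolute value is 256 - b in that case.
        return sum(b if b < 128 else 256 - b for b in row)
    return min(range(len(filtered_rows)), key=lambda f: cost(filtered_rows[f]))

# One candidate (already filtered) scanline per filter type:
rows = [bytes([1, 2, 3, 4]),   # e.g. None: sum 10
        bytes([1, 1, 1, 1])]   # e.g. Sub:  sum 4
print(msad_choice(rows))  # → 1
```

Note that a byte like 0xFF counts as -1 and contributes only 1 to the sum, which is the point of the signed interpretation: small positive and small negative differences are both cheap.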

Applying these simple filter configurations to the sample discussed above (except that this time I did not filter out images that might benefit from being turned into indexed images), and using just the zlib figures as reported by `--info`, the sample comes out at these total sizes. Note that there are two additional filter heuristics that will be explained in a minute.

Sample sizes after refiltering

Looking just at which of the heuristics was best for each image (note that there is some overlap when two heuristics produce the same size, though that is rare for larger images), we have:

How often is None, Avg, Sub, Up, Paeth, and the basic heuristic best

The specification mentions that using the suggested heuristic usually performs better than picking any single filter for all scanlines; that seems either wrong, or the authors do not consider None a filter. In practice a whole lot of images are filtered per the heuristic even though using None for all scanlines would be much better. Including all the other heuristics from the earlier chart (note that there is now even more overlap than before):

Winner with all heuristics included

Now I can explain the two additional heuristics I added as an experiment. Without putting much thought into it, I figured it is cheap, compared to the other things `pngwolf` does, to deflate each scanline for each filter and pick the filter that deflates best. While that ignores interaction between multiple scanlines, it is just about the best local heuristic there is. As noted earlier, using only this heuristic produces the smallest total, and other than choosing None all the time (which would produce the second-to-largest total) it is the best most often.
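A sketch of that per-scanline deflation heuristic, again illustrative rather than `pngwolf`'s actual code:

```python
import zlib

def deflate_choice(candidates):
    """For one scanline: deflate each filtered variant independently and
    pick the filter whose output deflates smallest. This ignores the
    interaction between scanlines, but is a strong local heuristic."""
    return min(range(len(candidates)),
               key=lambda f: len(zlib.compress(candidates[f], 9)))
```

For example, a candidate row that is all zero bytes will beat a candidate containing incompressible byte runs, since the former deflates to almost nothing.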

The other experimental filter I added seemed even less promising initially, as it is much simpler and more efficient: it simply counts the number of distinct bytes that applying a filter to a scanline produces, and picks the filter that produces the fewest distinct ones. While Deflate is more interested in 3-grams and longer sequences, counting unigrams isn't the worst approximation. It turns out this heuristic is much better than expected: it is the second best for total size, and wins as often as the heuristic in the specification. Comparing only the three:
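The distinct-byte heuristic fits in a few lines (again a sketch, not `pngwolf`'s implementation):

```python
def distinct_bytes_choice(candidates):
    """For one scanline: pick the filter whose output contains the fewest
    distinct byte values, a cheap unigram proxy for compressibility."""
    return min(range(len(candidates)),
               key=lambda f: len(set(candidates[f])))

# A row of four distinct bytes loses to a row with a single repeated byte:
print(distinct_bytes_choice([bytes([1, 2, 3, 4]), bytes([7, 7, 7, 7])]))  # → 1
```

Unlike the deflation heuristic, this needs no compressor at all, only a set per candidate row, which is what makes it attractive for fast encoders.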

The three heuristics compared

And comparing only the two more inexpensive ones:

Counting distinct bytes compared to the heuristic in the specification

This would suggest we have a bit of a pacman situation between the two, just without the ghosts: with these two filter configurations, in 64% of the cases counting distinct bytes produces smaller image data than computing the sums of absolute differences, for a reduction in size over all the images of 2.26%. That would seem rather good for the second thing I thought of. The first thing I thought of, deflating each scanline, wins 74% of the time when compared directly to the specification's heuristic for savings of 3.82%, but it's obviously a good bit more expensive if you are looking for something you can do from a Save As dialog on a smartphone.

Bigram heuristic

So if unigrams work well, then so should bigrams. Looking for trigrams is a large part of what Deflate implementations do, so going there would not be much more interesting than the first approach of actually applying Deflate directly; bigrams are a bit of a middle ground. And it does well, considering the low computational and memory overhead: for the whole sample the saving is 3.3%, so this is almost as good as scanline deflation while using considerably fewer resources. To summarize up to here in a graphically more suitable form:
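The bigram variant is the same idea with pairs instead of single bytes (an illustrative sketch under the same assumptions as before):

```python
def distinct_bigrams_choice(candidates):
    """For one scanline: pick the filter whose output contains the fewest
    distinct byte pairs, a step from unigrams toward the 3-grams that
    Deflate's match finder looks for."""
    def bigrams(row):
        return len({row[i:i + 2] for i in range(len(row) - 1)})
    return min(range(len(candidates)), key=lambda f: bigrams(candidates[f]))
```

A row like 1, 2, 3, 4 has three distinct pairs while 1, 1, 1, 1 has only one, so the latter's filter wins, matching the unigram heuristic on this example while discriminating better on rows where byte order matters.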

Comparison by rank

The main problem with all the heuristically derived filter sequences is that applying the None filter to all or almost all scanlines is still the best choice for half of the sample, and a pretty bad choice for most of the rest compared to the other options. It would seem reasonable to expect deflating the individual scanlines to be the best heuristic, if looking at the scanlines in isolation allowed picking a sequence of filters that can compete with not filtering at all. It is not surprising that this is not the case; to decide whether or not filtering is the best option, multiple scanlines have to be looked at.

A simple solution is to compress both the heuristically derived filter sequence and the unfiltered image data in parallel and then drop whichever performed worse, but that would considerably increase the resource requirements for an encoder. This is not much of a problem for `pngwolf`, as it tries all the heuristics and then hundreds or thousands of additional filter sequences in order to approximate the best one. A problem for `pngwolf` is that when options differ only by a couple of dozen bytes, the zlib estimator it uses to decide which is best does not always pick the right option, so sometimes it picks the wrong heuristic, or even makes things worse due to differences in the 7-Zip encoder it ultimately uses to produce the output image; that is the main reason it sometimes loses to `advpng`.

The obvious way to include information about many scanlines is to pick the filter that causes the least increase in size given the best filters for the previous scanlines. That is much more expensive than deflating the scanlines independently from one another, but should also give much better results. And so it does: the size of the sample as a whole is reduced by 2.1% compared to deflating scanlines individually, and by 5.8% compared to the heuristic the specification suggests. Choosing None for all scanlines is still better in 26% of the cases, and in those cases the image data is 3% bigger on average, with a median of 1.2%. Interestingly, the other heuristics mentioned still win roughly 4% of the cases each.
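A greedy sketch of that idea (illustrative; a real encoder would reuse the compressor's state rather than re-deflating the growing prefix for every candidate):

```python
import zlib

def greedy_deflate_filters(rows_per_filter):
    """rows_per_filter[y][f] is scanline y filtered with filter f.
    Greedily pick, per scanline, the filter that adds the least to the
    deflated size of everything chosen so far. Returns filter indices."""
    chosen, prefix = [], b''
    for candidates in rows_per_filter:
        best = min(range(len(candidates)),
                   key=lambda f: len(zlib.compress(prefix + candidates[f], 9)))
        chosen.append(best)
        prefix += candidates[best]
    return chosen
```

Because each choice is scored against the already-chosen prefix, a filter whose output repeats material from earlier scanlines is rewarded, which is exactly the cross-scanline interaction the independent per-scanline heuristics cannot see.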

See also

Perl programmers may be interested in the modules Compress::Deflate7, which makes 7-Zip's Deflate implementation available to Perl scripts and modules, and Image::PNG::Rewriter, which implements the scanline filter algorithms. I used these to prototype and test `pngwolf`, and both come with suitable test suites to verify that the code does not damage image data. The modules can be used to re-implement `pngwolf` in Perl with a suitable genetic algorithm implementation.


March 2011
Added notes on transparency normalization.
Added notes on settings.
Added notes on filtering heuristics.
Updated table with new results.


Björn Höhrmann