How to make PDFs smaller.

There are a lot of programs around that promise to make PDF files smaller. Some work. Some do not. Some are lossless. Some are not. The technique I am about to describe is lossless, and while it will not make all PDFs smaller, it does work on many.

The PDF format is a horrific mess inside. The format - said to have been penned in ichor upon the skin of a flawless black goat by a follower of the insane God Azathoth - features such delights as mixed ASCII and UTF-16 characters within a string*, arbitrary presence or absence of whitespace, ambiguous stream terminators and the use of <, [, {, (, " and << delimiters - with the << being completely different in meaning from <, though they may be used in succession to create <<<This sort > of mess>>. Line breaks may be CR, LF or CRLF, except where a lone CR is not allowed. It suffers heavily from the problem that plagued HTML: a very lax specification meant that decoders had to be able to handle pretty much anything, which in turn meant that encoders would generate pretty much anything.

So let us not go there. Let us instead stay far away from the maddening horrors, and download a utility: qpdf. This utility can open a PDF file, and save it to another PDF file. On the way it may do all manner of useful things, none of which involve making the file smaller. Not yet. You need to make one tiny change.

So download this program, as source. Within it find libqpdf/PL_Flate.cc. Look for a reference to Z_DEFAULT_COMPRESSION and change it to Z_BEST_COMPRESSION. Compile (you'll need libjpeg-dev for that). You just told it to call zlib with the highest compression level it supports rather than the default one. In my view, this ought to be the default. Z_DEFAULT_COMPRESSION was chosen back when computers were a lot slower.
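To get a feel for what that one-word change buys you, here is a minimal sketch using Python's zlib module, which wraps the same library and exposes the same two constants. The function name compare_levels and the sample data are my own inventions for illustration. The sample only imitates a page content stream, so the numbers say nothing about any particular PDF.

```python
import zlib

def compare_levels(data: bytes) -> dict:
    """Deflate `data` at zlib's default and best levels and return the sizes."""
    default = zlib.compress(data, zlib.Z_DEFAULT_COMPRESSION)
    best = zlib.compress(data, zlib.Z_BEST_COMPRESSION)
    # Both levels are lossless: decompressing gives back the exact input.
    assert zlib.decompress(default) == data
    assert zlib.decompress(best) == data
    return {"raw": len(data), "default": len(default), "best": len(best)}

# Made-up stand-in for a page content stream (an assumption, not real PDF data).
sample = b"".join(
    b"BT /F1 12 Tf 72 %d Td (Line %d) Tj ET\n" % (720 - i, i) for i in range(500)
)
print(compare_levels(sample))
```

The gain from the best level is usually modest and depends entirely on the data, which is why the trick helps many PDFs but not all of them.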

Uninstall the version of qpdf provided by your package manager, including libqpdf (important! That's the part you just altered). Then just a quick make install, and you're ready. Now 'qpdf in.pdf out.pdf' will serve as a pretty decent PDF compressor. You could do better with zopfli, but patching libqpdf to use zopfli is beyond my skill. As well as recompressing PDFs to the highest level supported by zlib, it also removes redundant trailer structures and orphaned objects left over from editing and generation. It won't make all PDFs smaller, but it'll work on quite a few of them.
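Since not every PDF shrinks, it can be handy to wrap the command so that the output is only kept when it is actually smaller. What follows is a rough sketch, not a finished tool. The recompress function and its run parameter are hypothetical names of mine, and it assumes the qpdf on your PATH is the patched build. It treats exit code 3 as success because qpdf uses that code for "finished, with warnings".

```python
import os
import shutil
import subprocess

def recompress(src, dst, run=subprocess.run):
    """Run 'qpdf src dst' and keep the result only if it is smaller than src.

    `run` is swappable (hypothetical parameter) so the logic can be tried
    without qpdf installed. Returns True if dst ended up smaller.
    """
    result = run(["qpdf", src, dst])
    # qpdf exits with 0 on success and 3 when it succeeded with warnings.
    if result.returncode not in (0, 3):
        raise RuntimeError(f"qpdf failed on {src} (exit {result.returncode})")
    if os.path.getsize(dst) >= os.path.getsize(src):
        # No gain: fall back to a byte-for-byte copy of the original.
        shutil.copyfile(src, dst)
        return False
    return True
```

Calling recompress("in.pdf", "out.pdf") then behaves like the plain command, except that out.pdf is never larger than in.pdf.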

As an added bonus, qpdf will tidy up the unholy filth that is found inside many PDF files and output instead a nice, simple, well-formatted PDF that is not plagued by the need to follow a linked list of linked lists when reading the index and contains no orphan objects left behind by poorly-written exporters. This means fewer strange problems for those who want to view the PDF.


* PDF predates UTF-8. The solution is a custom string encoding, found nowhere other than PDF, which uses escape sequences to switch from ASCII to UTF-16 and back.