The Gallery of Fail

I take the shotgun approach: I constantly come up with new ideas to try. Some of them work out well, and some prove deeply flawed. This is a gallery of my bad ideas. Take a look at them, and see what doesn't work. Maybe you can even find a way to improve them into something that works.


Maotar reformats tar files slightly in a way that makes them more compressible. It doesn't alter how the archive extracts, just rearranges the internal structure a little. It does work, and it does save space - but the savings are minuscule.

[quailsan@Daemoncore code]$ tar -cf test.tar fancybeep* PIC/ MiniQuincunx/ ponymath/ text*
[quailsan@Daemoncore code]$ ls -l test.tar
-rw-rw-r-- 1 quailsan quailsan 313169920 Dec 26 17:55 test.tar
[quailsan@Daemoncore code]$ cat test.tar |bzip2|wc -c
313533673
[quailsan@Daemoncore code]$ cat test.tar |gzip -9|wc -c
312230401
[quailsan@Daemoncore code]$ ./maotar test.tar test2.tar
[Long output cut]
[quailsan@Daemoncore code]$ ls -l test*.tar
-rw-rw-r-- 1 quailsan quailsan 313169920 Dec 26 17:58 test2.tar
-rw-rw-r-- 1 quailsan quailsan 313169920 Dec 26 17:55 test.tar
[quailsan@Daemoncore code]$ cat test2.tar |bzip2|wc -c
313533564
[quailsan@Daemoncore code]$ cat test2.tar |gzip -9|wc -c
312230376

Saved 109 bytes on bzip2, or 25 bytes on gzip -9. It works perfectly - the savings are just too tiny to be worth even entering the one extra command. I achieved much greater success on my images collection, a file perfectly suited to maotar - it saved almost 0.02% there. You can experiment with the program if you want, but it won't achieve more than a tiny saving.
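The text above doesn't spell out exactly what rearrangement maotar performs, but here is a minimal sketch of one plausible version of the idea - grouping archive members by extension so the compressor sees long runs of similar data. This is an illustration only, not maotar's actual algorithm, and reorder_tar is a made-up name:

```python
# Illustration only: one plausible "compression-friendly rearrangement"
# of a tar file - grouping members by extension so similar data sits
# adjacent. Extraction is unaffected, because tar readers do not care
# what order members appear in.
import os
import tarfile

def reorder_tar(src_path, dst_path):
    """Rewrite src_path as dst_path with members grouped by extension."""
    with tarfile.open(src_path) as src:
        # Stable sort: members with the same extension keep their
        # original relative order.
        members = sorted(src.getmembers(),
                         key=lambda m: os.path.splitext(m.name)[1])
        with tarfile.open(dst_path, "w") as dst:
            for m in members:
                data = src.extractfile(m) if m.isreg() else None
                dst.addfile(m, data)
```

Any reversible shuffle of this kind leaves the extracted files identical while changing what the compressor sees - which is the whole trick, and also why the gains are so small: gzip and bzip2 already handle most local redundancy themselves.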


Decapital transforms ASCII text in a manner that improves compressibility, and the transformation can be inverted perfectly at the other end. It works by altering the grammatical rules for capitalisation, reducing the number of capitals present - something compressors tend to like, as it makes the text easier to process using either probability-estimation or dictionary methods. The process is self-inverting for all inputs, which is certainly nice. It shares a common flaw with maotar, though: the savings are simply too small to be of any importance. It works, but not well enough to justify running another piece of software when compressing and decompressing. It is a very simple and very fast transformation though, with negligible memory requirements, so if you're designing a format or application that stores ASCII text in compressed form you can easily throw in this code and make the compressed data about 0.05% smaller.
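The exact rule set isn't reproduced here, but the core trick can be sketched in a few lines. This toy version (not decapital's real rules) toggles the case of every sentence-initial letter: in ordinary prose those are almost always capitals, so the transform strips predictable ones, and because the punctuation that triggers the toggle is never itself modified, applying the transform twice restores the input exactly - a self-inverting transform for all inputs:

```python
# Toy sketch of a self-inverting capitalisation transform, in the
# spirit of decapital (not its actual rule set). A letter at the start
# of the text, or following '.', '!' or '?' plus whitespace, has its
# case toggled. The trigger characters are never changed, so running
# the function twice is the identity.
import re

def decapital(text):
    def toggle(m):
        return m.group(1) + m.group(2).swapcase()
    return re.sub(r'(^|[.!?]\s+)([A-Za-z])', toggle, text)
```

For example, `decapital("Hello there. How are you?")` yields `"hello there. how are you?"`, and feeding the result back through the same function returns the original - no separate decoder needed.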

Testing on a corpus of typical text (a handful of books from Project Gutenberg):

$ cat test.tar|wc -c
27351040
$ cat test.tar |bzip2|wc -c
8089467
$ cat test.tar |./decapital|bzip2|wc -c
8086766
$ cat test.tar |gzip -9|wc -c
10450658
$ cat test.tar |./decapital|gzip -9|wc -c
10418319

Both gzip and bzip2 see some benefit, even though they use very different algorithms. The benefit, however, is small: gzip goes from a ratio of 0.38209 to 0.38091, and bzip2 from 0.29576 to 0.29567. In the latter case, that equates to a reduction in compressed data size to 99.97% of the untransformed compressed size. A pitiful reduction, but a reduction even so. Proof that the idea is not entirely without merit.
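Those figures follow directly from the byte counts in the transcript - plain division, nothing assumed:

```python
# Compression ratios from the decapital test: compressed size / raw size.
raw = 27351040
print(round(10450658 / raw, 5))  # gzip -9, untransformed  -> 0.38209
print(round(10418319 / raw, 5))  # gzip -9, via decapital  -> 0.38091
print(round(8089467 / raw, 5))   # bzip2, untransformed    -> 0.29576
print(round(8086766 / raw, 5))   # bzip2, via decapital    -> 0.29567
# bzip2 output relative to the untransformed bzip2 output, in percent:
print(round(100 * 8086766 / 8089467, 2))  # -> 99.97
```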


vhsdeswim was supposed to be a filter for stabilising VHS rips, saving the expense of a time-base corrector for the analog step. It just didn't work. At all. Complete failure.