On CBZ creation.

I'm seeing a lot of people making slightly-misformatted CBZ files, and I want to tell them all off. So let's get a few things straight.

There is no document that defines the 'CBZ standard.' It's not even entirely clear where the format originated - the first reliable evidence suggests it may have been introduced by the CDisplay reader, but this is not certain. The CBZ file is barely even a format, so much as a loose consensus on the proper way to store a comic. Much of this consensus comes from the underground community of comic book archivists and pirates. Though there is no specification, there are certain expectations which must be adhered to if you want to make sure that your comic will display correctly in all readers.

Preparing your images.

If you are fortunate, your images will come to you in perfect quality. This will usually be the case if you are archiving pages from a website. If you are scanning a comic though, you may have to do some cleanup. One rule of scanning is to always scan at a far higher resolution than your intended final image, as you will need this high detail during the clean-up process after scanning. Only when that is done should the image be scaled down.

If your comic is without color, convert the scanned image to greyscale. It will make subsequent processing easier, and the file gets smaller too.

It is very important to carry out a histogram adjustment to ensure that your whites are true white, and your blacks true black. This is also the point where, if your comic is in color, you may carry out adjustment to compensate for yellowing of the paper or fading of the ink.
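As a rough illustration of what that histogram adjustment does to a greyscale page, here is a minimal sketch of a linear levels stretch. The function name and the black and white point values are my own assumptions for the example; in practice you would pick the points from the histogram in your image editor, which will do all of this for you.

```python
def stretch_levels(pixels, black_point, white_point):
    """Linearly remap greyscale values so black_point -> 0 and white_point -> 255."""
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scale = 255 / (white_point - black_point)
    out = []
    for value in pixels:
        stretched = round((value - black_point) * scale)
        # Anything darker than the black point or lighter than the
        # white point is clipped to true black or true white.
        out.append(max(0, min(255, stretched)))
    return out


# Hypothetical scan: ink came out as dark grey (25), paper as light grey (230).
print(stretch_levels([25, 128, 230, 240, 10], black_point=25, white_point=230))
```

Everything at or beyond the chosen points is clipped, which is exactly what you want for yellowed paper and faded ink, but it does throw away detail outside that range.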

Crop your image. Tight! Your audience may well be reading on a small screen - smaller than the intended page size of the comic - so you need to make sure that space is used efficiently. White space also means fewer pixels for storing the actual artwork. So you should make sure your page is perfectly rotated, then crop your image right up to the edges of the comic. The only exception is if the artist is deliberately using that white space, in which case make sure to preserve it.

Dithering, half-tone and the Ben Day process are your enemy here. Most printing processes deposit ink on the paper in tiny dot patterns. In many comics, these dots may not be so tiny. While some graphic novels are valued art pieces, comics as a historical medium were mass-produced documents intended for a young audience with very little pocket money: They were produced by the lowest-bidding printers around, on the cheapest paper and with a rushed production. This meant a dot pitch so large that the individual dots became easily visible. This pattern even became part of the visual language of the medium, which artists would adopt deliberately even when using higher quality printing processes, in order to evoke that connection.

Unfortunately these dithering patterns really mess up digital image processing - they lead to moiré patterns when the image is scaled or zoomed, and they compress poorly, leading to over-sized files. They have to go. Cleaning these up is a challenge in itself, for which you may find detailed instructions elsewhere. The key step is to scan your comics at a very high resolution initially, capturing every individual dot, in order to then perform the filtering and scaling to a reasonable size after scanning.

How to make them.

Filenames

Firstly, there are filenames. Everyone is in agreement that the filenames, when sorted, sort into the page order. What is not so strictly agreed is the lexicographic ordering of names that differ in length. That is, how do you order files when not all the names are the same length? Does 99.jpg come before or after 100.jpg? Compared character by character, 1 is lower than 9, so shouldn't 100.jpg be first? What about 100.jpg vs 100-1.jpg? The truth is that there is no universal agreement on this, so different viewers may display pages in differing orders. There is a very simple way to avoid getting into this sort of mess: ALL FILENAMES SHOULD BE OF EQUAL LENGTH. Pad them with zeros if need be. Avoid deviating from this: No doubling up numbers for two-page spreads, no 'cover.jpg' or similar. Tolerable, though barely, is adding a tag to the end of filenames just to ensure ordering without needing to renumber pages - typically done for covers or for a 'tag' image identifying the person or team responsible for preparing the file.

Bad: 98.jpg, 99.jpg, 100.jpg, 101.jpg.

Good: 098.jpg, 099.jpg, 100.jpg, 101.jpg.

Very bad: 98.jpg, 99-100(Double).jpg, 101.jpg, 102.jpg.

Acceptable: 000a.jpg, 000b.jpg, 001.jpg, 002.jpg... where 000a and 000b are inner and outer covers, or variant covers from different publication runs or editions.
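To see why the padding matters, here is a small sketch showing how a plain character-by-character sort misorders the 'Bad' names, along with a helper that works out zero-padded replacements. The helper name padded_names is hypothetical, and it assumes the part of each filename before the extension is just the page number.

```python
import os


def padded_names(filenames):
    """Map each page filename to a zero-padded name of equal length.

    Assumes each name is a bare page number plus an extension.
    """
    width = max(len(os.path.splitext(name)[0]) for name in filenames)
    mapping = {}
    for name in filenames:
        stem, ext = os.path.splitext(name)
        mapping[name] = stem.zfill(width) + ext
    return mapping


pages = ["98.jpg", "99.jpg", "100.jpg", "101.jpg"]

# A plain string sort, as some viewers use, puts page 100 before page 98.
print(sorted(pages))

# After padding, the same plain sort gives the intended page order.
print(sorted(padded_names(pages).values()))
```

The mapping only tells you what the new names should be; the actual renaming (with os.rename or your file manager) is left to you, so that nothing gets overwritten by accident.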

A CBZ containing nothing but sequentially-numbered images is perfectly valid as a CBZ file. For ease of sorting though, the filenames can also be used as a way to store metadata, either by placing it within each filename or by placing the files within a named directory. As a handy hint, you might want to put the number of pages in as well - it can be helpful when comparing different releases.

Putting an inner directory is also acceptable, so long as there is only one. Due to the way ZIP stores directory structure, it won't make any difference.

Good: 001.jpg 002.jpg 003.jpg...

Better: Artist-Title-Issue N-Pg001.jpg Artist-Title-Issue N-Pg002.jpg Artist-Title-Issue N-Pg003.jpg

Alternative: Artist-Title-Issue N/Page001of192.jpg, Artist-Title-Issue N/Page002of192.jpg, Artist-Title-Issue N/Page003of192.jpg

Some people use - as a separator, some a ., some a sequence like .-. or _-_. It does not matter, so long as you are consistent with it.

Formats

Secondly, contained formats. There are three formats in common use for storing the pages within a CBZ: JPG, PNG and GIF. Anything other than these three *will* have issues on some viewing software. That means no BMP files, please, and no weird things like PCX or TIFF. GIF should also be discouraged because it's really just obsolete, so GIF files should be converted to PNG - they'll take less space.

We are now beginning to see WebP used in CBZ files as well for the superior compression it offers - but right now, not all readers support it. I strongly urge anyone involved in CBZ reader development to correct this, because it is not unusual to find such a CBZ. One day, AVIF may well follow.

But there is more! It's not enough to just make a CBZ file: You should make it as well as you can. That means compacting too. There are utilities which can optimise images - jpegoptim, pngcrush, optipng and the like. For no loss in quality, these will make your images smaller. Use them. Then advzip will make your CBZ file even smaller still.
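For reference, the basic packing step that those tools then improve upon can be sketched with nothing more than Python's standard zipfile module. The function name build_cbz and its arguments are my own invention for the example, and it assumes the directory holds only the files you actually want in the comic.

```python
import os
import zipfile


def build_cbz(page_dir, cbz_path):
    """Pack every file in page_dir into cbz_path, in sorted filename order."""
    names = sorted(
        name for name in os.listdir(page_dir)
        if os.path.isfile(os.path.join(page_dir, name))
    )
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_DEFLATED) as cbz:
        for name in names:
            # arcname keeps the local directory path out of the archive.
            cbz.write(os.path.join(page_dir, name), arcname=name)
    return names
```

Write the CBZ somewhere other than page_dir, or the archive may end up trying to include itself. Run the image optimisers on the pages before packing, and advzip on the result afterwards.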

Metadata

Then there is the matter of metadata. How I wish this were simple! Unfortunately the CBZ file, being a consensus rather than a specification, doesn't have any agreement on how to store metadata. There are two ways to do this: The ComicRack way, and the Calibre way. I personally dislike the Calibre way, as it stores the data in the zip comment field - and as practically no-one is aware that the zip comment field exists, any program which edits the CBZ without knowing about it (including my own) is likely to make this metadata disappear. Thus, if you do wish to store metadata to identify your comic, I advise using the ComicInfo.xml file method. It'll handle the important fields: Title, series, issue, volume, author.
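A minimal ComicInfo.xml can be generated with the standard library, roughly as sketched below. The element names used here (Title, Series, Number for the issue, Volume, Writer for the author) are as I understand the ComicRack schema, so treat them as an assumption and check them against the reader you care about. The function name comic_info_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def comic_info_xml(title, series=None, number=None, volume=None, writer=None):
    """Return the bytes of a minimal ComicInfo.xml document."""
    root = ET.Element("ComicInfo")
    fields = [
        ("Title", title),
        ("Series", series),
        ("Number", number),   # issue number (assumed element name)
        ("Volume", volume),
        ("Writer", writer),   # author (assumed element name)
    ]
    for tag, value in fields:
        if value is not None:  # leave out fields that do not apply
            ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Placeholder values, purely for illustration.
print(comic_info_xml("Example Title", series="Example Series", number=3).decode("utf-8"))
```

The resulting bytes would then be stored in the archive as a file named ComicInfo.xml, for example with zipfile's writestr.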

These are important. Without metadata, collectors are going to struggle to find or organise your file. It really is essential that this information is included. At the very minimum you need title and, if applicable, series, volume and issue number.

Additional files: Don't.

Some CBZ files contain additional files - as CBZ is a convention rather than a formal specification, people have added their own things. These include .txt or .nfo files to identify the release group, or error-detection in the form of .md5, .sfv or .csv files. Some even contain .lnk or .url shortcuts to a website, or .par files for error-correction. All of these I advise avoiding.

While you are doing this, make sure you are not inadvertently including any 'junk' files your OS leaves lying around. No Thumbs.db or .DS_Store. I've even encountered CBZ files with SUPERJPG.TNC or WS_FTP.LOG inside. That's just sloppy!
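A quick check for stray files can be automated. The sketch below lists anything in a CBZ that is neither a JPG, PNG or GIF page nor ComicInfo.xml, and also warns when the page filenames differ in length. The function name lint_cbz and the exact allow-lists are assumptions for the example. Note that the length check is strict: it will also complain about the barely-tolerable tagged names such as 000a.jpg, and WebP pages are reported as unexpected.

```python
import posixpath
import zipfile

PAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
ALLOWED_EXTRAS = {"comicinfo.xml"}


def lint_cbz(cbz_path):
    """Return a list of human-readable problems found in a CBZ file."""
    with zipfile.ZipFile(cbz_path) as cbz:
        # Skip directory entries; ZIP member names always use "/".
        names = [n for n in cbz.namelist() if not n.endswith("/")]

    problems = []
    pages = []
    for name in names:
        base = posixpath.basename(name)
        ext = posixpath.splitext(base)[1].lower()
        if ext in PAGE_EXTENSIONS:
            pages.append(base)
        elif base.lower() not in ALLOWED_EXTRAS:
            problems.append(f"unexpected file: {name}")

    if len({len(page) for page in pages}) > 1:
        problems.append("page filenames are not all the same length")
    return problems
```

An empty list means the archive passed these particular checks, nothing more. It says nothing about image quality or whether the pages are actually in the right order.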

And now you have a neat comic.

Follow these rules, and you will produce CBZ files in which the pages always display in correct order, no matter what program is being used to view them.

Finally, a note on what not to make: No CBR, please. Popular as they are, they use the RAR container - it's not a properly open standard, and it's rather difficult for programmers to work with. CBZ is much better. The compression advantage of RAR over ZIP is minimal, if any, when the contents are already-compressed image formats such as those found in comics, and with the use of advzip there really is no advantage to speak of. Stick to CBZ.