Many years ago, I created this guide to 'extreme' compression using x264 - attaining compact, high-quality files through the use of the most heavily optimised configuration, without regard to processing time or to the amount of human attention required.
In the years since, technology and expectations have advanced and rendered much of this guide obsolete. HTML5 video, then a new technology, has become widely established. Where once almost all video was downloaded, it is now as likely to be streamed. Most importantly, device profiles matter a lot more, as viewing video on comparatively low-powered devices such as tablets and phones has become just as important as viewing on PCs and laptops.
As I write this new introduction, h265, a substantially more advanced codec, is poised to take over from h264. I have decided against updating this guide to modern technologies and conventions, as that would amount to a near-total rewrite. Certain portions of it still hold value, however, so I am not going to take it down entirely either. Instead I am archiving it on IPFS, to serve both as a reference and as a historical view into the video technology of the 2000s.
Please bear this in mind when using anything you read below: much of it was written with old technology in mind and no longer conforms to current best practices. These instructions are aimed at creating the most compact possible file for download, and they push playback devices to limits that many portable devices cannot handle. Nothing below should be followed blindly; treat it only as a suggestion.
- Codebird, 2017, with regard to Codebird, 2007.
Those readers who have been encoding video for the longest will remember the advantages that could be gained from multi-pass compression - a two-pass encode would easily beat a one-pass, and a three-pass could do even better. This was very true back in the days of DivX or Xvid, but things are a little different now.
x264 does still support the option of specifying bitrate, and of using either single or multiple passes, but these are now largely superseded. The only situation in which a bitrate should be explicitly specified for a release file is when encoding a file for a purpose in which a strict bitrate must be adhered to, such as a streaming video or optical disc. Bitrates can also be explicitly given when testing the effects of different options on PSNR/SSIM - my own Ponymath script, used to generate some of the graphs here, does this.
The preferred means of controlling bitrate now is to specify the crf, or constant rate factor. This nifty setting achieves quality just as good as a multipass encode, but in a single pass. The downside is that you can't be sure exactly how big the file will be until it's done. A higher crf gives lower quality and a smaller final size; crf=23 is a good place to start. This is the number, more than any other, that controls the size-vs-quality tradeoff inherent in lossy compression. CRF can also allow the encoder to operate more efficiently, as it is not constrained by the need to conform to any specified bitrate and may adapt freely to the complexity of the video.
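As a rough illustration of the difference, the sketch below builds x264 command lines as Python argument lists: one for a single-pass CRF encode and two for a conventional two-pass encode at a fixed average bitrate. It only assembles the arguments and does not run anything. The file names and the 1500 kbit/s bitrate are placeholders. The flags (`--crf`, `--bitrate`, `--pass`, `--output`) are those of the standalone x264 command-line tool, not the `crf=23`-style option strings used elsewhere in this guide.

```python
from typing import List


def crf_command(src: str, dst: str, crf: float = 23) -> List[str]:
    """Single pass: quality is fixed by the CRF, so the final size is only known afterwards."""
    return ["x264", "--crf", str(crf), "--output", dst, src]


def two_pass_commands(src: str, dst: str, kbps: int,
                      null_out: str = "/dev/null") -> List[List[str]]:
    """Two passes at a fixed average bitrate; the first pass only gathers statistics."""
    common = ["x264", "--bitrate", str(kbps)]
    first = common + ["--pass", "1", "--output", null_out, src]   # output discarded
    second = common + ["--pass", "2", "--output", dst, src]       # the real encode
    return [first, second]


if __name__ == "__main__":
    print(" ".join(crf_command("input.y4m", "output.264")))
    for cmd in two_pass_commands("input.y4m", "output.264", 1500):
        print(" ".join(cmd))
```

The CRF version needs no bitrate and no statistics file, which is exactly why it is the simpler choice whenever the final size does not have to be hit precisely.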
There are some situations in which you may need to use a specified bitrate and parameters to control it, though. The most common is streaming, where a maximum bitrate must be used to avoid the buffer emptying during high-motion scenes. Bitrate control is also required for optical media (e.g. Blu-ray) encoding, because the rate of rotation of the disc imposes a limit on transfer speed. Optical media may also have a minimum bitrate, to minimise the need for speed transitions during playback.
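To see why a maximum bitrate matters for streaming, the simplified model below treats the player's buffer as a leaky bucket. The buffer fills at a fixed channel rate, and each frame is drained from it when it is decoded. A burst of large frames, such as a high-motion scene, can drain the buffer faster than it refills. That is an underflow, which the viewer sees as a stall. This is a toy model built on its own assumptions (constant channel rate, a buffer that starts full, frame sizes in kilobits). It is not x264's actual VBV implementation, which is configured through its `--vbv-maxrate` and `--vbv-bufsize` options.

```python
from typing import List, Optional


def first_underflow(frame_kbits: List[float], max_rate_kbps: float,
                    buffer_kbits: float, fps: float) -> Optional[int]:
    """Return the index of the first frame that empties the buffer, or None (toy model)."""
    fill = buffer_kbits                  # assumption: the buffer starts full
    per_frame = max_rate_kbps / fps      # kilobits delivered per frame interval
    for i, size in enumerate(frame_kbits):
        fill -= size                     # decoder removes the whole frame at once
        if fill < 0:
            return i                     # the frame had not fully arrived in time: stall
        fill = min(buffer_kbits, fill + per_frame)  # refill, capped at buffer capacity
    return None


# 2500 kbit/s at 25 fps delivers 100 kbit per frame into a 300 kbit buffer.
calm = [80] * 10                 # every frame smaller than the refill rate
burst = [80, 250, 250, 250]      # a high-motion scene
print(first_underflow(calm, 2500, 300, 25), first_underflow(burst, 2500, 300, 25))
```

Capping the bitrate amounts to forcing the encoder to keep frames small enough that this function never returns an index, at the cost of quality in exactly those high-motion scenes.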
There is a third mode, constant quantizer, but it is of little practical use. It's a predecessor mode left over from the development of what would eventually become CRF. It does have one niche use: setting cq=0 puts the quality setting so high that the encode becomes, in abstract mathematical terms, lossless. In practical use it'll be very slightly lossy due to rounding errors in the DCT and color space conversion, but its quality is still high enough to make it an option for archival use.
There are some options which can almost always be changed to improve quality and/or bitrate, at the expense of encoding time. There are some files for which these strangely fail, but such situations are rare. These options are typically not enabled by default, either because not all decoders support them or because they give only a slight benefit but are hugely processor-intensive. Although libavcodec, the most popular software decoder, will have no difficulty, some older decoders or hardware implementations could fail to decode video that uses overly extreme options. I usually consider this to be an acceptable sacrifice - bandwidth is too important to waste, and those downloading files always have the option of transcoding to a compatible but very-high-bitrate file for viewing purposes.
Enables improved trellis encoding. I have yet to find a video where this didn't result in an improvement. It should always be used.
Use this. I don't actually understand how it works myself, but in every test I've run it proved beneficial.
Allows the use of additional partitions.
Increases the quantizer slew rate, i.e. the maximum amount the quantizer may change from one frame to the next. This is actually nowhere near as good as you might guess, even on animation. I've run many tests and noticed two things. Firstly, as you raise this there comes a point (ten, give or take three, usually) beyond which any further increase does absolutely nothing. Secondly, below this point performance is completely unpredictable. I did reach one practical conclusion: there tends to be a peak at qpstep=7. I don't know why, but setting qpstep=7 seems to squeeze an extra 0.0005 or so out of SSIM. Every little helps.
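The "slew rate" analogy can be made concrete with a toy model. Given the quantizer each frame would ideally like to use, qpstep caps how far the quantizer may move between consecutive frames. The function below is a hypothetical illustration of that clamping, assuming integer QPs. x264's real rate control is considerably more involved.

```python
from typing import List


def apply_qpstep(desired_qps: List[int], qpstep: int) -> List[int]:
    """Limit the frame-to-frame change in QP to at most qpstep (toy model)."""
    if not desired_qps:
        return []
    result = [desired_qps[0]]            # the first frame gets its desired QP
    for target in desired_qps[1:]:
        prev = result[-1]
        delta = max(-qpstep, min(qpstep, target - prev))  # clamp the change
        result.append(prev + delta)
    return result


wanted = [20, 30, 22]                    # a sudden jump, then a partial recovery
for step in (4, 7, 10, 13):
    print(step, apply_qpstep(wanted, step))
```

The model also hints at the plateau described above. Once qpstep exceeds the largest jump the rate control ever asks for, raising it further changes nothing.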
Or just four, or as many as eight. More is better, but beyond six the gains become marginal. This does have memory and performance penalties on the decoder, but if you're aiming for extreme encoding you are going to have to sacrifice that - it cannot be helped. This is another option that is most effective on animation, and one of the few times you might want to take it right up to eight or nine. Never go up to ten: at HD resolutions that exceeds the reference frame limit of H.264 level 4.1, a level many hardware players target. If you're encoding 1080p, be especially cautious about high values, as they can cause compatibility issues. The decoder needs at least frameref times the size of an uncompressed frame to decode. On large frames with a high frameref this can be more than the memory allocated to the decoder on embedded devices, though it'll pose no difficulty for a full computer.
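The memory requirement can be estimated with simple arithmetic. The sketch below assumes 8-bit 4:2:0 video (1.5 bytes per pixel) and ignores other decoder overheads. Its second function applies the H.264 rule that caps the reference buffer at MaxDpbMbs divided by the frame size in macroblocks, with a ceiling of 16 frames. It uses 32768 as the MaxDpbMbs value for level 4.1. Other levels have different limits.

```python
import math

LEVEL_4_1_MAX_DPB_MBS = 32768  # H.264 Table A-1 value for levels 4 and 4.1


def frame_bytes(width: int, height: int) -> int:
    """One uncompressed 8-bit 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def max_refs(width: int, height: int, max_dpb_mbs: int) -> int:
    """Largest reference frame count a level allows at this resolution (capped at 16)."""
    mbs = math.ceil(width / 16) * math.ceil(height / 16)   # 16x16 macroblocks per frame
    return min(max_dpb_mbs // mbs, 16)


for w, h in [(1280, 720), (1920, 1080)]:
    refs = max_refs(w, h, LEVEL_4_1_MAX_DPB_MBS)
    mib = refs * frame_bytes(w, h) / 2**20
    print(f"{w}x{h}: up to {refs} refs at level 4.1, about {mib:.1f} MiB of reference frames")
```

This shows why 1080p is the case to be careful with. Level 4.1 allows nine reference frames at 720p but only four at 1080p, because each 1080p frame occupies more than twice as many macroblocks.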
Raising frameref works well on live-action and CGI, and works miracles on animation.
Sets the quality of subpixel motion estimation and, at higher values, of mode decision. It's a setting that almost every guide says to alter, but few agree on the value. I've performed some tests myself using the objective SSIM measure and determined that the optimal value is subq=10. Note that subq and subme are actually the same option under two aliases. You could increase this to the maximum of 11, but in my tests the improvement over subq=10 was so small it could barely be measured, even on SSIM.
Tests on a 22-minute live-action movie with exceptionally strong film grain. Tesa easily outperformed umh, largely due to its ability to increase its effective estimation quality with higher merange settings. The apparent anomaly at merange=128 is actually just a confusingly placed legend.
Specifies the motion estimation method. There is some debate among encoding enthusiasts about which method is best: umh or tesa. Tesa gives better results, of this there is no doubt, but it is also cripplingly slow. So many advise the use of umh, which is almost as effective and many times faster. Personally, though, I say tesa every time - if you wanted to encode fast, you wouldn't be reading this guide! More importantly, in my own tests (at least on live action; I've not tested animation) tesa was able to benefit significantly from increasing merange, while umh, strangely, actually performed consistently worse with higher merange.
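Exhaustive methods such as tesa are slow, and merange matters so much to them, partly because the number of candidate positions grows with the square of the search range. The sketch below counts the integer-pixel positions in a full square window of radius merange. That is what a completely unpruned exhaustive search would evaluate for each block. Real encoders, x264 included, use elimination tricks to skip many candidates, so treat these figures as an upper bound rather than a measured cost.

```python
def exhaustive_candidates(merange: int) -> int:
    """Integer positions in a square search window of +/- merange pixels."""
    side = 2 * merange + 1
    return side * side


for r in (16, 32, 64, 128):
    print(f"merange={r:>3}: {exhaustive_candidates(r):>6} candidate positions per block")
```

Doubling merange roughly quadruples the worst-case work. That is why the higher-merange tesa runs in the tests above take so long.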
I don't know exactly what this does myself, but I've run some comparisons and it is of clear benefit. The default is mode one.
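Pulling together the options from this section, a command line for the standalone x264 tool might look like the one assembled below. This is a sketch, not a recommendation for any particular source, and it only builds the argument list. It makes several assumptions. The improved trellis option is taken to be `--trellis 2`, the additional partitions `--partitions all`, and frameref `--ref 6`. The merange of 32 and the file names are placeholders. The option in the final paragraph above is left out, because its name is not given here.

```python
from typing import List


def extreme_command(src: str, dst: str, crf: float = 23, refs: int = 6,
                    merange: int = 32) -> List[str]:
    """Assemble an x264 argument list using the settings discussed in this section."""
    return [
        "x264",
        "--crf", str(crf),           # the main size-vs-quality control
        "--trellis", "2",            # assumed flag for the improved trellis option
        "--partitions", "all",       # assumed flag for the additional partitions
        "--qpstep", "7",             # the small SSIM peak noted above
        "--ref", str(refs),          # frameref: four to eight, six by default here
        "--subme", "10",             # subq/subme: 11 gains almost nothing over 10
        "--me", "tesa",              # slowest, most thorough motion search
        "--merange", str(merange),   # placeholder; tesa benefits from larger values
        "--output", dst,
        src,
    ]


print(" ".join(extreme_command("input.y4m", "output.264")))
```

The function has keyword arguments for crf, refs and merange because these are the settings the text above suggests varying per source. The rest are fixed at the values this guide recommends.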