Compact video encoding

It's the 2020s. Use AV1.

So you want to make your videos small but still retain quality? It can be done!

If I had written this ten years ago it would have been a lot longer. It would have been page after page of h264 options. I did in fact write such a guide. Today though, the answer is far simpler: AV1.

The AV1 codec is just the best. It's even better than h265. AV1 is, right now, the best there is. That's the secret to super-compact encodes. AV1.

The catch is that AV1 isn't as widely supported as h264, or even h265. Tough. It's supported on Android and ChromeOS and every PC OS. That's good enough for me. You may have trouble if you want to play on a smart TV though, and while Apple does support AV1 on their newer devices, they were late to the game as they'd committed their support to the rival technology h265.

If you want to make the best encodes then there's no place for second-best h265. Let your viewer transcode to a high-bitrate scratch file if they must.

A few tricks to squeeze the best out.

There are, though, a few extra tricks you can use to squeeze the most out of AV1, or compression in general.

Appropriate resolution.

Sometimes, 4K is marketing wankery. Sometimes even 1080p is. You've got to use what's right for the source: If it's a low-detail animation, why bother going over 720p? A lot of TV shows before HD were shot on SD tape because it was cheap. On a closely related note, beware of letterboxing - it's very common to see this on Blu-ray sources where the movie may well have not been produced in the 16:9 ratio of Blu-ray. If you have a letterboxed source you'll want to crop it to the correct resolution for that aspect ratio.

Wikipedia's list of recommended resolutions.
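
As a rough sketch of the cropping step, the shell function below works out a crop filter for a letterboxed frame from its size and the film's real aspect ratio. The function name and its arguments are made up for illustration; the only real ffmpeg piece is the crop filter (width:height:x:y) that it prints. It assumes the picture is centred with bars at top and bottom only. If you don't know the true ratio, ffmpeg's cropdetect filter can measure the bars for you.

```shell
# crop_filter SRC_W SRC_H AR_NUM AR_DEN   (hypothetical helper)
# Prints an ffmpeg crop filter that trims letterbox bars from a frame of
# SRC_W x SRC_H down to the aspect ratio AR_NUM:AR_DEN.
crop_filter() {
    src_w=$1; src_h=$2; ar_num=$3; ar_den=$4
    # Height the picture really occupies, rounded down to an even number
    # because 4:2:0 video needs even dimensions.
    h=$(( src_w * ar_den / ar_num / 2 * 2 ))
    # If the target ratio is taller than the frame there are no
    # letterbox bars to remove, so keep the full height.
    if [ "$h" -gt "$src_h" ]; then
        h=$src_h
    fi
    # Centre the crop vertically, rounding the offset down to an even row.
    y=$(( (src_h - h) / 4 * 2 ))
    echo "crop=${src_w}:${h}:0:${y}"
}
```

For a 1920x1080 Blu-ray frame holding a 2.39:1 film, crop_filter 1920 1080 239 100 prints crop=1920:802:0:138, which can go at the front of a -vf chain.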

The one and only time you should even /consider/ upscaling is if you are using some advanced ML-based upscaling filter which isn't practical to run on the playback device in real time. Generally you want your viewer to decide on their actual display resolution according to their setup.

10-bit encoding.

For reasons of Deep Math, 10-bit encoding is sometimes more efficient than 8-bit. Counterintuitive, but possibly true. So if you have an 8-bit source, you might find it encodes better by converting it to 10-bit. I am still experimenting with the best way to go about this, as I believe a very range-limited blur could go a long way to removing quantisation banding. Remember that color depth conversion has to go before denoising though!
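
One way to get that ordering right, as a minimal sketch: put ffmpeg's format filter at the very front of the filter chain. The helper name below is hypothetical; format and the yuv420p10le pixel format are real. It assumes every later filter in the chain accepts 10-bit input. ffmpeg converts frames automatically for a filter that does not, which would quietly undo the point, so check the log.

```shell
# with_10bit CHAIN   (hypothetical helper)
# Prints CHAIN with a conversion to 10-bit 4:2:0 prepended, so that every
# later filter (denoising included) sees 10-bit frames.
with_10bit() {
    if [ -n "$1" ]; then
        echo "format=yuv420p10le,$1"
    else
        echo "format=yuv420p10le"
    fi
}

# Example use (not run here):
#   ffmpeg -i in.mkv -vf "$(with_10bit hqdn3d=0:0:2:2)" -c:v libsvtav1 out.webm
```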

Cleaning the source.

Noise is the great enemy of compression. If you are working from a pristine Blu-ray rip then this won't be an issue and you won't have to worry about it. But if you are dealing with some old and degraded film, or a VHS rip of a long-forgotten TV program, you'll want to try to filter out the worst of it. Carefully. It's very easy to go overboard with noise removal and lose detail in the image. Animation is especially easy to work with though, and will let you do wonderful things using just filters like nlmeans. A clean image will compress better. Also be aware that in some cases noise is used intentionally - directors use it as part of the visual language of cinema to evoke the feel of an era by emulating a VHS tape or 8mm recording, or more subtly to show a flashback or change of perspective. So don't ruin some found-footage horror film by undoing that.

With that in mind, what you want is something like the chain 'hqdn3d=0:0:2:2,nlmeans=s=1' - a very mild denoise with both spatial and temporal elements.
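
Since it is so easy to go overboard, it may help to run the chain over a short excerpt and compare it with the source by eye before committing to a full encode. The sketch below is one way to do that. The function name, the 20-second length and the FFMPEG override variable are all assumptions for illustration, and it writes lossless FFV1 into an MKV scratch file (my choice, not the article's) so that you are judging the filter rather than the encoder.

```shell
# denoise_sample SRC START OUT   (hypothetical helper)
# Runs 20 seconds of SRC, starting START seconds in, through the mild
# denoise chain and stores the result losslessly in OUT (use a .mkv name).
# Set FFMPEG=echo to print the arguments instead of running ffmpeg.
denoise_sample() {
    "${FFMPEG:-ffmpeg}" -ss "$2" -t 20 -i "$1" -an \
        -vf 'hqdn3d=0:0:2:2,nlmeans=s=1' \
        -c:v ffv1 "$3"
}

# Example use (not run here):
#   denoise_sample old_tape.mkv 600 sample.mkv
```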

Variable frame rate.

Not all frames need to be equal in length - modern containers allow for identical frames to be 'collapsed.' It's not a whole lot of use on live-action video; the only times that holds perfectly still are during titles and establishing shots. On animation though, it works miracles. Seriously. I got an hour and a half of 1080p animated movie into 400MB and it looked perfect. The MKV and WebM containers support this, but you need to enable it by making the final filter in your video filter chain 'mpdecimate=max=6:hi=384'. This also won't work if your video has any noise - it needs to be pristine.
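
To see whether mpdecimate actually dropped anything, you can count the frames stored in the source and in the output and compare. This is a sketch only: the two function names and the FFPROBE override are hypothetical, while the ffprobe options are the standard ones for counting decoded frames. Counting decodes the whole file, so it is slow.

```shell
# count_frames FILE   (hypothetical helper)
# Prints the number of video frames stored in FILE's first video stream.
# Set FFPROBE=echo to print the arguments instead of running ffprobe.
count_frames() {
    "${FFPROBE:-ffprobe}" -v error -select_streams v:0 -count_frames \
        -show_entries stream=nb_read_frames -of csv=p=0 "$1"
}

# percent_kept SOURCE_FRAMES OUTPUT_FRAMES   (hypothetical helper)
# Integer percentage of frames that survived, rounded down.
percent_kept() {
    echo $(( $2 * 100 / $1 ))
}

# Example use (not run here):
#   percent_kept "$(count_frames source.mkv)" "$(count_frames out.webm)"
```

A 90-minute film at 24 frames per second has 129600 frames, so a noticeably lower count in the output means the filter is doing its job. A figure close to 100 percent suggests the source is too noisy for the frames to match.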

Pay attention to audio too.

You want the Opus codec. If AV1 is the best video codec around, Opus is the best audio - it'll beat MP3 easily, or even AAC. Added bonus: If you're using AV1 and Opus, you can use the WebM container! It's a lot like MKV. Actually a subset of MKV. If you use WebM though, your video can be played in many web browsers, will be natively supported on all Google environments, and can be embedded in websites. Cool. Only catch is that your subtitles are limited to WebVTT format - it's similar to SRT and trivial to convert.

Watch out for a source with inappropriate audio tracks as well. Most often this is 'stereo' audio in which both channels are identical, hiding that it is truly mono. More rarely it will be 5.1 audio in which only the front stereo speakers are used. This occurs because the media has passed through a process at some point which did not support the ideal number of channels (DVD, for example, has only 5.1 and stereo modes). If you properly downmix this to the correct channel count then you will get a more efficient encode, plus the playback device will be in a better position to make use of whatever speaker configuration the end viewer has. Finally, setting -frame_duration 60 will slightly (though only very slightly) improve your compression.
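
A possible way to spot fake stereo, offered as a sketch rather than a tested recipe: subtract one channel from the other with the pan filter and measure what remains with volumedetect. If the channels are identical the difference should be near digital silence. Both function names and the FFMPEG override are hypothetical; pan, volumedetect, -ac and the libopus options are real ffmpeg features. The check assumes the first audio track is the one of interest and that it is stereo.

```shell
# channel_difference SRC   (hypothetical helper)
# Decodes the first audio track of SRC, outputs left minus right as mono,
# and lets volumedetect report mean_volume and max_volume in ffmpeg's log.
# Set FFMPEG=echo to print the arguments instead of running ffmpeg.
channel_difference() {
    "${FFMPEG:-ffmpeg}" -hide_banner -i "$1" -map 0:a:0 \
        -af 'pan=mono|c0=c0-c1,volumedetect' -f null -
}

# downmix_args CHANNELS   (hypothetical helper)
# Prints audio options that downmix to the true channel count and
# encode as Opus with the longer frame duration.
downmix_args() {
    echo "-ac $1 -c:a libopus -frame_duration 60"
}

# Example use (not run here), for a track that turned out to be mono:
#   ffmpeg -i in.mkv $(downmix_args 1) -c:v copy out.mkv
```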

The other AV1

There are two AV1 encoders in ffmpeg worth considering here: libsvtav1 and libaom-av1. The libaom encoder is actually quite a bit better, but also so ridiculously slow as to be unusable for everyday work, so most of the time you'll be using libsvtav1. Save libaom for when you are trying for a prestige, record-setting file or to win an encoding contest.

libsvtav1 parameters.

libsvtav1 has a lot of configurable options. All of the easy time-quality tradeoff ones are accessed through the single ffmpeg option "-preset"; I use "-preset 1", where lower numbers are slower and more thorough. The other options are mostly intended for specialised situations (you may want to look into them if you are doing something with HDR video, 3D or film grain synthesis), but for the most part you can trust ffmpeg to pass any required parameters to libsvtav1.

The horrible black-and-white trick, maybe.

A horrible little trick. Usually video uses three color channels. RGB for display, YUV for encoding, but always three. But you can actually do it using just one. Luma only. Give ffmpeg -pix_fmt y8. This will result in the video turning black-and-white, of course, but if you are encoding a black-and-white movie.... It should improve the encoded quality a little in theory, but more usefully it means less data to churn through on encoding and viewing (two-thirds as much if the source is the usual 4:2:0, one-third if it is 4:4:4), speeding up the process. Unfortunately only the libaom encoder supports this, and my own testing as to how much of a benefit this might achieve has been inconclusive. I also found that y10 - ten-bit greyscale - had compatibility problems with some players.
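
For anyone who wants to try it, a luma-only encode might look like the sketch below. The function name and the FFMPEG override are hypothetical, and the -crf and -cpu-used values are placeholders rather than recommendations. It uses gray, the name ffmpeg lists for 8-bit greyscale, and assumes a libaom build recent enough to accept monochrome input.

```shell
# gray_encode SRC OUT   (hypothetical helper)
# Encodes SRC as luma-only AV1 with libaom and Opus audio.
# Set FFMPEG=echo to print the arguments instead of running ffmpeg.
gray_encode() {
    "${FFMPEG:-ffmpeg}" -i "$1" -pix_fmt gray \
        -c:v libaom-av1 -crf 30 -b:v 0 -cpu-used 4 \
        -c:a libopus "$2"
}

# Example use (not run here):
#   gray_encode old_movie.mkv old_movie.webm
```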

Tiles.

Tiles are good, in moderation. But they do hurt encoding efficiency. Generally you never want to use them below 1080p, and they matter most at resolutions higher than that. Tiles are a feature to allow for more parallelisation in encoding and decoding. Use them accordingly.
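
To make "use them accordingly" concrete, here is a hedged sketch that picks a tile setting from the frame height. The thresholds are my assumption, chosen only to illustrate the rule of thumb above, and the function name is hypothetical. What is real is that SVT-AV1's tile-columns parameter is the log2 of the number of tile columns, so 1 means two columns and 2 means four.

```shell
# tile_columns HEIGHT   (hypothetical helper, illustrative thresholds)
# Prints a value for SVT-AV1's tile-columns parameter.
tile_columns() {
    if [ "$1" -lt 1080 ]; then
        echo 0      # below 1080p: no tiling
    elif [ "$1" -lt 2160 ]; then
        echo 1      # 1080p up to, but not including, 4K: two columns
    else
        echo 2      # 4K and above: four columns
    fi
}

# Example use (not run here):
#   ffmpeg -i in.mkv -c:v libsvtav1 \
#       -svtav1-params "tile-columns=$(tile_columns 2160)" out.webm
```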

Applying the theory

Now you've got the theory, here's how to do it in practice:

ffmpeg -i <sourcefile> -map 0 -c:a libopus -frame_duration 60 -c:v libsvtav1 -preset 1 -svtav1-params film-grain-denoise=0:tile-columns=2 -vf hqdn3d=0:0:2:2,nlmeans=s=1,mpdecimate=max=6:hi=384 <outfile.webm>

There. Now go forth, and stop sending me half-gigabyte video clips from your iPhone.