confero

Confero identifies files within a large set that have a lot of data in common. It matches any sufficiently long stretch of bytes, regardless of its position within the file. It usually won't help when searching for similar images (there are more specialised utilities for that), but it is very effective for many types of document. It will identify archives that contain the same files, or different revisions of the same document. For example, I used it to examine a large collection of downloaded web comic archives stored as zip files, and it identified instances where the same comic series had been included twice under two different titles.

It does this using a combination of variable-length chunking followed by Jaccard similarity comparison of the resulting chunk hashes.
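The general idea can be illustrated with a minimal Python sketch. This is not Confero's actual code: the window size, boundary mask, minimum chunk length, the hash functions, and the names `chunk_hashes` and `jaccard` are all assumptions chosen for readability. Chunk boundaries are picked from the content itself (a hash of the last few bytes), so identical stretches of data tend to produce identical chunks even when they sit at different offsets in two files.

```python
import hashlib

# All constants below are illustrative assumptions, not Confero's real settings.
WINDOW = 16            # number of trailing bytes that decide a boundary
MASK = (1 << 6) - 1    # a boundary candidate about once every 64 bytes
MIN_CHUNK = 16         # never cut a chunk shorter than this (must be >= WINDOW)


def chunk_hashes(data: bytes) -> set:
    """Split data at content-defined boundaries; return the set of chunk hashes."""
    hashes = set()
    start = 0
    for i in range(len(data)):
        end = i + 1
        if end - start < MIN_CHUNK:
            continue
        # Fingerprint of the last WINDOW bytes. A real tool would use a
        # rolling hash here; rehashing the window is slow but simple.
        window = data[end - WINDOW:end]
        digest = hashlib.blake2b(window, digest_size=4).digest()
        if int.from_bytes(digest, "big") & MASK == 0:
            hashes.add(hashlib.sha1(data[start:end]).digest())
            start = end
    if start < len(data):
        # Whatever is left after the final boundary forms the last chunk.
        hashes.add(hashlib.sha1(data[start:]).digest())
    return hashes


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(first: bytes, second: bytes) -> float:
    """Similarity of two byte strings, from 0.0 (nothing shared) to 1.0."""
    return jaccard(chunk_hashes(first), chunk_hashes(second))
```

To compare two files with this sketch, read each one in binary mode and pass the contents to `similarity`. Comparing every pair in a large collection this way would be slow; the sketch only shows how the score for a single pair could be computed.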

Source is provided for Linux.

A compiled Windows executable was previously included, but it has been removed because Windows Defender flagged it as a false positive.