codebird@excession:/media/spinny/data/weeklybackup/code/hashtar$ md5sum ~/.steam/bin/*.so
codebird@excession:/media/spinny/data/weeklybackup/code/hashtar$ tar -cf test.tar ~/.steam/bin/*.so
tar: Removing leading `/' from member names
codebird@excession:/media/spinny/data/weeklybackup/code/hashtar$ ./hashtar md5sum test.tar
Hashtar and zip_sha256 are utility programs for working with tar and zip files, particularly for identifying ZIP files that are content-identical but binary-distinct.
Hashtar is a very small program that performs only one simple task. Execute it, passing the name of an executable and a tar file (or a tar stream on stdin). For each file within the tar, it will execute the specified program and feed it that file on stdin, passing the program's output to stdout. Most often the specified program will be a hashing utility: md5sum, sha256sum, or something of a similar nature. It's handy for verifying tape backups, as it can easily be used to produce a list of hashes for every file stored on a tape.
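The "execute the program and feed it the file on stdin" step can be sketched roughly as below. This is only an illustration of the usual pipe/fork/exec pattern on a POSIX system, not hashtar's actual code; `feed_to_program` is a hypothetical helper name, and a real tool would also want to deal with SIGPIPE if the child exits before reading all of its input.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: run `prog` with `data` supplied on its stdin.
 * The child inherits our stdout, so whatever it prints goes straight
 * to our own output. Returns the child's exit status, or -1 on error. */
static int feed_to_program(const char *prog, const unsigned char *data,
                           size_t len)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    fflush(stdout); /* keep buffered output from being duplicated by fork */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the read end of the pipe becomes stdin, then exec. */
        close(fds[1]);
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        execlp(prog, prog, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: write the file contents, then close to signal end of input. */
    close(fds[0]);
    while (len > 0) {
        ssize_t n = write(fds[1], data, len);
        if (n <= 0)
            break;
        data += n;
        len -= (size_t)n;
    }
    close(fds[1]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `feed_to_program("md5sum", data, len)` once per archived file would print one hash line per file, which is the kind of listing described above.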
It won't take tar.gz/bz2/xz directly, but you can easily pipe them through their respective decompressors first. Symlinks in tar files and other such non-file contents are ignored. The 'longlink' extension is correctly handled.
The program is just one small source file - the tar format is quite simple, so only the bare minimum of code is required to parse it and pass the data to an external program.
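To give a feel for how little parsing is involved, here is a minimal sketch of walking a tar archive held in memory. It is not taken from hashtar's source; `tar_next` and `struct tar_entry` are hypothetical names. It assumes the classic 512-byte header layout (name at offset 0, octal size at offset 124, type flag at offset 156), skips anything that is not a regular file, and treats a type 'L' entry as a GNU long name for the entry that follows. The ustar prefix field and base-256 sizes are deliberately ignored.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK 512

struct tar_entry {
    char name[4096];           /* NUL-terminated member name */
    const unsigned char *data; /* points into the caller's buffer */
    size_t size;               /* length of the file data in bytes */
};

/* Parse an octal ASCII field such as the 12-byte size field. */
static size_t tar_octal(const unsigned char *field, size_t width)
{
    char tmp[32];
    if (width >= sizeof tmp)
        width = sizeof tmp - 1;
    memcpy(tmp, field, width);
    tmp[width] = '\0';
    return (size_t)strtoul(tmp, NULL, 8);
}

static int tar_block_is_zero(const unsigned char *blk)
{
    for (size_t i = 0; i < TAR_BLOCK; i++)
        if (blk[i] != 0)
            return 0;
    return 1;
}

/* Advance *pos to the next regular file in buf.
 * Returns 1 and fills *out if one was found, 0 at end of archive,
 * -1 if an entry claims more data than the buffer holds. */
static int tar_next(const unsigned char *buf, size_t len, size_t *pos,
                    struct tar_entry *out)
{
    int have_long = 0;

    while (*pos + TAR_BLOCK <= len) {
        const unsigned char *hdr = buf + *pos;
        if (tar_block_is_zero(hdr))
            return 0; /* end-of-archive marker */

        size_t size = tar_octal(hdr + 124, 12);
        size_t avail = len - *pos - TAR_BLOCK;
        if (size > avail)
            return -1;
        /* File data is padded up to a whole number of blocks. */
        size_t padded = (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
        const unsigned char *data = hdr + TAR_BLOCK;
        char type = (char)hdr[156];
        *pos += TAR_BLOCK + padded;

        if (type == 'L') {
            /* GNU long name: this entry's data names the next entry. */
            size_t n = size < sizeof out->name - 1 ? size
                                                   : sizeof out->name - 1;
            memcpy(out->name, data, n);
            out->name[n] = '\0';
            have_long = 1;
            continue;
        }
        if (type == '0' || type == '\0') {
            if (!have_long) {
                memcpy(out->name, hdr, 100);
                out->name[100] = '\0';
            }
            out->data = data;
            out->size = size;
            return 1;
        }
        have_long = 0; /* symlinks, directories and so on are skipped */
    }
    return 0;
}
```

A caller would start with `size_t pos = 0;` and call `tar_next` in a loop until it stops returning 1, handing each entry's `data` and `size` to the external program.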
zip_sha256 is a similar program, but limited to the SHA256 hash. It has three modes of operation, all of which process the files sorted by filename. The first outputs a list of SHA256 hashes, one for each file contained within the ZIP. The second outputs the SHA256 of all files concatenated. The third outputs the SHA256 of all files plus their filenames concatenated.
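The second and third modes can be illustrated with a small sketch. To stay self-contained it swaps in 64-bit FNV-1a as a stand-in for SHA256 and works on in-memory entries rather than reading a ZIP through libzip, so it shows only the sort-then-hash idea, not zip_sha256's real output. The names `struct entry` and `fingerprint` are hypothetical, and hashing each filename just before its contents is an assumption about the third mode.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One archived file: hypothetical in-memory stand-in for a ZIP member. */
struct entry {
    const char *name;
    const char *data;
    size_t size;
};

/* 64-bit FNV-1a, used here ONLY as a stand-in for SHA256. */
static uint64_t fnv1a(uint64_t h, const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++) {
        h ^= b[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

static int by_name(const void *a, const void *b)
{
    const struct entry *ea = a;
    const struct entry *eb = b;
    return strcmp(ea->name, eb->name);
}

/* Sort the entries by filename (in place), then hash them in that order.
 * with_names == 0: contents only (like the second mode).
 * with_names != 0: filename followed by contents (like the third mode;
 * the exact ordering is an assumption). */
static uint64_t fingerprint(struct entry *e, size_t count, int with_names)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */

    qsort(e, count, sizeof *e, by_name);
    for (size_t i = 0; i < count; i++) {
        if (with_names)
            h = fnv1a(h, e[i].name, strlen(e[i].name));
        h = fnv1a(h, e[i].data, e[i].size);
    }
    return h;
}
```

Because the entries are sorted before hashing, the order in which they were stored does not affect the result. Renaming a file changes the with-names fingerprint but leaves the contents-only one unchanged, as long as the sort order of the files stays the same.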
I created this small utility as an aid in identifying duplicated files: it generates fingerprints. The hashes output by zip_sha256 are unique not to the ZIP file, but to the contents of the ZIP file - regardless of the compression settings, the utility used to compress the file, or the order in which the files are stored within. A ZIP which has been optimised with advzip, for example, will generate the same fingerprint as the original file, as will the same contents after having been extracted and re-compressed into a new ZIP (as zip_sha256 ignores all modification times).
Published as C source. The libzip-dev package is required.