deduplify is a Python command-line tool that searches a directory tree for duplicated files and optionally removes them. It generates an MD5 hash for each file recursively under a target directory ...
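The hashing approach described above can be sketched with the standard library alone. This is a minimal illustration, not deduplify's actual code: the function names `md5_of_file` and `find_duplicates`, the chunk size, and the choice to skip symbolic links are all assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each MD5 digest to the files sharing it (groups of two or more)."""
    by_hash = defaultdict(list)
    # os.walk visits every directory under root recursively.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Hypothetical policy: hash regular files only, skip symlinks.
            if os.path.isfile(path) and not os.path.islink(path):
                by_hash[md5_of_file(path)].append(path)
    # Keep only digests that more than one file produced.
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Calling `find_duplicates("some/dir")` returns a dictionary whose values are lists of paths with identical content. This sketch only reports duplicates; any removal step would be a separate decision about which copy to keep.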
There's a command-line interface too! Note: Camelot only works with text-based PDFs and not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF ...