I've already explained the general idea behind registering a large number of photos as a Group of Published Photographs in a single application with the US Library of Congress. Now let's get a little more technical.
As always, I don't assume any responsibility for the procedures explained here; you do everything at your own risk. Make sure to always have at least one backup copy of your photos.
What I did was to collect all of my published photos in a local folder (A): images from Flickr, Facebook, Twitter, NomadTravellers, my teenage blog, forums, Picasa, paperback magazines, and whatever else I could remember ever having published.
Then in another folder (B) I put a resized copy of every photo I had taken after a year of my choice. I didn't waste time making a selection, since the registration fee for a group of photographs doesn't depend on the number of photos, so I just included all of them.
Then, using several duplicate-finder programs, I compared the two folders and deleted all of the matching photos, so that folder B would contain only photos, from any year, that had never been published before.
This is easier said than done, because there are different algorithms that match images by similarity, and several basic editing actions, including resizing, cropping, rotating, or adding a watermark, can severely affect the matching. Cropping in particular makes images almost impossible to match with their originals, so you should search for all of your cropped photos manually.
There were also so many false positives, especially for dark photos like night skies, that you can't safely delete all the matches in one go. That meant hours and hours of tedious work, checking the results one by one to make sure they were real matches. I repeated this process with several programs, until I couldn't find any more matches and was reasonably sure there were no previously published photos left in folder B.
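Before reaching for the visual matchers, a quick first pass for exact byte-for-byte duplicates can be done from the command line. This is only a sketch, assuming bash with folders literally named A and B, and it will not catch resized, cropped or re-encoded copies, only identical files:

```shell
#!/usr/bin/env bash
# Compute checksums of the published photos (folder A), then flag
# any file in folder B with an identical checksum. Byte-identical
# copies only; edited photos still need a visual duplicate finder.
find A -type f -exec md5sum {} + | awk '{print $1}' | sort -u > published.md5

find B -type f | sort | while read -r f; do
    sum=$(md5sum "$f" | awk '{print $1}')
    if grep -qx "$sum" published.md5; then
        echo "already published: $f"
    fi
done
```

Anything this pass flags can be deleted safely, which shrinks the pile the slower similarity-based tools have to chew through.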
Here is a partial list of the free software that eased the job:
AllDup: finds duplicates by comparing filenames, dates and also content. I used it first because it's fast, even if its potential is mostly limited to filename matching. It turned out to be extremely useful wherever I hadn't changed the original filename coming out of the camera.
Awesome Duplicate Photo Finder: can find duplicates by visual content, even when they are only similar and not exactly the same. The user interface is quite simple and not really friendly, but the engine is quite good. It can also be downloaded as a portable version.
dupeGuru Picture Edition: finds duplicates by content. The user interface is good, as are the selection options and the preview. Unfortunately it's extremely slow at populating the results and scrolling through them.
ExifTool: I used this powerful tool to solve another problem I faced while preparing my application: file formats. To compare the photos, it's better to have all the images in JPG. Converting tens of thousands of photos from RAW to JPG was not an option, but I found a shortcut.
With ExifTool, a really powerful and free application, I could quickly extract the JPEG preview from each RAW file and even copy over the metadata from the original.
First I selected the RAW files that didn't already have an extracted JPEG. To do that I used AllDup, comparing filenames but ignoring extensions (i.e. finding all duplicates by filename only: 0001.raw and 0001.jpg were considered duplicates).
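The same check can be scripted if you prefer the command line. A small sketch, assuming bash and a .cr2 RAW extension (adjust the pattern to your camera's format):

```shell
#!/usr/bin/env bash
# List RAW files (assumed extension .cr2 here) that have no
# sibling .jpg with the same base name in the same folder.
find . -type f -iname '*.cr2' | sort | while read -r raw; do
    jpg="${raw%.*}.jpg"
    [ -e "$jpg" ] || echo "no JPG yet: $raw"
done
```

The files this prints are the ones still needing a preview extraction.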
Then with ExifTool I extracted the preview from the remaining RAW files and automatically embedded the metadata. This is the command I used:
exiftool -if "$jpgfromraw" -b -jpgfromraw -w %d%f.jpg -execute -if "$previewimage" -b -previewimage -w %d%f.jpg -execute -tagsfromfile @ -srcfile %d%f.jpg -overwrite_original -common_args -r --ext jpg C:\Users\....
The folder at the end of the command should be the one containing the RAW files you want the JPGs extracted from. The -r option also includes subfolders.
Another problem I faced was ensuring a consistent naming structure with no duplicate names. On my previous camera the filename counter ran from 0000 to 9999, so every 10,000 photos the filenames repeated, and if I put images from different periods in the same folder, one image would overwrite a photo with the same name but different content! To solve this I renamed the photos, prefixing each original name with the date the photo was taken. This is the command I used in ExifTool:
exiftool -d %Y-%m-%d-%%f.%%e "-filename<CreateDate" -r D:\....
AF rename your files 1.1: the renaming with ExifTool above produced only temporary filenames; once the final selection was ready, I renamed all of my files with a progressive number (00001.jpeg) using this small but useful tool.
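If you'd rather stay on the command line, the same progressive renaming can be sketched in bash. This renames files in place, so test it on a copy first; it also assumes the originals don't already use this numeric scheme:

```shell
#!/usr/bin/env bash
# Rename every JPEG in the current folder to 00001.jpg, 00002.jpg, ...
# The glob expands in alphabetical order, so the date-prefixed
# names from the earlier ExifTool step stay in chronological order.
n=1
for f in *.jpg; do
    new=$(printf '%05d.jpg' "$n")
    mv -n "$f" "$new"     # -n refuses to overwrite an existing file
    n=$((n + 1))
done
```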
XnConvert: I then converted all the JPEGs into smaller files. Actually, you'd better do this at the beginning, before running the duplicate finders, so that the matching process speeds up considerably.
Once again I used a free program, XnConvert, resizing my photos to 600x600 at 72 dpi, with JPEG quality around 70%.
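The same batch resize can be done with ImageMagick, if you have it installed. The numbers below just mirror the XnConvert settings above; note that mogrify overwrites the originals, so run it on a copy of the folder:

```shell
# Resize every JPEG so the longest side is at most 600 px,
# set density to 72 dpi and JPEG quality to 70.
# WARNING: mogrify overwrites the originals; work on a copy.
mogrify -resize 600x600 -density 72 -quality 70 *.jpg
```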
In the end I also scanned all 40,000 selected photos manually and visually. In particular, I looked for and deleted photos:
- In which I appeared: unless it was a selfie or taken with a tripod, it meant I didn't take the photo, so I didn't own its copyright (you might use face-detection software for this, but it won't find you when your back is to the camera, and such tools are not yet reliable);
- Containing logos, brands, text or other elements that might be copyrighted;
- Notes that I sometimes take with my camera: computer screenshots, contact cards, maps, etc.
For the registration you also need to prepare a list of filenames, titles and publication dates.
It's not time-efficient to give real titles to 40,000 photos. There is also a distinction between title and filename. What I did was make sure that filename and title coincided, assigning a progressive number from 00001.jpg to 42000.jpg.
So my file list would look something like this:
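As a purely illustrative fragment (the dates here are made up), with each title matching its filename:

```
Filename      Title      Date of first publication
00001.jpg     00001      2015-03-12
00002.jpg     00002      2015-03-12
00003.jpg     00003      2015-04-02
```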
I still tried to automate the work of listing all the filenames as much as possible.
You can easily do this in LibreOffice Calc (a free Excel alternative).
Another option is to automatically create a list of the files inside a folder. You can do this by opening a command prompt, navigating to the desired folder, and pasting this command:
dir /b > print.txt
This will create a file print.txt with a list of the filenames only (no paths, dates, etc.) inside the selected directory.
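From that print.txt, a few lines of shell (bash here; the same idea works in a spreadsheet) can build the three-column list directly. The date below is a single placeholder you'd replace with your real publication dates, per photo or per batch:

```shell
#!/usr/bin/env bash
# Turn a bare filename list (print.txt) into a CSV with three columns:
# filename, title (the filename without its extension), and a
# PLACEHOLDER publication date to be filled in with real records.
while read -r name; do
    echo "${name},${name%.*},2015-01-01"
done < print.txt > list.csv
```

The resulting list.csv opens straight into LibreOffice Calc for any final touch-ups.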
After filling in the online form on the US Copyright Office website and paying the fee, uploading the file deposit is quite straightforward. You can zip several thousand photos together, up to a maximum file size of 500 MB per archive. Then you can select and upload several files all together.
Unfortunately, after a while (probably 120 minutes) the server times out, asking you to click a button within 60 seconds to keep the session active. Of course I wasn't looking at the screen constantly, so my session expired every time.
If this happens, and you had selected 6 files of which 4 were fully uploaded before the session expired, those 4 files are also deleted once the window closes. So my suggestion is to select and upload the files one by one.
This largely depends on your internet upload speed, but in my experience, after uploading about two 400 MB files, the third would trigger an expired session.
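To keep each archive safely under the 500 MB cap, you can batch the photos by cumulative size before archiving. A rough bash sketch, using tar for the archiving step (swap in the zip tool the same way if you want .zip deposits); 450 MB leaves some headroom:

```shell
#!/usr/bin/env bash
# Split the photos in the current folder into numbered batch folders
# of at most MAX bytes each, then archive every batch separately.
MAX=$((450 * 1024 * 1024))   # headroom under the 500 MB upload limit
batch=1
size=0
mkdir -p "batch_$batch"
for f in *.jpg; do
    s=$(stat -c %s "$f")                 # file size in bytes (GNU stat)
    if [ $((size + s)) -gt "$MAX" ]; then
        batch=$((batch + 1))
        size=0
        mkdir -p "batch_$batch"
    fi
    cp "$f" "batch_$batch/"
    size=$((size + s))
done
for d in batch_*; do
    tar -czf "$d.tar.gz" "$d"            # or: zip -r "$d.zip" "$d"
done
```

Each resulting archive can then be uploaded one by one, which also plays nicely with the session timeout above.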
I think that's enough for now. Of course this was not a full guide to registering a group of published photos, more a set of tips and tricks for people already going through the procedure. If you have any doubts, feel free to ask in the comments below, but bear in mind that in no case do my articles or my comments have any legal value, and you assume full responsibility for following my suggestions.