Structuring Your Archive
Before we delve into the details
of workflow-based image processing, let’s take a moment to consider an
important topic that tends to be overlooked, but that can have a
significant impact on your workflow—namely, how to optimally structure
your rapidly-growing collection of digital image files. As you’ve
no doubt already noticed, collections of digital images tend to grow
very rapidly, and can easily become a dense, unmanagable mess.
The critical issue here is how to efficiently find what you’re looking
for in that mess, when the need arises—i.e., how to find the proverbial
needle in a haystack.
As it turns out, this very issue has been subjected
to intense academic investigation over the past few decades, primarily
in the field of information retrieval.
Unfortunately, the science of information retrieval, as applied to
digital images, has advanced rather more slowly than the other areas of
that field. The problem is that computers really aren’t very good
at figuring out what an image file actually depicts—i.e., whether it’s an image
of a golden-winged warbler at dawn, or of the Eiffel tower at
dusk. For all intents and purposes (i.e., in the absence of
carefully curated databases and highly trained classification
algorithms), the computer has no idea what the image contains.
Thus, to make automated searching for particular
images in your growing archive feasible, you’re going to have to tell
the computer what each image contains: whether a blackbird clinging to
a cattail, an osprey feeding bits of fish to its offspring, or an
ostrich with its head in the sand. The simplest way to do this is
to indicate the photo’s semantic content via the image’s
filename. For example, an image showing a juvenile osprey
exercising its flight muscles in the nest might be called osprey-chick-flapping.jpg. In
all likelihood, if you’ve got one such image, then you’ve probably got
many, so you’ll want to number them—e.g., osprey-chick-flapping-47.jpg.
The idea is simply to spell out in the filename what the image
shows. The purpose of doing so is to allow the computer’s
automated file searching tools to find the images that satisfy whatever
criteria you give them. You might, for example, search for all
photos depicting ospreys, or all photos depicting osprey chicks, or all
photos depicting osprey chicks flapping their wings. When
choosing a filename for an image, you really need to think about the
types of automated searches you might perform in the future; your
ability to find particular photos later will depend on how well you’ve crafted their names.
Most operating systems have a search tool
that allows you to find files based on their filenames.
If you make a habit of putting the species name (and
possibly other relevant information) in the filenames of
your images, your archive will be more easily searchable.
Assigning an informative filename can be done very
conveniently when saving the postprocessed version of an image.
Photoshop will prompt you for a filename anyway (since you can’t save
changes back to the RAW file), so that’s a good time to choose a name
that will be informative. Since you’ve just finished
postprocessing the image, you’ll know at that time what species of bird
is depicted in the scene. I recommend using a filename of the
species - RAWID . ext
where species would be
something like “great-blue-heron”, RAWID would be the base filename
assigned to the RAW file by the camera (i.e., something like 7D1E4233 or the like), and ext would be PSD, JPG, or TIFF,
depending on the type of file you’re saving (PSD files are native
Photoshop files that contain layers, saved selections, text elements,
etc.). The advantage of this filename convention is that the
species part (which might also include descriptive words such as “flying” or “eating”) can be searched easily, while
the RAWID portion of the filename allows you to connect the processed
JPG file to the original RAW file if necessary (for example, if the
original JPG is a low-resolution version for display on a web page, and
you decide later to re-process the RAW file to produce a printable
high-resolution JPG or TIFF for publication).
If your JPG files retain EXIF
information, you can also use specialized search tools, such as the Find function in Adobe Bridge,
that allow you to search for files based on the equipment settings used
for the photo, such as ISO and focal length (see the figure
below). EXIF data is stored in both the RAW files and the JPG
files that are produced by the camera. In post-processing, if you
save a RAW photo from Photoshop into a JPG file, that JPG file will
also retain the EXIF data. Note, however, if you perform screen-scraping—using a
pixel-capture utility to copy the image directly from the screen into a
file—the EXIF data won’t be copied to the new file.
Screen-scraping is sometimes useful for non-Photoshop image-processing
applications, since some of those applications don’t show you an exact WYSIWYG (What You See is What You Get)
representation of the file on screen; that is, the file saved by the
application may not look exactly like what you see on-screen when
editing the image. Once you get the on-screen image looking the
way you like it, screen-scraping that image directly to a file using a
separate pixel-copying utility will ensure that the saved file looks
exactly like what you saw on the screen. If you’re using
Photoshop, however, this shouldn’t be necessary (though you may need to
adjust the setting under View >
Proof Setup for your monitor). Exporting a JPG directly
from Photoshop should produce a WYSIWYG file, which will also retain
EXIF information. The EXIF information contains just about
everything you’d want to know about the technical details of the image:
the shutter speed, aperture, focal length, flash power, exposure
compensation level, etc.
Some tools, like Adobe Bridge, allow you to search
through your RAW files based on EXIF features such as focal
length, ISO setting, etc.
The next thing to consider is how to choose a directory structure for organizing
your files. Keeping different sets of files in different
subdirectories allows you to zero in on the particular files you’re
looking for faster. Some options are to create separate
directories for individual bird species, or for different shooting
locations, or by the date the photos were taken. I prefer the
latter option. As illustrated in the figure below, my archive is
organized by year at the top level, then by month within each year, and
then by day within each month.
advantage of this type of organization is that adding new photos
requires the minimal amount of time and effort. When I return
home from the field, I simply create a new folder with today’s date,
and then download all my memory cards into that directory. (I
also backup the new directory to two external hard drives).
Although organizing my directory structure by species would make
searching later easier, I rarely have enough time to sort through all
the photos I’ve taken each day and move them to the appropriate
directories. However, during post-processing (which sometimes
doesn’t happen till months after the photos were taken), I can put the
species name in the filename of the processed JPG file, which allows
searching by species using a search tool such as Adobe Bridge or the Spotlight tool in Mac OS X.
Fig. 12.2.3: An
archive structure based on date. If you know roughly
when you took the photo you’re looking for, you can perform a directed
search based on the date. This structure has minimal maintenance
requirements: whenever you upload photos from a memory card, you
simply create a new directory with today’s date. That’s ideal for
numbers of RAW files. For your postprocessed JPG files, a
structure might be based on taxonomic classification or location.
In addition to my vast archive of RAW files, I also keep a
separate, much smaller archive of post-processed JPG files. When
searching for images of a particular species, I can then search the
smaller JPG archive, and then when I’ve chosen the handful of JPG files
I’m interested in, I can track those back to the corresponding RAW
files in my RAW archive (based on the capture date in the EXIF).
Because my JPG archive is separate from my RAW archive, I can structure
it differently. In particular, I can create specialized
subdirectories for various purposes—e.g., for individual species, or
for files that have been optimized for printing on a particular medium,
or for photos taken during a particular birding trip, etc.
Because there are far fewer post-processed images than RAW files in my
archives, I can more easily find the time to organize the JPG’s into a
thematic directory structure. Note that in some operating systems
you can also create links or shortcut files that allow one file
to appear simultaneously in multiple directories (without wasting space
by actually duplicating the file). This is useful for when a
photo naturally belongs to multiple categories—e.g., both the “Great Blue Heron” category and the “Photos from My Florida Trip” category.
At some point, your archive will almost certainly
become so big that it no longer fits on your computer’s hard
drive. I now keep my archive on external hard drives rather than
on my computer’s internal hard drive. When the external drive
becomes full, I simply buy another external drive and put the newer
images on the new drive. I keep two identical copies of each
drive (for a total of three copies), for backup purposes, and I keep
those copies in different physical locations: one at home, one at work,
and one in a safe-deposit box at my bank. I’ve learned the hard way that hard
drive failures are extremely common.
In terms of the file formats for post-processed
images, I use JPG because it’s more space-efficient than TIFF.
Sometimes I need to generate a TIFF version for publication, and in
those cases I either convert the JPG file directly to TIFF (in
Photoshop), or go back to the original RAW file and post-process it
from scratch into TIFF (i.e., if I think that doing so will produce
more detail in the resulting TIFF than was present in the JPG).
Remember that the images you post-process for posting on your web page
or sending to friends via email will generally be much too small (i..e,
be of insufficient resolution) for printing or publication. This
is why you should always retain your RAW files in case you need to
later re-process them to produce a higher-resolution image.
Finally, note that there are commercial database
applications that can be used to manage your photos. I generally
don’t recommend these, since they tend to tie you to a particular
operating system (or even a particular version of the operating sytem),
and in some cases they may be prone to losing images via corruption of