Cool@Hoole

Invisible Digitization?

This entry was posted in Software, Work Flow and tagged . Bookmark the permalink.

Sometimes Digital Services captures “Oddities”: content that can’t be uploaded into Acumen. This may be content digitized for patrons, donors, or other institutions, or it might be tiff image files that have already been placed online or archived and now need enhancing. Any digitization that does not result in tiff files that can be placed online is called “invisible digitization,” and it happens more often than you might think. Often this digitization is a regular step in the process, or at least a normal element of the program’s workflow, but it can be hard to track. It might be easy to keep up with initially, but the bigger the repository grows, the more likely it is that this information will be lost.

Our “Oddities” are broken down into four categories: Corrective, Offline, OCR Transcripts, and Weeded Digitization. If tiffs already exist online but are not of acceptable quality, any rescan or further optimization of that tiff or group of tiffs is categorized as Corrective digitization. Offline digitization consists of creating tiffs for a specific patron, institution, or web exhibit. OCR Transcripts are produced when we digitize typed transcripts; the scans are discarded after OCR. Weeded digitization consists of items that, after digitizing, we learn we do not have the legal right to place online, or that turn out to duplicate content already online.

Digital Services records invisible digitization daily in our shift reporter tool, and also in four separate spreadsheets located on a shared drive. Our pipeline currently incorporates a total count at the end of each month to show productivity, including how much of the content produced counts as “Oddities” rather than “new content,” and exactly how many scans that content consists of. The spreadsheets are saved as tab-delimited .txt files, and our SizeandProgress script picks up those .txt files to include the numbers in our count.
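To make the monthly tally concrete, here is a minimal Python sketch of summing scan counts from tab-delimited exports, one per category. It is not the SizeandProgress script itself. The function name `tally_scans`, the column headers `collection_id` and `scans`, and the sample rows are all assumptions made up for illustration.

```python
import csv
import io
from collections import Counter


def tally_scans(sources):
    """Sum scan counts per oddity category.

    `sources` maps a category name to an open text stream of
    tab-delimited rows with a header line. The column names used here
    ("collection_id", "scans") are assumptions, not the real headers.
    """
    totals = Counter()
    for category, stream in sources.items():
        reader = csv.DictReader(stream, delimiter="\t")
        for row in reader:
            totals[category] += int(row["scans"])
    return totals


# Made-up sample data standing in for two of the four exported .txt files.
corrective = io.StringIO("collection_id\tscans\ncoll_a\t12\ncoll_b\t8\n")
offline = io.StringIO("collection_id\tscans\ncoll_c\t30\n")

monthly = tally_scans({"Corrective": corrective, "Offline": offline})
print(dict(monthly))          # per-category scan totals
print(sum(monthly.values()))  # overall invisible-digitization scan count
```

With real files, each stream would come from `open(path, newline="")` on one of the exported .txt files instead of `io.StringIO`.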

We are in the process of creating a script that will pull the collection identifier, scans, and type of “invisible digitization” from the shift reporter, removing the need for spreadsheets to capture the count. Our long-term goal for tracking oddities is to create an interface backed by a database capable of storing oddities in an array, so we can easily retrieve oddity information and create a report based on it. It’s important that we be able to show all the work that goes into producing our digital content, especially when the evidence isn’t online to speak for itself.
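As a rough illustration of the longer-term idea, the sketch below stores oddity records in a small database and produces a per-type report. It uses Python's built-in `sqlite3` module with an in-memory database. The table layout, the function names `build_store` and `report_by_type`, and the sample records are hypothetical, since the shift reporter's actual data format is not described here.

```python
import sqlite3


def build_store(records):
    """Load (collection_id, oddity_type, scans) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE oddities ("
        "collection_id TEXT NOT NULL, "
        "oddity_type TEXT NOT NULL, "
        "scans INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO oddities VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def report_by_type(conn):
    """Return total scans per oddity type, sorted by type name."""
    cursor = conn.execute(
        "SELECT oddity_type, SUM(scans) FROM oddities "
        "GROUP BY oddity_type ORDER BY oddity_type"
    )
    return cursor.fetchall()


# Made-up records, as they might be pulled from the shift reporter.
sample = [
    ("coll_a", "Corrective", 12),
    ("coll_b", "Offline", 30),
    ("coll_a", "Corrective", 8),
    ("coll_c", "Weeded", 5),
]

conn = build_store(sample)
report = report_by_type(conn)
for oddity_type, total in report:
    print(f"{oddity_type}\t{total}")
```

A file-backed database would work the same way by passing a path to `sqlite3.connect` instead of `":memory:"`.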

(Thanks to Jessica Anderson, our Repository Manager, for writing this up to share!)

