Digitizing Oswalt's Fileslips

Oswalt went to great lengths (heroic lengths, actually, considering when he was working) to transcribe his work on Pomoan languages digitally. But it turns out that he left out a lot of stuff. Like, for instance, cough, most of the nouns.

So the transcription needs to be completed. There are a lot of folks who participated in LSA310 over the summer who would be happy to contribute to a transcription effort, but before we can get everyone collaborating, we need to get the fileslips digitized. (And then there's the issue of actually doing the transcriptions; but that's another project.)

First method

Initially I tried taking pictures of the cards by holding them up in front of the webcam on my laptop. The results were more or less legible, and the process was fairly fast. But the webcam's resolution was low, and it's hard to get consistent lighting and cropping when you hold each card there by hand.

Quality degrades even more for pink cards, written in pencil.

(Ahem, please notice Mary Haas's typewriter in the background.)

A few cards are typed, but even then quality can be low:

Getting all MacGyver

Enough with what didn't work.

First you get a box. One of these will do:

The box needs a hole big enough to put the camera lens through.

To keep the index cards in place, I used a plastic index card holder, two bucks at Walgreens.

As you can see, the hole and the holder need to more or less line up. I was careful to make sure the holder was at close to a right angle to the edge of the box, but since the camera can be adjusted before beginning and doesn't move until you're done, placement is not too critical.

Here's the basic idea of how the camera will be held in place. I happened to have one of those flexy tripod thingies. (I suppose rubber bands or something would also suffice.)

Camera hack

Since we want to be efficient (there are thousands of fileslips), we need to figure out how to take images quickly, preferably by having the camera shoot in an automatic loop so we don't have to touch it between shots. There is in fact a way to add such capabilities to a point-and-shoot camera, with free software. (Thanks to my geeky friend from Helsinki for showing it to me! No, not that geeky guy from Helsinki...)

It's a bit fiddly...

...but it boils down to copying certain files onto your memory card, and then clicking a series of buttons when you turn the camera on. (There's no risk to the camera since the files are only on the card; if you format it or use a different card your camera goes right back to normal. It doesn't void the warranty, either.)
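
If you end up preparing more than one card, the copying step could be scripted. Here's a minimal sketch in Python, assuming you've already unpacked whatever files the software's instructions call for into a folder on your computer. The function name and both paths are made up for illustration; nothing here is specific to any particular camera or to the software itself.

```python
import shutil
from pathlib import Path


def copy_to_card(source_dir, card_root):
    """Copy every file under source_dir onto the card, keeping subfolders.

    Both arguments are hypothetical paths; returns the sorted relative
    paths of the files that were copied.
    """
    source = Path(source_dir)
    card = Path(card_root)
    if not source.is_dir():
        raise FileNotFoundError(f"no such folder: {source}")
    if not card.is_dir():
        raise FileNotFoundError(f"card not mounted at: {card}")

    copied = []
    for item in sorted(source.rglob("*")):
        if item.is_file():
            relative = item.relative_to(source)
            target = card / relative
            # Recreate the subfolder layout on the card before copying.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)
            copied.append(relative.as_posix())
    return copied


# Example call (paths are placeholders, adjust to your machine):
# copy_to_card("camera_files", "/Volumes/SDCARD")
```

The button-pressing part still has to be done by hand on the camera.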

Pretty much ready to go at this point:

(Those aren't actually 3x5 index cards so they're not lined up right. But Oswalt's are, so they'll fit snugly in the holder.)

The resulting images

Here's a crummy webcam pic of a little test fileslip I made:

And here's the resulting digitized version:

Full-resolution image of sample file.

Just looking at a zoomed-in word, the resolution seems sufficient:

So the idea is, put a stack of cards into the holder, line stuff up, and start the camera clicking. Every n seconds (where n depends on your prestidigitatory prowess), remove a card from the top of the stack, put it back down backside-up if there's something on the back, and then place the imaged card in a pile off the stack. If you keep a bookmark in the archival box to mark where you've taken the current stack from, the risk of shuffling any of the cards is low.
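
To get a feel for what a given n means for the whole job, here's a back-of-the-envelope sketch in Python. The card count, interval, and share of two-sided cards in the example call are placeholders, not measurements; I haven't counted the slips or timed myself yet.

```python
def estimate_hours(num_cards, seconds_per_shot, fraction_two_sided=0.0):
    """Rough camera time in hours for a batch of cards.

    A two-sided card needs two shots, so the total number of shots is
    num_cards * (1 + fraction_two_sided).
    """
    if num_cards < 0 or seconds_per_shot <= 0:
        raise ValueError("need a non-negative card count and a positive interval")
    if not 0.0 <= fraction_two_sided <= 1.0:
        raise ValueError("fraction_two_sided must be between 0 and 1")
    shots = num_cards * (1 + fraction_two_sided)
    return shots * seconds_per_shot / 3600


# Placeholder numbers: 5000 cards, one shot every 5 seconds,
# 20% of the cards with something on the back.
print(f"{estimate_hours(5000, 5, 0.2):.1f} hours")
```

This only counts time with the camera clicking; loading stacks and moving the bookmark come on top of it.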

Todo list

  1. Practice getting consistently good lighting (the samples were done in my apartment in the wee hours).
  2. Improve the script on the camera to turn macro on and flash off automatically.
  3. Figure out how to make the memory card bootable so you don't have to press fiddly buttons, just insert "the special memory card."
  4. Improve the script to save the resulting images with more mnemonic filenames.
  5. Look into having the camera take raw images rather than JPGs. This is possible with CHDK, but I haven't tried it yet. I'm told by the people who work on CHDK that the increase in resolution will be minimal at this focal length, and wouldn't be worth the considerable increase in file size.
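
Until item 4 is sorted out on the camera itself, the renaming could be done on the computer after the images are copied off the card. This is a workaround rather than the camera-script fix, sketched in Python. It assumes the camera's filenames sort in shooting order, and the `box01` prefix and the naming pattern are invented for illustration.

```python
from pathlib import Path


def plan_renames(filenames, prefix, start=1, width=4):
    """Map camera filenames (taken in sorted order) to numbered names.

    Assumes sorted order equals shooting order, which holds only while
    the camera's own numbering doesn't roll over.
    """
    plan = {}
    for offset, name in enumerate(sorted(filenames)):
        suffix = Path(name).suffix.lower()
        plan[name] = f"{prefix}_{start + offset:0{width}d}{suffix}"
    return plan


def rename_images(folder, prefix, start=1):
    """Rename every JPG in folder according to plan_renames."""
    folder = Path(folder)
    names = [
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
    ]
    plan = plan_renames(names, prefix, start)
    # Refuse to clobber anything that's already there.
    for new_name in plan.values():
        if (folder / new_name).exists():
            raise FileExistsError(new_name)
    for old_name, new_name in plan.items():
        (folder / old_name).rename(folder / new_name)
    return plan


# Example call (folder and prefix are placeholders):
# rename_images("card_dump", "box01")
```

Running it once per stack, with a different prefix or start number each time, would keep the filenames tied to where the stack came from in the archival box.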

Conclusion

So, this isn't archive quality. Most importantly, the output is JPG rather than a raw or lossless format such as TIFF. But the important thing here, I think, is that it's fast, and the tradeoff is worth it given the cost: the whole setup costs no more than a digital camera and a memory card. If one wanted to be fancier-schmancier about it, one could get a copystand for a couple hundred bucks; that's what the folks at Kaipuleohone, the University of Hawai'i Digital Ethnographic Archive, are using. (They're also using a digital camera as opposed to a scanner, by the way.)

But for this project the cost is essentially only a time investment.