Plenty of digital humanists have gotten quite good at taking text files and, as Steve Ramsay says, "screwing around" with them in fairly sophisticated ways using various algorithms: TF-IDF, topic modeling, N+7. But lots of digital artifacts aren't text. We aren't (I'm supposing) as good at screwing around with those.
So I want to talk about what the basic toolkit is, or should be, for playing around when you have a big pile of some other kind of digital file, particularly images or sound files. What projects out there are making creative use of open-source image-processing software we can drop on our own files? Can the format-agnostic techniques we sometimes use for clustering texts (normalized compression distance, say) be as useful on binary audio or image files as on plain text? Are there any open-source image-processing programs out there with the potential to be as useful for historians of visual artifacts as MALLET is for textual scholars?
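Normalized compression distance is appealingly format-agnostic precisely because it only looks at bytes: two files are "close" if a compressor can squeeze their concatenation down to not much more than the larger of the two alone. A minimal sketch in Python, using the standard-library `zlib` compressor (real NCD work often prefers bzip2 or lzma, and zlib's 32 KB window means this crude version behaves best on smallish files; the sample byte strings here are invented for illustration):

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the length of the compressed data. Because it
    operates on raw bytes, the same function applies to text,
    image data, or audio samples alike.
    """
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


# Near-duplicate data compresses well together, so its distance is low;
# unrelated data shares little, so its distance climbs toward 1.
a = b"pastoral landscape, oil on canvas " * 50
b = b"pastoral landscape, oil in canvas " * 50
c = bytes(range(256)) * 7

print(ncd(a, b))  # low: the two strings are near-duplicates
print(ncd(a, c))  # noticeably higher: unrelated byte content
```

Feeding the resulting pairwise distance matrix into any off-the-shelf hierarchical clustering routine is the usual next step; the open question above is how well that pipeline holds up on real JPEGs or MP3s, whose existing compression partly defeats the trick.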
I'm coming at this from a position of a few experiments but no deep expertise. I think we'll be able to rope in some people with more experience applying clustering and classification techniques to big stores of images. If you have some fantastic way of exploring MP3 files or archival photographs, or some idea about what software we should be taking advantage of and aren't, or if you have a big stack of archival photos and want to do something useful with them, I'd love to see it.