From Digitization Fundamentals

Journal 5 – Digitization Fundamentals

Association of Research Libraries (ARL) Code of Best Practices for Fair Use infographic

Our fifth and final day of class was split into two parts: a class session in the morning, and “show and tell” in the afternoon. There was also a lecture on the ethics of digital humanities research in the afternoon. In the morning we discussed the problem of copyright in digital humanities projects, and also the question of code literacy.

Copyright is certainly a tricky thing to figure out, and sadly there are no solid answers except for court judgements. There are, however, some guidelines that can be followed for fair use, or fair dealing practices. The notion that no one can really tell you what is, or what is not, a copyright violation can have a chilling effect on academic scholarship. Important for our discussion was that academics can rely on fair use legally. Following a set of best practices can help ensure that works under copyright remain protected, while also allowing for new and innovative scholarship.

The second portion of our discussion, on code literacy, was even more contentious than the questions of fair use. The night before we watched, or rather listened to, a roundtable discussion posted to Rhizome’s Vimeo page. Although the audio was terrible, the discourse was quite interesting. At the heart of the matter was defining “code literacy”: is it a scientific or technical goal, focusing on engineering and programming, or is the objective humanistic, centering on the idea that code is everywhere in our modern lives and that we have the power to direct our own futures, digital or otherwise?

There was no clear answer, of course; it was more a point of reflection on digital technology and DH overall. As we worked through the week in Digitization Fundamentals, we learned technical skills and we also learned how to use those skills for creative and meaningful production. Balancing these two facets of code literacy, the scientific and the humanistic, will remain a central feature of our digital projects to come.

Journal 4 – Digitization Fundamentals

A study in video compression: one video of a red moving ball shown in three different video formats. In this photo, the red ball in the lower left shows fewer artifacts than the other two. However, its higher quality and larger file size make it a more difficult video to share online.

On the fourth day of class we moved into video editing. This was an interesting class because video editing seems more approachable than working with audio, but its processing and distribution are also more complex. Frame rates, variable screen sizes, color reproduction, and compression of data are just some of the variables that affect video production. Video is also a very powerful medium, in that it captures the mind in a way that other media might not.

Video production is a time-consuming process with many stages of development. The steps of pre-production, production or shooting, and post-production each have their own components and processes to consider. Equipment, from cameras and tripods to computers and video monitors, is necessary to ensure a quality film, not to mention actors, scripts, and storyboards, as well as financing and distribution. With all of these things in mind, it’s easy to see how producing a short film would mirror many of the project management considerations within DH.

At its most basic level, video is a form of storytelling. Most videos or films are linear, with the author or director of the film taking control of the form and movement of the narrative. There are some new tools for non-linear storytelling with video, such as Korsakow, but unfortunately Korsakow relies on Flash, which is rapidly being replaced by HTML5 video on the web. YouTube makes the distribution of video seem easy, but the process is actually quite complicated. Films must be compressed with a codec, their video and audio bound together in a single container file, and the result must also be a small enough file size to be streamed and shared.
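To get a feel for why compression matters so much, here is a rough back-of-the-envelope sketch. The resolution, frame rate, clip length, and 100 MB sharing target are all assumptions chosen for illustration, not figures from the class.

```python
def uncompressed_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Size of raw video: every pixel of every frame stored in full.

    bytes_per_pixel=3 assumes 8 bits each for red, green, and blue.
    Audio is ignored to keep the arithmetic simple.
    """
    return width * height * bytes_per_pixel * fps * seconds


# Hypothetical clip: one minute of 1920x1080 video at 30 frames per second.
raw_bytes = uncompressed_video_bytes(1920, 1080, fps=30, seconds=60)

# Hypothetical sharing target of 100 MB (decimal megabytes).
target_bytes = 100 * 1_000_000
ratio_needed = raw_bytes / target_bytes

print(f"Raw size: {raw_bytes / 1e9:.1f} GB")
print(f"Compression needed to reach 100 MB: about {ratio_needed:.0f}x")
```

Under these assumptions a single minute of raw footage comes to roughly 11 GB, so a codec has to shrink it by a factor of about a hundred before it is practical to stream.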

Our final class project for the show and tell was a video produced by the entire class. I contributed my audio file, a song titled “Ice Cream Dubstep,” which I made in class. Other students worked on filming the class itself, some filmed students for individual interviews, other students also made audio clips, and two students, Heather and Rachel, worked together to edit all of these disparate artifacts in one final video.

Ice Cream Dubstep


After the class created Ice Cream Original, I downloaded the audio files that we were working with in Audacity.

From here I took the tracks in a new direction, creating “Ice Cream Dubstep.”

The vocals are the same as the original, just modified with “stretch and pitch” using Adobe Audition. The gong sound and the ice cream truck melody are also altered with Audition. I removed the applause, and I added in two dubstep tracks from Freesound, “dubstep_drumloop_crunch” and “dubstep growl.”

Adding the tracks to Audition was easy, but it was a little tricky to make them melodic. Within Audition I zoomed in and examined the waveforms in order to line up the tracks so they hit on the proper beats.

I also uploaded Ice Cream Dubstep to SoundCloud. Using Adobe Illustrator, I made a quick graphic for the track with an ice cream poster pack purchased from Creative Market.


Ice Cream Original


On the third day of class in Digitization Fundamentals we worked with audio files, and it was pretty amazing.

The instructor, Robin Davies, worked through the differences between audio frequency and sampling rate, and we got our feet wet with Audacity. One of the students recorded a short line of poetry, and Robin tweaked the settings and microphone placement. This showed us how important it is to get a good capture of the original audio. Many times the mic is too far away and there’s too much noise in the background.

The line of poetry recorded was from Wallace Stevens’s “The Emperor of Ice-Cream”:

Let be be finale of seem.
The only emperor is the emperor of ice-cream.

With this track running in Audacity, Robin added some sound clips from Freesound, and we extracted some audio from a YouTube clip. We ended up with a strange melange of sounds: the poetry line, a gong, an ice cream truck, and applause. I’ve called this track “Ice Cream Original,” and I posted it to SoundCloud.

Journal 3 – Digitization Fundamentals

Working with audio files in Audacity: These files were mixed in class to produce the original clip “Ice Cream,” which included poetry read aloud by a student in the class.

Day three was all about audio, and I’ll have to admit, this was my favorite subject of the week. I had worked with audio in the past, but I never really understood the processes that were taking place. The numbers and waveforms in programs like Audacity and Adobe Audition can be a bit daunting to understand. Thankfully, Robin presented the material in a clear and compelling way, taking us step by step through the process. This class tied in with the first day, as we worked more directly with the numbers and mathematical concepts of bits and bytes. Even though waveforms are visual representations of sound, it was easier to see how the waveforms related directly to mathematical data points.

To help us work through the process of editing audio, Robin played a single track by Radiohead in a variety of ways. Reducing the sampling rate had a direct impact on the playability of the track, moving it from clean and crisp to dull and distant in tone. Since they’re both measured in Hertz or Hz, it was difficult at first to see how audio frequency and sampling rate differed from one another.

Audio frequency is the vibration itself, the wave that is the actual sound. Sampling rate is the number of times per second the wave is measured. Humans can hear audio frequencies between roughly 20 and 20,000 Hz. In order to turn these waves into digital data that is representative of the sound, the waves must be measured at a rate of at least twice the highest frequency to be captured, around 40,000 Hz. (CD audio, for example, uses 44,100 Hz.) This high sampling rate is necessary to capture the waves at crests and at troughs, giving a full picture of the sound. Capturing the wave at the same rate as its audio frequency would land on the same point of every cycle, producing data points only for crests, or only for troughs, and the sound would not be dynamic or representative of the original.
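The difference between audio frequency and sampling rate can be sketched in a few lines of Python. This is only an illustration: the 1,000 Hz test tone, the sampling rates, and the function name `sample_sine` are my own assumptions, and the wave is started at a crest so the effect is easy to see.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, n_samples, phase=math.pi / 2):
    """Measure a sine wave of freq_hz at regular intervals.

    The default phase of pi/2 starts the wave at a crest.
    Values are rounded to three places for readability.
    """
    return [
        round(math.sin(2 * math.pi * freq_hz * i / sample_rate_hz + phase), 3)
        for i in range(n_samples)
    ]


tone_hz = 1000  # hypothetical test tone

# Sampling at the tone's own frequency hits the same point of every cycle.
same_rate = sample_sine(tone_hz, sample_rate_hz=1000, n_samples=4)

# Sampling at twice the frequency alternates between crest and trough.
double_rate = sample_sine(tone_hz, sample_rate_hz=2000, n_samples=4)

# Sampling well above twice the frequency also traces the shape in between.
high_rate = sample_sine(tone_hz, sample_rate_hz=8000, n_samples=8)

print("same rate:  ", same_rate)
print("double rate:", double_rate)
print("high rate:  ", high_rate)
```

At the same rate every measurement is a crest, so the data looks like a flat line with no vibration at all. At double the rate the crests and troughs both appear, and higher rates fill in the curve between them.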

Journal 2 – Digitization Fundamentals

Our second day of class in Digitization Fundamentals covered images, data file formats, HTML and CSS. One of the instructors, Robin Davies, discussed how libraries were increasingly advertising for positions that included DH skills, such as Digital Preservation Librarian. This job description called for the management of “emerging digital preservation practices,” and also for expertise with “all phases of the life cycle of digital content with the aim of long-term retention and access.” Important here is not only the requirement for preexisting skills with digital artifacts, but the idea that digital practices change often, and digital humanists must stay ahead of the curve to ensure that DH projects stand the test of time.

Digitizing materials must occur in such a way that the digital capture is as true to the original as possible. This means, of course, that no photo filters should be used, or other aesthetic tools so common in social media. However, there’s also the possibility that some tweaking of the device’s default settings might be necessary to create a digital file that’s true to the original. For example, if a page of text is black ink on white paper, scanning this with black and white settings might seem appropriate. If the text has faded a bit though, and the paper is no longer bright white but yellowish or greenish, a color scan or photograph would more accurately convey the age and texture of the document.

We had an interesting conversation about the difference between a scan and a photograph. The class really had a tough time drawing a solid line between the two, as there was much overlap between the processes. Book scanning can be especially difficult. The spine of the book should remain unharmed and unbroken, making flatbed scanners problematic. Using an apparatus with two cameras, each pointing at one page from an angle, can help retain the shape of the book while also enabling digitization. But is this a scan, or a photograph? Does the file format, whether PDF or JPEG, shape the definition? Are higher resolutions always necessary, and which process might produce the most faithful digital artifact? These questions are difficult to answer, and perhaps the best practice is to use the methods that best fit the available time, materials, and resources.

Non-destructive apparatus at DIY Book Scanner. A scanner like this was described in the book Mr. Penumbra’s 24-Hour Bookstore, which we discussed in class.

Journal 1 – Digitization Fundamentals

The first day of class was an overview of the course and the expectations for the week. We made our introductions and discussed some of the aspects of digitization we would cover in the class. One of the instructors, Michael Nixon, discussed how digital objects such as photographs are representations of the real world, but they also leave out a lot of information. For example, a digital image of a garden does not include the smell of flowers, and the photo might not indicate location, direction, or time of day. This extra information or metadata is important for considering the digital image as a more complete representation of what it’s attempting to capture. In addition to this, there is also the question of how photographs are displayed digitally. Not only is the photographer making decisions that affect the final image, but the computer is also processing information and interpreting the data within the image file in order to show a visual object on the screen.
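One way to picture what the computer is working with is to treat a photograph as a grid of numbers. The sketch below is a toy illustration under my own assumptions, not anything shown in class: the tiny 4x4 "garden photo", its colors, and the helper names are all made up, and real image files add compression and metadata on top of this.

```python
# A hypothetical 4x4 "photo": each pixel is an (R, G, B) tuple from 0 to 255.
GREEN = (34, 139, 34)    # leaves
PINK = (255, 105, 180)   # flowers

image = [
    [GREEN, GREEN, PINK, PINK],
    [GREEN, GREEN, PINK, PINK],
    [GREEN, PINK, GREEN, GREEN],
    [PINK, GREEN, GREEN, GREEN],
]


def split_into_cells(pixels, cell_size):
    """Break a pixel grid into square cells, keyed by (cell_row, cell_col)."""
    cells = {}
    for y, row in enumerate(pixels):
        for x, pixel in enumerate(row):
            cells.setdefault((y // cell_size, x // cell_size), []).append(pixel)
    return cells


def average_color(pixels):
    """Average each color channel across a list of pixels (integer division)."""
    n = len(pixels)
    return tuple(sum(p[ch] for p in pixels) // n for ch in range(3))


# Overlay a 2x2 grid and summarize each cell by its average color.
cells = split_into_cells(image, cell_size=2)
summary = {position: average_color(px) for position, px in cells.items()}

for position, color in sorted(summary.items()):
    print(position, color)
```

Nothing in these numbers records the smell of the flowers or the time of day. The computer only sees color values at grid positions, and anything else has to be supplied as metadata.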

Digital Humanities projects need to have a set of goals in order to be successful. Besides the project data itself, there are also other aspects to consider, such as storage formats, backups, and data migration. At each of these points of contact and transference, data can be corrupted or lost, and it’s important to have a set strategy that all members of the DH project are aware of and follow. Developing a set of best practices for DH projects at the outset can help team members feel more certain about their actions. There are many aspects of a DH project, from the initial idea and grant writing, to the digitization of materials and their transcription, hosting, and archival storage. Measuring development by milestones and using sitemaps and wireframes for communication can help all members of the team stay on track and feel positive about their contribution to the project.

garden photo with grid overlay
Visualizing how a computer might begin to break down an image into parts in order to process the photographic data.