By Steve Anderson

Journal 3 – RDF and Linked Open Data

This morning we worked with Turtle(s?), a nickname for Terse RDF Triple Language. Turtle allows the long text of complex triples to be written in an abbreviated format. For the purposes of the class, or at least for my own benefit as a beginner with RDF, writing out the triples in long form is best. Once the entire triple is there in its extended form, it’s easier to see the connection between the full triple and its abbreviated Turtle version. Going over Turtle this morning was helpful because it made us think about the triples we were using, their composition, and how they could be better structured.
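To make the link between the long form and the abbreviated form concrete, here is a minimal Python sketch. It is not a real Turtle parser. The `ex:` namespace, the prefix table, and the sample triple are all made up for illustration; the function only expands prefixed names back into full IRIs.

```python
# Hypothetical prefix table, for illustration only.
PREFIXES = {
    "ex": "http://example.org/pottery/",
    "dc": "http://purl.org/dc/terms/",
}


def expand(term: str) -> str:
    """Expand a prefixed name such as 'ex:fragment1' into a full IRI in angle brackets."""
    if term.startswith('"'):
        return term  # quoted literals are left untouched
    prefix, _, local = term.partition(":")
    return f"<{PREFIXES[prefix]}{local}>"


def to_long_form(subject: str, predicate: str, obj: str) -> str:
    """Write one abbreviated triple as a single long-form statement."""
    return f"{expand(subject)} {expand(predicate)} {expand(obj)} ."


# The abbreviated triple: ex:fragment1 dc:title "Red-figure sherd" .
long_triple = to_long_form("ex:fragment1", "dc:title", '"Red-figure sherd"')
print(long_triple)
```

Seeing both versions side by side is what made the abbreviation click for me: the prefix is only shorthand for the front part of each long address.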

RDF (Resource Description Framework) is not a programming language, so there’s no way for it to throw an error if something is missing or incorrect. An error message could come later from another program that can’t find information, or if information isn’t presented as expected. However, this delay can make working with RDF triples a bit tricky.

JSON-LD is a way to use both JSON (JavaScript Object Notation; the format is no longer tied to JavaScript, but it was initially part of that language) and Linked Data. JSON is relatively new, used by tech startups, and its popularity is growing because of the many web applications being developed. JSONLint is a validator for JSON, and the playground at json-ld.org can be used for further testing. JSON-LD is designed to build on existing APIs (Application Programming Interfaces) in semantic ways to make them more usable and data rich. This enhancement isn’t for human readability, but for machines, so that APIs are more easily accessed and actionable.
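As a rough illustration of how JSON-LD builds on ordinary JSON, the Python sketch below takes a plain record and adds an `@context` that maps each key to an IRI. The record itself is made up, and the choice of schema.org terms for it is my own assumption; the helper `term_to_iri` is hypothetical and only reads the context, which is far less than a real JSON-LD processor does.

```python
import json

# A plain JSON record, as an ordinary API might return it (made-up data).
record = {"name": "Example Person", "homepage": "http://example.org/person"}

# Adding an @context turns the same record into JSON-LD: each key now maps to an IRI.
# Using these schema.org terms for this record is an assumption for illustration.
context = {
    "name": "http://schema.org/name",
    "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
}
json_ld = {"@context": context, **record}


def term_to_iri(doc: dict, key: str) -> str:
    """Look up the IRI that a key stands for in the document's @context."""
    definition = doc["@context"][key]
    return definition["@id"] if isinstance(definition, dict) else definition


print(json.dumps(json_ld, indent=2))
print(term_to_iri(json_ld, "homepage"))
```

The original keys and values are unchanged, so an existing API consumer can ignore the `@context` entirely, while a machine that understands Linked Data can follow the IRIs.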

JSON-LD is a relatively new way of transporting data. Its initial development began in 2010, and it became a W3C Recommendation in January 2014. There’s quite a bit of controversy surrounding JSON-LD, but it’s not really about the method or the technical specifications. Instead, the argument is whether JSON-LD is for the Semantic Web (human readable) or for API enhancement (machine readable). On the surface the discourse surrounding JSON-LD might appear to be trivial, only a matter for technical debate. The deeper question, though, is how information is represented in digital space, and what role people have as developers and consumers in our modern and machine-actionable world.

This Turtle example at w3.org contains errors; it should read ( :a :b :c ), with no spaces between the colon and the b and c values.

Poems Don’t Always Have to Rhyme, You Know

I’ve been watching Wes Anderson’s film Rushmore (1998) over the last couple of days. I’m enjoying the movie so far, but not so much for Rushmore itself. It’s more so that Rushmore is an early example of how later films like The Life Aquatic with Steve Zissou (2004), Moonrise Kingdom (2012), and The Grand Budapest Hotel (2014) came to be. In these later films the storytelling moves from great in The Life Aquatic, to masterful in The Grand Budapest Hotel. Especially in The Grand Budapest Hotel, the storytelling is nuanced, elegant, and dainty.

Moonrise Kingdom and The Grand Budapest Hotel both focus on societal outcasts, those least able to represent themselves within bureaucratic structures. Zero is a stateless bellboy who falls helplessly in love with the fearless Agatha in The Grand Budapest Hotel. In Moonrise Kingdom the resourceful Sam and the indomitable Suzy also discover love. Both of these couples struggle because of political and social systems that are difficult to maneuver and unfair in their judgement. All four of these characters are excluded from society in some way for being different. Yet, as couples in love they are helped by their parents or mentors to become whole, both individually and together. The journey for these characters is not toward some sort of prescribed normalcy, but rather toward acceptance and inclusivity. As Sam says in Moonrise Kingdom, “poems don’t always have to rhyme, you know.”

Sam and Suzy in Moonrise Kingdom.

Journal 2 – RDF and Linked Open Data

In the morning today we talked about RDF and how its data is composed. RDF is about sharing and exchanging information, but not necessarily about sharing the tools to interpret the information. RDF can be like NoSQL in that it’s flexible: just add more properties. When the project becomes more mature, though, things need to be locked down and standardized. Eventually, the information about “blank node” connections would need to be published so that all connections can be clear outside the project.

An informal graph of sample triples by W3C: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/

In the afternoon we worked with markers and construction paper, laying out a physical example of the materials we could work with. In our case this was ancient pottery, and also the variables or attributes of possible pottery fragments. We had a specialist in our group, an academic who works with ancient Mediterranean pottery fragments, and she was able to give us a wide variety of attributes. Each fragment of pottery has multiple data points, such as shape, type, place found, date of creation, and type of glaze. Each of these attributes requires further deconstruction. Place, for example, requires both a name as a string value or text, and also a latitude and longitude value that’s numeric and geographical. RDF information needs to be very granular and specific. For example, a price is not just a dollar amount but two specifications: the dollar amount field would be a number reference, and there would also be a currency reference that points to a web-hosted ontology.
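The granularity described above can be sketched in Python with triples written as plain tuples. Everything here is hypothetical: the `ex:` names, the coordinates, the price, and the currency address are invented for illustration and are not the vocabulary we used in class.

```python
# Each triple is a (subject, predicate, object) tuple. All names and values are made up.
fragment_triples = [
    ("ex:fragment1", "ex:foundAt", "ex:place1"),
    # A place is broken down into a text name plus numeric coordinates.
    ("ex:place1", "ex:name", "Athens"),
    ("ex:place1", "ex:latitude", 37.98),
    ("ex:place1", "ex:longitude", 23.73),
    ("ex:fragment1", "ex:price", "ex:price1"),
    # A price is broken down into a number plus a pointer to a currency definition.
    ("ex:price1", "ex:amount", 120.0),
    ("ex:price1", "ex:currency", "http://example.org/currency/USD"),
]


def describe(subject: str) -> dict:
    """Collect every predicate and object recorded for one subject."""
    return {p: o for s, p, o in fragment_triples if s == subject}


place = describe("ex:place1")
price = describe("ex:price1")
print(place)
print(price)
```

Neither the place nor the price is stored as a single value; each becomes its own small cluster of triples that a machine can read piece by piece.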

The value of “glaze” would point to an additional table containing information such as the elemental makeup and the percentage of each element contained within the fragment. It’s possible to use the RDF triple (subject<-->predicate<-->object) of glaze<-->element<-->percentage, but this would not necessarily be machine readable. People could understand that the percentage was a feature of the element, but machines/computers might get stuck at the element value. It’s not certain that machines would read an element and then also look for a percentage; most often the machine reading would stop at the element itself. If a blank node were used, perhaps titled “has components,” then this blank node could point to both the element and the value. This would relate the element and the percentage together without requiring one value to be privileged over the other. Using the title of “has components” would also make this blank node understandable for people.
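Here is a minimal Python sketch of the blank node idea, again with triples as plain tuples. The `ex:` names, the `_:c1` and `_:c2` identifiers, the elements, and the percentages are all made up. In this sketch the “has components” label sits on the link from the glaze to the blank node, and the blank node itself carries only a local identifier; that is my own reading of how our paper graph would translate, not something confirmed in class.

```python
# Each triple is a (subject, predicate, object) tuple. All names and values are made up.
glaze_triples = [
    ("ex:glaze1", "ex:hasComponents", "_:c1"),
    ("_:c1", "ex:element", "silicon"),
    ("_:c1", "ex:percentage", 60.0),
    ("ex:glaze1", "ex:hasComponents", "_:c2"),
    ("_:c2", "ex:element", "lead"),
    ("_:c2", "ex:percentage", 25.0),
]


def components(glaze: str) -> list:
    """Follow each blank node from the glaze and pair its element with its percentage."""
    nodes = [o for s, p, o in glaze_triples if s == glaze and p == "ex:hasComponents"]
    pairs = []
    for node in nodes:
        props = {p: o for s, p, o in glaze_triples if s == node}
        pairs.append((props["ex:element"], props["ex:percentage"]))
    return pairs


glaze_components = components("ex:glaze1")
print(glaze_components)
```

Because the element and the percentage hang off the same blank node, a program reading the graph finds them together instead of stopping at the element.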

Our RDF Graph for the term "glaze." On the left of "glaze" is the blank node with the name "has components."

Journal 1 – RDF and Linked Open Data

In the morning portion of class today we analyzed and critiqued projects that have occurred over the years, including the Indiana Philosophy Ontology Project [https://inpho.cogs.indiana.edu], and the 1995 Cervantes Project [http://cervantes.tamu.edu/V2/CPI/index.html]. We discussed RDF ontologies, or vocabularies, and we looked at a few databases that house these descriptors, such as dbpedia.org. Both scientific and humanistic data have been increasing exponentially over the last decade, and the need to link these resources together in an open format is very apparent. However, many academics are unaware that the methods they use to create and distribute data are closed systems and formats, such as Word documents and PDFs. Using the frameworks for linked open data can ensure that web-based projects become connected to the scholarly record, instead of being siloed and possibly forgotten in lonely corners of the Internet.

In the afternoon we covered database types, including SQL, NoSQL, Graph, and LDAP. With each of these database types come benefits and also pitfalls, but the key takeaway is to use the database type you’re most familiar with to help get projects off the ground. SQL databases are more rigid than NoSQL, but the additional flexibility of NoSQL can help projects without a clear idea of their datasets to begin building while the initial development is still in process. RDF is itself a framework, but not a standard. This is obvious in the RDF acronym, Resource Description Framework, but the ubiquity of the term can make it appear as though RDF is fully fleshed-out and set in stone. Overall, what’s really being worked toward through RDF and Linked Open Data is to interconnect web resources in such a way that they’re beneficial for knowledge creation by humans, and this can only be done if they’re inherently readable and actionable by machines.

Screen capture of the Cervantes Project showing multiple problems, including character encoding.
Screen capture from the Shelley-Godwin Archive, where images and text match and additional viewing options are available.

Tweets from Social Knowledge Creation

Some of my tweets and other favorite tweets from the Social Knowledge Creation in the Humanities mini-conference (Conference schedule PDF archive):

Journal 5 – Digitization Fundamentals

Association of Research Libraries (ARL) Code of Best Practices for Fair Use infographic

Our fifth and final day of class was split into two parts: a class session in the morning, and “show and tell” in the afternoon. There was also a lecture on the ethics of digital humanities research in the afternoon. In the morning we discussed the problem of copyright in digital humanities projects, and also the question of code literacy.

Copyright is certainly a tricky thing to figure out, and sadly there are no solid answers except for court judgements. There are, however, some guidelines that can be followed for fair use, or fair dealing practices. The notion that no one can really tell you what is, or what is not, a copyright violation can have a chilling effect on academic scholarship. Important for our discussion was that academics can rely on fair use legally. Following a set of best practices can help ensure that works under copyright remain protected, while also allowing for new and innovative scholarship.

The second portion of our discussion, on code literacy, was even more contentious than questions of fair use. The night before we watched, or rather listened to, a roundtable discussion posted to Rhizome’s Vimeo page. Although the audio was terrible, the discourse was quite interesting. At the heart of the matter was defining “code literacy”: is it a scientific or technical goal, focusing on engineering and programming aspects, or is the objective humanistic, centering on the idea that code is everywhere in our modern lives and that we have the power to direct our own futures, digital or otherwise?

There was no clear answer, of course; it was more a point of reflection on digital technology and DH overall. As we worked through the week in Digitization Fundamentals, we learned technical skills and we also learned how to use those skills for creative and meaningful production. Balancing these two facets of code literacy, the scientific and the humanistic, will remain a central feature of our digital projects to come.

Journal 4 – Digitization Fundamentals

A study in video compression: one video of a red moving ball shown in three different video formats. In this photo, the red ball in the lower left shows fewer artifacts than the other two. However, its higher quality and file size make it a more difficult video to share online.

On the fourth day of class we moved into video editing. This was an interesting class because video editing seems more approachable than working with audio, but its processing and distribution are also more complex. Frame rates, variable screen sizes, color reproduction, and compression of data are just some of the variables that affect video production. Video is also a very powerful medium, in that it captures the mind in a way that other media might not.

Video production is a time-consuming process with many stages of development. The steps of pre-production, production or shooting, and post-production each have their own components and processes to consider. Equipment, from cameras and tripods to computers and video monitors, is necessary to ensure a quality film, not to mention actors, scripts, storyboards, as well as financing and distribution. With all of these things in mind, it’s easy to see how producing a short film would mirror many of the project management considerations within DH.

At its most basic level, video is a form of storytelling. Most videos or films are linear, with the author or director of the film taking control of the form and movement of the narrative. There are some new tools for non-linear storytelling with video, such as Korsakow, but unfortunately Korsakow relies on Flash, which is rapidly being replaced by HTML5 video on the web. YouTube makes the distribution of video seem easy, but the process is actually quite complicated. Films must be compressed with a codec to bind clips and audio together, and they must also be a small enough file size to be streamed and shared.

Our final class project for the show and tell was a video produced by the entire class. I contributed my audio file, a song titled “Ice Cream Dubstep,” which I made in class. Other students worked on filming the class itself, some filmed students for individual interviews, other students also made audio clips, and two students, Heather and Rachel, worked together to edit all of these disparate artifacts into one final video.

Digital Imaginary – Tweets from Session 1 Colloquium, Week 1

Tweets regarding my DHSI Colloquium talk on my dissertation research (The Digital Imaginary), “Sharing the Digital Imaginary: Dissertation Blogging and the Companion Website” (colloquium schedule PDF archive):