This morning we worked with Turtle, short for Terse RDF Triple Language. Turtle allows the long text of complex triples to be written in an abbreviated format. For the purposes of the class, or at least for my own benefit as a beginner with RDF, writing out the triples in long form is best. Once the entire triple is there in its extended form, it’s easier to see the connection between the full triple and the abbreviated Turtle version. Going over Turtle this morning was helpful because it made us think about the triples we were using, their composition, and how they could be better structured.
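As a rough illustration (the subjects and properties here are made up, not the ones we used in class), the same two triples can be written out in full and then abbreviated in Turtle:

```turtle
# Long form: every subject, predicate, and object is a full IRI,
# and each triple is written out on its own.
<http://example.org/book/moby-dick> <http://purl.org/dc/terms/title> "Moby-Dick" .
<http://example.org/book/moby-dick> <http://purl.org/dc/terms/creator> <http://example.org/person/melville> .

# Turtle abbreviations: @prefix shortens the IRIs, and a semicolon
# lets several predicates share the same subject.
@prefix ex:  <http://example.org/book/> .
@prefix dct: <http://purl.org/dc/terms/> .

ex:moby-dick dct:title   "Moby-Dick" ;
             dct:creator <http://example.org/person/melville> .
```

Seeing both forms side by side makes it clear that the Turtle version says exactly the same thing, just with the repetition factored out.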
RDF (Resource Description Framework) is not a programming language, so there’s no way for it to throw an error if something is missing or incorrect. An error message could come later from another program that can’t find information, or if information isn’t presented as expected. However, this delay can make working with RDF triples a bit tricky.
JSON-LD is a relatively new way of transporting data. Its initial development began in 2010, and it became a W3C Recommendation in January 2014. There’s quite a bit of controversy surrounding JSON-LD, but it’s not really about the method or the technical specifications. Instead, the argument is whether JSON-LD is for the Semantic Web (human readable) or for API enhancement (machine readable). On the surface the discourse surrounding JSON-LD might appear trivial, only a matter for technical debate. The deeper question, though, is how information is represented in digital space, and what role people have as developers and consumers in our modern, machine-actionable world.
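For a sense of what the format actually looks like, here is a minimal, illustrative JSON-LD document (the vocabulary and values are invented for this sketch). The "@context" maps the friendly keys onto full IRIs, which is what turns ordinary-looking JSON into RDF triples:

```json
{
  "@context": {
    "name": "http://schema.org/name",
    "homepage": { "@id": "http://schema.org/url", "@type": "@id" }
  },
  "@id": "http://example.org/person/ada",
  "name": "Ada Lovelace",
  "homepage": "http://example.org/~ada"
}
```

This dual nature is exactly what fuels the debate: to a person it reads like plain JSON from any API, while to a linked-data processor it is a set of triples about the resource identified by "@id".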
I’ve been watching Wes Anderson’s film Rushmore (1998) over the last couple of days. I’m enjoying the movie so far, but not so much for Rushmore itself. It’s more so that Rushmore is an early example of how later films like The Life Aquatic with Steve Zissou (2004), Moonrise Kingdom (2012), and The Grand Budapest Hotel (2014) came to be. In these later films the storytelling moves from great in The Life Aquatic, to masterful in The Grand Budapest Hotel. Especially in The Grand Budapest Hotel, the storytelling is nuanced, elegant, and dainty.
Moonrise Kingdom and The Grand Budapest Hotel both focus on societal outcasts, those least able to represent themselves within bureaucratic structures. Zero is a stateless bellboy who falls helplessly in love with the fearless Agatha in The Grand Budapest Hotel. In Moonrise Kingdom the resourceful Sam and the indomitable Suzy also discover love. Both of these couples struggle because of political and social systems that are difficult to maneuver and unfair in their judgement. All four of these characters are excluded from society in some way for being different. Yet, as couples in love they are helped by their parents or mentors to become whole, both individually and together. The journey for these characters is not toward some sort of prescribed normalcy, but rather toward acceptance and inclusivity. As Sam says in Moonrise Kingdom, “poems don’t always have to rhyme, you know.”
This morning we talked about RDF and how its data is composed. RDF is about sharing and exchanging information, but not necessarily about sharing the tools to interpret that information. RDF can be like NoSQL in that it’s flexible: just add more properties. When a project matures, though, things need to be locked down and standardized. Eventually, the information about “blank node” connections would need to be published so that all connections are clear outside the project.
In the afternoon we worked with markers and construction paper, laying out a physical example of the materials we could work with. In our case this was ancient pottery, along with the variables or attributes of possible pottery fragments. We had a specialist in our group, an academic who works with ancient Mediterranean pottery fragments, and she was able to give us a wide variety of attributes. Each fragment of pottery has multiple data points, such as shape, type, place found, date of creation, and type of glaze. Each of these attributes requires further deconstruction: place, for example, needs both a name as a string (text) value, and a latitude and longitude as numeric, geographical values. RDF information needs to be very granular and specific. A price, for example, isn’t just a dollar amount but two specifications: a numeric field for the amount, and a currency reference that points to a web-hosted ontology.
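A sketch of that granularity in Turtle might look like the following. The `ex:` properties and the specific values are hypothetical; the geographic and datatype vocabularies (WGS84, XML Schema) are real, commonly used ones:

```turtle
@prefix ex:  <http://example.org/pottery/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The find-place is both a human-readable name (a string)
# and a numeric latitude/longitude pair.
ex:fragment42 ex:placeFound ex:knossos .
ex:knossos ex:placeName "Knossos" ;
           geo:lat      "35.2980"^^xsd:decimal ;
           geo:long     "25.1630"^^xsd:decimal .

# A price is not a bare number: the amount is a typed literal,
# and the currency points at a term in a web-hosted ontology.
ex:fragment42 ex:priceAmount   "120.00"^^xsd:decimal ;
              ex:priceCurrency <http://example.org/currency/USD> .
```

The `^^xsd:decimal` tags are what tell a machine that these literals are numbers rather than undifferentiated text.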
The value of “glaze” would point to an additional table containing information such as the elemental makeup and the percentage of each element contained within the fragment. It’s possible to use the RDF triple pattern (subject → predicate → object) as glaze → element → percentage, but this would not necessarily be machine readable. People could understand that the percentage was a feature of the element, but machines might get stuck at the element value. It’s not certain that machines would read an element and then also look for a percentage; most often the machine reading would stop at the element itself. If a blank node were used, perhaps titled “has components,” then this blank node could point to both the element and the percentage. This would relate the two together without requiring one value to be privileged over the other. The title “has components” would also make the blank node understandable for people.
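In Turtle, a blank node is written with square brackets. A hypothetical version of the glaze example (all names invented for illustration) could look like this:

```turtle
@prefix ex:  <http://example.org/pottery/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Each [ ... ] is a blank node: an anonymous "has component" bundle
# that holds an element and its percentage side by side, so neither
# value is privileged over the other.
ex:glaze17 ex:hasComponent [
    ex:element    ex:silica ;
    ex:percentage "62.5"^^xsd:decimal
] , [
    ex:element    ex:alumina ;
    ex:percentage "14.2"^^xsd:decimal
] .
```

A machine following `ex:hasComponent` lands on the bundle and finds both properties attached to it, rather than stopping at the element and never discovering the percentage.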
In the morning portion of class today we analyzed and critiqued projects that have occurred over the years, including the Indiana Philosophy Ontology Project [https://inpho.cogs.indiana.edu] and the 1995 Cervantes Project [http://cervantes.tamu.edu/V2/CPI/index.html]. We discussed RDF ontologies, or vocabularies, and we looked at a few databases that house these descriptors, such as dbpedia.org. Both scientific and humanistic data have been increasing exponentially over the last decade, and the need to link these resources together in an open format is very apparent. However, many academics are unaware that the methods they use to create and distribute data are closed systems and formats, such as Word documents and PDFs. Using the frameworks for linked open data can ensure that web-based projects become connected to the scholarly record, instead of being siloed and possibly forgotten in lonely corners of the Internet.
In the afternoon we covered database types, including SQL, NoSQL, Graph, and LDAP. Each of these database types has benefits and pitfalls, but the key takeaway is to use the type you’re most familiar with to help get projects off the ground. SQL databases are more rigid than NoSQL, but NoSQL’s added flexibility lets projects without a clear picture of their datasets begin building while initial development is still in process. RDF is itself a framework, not a standard. This is obvious in the acronym, Resource Description Framework, but the ubiquity of the term can make RDF appear fully fleshed out and set in stone. Overall, the real aim of RDF and Linked Open Data is to interconnect web resources in a way that benefits knowledge creation by humans, and that can only happen if the resources are inherently readable and actionable by machines.
After a long but invigorating week of DH, Heather @HVanMouwerik, Juliette @profjuliette, and I went for a stroll down by the beach in Victoria. It was a beautiful day, the gardens were in bloom, and we happened upon a tea festival. Only in Victoria.
Our fifth and final day of class was split into two parts: a class session in the morning, and “show and tell” in the afternoon. There was also a lecture on the ethics of digital humanities research in the afternoon. In the morning we discussed the problem of copyright in digital humanities projects, and also the question of code literacy.
Copyright is certainly a tricky thing to figure out, and sadly there are no solid answers short of court judgements. There are, however, guidelines that can be followed for fair use, or fair dealing, practices. The notion that no one can really tell you what is or is not a copyright violation can have a chilling effect on academic scholarship. Important for our discussion was that academics can legally rely on fair use. Following a set of best practices can help ensure that works under copyright remain protected, while also allowing for new and innovative scholarship.
The second portion of our discussion, on code literacy, was even more contentious than the questions of fair use. The night before, we watched, or rather listened to, a roundtable discussion posted to Rhizome’s Vimeo page. Although the audio was terrible, the discourse was quite interesting. At the heart of the matter was defining “code literacy”: is it a scientific or technical goal, focused on engineering and programming, or is the objective humanistic, centered on the idea that code is everywhere in our modern lives and that we have the power to direct our own futures, digital or otherwise?
There was no clear answer, of course; the discussion was more a point of reflection on digital technology and DH overall. As we worked through the week in Digitization Fundamentals, we learned technical skills, and we also learned how to use those skills for creative and meaningful production. Balancing these two facets of code literacy, the scientific and the humanistic, will remain a central feature of our digital projects to come.
On the fourth day of class we moved into video editing. This was an interesting class because video editing seems more approachable than working with audio, but its processing and distribution are also more complex. Frame rates, variable screen sizes, color reproduction, and compression of data are just some of the variables that affect video production. Video is also a very powerful medium, in that it captures the mind in a way that other media might not.
Video production is a time-consuming process with many stages of development. The steps of pre-production, production or shooting, and post-production each have their own components and processes to consider. Equipment, from cameras and tripods to computers and video monitors, is necessary to ensure a quality film, not to mention actors, scripts, and storyboards, as well as financing and distribution. With all of these things in mind, it’s easy to see how producing a short film would mirror many of the project management considerations within DH.
At its most basic level, video is a form of storytelling. Most videos or films are linear, with the author or director of the film taking control of the form and movement of the narrative. There are some newer tools for non-linear storytelling with video, such as Korsakow, but unfortunately Korsakow relies on Flash, which is rapidly being replaced by HTML5 video on the web. YouTube makes the distribution of video seem easy, but the process is actually quite complicated. Video and audio clips must be encoded with a codec and bound together in a container format, and the resulting file must also be small enough to be streamed and shared.
Our final class project for the show and tell was a video produced by the entire class. I contributed my audio file, a song titled “Ice Cream Dubstep,” which I made in class. Other students worked on filming the class itself, some filmed students for individual interviews, other students also made audio clips, and two students, Heather and Rachel, worked together to edit all of these disparate artifacts into one final video.