From RDF and Linked Open Data

Journal 2 – RDF and Linked Open Data

In the morning today we talked about RDF and how its data is composed. RDF is about sharing and exchanging information, but not necessarily about sharing the tools to interpret the information. RDF can be like NoSQL in that it’s flexible: to describe something new, just add more properties. When the project becomes more mature, though, things need to be locked down and standardized. Eventually, the information about “blank node” connections would need to be published so that all connections are clear outside the project.

An informal graph of sample triples by W3C: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/

In the afternoon we worked with markers and construction paper, laying out a physical example of the materials we could work with. In our case this was ancient pottery, along with the variables or attributes of possible pottery fragments. We had a specialist in our group, an academic who works with ancient Mediterranean pottery fragments, and she was able to give us a wide variety of attributes. Each fragment of pottery has multiple data points, such as shape, type, place found, date of creation, and type of glaze. Each of these attributes requires further deconstruction. Place, for example, requires both a name as a string or text value and latitude and longitude values that are numeric and geographical. RDF information needs to be very granular and specific. A price, for example, is not just a dollar amount but two specifications: the amount would be a numeric value, and the currency would be a reference pointing to a web-hosted ontology.
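To make the idea of granularity concrete for myself, here is a minimal sketch in plain Python (no RDF library) of how a place and a price might each be broken into separate statements. Everything in it is an assumption for illustration: the `http://example.org/pottery/` namespace, the property names, and the sample values are made up and do not come from our group's paper model.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: Union[str, float]


# Hypothetical namespace, invented for this sketch only.
EX = "http://example.org/pottery/"

# All identifiers and values below are made-up sample data.
triples = [
    Triple(EX + "fragment1", EX + "shape", "amphora"),
    Triple(EX + "fragment1", EX + "foundAt", EX + "place1"),
    # The place is its own resource with a text name and numeric coordinates.
    Triple(EX + "place1", EX + "name", "Athens"),
    Triple(EX + "place1", EX + "latitude", 37.98),
    Triple(EX + "place1", EX + "longitude", 23.73),
    # A price is split into a numeric amount and a currency reference.
    Triple(EX + "fragment1", EX + "price", EX + "price1"),
    Triple(EX + "price1", EX + "amount", 250.0),
    Triple(EX + "price1", EX + "currency", EX + "currency/USD"),
]


def describe(subject: str) -> dict:
    """Collect every property stated about one subject, keyed by short name."""
    return {
        t.predicate[len(EX):]: t.obj
        for t in triples
        if t.subject == subject
    }
```

Calling `describe(EX + "place1")` gathers the name, latitude, and longitude as three separate values, and `describe(EX + "price1")` keeps the amount and the currency reference apart instead of folding them into one string like "$250".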

The value of “glaze” would point to an additional table containing information such as the elemental makeup and the percentage of each element contained within the fragment. It’s possible to use an RDF triple (subject -> predicate -> object) of glaze -> element -> percentage, but this would not necessarily be machine readable. People could understand that the percentage was a feature of the element, but machines might get stuck at the element value. It’s not certain that a machine would read an element and then also look for a percentage; most often the machine reading would stop at the element itself. If a blank node were used, perhaps labeled “has components,” then this blank node could point to both the element and the percentage. This would relate the element and the percentage to each other without requiring one value to be privileged over the other. Using the label “has components” would also make this blank node understandable for people.
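Here is a rough sketch of the blank node idea in plain Python, again without an RDF library. One assumption to flag: in this sketch “has components” is modeled as the predicate that leads from the glaze to an unnamed node, and that node then carries the element and the percentage side by side. The namespace, property names, and the sample composition are hypothetical.

```python
import itertools

# Hypothetical namespace, invented for this sketch only.
EX = "http://example.org/pottery/"

_counter = itertools.count()


def new_blank_node() -> str:
    """Return a fresh blank-node label such as '_:b0', local to this graph."""
    return f"_:b{next(_counter)}"


def glaze_components(glaze: str, composition: dict) -> list:
    """Build triples linking a glaze to one blank node per element."""
    triples = []
    for element, percent in composition.items():
        node = new_blank_node()
        triples.append((glaze, EX + "hasComponent", node))
        # Element and percentage hang off the same node, so neither
        # value is privileged over the other.
        triples.append((node, EX + "element", element))
        triples.append((node, EX + "percentage", percent))
    return triples


def read_components(graph: list, glaze: str) -> dict:
    """Follow each hasComponent link and pair the element with its percentage."""
    result = {}
    for subj, pred, node in graph:
        if subj == glaze and pred == EX + "hasComponent":
            props = {p: o for s, p, o in graph if s == node}
            result[props[EX + "element"]] = props[EX + "percentage"]
    return result


# Made-up sample composition, not measured data.
graph = glaze_components(EX + "glaze1", {"silicon": 60.0, "lead": 25.0})
```

A program that follows the `hasComponent` link lands on the blank node and finds both values there, which is the point of the exercise: `read_components(graph, EX + "glaze1")` pairs each element with its percentage without having to guess that one belongs to the other.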

Our RDF Graph for the term “glaze.” On the left of “glaze” is the blank node with the name “has components.”

Journal 1 – RDF and Linked Open Data

In the morning portion of class today we analyzed and critiqued projects from over the years, including the Indiana Philosophy Ontology Project [https://inpho.cogs.indiana.edu] and the 1995 Cervantes Project [http://cervantes.tamu.edu/V2/CPI/index.html]. We discussed RDF ontologies, or vocabularies, and we looked at a few databases that house these descriptors, such as dbpedia.org. Both scientific and humanistic data have been increasing exponentially over the last decade, and the need to link these resources together in an open format is very apparent. However, many academics are unaware that the methods they use to create and distribute data rely on closed systems and formats, such as Word documents and PDFs. Using the frameworks for linked open data can ensure that web-based projects become connected to the scholarly record, instead of being siloed and possibly forgotten in lonely corners of the Internet.

In the afternoon we covered database types, including SQL, NoSQL, Graph, and LDAP. With each of these database types come benefits and also pitfalls, but the key takeaway is to use the database type you’re most familiar with to help get projects off the ground. SQL databases are more rigid than NoSQL, but the additional flexibility of NoSQL can help projects without a clear idea of their datasets to begin building while the initial development is still in process. RDF is itself a framework, but not a standard. This is obvious in the RDF acronym, Resource Description Framework, but the ubiquity of the term can make it appear as though RDF is fully fleshed-out and set in stone. Overall, what’s really being worked toward through RDF and Linked Open Data is to interconnect web resources in such a way that they’re beneficial for knowledge creation by humans, and this can only be done if they’re inherently readable and actionable by machines.

Screen capture of the Cervantes Project showing multiple problems, including character encoding.
Screen capture from the Shelley-Godwin Archive, where images and text match and additional viewing options are available.