From Daily log

Journal 5 – Professionalizing the Early Career Digital Humanist

On the last day of class, also the last day of Week 3 DHSI 2015, we worked on our individual projects. Many in the class created new websites on WordPress.com and especially Squarespace. WordPress.com websites are easy to set up, there are many free themes, and the blogging function is pretty straightforward. For those in the class who did not want a blog but did want a more stylish web presence, Squarespace was the favorite. There is no free tier for Squarespace, but the monthly cost is only around $8, or less with education pricing. Squarespace also has blogging features, but most of the students in the class who went with Squarespace chose to make a static website.

WordPress.com is free, but it also places ads on your site. These ads are seen by a small percentage of visitors, and they help WordPress.com cover expenses. For academics entering the job market, though, these ads can make the site look a little unprofessional. Since it’s difficult to tell who will see the ads and when (possibly a hiring committee, who might only visit the site once), it’s best to pay for an ad-free website, about $30 yearly.

Many of the Squarespace templates were clean and crisp. Students in the class really liked the clean modern design of Squarespace, but some found the customization options a little tricky. One of the students emailed Squarespace for help during class, and she received a reply within minutes. Such great customer service is another compelling reason to go with Squarespace.

Later in the afternoon we worked on grants and other types of project proposals. Some of us wrote proposals on the spot, while others refined documents they had already been working on. We used Dr. Karen’s Foolproof Grant Template, focusing on the transition between the initial “large general topic of wide interest” and the “gap in knowledge” section. This transition can make or break a grant proposal — it’s in the beginning of the document and it must be compelling and informative in order to catch the eye of the funding committee.

drewbarker.info webpage
One of the students, Drew Barker @drewNblue, built his Squarespace website in class with his own photos. He even purchased a new domain name, http://drewbarker.info, in order to professionalize his web presence.

Journal 4 – Professionalizing the Early Career Digital Humanist

Grant writing was the subject of our discussion this morning, and we focused on the professional and social networks that grant information flows through. Our class has a mix of people from the United States and Canada, so we covered both US and Canadian grant agencies.


Canadian funding agencies

  • Social Sciences and Humanities Research Council (SSHRC)
  • Canada Foundation for Innovation (CFI)
  • Canadian Heritage
  • Province-specific funding
  • Canada Council for the Arts

US funding agencies

  • National Endowment for the Humanities, Office of Digital Humanities (NEH ODH)
  • National Science Foundation (NSF)

Private Funding Organizations

  • Andrew W. Mellon Foundation
    • Scholarly Communication
    • Higher Education and Scholarship in the Humanities
  • MacArthur Foundation
    • Digital Media & Learning
  • Alfred P. Sloan Foundation
  • Wellcome Trust

Things to think about when proposing a DH project grant:
Audience
– the people receiving and deciding on the grant might not be steeped in the digital humanities or DH culture, so be sure to explain things clearly and show the wider picture of the research

Interdisciplinarity
– not just inter-humanities; the sciences should be involved too, as a full partner and not just as some type of technical support

Collaboration
– this can extend beyond the university or academia itself, or perhaps across institutions, perhaps even in different countries

Explaining equipment and space needs
– some items on the grant might not be approved or funded, but if these items are essential, be sure to include clear reasoning for the equipment or space: how it will be used, how long it will be needed, and what will happen to the equipment at the end of the grant/research period

Making budgets seem reasonable
– the goal isn’t so much to make the budget small as to make sure that all aspects of the project have been considered; this requires collaboration at the very beginning of the writing process


We also talked about “Dr. Karen Kelsky’s Foolproof Grant Template” at http://theprofessorisin.com. The structure of the proposal is clear, concise, and very persuasive. It’s a must-have formula for constructing grant proposals (as well as a thesis, abstract, or prospectus).

Dr. Karen Kelsky’s Foolproof Grant Template at http://theprofessorisin.com

Journal 3 – Professionalizing the Early Career Digital Humanist

We dove into online presence and cyberinfrastructure today, starting with an overview of WordPress and Squarespace as possibilities. WordPress has two separate paths: either WordPress.com (which is a free service, with premium features available for a fee), or WordPress.org (which is the open source engine of WordPress, where you download and host your own installation). With WordPress.org we could run the installation on a host like asmallorange.com or reclaimhosting.com. With the power of a self-hosted WordPress install, though, comes a lot of responsibility. In addition to choosing a theme, designing pages and posts, and uploading content, there are other things to worry about, like storage space and bandwidth. WordPress.org sites also need other things, like backup and malware protection. Running your own self-hosted WordPress site can be rewarding, but it also requires a high investment of time.

Squarespace was another great solution for hosting our online presence, and domain names can be purchased there as well. Squarespace has a blogging feature, just like WordPress, but it can also be extended in many other ways, such as adding an online storefront. With Squarespace, the control panel is within the site itself, so changes can be made right in the browser with the site live on the right-hand side of the screen. WordPress has been moving in this direction as well, where sites can be redesigned live with its new Customizer feature. Squarespace also has an education discount of 50%, available at: http://squarespace.com/students

Along with creating a personal/academic website, we also discussed other methods of sharing scholarly work. Some of these options were posting audio clips to SoundCloud, creating a podcast, making a short video, and participating in a three-minute dissertation contest. The competition might go by different names according to the school; UC Riverside calls it “Grad Slam.” Creating a blog about your dissertation can be really helpful for working through the project, and also for gaining an audience. However, there are many caveats, as dissertation committees and future book publishers can have strict rules about the process, so it’s always best to check around before starting the project. A few notable digital dissertations were Amanda Visconti’s digital dissertation: http://dr.amandavisconti.com, Kathleen Fitzpatrick’s project for Planned Obsolescence: http://mcpress.media-commons.org/plannedobsolescence/ and Dani Spinosa’s project, Generic Pronoun: http://genericpronoun.com

reclaim hosting screenshot
Reclaim Hosting is a great way to go for personal and professional cyberinfrastructure. It’s only $25 a year for 2GB of storage and unlimited bandwidth.

Journal 2 – Professionalizing the Early Career Digital Humanist

This morning we presented the elevator speeches we worked on yesterday afternoon to each other in small groups. One idea we had about this type of one-on-one or small group conversation was to layer the information so that it’s easy to digest. For example, each sentence could provide a little bit of information about the project, beginning with a basic overview of time and place, and then touching on more complex ideas. Reading the elevator speech versus hearing it on the spot required different types of communication. We found that the ideas written down were clear, but when delivered verbally they could be too dense or heavy with information for a quick listen. Conversely, if the elevator speech is only set to be delivered in person verbally, it might be too informal for a more serious academic meeting. It’s important to hone the elevator speech with written and verbal practice, also considering body language and other nonverbal social cues from the listener.

The afternoon was a discussion of social media, and how we can best present ourselves as academics online. One of the problems many people face with online identities is that links or URLs can change over the years. An option that some researchers are using is a DOI (Digital Object Identifier). The DOI is a static reference that lives above the level of the domain name, and it can be used as a permanent location for links that might change over time. For example, if you had an important blog post at “example.com/p=25” but the URL was changed to another URL, like “example.com/june-2015/great-blog-post,” a DOI could be used to provide a permanent non-changing link. Many journals and libraries are using the CrossRef system to manage URLs, which helps prevent broken links or link rot. The ORCID (“orchid”) service provides a similar kind of persistent identifier for individual researchers. Funding agencies such as the NIH and NSF are integrating ORCID to link researchers with information about their grants. This will help reduce the paperwork overhead of applying for and tracking grants, as information can be entered into ORCID once and then applied to each particular project: https://orcid.org/blog/2014/02/19/link-your-orcid-record-your-funding
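The indirection that a persistent identifier provides can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how doi.org or CrossRef is actually implemented: the `Resolver` class and the identifier `10.1234/example-post` are hypothetical, and the URLs are the placeholder ones from the paragraph above.

```python
class Resolver:
    """Toy stand-in for a DOI resolver: maps a fixed identifier to a current URL."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier, url):
        # Registering the same identifier again updates its target;
        # the identifier itself never changes.
        self._targets[identifier] = url

    def resolve(self, identifier):
        return self._targets[identifier]


resolver = Resolver()
doi = "10.1234/example-post"  # hypothetical identifier, for illustration only

resolver.register(doi, "http://example.com/p=25")
first = resolver.resolve(doi)

# The blog is reorganized and the post moves; only the registry entry changes.
resolver.register(doi, "http://example.com/june-2015/great-blog-post")
second = resolver.resolve(doi)

print(first)
print(second)
```

Anyone who cited the identifier rather than the URL would still reach the post after the move, which is the property that guards against link rot.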

ORCID example
This image shows how ORCID information can travel between grant winners, funding agencies, and publishing groups without having to enter information multiple times.

Journal 1 – Professionalizing the Early Career Digital Humanist

After introductions in the morning we started discussing what it means to be a “professional” digital humanist, and how we present ourselves online. Online presence is something many cringe about, but in this modern digital age we’re all already online, somewhere, for something. What that “something” is might be decided by others, unless you curate your own digital presence. We need to Google ourselves and see how we’re presented online. Ideally this is done from a computer you’ve never used or a private/incognito browser window. On your own computer the search results might be tailored to you specifically, and it would be best to see your results as others see them. Ranking near the top of page one on Google can be really important for finding jobs, as committees are very likely to do some research about you online. Making sure that you’re visible and presented in the best light requires some work with building online profiles, adding photographs of yourself, and also taking a look at search engine optimization (SEO).

Many academics have a sense that tenure track positions are the only ones to shoot for, but “alt ac” careers can be rewarding as well, possibly with better hours and pay. As early career digital humanists we’re uniquely positioned to work in a variety of industries. Keeping ourselves open to possibilities outside or perhaps alongside academia is important. The vast majority of PhDs are able to get work, but only a small sliver finds a tenure track position. Even with a tenure track job there’s no certainty that tenure will be achieved. Moving forward as digital humanists we should keep our research front and center, and also look toward the horizon for new developments in social media and digital technology.

In the later afternoon we started working on our “elevator speech,” which is a short and precise verbal communication about a major project you’re working on. The idea behind the elevator speech is that if you found yourself in an elevator with a person offering a job, you could quickly explain yourself and your work before the doors open at the next floor. The elevator speech idea comes from the business world, but it’s also very helpful for explaining digital humanities projects or dissertation research in a clear, concise, and friendly way — definitely a good skill for grant writing too.

google search screenshot
This Google Search screenshot for “Steve Anderson” doesn’t show me at all. Using my full name – Steven Gordon Anderson – would help, along with other specific information, like my university (UC Riverside) or my hometown (La Verne, CA).
google search via vpn screenshot
This Google Search was done from DHSI in Victoria, using the VPN service from the UC Riverside Library. The Canadian search results are gone since the VPN makes it appear as though I’m in the United States. I’m still nowhere to be seen though — time for some SEO work.

Journal 5 – RDF and Linked Open Data

Discussion today included the difficulties of teaching coding and computer literacy. Graphical user interfaces (GUIs) have enabled non-programmers to become more familiar with computers and their possibilities. However, these GUIs (sometimes pronounced “gooey”) also change frequently depending on the application and its design aesthetics. Innovations in hardware, such as the trackpad and touch screens, can also affect computer literacy and usage. The command line interface (CLI) seems to be more timeless, with the logic of the program perhaps becoming more apparent through the typed commands themselves. Our instructor Jim mentioned that people learning to code for the first time have often spent years writing and communicating in academia or business, with sentences as the basic structure. For some though (including myself), writing sentences for computers instead of people is terribly difficult, despite the many similarities in structure and language. Programming classes for humanists, whether RDF and Linked Open Data, Ruby on Rails, or Python, can help scholars learn the language and logic of computers, which helps in using our own devices in new ways and also for understanding how social media and the wider Internet operate.

RDF Graph example worksheet for The Canadian Women’s World Cup team.

Our class project for the week was a demonstration of RDF and Linked Open Data that was hand-written on construction paper. We chose the Women’s World Cup as our subject, with the Canadian team as our primary node. From the Canadian team outwards, we connected various bits of information in a series of triples, or three-part data references. These triples were then written in Turtle to abbreviate the code. We used dbpedia.org as our authoritative ontology, circling the nodes in blue that would reference its database. For example, in the triple [“Canadian team” (subject) <--> “sponsored by” (predicate) <--> “Umbro” (object)], Umbro would link out to its dbpedia.org page: http://dbpedia.org/page/Umbro. This gives the Umbro node a definitive reference, and the other information on the page, such as Umbro’s website, their brands, and location, would also be accessible through the link. A query could then be run: “Is the Canadian women’s soccer or futbol team sponsored by any European companies?” Even though “Europe” was not on our node worksheet, the dbpedia.org reference would allow an extension of the query into the wider web, resulting in an answer of, “Yes, by Umbro in the UK.” This is a simple example of linked data, but with the further extensions provided by authoritative ontologies, more complex queries would certainly be possible.
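The query described above can be sketched with plain Python tuples standing in for triples. This is only an illustration, not how a real SPARQL query against dbpedia.org would be written: the predicate names are made up for the sketch, and the `external` list is a hand-entered stand-in for the facts that following the Umbro link would supply.

```python
# Each triple is (subject, predicate, object). Names are illustrative only.
worksheet = [
    ("Canadian team", "competes in", "Women's World Cup"),
    ("Canadian team", "sponsored by", "Umbro"),
]

# Stand-in for facts that would come from following the dbpedia.org link
# for Umbro; entered by hand here so the sketch runs offline.
external = [
    ("Umbro", "located in", "UK"),
    ("UK", "part of", "Europe"),
]


def objects(triples, subject, predicate):
    """Return every object matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def european_sponsors(team, triples):
    """Find sponsors of `team` whose location is part of Europe."""
    found = []
    for sponsor in objects(triples, team, "sponsored by"):
        for place in objects(triples, sponsor, "located in"):
            if "Europe" in objects(triples, place, "part of"):
                found.append((sponsor, place))
    return found


alone = european_sponsors("Canadian team", worksheet)
linked = european_sponsors("Canadian team", worksheet + external)
print(alone)   # the worksheet by itself cannot answer the question
print(linked)  # the linked data makes the answer reachable
```

The worksheet never mentions Europe, so the first call finds nothing; only after the external facts are merged in does the Umbro answer appear.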

RDF triples in Turtle composed of the Graph information in the image above.

Journal 4 – RDF and Linked Open Data

This morning we discussed some steps to open up data. These suggestions were distilled from the Open Knowledge Foundation’s Open Data Handbook by instructor Jim Smith: https://www.dhdata.org/dhdata/datasets/1-how-to-open-up-data/index.html

One of the main points was that DH projects need to start small, and there should be a series of steps or stages for development, even in the smallest projects. Part of this idea of starting small includes limiting the size of the initial dataset. Small datasets can still be helpful to wider communities. If the dataset isn’t moving in the correct direction, its small size can allow for redirection before too much time has been invested.

The dataset for Theatre Finder (a DH project about historic theaters) is written in JSON, but it could be altered to be JSON-LD without destroying or having to rewrite the original database. Coding the dataset as JSON-LD would make the information in Theatre Finder much more usable, connecting it to the wider networks of linked data across the web. Theatre Finder is a good candidate for becoming an authority dataset, or ontology. Theatre Finder didn’t begin with this in mind, but the dataset information is comprehensive and is widely used.
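As a rough sketch of why the original data would not need rewriting, a JSON-LD `@context` can be layered on top of an existing JSON record. The record below is hypothetical (the real Theatre Finder field names and values may differ), and the choice of FOAF and DBpedia ontology terms is an assumption made for illustration.

```python
import json

# Hypothetical Theatre Finder record; the real field names may differ.
record = {
    "name": "Example Theatre",
    "country": "Example Country",
    "city": "Example City",
}

# A JSON-LD context maps the existing keys to IRIs. These vocabulary
# terms are one possible choice, assumed here for illustration.
context = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "country": "http://dbpedia.org/ontology/country",
    "city": "http://dbpedia.org/ontology/city",
}


def to_json_ld(plain, ctx):
    """Return a copy of `plain` with an @context added; the original is untouched."""
    linked = {"@context": ctx}
    linked.update(plain)
    return linked


linked_record = to_json_ld(record, context)
print(json.dumps(linked_record, indent=2))
```

The keys and values of the original record are carried over unchanged; only the added `@context` tells a machine what each key means.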

It’s important to consider that database ontologies are generally created by the people who use and need the data on a regular basis. Before creating a dataset, DH professionals need to ask the question, “Who can and will use this data?” This question is important, because these groups will also be the ones to contribute to the dataset. Stability is also a factor of community involvement. Datasets are constantly changing, not just by adding information to them, but also by reconsidering definitions and vocabularies in line with social and cultural change.

Example data from Theatre Finder. The information under “Overview,” such as country and city, could be connected to dbpedia.org as linked data.

Journal 3 – RDF and Linked Open Data

This morning we worked with Turtle(s?), a nickname for Terse RDF Triple Language. Turtle allows the long text of complex triples to be written in an abbreviated format. For the purposes of the class, or at least my own benefit as a beginner with RDF, writing out the triples in long form is best. Once the entire triple is there in its extended form, it’s easier to see the connection between the full triple and the abbreviated Turtle version of the triple. Going over Turtle this morning was helpful because it made us think about the triples we were using, their composition, and how they could be better structured.
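The relationship between the long form and the abbreviated form can be sketched as a small string transformation. This is a simplified illustration rather than a Turtle serializer: the `ex:` vocabulary at example.org is hypothetical, and the triple reuses the Umbro example from the class project.

```python
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "ex": "http://example.org/terms/",  # hypothetical vocabulary
}


def abbreviate(iri):
    """Shorten a full IRI to prefix:local form if a known prefix matches."""
    for prefix, base in PREFIXES.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return f"<{iri}>"


def long_form(triple):
    """Write the triple with every IRI spelled out in angle brackets."""
    return " ".join(f"<{term}>" for term in triple) + " ."


def turtle_form(triple):
    """Write the same triple using the declared prefixes."""
    return " ".join(abbreviate(term) for term in triple) + " ."


triple = (
    "http://example.org/terms/CanadianTeam",
    "http://example.org/terms/sponsoredBy",
    "http://dbpedia.org/resource/Umbro",
)

print(long_form(triple))
for prefix, base in PREFIXES.items():
    print(f"@prefix {prefix}: <{base}> .")
print(turtle_form(triple))
```

Seeing both lines side by side makes it clear that the short version carries exactly the same three IRIs, with the prefixes declared once at the top.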

RDF (Resource Description Framework) is not a programming language, so there’s no way for it to throw an error if something is missing or incorrect. An error message could come later from another program that can’t find information, or if information isn’t presented as expected. However, this delay can make working with RDF triples a bit tricky.
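A minimal sketch of that delayed failure, using plain tuples as stand-in triples: the predicate names are hypothetical, and the typo is deliberate.

```python
triples = [
    ("CanadianTeam", "competesIn", "WomensWorldCup"),
    # Typo in the predicate, yet this is still a well-formed triple.
    ("CanadianTeam", "sponseredBy", "Umbro"),
]


def sponsors_of(team, data):
    """A consuming program that expects the predicate 'sponsoredBy'."""
    return [o for s, p, o in data if s == team and p == "sponsoredBy"]


sponsors = sponsors_of("CanadianTeam", triples)
print(sponsors)  # empty: nothing complained when the data was written or read

try:
    first_sponsor = sponsors[0]
except IndexError:
    first_sponsor = None
    print("The problem only surfaces here, in the consuming program.")
```

Nothing about the data itself is invalid, which is why the mistake is easy to miss until another program depends on it.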

JSON-LD is a way to use both JSON (JavaScript Object Notation) and Linked Data. Despite the name, JSON no longer depends on JavaScript, although it began as part of that language. JSON is relatively new, used by tech startups, and its popularity is growing because of the many web applications being developed. JSONLint is a validator for JSON, and the playground at json-ld.org can be used for further testing. JSON-LD is designed to build on existing APIs (Application Programming Interfaces) in semantic ways to make them more usable and data rich. This enhancement isn’t for human readability, but for machines, so that APIs are more easily accessed and actionable.
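A validator like JSONLint checks syntax only. A rough local equivalent, assuming Python's standard `json` module is an acceptable substitute, might look like the sketch below; the `check_json` helper and the sample strings are made up for illustration, and this does not check JSON-LD semantics.

```python
import json


def check_json(text):
    """Return (True, parsed) if `text` is valid JSON, else (False, message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"


good = '{"name": "Example Theatre", "city": "Example City"}'
bad = '{"name": "Example Theatre", "city": }'  # value missing after the colon

print(check_json(good))
print(check_json(bad))
```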

JSON-LD is a relatively new way of transporting data. Its initial development began in 2010, and it became a W3C Recommendation as of January 2014. There’s quite a bit of controversy surrounding JSON-LD, but it’s not really about the method or the technical specifications. Instead, the argument is whether JSON-LD is for the Semantic Web (human readable) or for API enhancement (machine readable). On the surface the discourse surrounding JSON-LD might appear to be trivial, only a matter for technical debate. The deeper question, though, is how information is represented in digital space, and what role people have as developers and consumers in our modern and machine-actionable world.

This example of Turtle at w3.org contains errors; it should read ( :a :b :c ) – no spaces between the colon and the b & c values.

Journal 2 – RDF and Linked Open Data

In the morning today we talked about RDF and how its data is composed. RDF is about sharing and exchanging information, but not necessarily about sharing the tools to interpret the information. RDF can be like NoSQL in that it’s flexible: just add more properties. When the project becomes more mature, though, things need to be locked down and standardized. Eventually, the information about “blank node” connections would need to be published so that all connections can be clear outside the project.

An informal graph of sample triples by W3C: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/

In the afternoon we worked with markers and construction paper, laying out a physical example of the materials we could work with. In our case this was ancient pottery, and also the variables or attributes of possible pottery fragments. We had a specialist in our group, an academic who works with ancient Mediterranean pottery fragments, and she was able to give us a wide variety of attributes. Each fragment of pottery has multiple data points, such as shape, type, place found, date of creation, and type of glaze. Each of these attributes requires further deconstruction; place, for example, requires both a name as a string value or text, and also a latitude and longitude value that’s numeric and geographical. RDF information needs to be very granular and specific. For example, price is not just a dollar amount but two specifications: the dollar amount field would be a number reference, and there would also be a currency reference that points to a web-hosted ontology.
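The kind of decomposition described here can be sketched by breaking one record into separate triples. Everything in the record is invented for illustration, including the predicate names and the example.org currency IRI, which stands in for a term from a web-hosted ontology.

```python
# Hypothetical fragment record as a person might note it down.
fragment = {
    "id": "fragment1",
    "place_name": "Example Site",
    "latitude": 0.0,
    "longitude": 0.0,
    "price_amount": 100.0,
    "price_currency": "http://example.org/currency/USD",  # stand-in ontology term
}


def to_triples(rec):
    """Break one record into granular triples with separate place and price nodes."""
    frag = rec["id"]
    place = f"{frag}/place"
    price = f"{frag}/price"
    return [
        (frag, "foundAt", place),
        (place, "name", rec["place_name"]),          # string value
        (place, "latitude", rec["latitude"]),        # numeric value
        (place, "longitude", rec["longitude"]),      # numeric value
        (frag, "hasPrice", price),
        (price, "amount", rec["price_amount"]),      # number
        (price, "currency", rec["price_currency"]),  # reference, not free text
    ]


fragment_triples = to_triples(fragment)
for triple in fragment_triples:
    print(triple)
```

A single "place" or "price" field becomes several statements, each with one clearly typed value.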

The value of “glaze” would point to an additional table containing information such as the elemental makeup and the percentage of each element contained within the fragment. It’s possible to use the RDF triple (subject<-->predicate<-->object) of glaze<-->element<-->percentage, but this would not necessarily be machine readable. People could understand that the percentage was a feature of the element, but machines/computers might get stuck at the element value. It’s not certain that machines would read an element and then also look for a percentage; most often the machine reading would stop at the element itself. If a blank node were used, perhaps titled “has components,” then this blank node could point to both the element and the value. This would relate the element and the percentage together without requiring one value to be privileged over the other. Using the title of “has components” would also make this blank node understandable for people.
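One way to picture the blank node is as an unnamed intermediate node that holds the element and its percentage together. In the sketch below, “has components” is modeled as the predicate leading to the blank node, which is one reading of our worksheet; the composition values are invented for illustration.

```python
import itertools

_counter = itertools.count(1)


def blank_node():
    """Return a fresh blank-node label such as '_:b1'."""
    return f"_:b{next(_counter)}"


def glaze_triples(glaze, composition):
    """Link a glaze to each (element, percentage) pair through its own blank node."""
    triples = []
    for element, percentage in composition.items():
        node = blank_node()
        triples.append((glaze, "hasComponents", node))
        triples.append((node, "element", element))
        triples.append((node, "percentage", percentage))
    return triples


# Invented composition values, for illustration only.
glaze = glaze_triples("glaze1", {"silicon": 60.0, "aluminium": 15.0})
for triple in glaze:
    print(triple)
```

Because the element and the percentage both hang off the same node, a program that reaches the element can also find its percentage, and neither value is privileged over the other.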

Our RDF Graph for the term “glaze.” On the left of “glaze” is the blank node with the name “has components.”

Journal 1 – RDF and Linked Open Data

In the morning portion of class today we analyzed and critiqued projects that have occurred over the years, including the Indiana Philosophy Ontology Project [https://inpho.cogs.indiana.edu], and the 1995 Cervantes Project [http://cervantes.tamu.edu/V2/CPI/index.html]. We discussed RDF ontologies, or vocabularies, and we looked at a few databases that house these descriptors, such as dbpedia.org. Both scientific and humanistic data have been increasing exponentially over the last decade, and the need to link these resources together in an open format is very apparent. However, many academics are unaware that the methods they use to create and distribute data are closed systems and formats, such as Word documents and PDFs. Using the frameworks for linked open data can ensure that web-based projects become connected to the scholarly record, instead of being siloed and possibly forgotten in lonely corners of the Internet.

In the afternoon we covered database types, including SQL, NoSQL, Graph, and LDAP. With each of these database types come benefits and also pitfalls, but the key takeaway is to use the database type you’re most familiar with to help get projects off the ground. SQL databases are more rigid than NoSQL, but the additional flexibility of NoSQL can help projects without a clear idea of their datasets to begin building while the initial development is still in process. RDF is itself a framework, but not a standard. This is obvious in the RDF acronym, Resource Description Framework, but the ubiquity of the term can make it appear as though RDF is fully fleshed-out and set in stone. Overall, what’s really being worked toward through RDF and Linked Open Data is to interconnect web resources in such a way that they’re beneficial for knowledge creation by humans, and this can only be done if they’re inherently readable and actionable by machines.

Screen capture of the Cervantes Project showing multiple problems, including character encoding.
Screen capture from the Shelley-Godwin Archive, where images and text match, and there are also additional viewing options.