Building “The Life DHSI with Steve Anderson”: Blogging and Professionalization in the Digital Humanities

The white paper below is my final assignment for Week 3, Professionalizing the Early Career Digital Humanist, as part of the Graduate Certificate in Digital Humanities at DHSI 2015.


Introduction

This summer I spent three weeks at the University of Victoria completing the Graduate Certificate in Digital Humanities. The certificate is offered by the English department at UVic in conjunction with DHSI (Digital Humanities Summer Institute) and the ETCL (Electronic Textual Cultures Lab). DHSI 2015 was the first run of the program, which requires five weeks of study. I was able to transfer in my classes from DHSI 2014 and from HILT 2014 (Humanities Intensive Learning and Teaching), which took place at the University of Maryland last August, enabling me to complete the certificate this year. Requirements for the certificate include a daily log or journal, a presentation on work-in-progress each week, and a white paper discussing each week’s material. In addition to these items, a digital project must be completed for each class. Since I was at DHSI for all three weeks this year, I made this website or blog, The Life DHSI with Steve Anderson, as my digital project, and it extended across all three classes. This digital project especially connected with my last class in week three, Professionalizing the Early Career Digital Humanist.

Professionalization and the web

One of the key takeaways of the professionalization class was the need to control our own digital presence. In order to present ourselves online in the best possible light, we first need to be findable on the web. With a common name like “Steve Anderson,” finding myself with a Google search was not as simple as it might seem. Adding more information, such as my middle name (Gordon), my university affiliation at UC Riverside, or my hometown of La Verne, California, definitely helped. The idea is to make the task of finding you online as easy as possible for job search and hiring committees. I already had a personal website at stevenganderson.org, but the domain name didn’t match how I usually represent myself, since I rarely use “Steven” or my middle initial. Through the professionalization class I changed the domain name for my personal site from stevenganderson.org to steveanderson.digital. This new “.digital” domain name is much more memorable, and also in line with my digital humanities work. With some SEO (search engine optimization), I might be able to raise my profile a little higher in a general Google search. In addition to my steveanderson.digital web presence, I also created this WordPress blog, The Life DHSI with Steve Anderson, for my DHSI classes as an homage to Wes Anderson (no relation!) and his film The Life Aquatic with Steve Zissou. Through The Life DHSI website, the journals and essays required for the Graduate Certificate in DH found a home on the web, possibly helping others attending DHSI and those interested in the DH certificate. By combining the writing for the certificate program with a new blog centered on DHSI, I increased my digital humanities web presence as well.

Finding a theme

The notion here of “finding a theme” is twofold. First, there is the need for a certain aesthetic or design for the website that is interesting and memorable. With The Life DHSI I was drawing on Wes Anderson’s films, which use carefully chosen typographical elements and interesting color palettes. Anderson’s films rely on analog and digital techniques and they focus on the art of storytelling, and these aspects are very much a part of my own work in digital humanities. I purchased the domain name thelifedhsi.org from Hover, and began searching online for graphics related to The Life Aquatic film. The second part of “finding a theme” is the template or digital design. I thought briefly about working with Drupal for this project, but I settled on WordPress since the weeks at DHSI were already so busy. I’m more familiar with WordPress, and this made it easier for me to focus on content for the site, which I would be generating on a daily basis. The theme for the site is Make Plus by The Theme Foundry ($99 yearly subscription), and the typography or web fonts are provided by Typekit which I use as part of my Adobe Creative Suite subscription ($20 monthly for students and teachers). With my (Wes) Andersonian design and Make Plus theme, I was ready to construct the website. My idea was to keep the blog fun and friendly, but also informative and professional — DHSI is a wonderful mix of these two qualities, and this is something I tried to capture in digital form. I also made sure to include Accessibility features by using Joe Dolson’s plugin that allows users to change the color, contrast, and text size as needed along with other options. Twitter is a major feature of many academic conferences these days, and especially at DHSI there is a constant stream of activity. On The Life DHSI homepage I included my Twitter handle (@sgahistory) front and center, as well as a feed of #dhsi2015 activity.

Managing content, and sanity

A year or so ago a few of the WordPress websites I was working on were hacked. All of the WordPress core data and plugins were up-to-date, so the attackers may have gained access through simple user passwords, or perhaps through out-of-date server software. These sites ran the open-source software from WordPress.org, which requires self-hosting, whereas WordPress.com sites are run by the company Automattic. One of the sites was totally destroyed, and it had to be rebuilt from scratch as the backups were infected and damaged as well. All of the hacked sites were hosted on university servers at UC Riverside, and while the hosting was free of charge for academic projects, there was little support, especially for dealing with viruses and backups. For the sites that were hacked but still online, I was able to clean them with the help of Sucuri (yearly fee of $89 per site at the time, less with multiple sites, but plans have also changed). The frustrating and time-consuming experience of cleaning, repairing, and hardening multiple WordPress sites led me to look for hosting outside of the university when possible. Many web hosts offered low prices, but much-needed services such as backup and security were either an additional cost or not available at all. Managed hosting with WP Engine has been much more reliable, with fantastic customer support as well (WP Engine helped me automatically redirect all posts at stevenganderson.org to steveanderson.digital so that no links were broken in the changing of domain names). WP Engine updates WordPress core files automatically, and they include data caching so no plugins are needed to speed up the site. Security, anti-malware, and automatic backups are included in the $29 monthly fee, and a global CDN (content delivery network) is also available (WP Engine also has yearly options, and packages for multiple installs).
Building digital projects and creating content is one thing, but managing them can be quite a chore, and sadly the time, labor, and monetary expense necessary for proper maintenance is often overlooked. Through our discussions in Professionalizing the Early Career Digital Humanist we found that while many social media and blogging platforms are free, sometimes it’s best to go with premium services. Along with WordPress and WP Engine, the online platform Squarespace also provides managed hosting for websites and blogs. Many in the professionalization class chose Squarespace to develop their first personal website ($8 a month for hosting, security, and design all in one, or even less with education pricing).
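The domain redirect mentioned above can be illustrated with a short sketch. This is not how WP Engine implemented it (that would be handled in the server configuration); it is only a minimal Python illustration of the mapping involved, where every path on the old domain is sent to the same path on the new one. The function name and the sample post path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# The two domains come from the post; everything else here is illustrative.
OLD_HOST = "stevenganderson.org"
NEW_HOST = "steveanderson.digital"


def redirect_target(url):
    """Return the new-domain URL for an old-domain URL, or None if no redirect applies."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in (OLD_HOST, "www." + OLD_HOST):
        # Keep scheme, path, query, and fragment so existing links keep working.
        return urlunsplit((parts.scheme, NEW_HOST, parts.path, parts.query, parts.fragment))
    return None


# Hypothetical post path, used only to show the mapping.
print(redirect_target("http://stevenganderson.org/2015/06/sample-post/"))
```

A web server would normally answer such a request with a permanent (301) redirect to the returned address, which is what keeps old links and search results from breaking.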

Archiving a digital project

Using a managed host like WP Engine or Squarespace is fantastic while the blog is being developed and actively maintained. Over time though, the associated costs might overwhelm a tight budget, and if new content is no longer being added it could be time to archive the website. Self-hosted WordPress installations require a MySQL database, PHP software, and a web server to host the data. With WP Engine these things are all taken care of, but when it comes time to archive the site a new strategy is needed. Making sure the website is captured by the Internet Archive is one way to make certain that the content remains online for the foreseeable future, and making local offline backups is always a good idea too (BackupBuddy by iThemes is great for this). During the professionalization class I signed up for Reclaim Hosting, a new web hosting company focusing on academic institutions and individual students and scholars, with yearly plans as low as $25! I chose the “faculty & institution” plan of $45 a year, which includes unlimited domains. The Life DHSI website could be moved to a functioning WordPress installation at Reclaim Hosting, and I could also set up test sites for working with Drupal, Omeka, Wikis, and Scalar. If The Life DHSI blog is no longer being actively updated, transforming the WordPress data into a static HTML5 website would eliminate the need for a MySQL database and also make the data more portable and secure. Static HTML5 could be hosted at GitHub for free, even with a custom domain, although the domain name would need to be paid for each year via a domain registrar. For the time being I’ll keep The Life DHSI at WP Engine, as I’d like to add posts about DHSI 2014 and also HILT 2014, and I plan on attending DHSI in the future as well.
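To make the static HTML5 idea a little more concrete, here is a minimal sketch in Python. It assumes the posts have already been exported from WordPress into a simple list of dictionaries (the `slug`, `title`, and `paragraphs` keys are hypothetical, not a WordPress export format), and it writes one plain HTML5 file per post. A real conversion would also need to carry over images, styles, and navigation.

```python
import html
from pathlib import Path


def render_post(title, paragraphs):
    """Build a small standalone HTML5 page for one post."""
    safe_title = html.escape(title)
    body = "\n".join(f"    <p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{safe_title}</title>\n"
        "</head>\n"
        "<body>\n"
        "  <article>\n"
        f"    <h1>{safe_title}</h1>\n"
        f"{body}\n"
        "  </article>\n"
        "</body>\n"
        "</html>\n"
    )


def export_site(posts, out_dir):
    """Write each post to <out_dir>/<slug>.html and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for post in posts:
        path = out / f"{post['slug']}.html"
        path.write_text(render_post(post["title"], post["paragraphs"]), encoding="utf-8")
        written.append(path)
    return written


# Hypothetical sample data standing in for exported posts.
sample_posts = [
    {"slug": "journal-1", "title": "Journal 1", "paragraphs": ["First day of class."]},
]
```

Calling `export_site(sample_posts, "static-site")` would produce a folder of files that needs no database or PHP to serve, which is what makes this kind of archive portable.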

Conclusion

Building The Life DHSI was not an easy task, but it was an enriching one. Through the website I have been able to share my daily logs, white papers, and other posts about DHSI openly on the web. Digital humanists and digital scholars are especially active on the Internet, and it’s important to consider our digital projects and web environments from a variety of perspectives. The process of building The Life DHSI helped me to see how even small digital projects have many points to consider, from online security and data protection, to aesthetic design choices and accessibility features. Many times the lowest cost alternative is used, especially for social media, but even users of free services such as Twitter and Facebook can have their information put at risk. Digital humanists, whether early-career or otherwise, should be adept at analyzing online service providers, as well as making informed decisions about the amount of effort and funding that digital projects require for sustainability and longevity.

Journal 5 – Professionalizing the Early Career Digital Humanist

On the last day of class, also the last day of Week 3 DHSI 2015, we worked on our individual projects. Many in the class created new websites on WordPress.com and especially Squarespace. WordPress.com websites are easy to set up, there are many free themes, and the blogging function is pretty straightforward. For those in the class who did not want a blog but did want a more stylish web presence, Squarespace was the favorite. There is no free tier for Squarespace, but the monthly cost is only around $8, or less with education pricing. Squarespace also has blogging features, but most of the students in the class who went with Squarespace chose to make a static website.

WordPress.com is free, but it also places ads on your site. These ads are seen by a small percentage of visitors, and they help WordPress.com cover expenses. For academics entering the job market though, these ads can make the site look a little unprofessional. Since it’s difficult to tell who will see the ads and when (possibly a hiring committee, who might only visit the site once), it’s best to pay for an ad-free website, about $30 yearly.

Many of the Squarespace templates were clean and crisp. Students in the class really liked the clean modern design of Squarespace, but some found the customization options a little tricky. One of the students emailed Squarespace for help during class, and she received a reply within minutes. Such great customer service is another compelling reason to go with Squarespace.

Later in the afternoon we worked on grants and other types of project proposals. Some of us wrote proposals on the spot, while others refined documents they had already been working on. We used Dr. Karen’s Foolproof Grant Template, focusing on the transition between the initial “large general topic of wide interest” and the “gap in knowledge” section. This transition can make or break a grant proposal — it’s in the beginning of the document and it must be compelling and informative in order to catch the eye of the funding committee.

One of the students, Drew Barker @drewNblue, built his Squarespace website in-class with his own photos. He even purchased a new domain name, http://drewbarker.info in order to professionalize his web presence.

Journal 4 – Professionalizing the Early Career Digital Humanist

Grant writing was the subject of our discussion this morning, and we focused on the professional and social networks that grant information flows through. Our class has a mix of people from the United States and Canada, so we covered both US and Canadian grant agencies.


Canadian funding agencies

  • Social Sciences and Humanities Research Council (SSHRC)
  • Canadian Foundation for Innovation (CFI)
  • Canadian Heritage
  • Province specific funding
  • Canada Council for the Arts

US funding agencies

  • National Endowment for the Humanities, Office of Digital Humanities (NEH ODH)
  • National Science Foundation (NSF)

Private Funding Organizations

  • Andrew W. Mellon
    • Scholarly Communication
    • Higher Education and Scholarship in the Humanities
  • MacArthur Foundation
    • Digital Media & Learning
  • Alfred P. Sloan Foundation
  • Wellcome Trust

Things to think about when proposing a DH project grant:
Audience
– the people receiving and deciding on the grant might not be steeped in the digital humanities or DH culture, be sure to explain things clearly and see the wider picture of the research

Interdisciplinarity
– not just inter-humanities, the sciences should be involved too, as a full partner, not just some type of technical support

Collaboration
– this can extend beyond the university or academia itself, or perhaps across institutions, perhaps even in different countries

Explaining equipment and space needs
– some items on the grant might not be approved or funded, but if these items are essential, be sure to include clear reasoning for the equipment or space, how it will be used, how long it will be needed for, and what will happen to the equipment at the end of the grant/research period

Making budgets seem reasonable
– this isn’t so much to make the budget small, but to make sure that all aspects of the project have been considered, this requires collaboration at the very beginning of the writing process


We also talked about “Dr. Karen Kelsky’s Foolproof Grant Template” at http://theprofessorisin.com. The structure of the proposal is clear, concise, and very persuasive. It’s a must-have formula for constructing grant proposals (as well as a thesis, abstract, or prospectus).

Dr. Karen Kelsky’s Foolproof Grant Template at http://theprofessorisin.com

Linked Open History: Using RDF and Linked Open Data to Connect Primary Source Materials

The white paper below is my final assignment for Week 2, RDF and Linked Open Data, as part of the Graduate Certificate in Digital Humanities at DHSI 2015.


Introduction

Historians rely on primary source materials for their research, and also for teaching students of history. The Internet holds a wealth of digitized and born-digital primary sources from all regions of the past. However, these digital historical materials are often not cataloged or referenced in meaningful ways. The methods of RDF (Resource Description Framework) and Linked Open Data could be used to mark and notate online documents that contain primary source information for certain historical periods. Many instructors of history collect primary sources in digital formats for use in the classroom, yet these records often end up in dark archives or personal collections. This is not so much because of copyright, as fair use or fair dealing methods enable portions of these materials to be shared legally.[1] The main reason many of these materials are locked away in private digital repositories is that rendering these files as Linked Open Data is especially difficult. Even under the best circumstances the process of creating Linked Open Data is time-consuming and labor-intensive. Despite these difficulties though, incremental progress should be made toward methods of using Linked Open Data in historical research. Primary source materials that are embedded with the proper metadata would be easier for scholars and students to find online. In addition, data visualization methods could be used to show relationships between texts, photographs, and videos across time and place, adding new richness to historical scholarship.

Defining RDF and Linked Open Data (LOD)

RDF (Resource Description Framework) is a method of sharing or exchanging information across the web. As its name implies, RDF is only a framework for exchanging data, and there is no set standard for creating RDF information. RDF works in a series of three nodes, or triples, which are broken into “subject,” “predicate,” and “object.” The power of RDF is that it can change semantic information that is readable by humans, into loosely structured data that is readable by machines. With this transformation from semantic content to RDF information, machines can take action on the data, ideally without human intervention. RDF information itself is not a database, but it can be housed in database formats, such as SQL, NoSQL, and Graphs. This flexibility allows RDF to be used in many ways and across different information structures. Linked Open Data, or LOD, is the idea put forth by Tim Berners-Lee that information on the web should be linked together in meaningful ways — as stated by Berners-Lee, “With linked data, when you have some of it, you can find other, related, data.”[2] The “open” portion of LOD works toward ensuring that materials on the web are freely available for linking, so that the web of linked data is not broken by restrictive licensing or other impediments. Once semantic information is turned into RDF data, it can be linked together creating powerful networks of information for usage by people, and machines as well.
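As a small illustration of the subject, predicate, object structure described above, the sketch below writes a single triple as one line of N-Triples, one of the plain-text syntaxes for RDF. It uses only plain Python rather than an RDF library. The document URI is hypothetical; the predicate is the Dublin Core `creator` term, and the object is a DBpedia resource URI.

```python
def to_ntriples(subject, predicate, obj):
    """Serialize one triple: URIs go in angle brackets, other values become string literals."""

    def term(value):
        if value.startswith(("http://", "https://")):
            return f"<{value}>"
        # Escape backslashes and double quotes inside plain literals.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return f"{term(subject)} {term(predicate)} {term(obj)} ."


# Hypothetical primary-source document, described by a real vocabulary term.
triple = (
    "http://example.org/sources/speech-001",           # subject (hypothetical URI)
    "http://purl.org/dc/terms/creator",                # predicate (Dublin Core "creator")
    "http://dbpedia.org/resource/Theodore_Roosevelt",  # object (DBpedia resource)
)

print(to_ntriples(*triple))
```

Read aloud, the line says "this document has the creator Theodore Roosevelt," but because each part is a URI, software can follow the statement as well as a person can.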

Authoritative Description

In order for RDF and Linked Open Data to work, there must be an authoritative file that nodes can link to that gives definitive information on the subject. Such a collection of authority files is referred to as an ontology or vocabulary for referencing data. An example of a widely used ontology is dbpedia.org, which references wikipedia.org in order to collect information about connected data points. For example, the Wikipedia page for Theodore Roosevelt contains a wealth of information about the American president, and also a link for disambiguation. Many people have been named Theodore Roosevelt, as well as schools, buildings, and ships, and establishing which Theodore Roosevelt is being referenced is essential for both human clarity and machine action. If a node within an RDF triple contains “Theodore Roosevelt” referring to the president, the DBpedia page can be used as the definitive reference: http://dbpedia.org/page/Theodore_Roosevelt. Using a link on the Internet, or URL (Uniform Resource Locator), as a nodal reference helps all other connected nodes create further connections across the web of data.
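The disambiguation problem can be sketched with a tiny in-memory list of triples. Two different things carry the same human-readable label, and only the identifier tells them apart. The president's identifier below is the DBpedia resource URI (the `/page/` address is its human-readable view); the ship's URI and the `role` predicate are hypothetical, invented for this illustration.

```python
PRESIDENT = "http://dbpedia.org/resource/Theodore_Roosevelt"
SHIP = "http://example.org/id/uss-theodore-roosevelt"  # hypothetical URI

LABEL = "http://www.w3.org/2000/01/rdf-schema#label"   # standard RDFS label property
ROLE = "http://example.org/vocab/role"                 # hypothetical predicate

triples = [
    (PRESIDENT, LABEL, "Theodore Roosevelt"),
    (SHIP, LABEL, "Theodore Roosevelt"),
    (PRESIDENT, ROLE, "President of the United States"),
    (SHIP, ROLE, "Aircraft carrier"),
]


def subjects_with_label(label):
    """Every subject that carries the given label: ambiguous by design."""
    return [s for s, p, o in triples if p == LABEL and o == label]


def describe(subject):
    """All predicate/object pairs recorded for one unambiguous subject."""
    return {p: o for s, p, o in triples if s == subject}


print(subjects_with_label("Theodore Roosevelt"))  # two different subjects share the name
print(describe(PRESIDENT)[ROLE])
```

Searching by the name alone returns both subjects, while looking up the URI returns exactly one, which is the reason linked data leans on identifiers rather than names.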

LOD in Libraries, Archives, and Museums

Many libraries, archives, and museums have finding aids that help scholars locate materials. RDF and Linked Open Data could be used to make these finding aids more extensible, which would help scholars work between collections and also between institutions.[3] At the moment, many finding aids and archival reference materials are in .pdf format, which is no longer proprietary since Adobe released the standard to ISO in 2008.[4] However, PDF files are not as open to data exchange as HTML or other forms of writing for the web. While they might contain descriptive information about materials in the collections, .pdf files and other closed or proprietary formats prevent the connectivity of LOD. Moving finding aids from PDF to HTML or other markup languages and including a URI (Uniform Resource Identifier, often used as an IRI, Internationalized Resource Identifier), would enable linkage between finding aids. Archives and other academic institutions are also poised to share ontologies that describe their collections. These ontologies could either be centralized or independent, as long as they contain a stable URL or link for other institutions to reference. Some institutions have already begun to extend their collections with LOD, but more work is definitely needed.[5]

LOD and Primary Sources in the Classroom

In my own teaching practices I have created primary source materials for students using text from Google Books and other online primary source repositories, such as the Internet History Sourcebook.[6] These primary source texts may be changed into excerpts, or left in their original form. The most important change for the classroom though, is to use responsive HTML5 so that students can read the materials on their laptops, tablets, and especially smartphones. None of these devices are required in the classroom, and printed materials are also welcome, of course. HTML5 elements can also be used to ensure that printed resources are clear and well formatted. My collection of primary sources at my project-website, SGA Historical Materials [http://sgahm.org/primary-sources] is open on the web, however there is no RDF or LOD metadata attached to the HTML5 files. Through the RDF and Linked Open Data class at DHSI 2015 I have definitely seen the power of linking these materials together, and to the open web of linked data. Little by little, I hope to work on my historical materials project, and add more metadata to the documents. Also, I will need to do more research to ensure that materials I create remain as accessible as possible. If the inclusion of RDFa or Microdata within the .html file hinders screen readers or other assistive devices, different methods will need to be used. Perhaps maintaining multiple documents linked together, just as CSS style sheets operate, could enable both accessibility and LOD connections.
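One way to picture the "separate linked document" idea is a small metadata file kept next to the HTML page instead of inside it. The sketch below builds such a record as JSON-LD using the schema.org vocabulary. This is only one possible approach, not something the project currently does; the page URL and title are hypothetical, and whether a sidecar file actually helps with screen readers would still need testing.

```python
import json


def build_jsonld(url, title, author_uri, year):
    """Describe one primary-source page using schema.org terms in JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "@id": url,
        "name": title,
        "author": {"@id": author_uri},
        "dateCreated": str(year),
    }


# Hypothetical page and title, for illustration only.
record = build_jsonld(
    "http://example.org/primary-sources/sample-speech.html",
    "Sample speech excerpt",
    "http://dbpedia.org/resource/Theodore_Roosevelt",
    1899,
)

# The text that would be saved in a separate metadata file alongside the page.
sidecar = json.dumps(record, indent=2)
print(sidecar)
```

Keeping the metadata in its own file leaves the reading text untouched, much as a CSS file keeps styling out of the markup.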

Visualizing Primary Sources with LOD

Primary sources used in the classroom are often disconnected from one another. Some may be on the web as HTML, some as PDFs either as text or images, and still other materials may be in printed books. RDF and Linked Open Data could be used to draw linkages between text-based documents, visual materials, and other historical primary sources. These linkages would form a web of interconnectivity that could be visualized much as the Linked Jazz project has done with jazz musicians and their social networks.[7] As instructors of history move away from providing facts and more toward helping with the context of historical documents, providing a way to see how texts and ideas have interacted with each other in the past can be a powerful tool. I can imagine something of a timeline slider, with curated primary sources materials connected to one another. Zooming in or out on a specific area could show relationships between documents, illustrating how historical figures and events were connected in the past. This project would require quite a bit of innovative programming, as well as preparing documents with LOD-ready metadata. This project can be saved for the future, but the vision of LOD and the networks of knowledge it can create will help make current metadata practices more effective.

Conclusion

RDF and Linked Open Data are very powerful ideas that are still in their formative stages of development and implementation. Historians and other academics in the Humanities can contribute to LOD in meaningful ways by adding pertinent metadata to historical materials and using open digital formats and licenses. Researchers and students alike can benefit from LOD practices, which can help make archival research more efficient and instruction more compelling. Moving information into the realm of LOD does have a high cost of time and labor, but the power of LOD is such that current practices must begin to shift toward more open and functional methods. Efforts should focus on the projects that will have the most immediate impact, while also keeping an eye toward future technical developments and innovations.


Footnotes

[1] “Code of Best Practices in Fair Use,” Association of Research Libraries http://www.arl.org/focus-areas/copyright-ip/fair-use/code-of-best-practices#.VX3vrmDtJUR

[2] Tim Berners-Lee, “Linked Data,” W3.org http://www.w3.org/DesignIssues/LinkedData.html (2006, revised 2009)

[3] The #lodlam hashtag on Twitter is a great place to discover the newest innovations in Linked Open Data within Libraries, Archives, and Museums, as is the http://lodlam.net/ website.

[4] “PDF Format Becomes ISO Standard,” ISO, News, July 2008 http://www.iso.org/iso/home/news_index/news_archive/news.htm?refid=Ref1141

[5] Kate Theimer, “Archives who have implemented linked data?” Archives Next blog, March 2013 http://www.archivesnext.com/?p=3450

[6] Paul Halsall ed., “Internet History Sourcebook Project,” Fordham University http://legacy.fordham.edu/halsall/index.asp

[7] “About the Project,” Linked Jazz https://linkedjazz.org/about-the-project/

Journal 3 – Professionalizing the Early Career Digital Humanist

We dove into online presence and cyberinfrastructure today, starting with an overview of WordPress and Squarespace as possibilities. WordPress has two separate paths, either wordpress.com (which is a free service, with premium features available for a fee), or wordpress.org (which is the open source engine of WordPress, where you download and host your own installation). With WordPress.org we could run the installation on a host like asmallorange.com or reclaimhosting.com. With the power of the self-hosted WordPress install though, comes a lot of responsibility. In addition to choosing a theme, designing pages and posts, and uploading content, there are other things to worry about, like storage space and bandwidth. Self-hosted wordpress.org sites also need backup and malware protection. Running your own self-hosted WordPress site can be rewarding, but it also requires a high investment of time.

Squarespace was another great solution for hosting our online presence, and domain names can be purchased there as well. Squarespace has a blogging feature, just like WordPress, but it can also be extended in many other ways, such as adding an online storefront. With Squarespace, the control panel is within the site itself, so changes can be made right in the browser with the site live on the right-hand side of the screen. WordPress has been moving in this direction as well, where sites can be redesigned live with their new Customizer feature. Squarespace also has an education discount of 50%, which is available at: http://squarespace.com/students

Along with creating a personal/academic website, we also discussed other methods of sharing scholarly work. Some of these options were posting audio clips to SoundCloud, creating a podcast, making a short video, and participating in a three-minute dissertation contest. The competition might go by different names according to the school; UC Riverside calls it “Grad Slam.” Creating a blog about your dissertation can be really helpful for working through the project, and also for gaining an audience. However, there are many caveats for this, as dissertation committees and future book publishers can have strict rules about the process, so it is always best to check around before starting the project. A few notable digital dissertations were Amanda Visconti’s digital dissertation: http://dr.amandavisconti.com, Kathleen Fitzpatrick’s project for Planned Obsolescence: http://mcpress.media-commons.org/plannedobsolescence/ and Dani Spinosa’s project, Generic Pronoun http://genericpronoun.com

Reclaim Hosting is a great way to go for personal and professional cyberinfrastructure. It’s only $25 a year for 2GB of storage and unlimited bandwidth.

Journal 2 – Professionalizing the Early Career Digital Humanist

This morning we presented the elevator speeches we worked on yesterday afternoon to each other in small groups. An idea we had about this type of one-on-one or small group conversation, was to layer the information so that it’s easy to digest. For example, each sentence could provide a little bit of information about the project, beginning with a basic overview of time and place, and then touching on more complex ideas. Reading the elevator speech versus hearing it on the spot required different types of communication. We found that the ideas written down were clear, but when delivered verbally they could be too dense or heavy with information for a quick listen. Conversely, if the elevator speech is only set to be delivered in person verbally, it might be too informal for a more serious academic meeting. It’s important to hone the elevator speech with written and verbal practice, also considering body language and other nonverbal social cues from the listener.

The afternoon was a discussion of social media, and how we can present ourselves as academics online in the best ways. One of the problems many people face with online identities is that links or URLs can change over the years. An option that some researchers are using is a DOI (Digital Object Identifier). The DOI is a static reference that lives above the level of the domain name, and it can be used as a permanent location for links that might change over time. For example, if you had an important blog post at “example.com/p=25” but the URL was changed to another URL, like “example.com/june-2015/great-blog-post,” a DOI could be used to provide a permanent non-changing link. Many journals and libraries are using the CrossRef system to manage URLs, which helps prevent broken links or link rot. The ORCID (“orchid”) service provides a similar kind of persistent identifier for individual researchers. Funding agencies such as the NIH and NSF are using the integration of ORCID to link researchers and information about their grants. This will help reduce the paperwork overhead of applying for and tracking grants, as information can be entered into ORCID once and then applied to each particular project: https://orcid.org/blog/2014/02/19/link-your-orcid-record-your-funding
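The idea of an identifier that stays fixed while the address behind it changes can be sketched in a few lines of Python. This is not how the DOI system or CrossRef is implemented; it is only a toy lookup table showing the principle. The identifier string is hypothetical, and the two addresses are the example URLs from the paragraph above.

```python
class PersistentLinks:
    """Toy resolver: a stable identifier points at whatever the current URL is."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier, url):
        """Create or update the URL that an identifier points to."""
        self._targets[identifier] = url

    def resolve(self, identifier):
        """Return the current URL for an identifier."""
        return self._targets[identifier]


links = PersistentLinks()
post_id = "10.9999/example.25"  # hypothetical identifier, not a real DOI

# The post is first published at its original address.
links.register(post_id, "http://example.com/p=25")

# Later the site is reorganized; only the table entry is updated.
links.register(post_id, "http://example.com/june-2015/great-blog-post")

print(links.resolve(post_id))
```

Anyone who cited the identifier rather than the original address still ends up at the right page after the move, which is the protection against link rot described above.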

This image shows how ORCID information can travel between grant winners, funding agencies, and publishing groups without having to enter information multiple times.

Journal 1 – Professionalizing the Early Career Digital Humanist

After introductions in the morning we started discussing what it means to be a “professional” digital humanist, and how we present ourselves online. Online presence is something many cringe about, but in this modern digital age we’re all already online, somewhere, for something. What that “something” is might be decided by others, unless you curate your own digital presence. We need to Google ourselves and see how we’re presented online. Ideally this is done from a computer you’ve never used or a private/incognito browser window. On your own computer the search results might be catered to you specifically, and it would be best to see your results as others see them. Ranking near the top of page one on Google can be really important for finding jobs, as committees are very likely to do some research about you online. Making sure that you’re visible and presented in the best light requires some work with building online profiles, adding photographs of yourself, and also taking a look at search engine optimization (SEO).

Many academics have a sense that tenure track positions are the only ones to shoot for, but “alt ac” careers can be rewarding as well, possibly with better hours and pay. As early career digital humanists we’re uniquely positioned to work in a variety of industries. Keeping ourselves open to possibilities outside or perhaps alongside academia is important. The vast majority of PhDs are able to get work, but only a small sliver finds a tenure track position. Even with a tenure track job there’s no certainty that tenure will be achieved. Moving forward as digital humanists we should keep our research front and center, and also look toward the horizon for new developments in social media and digital technology.

In the later afternoon we started working on our “elevator speech,” which is a short and precise verbal communication about a major project you’re working on. The idea behind the elevator speech is that if you found yourself in an elevator with a person offering a job, you could quickly explain yourself and your work before the doors open at the next floor. The elevator speech idea comes from the business world, but it’s also very helpful for explaining digital humanities projects or dissertation research in a clear, concise, and friendly way — definitely a good skill for grant writing too.

google search screenshot
This Google Search screenshot for “Steve Anderson” doesn’t show me at all. Using my full name – Steven Gordon Anderson – would help, along with other specific information, like my university (UC Riverside) or my hometown (La Verne, CA).
google search via vpn screenshot
This Google Search was done from DHSI in Victoria, using the VPN service from the UC Riverside Library. The Canadian search results are gone since the VPN makes it appear as though I’m in the United States. I’m still nowhere to be seen though — time for some SEO work.

Making HTML5 More Approachable: Writing for the Web, Accessibility, and Cyberinfrastructure

The white paper below is my final assignment for Week 1, Digitization Fundamentals, as part of the Graduate Certificate in Digital Humanities at DHSI 2015.


Introduction

Learning to write for the web with HTML5 can be a daunting task. The general procedure is for students to view the raw data of an HTML document, learn about the meanings of tags and the process of encoding, and then hopefully begin writing HTML. This approach, however, relies on other embedded factors and assumptions, such as high computer literacy and facility with web environments. Especially in the Humanities where these skills are not necessarily required, newcomers to HTML are often anxious and afraid to share their projects with others. Discussing the basics of cyberinfrastructure[1] can help students visualize how documents circulate on the Internet, and using visual materials instead of raw code can help lower the barriers to writing for the web.

Defining HTML5

HTML5 consists of three parts: HTML (Hypertext Markup Language), CSS (Cascading Style Sheets, often referred to as CSS3, the third iteration), and JavaScript (a programming language).[2] HTML is the markup language that adds semantic meaning to content. HTML tags, such as <p>paragraph</p>, wrap around and describe content with an opening and a closing tag, each enclosed in angle brackets. The fifth version of HTML, or HTML5, contains new tags and markup for modern web development such as <video> and <audio>. CSS contains the aesthetic markers for altering the look and feel of webpages. CSS can be encoded within the HTML page or, as is most often the case, referenced as a separate .css file or “style sheet.” The cascading nature of CSS refers to the ability of these style sheets to be hierarchical, where later style sheets can override earlier ones without needing to remove or delete the earlier ones. HTML and CSS are what the majority of people will work with when learning HTML5. Both HTML and CSS are methods of encoding content, not of programming or writing code. JavaScript is the third portion of the HTML5 trio, and while essential for the web it can be perplexing for beginners. JavaScript is the code-writing or programming portion of HTML5, and it can be tackled later on when a good foundation of HTML and CSS has been achieved.
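
The cascading behavior can be modeled loosely in Python as dictionaries merged in load order. This sketch makes a large simplification: it ignores selector specificity and the other rules of the real CSS cascade, and only shows that a later declaration for the same property overrides an earlier one while the earlier style sheet stays untouched. The two style sheets and their values are made up for the example.

```python
# Each "style sheet" maps a CSS property to a value for the same selector (say, p).
base_sheet = {"color": "black", "font-size": "16px"}
theme_sheet = {"color": "navy"}  # loaded later, so its declarations win


def cascade(*sheets):
    """Merge style sheets in load order; later declarations override earlier ones."""
    computed = {}
    for sheet in sheets:
        computed.update(sheet)
    return computed


computed = cascade(base_sheet, theme_sheet)
print(computed)
```

The paragraph ends up navy at 16px, and the base style sheet still says black if the theme is ever taken away.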

Accessibility in Web environments

The word “accessible” can have multiple meanings when working with web environments. For many, the term accessibility relates to the openness of data on the web, making information clear and jargon-free, and also ensuring that links and important information are easy to find. However, accessibility also has a deeper meaning. Accessibility refers to the need to create digital environments that are inclusive, responsive, and universal in their design.[3] Web sites and digital projects must be created with all users in mind so that people with disabilities, whether physical, cognitive, or temporary, are not excluded from interaction.[4] Unfortunately, the process of learning HTML often skips past both usages of the word accessible, resulting in web environments that are complicated, and frustrating or impossible to use. Accessibility is often thought of after a project is completed, many times as a result of legal action.[5] Websites or other digital projects that have features of accessibility added to them after the fact can suffer from poor implementation and design, while also incurring additional costs for development. All digital projects, from the smallest personal webpage to the largest corporate website, should be designed from the very beginning with accessibility as a primary feature. The best place to start this process is with those just beginning to write for the web, whether in HTML5 or through other forms of web creation.

Text editors

A major stumbling block in the process of learning HTML5 is the text editor. Many new students learning HTML are unfamiliar with text editor programs, even if there’s already one installed on their computer. The text editor is generally less complicated to use than Word or other word-processing programs, but the unfamiliarity of the program can make students uneasy. An explanation of how word-processing programs incorporate code and other markup behind the visible surface of the document can help students see the reason for text editors. Spending time on the installation and usage of the text editor, along with ensuring that .html and other file extensions open by default, can help reduce student frustration. Instructors should also use the same programs that students are using, especially when projecting their computer screen to the class. It can be disconcerting to see a highly modified text editor running on the projector, and using the command line should be avoided for the sake of clarity as well. Students approaching HTML5 or programming for the first time can feel overwhelmed when watching an expert deftly maneuver through the command line interface with custom keystroke settings and other time-saving shortcuts.

Working from the outside-in

HTML is a method of encoding text with semantic meaning for the web. It’s common for instructors to begin with “lorem ipsum” text, and there are some entertaining lorem ipsum generators on the web, such as https://baconipsum.com and http://www.cupcakeipsum.com. These lorem ipsum generators can add a splash of fun, but they are also another layer of abstraction for new students to navigate through. Instead of working with placeholder text, it would be best to work with materials that are familiar to the student, perhaps their own writing or text from a favorite book, poem, or website. The idea here is to help the student maintain a sense of control over the process and reduce anxiety. By using content that’s familiar, students will have a mental image for how the information should be structured and presented. Using as many familiar points of contact and reference as possible can help students overcome some of the initial fear, and work toward questions of design and presentation. A discussion using visual materials and avoiding raw code can help students see the logic behind HTML. A printed page from a simple website could be used for students to write on, where they might draw arrows pointing to different sections of the page and their corresponding HTML tags and elements.

Hierarchies and structure

Along with using familiar content for teaching HTML, a good overview of the hierarchies and structures of a webpage can be helpful.[6] Building accessibility into a webpage begins with the general layout and order of content. One of the most important aspects is to review how headings operate in a hierarchical structure, while paragraphs and other tags do not. The heading tags <h1>, <h2>, <h3>, and so on, can be quite confusing to new students, in part because of the changing font size associated with each tag. Students often choose a heading tag because it changes the visual design, not because it presents information in a logical order. Screen readers and other assistive devices rely on heading tags to move through web documents, and this hierarchical order is essential for navigation and meaning. Beginning with a discussion of heading and paragraph tags, and how they’re used for semantic rather than aesthetic purposes, is a great place to discuss accessibility and good web design.
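
One way to make the heading hierarchy concrete is a small check that flags a page where a heading level is skipped, for example an <h3> placed directly under an <h1> because of its font size. The sketch below uses only Python’s standard html.parser module; the class name, function name, and sample page are invented for illustration, and real accessibility checkers look at far more than this.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect the level of every h1-h6 tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def skipped_levels(html):
    """Return (previous, current) pairs where a heading jumps down more than one level."""
    parser = HeadingOutline()
    parser.feed(html)
    problems = []
    previous = 0
    for level in parser.levels:
        if level > previous + 1:
            problems.append((previous, level))
        previous = level
    return problems


page = "<h1>My Essay</h1><h3>Chosen for its font size</h3><p>Text</p><h2>Next part</h2>"
print(skipped_levels(page))
```

Here the jump from level 1 to level 3 is reported, while moving back up from <h3> to <h2> is treated as fine.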

Writing for humans, and machines too

Once students have some content on their webpage, they inevitably look toward styling, which then leads to CSS or cascading style sheets.[7] Inline styling, or adding CSS markup within the HTML document, is often the first step. However, CSS markup is usually kept in another file, referred to by a link in the head of the .html document. Inline styling is an older and simpler styling practice, and even if separate style sheets were never used students would still be able to construct a basic webpage with some design features. Yet, there’s an opportunity with CSS files to show students how documents on the web can hyperlink to one another behind the scenes. Even though the links or referenced files in the head portion of the .html file are not visible to the user of the webpage, this information is essential for interoperability on the web. The head portion of the HTML document is important because it’s meant for computers to read, not people. Constructing documents in HTML5 is a process of writing for multiple audiences, including machines, and discussion about this can help students visualize how information moves through the Internet.
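
To show what a machine “sees” in the head of a document, the sketch below reads a small page with Python’s standard html.parser module and lists the style sheets it links to. The sample page, the file name style.css, and the class name are assumptions made for the example.

```python
from html.parser import HTMLParser


class StylesheetFinder(HTMLParser):
    """Record the href of every <link rel="stylesheet"> in a document."""

    def __init__(self):
        super().__init__()
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel_values:
            self.stylesheets.append(attributes.get("href"))


document = """<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>My first page</title>
  <link rel="stylesheet" href="style.css">
</head>
<body><h1>Hello</h1></body>
</html>"""

finder = StylesheetFinder()
finder.feed(document)
print(finder.stylesheets)
```

A reader of the page only ever sees “Hello,” but the program finds the reference to the separate style sheet, which is roughly what a browser does before it draws anything.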

The basic workflow

Once the discussion of writing for machines is complete, moving toward a workflow that includes a referenced CSS style sheet can help put theory into practice. Generally, students will work locally on their own computers, and this can result in a lack of understanding about how documents are made visible on the web. There’s nothing inherently wrong with working locally, but without discussing the storage and retrieval of documents over the Internet students will be unable to put their new skills into practice. Online server space is needed for this portion of the class, and availability will depend on institutional resources. Even if there are no resources available, a discussion of the process is essential for students, as this will help them more fully understand what “working locally” means. This process of learning HTML and CSS, locally or otherwise, will require two separate programs to be open (the text editor and a browser), and three separate windows (.html file, .css file, and the browser window). If students can see how these three components relate to one another, they are more likely to understand how documents are stored, retrieved, and revised on the web. A more advanced discussion of FTP (File Transfer Protocol) programs, the need for SFTP (SSH File Transfer Protocol), document permission settings, and online security would be wonderful if time allows.
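
The local half of this workflow can be sketched as a short Python script that writes the two files side by side and prints the address to open in a browser. The file contents, the make_site function, and the throwaway folder are all assumptions for the example; in class, students would type these files by hand in a text editor.

```python
import tempfile
from pathlib import Path

HTML = """<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Working locally</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <h1>Working locally</h1>
  <p>This page lives on my own computer until I upload it.</p>
</body>
</html>
"""

CSS = "h1 { color: navy; }\n"


def make_site(folder):
    """Write index.html and style.css side by side so the relative link resolves."""
    folder = Path(folder)
    (folder / "index.html").write_text(HTML, encoding="utf-8")
    (folder / "style.css").write_text(CSS, encoding="utf-8")
    return folder / "index.html"


site_folder = tempfile.mkdtemp()  # a throwaway folder standing in for a project folder
index_page = make_site(site_folder)
# Opening this address in a browser shows the styled page, read from the local disk.
print(index_page.resolve().as_uri())
```

The printed address begins with file:// rather than http://, which is a handy prompt for discussing the difference between opening a document locally and requesting it from a server. Running Python’s built-in python -m http.server from the same folder would serve the files over HTTP on the local machine as a next step.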

Conclusion

Writing for the web in HTML5 can be complicated to learn, but focusing on visual methods and having directed class discussions can increase student understanding. With HTML and CSS, students can construct static webpages and share information more easily across software platforms and hardware devices. By avoiding proprietary or closed formats and programs, such as Word and .pdf files, students will be engaging in the 5 Star Open Data plan.[8] Most important for 5 Star Open Data is getting the materials online, and a basic understanding of HTML and CSS empowers students to contribute to the Internet in meaningful ways. With their new HTML5 skills, students will be able to interact more fully with programs that they may already be using, such as an LMS (Learning Management System, like Blackboard or Canvas) or a CMS (Content Management System, such as WordPress or Drupal). With accessibility as a primary feature from the very beginning, students will be creating documents that are easier for people to interact with, and by using the logics inherent in well-structured data they will be making the materials easier for machines to read as well. Learning the basics of HTML5 is an opportunity to publish materials online, as well as an occasion to learn about the fundamentals of cyberinfrastructure and digital literacy.


Footnotes

[1] Gardner Campbell, “A Personal Cyberinfrastructure,” EDUCAUSE Review, September 2009 http://www.educause.edu/ero/article/personal-cyberinfrastructure

[2] A good overview of HTML5 and how its development has progressed is available in HTML5: The Missing Manual. Matthew MacDonald, HTML5: The Missing Manual, Nan Barber ed. (Sebastopol, CA: O’Reilly Media, 2013).

[3] George H. Williams, “Disability, Universal Design, and the Digital Humanities,” Debates in Digital Humanities, online open access edition (Minneapolis, MN: University of Minnesota Press, 2013) http://dhdebates.gc.cuny.edu/debates/text/44

[4] The Accessible Future project has a good list of readings about accessibility in web environments: http://www.accessiblefuture.org/readings/

[5] Although accessibility in web environments is a moral, as well as a technical question, legal action is often the course that must be taken for change to occur. A list of recent lawsuits can be found at The University of Minnesota, Duluth, “Higher Ed Accessibility Lawsuits,” http://blog.lib.umn.edu/itsshelp/news/2013/10/higher-ed-accessibility-lawsuits.html

[6] Dive into HTML5 is a free and online resource, and best of all the website itself is built entirely in HTML5. Mark Pilgrim, Dive into HTML5, http://diveintohtml5.info

[7] As students begin to learn how CSS shapes the design of the webpage, a peek at CSS Zen Garden can show them how powerful CSS can be: http://csszengarden.com

[8] 5 Star Open Data, http://5stardata.info – a clear and easy to follow website discussing Timothy Berners-Lee’s ideas about linked data, http://www.w3.org/DesignIssues/LinkedData.html


Journal 5 – RDF and Linked Open Data

Discussion today included the difficulties of teaching coding and computer literacy. Graphical user interfaces (GUIs) have enabled non-programmers to become more familiar with computers and their possibilities. However, these GUIs (sometimes pronounced “gooey”) also change frequently depending on the application and its design aesthetics. Innovations in hardware, such as the trackpad and touch screens, can also affect computer literacy and usage. The command line interface (CLI) seems to be more timeless, with the logic of the program perhaps becoming more apparent through the typed commands themselves. Our instructor Jim mentioned that people learning to code for the first time have often spent years writing and communicating in academia or business, with sentences as the basic structure. For some though (including myself), writing sentences for computers instead of people is terribly difficult, despite the many similarities in structure and language. Programming classes for humanists, whether RDF and Linked Open Data, Ruby on Rails, or Python, can help scholars learn the language and logic of computers, which helps in using our own devices in new ways and also for understanding how social media and the wider Internet operate.

RDF Graph example worksheet for The Canadian Women’s World Cup team.

Our class project for the week was a demonstration of RDF and Linked Open Data that was hand-written on construction paper. We chose the Women’s World Cup as our subject, with the Canadian team as our primary node. From the Canadian team outwards, we connected various bits of information in a series of triples, or three-part data references. These triples were then written in Turtle to abbreviate the code. We used dbpedia.org as our authoritative ontology, circling the nodes in blue that would reference its database. For example, in the triple [“Canadian team” (subject) <--> “sponsored by” (predicate) <--> “Umbro” (object)], Umbro would link out to its dbpedia.org page: http://dbpedia.org/page/Umbro. This gives the Umbro node a definitive reference, and the other information on the page, such as Umbro’s website, their brands, and location would also be accessible through the link. A query could then be run, “Is the Canadian women’s soccer or futbol team sponsored by any European companies?” Even though “Europe” was not on our node worksheet, the dbpedia.org reference would allow an extension of the query into the wider web, resulting in an answer of, “Yes, by Umbro in the UK.” This is a simple example of linked data, but with the further extensions provided by authoritative ontologies, more complex queries would certainly be possible.

RDF triples in Turtle composed of the Graph information in the image above.
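
The triples and the sponsorship query from the class project can be approximated in plain Python. This is a sketch under stated assumptions, not our actual worksheet: the “ex:” names and the http://example.org/worldcup# prefix are invented, only the Umbro URI points at a real DBpedia resource, and the two location triples are hard-coded stand-ins for facts that would really be fetched by following the DBpedia link.

```python
# Each triple is (subject, predicate, object). The "ex:" names are invented for this
# sketch; only the Umbro URI points at a real DBpedia resource.
UMBRO = "<http://dbpedia.org/resource/Umbro>"

triples = [
    ("ex:CanadianTeam", "ex:sponsoredBy", UMBRO),
    ("ex:CanadianTeam", "ex:playsIn", "ex:WomensWorldCup"),
    # Stand-ins for facts that would come from following the DBpedia link:
    (UMBRO, "ex:locatedIn", "ex:UnitedKingdom"),
    ("ex:UnitedKingdom", "ex:partOf", "ex:Europe"),
]


def objects(subject, predicate):
    """Return every object that completes the triple (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def european_sponsors(team):
    """Follow sponsoredBy -> locatedIn -> partOf to find sponsors based in Europe."""
    found = []
    for sponsor in objects(team, "ex:sponsoredBy"):
        for country in objects(sponsor, "ex:locatedIn"):
            if "ex:Europe" in objects(country, "ex:partOf"):
                found.append((sponsor, country))
    return found


def to_turtle(prefix_uri):
    """Serialize the triples as simple Turtle, one statement per line."""
    lines = [f"@prefix ex: <{prefix_uri}> .", ""]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


sponsors = european_sponsors("ex:CanadianTeam")
print(sponsors)
print(to_turtle("http://example.org/worldcup#"))
```

The query never mentions Umbro or the UK directly; it reaches them by hopping from triple to triple, which is the same logic a real linked data query would follow across the web.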

Journal 4 – RDF and Linked Open Data

This morning we discussed some steps to open up data. These suggestions were distilled from the Open Knowledge Foundation’s Open Data Handbook by instructor Jim Smith: https://www.dhdata.org/dhdata/datasets/1-how-to-open-up-data/index.html

One of the main points was that DH projects need to start small, and there should be a series of steps or stages for development, even in the smallest projects. Part of this idea of starting small includes limiting the size of the initial dataset. Small datasets can still be helpful to wider communities. If the dataset isn’t moving in the correct direction, its small size can allow for redirection before too much time has been invested.

The dataset for Theatre Finder (a DH project about historic theaters) is written in JSON, but it could be altered to be JSON-LD without destroying or having to rewrite the original database. Coding the dataset as JSON-LD would make the information in Theatre Finder much more usable, connecting it to the wider networks of linked data across the web. Theatre Finder is a good candidate for becoming an authority dataset, or ontology. Theatre Finder didn’t begin with this in mind, but the dataset information is comprehensive and is widely used.
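
A minimal sketch of that JSON to JSON-LD step is below. The record is made up for illustration and is not taken from Theatre Finder, the field names are assumptions, and the vocabulary URIs in the context are only one possible choice. The point is that the original keys and values are left exactly as they were, and only an @context is added alongside them.

```python
import json

# A hypothetical record in plain JSON; the field names are assumptions.
record = {"name": "Teatro Olimpico", "city": "Vicenza", "country": "Italy"}

# The context maps each existing key to a URI; these vocabulary choices are illustrative.
context = {
    "name": "http://schema.org/name",
    "city": "http://dbpedia.org/ontology/city",
    "country": "http://dbpedia.org/ontology/country",
}


def to_json_ld(plain_record, context):
    """Return a copy of the record with an @context added; the original keys are unchanged."""
    return {"@context": context, **plain_record}


linked = to_json_ld(record, context)
print(json.dumps(linked, indent=2))
```

Software that only understands plain JSON can keep reading the record as before, while linked data tools can use the context to interpret “city” and “country” as shared, web-wide terms.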

It’s important to consider that database ontologies are generally created by the people who use and need the data on a regular basis. Before creating a dataset, DH professionals need to ask the question, “who can and will use this data?” This question is important, because these groups will also be the ones to contribute to the dataset. Stability is also a factor of community involvement. Datasets are constantly changing, not just by adding information to them, but also by reconsidering definitions and vocabularies in line with social and cultural change.

Example data from Theatre Finder. The information under “Overview,” such as country and city, could be connected to dbpedia.org as linked data.