Standardizing Metadata for Digital Humanities

Digital Humanities (DH) applications such as databases, digital editions, and data visualizations give users the opportunity to search and curate datasets in new and interesting ways. By harnessing the power of computing technologies, DH applications can uncover patterns in data that shed light on previously untold stories. For these applications to be successful, they require high-quality metadata that is standardized and consistent. However, the historical and literary documents that make up the datasets for these applications are often messy, ambiguous, and varied. As metadata specialists, how do we help the DH community create metadata standards that maintain the authenticity and spirit of original datasets while providing enough standardization for DH applications to succeed?

As a Metadata Librarian who also works in DH, I have been struggling with this question. For example, my colleague James Van Mil and I are creating a database of intake records for the University of Cincinnati’s House of Refuge Collection. The database consists of over 6,000 child intake records from the 19th and early 20th centuries. The records provide rich descriptions of the admitted children, such as ethnicity, religion, offenses committed, and location of birth. Creating and consolidating index terms for this dataset has been difficult, because doing so requires making assumptions about the data that may not be consistent with the historical context in which it was created.

For example, there are multiple terms in the original dataset that refer to children of Jewish ancestry, including: “German Jew”; “Hebrew”; “Israelite”; and “Jewish.” Consolidating these terms under a broader term such as “Jewish” would be helpful for indexing purposes, but it might also lead users to make false conclusions about the data. In order to make indexing decisions thoughtfully, it is important to think of them as a form of data curation and make editorial policies accordingly. My colleague and I are still trying to find the balance between creating indexing terms that are searchable without being misleading.
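One way to strike that balance is to keep the verbatim term from the record alongside a consolidated search facet, so the broader term aids discovery without overwriting the original wording. The sketch below illustrates this approach in Python; the field names (`ethnicity_religion`, `term_verbatim`, `term_facet`) and the record structure are hypothetical, not the actual House of Refuge database schema.

```python
# Hypothetical sketch: consolidate variant terms under a broader facet for
# searching while preserving the verbatim term from the original record.

# Mapping from verbatim terms found in the records to a broader index term.
# The original wording is never discarded; it travels with the record.
BROADER_TERMS = {
    "German Jew": "Jewish",
    "Hebrew": "Jewish",
    "Israelite": "Jewish",
    "Jewish": "Jewish",
}

def index_record(record: dict) -> dict:
    """Return a copy of the record carrying both the verbatim term and a
    consolidated facet, so searches can match either one."""
    verbatim = record.get("ethnicity_religion", "")
    return {
        **record,
        "term_verbatim": verbatim,                          # authentic wording
        "term_facet": BROADER_TERMS.get(verbatim, verbatim) # searchable facet
    }

# Example: the facet is "Jewish", but "Israelite" remains visible to users.
indexed = index_record({"name": "Example Child", "ethnicity_religion": "Israelite"})
```

A display layer built on such an index could then show the verbatim term in every search result, making the editorial consolidation transparent rather than invisible.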

One of the challenges of working with metadata and DH is that there are few discussions on how to create editorial policies around metadata standards. DH specialists often focus on the Text Encoding Initiative (TEI) metadata standard, which is the primary standard in DH. As a result, there has not been much research on other schemas or on authority control, although there are efforts to incorporate linked data into DH, such as the RDF Textual Encoding Framework.

Librarians are also largely silent on these issues in the context of DH. My experience at conferences such as ALA has been that cataloging and metadata sessions focus on metadata standards in the context of cataloging bibliographic material for use in library systems. These sessions tend to be heavily MARC-oriented, and non-technical issues such as ethics are rarely discussed.

It would be very helpful if there were more cross-community discussions at ALA and other conferences between librarians specializing in metadata and cataloging standards and experts in digital scholarship. This is particularly important as the role of Metadata Services in academic libraries expands from cataloging-based services to consultation services. Thinking broadly, digital scholarship, whether in the humanities or the sciences, requires standards to be successful, and metadata specialists are the experts who can provide advice, guidance, and support.

Carolyn Hansen

Carolyn Hansen is Metadata Librarian at the University of Cincinnati, where her responsibilities include the creation and management of metadata for library materials and digital collections. She has worked for Eastern Washington University, the Brooklyn Historical Society, ProQuest, and the American Geographical Society Library. Follow her on twitter @meta_caro
