Digital Humanities: Navigating a Sea of Data


By Nathan Gallagher

The classic image of a historian is of someone rifling through books in enormous libraries or buried in dusty archives, looking for the right documents. Of course we still do these things, but the historian’s craft is no longer limited to what’s written on paper. Digital approaches to history range from online archives and museum exhibitions to historical mapping to complex data mining. These computer-based techniques all fall under the umbrella of “digital humanities,” and how they’re used all depends on the problem that needs to be solved. Here’s a great example of some of the new ways historians are approaching research questions in the information age.

Wooden shipbuilding was a huge industry in early modern Europe, and it demanded a lot of wood. Spain got its timber from Iberian forests as much as possible, but it had to look outside its borders to get all the wood it needed. This meant importing timber from the Baltic region.

[Image: The Baltic region]

But exactly how much timber did Spain have to import? That’s where the Sound Toll Registers Online comes in.

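And what exactly are the Sound Toll Registers? In short, they are the records of the toll that the Danish crown levied on ships passing through the Sound (Øresund), the strait between Denmark and Sweden. They contain (literally!) a sea of valuable information about early modern trade in Europe.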

There’s just one problem: spelling was not very consistent in the early modern era. Let’s say I want to search for cargoes that went to Cádiz, Spain. I’ll need to make sure I include:

Cadiz, Cadix, Caditz, Cadis, Cadex, Cadim, Candiz, Cordix, Kadix, Kalix, Qadix

Now let’s say I want to see all the cargoes of planks that went to Cádiz. In addition to needing all these spellings of the city, I’ll now need to catch about 2600 variations of the word “plank.” How can I be sure I’m getting everything I want? Wouldn’t it be better if I could just enter a number and know that I’m getting all the shipments of planks?

The answer is a process called “coding.” This is not coding in the sense of writing a computer program; it means assigning a numeric code to certain values in the database.

The order of the work matters, too. Since I’m only concerned with the planks that went to Cádiz, I can narrow the data down to the cargoes bound for Cádiz first, which filters out a lot of those spellings of “plank.”

First, I need to index the different spellings of “Cádiz.” This means I need to make a new table that lists each unique spelling of every port of destination, along with a count of how many times it occurs. The count helps prioritize the most common spellings in very large data sets. This new table should also have an empty column where I can place a number if the place name is a spelling of Cádiz.

Table 1: Unique place names are indexed.
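For readers who would rather script this step than build it by hand, here is a minimal Python sketch of the indexing idea. It is not how the Sound Toll Registers Online database itself works: the sample destinations, the column names, and the `build_index` helper are all made up for illustration.

```python
from collections import Counter

# Made-up sample of destination names as they might appear in the records.
destinations = [
    "Cadiz", "Cadix", "Cadiz", "Lisbon",
    "Caditz", "Cadiz", "Lisbon", "Cadix",
]


def build_index(names):
    """Return one row per unique spelling, most frequent first.

    Each row carries an empty "code" column to be filled in by hand later.
    """
    counts = Counter(names)
    return [
        {"spelling": spelling, "count": count, "code": None}
        for spelling, count in counts.most_common()
    ]


index = build_index(destinations)
for row in index:
    print(row)
```

Sorting by count puts the spellings that cover the most records at the top, so the first few codes assigned already account for a large share of the data.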

Now the tedious part: I need to find each spelling of Cádiz and mark it with a number in our new column. In this case we’ve used “5”.

Table 2: Indexed place names are marked with a code.
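The marking step can be sketched in Python as well. The decision about which spellings mean Cádiz still has to be made by a person; the code below only records that decision. The set of spellings is the list given earlier in this post, while the index rows and the `mark` helper are hypothetical.

```python
CADIZ_CODE = 5  # the code chosen for Cádiz in this example

# Spellings judged by hand to refer to Cádiz (from the list above).
cadiz_spellings = {
    "Cadiz", "Cadix", "Caditz", "Cadis", "Cadex", "Cadim",
    "Candiz", "Cordix", "Kadix", "Kalix", "Qadix",
}

# A made-up fragment of the index table, with the code column still empty.
index = [
    {"spelling": "Cadiz", "count": 3, "code": None},
    {"spelling": "Lisbon", "count": 2, "code": None},
    {"spelling": "Kadix", "count": 1, "code": None},
]


def mark(index_rows, spellings, code):
    """Write `code` into every index row whose spelling is in `spellings`."""
    for row in index_rows:
        if row["spelling"] in spellings:
            row["code"] = code
    return index_rows


mark(index, cadiz_spellings, CADIZ_CODE)

# Rows that are still uncoded are left for other places or later passes.
uncoded = [row["spelling"] for row in index if row["code"] is None]
print(uncoded)
```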

Now, when I include this table in my search for Cádiz, I simply connect the original spelling in the index table to the spelling in the main table and search for destinations with a code of “5.”

Table 3: The indexed place name in the “Portdest” table is connected to the naar (“to”) field in the “Ladingen” (cargoes) table.
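In database terms, this connection is a join. Here is a small sketch using Python's built-in `sqlite3` module, offered as an illustration rather than a copy of our actual setup. The table names `Ladingen` and `Portdest` and the `naar` field come from the tables above; the other column names (`spelling`, `code`, `product`) and all of the rows are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ladingen (id INTEGER PRIMARY KEY, naar TEXT, product TEXT);
    CREATE TABLE Portdest (spelling TEXT PRIMARY KEY, code INTEGER);
""")

# Made-up cargo records: destination as written in the source, and a product.
conn.executemany(
    "INSERT INTO Ladingen (naar, product) VALUES (?, ?)",
    [
        ("Cadiz", "Planker"),
        ("Lisbon", "Planken"),
        ("Kadix", "Plancken"),
        ("Cadix", "Salt"),
    ],
)

# The hand-coded index: every spelling of Cádiz carries the code 5.
conn.executemany(
    "INSERT INTO Portdest (spelling, code) VALUES (?, ?)",
    [("Cadiz", 5), ("Kadix", 5), ("Cadix", 5), ("Lisbon", None)],
)

# Join each cargo to its index row by spelling, then filter on the code.
rows = conn.execute(
    """
    SELECT l.id, l.naar, l.product
    FROM Ladingen AS l
    JOIN Portdest AS p ON p.spelling = l.naar
    WHERE p.code = ?
    ORDER BY l.id
    """,
    (5,),
).fetchall()

for row in rows:
    print(row)
```

The query never mentions a spelling. Adding a newly discovered variant of Cádiz only means adding one row to the index table, and every later search picks it up.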

Now the database will look for cargoes where the destination matches the spellings we marked with a “5.” With only the cargoes that went to Cádiz left, I can repeat this process with product names. Here we’ve identified planks with the number “205.”

Table 4: Products that went to Cádiz are indexed and coded using the same process.

This time there were only 78 variations of “plank” to code. Much more manageable!
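The reason the second pass is so much smaller can be shown with a toy example in Python. Everything here is invented for illustration (the cargo rows, the product spellings, and the variable names); only the codes 5 and 205 come from the example above.

```python
from collections import Counter

# Hand-coded destination spellings (5 = Cádiz).
port_codes = {"Cadiz": 5, "Cadix": 5, "Kadix": 5}

# Made-up cargo records: (destination as written, product as written).
cargoes = [
    ("Cadiz", "Planker"),
    ("Lisbon", "Planken"),
    ("Kadix", "Plancken"),
    ("Cadix", "Planker"),
    ("Riga", "Plancker"),
    ("Cadiz", "Salt"),
]

# Step 1: keep only the products shipped to a destination coded 5.
to_cadiz = [product for dest, product in cargoes if port_codes.get(dest) == 5]

# Step 2: index product spellings within that subset only.
product_index = Counter(to_cadiz)
all_products = {product for _, product in cargoes}
print(len(all_products), "spellings overall,", len(product_index), "to code")

# Step 3: after hand-coding the plank spellings (205 = planks), filter again.
product_codes = {"Planker": 205, "Plancken": 205}
planks_to_cadiz = [p for p in to_cadiz if product_codes.get(p) == 205]
print(len(planks_to_cadiz), "plank cargoes to Cádiz")
```

Spellings that never occur on a cargo bound for Cádiz simply never show up in the second index, so nobody has to look at them.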

Of course, this is a very simple example. Using this same technique, we made an entirely new coded table where each destination and product we were interested in was given its own unique number. It takes some time to do the coding, but the work and anxiety it saves, and the confidence it gives us in our searches, are well worth it.

Now we can do things in a few minutes that might take years without a database. For example, I can easily see all the cargoes of timber products that went to different regions of Spain each year:

Table 5: The number of timber cargoes to the three naval jurisdictions of Spain, 1750-1760. From information in the Sound Toll Registers Online.
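Once everything is coded, a summary like this is just a matter of counting. Below is a rough Python sketch of that counting step. The records, the region codes, and the placeholder region names are hypothetical and do not reproduce the real figures in the table.

```python
from collections import defaultdict

# Hypothetical lookups: region codes with placeholder names, and the set of
# product codes that count as timber (205 = planks, as in the example above).
REGIONS = {1: "Region A", 2: "Region B", 3: "Region C"}
TIMBER_CODES = {205}

# Made-up coded cargo records: (year, region code, product code).
records = [
    (1750, 1, 205),
    (1750, 1, 205),
    (1750, 2, 205),
    (1751, 3, 205),
    (1751, 1, 999),  # not a timber product, so it is ignored
    (1751, 3, 205),
]


def timber_by_year_and_region(rows):
    """Count timber cargoes per year and region."""
    table = defaultdict(lambda: defaultdict(int))
    for year, region, product in rows:
        if product in TIMBER_CODES:
            table[year][REGIONS[region]] += 1
    return {year: dict(regions) for year, regions in sorted(table.items())}


summary = timber_by_year_and_region(records)
for year, regions in summary.items():
    print(year, regions)
```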

As you can see, the Sound Toll Registers Online is a valuable tool for our research into timber usage in early modern Iberia. It just takes some clever navigation to find meaning in a sea of data.

Special thanks to Arne Solli from the University of Bergen for teaching me these techniques.

Nathan Gallagher is a ForSEAdiscovery pre-doctoral researcher at the University of Groningen, where he is researching timber exports to Spain during the 17th and 18th centuries.