Monthly Archives: April 2019

Activity 14: Constructing Data Visualizations

I produced the two chord diagrams above for this week’s exercise in data visualization, partly as the product of a frustrating effort to work through the technical issues that can arise with an unfamiliar form of data representation. Both diagrams were made using data from the Trans-Atlantic Slave Trade Database (the download page is available here).

After discovering that Flourish offered a chord diagram feature (a type of visualization I had not heard of before), I realized it might have a lot of potential for representing slave trade networks, and I wanted to see whether I could make that happen, given the readily available data on the Trans-Atlantic Slave Trade on the Slave Voyages website. The TAST Database is very thorough: it draws on the records of tens of thousands of individual enslaving voyages and encodes the available details for each ship, such as the number of people it carried as cargo and where it sailed from and to. Both visualizations above are built from three of its variables: the region in Africa that is “the imputed principal place of slave purchase”, the broad region of the world that is considered “the imputed principal place of slave disembarkation”, and the “total slaves on board at departure from last slaving port”. The last of these does not accurately represent those who survived the ocean crossing, but I judged that in limited visualizations like these it was better to skew toward the scale of people taken from Africa in the cargo holds of ships than to risk erasing them by counting only those who reached their destination. The difference between the two visualizations lies in which voyages I drew from. My “Slave Trade from Different Regions of Africa” diagram is intended to highlight the diversity of areas people were taken from in the slave trade, and was made using the first 1500 voyages listed in the TAST Database (although how representative this sample is depends on whether the voyages were added to the database in a roughly random order, which is far from certain).
Out of fascination with a region that is not often discussed in histories of the slave trade, I built my visualization of “Slave Voyages from Southeast Africa” strictly from the more than 900 voyages for which the database lists “Southeast Africa and the Indian Ocean Islands” as the principal source of human cargo. If nothing else, this draws attention to the range of places around the Atlantic to which people from the Indian Ocean region were brought as strangers in chains in strange lands.
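
Underneath both diagrams is a simple aggregation: every voyage contributes its count of people embarked to one (purchase region, landing region) pair, and the chord widths come from those totals. The sketch below shows that step in plain Python under stated assumptions: the key names `purchase_region`, `landing_region` and `embarked` are hypothetical stand-ins for the three TAST variables above (not the database's real column names), the records are placeholders rather than TAST figures, and I am not claiming this is the exact input format Flourish expects.

```python
from collections import defaultdict


def flow_totals(voyages):
    """Sum people embarked per (purchase region, landing region) pair.

    Each voyage is a dict with the hypothetical keys 'purchase_region',
    'landing_region' and 'embarked', standing in for the three TAST
    variables described above. Voyages missing any value are skipped.
    """
    totals = defaultdict(int)
    for voyage in voyages:
        src = voyage.get("purchase_region")
        dst = voyage.get("landing_region")
        count = voyage.get("embarked")
        if src is None or dst is None or count is None:
            continue  # incomplete record: leave it out rather than guess
        totals[(src, dst)] += count
    return dict(totals)


def to_edge_list(totals):
    """Flatten totals into (source, target, value) rows, largest flow first."""
    return sorted(
        ((src, dst, count) for (src, dst), count in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


# Placeholder records for illustration only, not TAST figures.
sample = [
    {"purchase_region": "Region A", "landing_region": "Region X", "embarked": 300},
    {"purchase_region": "Region A", "landing_region": "Region X", "embarked": 200},
    {"purchase_region": "Region B", "landing_region": "Region Y", "embarked": 150},
    {"purchase_region": "Region B", "landing_region": "Region X", "embarked": None},
]
print(to_edge_list(flow_totals(sample)))
# [('Region A', 'Region X', 500), ('Region B', 'Region Y', 150)]
```

In these terms, the first diagram corresponds to `flow_totals(voyages[:1500])`, and the Southeast Africa diagram to `flow_totals` over only the voyages whose purchase region is “Southeast Africa and the Indian Ocean Islands”. Skipping incomplete records is a design choice worth stating openly, since it quietly shrinks the totals.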

Admittedly, neither of these visualizations is ideal. My intent was to create a non-directional chord diagram in which both ends of a “chord” have the same thickness. In visualizing the slave trade and the African diaspora, this makes sense for emphasizing the equal importance of peoples’ former homes and the lands they were brought to in bondage. Otherwise, we risk reducing the diversity of enslaved experiences to an amorphous destination of “America” or, even more often, recreating one of slavery’s inherent injustices: reducing peoples of wide-ranging origins to an amorphous mass of “Africans.” My visualizations do not achieve this. After failing to make the “non-directional” flow feature work for two different visualizations, I am left with one visualization depicting people funneled out of Southeast Africa into diverse destinations, and another depicting people from across the African continent funneling into a deceptively singular “American” endpoint. While it can be informative and interesting to contemplate the relative scales of where people came from and where they ended up over the course of the slave trade (the diagrams even suggest visually that the Southeast African slave trade may have made up a larger proportion of the TAST by the 19th century), the one-directional nature of these visualizations makes them read less like chord diagrams and more like pie charts. More work and technical know-how are needed for them to achieve their potential.
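
Why the chord ends come out unequal is easier to see in matrix form. In some chord layouts (the d3-chord JavaScript module's `d3.chord()` works this way, as I understand it), the input is a square matrix over all places, origins and destinations together, and a chord between i and j has an end of width M[i][j] at i and an end of width M[j][i] at j. A one-way origin-to-destination matrix leaves M[j][i] at zero, so the chord is lopsided; adding the matrix to its transpose makes both ends equal. This is a minimal Python sketch of that general idea with hypothetical function names and placeholder values, not a description of how Flourish's non-directional setting works internally, which I have not verified.

```python
def square_matrix(edges, labels):
    """Build an n-by-n matrix over `labels` (origins and destinations in one
    list) from (source, target, value) rows; entry [i][j] is flow from i to j."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for src, dst, value in edges:
        matrix[index[src]][index[dst]] += value
    return matrix


def symmetrize(matrix):
    """Return M + M^T, so entry [i][j] equals entry [j][i]: the flow between
    two places is drawn with the same width at both ends."""
    n = len(matrix)
    return [[matrix[i][j] + matrix[j][i] for j in range(n)] for i in range(n)]


labels = ["Region A", "Region X", "Region Y"]  # placeholder group names
directed = square_matrix(
    [("Region A", "Region X", 500), ("Region A", "Region Y", 150)], labels
)
print(directed)              # [[0, 500, 150], [0, 0, 0], [0, 0, 0]]
print(symmetrize(directed))  # [[0, 500, 150], [500, 0, 0], [150, 0, 0]]
```

In the symmetric version each row total is a place's full share of the trade, whether people left from it or arrived at it, which is what gives origins and destinations equal visual weight.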


Activity 13: Critiquing Data Visualizations

Before taking this class, I admit I would have been very inclined to take data visualizations at face value. Data at times felt like an unquestionable black box, and if I failed to understand what a visualization was communicating, I took that as a personal failing rather than a reason to question whether the visualization was any good. But now the questions come much more naturally: What is the argument of a visualization? By what means was the data constructed? What are its sources and variables? How well does the visualization fit its purpose? How intuitively does it convey its information and argument? And perhaps most importantly, what does the given visualization overlook or leave out? Does it actually contribute something to the analysis, or is it merely superficial?

A prime example of this process came in interrogating the New York Times article above. At a glance, it leaves the impression of a visualization that is well made (i.e. strong visual appeal) and fits its creator’s purpose. The author argues that certain patterns in the lives of people who become Congressional representatives separate them from the American public, and the visualization shows several points where different representatives’ “paths” converge. On closer inspection, however, the creator makes it look as though there is more substance than there actually is. The X-axis falsely implies a chronological order, when in reality, why “real estate” sits closer to “military” than to “private law” is anyone’s guess. The Y-axis also has no clear meaning. Worse still, the visualization lacks data transparency: it offers only a mild, small-print disclaimer at the beginning that items are not in chronological order; the methodology is tucked away in small print at the end; and although viewers can single out the “path” of an individual representative, descriptive details such as how long that person spent in a given line of work are left out. As far as this visualization is concerned, someone who worked on Wall Street for one year before quitting to get a Master of Public Policy is the same as someone who earned a Master of Business Administration and then worked on Wall Street for ten years. In short, the author could have been far more honest with a different visualization. People can lie with data, whether through convenient omissions or through visualizations that obscure their flaws, which makes the ability to interrogate data all the more important for a historian or any general consumer of knowledge.

Activity 12: Received and Derived Data

According to our reading of Hadley Wickham’s article “Tidy Data”, a dataset is “tidy” if (1) each variable forms a column, (2) each observation forms a row, and (3) each type of observational unit forms a table. Otherwise, the data is not structured in a way that lets the analyst easily derive meaning from it (Wickham 1-3). For historians, this “tidy” structure is useful in part because it is so intuitive. While transcribing a portion of George Washington’s 1799 slave ledger, I was struck by the realization that I was building my spreadsheet to be “tidy” without intentionally designing it that way: making “name” the first column and assigning other variables such as “location” or “labor status” to subsequent columns simply came naturally. The way George Washington structured his ledger served only his own purposes, whereas the tidy structure means anyone can read a row from left to right and see that the enslaved man Sam Cook was considered property of George Washington, lived at his Mansion House, and was considered too old to work (“passed labor”).
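
The contrast between the ledger's layout and a tidy table can be sketched in Python. The nested dictionary below is a hypothetical stand-in for how entries sit under locations and subheadings in the ledger, holding only the Sam Cook entry mentioned above; the function and key names are my own illustration, not part of any transcription tool. Flattening it yields one row per person (an observation) with one key per variable, which is Wickham's first two rules.

```python
def ledger_to_tidy(ledger, owner):
    """Flatten {location: {subheading: [names]}} into one row per person.

    Each row is one observation (a person) and each key is one variable.
    A subheading of None stands for people listed under no special
    subheading (a hypothetical convention for this sketch).
    """
    rows = []
    for location, sections in ledger.items():
        for subheading, names in sections.items():
            for name in names:
                rows.append({
                    "name": name,
                    "owner": owner,
                    "location": location,
                    "ledger_subheading": subheading,
                })
    return rows


# Hypothetical nested encoding holding only the entry cited above.
ledger = {"Mansion House": {"passed labor": ["Sam Cook"]}}
for row in ledger_to_tidy(ledger, owner="George Washington"):
    print(row)
# {'name': 'Sam Cook', 'owner': 'George Washington', 'location': 'Mansion House', 'ledger_subheading': 'passed labor'}
```

Keeping the received subheading as its own column, rather than overwriting it, leaves room to derive other columns from it later while preserving what the ledger actually says.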

Of course, this process has also highlighted the significance of the contrast between received and derived data. The original intent behind received data can be obscured or recreated in derived data (and which is “better” is always a contextual matter). For example, when I created the column “labor status,” I took the special subheadings “passed labor” and “child,” under which George Washington grouped slaves at different locations, and turned them into three “labor status” designations in my table: “working”, “passed labor”, or “child.” This single decision involved multiple layers of interpretive judgment. On one hand, it assumes that Washington’s way of categorizing his slaves carried a subtext of viewing the enslaved as useful, no longer useful, or eventually useful. But in using this subtext to build my derived data (which in theory stays true to the intent of the received data George Washington left us), I also had to recognize that I was recreating slavery’s injustice and dehumanization. In most circumstances, “child” would be considered a social status related to age and kinship; listing it as a labor status feels blatantly wrong. Yet from another angle, if one were to make “child” a yes-or-no category in the derived data, this would avoid recreating the injustice, but at the cost of erasing the injustice contained in the received data. And when one derives categories that simply are not listed in the received data (such as inferring the gender of enslaved people from their names or marital status in a ledger), these interpretive risks become an even steeper slope. If nothing else, this highlights why it is so important that we as historians be transparent not just about what our final data analysis says, but about how we reached that data and why we derived it as we did.
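
The judgment call described above can be written down as explicit rules, which is one concrete way of making a derivation transparent. This sketch assumes each transcribed entry carries the ledger subheading it appeared under (None when there was none); the function names are hypothetical. The first function reproduces the three-way “labor status” column; the second is the yes-or-no alternative, which avoids filing childhood under labor but no longer records how the ledger framed it.

```python
def labor_status(subheading):
    """Derived 'labor status' following the ledger's own framing:
    'passed labor' and 'child' keep their labels, anyone else is 'working'."""
    if subheading in ("passed labor", "child"):
        return subheading
    return "working"


def is_child(subheading):
    """Alternative derived column: childhood recorded as a yes/no flag
    rather than as a category of labor."""
    return subheading == "child"


for heading in (None, "passed labor", "child"):
    print(heading, labor_status(heading), is_child(heading))
# None working False
# passed labor passed labor False
# child child True
```

Note that the “working” label is itself an inference: it is assigned to anyone the ledger did not single out, not to anyone the ledger explicitly called a worker.
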
Data and statistics are powerful tools. As the practice of deriving data shows, we can pull far more than meets the eye from the information we are directly given in order to answer historical questions, but we can also create lies in doing so. By the same token, one should not accept contemporary data at face value, but should always ask how that data was created.

Activity 11: Creating Data Maps

For this week’s exercise in creating data maps, I ultimately narrowed in on using census data to map white and enslaved populations by gender in Alabama in 1850. Admittedly, I came to this after experimenting with a few different scales and questions. I first thought to map how the locations of free Black populations changed over time, but scrapped that idea after realizing that the Lincoln Mullen map we examined last week had already done so. I then tried to see whether I could find any patterns in how the older (over 45) and younger (under 14) enslaved populations were distributed, but quickly realized that the sheer differences in population size across the US, and between these two age groups, made it difficult to create “bins” from which anything meaningful could be discerned. This led me to narrow my scope to the single state of Alabama and to demographics of sufficiently comparable scale: male and female white and enslaved populations.

Considering the prevalence of rape and sexual exploitation in the history of American slavery, I wanted to know whether areas with higher populations of white men and lower populations of white women also tended to have disproportionately high populations of enslaved Black women. In other words, I wanted to know whether the census figures would suggest evidence of male enslavers actively seeking out Black women as sexual property on a large scale. The map I created, with the population categorized into 24 evenly sized bins, does not appear to confirm this hypothesis. Counties with high populations of enslaved Black women also tended to have comparable populations of enslaved Black men. Contrary to my original expectations, some counties with more enslaved women than men appeared to have similar white male and female populations. In hindsight, this should have been expected: if enslaved women were preferred as domestic servants, they would be in greater demand in counties with more white married families. Additionally, as we know from reading both Miles’s and Schermerhorn’s books, accounts of married men sexually abusing their slaves were far from uncommon. That said, one thing that did jump out at me in this map is that, regardless of the numbers of white women or enslaved men, counties with more enslaved women than white men were not uncommon, and were especially prevalent in central Alabama. If nothing else, a map such as this illustrates the dangers of data mapping without sufficient care: while it does not provide direct evidence of white men explicitly purchasing Black women as sexual objects, it still leaves plenty of room for such circumstances to have taken place.
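
“Evenly sized bins” can mean equal-width intervals (every bin spans the same population range) or quantiles (every bin holds the same number of counties), and the choice shapes what a choropleth shows. This sketch assumes equal-width bins, which may not be what the mapping tool actually used, and also shows the simple county-by-county comparison behind the observation about enslaved women and white men. The function names, column names, and county figures are hypothetical placeholders, not 1850 census values.

```python
def equal_width_bin(value, lo, hi, n_bins=24):
    """0-based bin index when [lo, hi] is cut into n_bins equal-width
    intervals; values outside the range are clamped to the first or last bin."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / n_bins
    k = int((value - lo) // width)
    return min(max(k, 0), n_bins - 1)


def counties_where(counties, larger, smaller):
    """Names of counties where the `larger` column exceeds the `smaller` one."""
    return [c["county"] for c in counties if c[larger] > c[smaller]]


# Placeholder rows for illustration only, not census data.
counties = [
    {"county": "County 1", "enslaved_female": 900, "white_male": 700},
    {"county": "County 2", "enslaved_female": 400, "white_male": 650},
]
print(counties_where(counties, "enslaved_female", "white_male"))  # ['County 1']
print(equal_width_bin(150, 0, 2400))  # 1
```

With equal-width bins, a few very large counts stretch the range so that most counties crowd into the lowest bins, which is one way the binning difficulty described for the age-group comparison can arise.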