Week 8 Text Analysis

This week, I learned about text analysis using Voyant and topic modelling with Mallet. Admittedly, even though this was not my first time seeing Voyant, the tool and its methodology can still feel overwhelming. The very idea of “reading without reading” a whole corpus of texts does not always feel intuitive, and it took some time for me to get my bearings and understand “what” exactly is being represented in the various windows in Voyant.

Nonetheless, after downloading the corpus of plain text files from the Documenting the American South website, I ran the corpus through Mallet to organize the texts into 40 different topics. Like everyone in the class, the first topic I examined in Voyant was the Haitian revolution (distinguished by the appearance of words like “Hayti” or “Ouverture”), but I quickly realized that text analysis is subject to the same limitation as primary reading, only at a larger scale: it is hard to comprehend what you are looking at without enough prior knowledge. Thus, after taking another look at the topics identified by Mallet, I noticed that one appeared to deal with the slave trade (distinguished by words like “Africa”, “ship”, and “boarded”). I then took the top 25 documents in that topic and labelled them with their respective publication dates, as I was interested in seeing whether the rhetoric surrounding the slave trade had evolved over time. After loading the 25 documents into Voyant, I noticed that the “correlations” feature reported a correlation between the words “Africa” and “great” in the corpus. I was then curious to see if this was a rhetorical correlation that evolved over time: were people more inclined to use the word “great” while referring to Africa as time progressed, and the slave trade increasingly became seen as a “relic of the past”?
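The "top 25 documents in that topic" step can be sketched in a few lines of Python. This is only an illustration, not the exact procedure used above: it assumes a tab-separated document-topic table in which each row holds a document index, a document name, and then one weight per topic in order. The sample rows, file names, and the `top_documents` helper are all hypothetical.

```python
import csv
import io


def top_documents(doc_topics_tsv, topic, n=25):
    """Return the n (name, weight) pairs with the highest weight for `topic`.

    Assumed row layout (tab-separated): index, document name, then one
    weight per topic. Lines starting with '#' are treated as headers.
    """
    rows = []
    for fields in csv.reader(io.StringIO(doc_topics_tsv), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue
        name = fields[1]
        weights = [float(value) for value in fields[2:]]
        rows.append((name, weights[topic]))
    # Highest weight first, then keep only the first n documents.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


# Hypothetical three-document, three-topic table for illustration.
sample = (
    "0\tdoc_a.txt\t0.10\t0.70\t0.20\n"
    "1\tdoc_b.txt\t0.05\t0.15\t0.80\n"
    "2\tdoc_c.txt\t0.60\t0.30\t0.10\n"
)
top = top_documents(sample, topic=1, n=2)
print(top)
```

With a real table, the returned names could then be matched against the metadata CSV to attach a publication date to each file before loading them into Voyant.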

The answer turned out to be a resounding “no”. The figures at the bottom of this post were generated using the “StreamGraph” and “Trends” tools in Voyant, where blue in the graphs represents the word “great” and brown represents “africa”. While it is clear that someone who was speaking of Africa was also more likely to use the adjective “great”, there is by no means a clear chronological pattern. In hindsight, this should not have been surprising: while I was going through the CSV file containing the metadata for each document, I noticed that most documents identified by Mallet in this topic were slave narratives. In that case, the real correlation might be as simple as this: the more someone talked about Africa, the more likely they were to use a celebratory word like “great.” This also raises an issue that might arise from using topic modelling to guide text analysis: people who were indicting the slave trade were the most likely to be talking about it extensively in the first place. As a consequence, topic modelling carries the risk of cherry-picking your document sample if done carelessly.
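The difference between "the two words rise and fall together" and "the words become more common over time" can be made concrete with a small sketch. It computes each word's relative frequency per document and then a Pearson coefficient, once between the two words and once between a word and the publication year. The four short texts and their years are invented for illustration, and whether this matches Voyant's own calculation exactly is an assumption.

```python
import re
from math import sqrt


def relative_frequency(text, word):
    """Share of tokens in `text` that equal `word` (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return tokens.count(word) / len(tokens) if tokens else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical (year, text) pairs standing in for the dated documents.
docs = [
    (1789, "africa is a great land and the ship left africa"),
    (1825, "the ship was boarded near the coast"),
    (1850, "a great many were taken from africa"),
    (1861, "great ships great sorrow africa africa africa"),
]
docs.sort(key=lambda pair: pair[0])  # chronological order

years = [year for year, _ in docs]
africa = [relative_frequency(text, "africa") for _, text in docs]
great = [relative_frequency(text, "great") for _, text in docs]

word_vs_word = pearson(africa, great)   # do the two words move together?
great_vs_year = pearson(years, great)   # does "great" trend with time?
print(word_vs_word, great_vs_year)
```

A high word-to-word value alongside a weak word-to-year value would correspond to the situation described above, where the two words travel together without any clear chronological pattern.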
