What Historians Can Do with Voyant Tools

Voyant Tools is a set of digital tools that enables researchers to explore and interpret texts in new ways. This open-source, web-based application has five main features:

1. Cirrus – The word cloud feature enlarges and centrally positions the words that occur most frequently in a corpus.  It includes a collocation visualization that shows how close the words are to each other within a text.

2. Reader – A text box displays the text with key terms highlighted, and a bar graph at the bottom shows the length of each document in the corpus.

3. Trends – A visualization represents the frequencies of terms across documents in a corpus or across segments in a document.

4. Summary – Information such as vocabulary density and readability in the documents within the corpus is easily accessible.

5. Contexts – The words on each side of a keyword are visible, and a correlation tool shows which terms each word is likely to appear near.

In the Introduction to Digital Humanities course at George Mason University, I had the opportunity to explore Voyant Tools. My assignment was to explore a corpus built from the more than 2,000 interviews with formerly enslaved people that the Works Progress Administration Federal Writers’ Project conducted in the 1930s. The interviews are in the public domain and are available on the Library of Congress American Memory website. Files were created for class use from the .txt file of the transcription of the volume from Project Gutenberg. The 17 files, one for each state in which interviews were recorded, included only the interviews. (Introductory and editorial information was removed.) Voyant can read .txt files, PDFs, .doc files, and other formats.

The cirrus word cloud is the most easily used feature in Voyant Tools.  It searches the corpus for the most frequent words.  The entire corpus includes 10,094 instances of the word “old.” 

The cirrus or word cloud for the corpus (17 state documents from the WPA slave interviews completed in the 1930s).
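For readers curious about what sits behind a word cloud, the counting step can be sketched in a few lines of Python. This is only an illustration, not Voyant's own code: the two sample "documents," the short stopword list, and the function names are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the state files; the real corpus is not reproduced here.
sample_corpus = {
    "state_a.txt": "My old master had an old house. I was five years old.",
    "state_b.txt": "Mother said the old times were hard.",
}

# A tiny illustrative stopword list (an assumption; Voyant uses its own lists).
STOPWORDS = {"the", "a", "an", "i", "my", "was", "had", "were", "said"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(corpus, n=3):
    """Count every non-stopword token across all documents in the corpus."""
    counts = Counter()
    for text in corpus.values():
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(n)


print(top_terms(sample_corpus, n=1))
```

A word cloud then simply scales each word's font size by its count.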

The other features in Voyant work with the cirrus, so when I click on “old” in the word cloud, a trend graph shows the usage of the word in the text from the 17 different states. When I click on the term “old” in the reader, the context tool displays a few words on either side of “old” for each instance in which it is found in the corpus (there is also a slider to expand the number of words shown). This feature allows the researcher to see how the term is used. The first 8 appearances of “old” in the corpus refer to how people (from the individuals being interviewed to Jefferson Davis) had aged since the Civil War, as well as to how old people were when major historical or life events occurred (as in “five years old”).

The context in Voyant Tools showing “old.”
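The keyword-in-context idea described above can also be sketched in Python. This is a minimal illustration under my own assumptions (the sample sentence, the function name, and the `window` parameter are hypothetical), not a description of how Voyant implements its context tool.

```python
import re


def keyword_in_context(text, keyword, window=3):
    """Return (left, keyword, right) tuples with up to `window` words on each side."""
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits


# A made-up sentence in the style of the interviews.
sample = "I was five years old when my old master moved away."
for left, word, right in keyword_in_context(sample, "old", window=2):
    print(f"{left:>20} | {word} | {right}")
```

Raising `window` plays the same role as widening the number of words shown on each side of the keyword.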

Next, the researcher can see the frequency of the selected word within specific documents. (Click on a dot representing a word in a state/document in the trends graph. In the menu that appears, click on Documents to display Trends only in that state/document.)

The trend graph in Voyant Tools shows which documents are most likely to include a term.

When I did this for Tennessee, in which “old” appeared far less frequently in proportion to the text than it did in the corpus as a whole, I found that the context and use shifted. “Old” still referenced buildings and people, but it also referenced the desire to receive overdue “old age pensions”. Most interesting was the reference to relationships: “old master” and “old mistress”. These phrases indicated proximity and relationship the way aunt or uncle might, but without kinship and affection. “Old times” and activities are also present. This is not surprising since life narrative interviews often invoke nostalgia. (https://tinyurl.com/4m4bbmss)

Context with “Old” in the Tennessee files.

Oklahoma is the state where “old” occurs most frequently as a proportion of the text. The state’s interviews have many more mentions of the interviewee being a young child (as in “five years old”) than Tennessee’s do; “my old master” and “mistress” also appear. Voyant Tools can reveal these differences, but it cannot tell us why. Did the variance have to do with the questions interviewers asked? Did the difference result from demographic differences or post-Civil War migrations? Or is it mere coincidence?
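Comparing states "as a proportion of the text" means dividing a term's count by the total number of words in each document, so that longer documents do not automatically rank higher. A rough Python sketch of that calculation follows; the two sample texts and all names are invented for illustration, and Voyant may scale or report its relative frequencies differently.

```python
import re


def relative_frequency(text, term):
    """Share of all word tokens in `text` that equal `term` (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(term.lower()) / len(tokens)


# Hypothetical miniature "state documents," not taken from the real corpus.
sample_states = {
    "state_a": "old master old times",
    "state_b": "my mother was five years old when we left home",
}

# Rank the documents from highest to lowest share of the term.
ranked = sorted(
    ((name, relative_frequency(text, "old")) for name, text in sample_states.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, share in ranked:
    print(f"{name}: {share:.1%}")
```

As in the post, the numbers only show where a term is concentrated; explaining why remains the historian's job.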

States also have different significant terms.  “Mother” was one of the top terms in the Kansas documents (https://tinyurl.com/4er4mrce), but it was not among the top three in the cirrus for the entire corpus. 

The cirrus showing only the Kansas document.

Although meanings seemed to be similar across the corpus, the Kansas context shows the term especially associated with sorrow, slaves, people getting in touch at the end of their lives, and never seeing one another again. There is a correlation feature in Voyant Tools that allows the researcher to see whether the use of one term rises and falls with the use of others. (Click on the correlations; type the term that you want to look for in the search bar; and sort for the state or documents you want in the bottom right corner under “Scale”.) In the Kansas documents, “mother” appears with “Christmas”, “left”, “grandmother”, “body”, “bury”, “cook”, and “gone”.

The correlation (under context) showing the proximity of terms to mother in the Kansas interviews.
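One common way to measure whether two terms "rise and fall" together is the Pearson correlation coefficient, computed over each term's counts in successive segments of a text. The sketch below assumes that approach; the per-segment counts are invented for the example, and I am not claiming this is exactly how Voyant computes its correlations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts of each term in four segments of a document.
mother = [1, 3, 2, 5]
christmas = [2, 6, 4, 10]   # rises and falls with "mother"
gone = [5, 3, 4, 1]         # moves in the opposite direction

print(round(pearson(mother, christmas), 3))
print(round(pearson(mother, gone), 3))
```

Values near 1 mean the two terms tend to appear in the same segments; values near -1 mean one tends to appear where the other does not.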

The prevalence of terms connected to love and sorrow is striking. They appear throughout the corpus, but in a search of the entire corpus the correlations more often merely identify people.

Voyant Tools provides much more than search capability.  It encourages the researcher to see new patterns and can help the researcher identify new questions.
