Word clouds

by Nick

A very useful post the other day from Lisa Spiro at Digital Scholarship in the Humanities, covering two things:

  • Using word clouds
  • Text comparison tools

I’ve been messing around with both over the last couple of days. Below are some thoughts on uses of word clouds.

Word clouds are a useful visual representation of the frequency with which a word appears – the bigger the word in the cloud, the more often it appears in the text. They’re often used on blogs to represent the tags the blogger has used. I’ve got two in the sidebar on the right, one for the categories I sort my posts into and one for the tags I’ve used.

Word clouds aren’t horribly difficult things to learn how to program. I’ve been following Bill Turkel’s wiki on how to become a programming historian and have managed to make my own using Python. But if you want to cheat, Wordle offers you a much easier way. Just cut and paste your text into the website and it automatically generates a cloud for you. You can then customise it within a range of styles.
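For anyone curious about what goes on underneath, here is a minimal sketch of the counting step behind a word cloud. It is not the code from Turkel’s wiki, nor Wordle’s actual method. The function names, the very short stop-word list and the linear scaling of font sizes are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (an assumption; real tools use far longer ones).
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "was", "he", "his",
    "it", "is", "that", "for", "as", "on", "with", "by", "at",
}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each word appears, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def font_sizes(freqs, top_n=20, min_size=10, max_size=48):
    """Map the top_n most frequent words to font sizes, scaled linearly by count."""
    top = freqs.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    lowest = top[-1][1]
    spread = max(highest - lowest, 1)  # avoid dividing by zero when all counts match
    return {
        word: min_size + (count - lowest) * (max_size - min_size) // spread
        for word, count in top
    }


sample = "Perfect Occurrences was a newsbook. Walker edited Perfect Occurrences and Perfect news."
sizes = font_sizes(word_frequencies(sample))
```

The resulting dictionary pairs each word with a size, which is all a drawing routine needs in order to lay the cloud out. Placing the words on the page is the fiddly part, and that is where Wordle saves the effort.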

How is this useful for historians? Well, I’m in the early stages of planning my dissertation and one use I’ve found has been to refine my topic. There are two extremes in choosing a thesis: you can start with a small topic and work your way up to finding the overall themes it will address, or start with a big theme and work your way down. If you’re choosing the former, word clouds can be a very quick and helpful way of distilling out key concepts.

As an example, I’ve cut and pasted into Wordle the text of the Dictionary of National Biography entry for Henry Walker – one of the Civil War journalists and pamphleteers I’m hoping to study in my dissertation.

What can you glean from this? “Perfect” and “Occurrences” occur quite a lot, naturally enough given Perfect Occurrences was a newsbook he edited. But what about other titles he edited? They’re less prominent. Is this something significant about Walker’s legacy, or does it also tell us something about his biographer’s priorities? “Trade” and “apprenticeship” also spring out – again, significant given that Walker started life as an ironmonger and did not spend his whole career as a parliamentary hack. This is a context sometimes ignored in accounts of his life. “Hebrew” also comes out quite strongly. Walker was fluent in it, but what significance should we read into this – is it of importance for understanding his writing?

Let’s compare this text to the biography of Walker in the early 20th century Cambridge Companion to English Literature.

Perfect Occurrences is nowhere to be seen. “Cromwell” and “Charles” loom much larger in the cloud. “Drogheda” also looks quite strong, something that doesn’t emerge in the DNB’s cloud.
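The same kind of side-by-side reading can be roughed out in code. The sketch below lists which words are prominent in one text but not in the other. The function names are hypothetical, and dropping words shorter than four letters is only a crude stand-in for a proper stop-word list.

```python
import re
from collections import Counter


def top_words(text, n=10, min_length=4):
    """Return the set of the n most frequent words of at least min_length letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    return {word for word, _ in counts.most_common(n)}


def compare_prominent_words(text_a, text_b, n=10):
    """Split the top words of two texts into those unique to each and those shared."""
    a, b = top_words(text_a, n), top_words(text_b, n)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(a & b),
    }


# Made-up snippets standing in for the two biographies (not the real texts).
text_a = "Walker edited Perfect Occurrences. Perfect Occurrences sold well. Walker learned Hebrew."
text_b = "Walker praised Cromwell. Cromwell took Drogheda. Walker reported Drogheda."
result = compare_prominent_words(text_a, text_b, n=3)
```

With the full biographies pasted in place of the made-up snippets, the "only_a" and "only_b" lists would give a quick first pass at what each biographer chose to emphasise.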

These are just a few of the questions that occurred to me when I generated these clouds. They’ve all given me leads to follow up or do more thinking about, both in relation to Walker and the historiography surrounding him, and I was able to do it instantly without a detailed trawl through the text. Now, in Walker’s case his biography is very short, and naturally you would go through it in detail anyway – but for much longer texts, I can see Wordle having even more potential. With the set of key words it generates, you can then go trawling through other resources such as JSTOR and the RHS bibliography, looking for additional relevant secondary works. It’s not a substitute for reading and analysing a text yourself in detail. But it does provide a very useful supplement, particularly if you are trying to summarise a text.

Next time I will give some details about the uses I’ve made of text comparison tools.