Data Visualization

This week we have been talking about the use of text mining and data visualization by historians. I know “text mining” sounds pretty technical, but I’d bet most people have dealt with it before. A familiar example of text mining is a word cloud. Below is an example that I made from our marriage license spreadsheet.

Word cloud based on 1921 Clay County Marriages

One of the cool things about text mining and data visualization within a word cloud is that it shows us which words are used most frequently. In this word cloud, you can see that some of our most used words are Clay, Cass, North, Dakota, Minnesota, etc. It’s immediately obvious that our project is tied to Minnesota. However, one downside is that “North Dakota,” “Clay County,” and “Cass County” were all processed as individual words instead of phrases.

Another misrepresentation is the word “Township,” which has two issues in this word cloud. First, you’ll notice that “Township” isn’t shown in this cloud at all despite being one of the most frequently used words in our spreadsheet. This is due to the shape and dimensions of the cloud. The program decided that there wasn’t room to include “Township” at the size needed to represent its frequency, so it dropped the word altogether. This happened with a few other words as well, and I had to fiddle with the dimensions quite a bit to get even as many words to appear as they do above. Second, “Township” has the same problem as “North Dakota.” The word “Township” should be connected to the actual town names. As far as analysis goes, it’s not very helpful to just see the word “Township.” It would be more beneficial to see the frequency of specific locations such as “Hawley Township” to see how often we found couples from these places.

Now obviously I only spent about 10 minutes on this word cloud, so it’s nowhere near the level of a professional cloud to be used in marketing or whatnot. If I were to use this to present our project, I would go back in and see what I could do to tie phrases together and exclude meaningless words. I still think it’s a good example to show that text mining and data visualization are useful and worth exploring.
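For anyone curious what “tying phrases together” could look like, here is a minimal Python sketch. It is not how the word cloud above was made. It assumes each place is stored as a comma-separated string, and the sample rows, the stop-word list, and the function name `count_terms` are all made up for illustration. The idea is to count whole phrases such as “Hawley Township” instead of splitting on every space.

```python
from collections import Counter

# Hypothetical sample rows; the real spreadsheet's columns and values may differ.
rows = [
    "Hawley Township, Clay County, Minnesota",
    "Fargo, Cass County, North Dakota",
    "Moorhead, Clay County, Minnesota",
    "Hawley Township, Clay County, Minnesota",
]

# Assumed list of "meaningless" entries to leave out of the counts.
STOP_WORDS = {"of", "the", "and", "unknown"}


def count_terms(rows, stop_words=STOP_WORDS):
    """Count each comma-separated phrase as one term, skipping stop words."""
    counts = Counter()
    for row in rows:
        for phrase in row.split(","):
            # Collapse stray whitespace so "Clay  County" and "Clay County" match.
            phrase = " ".join(phrase.split())
            if phrase and phrase.lower() not in stop_words:
                counts[phrase] += 1
    return counts


counts = count_terms(rows)

# Print the terms from most to least frequent.
for term, n in counts.most_common():
    print(f"{term}: {n}")
```

Because the split happens on commas rather than spaces, “North Dakota” and “Hawley Township” stay together as single terms. The resulting counts could then be handed to whatever word cloud tool you prefer, provided it accepts a table of terms and frequencies.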

For a more professional example, take a look at On Broadway, a visual representation of data taken from the entirety of Broadway in New York City in 2014. This includes street views, Twitter and Instagram data, taxi drop-offs, and more. It presents all of this data in an interactive graph with pictures, color frequency charts, and daily averages. As opposed to the word cloud, this example is not text based. Instead, it relies on numerical data and images and makes them more accessible to the public. Instead of looking at a scientific graph or a statistical analysis, users can scroll through the On Broadway application. Or, if you are interested in an analysis of this data, the website has a section dedicated to the patterns its creators noticed. For example, based on the data, they have separated Broadway into two parts: Broadway 1 to the south and Broadway 2 to the north, with the separation based on activity. Broadway 1 has high tourist activity, leading to high social media activity and more taxi drop-offs. If you isolate Times Square, which is in Broadway 1, this is where you see spikes of activity across the board. Logically, this information makes sense, and if given the data I could have figured it out. However, it is much more interesting to be able to see it visually and interact with the data in ways I wouldn’t have been able to on paper.

I think this interactive component is the most important piece of data visualization when it comes to museums and historical societies. If you ever hear me talk about my ideas, especially in the museum industry, I frequently rant about how we are living in a society and economy based on experiences. At a museum we aren’t selling information or education; that can all be found online. No, we’re selling the experience of visiting a museum. Visitors want something they can interact with and use that is unique in comparison to the information they can get online or in books and journals. By using data mining and visualization, we can create these interactive features to enhance the experience.

Our world is already hugely dependent on technology and the internet. Now, with all of the quarantine and social distancing due to the coronavirus (COVID-19), people are turning to the internet even more for their entertainment and education. Museums worldwide have been doing a pretty good job of adapting to this by offering tours, seminars, exhibits, and performances to the public online. This is all thanks to these digital technologies and the field of digital history. We don’t know yet what kind of lasting impact the coronavirus will have on our world, but I’m going to bet this whole idea of accessing art and experiences through the internet isn’t going to fade. It’s up to us to take advantage of the tools in front of us and create experiences people want to be a part of.
