Posted on 2014/10/14 by

Big Data and a Metallica Database

For this probe I chose to shift disciplines from my commitment to English literary studies to one of my primary musical passions, a subject with which I have never engaged in any formal examination. The extent to which I chose to explore, categorize, and quantify my interest in Metallica is without a doubt a reflection of my juvenile state of mind and, more pointedly, perhaps reflects the degree to which I lack any serious creative potential as a graduate student. Despite my questionable frame of mind, I am nevertheless approaching this task with a lot of passion and dedication because Metallica is quite frankly one of the only bands to which I have remained committed from the first time I heard their music. This motivation hardly justifies making Metallica the object of my examination; however, my connection with this music serves as a useful analogy for the way that I understand big data and shifting research standards in the humanities.

I first heard a Metallica song when I was fourteen years old, and twelve years later I still occasionally listen to their tracks. In the early days of my love of Metallica I purchased all of their albums in CD format, and I likewise purchased as many of their concert videos as I could afford. For additional live performances in the form of either audio or video files, I would use sharing applications such as BearShare and later Kazaa to get as much of their music and concert footage as I possibly could. I recall waiting days for rare videos that lacked substantial support to download, and I repeated this waiting process numerous times for the sake of experiencing more and more Metallica. Beyond their official releases, my perception of the band’s catalogue of music and videos gradually became filtered through what I could access on these sharing platforms. The corpus with which I would work remained contingent upon what I could find on these applications. In total, I downloaded hundreds of live tracks and videos. I cannot provide an exact figure because I no longer have access to the computer on which I downloaded all of those files, and I did not back up those files on an external hard drive. What a fool I was. Nevertheless, there was a time when I believed that I had accessed every live track and performance that a Metallica fan could possibly obtain. Again, what a fool I was…

If you enter Metallica in YouTube’s search field, you will see that there are currently over nine million related videos on this platform alone. There is not enough time in a human life for a person to watch and analyze all of these videos…

In light of this particularly daunting bit of information, I chose to draw from the millions of video files on YouTube all of the ones that I recall downloading and watching at one time or another. I categorized them in a personal database so that I would have a record of the clips that I have viewed in the past twelve years. To create my database I used Zotero, which (from what I understand) is a research tool that synchronizes directly with the Firefox and Chrome browsers. According to this software, I have viewed three hundred and seventeen different Metallica videos. However, these are only the ones that I can recall watching, and I am sure that if I were to explore further into the depths of YouTube I could easily add another hundred to the collection.

Screen Shot 2014-10-14 at 2.01.20 PM

Although this database helped create a comprehensible data set that I could then categorize and interpret, I was approaching this task according to my conditioning as a student of literature. As Jockers states: “The literary scholar of the twenty-first century can no longer be content with anecdotal evidence, with random ‘things’ gathered from a few, even ‘representative,’ texts” (8). When I first examined the corpus, my approach to this database did not move beyond what a ‘textual’ reading would enable, and I remained content with the traditional approach against which Jockers warns. Initially, I would have argued, based on the performance footage that I acquired, that Metallica’s Justice album and tour indicate their alcohol- and cocaine-fueled critique of contemporary economic practices in the U.S., the perpetual destruction of social and personal relations due to warfare, and the alienated self in relation to a deteriorating national environment, while the album likewise indicates the band’s internal power dynamics through the record’s mastering process and their adjustments to their new bass player. This type of assessment represents what we would typically do as literature students: collect a set of information, interpret it in relation to its content and related texts, and draw a set of conclusions based on our examinations. Yet, as Jockers claims, “the literary researcher must embrace new, and largely computational, ways of gathering evidence. […] Today’s student of literature must be adept at reading and gathering evidence from individual texts and equally adept at accessing and mining digital-text repositories” (9). Consequently, although I engaged with a digital medium in my analysis, I did not move beyond gathering evidence and making claims based on specific samples of a larger whole.
In fact, as Jockers goes on to suggest, “The very object of analysis shifts from looking at the individual occurrences of a feature in context to looking at the trends and patterns of that feature aggregated over an entire corpus” (25). The potential for new scholarship through this reoriented approach led me to reconsider my initial angle on my database, and I subsequently attempted to use one of the tools specified on our syllabus for this week as a new approach to my corpus.

I consulted Graham, Milligan, and Weingart’s online text The Historian’s Macroscope: Big Digital History, in which they identify (among many other things) data visualization in the form of word clouds as an effective first step when working with a large dataset and mining for information (to use Jockers’s analogy). However, as Underwood notes in “Where to Start with Text Mining,” “There are two kinds of obstacles [that we confront when text mining]: getting the data you need, and getting the digital skills you need.” I had the data, but I was skeptical of my digital skills when endeavoring to use a word cloud. Fortunately, the use of word clouds is rather basic and user friendly, and I attempted to produce a word cloud based on the lyric content of a collection of Metallica songs from my database. Although I was eager to produce this visualization, my Firefox needed an updated plug-in in order to produce the word cloud. When I attempted to install the new plug-in, my Firefox crashed, and I can no longer open it because the “identity of the developer cannot be confirmed” when I attempt to launch the browser. I am sure that I would have been able to produce a rather interesting word cloud had I not encountered this technical issue (and had I more technical knowledge to overcome it), but regardless of this minor conundrum, the point of exploring a word cloud was to pursue a line of research that engages with big data and pushes research practices into new avenues.
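
The counting that sits underneath a word cloud does not itself depend on a browser plug-in. What follows is only a minimal sketch in Python of that counting step, not the tool named on the syllabus and not a finished visualization. The function name, the short stop-word list, and the sample strings are all hypothetical; the sample strings are placeholders standing in for lyric text, not actual Metallica lyrics.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this
# sketch); a real project would want a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "i", "my"}


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and pull out runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


# Placeholder strings standing in for song lyrics (not real lyrics).
sample_lyrics = [
    "darkness in the night and darkness in my mind",
    "the night is long",
]

freqs = word_frequencies(sample_lyrics)

# Print the three most frequent words with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

A word cloud simply draws each word at a size proportional to counts like these, so the table of frequencies is already the aggregated view of the corpus that the visualization would display.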

Again, as Jockers states, “Broad attempts to generalize about a period or about a genre by reading and synthesizing a series of texts are just another sort of microanalysis. This is simply close reading, selective sampling, of multiple ‘cases’; individual texts are digested, and then generalizations are drawn. It remains a largely qualitative approach” (25). The intersection between qualitative approaches, which are much more familiar to my current research practices, and the alternative research avenues that Underwood and Graham, Milligan, and Weingart discuss perhaps lends some credibility to what I have attempted to pursue, however mundane my work with this database may be. A typical qualitative approach has its limitations according to the model that I followed, although the potential for nuance in a method that synthesizes what Jockers identifies as macro and micro scales through his economics analogy introduces a rich environment for diverse scholarship.

The purpose of this exercise that I outlined for myself was simply to disrupt something familiar that I assumed to be understandable. This database endeavor was a stretch from my initial downloading practices as a fourteen-year-old Metallica fan, and I find it rather intriguing to see how my experience of this music shifts according to a new digital framework that enables different (perhaps distant) readings of this band’s material. Granted, I cannot say that I have an entirely new outlook on their music, or that these exercises have opened up a new way to engage with their material. In fact, I am wondering whether this exercise has contributed anything valid at all. To be honest, I have thought about this band enough and I am eager to move on to something else, but this exercise has, first and foremost, reinforced the dynamics between a typical qualitative approach to a given object and the potential to expand a reading along avenues that redefine potential scholarship. With ever-growing datasets that seem far too overwhelming to approach in a scholarly context, at the very least there are tools to help quantify and engage with materials that would ordinarily seem far too broad and obscure for my skill set. So there are over nine million Metallica videos on YouTube… I think I will leave it at that.

Starting with a familiar dataset helped me interpret how big data operates according to the parameters that I set for my particular project. I can honestly say that I still do not know where this will lead me in my own research as a literature student, but the shift towards these new standards undoubtedly now feels palpable and has reoriented the familiar. Will big data and text mining approaches significantly impact my current practices in the near future? Of this I am uncertain, but I am open to the possibility…

And if anyone can explain to me how this Probe is not a Bootcamp, then I would be very grateful!


Works Cited

Jockers, Matthew L. “Revolution,” “Tradition,” “Macroanalysis.” Macroanalysis: Digital Methods and Literary History. Urbana: University of Illinois Press, 2013. 1-32.

Underwood, Ted. “Where to Start with Text Mining.” The Stone and the Shell. August 14, 2012.

Graham, S., I. Milligan, and S. Weingart. “Topic Modeling By Hand.” The Historian’s Macroscope: Big Digital History.
