Visualizing character relations
Breaking Bad is a critically acclaimed American television series created by Vince Gilligan. The show originally aired on AMC from January 20, 2008 to September 29, 2013 and consists of five seasons with a total of 62 episodes. The plot revolves around Walter White, a seemingly unremarkable high school chemistry teacher whose life takes a dramatic turn when he is diagnosed with terminal lung cancer. Facing his mortality, Walter dives into the world of methamphetamine production, driven by a desperate need to secure his family’s financial future. As Walter’s transformation from a mild-mannered teacher to a ruthless criminal mastermind unfolds, Breaking Bad explores themes of morality, desperation, greed, and the human psyche. Its exceptional storytelling, unforgettable characters, and profound thematic depth have solidified its status as a cultural phenomenon and a pillar of television history.
As a fan of Breaking Bad, I’ve always been intrigued by how character interactions drive the story forward. The connections between Walter, his family, his partner Jesse Pinkman, and the various allies and adversaries they encounter form an intricate web that evolves throughout the series.
In this blog post, I’ll take a different approach to analyzing these relationships. Using network analysis techniques, I aim to visualize and explore the connections between characters, potentially revealing patterns that are not immediately obvious when watching the series. Whether you’re a longtime fan of Breaking Bad or new to the show, this analysis aims to provide an interesting perspective on what makes the series so compelling.
To model the relationships between characters in the Breaking Bad series, we need a reliable source of dialogue between characters. Wikiquote, an open-source, community-led project, provides an extensive collection of quotes contributed by fans. While it may not be exhaustive, it offers a sufficient dataset for our exploratory purposes. Fortunately, there is a dedicated page for the Breaking Bad franchise.
There are a few ways to extract quote data from Wikiquote. One option is to use a web-scraping tool to iterate through each Wikiquote page related to the Breaking Bad franchise and scrape the text. However, a more efficient approach is to use the publicly available data dumps provided by the Wikimedia Foundation. These dumps are updated at least monthly, making them a reliable source.
The following commands were used to download the latest Wikiquote articles to my machine.
wget https://dumps.wikimedia.org/enwikiquote/latest/enwikiquote-latest-pages-articles.xml.bz2
bzip2 -d enwikiquote-latest-pages-articles.xml.bz2
wget is a command-line tool that downloads the latest compressed XML data dump directly into the current working directory. bzip2 -d decompresses the downloaded file, which is necessary before the raw dump can be processed into a semi-structured or tabular form. The result is a large XML file that contains all of the Wikiquote article contents as of the date of the dump (mine was last updated on 20-Jul-2024 18:02).
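To give a feel for what the decompressed file looks like, here is a minimal Python sketch that streams through a MediaWiki XML export and returns the wikitext of a single page. It is not the method used in the rest of this post, only an illustration. The function names, the tiny in-memory sample, and the page title "Breaking Bad" are assumptions made for the example; the real dump uses a versioned XML namespace, which is why the sketch strips the namespace from each tag instead of hard-coding it.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def find_page_text(source, wanted_title):
    """Stream a MediaWiki XML export and return the wikitext of one page.

    `source` is a filename or a binary file object.
    Returns None if no page has the requested title.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        text = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text
        if title == wanted_title:
            return text or ""
        elem.clear()  # release the contents of pages we do not need
    return None


# Tiny stand-in for the real dump so the sketch runs on its own.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/">
  <page>
    <title>Some other show</title>
    <revision><text>ignored</text></revision>
  </page>
  <page>
    <title>Breaking Bad</title>
    <revision><text>== Season 1 ==</text></revision>
  </page>
</mediawiki>"""

print(find_page_text(io.BytesIO(SAMPLE), "Breaking Bad"))
```

With the real dump, the first argument would be the path to the decompressed file, for example enwikiquote-latest-pages-articles.xml. Because iterparse reads the file incrementally, the whole dump never has to be parsed into memory at once.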
Several developers have created utilities to facilitate this process, and I will be using one such tool: wickedQuotes by heyseth. This utility simplifies the extraction and processing of quotes from the data dump, allowing us to focus on analyzing the interactions and relationships between the characters in Breaking Bad.
While some quotes are “one-liners” by a single character, I am interested in interactions between characters. My goal is a visualization that lets us compare which seasons, episodes, and characters had the most quoted conversations. Ideally, we’d also like to know whether a certain character had many conversations in one episode and fewer in others. Were there any outlier episodes with lots of conversations? Was there one season with many conversations? All of these domain tasks map neatly onto abstract problems of hierarchical data: seasons contain episodes, episodes contain conversations, and conversations involve characters. Hierarchical visualization techniques are therefore a reasonable solution to these questions.
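As a rough illustration of that hierarchy, the Python sketch below nests conversation counts as season, then episode, then character, and rolls them up to season totals. The records are invented placeholders, not counts taken from Wikiquote, and the function names are hypothetical; the point is only to show the shape of data a hierarchical visualization would consume.

```python
from collections import defaultdict

# Invented placeholder records: (season, episode, character).
# Each tuple stands for one character taking part in one quoted conversation.
conversations = [
    (1, 1, "Walter"),
    (1, 1, "Jesse"),
    (1, 1, "Walter"),
    (1, 2, "Jesse"),
    (2, 1, "Walter"),
]


def build_hierarchy(records):
    """Nest counts as season -> episode -> character -> participations."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for season, episode, character in records:
        tree[season][episode][character] += 1
    return tree


def season_totals(tree):
    """Roll the leaf counts up to the season level."""
    return {
        season: sum(sum(chars.values()) for chars in episodes.values())
        for season, episodes in tree.items()
    }


tree = build_hierarchy(conversations)
print(season_totals(tree))  # {1: 4, 2: 1}
```

The same roll-up can be done at the episode or character level, which is what makes questions such as "which episode is an outlier?" straightforward once the data is in this shape.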