Categories
Assignment 6

Diet and Exercise: It’s Not That Simple

When I was in the sixth grade, my science teacher had our entire class watch the documentary Super Size Me, a film that follows director Morgan Spurlock’s month-long social experiment in which he mimicked the lifestyle of a habitual fast-food eater. He did this by (1) eating solely McDonald’s for an entire month and (2) refraining from all additional exercise (to mirror the average American’s roughly 5,000 steps per day). Spurlock also tracked how this lifestyle affected his health by routinely visiting doctors for weigh-ins and bloodwork. As one could imagine, his health quickly deteriorated, and the measures were worse than what doctors had predicted. Not only did he gain 24.5 lbs in one month, but he also began experiencing heart palpitations by Day 21. As a sixth grader, I was absolutely appalled, and I vowed never to eat at McDonald’s again (with the exception of the occasional French fries or McFlurry).

Spurlock’s film aimed to shed some light on the fast food industry and its influence over the American public. In watching his film, one might make a quick assumption (as I had) that the consumption of fast food is linked to obesity. As I have gotten older, though, and gained more exposure through courses like sociology about other factors that impact one’s adult life, I have grown a bit more skeptical about the true predictors of obesity. I have learned a lot in college, including within this past semester, not only in my Data Visualization for the Digital Humanities course but also in my other courses as well as in my own life.

In my own life, I am grateful to have things like the ability to attend Bucknell, to fly home for long breaks, and to go out to dinner with friends. At the same time, I’m very aware that not everybody has these same abilities. In fact, not everybody has the opportunity to go to college, and not because they don’t have the grades or because they aren’t smart enough, but because perhaps their mother needs them at home to help with their younger siblings, or to be an additional source of income that will help to pay rent. A lot of what I have discovered during my time at Bucknell is that people – for the most part – are a product of their environment, and thus have different values and skills based off of both how and where they were raised. This explains why when you look at people on Bucknell’s campus, you probably see people who look and behave very differently than those you might observe in the Walmart off of Route 15; it’s no secret that Bucknell’s students are very fitness-oriented.

For my final project, I wanted to take a closer look at this difference in fitness levels. To do this, I worked with the USDA’s Food Environment Atlas dataset, which is a compilation of statistics on food environment indicators with the purpose of stimulating research on the determinants of food choices and diet quality. The USDA collected this data by compiling multiple data sources, such as the 2009 Youth Risk Behavior Surveillance System and the U.S. Census Bureau. The Atlas has three categories of food environment factors: (1) food choices, (2) health and well-being, and (3) community characteristics. Within these categories, there are distinct elements that are considered for each county within the US: access, health, insecurity, restaurants, socioeconomic factors, and access to stores. I chose to explore at least one variable from each of these elements and observe how each of the variables I chose is related to a county’s average obesity rate.

The USDA’s Food Environment Atlas, with all data

The variables I chose to analyze in relation to a county’s average obesity rate were: the percentage of the county’s population with low access to stores, the county’s median household income, the number of recreation and fitness facilities per 1,000 people, the county’s poverty rate, and the percentage of households with food insecurity. Initially, I had included an additional variable comparing the obesity rate with the number of fast food restaurants per 1,000 people, but statistical modeling of this relationship (which I had performed in my statistics class) showed that it was not significant. In fact, modeling this relationship in my statistics class was what inspired me to select this dataset for my final project in Data Visualization, because it made me wonder what the biggest socio-economic predictors of obesity truly are.
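One way to put a number on how strongly a county-level variable tracks obesity rate is a Pearson correlation coefficient. The sketch below is only an illustration: it is not the modeling done in the statistics class, and the five "counties" and their values are made up, not taken from the Atlas. The names `pearson_r`, `poverty_rate`, `obesity_rate`, and `fast_food_per_1000` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


# Made-up values for five hypothetical counties (NOT Atlas data).
poverty_rate = [8.0, 12.0, 15.0, 20.0, 25.0]
obesity_rate = [22.0, 25.0, 27.0, 30.0, 33.0]
fast_food_per_1000 = [0.7, 0.5, 0.9, 0.6, 0.8]

r_poverty = pearson_r(poverty_rate, obesity_rate)
r_fast_food = pearson_r(fast_food_per_1000, obesity_rate)
print(f"poverty vs obesity:   r = {r_poverty:.2f}")
print(f"fast food vs obesity: r = {r_fast_food:.2f}")
```

In this toy data the poverty series rises almost in lockstep with obesity while the fast food series does not, which is the kind of contrast a scatter plot makes visible. A correlation alone says nothing about cause.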

Attempt to combine data into one Tableau workbook was unsuccessful; error message kept appearing after the data was combined

In order to best capture the relationship each of my selected variables has to a county’s obesity rate, I chose Tableau as my visualization tool so that I could make use of its mapping and scatter plot functions. I found these to be quite useful, and in plotting the relationships I discovered that the only variables that really displayed a significant relationship with county obesity rate were median household income, poverty rate, and average household food insecurity. I was additionally interested in how education played a role in obesity rate but didn’t have the data necessary to explore the relationship, so I met with Carrie to get some assistance. She helped me obtain the data; I did the data plumbing on my own and had a lot of difficulty meshing the two data sources in Tableau. Eventually, following another of Carrie’s suggestions, which was to compare and contrast a few locations, I mapped education on its own in just two places: San Diego (my hometown) and Union County (Lewisburg’s county). Taking a closer look at two locations provides the opportunity for “constrained interaction at various checkpoints within a narrative, allowing the user to explore the data without veering too far from the intended narrative” (Segel and Heer, 1147).
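Meshing two county-level sources usually comes down to joining them on a shared key. The sketch below assumes both tables are keyed by a five-digit county FIPS code and shows one common stumbling block, codes that lose their leading zero when read as numbers. This is an illustration only, not a diagnosis of the Tableau error described above. The function names are hypothetical, the percentages are the two counties' figures reported in this post, the FIPS keys are assumed to be those of San Diego County and Union County, Pennsylvania, and `99999` is a placeholder row with no match.

```python
def normalize_fips(code):
    """Return a county FIPS code as a zero-padded 5-character string."""
    return str(code).strip().zfill(5)


def join_by_county(left, right):
    """Inner-join two {fips: row_dict} tables and report unmatched keys."""
    left_n = {normalize_fips(k): v for k, v in left.items()}
    right_n = {normalize_fips(k): v for k, v in right.items()}
    # Keep only counties present in both tables, merging their columns.
    merged = {
        k: {**left_n[k], **right_n[k]}
        for k in left_n.keys() & right_n.keys()
    }
    # Keys found in exactly one table are worth inspecting by hand.
    unmatched = sorted(left_n.keys() ^ right_n.keys())
    return merged, unmatched


# Obesity table keyed by integers (leading zero lost), education by strings.
obesity = {6073: {"obesity_pct": 19.1}, 42119: {"obesity_pct": 28.2}}
education = {
    "06073": {"bachelors_pct": 35.7},
    "42119": {"bachelors_pct": 20.5},
    "99999": {"bachelors_pct": 0.0},  # placeholder row, no matching county
}

merged, unmatched = join_by_county(obesity, education)
for fips in sorted(merged):
    print(fips, merged[fips])
print("unmatched keys:", unmatched)
```

Reporting the unmatched keys, rather than silently dropping them, makes it easier to see whether a failed combination is a key-format problem or genuinely missing data.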

Poverty rate and obesity rate (before moving obesity rate to y-axis)

Because I am only considering the education levels of people in San Diego and Union County, I cannot make a definitive statement about whether or not education level plays a role in obesity rate. San Diego’s average obesity rate is 19.1% while Union County’s is 28.2%. I was surprised to find that although San Diego’s percentage of people with a bachelor’s degree or higher was larger than Union County’s, at 35.7% versus 20.5%, San Diego also had 7.2% of people aged 25 or above with a 9th grade education or lower compared to Union County’s 5.6% (U.S. Census Bureau). This might indicate that while lower obesity rates are correlated with higher levels of education (a relationship the CDC has reported publicly), higher obesity rates might not be a result of lower levels of education (CDC Morbidity and Mortality Weekly Report).

Although this project could be frustrating at times due to the large amounts of data I was working with, the undependable program I used (lots of crashes!), and my inexperience with data plumbing, I found that I was very comfortable with building a story in data visualization when it was about a topic I am passionate about. I enjoyed using the Food Environment Atlas in both my statistics course and this one, because it provided me with a lot of different angles to gather insights from. I believe that visualizations like the one I have created in this project are highly applicable to the real world, particularly for government legislation in determining how best to approach the issue of rising obesity rates. From my data, I would argue that the largest contributor to rising obesity rates isn’t a lack of fitness facilities or an abundance of fast food restaurants; rather, it’s a lack of social equality and a lack of access for those who need it most.

Works Cited

Economic Research Service (ERS), U.S. Department of Agriculture (USDA). Food Environment Atlas. https://www.ers.usda.gov/data-products/food-environment-atlas/

Ogden CL, Fakhouri TH, Carroll MD, et al. Prevalence of Obesity Among Adults, by Household Income and Education — United States, 2011–2014. MMWR Morb Mortal Wkly Rep 2017;66:1369–1373. DOI: http://dx.doi.org/10.15585/mmwr.mm6650a1

Segel, E, and J Heer. “Narrative Visualization: Telling Stories with Data.” IEEE Transactions on Visualization and Computer Graphics, vol. 16, no. 6, 2010, pp. 1139–1148., doi:10.1109/tvcg.2010.179.

U.S. Census Bureau, 2011-2015 American Community Survey 5-Year Estimates


Assignment #5

Though Gephi was initially frustrating to use and maneuver, as I became more familiar with it and its functions, I actually grew to like it more as well. I was chiefly concerned about the lack of an “undo” button, which I have come to realize I ordinarily rely on heavily. The moment I reframed this in my mind from a bad thing to a good thing was the moment I began having fun with Gephi, in the sense that every click of every button was done with intent rather than simple exploration of the program.

With this, I knew that whatever data set I chose, it had to be something I was comfortable with – I wanted to make sure that I knew exactly what it was that I was exploring. As we looked in class at examples of social networks, dialogue between characters in Les Mis, and other datasets I browsed on my own, I discovered that I was actually really fascinated with network visualizations. When initially clicking through premade datasets, I tested the effects of pressing buttons like “filter” and seeing how it would impact the visualization. As I became more familiar with Gephi, though, I knew what questions I could ask that Gephi would be able to answer. I would be able to look at the visualization of the Marvel Cinematic Universe, for instance, and know how many times Iron Man interacted with Thor.

For my visualization, I chose to do an “exploratory data analysis . . . based around the idea that the network is important, but in as-yet unknown ways” by inputting my own data from my water polo team roster (Graham). I wanted to look at connections between my teammates based off of their home state, which club team they played for in high school, and whether or not they are in the College of Management at Bucknell. These connections are complex in that their variables are “highly interconnected and interdependent,” which are integral components of network visualizations (Lima).
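The idea of connecting teammates through shared attributes can be sketched as an edge list: two players are linked if they match on at least one attribute, and the link's weight is the number of attributes they share. This is a minimal sketch, not the actual data entered into Gephi. The roster below is invented, the field names (`home_state`, `club`, `management`) are hypothetical, and treating "both in the College of Management" as a link while "neither in it" is not is an assumption.

```python
from itertools import combinations

# Invented four-person roster (NOT the real team data).
roster = [
    {"name": "Player A", "home_state": "CA", "club": "Club X", "management": True},
    {"name": "Player B", "home_state": "CA", "club": "Club X", "management": False},
    {"name": "Player C", "home_state": "CA", "club": "Club Y", "management": True},
    {"name": "Player D", "home_state": "PA", "club": "Club Z", "management": False},
]


def shared_attribute_edges(players, attributes):
    """Return (source, target, weight, shared_attrs) for each linked pair."""
    edges = []
    for a, b in combinations(players, 2):
        # Count an attribute only if both values match and are truthy, so
        # two players who are both NOT in Management are not linked by it.
        shared = [attr for attr in attributes if a[attr] == b[attr] and a[attr]]
        if shared:
            edges.append((a["name"], b["name"], len(shared), shared))
    return edges


edges = shared_attribute_edges(roster, ["home_state", "club", "management"])
connected = {name for src, dst, _, _ in edges for name in (src, dst)}
isolated = [p["name"] for p in roster if p["name"] not in connected]

for src, dst, weight, shared in edges:
    print(f"{src} - {dst} (weight {weight}): {', '.join(shared)}")
print("isolated:", isolated)
```

Changing the list of attributes passed in changes which players end up connected or isolated, which is a small-scale version of how the choice of parameters shapes the network.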

As demonstrated by the network, many of my teammates are from California; this shared quality is the most prevalent commonality within the parameters of my network. This makes sense because water polo is a sport that, in the United States, is most popular on the west coast (likely due to the warmer weather and outdoor pools). If we had a team comprised of international students (which isn’t uncommon for water polo teams), my prediction is that the visualization representing club teams would be much more connected. In countries like Serbia, where water polo is a very popular sport, there are only a few club teams, and even fewer that play at a high enough level to pique the interest of US college coaches.

I also found it interesting that only three of us are in the College of Management. Most of my other teammates are in the College of Arts and Sciences, though their majors range anywhere from biology to international relations.

Unsurprisingly, two individuals in particular had a lot in common: Ally and Paige Furano, a junior and a sophomore, respectively. They are sisters and have played water polo together since the very beginning. Larissa Hodzic played for the same club team as them in high school, so these connections existed before they even entered Bucknell as freshmen. It would be interesting to add the men’s roster to my visualizations to see how our teams have crossed paths throughout the years.

I chose a random or “stochastic” layout, which is possible and legible using my data but “becomes more difficult to read as a network grows” (Graham). I used the stochastic layout because I thought it was unique; though my team functions as a unit, the network allows me to see how – at least at a very basic level – many of us are also different, some of us being so different that we aren’t even connected to others within the visualization! People who are otherwise key players on the team may not be central to the visualization; I think that this is interesting because it acknowledges the power that the creator of the visualization has and the bias that he or she may inadvertently have over the network’s structure. This network would look very different if I had chosen different parameters, but I didn’t – and that is the power of the creator. The importance of the network is that it demonstrates facets “such as decentralization, emergence, mutability, nonlinearity, and ultimately, diversity,” and yet it is critical that viewers are aware of the potential for information bias, even within this context. Overall, Gephi has been a very useful tool for me in illustrating the importance of looking at visualizations critically, and to take everything with a grain of salt. I would be interested to know how connected my team would be if I took more random variables into consideration.


Assignment #3

In working with the Cushman Collection data, I was curious about the relationship between an individual’s gender and their position in the workforce. I theorized that – especially during the earlier years of the dataset – men would hold positions that were considered more prestigious when compared to positions held by women.

Positions Held by Women and Men

In using Palladio, I first wanted to look at all of the data; in this context, all of the positions held by both men and women for the provided dates. This data is representational in the sense that it must be understood within the context of its time. The arrangement of this visualization is important because it shows not only which positions are exclusive to men or to women, but also which positions the two groups share. The positions held exclusively by women embrace hegemonic femininity, or the traditional view of a woman’s place in society: they are ‘spouse’ and ‘aristocracy,’ positions that derive worth from some sort of outside source, whether it be a woman’s husband or a woman’s wealthy family. In contrast, the positions held exclusively by men include author and journalist, positions that are often well-respected and viewed as legitimate. Positions shared between men and women range in how they are viewed by society, with the high-level shared positions being consultants, managers, and financiers. In first looking at the data, my hypothesis was that these higher-level positions would not be held by women until the 20th century.

After getting an overview of positions held by both men and women in this dataset, I wanted to explore how these positions would be impacted with the passing of time. I did this by filtering the data by gender, then by date of the individual’s death (thus determining the time period in which they lived). Many of women’s positions shown on Palladio were post-1920s, whereas men’s positions were fairly evenly distributed throughout the available dates. In an effort to explore these relationships further, I moved onto Timeline JS to determine what was happening in the United States around this date range. I chose only a few key events and turning points to include in the timeline, because there are truly so many important events that have occurred surrounding the topic of women’s rights.

This information is valuable because, as Drucker notes, “almost all information visualizations are reifications of mis-information.” Being able to analyze the data for women’s positions in the United States both literally, with regard to their careers, and socially, with regard to legislation, allows viewers to gain context and understanding for the different positions that women have held over the years.


Assignment #2

In order to construct my data, I met with Professor Faull to get a grasp on what it was that I wanted to look at. I hadn’t worked with either Voyant or Tableau before, so I felt overwhelmed by the possibilities of what I could do. Professor Faull helped me decide to look at “who are the enslaved people?” and to put this within the context of the 1860s texts.

Use of the word “slave” in Harriet A. Jacobs’ autobiography, Incidents in the Life of a Slave Girl
Most popular words throughout the corpus

Using Voyant, I analyzed both the corpus as a whole and the Jacobs reading. I chose the Jacobs reading specifically because it was the one document that was published in the 1860s, the time period of our data for the Tableau work. With Voyant, I studied the frequency of the word “slaves” as it appears in the Jacobs writing, which is meant to be a persuasive tool for the general public surrounding the idea that the ownership of slaves is wrong. I found it interesting that after the use of the word peaks around section 3 of the document, it has a sharp drop and doesn’t really rise back up.
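The rise and fall of a word across a document can be approximated by splitting the text into equal-sized segments and counting the word in each one. The sketch below is a rough stand-in for that idea, not Voyant's own implementation. The function name `segment_counts` and the sample sentence are hypothetical, and it counts exact tokens only, so “slave” and “slaves” are tallied separately.

```python
import re


def segment_counts(text, term, segments=10):
    """Count occurrences of `term` in each of `segments` equal word chunks."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division so every word lands in some chunk.
    size = max(1, -(-len(words) // segments))
    target = term.lower()
    return [
        words[i * size:(i + 1) * size].count(target)
        for i in range(segments)
    ]


# Made-up sample text, not taken from the Jacobs document.
sample = "the slave spoke and the slave ran then all was quiet and calm there"
counts = segment_counts(sample, "slave", segments=2)
print(counts)
```

A peak followed by a sharp drop shows up as a large count in one early segment and small counts afterward. Dividing each count by the segment's length would give relative rather than raw frequencies.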

Another idea I looked at with Voyant was the most popular words used throughout the corpus and how often each document used the most popular words. The main takeaway I gathered from this is how much more the Jacobs document uses the word “said” compared to the other documents in the corpus, indicating that this document is likely very different stylistically from the other documents; perhaps in its structure or point of view. Further, some consider higher use of the word “said” to indicate text that is less reliable than works that have less reported text.


Where did the enslaved people come from?
Which counties in the U.S. had the most enslaved people relative to their total population?

With Tableau, I was interested in mapping the geographical locations of slaves, both before they were enslaved and as they were enslaved people. It was challenging to find coherent information for the data surrounding the enslaved people’s country of origin, because there were many slight differences in some spellings of countries, e.g. “the Democratic Republic of Congo” vs “Congo.” Nonetheless, the data that was available for the number of enslaved people living in the United States was slightly more readily accessible, and I mapped it on Tableau by percentage of the county population that was enslaved.
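Two small pieces of cleanup sit behind maps like these: collapsing spelling variants of a place name into one label, and turning raw counts into a percentage of the county population. The sketch below shows one hedged way to do both. The alias table, function names, and numbers are hypothetical, not the dataset's actual values. Note that it merges only variants of the same name and leaves “Congo” untouched, since deciding whether two labels refer to the same place is a judgment call for the analyst.

```python
# Hypothetical alias table: lowercase variant -> canonical label.
ALIASES = {
    "the democratic republic of congo": "Democratic Republic of the Congo",
    "democratic republic of the congo": "Democratic Republic of the Congo",
    "dr congo": "Democratic Republic of the Congo",
}


def canonical_place(name):
    """Collapse whitespace and map known spelling variants to one label."""
    cleaned = " ".join(name.split())
    return ALIASES.get(cleaned.lower(), cleaned)


def percent_enslaved(enslaved, total):
    """Enslaved people as a percentage of a county's total population."""
    if total <= 0:
        raise ValueError("total population must be positive")
    return 100.0 * enslaved / total


# Made-up labels and counts for illustration only.
labels = ["  the Democratic  Republic of Congo ", "DR Congo", "Congo"]
print([canonical_place(label) for label in labels])
print(percent_enslaved(250, 1000))
```

Mapping by percentage rather than raw count keeps a large county from dominating the map simply because more people lived there.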

In doing my own research, I wanted to know what other sources had to say about where enslaved people typically came from. I went to history.com, which noted that “of those Africans who arrived in the United States, nearly half came from two regions: Senegambia, the area comprising the Senegal and Gambia Rivers and the land between them, or today’s Senegal, Gambia, Guinea-Bissau and Mali; and west-central Africa, including what is now Angola, Congo, the Democratic Republic of Congo and Gabon.” Comparing this information to the maps I created makes sense for both the maps and the website, as the concentrated areas on my map primarily represent Senegambia and west-central Africa.

One thing I found interesting about the data set I was working with is how it left out some interesting and important details about the individuals who were taken from Africa to the Americas: some did not make it (source: gilderlehrman). In fact, about twelve percent of those who embarked did not survive the voyage to the Americas, but no one would know that by simply looking at the visualizations I created. A more effective visualization would be able to take this into consideration, as well as the number of individuals from/to each location.

Using Tableau and Voyant together is beneficial to an individual’s broader understanding of a topic. Tableau allows people to visually map data and quite literally see where people came from and where they were sent as enslaved people, which is quite powerful. Voyant allows people to gauge patterns and apply those to specific dates or time periods. Using the two pieces of software together makes for a powerful tool. Tanya Clement observed that the use of a visualization platform “combines the video streams from these cameras, and the resulting images duplicate a multidimensional viewpoint” and went on to discuss the encompassing vantage point, which is relevant to the use of Tableau and Voyant because the programs provide what Lima introduced in Chapter 2 as organicism. Organicism states that reality is best understood as an organic whole; in this context, Tableau and Voyant not only provide geographical locations, they also provide patterns of speech and text.

These tools allow us to perform what Clement describes as “differential reading,” which means to be both close and distant; subjective and objective. We do this through looking at human elements and analyzing things like vocabulary to deconstruct and then reconstruct data and text. As Professor Faull says in her essay, “The earliest ventures into thinking about visualization and literature are not at all digital, but rather focused on both the graphical rendering of plot and character and also the extraction of metadata from collections of documents.”


Assignment #1

I chose these two visualizations because the information they contain is interesting to me – the first visualization is about trends in using the phrase “is the new” from various sources in 2005, and the second visualization is a chart that chronicles the growth of pop/rock music and its top selling artists from 1955 to 1978.

Both of these visualizations are static because they are predetermined for a specific year or a set period of time and will not change with new data. At the same time, although they aren’t dynamic in the sense that clicking an icon will bring the viewer to a new page, they are still interactive in the sense that they contain interesting content to hook the viewer in and make them want to look at the data. The first visualization is similar to what Du Bois describes as “sociological content,” the major difference being the level of importance of the topic (clearly, sentence patterns in 2005 are not nearly as important as the experiences of the Black American in the 1900s).

In the visualization about music, the creator uses a technique that reminds me of Meirelles’ description of node-link diagrams, which “use symbolic elements to stand for nodes, and lines to represent the connections between them” (55). In the case of this visualization, an arrow extending from a performer’s name shows the length of time that he/she remained a major hit maker. One issue surrounding this visualization is that it inherently comes with bias, as discussed in the reading by D’Ignazio & Klein. They write, “If data are not available on a topic, no informed policy will be formulated; if a topic is not evident in standardized databases, then, in a self-fulfilling cycle, it is assumed to be unimportant.” Are there other visualizations surrounding the progression of music? Are the “major players” in this visualization key figures in other visualizations that might be similar to it? This goes back to Lima’s point in Visual Complexity, in which trees are associated with the “notion of centralism, or centralization, which expresses either an unequivocal concentration of power and authority in a central person or group of people,” in this case, the artists (43). Despite this, I believe that both of these visualizations capture interesting data and present their findings in a way that is aesthetically appealing. Though these visualizations may be static, they remain engaging by having creative topics.

With the Digital Humanities Sample Book, I chose to analyze “American Panorama” and “Mapping Metaphors.” “American Panorama” was created by the Digital Scholarship Lab at the University of Richmond and was funded by the Andrew W. Mellon Foundation. They pursued this project digitally because it is an interactive map that is grouped by collection on its homepage. Its sources are published on a separate webpage, where the Lab also cites its methods used to collect the data. The writing for the project is clear and interesting, clearly written as though it is intended for mass audiences rather than small groups of highly educated individuals. This project differentiates itself from a traditional or analogue research project by allowing the viewer to search by city or state using its interactive capabilities, rather than containing a ton of information that might be overwhelming to a viewer. The authors likely decided to make this interactive because it’s an effective way to give people a lot of information in a way that is engaging, and people can actually learn about the locations they want to learn about. It has additional familiarity in the sense that it uses Google Maps as its mapping host. With regards to strengths, this project excels with its ability to engage the viewer using interactive tools and interesting colors as well as the ability to search by city. One drawback is that it didn’t have every city, which poses the question: why did the creators choose the cities they chose? Unfortunately, the data they used is for a specific date range, so there isn’t much that the creators can do about it. Overall, though, the project is interesting and has come a long way since its website was created four years ago; it began with four projects, and it currently has eight.

“Mapping Metaphors” was slightly less impressive than “American Panorama.” The authors came from a team at the University of Glasgow who wanted to discover metaphorical connections within the English language, and they were funded by the Arts and Humanities Research Council. This visualization’s description is slightly more difficult to find than that of “American Panorama,” as it requires the viewer to navigate to another tab to find the purpose and central thesis. Generally speaking, though, the website’s navigation is creative in the sense that it has a giant spinning circle in the middle of the site to use for navigation. This navigation tool is both a strength and a weakness, as it looks interesting but is challenging to actually use. Meirelles broke things down into how people see things; dominant colors included red-green and yellow-blue, yet this project uses red, green, and blue, which isn’t very aesthetically pleasing. The project’s source materials come from the thesaurus, but it’s unclear how the team chose their categories, which may lead to some false assumptions within the data. Although it has the potential to be a really interesting project, its execution falls short in a few ways, namely confusion over how to use the project’s main tool as well as the visual design and structure of the website.