
Pie Charts

Pie charts are useful for depicting parts of a whole in a dataset. However, many people treat pie charts as a one-size-fits-all solution for data visualization, which often makes a visual mess of the data or tells a misleading story about its message.

Recently, the Pew Research Center published a report about the value of online instruction in the wake of the ongoing COVID-19 pandemic. The majority of participants in one study covered in the report say that online instruction does not have the same value as in-person instruction.

Source: Pew Research Center

From a design perspective, the pie chart above uses a darker shade of teal to depict the majority opinion and a lighter shade to show the minority opinion. The greyed-out slice represents the 2% of survey participants who did not answer or had no opinion.

Pie Charts Related to My Interests

While looking for data better suited to a pie chart, I recalled a census site created by a small group of Final Fantasy XIV Online players, XIV Census. Though the data shown on the site may not be up to date (it only shows data collected through April 2020), I did find a pie chart of Grand Company statistics. Hovering the mouse over each slice only shows the number of characters enlisted in each Grand Company, along with the number of characters that have not progressed far enough into the game's story to select a company. However, the chart does not list percentages, so I took it upon myself to graph the data properly.

Source: XIV Census

The chart on the left depicts Grand Company affiliation across all characters, while the chart on the right shows where active characters are enlisted. Across all characters, the distribution among the companies seems fairly even. Among active characters, however, it is evident that the Maelstrom and the Order of the Twin Adder are the two most popular Grand Companies.
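The step described above is simple arithmetic. A pie chart encodes each slice's share of the total, which is its count divided by the sum of all counts, and the slice's angle is that share of 360 degrees. The minimal Python sketch below assumes the counts have already been copied off the site into a dictionary. The function name `slice_shares` and the labels and counts in the demo are hypothetical placeholders, not the XIV Census figures.

```python
def slice_shares(counts: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Convert raw per-category counts into (percentage, slice angle in degrees)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not all be zero")
    result = {}
    for label, count in counts.items():
        share = count / total  # fraction of the whole pie
        result[label] = (round(share * 100, 1), round(share * 360, 1))
    return result


if __name__ == "__main__":
    # Hypothetical placeholder counts, NOT real census data.
    example = {"Company A": 500, "Company B": 300, "Company C": 200}
    for label, (pct, angle) in slice_shares(example).items():
        print(f"{label}: {pct}% of the pie, {angle} degrees")
```

Each value is rounded to one decimal place, so the percentages may not add up to exactly 100. If that matters for your labels, round only when you display them.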


Lollipop Chart

What Are Lollipop Charts

Lollipop charts are an alternative to traditional bar graphs that present data in a more visually appealing way. They are best used when the differences between the values you are trying to present are quite large. With a standard bar graph, such data can make the chart look very cluttered and impede the viewer's ability to quickly discern what the chart is trying to say. The lollipop chart gets around this issue by forgoing wide bars. Instead, it uses thin stems that end in dots, similar to a lollipop, to mark the end point of each value. Ben Neville, a blogger at the data visualization site Tableau.com, has a post that looks much deeper into the topic than I do here.
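To make the stem-and-dot idea concrete, here is a minimal text-only sketch in Python. It is not how Tableau or Excel draws the chart. It assumes a dictionary of category labels and values, and the function name `lollipop_lines` is hypothetical. Each stem is scaled so the largest value spans a fixed width, and the number printed after each dot acts as a data label.

```python
def lollipop_lines(data: dict[str, float], width: int = 40) -> list[str]:
    """Render a horizontal lollipop chart as text: a thin stem ending in a dot."""
    if not data:
        return []
    peak = max(data.values())
    if peak <= 0:
        raise ValueError("values must include at least one positive number")
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Stem length is proportional to the value; the largest value spans `width`.
        # Negative values are clipped to a zero-length stem.
        stem = round(max(value, 0) / peak * width)
        # '-' is the thin stem, 'o' is the dot marking the end point,
        # and the number after the dot serves as a data label.
        lines.append(f"{label.ljust(label_width)} |{'-' * stem}o {value:g}")
    return lines
```

For example, `print("\n".join(lollipop_lines({"A": 10, "B": 5}, width=10)))` draws two stems, with the second half as long as the first. Because the chart ink is reduced to a thin line and a dot, many categories can sit close together without the heavy blocks of a bar graph.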

My Lollipop Chart

As you are probably aware by now, I am very interested in renewable energy, so I decided to look into the renewable energy capacity of the top 20 producing countries. This data can be found on the International Renewable Energy Agency's website. I created my chart by following the steps outlined in the "Effective Data Visualization" textbook.


Lollipop Chart

This week we all created lollipop charts. While looking online for explanations of how to make a lollipop chart, I came across a really easy one on excelk.com. It did not explain the final steps clearly, but I worked them out with the help of the textbook. Compared to the textbook, the website reached the end product in just a few steps.

A lollipop chart is just like a bar graph, but easier to follow when you need to display a large number of values and when those values are high. This type of chart is also more appealing to look at.

For my data set, I looked through the CDC website and came across Depression in the US Household Population. The report included graphs showing the percentage of persons aged 12 and over with depressive symptoms, as well as a PDF version that included the data set. I chose race and ethnicity for my x-axis rather than gender or age.

I will say this lollipop graph was pretty easy to make thanks to the website I used. The only problem I ran into was changing the colors of the individual categories: I wanted no symptoms, mild, moderate, and severe to each have their own color to better distinguish the severities. I also wanted to add data labels, but I could not figure out how.

This graph shows persons aged 12 and over by severity of depressive symptoms and by race and Hispanic origin in the US, 2009-2012.

Renewable Energy in the US

A Brief Introduction

Given my background in Sustainability, I chose to explore the trends of energy production and consumption in the United States. Energy in the United States is an interesting topic because, due to its sheer size and geographic diversity, the country has the potential to use a wide variety of renewable energy sources. Information on the production and consumption of energy in the US has been cataloged by the Energy Information Administration (EIA) ever since it was created in 1977. In addition to providing the raw data, the EIA also publishes a wide variety of reports and infographics that can serve as excellent resources when designing your own graphics. Despite the EIA's usefulness, I found that the University of Michigan has put together a fact sheet that serves as a much better introduction for those who are not already familiar with the energy field. It is this fact sheet that I will be highlighting and discussing in the following sections of this post.

Michigan Fact Sheet

Drawing from the EIA's statistics and other sources, the Michigan fact sheet includes a number of graphics that make it easy to compare energy production across all types of renewable energy sources. Below are examples of the two main types of comparisons the fact sheet displays: changes over time and differences between countries.

These graphs help translate the raw data presented by the EIA into something that is easier to understand at a glance. They are perfect for displaying simple, overarching trends for their respective fields, but their simplicity makes them ill-suited for handling data with many facets. This is where the following graphics excel.

The field of energy production can be quite complex when accounting for all the various ways of generating energy and the scale at which that energy is generated/utilized. These graphics help the viewer understand this fact by visually differentiating between the individual sub-categories of a much larger piece of data.


Men versus Women in College

Datasets for Potential Project Use

I decided to look at how current gender trends in college may result in a different version of America in the future. Ever since the early 1980s, women have been earning more Associate's and Bachelor's degrees than men. It took a little longer for women to surpass men in earning professional and Doctor's degrees, but they have earned more of these degrees than men ever since about the mid-2000s. The education gap between men and women is projected to keep growing in women's favor. A summary of the raw data is available through the Department of Education: https://nces.ed.gov/programs/digest/d19/tables/dt19_318.10.asp

Below is a fairly recent table showing percentages of US college degrees by gender, along with a projection of how they may look in the future. Tables make it easy to compare exact numbers, but they often lack visual appeal. There is not much to draw in the audience, and comparisons must be worked out mathematically.

Tables are good for obtaining averages of a data set because little work is needed to get the numbers.

We can also use a line graph to compare numbers. Line graphs are a very effective way to draw in the audience through their color scheme and through trends that are easy to interpret as rising or declining lines. The following graph is not very recent, but it gives us a better visual representation of the growing gender gap in American colleges.

Line graphs allow the audience to easily interpret data over a timeline. Color schemes, labeling, and trends increase visual appeal.

What these numbers mean is that more college-educated women than men are entering the workforce. You might expect these figures to mean that women are the majority of today's American breadwinners. However, although women have been earning more post-secondary degrees than men in recent times, men who hold the same positions as women still earn more. This is probably due to a long history of oppression passed down through the generations and to the complications of motherhood. One counterargument is that work ethic is difficult to measure except on an individual basis, so we may never know for sure what explains each person's specific situation. We can compare these income inequalities in a bar graph.

Bar graphs are phenomenal for side-by-side comparisons of numbers that include many subcategories. Color schemes, bar lengths, and data labels make the data easier to interpret.

There is a lot to dissect when it comes to gender inequalities because we cannot freely assume that women suffer in all areas of life more than men, but this is a decent general overview of how we can utilize datasets for project use on this subject.


Comparing Numbers

Data is everywhere. It's been said ad nauseam, but advances in technology and scientific methodology have made it possible to collect and display data more efficiently. The major problem, however, is making sense of a set of data. People often ask, "What do these numbers mean?" especially when comparing two or more numbers. Thankfully, data visualization makes it easier to comprehend data and its message. The remaining challenge is figuring out how to tell a story by comparing numbers through data visualization.

Before the emergence of the COVID-19 pandemic, I rarely paid attention to the political climate in the United States, opting instead to focus on my academics or escape to virtual worlds via computer gaming. Yet deep down, I've always wondered whether others like me pay attention to politics and other news topics, especially where a presidential election is concerned.

How interested have people been in previous elections?

Pulling data from Google Trends, I set out to see how interested Americans have been in presidential elections since 2000. However, the data only goes back as far as 2004, because Google began its search interest trend analysis around that time.

A chart depicting search interest for the U.S. presidential election since 2004. Source: Google Trends.

There are four significant peaks in search interest (shown in the chart above). Each one occurred in November of a general election year (2004, 2008, 2012, and 2016), when the polls opened to the public on Election Day. In 2004, George W. Bush won his second term as President, though that year's search interest pales in comparison to 2016's peak. In 2016, businessman Donald Trump ran for office and won the Electoral College vote to become the 45th president of the United States despite losing the popular vote to Hillary Clinton. Early on, in March of that year, many Americans thought someone like him wouldn't be able to secure the presidency. I recall seeing memes emerge on the internet around when he announced his candidacy, but I thought nothing more of it until he won the election. Based on the data from previous years, I expect search interest to spike again around Election Day this year. With issues such as COVID-19 and social unrest in the wake of George Floyd's death at the hands of police officers, this year's peak in search interest among U.S. voters will most likely dwarf 2016's.

When the search interest data is viewed another way, say on a map of the nation, it’s evident that some states had a stronger interest in election results than others.

A regional chart depicting election search interest by state. Source: Google Trends.

Some regions on the map, such as the District of Columbia and smaller states like New Jersey, do not have data labels tied to them due to graphing limitations in Excel. Still, Google Trends does show that the District of Columbia has the highest volume of interest when it comes to the presidential election. After that, New Jersey comes in second on a top-five comparison chart, followed by North Dakota, Maine, and Alaska.

The data provided by Google Trends gives a general idea of how interested Americans are in general elections. But what do those numbers look like across demographics, and what do Americans look for in a candidate?

Breaking down election interest among demographics

Moving forward, I decided to pay a visit to the Pew Research Center’s website, which holds a repository of data trends for this year’s presidential election. The topics covered range from election-related social movements to hot-button issues such as COVID-19 and police reform.

One issue that Donald Trump has made a fuss about for the past four years is voter fraud, and it seems that more Americans are paying attention to those claims, especially where fraud could affect Trump's re-election. President Trump has allegedly attempted to block mail-in voting, which health experts say is more pandemic-friendly than in-person voting. Political analysts believe that if Americans are encouraged to vote by mail, the results could spell the end of Donald Trump's presidency. Moreover, media outlets have documented President Trump crying foul on mail-in voting as a means of committing voter fraud.

Source: Pew Research Center

However, the Pew Research Center's data shows that it's mostly Republican voters who share President Trump's sentiment about voter fraud. Democrats, on the other hand, do not seem to be concerned about the issue. Furthermore, the research suggests that voters' preferred news sources may influence their political opinions. The report notes that Republican voters tend to watch Fox News or listen to radio talk shows, while Democrats prefer to get their news from MSNBC and the New York Times.

Political divide based on news source is one matter facing the nation this election year, but influence from conspiracy theorists and foreign powers is quite another. Misinformation propagated by groups such as QAnon and Russian government-backed “troll farms” has led to confusion about the election, even paranoia, and cases of domestic terrorism.

Source: Pew Research Center

Roughly 47% of registered American voters have heard of QAnon. Among those who have heard of the conspiracy theory group, 40% of Republican-leaning voters believe that QAnon is a good thing for the country. On the other hand, 90% of Democratic-leaning voters believe that the misinformation QAnon spreads is bad for the country. Moreover, QAnon isn't the only roadblock to getting clear-cut, unbiased information about the election. Since Donald Trump's election, numerous reports have been published exposing negative influence on the United States' affairs from global powers such as Russia. Many of those reports center on online "troll farms" funded and backed by the Kremlin. Their purpose is to flood social media sites such as Facebook with misleading information about Trump's opponents in the hope that their false claims will be shared by unsuspecting and uneducated users. Even worse, many Americans fear that hackers supported by foreign governments will alter the election results or create other forms of chaos that would impact the election.

Source: Pew Research Center

Two years ago, 67% of Americans overall believed that Russia would attempt to influence the U.S. elections in some capacity, including 50% of Republicans and 80% of Democrats. The Pew Research Center asked the same question twice this year amid the COVID-19 pandemic, and the results show increased concern about election interference by Russia or other foreign countries. In August, 75% of Americans reported that they were concerned Russia would try to interfere with the election, likely in favor of Donald Trump. Broken down by party, 62% of Republicans and 88% of Democrats said foreign interference was possible.

What surprised me the most was a report on Catholic voters. According to the Pew Research Center, Catholics in the United States are “split down the middle politically,” with 48% leaning toward Republican policy and 47% leaning toward the Democratic approach. Furthermore, Catholics are swing voters: in 2016, 52% of Catholics voted for Donald Trump versus 44% who voted for Hillary Clinton, while the percentage clocked in at 54% to 45% in favor of Barack Obama when he ran against John McCain in 2008.

Source: Pew Research Center

As someone who grew up in a Catholic household that often voted Republican, I can attest to how my parents' beliefs shaped mine, at least until I reached an age where I decided to forge my own path. Once I became a registered voter, I turned my back on the Republican-versus-Democrat choice, opting to support independent candidates despite not knowing anything about politics. That remained the case until this year, when COVID-19 took the world by surprise and changed everything. Even then, I carried on and shut politics out of my mind, focusing once more on my studies and on learning about the virus. Then a report came out exposing Donald Trump for covering up the seriousness of the pandemic. I had a feeling he was downplaying the issue to keep the American public from panicking. Still, the fact that he moved to politicize the pandemic for his own benefit infuriated me to no end. It was then that I knew he had to go, for the sake of the nation's health.

Conclusion

Data visualization can help voters make informed decisions about candidates and national issues, and it can reveal trends in voting patterns. Those trends are especially apparent in today's partisan divide between Republican and Democratic voters, though it doesn't have to be that way. Regardless of your political beliefs, it is imperative to know that every voice counts, including yours. Now more than ever, the fate of the United States rests in her people's hands. Whether you choose to go to the polls in person or submit your ballot by mail…

Get out there and vote!


Free Datasets: Comparing Numbers

October 1st, 2020 by Erika Kacprzykowski


Each day, data is collected all across the world. There are datasets on thousands of different topics, such as psychology, healthcare, and consumer analysis, and many of them are free to access. This can be extremely useful for future projects or if you are just looking to expand your knowledge by comparing these numbers. Today, I have listed a couple of great examples of free datasets that can be used for any project, as well as some websites where you can access even more datasets.

1. Mall Customers Dataset

Found on https://data-flair.training/blogs/machine-learning-datasets/

This free dataset describes customers at a mall. It is not limited to a single attribute such as age: gender, annual income, and spending score are all included. This would be a great dataset for comparing males and females. The page also lists potential projects, which can be extremely helpful for those who want to create an infographic but do not know where to start.
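As a starting point for a male-versus-female comparison, the sketch below uses only Python's standard library to group rows by one column and average another. It assumes the dataset has been downloaded as a CSV file with column headers such as `Gender` and `Annual Income (k$)`. Those headers, the file name, and the helper names `mean_by_group` and `load_rows` are assumptions, so check them against your copy of the file.

```python
import csv
from collections import defaultdict
from statistics import mean
from typing import Iterable


def mean_by_group(rows: Iterable[dict], group_col: str, value_col: str) -> dict:
    """Average `value_col` for each distinct value of `group_col`."""
    groups = defaultdict(list)
    for row in rows:
        # CSV fields arrive as strings, so convert before averaging.
        groups[row[group_col]].append(float(row[value_col]))
    return {group: mean(values) for group, values in groups.items()}


def load_rows(path: str) -> list:
    """Read a CSV file (hypothetical path) into a list of dicts keyed by header."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

For example, `mean_by_group(load_rows("Mall_Customers.csv"), "Gender", "Annual Income (k$)")` would return one average income for each gender value found in the file. Passing a different value column, such as the spending score column (assuming the file has one under that name), compares that measure instead.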

2. Heart Disease Dataset

Found on https://www.cdc.gov/nchs/fastats/heart-disease.htm

For those who are interested in anything related to health, the CDC site listed above is a great resource. Its datasets cover mortality, morbidity, and disease, and each contains multiple pages of data.

3. Website: Piktochart Blog

Found on https://piktochart.com/blog/100-data-sets/

This is a great website that lists 50 different datasets, most of which are free to access. Its categories include finance, the internet, and more (see picture above). With all of these options, you will be able to choose the one that works best for you.

4. The World Bank

Found on https://data.worldbank.org/

With the ability to search any topic you want, the possibilities are endless. The site also contains other features such as a Microdata Library, the ability to open a Data Catalog, Finances, Projects & Operations and more. This is another great website if you are looking to find more free datasets but maybe don’t know where to start.

Conclusion

Overall, there are thousands of datasets online, some free and others requiring a subscription. Even among the free resources, there are plenty of options to choose from. The ability to compare numbers on a particular topic, whether gender, age, ethnicity, or country, is very important and useful in society. The possibilities and projects available are endless.