In the age of information, data visualizations exist to help presenters convey knowledge, patterns, and insights to their audience more easily. It is therefore critical for data wizards to know which visualization to choose when building an insightful dashboard. Viewers, in turn, need at least a basic understanding of the graphs and plots that are used regularly. In this article, I’ll explain some general rules that may help you better understand common charts and graphs, e.g., the stacked bar chart and the pie chart, by grouping them into four categories: distribution, comparison, composition, and correlation. This is not an ultimate solution or a rigid boundary that limits a chart to one particular use. Instead, it is a conclusion drawn from my experience of what kind of information each chart conveys.
Charts in the distribution category show where data points are concentrated and where they are sparse along a single dimension. One advantage of distribution charts is that they can be applied in market research, for example in consumer segmentation and any other demographic analysis. Among all the types of distribution graphs, the most common ones are the box plot, the map, and the histogram.
Maps are also frequently used by data analysts to show demographic data. Joined to geospatial data, a map indicates where in the world your customers are located. Map charts work by grouping numeric values by a geographical attribute such as country, state, city, or region.
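The grouping step behind a map chart can be sketched in a few lines of Python. This is only an illustration under assumptions: the `orders` records, their values, and the `total_by_region` helper are hypothetical, and a real mapping tool would additionally join the totals to geospatial shapes.

```python
from collections import defaultdict

# Hypothetical order records: (country, sales). Values are made up for illustration.
orders = [
    ("Germany", 120.0),
    ("France", 80.0),
    ("Germany", 45.5),
    ("Japan", 200.0),
    ("France", 20.0),
]

def total_by_region(records):
    """Sum the numeric value for each geographic key (here: country)."""
    totals = defaultdict(float)
    for region, value in records:
        totals[region] += value
    return dict(totals)

# One number per country: this is what a map would encode as color or bubble size.
sales_by_country = total_by_region(orders)
print(sales_by_country)
```

The same helper works for state, city, or region; only the key in each record changes.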
You can work out which number is smaller or larger when the values are displayed in a table or a spreadsheet. However, adding a visual component to the difference and contrast significantly decreases the time and mental energy required to interpret the data. This can be accomplished with different types of comparison charts, such as line charts, bar charts, or box plots.
The bar chart is one of the most widely used charts for comparing categorical data. As we can observe in the chart below, it makes comparing different categories of data much quicker. The bar chart looks very much like a histogram. The fundamental difference is that the x-axis of a bar chart holds a categorical attribute, while the x-axis of a histogram holds numerical intervals. For example, in the chart below, we compare the profit generated in different markets such as EMEA, APAC, etc. But if we want to compare the profits across different ranges such as 18 – 34, etc., then we transform the x-axis into ranges (bins) and call the result a histogram.
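The difference between the two x-axes can be sketched as follows. Everything here is an assumption made for illustration: the records, the LATAM label, the interpretation of the ranges as customer ages, the bin edges, and the helper names `bar_heights` and `histogram_counts`.

```python
from collections import defaultdict

# Hypothetical records: (market, profit, customer_age). All values are made up.
records = [
    ("EMEA", 120.0, 19),
    ("APAC", 300.0, 34),
    ("EMEA", 80.0, 35),
    ("LATAM", 50.0, 52),
    ("APAC", 100.0, 23),
    ("LATAM", 25.0, 67),
]

def bar_heights(rows):
    """Bar chart: one bar per category label, height = summed profit."""
    totals = defaultdict(float)
    for market, profit, _age in rows:
        totals[market] += profit
    return dict(totals)

def histogram_counts(values, edges):
    """Histogram: one bar per half-open interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Assumed bin edges; for whole-number ages the first bin [18, 35) covers 18 to 34.
age_edges = [18, 35, 50, 65, 80]

profit_by_market = bar_heights(records)
age_counts = histogram_counts([age for _m, _p, age in records], age_edges)

print(profit_by_market)  # keyed by category label
print(age_counts)        # keyed by position of the numeric interval
```

The bar chart keys are labels with no inherent order, while the histogram bars follow the order of the numeric intervals, which is why changing the bin edges changes the shape of a histogram.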
Furthermore, a bar chart is not limited to a single categorical attribute. If you want to group by two categorical attributes, it is better to go for a clustered bar chart. For instance, we can take the bar chart of profits by region and break each region down further by year. This allows us to compare data both between markets and between order periods.
A line chart is used when you need to show trends in numerical data over a specific period. It is commonly used in time series analysis to visualize the changes of a single numerical variable. Each line segment connects two consecutive time periods, so its slope shows the change between them. Additionally, we can introduce a categorical variable and use distinct colors to represent its values. For example, the chart below plots the number of orders over time, and the line colors indicate the different customer categories.
A few days ago, I had an argument with coworkers who insisted that box plots belong in the ‘distribution’ category, since they visualize the distribution of data through percentiles. Yes, it is true that one can easily read off where the data sits at the 25th, 50th, and 75th percentiles. Furthermore, overlaying the individual data points at reduced opacity gives the viewer a more direct view of the distribution.
But unfortunately, box plots are rarely used that way in presentations. Instead, they are normally used to compare separate groups of data. For example, a box plot is a great companion to ANOVA tests because it shows variation across groups. The box plot below shows the scores that students achieved in critical reasoning, mathematics, and writing.
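The numbers a box plot draws for each group can be sketched with Python's standard library. The scores below are hypothetical, `five_number_summary` is an illustrative helper, and the whiskers are simplified to the minimum and maximum (many plotting tools instead cap them at 1.5 times the interquartile range).

```python
import statistics

# Hypothetical scores per subject; values are made up for illustration.
scores = {
    "mathematics": [50, 60, 70, 80, 90],
    "writing": [55, 65, 70, 75, 95],
}

def five_number_summary(values):
    """Return the values a simple box plot draws for one group."""
    # method="inclusive" treats the data as the whole population,
    # so the quartiles never fall outside the observed range.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,          # 25th percentile: bottom of the box
        "median": median,  # 50th percentile: line inside the box
        "q3": q3,          # 75th percentile: top of the box
        "max": max(values),
    }

summaries = {subject: five_number_summary(vals) for subject, vals in scores.items()}
for subject, summary in summaries.items():
    print(subject, summary)
```

Placing these summaries side by side is what makes the box plot a comparison chart: both groups here share a median of 70, but the box for mathematics (60 to 80) is wider than the one for writing (65 to 75).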
If you want a holistic view of your data, then graphs that fall under composition are your best choice. Stacked bar charts, pie charts, and area charts, for example, are designed to show part-to-whole relationships. They represent the percentage that each category contributes within a larger dataset.
Stacked Bar Chart
A stacked bar chart is used when we need to break down primary categories into smaller secondary categories. If we analyze the chart below, it is like the bar chart that we have already discussed above. The horizontal axis compares profit across the primary categories. Along the vertical axis, each bar is further split to show the contribution of the different customer segments within that category, which lets the viewer read the graph in a bit more depth.
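How a stacked bar is assembled can be sketched as below. The markets, segment names, and profit figures are hypothetical, as are the helper names `stack_offsets` and `shares`; the idea is only that each segment starts where the previous one ended.

```python
# Hypothetical profit per market, split by customer segment. Values are made up.
profit = {
    "EMEA": {"Consumer": 50.0, "Corporate": 30.0, "Home Office": 20.0},
    "APAC": {"Consumer": 60.0, "Corporate": 100.0, "Home Office": 40.0},
}

def stack_offsets(parts):
    """Return (label, bottom, height) per segment, stacked in insertion order."""
    segments = []
    bottom = 0.0
    for label, height in parts.items():
        segments.append((label, bottom, height))
        bottom += height  # the next segment starts on top of this one
    return segments

def shares(parts):
    """Return each segment's percentage of the bar total (part-to-whole)."""
    total = sum(parts.values())
    return {label: 100.0 * value / total for label, value in parts.items()}

emea_segments = stack_offsets(profit["EMEA"])
apac_shares = shares(profit["APAC"])

print(emea_segments)
print(apac_shares)
```

Plotting the raw heights keeps the bar totals comparable across markets, while plotting the shares (a 100% stacked bar) makes the composition easier to compare at the cost of hiding the totals.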
Area charts are similar to line graphs, but they are also the best method to show both the change in a variable over a specific period and the contribution of secondary variables to that primary variable. If we analyze the area plot below, it is easy to see that from 2012 to 2015, profit generated from orders increased from around 250k to just above 500k. Additionally, most of the orders behind this growth over the four years came from the furniture, office supplies, and technology categories.
Correlation charts are primarily used when you want to find out whether there is a relationship between one or more pairs of variables within the same context. Heatmaps and scatter plots are great charts for depicting correlation between variables.
A scatter plot places one numeric attribute on one axis and another numeric attribute on the other axis to reveal any correlation between them. Normally, scatter plots are applied to identify regression-type relationships, such as linear regression, logistic regression, and others. To judge the strength of the correlation between attributes, we analyze whether the points converge. If they are concentrated tightly around a line or curve, the correlation is strong; if they are sparse and scattered, the relationship is weak.
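The visual impression of "tight" versus "scattered" points can be put into a number. One common choice, used here as an assumption rather than as the only option, is the Pearson correlation coefficient, which measures linear relationships only. The data and the `pearson_r` helper below are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 or -1 for a perfect line, near 0 for no linear trend."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)

# Hypothetical attribute values; made up for illustration.
x = [1, 2, 3, 4, 5]
tight = [2, 4, 6, 8, 10]   # points fall exactly on a line
scattered = [5, 1, 4, 2, 3]  # points show little linear pattern

r_tight = pearson_r(x, tight)
r_scattered = pearson_r(x, scattered)

print(r_tight, r_scattered)
```

A value close to zero does not rule out a curved relationship, so the coefficient complements the scatter plot rather than replacing it.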
Heatmaps are used as a visual representation of a correlation matrix. They are an interesting technique for finding correlated variables in principal component analysis (PCA). By using a gradient color scale, we can immediately see which pairs of characteristics are strongly associated. In the heatmap below, the attribute pairs that are more strongly correlated appear in darker shades of blue.
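The color coding of a heatmap can be sketched as a mapping from correlation strength to a shade. This is a minimal illustration under assumptions: the attribute names and matrix values are hypothetical, the two end colors are arbitrary, and strength is taken as the absolute value of the correlation so that strong negative pairs are also dark.

```python
def blue_shade(strength):
    """Linearly interpolate from white (strength 0) to dark blue (strength 1)."""
    strength = max(0.0, min(1.0, strength))  # clamp to the valid range
    light = (255, 255, 255)
    dark = (0, 0, 139)
    channels = [round(lo + (hi - lo) * strength) for lo, hi in zip(light, dark)]
    return "#{:02x}{:02x}{:02x}".format(*channels)

# Hypothetical correlation matrix; values are made up for illustration.
attributes = ["sales", "profit", "discount"]
matrix = [
    [1.0, 0.8, -0.1],
    [0.8, 1.0, -0.6],
    [-0.1, -0.6, 1.0],
]

# One color per cell: darker means a stronger association, regardless of sign.
colors = [[blue_shade(abs(r)) for r in row] for row in matrix]

for name, row in zip(attributes, colors):
    print(name, row)
```

If the sign matters to the reader, a diverging scale with two hues (one for positive, one for negative values) is a common alternative to a single-hue gradient.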
Although I have explained some of the most used graphs in this article, always remember that I have only touched the tip of the iceberg. What fascinates me about the world of data analytics is that you discover new and interesting things each day. So, if you want to dive deeper into that ‘iceberg’, then please have a look at this article by Vizme.