[This article was originally published on R - Appsilon | Comprehensive solutions for data analysis, and kindly contributed to R-bloggers.]
Plot histograms with R and ggplot2
Let's be honest: how boring are your data visualizations? Professional designers make graphic design look easy, but it really isn't. Fortunately, the R programming language offers countless ways to make your visuals more appealing.
In this article, you will learn how to create stunning histograms using R's ggplot2 library. We'll start with a brief introduction to histogram theory in case you're new to the topic. You will then learn how to create and modify ggplot histograms to take them to the next level.
Content:
- What is a histogram?
- Create your first ggplot histogram
- How to style and label a ggplot histogram
- Add text, title, subtitle, caption and axis labels to your ggplot histogram
- To summarize
What is a histogram?
A histogram is a way of graphically representing the distribution of data using bars of different heights. Each bar (bin) represents a range of values, and the height of the bar represents the number of data points within that range. You can easily change the number of bins.
The easiest way to understand histograms is to look at one. The image below shows a histogram of 10,000 numbers drawn from a standard normal distribution (mean = 0, standard deviation = 1):
Figure 1 - Standard normal distribution histogram
Although the histogram does not look very impressive at first glance, it actually says a lot. When the data is normally distributed (bell curve), the following conclusions can be drawn:
- Roughly 68.26% of data points fall between -1 and +1 standard deviations from the mean (34.13% on each side).
- Roughly 95.44% of data points fall between -2 and +2 standard deviations from the mean (47.72% on each side).
- Roughly 99.72% of data points fall between -3 and +3 standard deviations from the mean (49.86% on each side).
- Any value outside the range of -3 to +3 standard deviations is commonly considered an outlier.
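You can check these percentages yourself with a quick simulation. The sketch below draws 10,000 values from a standard normal distribution, similar to the data behind Figure 1, and compares the empirical share of values within 1, 2 and 3 standard deviations of the mean with the exact probabilities from pnorm(). The seed is an arbitrary choice of ours, so a different seed will give slightly different empirical numbers.

```r
# Draw 10,000 values from a standard normal distribution (mean = 0, sd = 1).
set.seed(42)  # arbitrary seed, only for reproducibility
x <- rnorm(10000)

k <- 1:3  # number of standard deviations from the mean

# Empirical share of points between -k and +k standard deviations
empirical <- sapply(k, function(s) mean(abs(x) <= s))

# Exact probabilities for a standard normal distribution
theoretical <- pnorm(k) - pnorm(-k)

round(data.frame(sd = k, empirical = empirical, theoretical = theoretical), 4)
```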
In practice, you will rarely encounter a perfectly normal distribution. Real data is usually skewed in one direction or has multiple peaks. Keep this in mind when drawing conclusions based on the shape of the histogram alone.
Let's see how to visualize histograms using R and ggplot.
Create your first ggplot histogram
We'll use the Gapminder dataset for histogram visualization throughout the article. It's a relatively small dataset showing life expectancy, population, and GDP per capita for countries from 1952 to 2007. We will only use the subset of European countries and discard everything else.
Here is the code needed to import the libraries and to load and filter the dataset:
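A minimal sketch of that setup step follows. It assumes the ggplot2 and gapminder packages are installed. The name gm_eu for the filtered data frame is our own choice, and we use base R's subset() where dplyr::filter() would work just as well.

```r
# Assumes install.packages(c("ggplot2", "gapminder")) has already been run.
library(ggplot2)
library(gapminder)

# Keep only European countries and discard all other continents.
# gm_eu is a name we chose for the filtered data frame.
gm_eu <- subset(gapminder, continent == "Europe")

# Inspect the first few rows
head(gm_eu)
```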
Here's what the first few rows of the filtered dataset look like:
Figure 2 - European countries in the Gapminder dataset
We'll visualize the lifeExp (life expectancy) column as a histogram, because it provides enough continuous data to work with.
Let's start by creating the most basic ggplot histogram. You can use the geom_histogram() function to do so, assuming you've passed the dataset and the default aesthetic to ggplot():
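Here's a minimal sketch of that default histogram, reusing the European subset from above (the gm_eu name is our assumption):

```r
library(ggplot2)
library(gapminder)

# European subset of Gapminder (gm_eu is a name we chose)
gm_eu <- subset(gapminder, continent == "Europe")

# Map lifeExp to the x-axis and add a histogram layer with default settings.
# With neither bins nor binwidth set, ggplot2 falls back to 30 bins.
p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram()

p
```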
Figure 3 - Default histogram
Well, you won't see a chart like this on a website or in a magazine, so we'd better spruce it up a bit.
Let's start by changing the number of bins. The default value is 30, which is a reasonable choice in most cases. If you want your histogram to look boxier, use fewer bins. On the other hand, if you want it to look more like a density plot, increase the number of bins. Here's what a histogram with 10 bins looks like:
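A sketch of the 10-bin version, assuming the same gm_eu subset; the only change is the bins argument of geom_histogram():

```r
library(ggplot2)
library(gapminder)

gm_eu <- subset(gapminder, continent == "Europe")

# Fewer bins give wider, boxier bars; more bins give a smoother, density-like outline.
p10 <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(bins = 10)

p10
```

If you'd rather control the bar width directly, geom_histogram() also accepts a binwidth argument, expressed in the units of the x variable (here, years).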
Figure 4 - Histogram with 10 bins
We'll stick with the default number of bins for the rest of this article because it looks a little better.
The default color is painful to look at. There's nothing wrong with gray, but it looks boring. Here's how to restyle the ggplot histogram in the Appsilon style, with a blue fill and a black outline:
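A sketch of that restyling: fill sets the bar color and color sets the bar outline. The hex code #0099F8 is our approximation of the Appsilon blue, not an official brand value.

```r
library(ggplot2)
library(gapminder)

gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  # fill = bar interior (assumed "Appsilon blue"), color = bar outline (black)
  geom_histogram(fill = "#0099F8", color = "#000000")

p
```

Because fill and color are passed as fixed values outside aes(), they apply to every bar instead of being mapped to a variable.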
Figure 5 - Setting the fill and outline colors
Much better, especially if you like blue. Next, let's take a closer look at styling and annotations.
How to style and label a ggplot histogram
Styling
You can make a ggplot histogram more informative by adding extra layers. For example, sometimes we want to add a vertical line at the mean and two surrounding lines marking one standard deviation below and above the mean. It's best to style these lines differently so they don't clutter the histogram.
The following code snippet draws a black line at the mean and dashed black lines at the -1 and +1 standard deviation markers:
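Here's one way to write that snippet, as a sketch that computes the mean and standard deviation of lifeExp in the European subset (the gm_eu, mu and sigma names are our own):

```r
library(ggplot2)
library(gapminder)

gm_eu <- subset(gapminder, continent == "Europe")

# Summary statistics used for the reference lines
mu <- mean(gm_eu$lifeExp)
sigma <- sd(gm_eu$lifeExp)

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(fill = "#0099F8", color = "#000000") +
  # Solid black line at the mean
  geom_vline(xintercept = mu, color = "#000000") +
  # Dashed black lines one standard deviation below and above the mean
  geom_vline(xintercept = c(mu - sigma, mu + sigma), color = "#000000", linetype = "dashed")

p
```

Passing a vector to xintercept draws one line per value, so both dashed lines share a single layer and a single style.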
Figure 6 - Adding vertical lines to the histogram
Are you up for a challenge? Try recreating the histogram from Figure 1. Tip: use geom_segment() instead of geom_vline().
You'll often want to enrich a ggplot histogram by combining it with a density plot. The density plot shows more or less the same information, only in a smoother format. Here's how to add a density plot layer to your histogram:
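A sketch of the overlay, assuming the gm_eu subset from earlier. The histogram is rescaled from counts to densities with after_stat() (available since ggplot2 3.3.0) so that the bars and the density curve share the same y-axis.

```r
library(ggplot2)
library(gapminder)

gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  # Rescale bars from counts to densities so they match the curve's scale
  geom_histogram(aes(y = after_stat(density)), fill = "#0099F8", color = "#000000") +
  # Kernel density estimate drawn on top of the bars
  geom_density(color = "#000000")

p
```

Without the after_stat(density) mapping, the bars would stay on the count scale and the density curve (whose area is 1) would look almost flat next to them.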
Figure 7 - Adding a density plot to a histogram
Together, the two layers display the data more richly than the histogram alone. For example, if you embed the chart above in a dashboard, you could let users show or hide the density layer for maximum customization.
Want to create professional dashboards? Here's how to start your career as an R Shiny developer.
Annotations
Finally, let's look at how to annotate a ggplot histogram. Perhaps vertical lines seem too obtrusive, and you'd rather show a specific value as plain text.
First, you need to create a data frame for the annotations. It should contain the x and y positions and the labels to display:
Now you can pass it to a geom_text() layer. Tip: make the annotations bold so they're easier to spot:
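A sketch of both steps, the annotation data frame and the geom_text() layer, assuming the gm_eu subset from earlier. The y positions are hypothetical heights (in counts) that we picked so the three labels sit at different levels; you'll likely need to tune them by eye for your own data.

```r
library(ggplot2)
library(gapminder)

gm_eu <- subset(gapminder, continent == "Europe")

mu <- mean(gm_eu$lifeExp)
sigma <- sd(gm_eu$lifeExp)

# One row per annotation: x position, y position and the text to display.
# The y values are hypothetical, chosen so the labels don't overlap.
label_x <- c(mu - sigma, mu, mu + sigma)
annotations <- data.frame(
  x = label_x,
  y = c(20, 35, 20),
  label = paste0(c("-1 SD: ", "Mean: ", "+1 SD: "), round(label_x, 2))
)

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(fill = "#0099F8", color = "#000000") +
  # inherit.aes = FALSE stops the layer from picking up the global x = lifeExp mapping
  geom_text(
    data = annotations,
    aes(x = x, y = y, label = label),
    size = 5,
    fontface = "bold",
    inherit.aes = FALSE
  )

p
```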
Figure 8 - Adding annotations to the histogram
The trick with annotations is to leave enough space between them so the text doesn't overlap.
Next, let's remove the gray background. The easiest way to do this is to add a more minimalist theme to your chart. The theme_classic() function
is one of our top picks:
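A sketch of the themed histogram, assuming the gm_eu subset and the colors used earlier. theme_classic() is a complete theme, so adding it replaces the default gray theme entirely.

```r
library(ggplot2)
library(gapminder)

gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(fill = "#0099F8", color = "#000000") +
  # Complete theme: white background, no grid lines, visible axis lines
  theme_classic()

p
```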
Figure 9 - Changing the theme
The only thing missing from our ggplot histogram is a title and axis labels. Without them, users don't know what they're looking at.
Add text, title, subtitle, caption and axis labels to your ggplot histogram
A title and axis labels are essential for a production-ready chart. Subtitles and captions are optional, but we'll show you how to add them too. The magic happens in the labs() layer. You can use it to specify the title, subtitle, caption, and x-axis and y-axis labels:
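A sketch of the labs() layer on top of the themed histogram. All of the label strings below are placeholder text we made up for illustration, not the wording used in the original figure.

```r
library(ggplot2)
library(gapminder)

gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(fill = "#0099F8", color = "#000000") +
  theme_classic() +
  # Placeholder text: replace with labels that describe your own chart
  labs(
    title = "Histogram of life expectancy in Europe",
    subtitle = "Gapminder data, 1952-2007",
    caption = "Source: gapminder R package",
    x = "Life expectancy (years)",
    y = "Count"
  )

p
```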
Figure 10 - Adding a title, subtitle, caption and axis labels
This is a good start, but the new text elements don't stand out yet. You can change their font, color, size, and more in the theme() layer. Just remember to add your complete theme, e.g. theme_classic(), before your own theme() styles. Otherwise, your styles will be overwritten:
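A sketch of that ordering, assuming the same placeholder labels as before. The specific sizes and faces are our own choices; the title color simply reuses the assumed #0099F8 bar color so the text matches the histogram.

```r
library(ggplot2)
library(gapminder)

gm_eu <- subset(gapminder, continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(fill = "#0099F8", color = "#000000") +
  # Complete theme first: adding it after theme() would reset the tweaks below
  theme_classic() +
  labs(
    title = "Histogram of life expectancy in Europe",  # placeholder text
    subtitle = "Gapminder data, 1952-2007",
    caption = "Source: gapminder R package",
    x = "Life expectancy (years)",
    y = "Count"
  ) +
  # Style individual text elements of the plot
  theme(
    plot.title = element_text(color = "#0099F8", size = 16, face = "bold"),
    plot.subtitle = element_text(size = 10, face = "bold"),
    plot.caption = element_text(face = "italic")
  )

p
```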
Figure 11 - Styling the title, subtitle and caption
Now it's starting to take shape. The title styling also matches the color palette of our ggplot histogram. We've covered everything you need to get started visualizing data distributions with histograms, so we'll stop here. But there's much more you can do with visuals. Check out some of our great demos to see where advanced R development can take your data visualization.
Do you know another way to visualize data distributions? Read our complete guide to boxplots.
To summarize
Today you learned what histograms are, why they are important for visualizing the distribution of continuous data, and how to create them in R with the ggplot2 library. That's enough to get you on the right track, and now it's up to you to apply that knowledge to your own dataset. We're sure you can handle it.
At Appsilon, we use histograms and ggplot2 when developing enterprise R Shiny dashboards for Fortune 500 companies. If you have experience with R and R Shiny, we may already have a position for you.
Start your career at Appsilon: see our open positions.
The article How to create great histograms in R: a complete guide to using ggplot2 comes from Appsilon | Comprehensive solutions for data analysis.