How to create great histograms in R: a complete guide to using ggplot2 (2023)

[This article was originally published on r - Appsilon | Comprehensive solutions for data analysis, and kindly contributed to R-bloggers.]

Plot histograms with R and ggplot2

Be honest: how boring are your data visualizations? Professional designers make graphic design look easy, but it really isn't. Fortunately, the R programming language offers countless ways to make your visuals more appealing.

In this article, you will learn how to create stunning histograms using R's ggplot2 library. We'll start with a brief introduction and some histogram theory in case you're new to the topic. You will then learn how to create and modify ggplot histograms to take them to the next level.

What is a histogram?

A histogram is a way of graphically representing the distribution of data using bars of different heights. Each bar (bin) represents a range of values, and the height of the bar represents the number of data points within that range. You can easily change the number of bins.

The easiest way to understand them is through visualization. The image below shows a histogram of 10,000 numbers drawn from a standard normal distribution (mean = 0, standard deviation = 1):

Figure 1 - Standard normal distribution histogram

Although the histogram does not look very impressive at first glance, it actually says a lot. When the data is normally distributed (bell curve), the following conclusions can be drawn:

  • About 68.27% of data points fall within -1 to +1 standard deviations of the mean (34.13% on each side).
  • About 95.45% of data points fall within -2 to +2 standard deviations of the mean (47.72% on each side).
  • About 99.73% of data points fall within -3 to +3 standard deviations of the mean (49.87% on each side).
  • Any value outside the range of -3 to +3 standard deviations is considered an outlier.
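These percentages can be checked directly from the normal cumulative distribution function. The snippet below is a small sketch using base R's pnorm(); the helper name within_sd is made up for this illustration.

```r
# Share of a normal distribution that lies within k standard deviations
# of the mean. `within_sd` is a hypothetical helper name.
within_sd <- function(k) pnorm(k) - pnorm(-k)

coverage <- sapply(1:3, within_sd)

# Percentages for 1, 2 and 3 standard deviations, rounded to two decimals
round(100 * coverage, 2)
# 68.27 95.45 99.73
```

Halving each value gives the share on one side of the mean.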

In practice, you will rarely encounter a perfectly normal distribution. Real data is usually skewed in one direction or has multiple peaks. Keep this in mind when drawing conclusions based on the shape of the histogram alone.

Let's see how to visualize histograms using R and ggplot.

Create your first ggplot histogram

We'll use the Gapminder dataset throughout the article for histogram visualization. This is a relatively small dataset showing life expectancy, population and GDP per capita for countries from 1952 to 2007. We will only use the subset showing European countries and discard everything else.

Here is the code needed to import the library, load and filter the dataset:
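The original snippet is not preserved on this page, so the following is a minimal sketch of the loading step. It assumes the data comes from the gapminder CRAN package and that the filtered subset is stored in a variable called gm_eu.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: the dataset is taken from the gapminder package

# Keep only the European countries and discard everything else
gm_eu <- gapminder %>%
  filter(continent == "Europe")

# Peek at the first few rows
head(gm_eu)
```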

Here's what the first few rows of gm_eu look like:

We'll visualize the lifeExp column with histograms because it provides enough continuous data to work with.

Let's start by creating the most basic ggplot histogram. You can use the geom_histogram() function to do so, provided you've passed in the dataset and the default aesthetics:
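A minimal sketch of that call could look like this. It assumes the gapminder package as the data source and gm_eu as the name of the European subset.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: data source

gm_eu <- gapminder %>%
  filter(continent == "Europe")

# Default histogram: ggplot2 falls back to 30 bins and a gray fill
p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram()
p
```

ggplot2 prints a message about the default number of bins when the plot is drawn; that is expected at this stage.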

Figure 3 - Default histogram

Well, you won't see something like this on a website or in a magazine, so we'd better tweak a few things.

Let's start by changing the number of bins (bars). The default value is 30, which is suitable in most cases. If you want your histogram to look boxier, use fewer bins. On the other hand, if you want it to look more like a density plot, increase the number of bins. A histogram with 10 bins looks like this:
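The bin count is controlled by the bins argument of geom_histogram(). A short sketch, again assuming the gapminder package and the gm_eu variable name:

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: data source

gm_eu <- gapminder %>%
  filter(continent == "Europe")

# Fewer bins give a boxier histogram; more bins give a finer one
p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(bins = 10)
p
```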

Figure 4 - Histogram with 10 bins

We'll stick with the default number of bins for the rest of this article because it looks a little better.

The default color is hard on the eyes. There's nothing wrong with gray, but it looks dull. Here's how to adjust the ggplot histogram to get the Appsilon style, with a blue fill color and a black border:
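In geom_histogram(), fill sets the bar color and color sets the outline. The sketch below assumes the gapminder package and the gm_eu name; the exact hex code for the blue is also an assumption, not a value confirmed by the text.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: data source

gm_eu <- gapminder %>%
  filter(continent == "Europe")

# `color` is the bar outline, `fill` is the bar interior.
# "#0099F8" is an assumed shade of blue.
p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8")
p
```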

Figure 5 - Setting the fill and outline colors

Much better, provided you like the color blue. Next, let's take a closer look at styling and annotations.

How to style and annotate a ggplot histogram

Styling

You can bring more life to a ggplot histogram. For example, you may want to add a vertical line at the mean and two surrounding lines marking -1 and +1 standard deviations from the mean. It's a good idea to style these lines differently so the histogram stays easy to read.

The following code snippet draws a black line at the mean and dashed black lines at the -1 and +1 standard deviation markers:
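One way to sketch this uses geom_vline(). The variable names (gm_eu, mean_le, sd_le), the line widths and the fill color are assumptions for illustration, and the linewidth argument needs ggplot2 3.4.0 or newer (older versions use size).

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: data source

gm_eu <- gapminder %>%
  filter(continent == "Europe")

mean_le <- mean(gm_eu$lifeExp)
sd_le <- sd(gm_eu$lifeExp)

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  # Solid line at the mean
  geom_vline(xintercept = mean_le, color = "#000000", linewidth = 1.25) +
  # Dashed lines at -1 and +1 standard deviations from the mean
  geom_vline(
    xintercept = c(mean_le - sd_le, mean_le + sd_le),
    color = "#000000", linewidth = 1, linetype = "dashed"
  )
p
```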

Figure 6 - Adding vertical lines to the histogram

Are you up for a challenge? Try recreating our histogram from Figure 1. Tip: use geom_segment() instead of geom_vline().

You'll often want to enrich a ggplot histogram by combining it with a density plot. It shows more or less the same information, only in a smoother format. Here's how to add a density plot layer to your histogram:
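A sketch of the overlay follows. The histogram has to be drawn on the density scale rather than as raw counts, otherwise the two layers would not share a y-axis; after_stat(density) does this in ggplot2 3.3.0 and newer. The data source, the gm_eu name and both colors are assumptions.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: data source

gm_eu <- gapminder %>%
  filter(continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  # Plot densities instead of counts so both layers use the same scale
  geom_histogram(aes(y = after_stat(density)),
                 color = "#000000", fill = "#0099F8") +
  # Semi-transparent density curve on top (the orange is an assumed color)
  geom_density(color = "#000000", fill = "#F85700", alpha = 0.6)
p
```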

Figure 7 - Adding a density plot to a histogram

This gives a richer view of the data than the histogram alone. For example, if you want to embed the chart above in a dashboard, you can let users toggle the density layer for maximum customization.

Want to create professional dashboards? Here's how to start your career as an R Shiny developer.

Annotations

Finally, let's look at how to annotate a ggplot histogram. Perhaps vertical lines seem too obtrusive and you need a plain text representation of a specific value.

First, you need to create a data.frame for the annotations. It should contain the X and Y values and the labels that will be displayed:

Now you can include them in a geom_text() layer. Tip: make the annotations bold so they're easier to spot:
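Both steps can be sketched together as follows. The choice of statistics (minimum, mean, maximum), the y positions and the text size are hypothetical values picked for illustration; the gapminder package and the gm_eu name are assumptions as well.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: data source

gm_eu <- gapminder %>%
  filter(continent == "Europe")

# One row per annotation: where to draw it (x, y) and what to show (label).
# The y values are hypothetical and may need adjusting to avoid overlaps.
annotations <- data.frame(
  x = round(c(min(gm_eu$lifeExp), mean(gm_eu$lifeExp), max(gm_eu$lifeExp)), 2),
  y = c(4, 52, 10),
  label = c("Min:", "Mean:", "Max:")
)

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  # The layer gets its own data and aesthetics; bold text is easier to spot
  geom_text(
    data = annotations,
    aes(x = x, y = y, label = paste(label, x)),
    size = 5, fontface = "bold"
  )
p
```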

Figure 8 - Adding annotations to the histogram

The trick with annotations is to leave enough space between them so the text doesn't overlap.

Let's see how to remove the gray background color. The easiest way to do this is to add a more minimalist theme to your chart. theme_classic() is one of our top picks:
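A brief sketch of switching themes, under the same assumptions as before (gapminder package, gm_eu variable, assumed fill color):

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: data source

gm_eu <- gapminder %>%
  filter(continent == "Europe")

# theme_classic() drops the gray panel and grid lines and keeps the axis lines
p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  theme_classic()
p
```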

Figure 9 - Changing the theme

The only thing missing from our ggplot histogram is a title and axis labels. Without them, users don't know what they're looking at.

Add text, title, subtitle, caption and axis labels to your ggplot histogram

A title and axis labels are required for a production-ready chart. Subtitles and captions are optional, but we'll show you how to add them too. The magic happens in the labs() layer. You can use it to specify the title, subtitle, caption, and the x-axis and y-axis labels:
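A sketch of a labs() layer is shown here. All the text strings are placeholders invented for this example, and the data source and gm_eu name remain assumptions.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: data source

gm_eu <- gapminder %>%
  filter(continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  # Placeholder texts; replace them with your own wording
  labs(
    title = "Life expectancy in European countries",
    subtitle = "Gapminder data, 1952 to 2007",
    caption = "Source: Gapminder dataset",
    x = "Life expectancy",
    y = "Count"
  ) +
  theme_classic()
p
```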

Figure 10 - Adding titles, subtitles, captions and axis labels

This is a good start, but the new elements don't stand out. You can change the font, color, size and more in the theme() layer. Just remember to add a complete theme layer, e.g. theme_classic(), before your own theme() styles. Otherwise, they will be overwritten:
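The ordering matters because a complete theme such as theme_classic() replaces every theme setting added before it. A sketch with theme_classic() first and custom theme() tweaks second; the specific colors, sizes and label texts are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: data source

gm_eu <- gapminder %>%
  filter(continent == "Europe")

p <- ggplot(gm_eu, aes(x = lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  labs(
    title = "Life expectancy in European countries",  # placeholder text
    subtitle = "Gapminder data, 1952 to 2007",        # placeholder text
    caption = "Source: Gapminder dataset",            # placeholder text
    x = "Life expectancy",
    y = "Count"
  ) +
  # Complete theme first ...
  theme_classic() +
  # ... then the custom tweaks, so they are not overwritten
  theme(
    plot.title = element_text(color = "#0099F8", size = 16, face = "bold"),
    plot.subtitle = element_text(size = 10, face = "bold"),
    plot.caption = element_text(face = "italic")
  )
p
```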

Figure 11 - Styling the title, subtitle and caption

Now it's starting to take shape. It also matches the color palette of our ggplot histogram. We've covered everything you need to get started visualizing data distributions using histograms, so we'll stop there. But you can do more with visuals. Check out some of our demos to see where advanced R development can take your data visualizations.

Did you know there's another way to visualize data distributions? Read our Complete Guide to Boxplots.

To summarize

Today you learned what histograms are, why they are important in visualizing the distribution of continuous data, and how to create them with R and the ggplot2 library. That's enough to get you on the right track, and now it's up to you to apply that knowledge to your own dataset. We're sure you can handle it.

At Appsilon, we've used histograms and ggplot2 in developing enterprise R Shiny dashboards for Fortune 500 companies. If you have experience with R and R Shiny, we may already have a position for you.

Start your career at Appsilon - available positions.

Article information

Author: Edwin Metz

Last Updated: 19/06/2023