How to make a histogram in R using ggplot2 (2023)

In this tutorial I will show you how to create a histogram in R using ggplot2.

This tutorial explains the ggplot histogram syntax and shows step-by-step examples of creating a histogram with ggplot2.

If you need something specific, click on any of the links below.

Contents:

  • Introduction to histograms
  • Syntax
  • Examples

As always, you will learn more if you thoroughly read the blog post from cover to cover.

A brief introduction to histograms

Let's take a quick look at what histograms are and how they are structured.

If you only want the syntax or the examples, you can skip ahead to the syntax section or the examples section.

Histograms show how data is distributed

Histograms are very important for data visualization, data mining and data analysis.

In fact, the histogram is probably one of the three or four most important visualization techniques.

They are important because they help us visualize and study the distribution of data.

Specifically, a histogram shows the number of records that fall within each range of values of a variable.

The structure of the histogram

Their structure is as follows.

Typically, we map a numeric variable to the x-axis. This is the variable we want to visualize, so that we can see how it is distributed.


This numeric variable is then divided into ranges, often called "bins".

From there, we count the number of records in each bin and plot those counts as bars. Each range of the variable therefore has a bar associated with it, and the height of each bar represents the number of records in that bin.
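To make the binning step concrete, here's a minimal sketch that does the counting by hand in base R with cut() and table(). The ten values and the choice of five equal-width bins are made-up assumptions for illustration only:

```r
# A small, made-up numeric variable (hypothetical data for illustration)
values <- c(2.1, 3.5, 3.9, 4.2, 5.0, 5.5, 5.8, 6.1, 7.4, 9.9)

# Divide the range of the variable into 5 equal-width bins
bins <- cut(values, breaks = 5)

# Count how many records fall into each bin
counts <- table(bins)
print(counts)

# Draw one bar per bin (no gaps between bars), with height equal to the count
barplot(counts, space = 0, xlab = "value range (bin)", ylab = "count")
```

cut() assigns each value to a range and table() counts the records per range, which is essentially the calculation a histogram function performs for you before drawing the bars.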

When we plot all of those bars together (again, one for each bin), we get a histogram. Together, the collection of bars shows us the shape of the data and helps us understand how the data is distributed.

But, of course, we don't do it manually. As data scientists, we use a programming language like R to do all the calculations for us and plot the results.

Let's quickly discuss how to create a histogram in R.

How to make a histogram in R

There are actually many ways to create a histogram in R.

You can create "old fashioned" histograms in R using "base R". Specifically, you can create a histogram with the hist() function.

This is the old way of doing things, and I don't recommend it at all.

R's old plotting functions are poorly designed. They are difficult to use. They are difficult to modify. And the graphs they create are relatively ugly.
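For comparison, here's a rough sketch of the base R approach. It assumes ggplot2 is installed, but only to borrow its txhousing dataset (the same one used in the examples later on). With plot = FALSE, hist() skips the drawing and returns the computed bins and counts instead:

```r
# Borrow the txhousing dataset that ships with ggplot2
library(ggplot2)

# Base R: draw a histogram of the median sale price directly
hist(txhousing$median)

# With plot = FALSE, hist() only computes the bins and their counts
h <- hist(txhousing$median, plot = FALSE)
head(h$breaks)  # bin boundaries
head(h$counts)  # number of records in each bin
```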

To create a histogram in R, use ggplot2

If you want to create histograms in R, I strongly recommend using ggplot2 instead.

ggplot2 is a powerful plotting library that gives you precise control over the appearance and layout of your plots.

The syntax is easier to modify, and the default plots look pretty good.

With that in mind, I'll show you how to create a ggplot histogram.

ggplot histogram syntax

Now let's look at the syntax for creating a histogram with ggplot2.

I'll try to explain everything in detail, but if you're new to ggplot2, you can check out our ggplot2 tutorial for beginners.

ggplot(data = <dataframe>, aes(x = <numeric variable>)) + geom_histogram()

Let me quickly explain this syntax.

ggplot function

The ggplot() function simply initializes a plot using the ggplot2 data visualization system.

You call it every time you create a visualization with ggplot2. The rest of the code, however, varies from one visualization to the next.

data parameter

Inside the ggplot() function, you'll find the data parameter.

The data parameter allows you to specify the data frame that contains the variables you want to plot.

Note that ggplot2 is designed to visualize data in data frames, so you must provide the name of a data frame as the argument to this parameter.

For example, if your data is in the txhousing data frame, you would set data = txhousing.

aes function

Also inside the ggplot() function, you'll find the aes() function.

The aes() function allows you to "map" variables to the aesthetic properties of a visualization. This may sound complicated, but it really boils down to connecting the variables in your data frame to the axes and other properties of the chart.

If you want to understand how the aes() function works, read the aes() explanation in our ggplot2 tutorial.

x parameter

Inside the aes() function, you'll see the x parameter.

The x parameter allows you to specify the numeric variable to map to the x-axis. This is the variable whose distribution the histogram will display.

For example, if you want to plot a variable named median, you would set x = median.

Histogram "geom"

Finally, we have geom_histogram().

This tells ggplot2 that we want to plot a histogram.

Remember: when we use ggplot2, we specify the data frame and the variable mappings with the data parameter, the aes() function, and so on.

But to define the type of plot (a histogram, scatterplot, bar chart, etc.), we need to specify a "geom".

The geom ultimately determines what kind of chart we create.

To create a histogram, we use geom_histogram().
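To see that the geom is what turns data plus mappings into a histogram, you can build the plot in two steps and peek at the layer that geom_histogram() adds. This is a sketch that pokes at ggplot2 internals ($layers, $stat, $geom), so treat those fields as implementation details rather than a stable API:

```r
library(ggplot2)

# Data and aesthetic mappings only: no geom yet, so there is nothing to draw
base_plot <- ggplot(data = txhousing, aes(x = median))

# geom_histogram() adds one layer that bins x (the stat) and draws bars (the geom)
hist_plot <- base_plot + geom_histogram()

# Internal fields, inspected only for illustration
length(base_plot$layers)           # no layers yet
class(hist_plot$layers[[1]]$stat)  # the binning step
class(hist_plot$layers[[1]]$geom)  # the bar-drawing step
```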

Additional parameters

There are also some optional parameters that can be used to control the exact behavior of the histogram.

geom_histogram(color = <border color>, fill = <fill color>, bins = <number of bins>)

Let's look at them one by one.

color

The color parameter controls the border color of the histogram bins.

Be careful.

Many people think that it controls the interior color, but that's wrong. It controls the border color. (I'll show this in the examples section.)

Remember: R has many available colors. You can choose simple colors such as red, green, and blue, but there are many more interesting colors as well, like turquoise4. Have fun and find something you like!
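If you want to browse the available colors, base R's colors() function (from grDevices, which R loads by default) returns all of the built-in color names. A quick sketch:

```r
# All color names that R recognizes out of the box
all_colors <- colors()
length(all_colors)

# Find every built-in name that contains "turquoise"
grep("turquoise", all_colors, value = TRUE)

# Check whether a specific name is available
"turquoise4" %in% all_colors
```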

Also, if you provide an argument for this parameter, it must be represented as a string. For example, you can setcolor = "red".

fill

The fill parameter controls the interior color of the histogram bins.

Again, be careful: the fill parameter controls the interior color, and the color parameter controls the border color.

If you provide an argument to this parameter, it must be a string. For example, you can set fill = "red".

Also remember: R has many available colors. You can choose simple colors such as red, green, and blue, but there are many more interesting colors.

bins

The bins parameter controls the number of bins plotted in the histogram.

By default, ggplot2 uses 30 bins.

However, you can increase or decrease the number of bins as needed.

Controlling the number of bins is one way to change how you examine a variable. In general, reducing the number of bins smooths out variation in the data, while increasing the number of bins reveals more detail.

Which you choose (more detail or more "smoothness") depends on what you're looking for!
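To check what the bins parameter changes, you can ask ggplot2 for the computed bin table with ggplot_build(). This sketch compares 10 and 30 bins on the txhousing data used in the examples below; the helper name bin_table() is hypothetical, just for illustration:

```r
library(ggplot2)

# Hypothetical helper: build a histogram and return its computed bin data
bin_table <- function(n_bins) {
  p <- ggplot(data = txhousing, aes(x = median)) +
    geom_histogram(bins = n_bins)
  # Computed data for the first (and only) layer: one row per bin
  ggplot_build(p)$data[[1]]
}

coarse <- bin_table(10)
fine   <- bin_table(30)

nrow(coarse)  # number of bins in the coarse version
nrow(fine)    # number of bins in the fine version

# Total records counted across all bins in each version
sum(coarse$count)
sum(fine$count)
```

Both versions count the same non-missing records; only the grouping changes. ggplot2 will also warn that rows with missing median values were removed.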

Example: plotting a histogram in R using ggplot2

OK. Now that we understand the syntax, let's look at some examples of creating a histogram in R using ggplot2.

Examples:

  • Create a simple ggplot histogram
  • Change the border color
  • Change the fill color of the bins
  • Change the number of histogram bins

First, run this code

Before we start, let's load the tidyverse package. Remember that the tidyverse package includes ggplot2.

We'll also inspect txhousing, which is the dataset we'll use.

Load the tidyverse

You can load the tidyverse package with the following code:

#----------------
# Load package
#----------------
library(tidyverse)

Inspect the data

Next, let's take a quick look at our dataset.

In the examples below, we'll use the txhousing dataset, which contains housing data for various cities in Texas over several years.

We can inspect this data frame with the glimpse() function:

txhousing %>% glimpse()

Output:

# Observations: 8602
# Variables: 9
# $ city      "Abilene", "Abilene", "Abilene", "Abilene", "Abilene", "Abilene", "Abilene...
# $ year      2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000,...
# $ month     1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1...
# $ sales     72, 98, 130, 98, 141, 156, 152, 131, 104, 101, 100, 92, 75, 112, 118, 1...
# $ volume    5380000, 6505000, 9285000, 9730000, 10590000, 13910000, 12635000, 10710...
# $ median    71400, 58700, 58100, 68600, 67300, 66900, 73500, 75000, 64500, 59300, 7...
# $ listings  701, 746, 784, 785, 794, 780, 742, 765, 771, 764, 721, 658, 779, 700, 7...
# $ inventory 6.3, 6.6, 6.8, 6.9, 6.8, 6.6, 6.2, 6.4, 6.5, 6.6, 6.2, 5.7, 6.8, 6.0, 6...
# $ date      2000.000, 2000.083, 2000.167, 2000.250, 2000.333, 2000.417, 2000.500, 2...

Example 1: Create a simple ggplot histogram

Let's start with a very simple histogram.

Here, we'll plot a histogram of the median variable.

ggplot(data = txhousing, aes(x = median)) + geom_histogram()

Output:

[Figure: histogram of median with the default 30 bins]

Explanation

It's very simple, but you need to understand it because it forms the basis for other examples.

Here, we initialize the plot by calling ggplot().

Inside the ggplot() function, we set data = txhousing. This indicates that we'll plot data from the txhousing data frame.

Next, we have the aes() function. It lets us specify which variables are mapped to which axes and other "aesthetics" of the plot. Here, we set x = median, which means we want to plot median on the x-axis.

Finally, after the + sign, we have geom_histogram(). This tells ggplot2 that we want to plot the variable as a histogram.
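A common way to write the Example 1 code is one component per line, with the + at the end of a line so R knows the expression continues. Storing the plot in a variable (median_hist is an arbitrary name) lets you print or modify it later. A minimal sketch:

```r
library(ggplot2)

# One component per line; the trailing + tells R the expression continues
median_hist <- ggplot(data = txhousing, aes(x = median)) +
  geom_histogram()

# Printing the stored plot object draws the histogram
print(median_hist)
```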

Example 2: Changing the border color

Now that we've created the simple histogram in Example 1, let's make some changes.

Here, we'll change the border color of the bins.

ggplot(data = txhousing, aes(x = median)) + geom_histogram(color = 'turquoise4')

Output:

[Figure: histogram of median with turquoise4 bin borders]

Explanation

It's quite simple.

The code is almost the same as in Example 1.

The only difference is that we set color = 'turquoise4' inside geom_histogram(). This changed the border color of the bins to turquoise.

Example 3: Changing the fill color of the bins

Next, we'll change the color of the bins themselves, that is, their interior color.

To do this, we'll use the fill parameter.

Let's take a look:

ggplot(data = txhousing, aes(x = median)) + geom_histogram(fill = 'red')

Output:

[Figure: histogram of median with red bins]

Explanation

Everything here is almost identical to the simple ggplot histogram in Example 1.

The only real difference is that we set fill = "red". As you can see, this changed the interior color of the bins to red.

Notice that there's no visible border between the bins. That's probably fine, but you can also change the border color with the color parameter, as shown in Example 2.
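As a sketch of combining the fill and color parameters, you can set both in the same geom_histogram() call. The white borders here are just one possible choice:

```r
library(ggplot2)

# Red interior (fill) plus white borders (color) so neighboring bins stand apart
red_hist <- ggplot(data = txhousing, aes(x = median)) +
  geom_histogram(fill = "red", color = "white")

print(red_hist)
```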

Example 4: Changing the number of histogram bins

Finally, let's modify the number of histogram bins.

By default, ggplot2 produces a histogram with 30 bins. This is usually fine, but sometimes you'll want to increase or decrease the number of bins.

To do that, we can use the bins parameter. Here, we'll reduce the number of bins to 10:

ggplot(data = txhousing, aes(x = median)) + geom_histogram(bins = 10)

Output:

[Figure: histogram of median with 10 bins]

Explanation

It's very easy.

Here, we created a histogram with 10 bins by setting bins = 10.

As you can see, reducing the number of bins smooths out some of the variation in the data.

You can also try increasing the number of bins if needed. Try setting it to 60 or 70 and see what happens.

Remember that choosing the right number of bins is more of an art than a science. It really depends on your goals and what you're looking for in the data.

This is a good reminder that knowing the syntax isn't enough. You also need to know how to use data visualization well!

Leave other questions in the comments below

Have questions about ggplot histograms? Want to know how to do something else that I haven't explained here?

If so, leave your questions in the comments section below.

Sign up to learn more about data analysis in R

This tutorial should give you a good idea of how to create histograms in R using ggplot2.

But there is still much to learn.

If you want to master data visualization in R, you need to learn a lot more about ggplot2.

If you want to learn more about data analysis, you need to know dplyr, tidyr, forcats, and more.

However, if you really want to master data analysis and data visualization in R, I highly recommend signing up to our mailing list. At Sharp Sight, we regularly publish tutorials that explain how to do data analysis using R and Python.

Article information

Author: Prof. Nancy Dach

Last Updated: 09/10/2023

