In this tutorial I will show you how to create a histogram in R using ggplot2.
It explains the ggplot histogram syntax and walks through step-by-step examples of creating a histogram with ggplot2.
If you need something specific, click on any of the links below.
- Introduction to histograms
As always, though, you will learn more if you read the whole post from start to finish.
A brief introduction to histograms
Let's take a quick look at what histograms are and how they are structured.
If you want to skip ahead to the syntax or the examples, jump to the syntax section or the examples section.
Data distribution in a histogram diagram
Histograms are very important for data visualization, data mining and data analysis.
In fact, the histogram is probably one of the 3 or 4 most important visualization techniques.
They are important because they help us visualize and study the distribution of data.
In particular, a histogram shows us the number of records that fall within each range of values of a variable.
The structure of the histogram
Their structure is as follows.
Typically, we map a numeric variable to the x-axis. This is the variable we want to visualize, so that we can see how it is distributed.
This numeric variable is then divided into ranges, often called "bins".
From there, we count the number of records in each bin and plot those counts as bars. So each range of the variable gets its own bar, and the height of each bar represents the number of records in that bin.
When we plot all of those bars together (again, one for each range), we get a histogram. Together, the collection of bars in a histogram shows us the shape of the data. It helps us understand how the data is distributed.
But, of course, we don't do it manually. As data scientists, we use a programming language like R to do all the calculations for us and plot the results.
Let's quickly discuss how to create a histogram in R.
How to make a histogram in R
There are actually several ways to create a histogram in R.
You can create "old fashioned" histograms in R using "base R". Specifically, you can create a histogram in base R with the hist() function.
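For comparison, here is a minimal base R sketch of that approach. It assumes the txhousing data frame used later in this post. That dataset ships with ggplot2, which is loaded here only to make the data available.

```r
library(ggplot2)  # loaded only for the txhousing dataset

# Base R histogram; hist() drops missing (non-finite) values on its own
h <- hist(txhousing$median,
          main = "Median sale price (base R)",
          xlab = "median")

# hist() also returns the breaks and counts it computed
head(h$counts)
```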
This is the old way of doing things, and I don't recommend it.
R's old plotting functions are poorly designed. They are difficult to use. They are difficult to modify. And the graphs they create are relatively ugly.
To create a histogram in R, use ggplot2
If you want to create histograms in R, I strongly recommend using ggplot2 instead.
ggplot2 is a powerful plotting library that gives you precise control over the appearance and layout of your plots.
The syntax is easier to modify, and the default plots look pretty good.
With that in mind, I'll show you how to create a ggplot histogram.
ggplot histogram syntax
Now let's look at the syntax for creating a histogram with ggplot2.
I will try to explain everything in detail, but if you are new to ggplot2, you can check out our ggplot2 tutorial for beginners.
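In its basic form, the syntax combines three pieces: ggplot() with a data frame, aes() with an x mapping, and geom_histogram(). This minimal sketch fills them in with the txhousing data frame and its median variable, both of which the examples later in this post also use:

```r
library(ggplot2)

# ggplot() starts the plot and takes the data frame,
# aes() maps a numeric variable to the x-axis,
# and geom_histogram() tells ggplot2 to draw it as a histogram
p <- ggplot(data = txhousing, aes(x = median)) +
  geom_histogram()

print(p)
```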
Let me quickly explain this syntax.
The ggplot() function simply initializes a plot using the ggplot2 data visualization system.
You use it every time you create a visualization with ggplot2. However, the exact details of everything else vary from plot to plot.
Inside the ggplot() function, you'll find the data parameter.
The data parameter lets you specify the data frame that contains the variables you want to plot.
Note that ggplot2 is designed to visualize data stored in data frames, so you must provide the name of a data frame as the argument to this parameter.
For example, if your data is in the txhousing data frame, you would set data = txhousing.
Also inside the ggplot() function, you'll find the aes() function.
The aes() function lets you "map" variables to aesthetic properties of the plot. This may sound complicated, but it really boils down to connecting the variables in the data frame to the axes and other properties of the chart.
If you want to learn more about how the aes() function works, you should read the explanation of aes() in our ggplot2 tutorial.
Inside the aes() function, you'll see the x parameter.
The x parameter lets us specify the numeric variable to map to the x-axis. This is the variable that will be displayed as a histogram.
For example, if your data frame contains a variable named median, you would set x = median.
Finally, we have geom_histogram().
This tells ggplot2 that we want to plot a histogram.
Remember: when we use ggplot2, we specify the data frame and the variable mappings. But to define the type of plot (a histogram, scatter plot, bar chart, and so on), we have to specify a "geom".
The geom ultimately determines what kind of chart we create.
To create a histogram, we use geom_histogram().
There are also some optional parameters that can be used to control the exact behavior of the histogram.
Let's look at them one by one.
The color parameter controls the border color of the histogram bins.
Many people think that it controls the interior color, but that's wrong: it controls the border color. (I'll show you an example in the examples section.)
Remember: R has many available colors. You can choose simple colors like "blue", but there are many more interesting colors, such as "navy", and plenty of others. Have fun and find something you like!
Also, if you provide an argument to this parameter, it must be a string. For example, you can set color = "red".
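To browse the options, use base R's colors() function, which returns the full list of named colors. Here is a quick sketch. The specific names it checks are just examples.

```r
# colors() returns a character vector of every color name R recognizes
all_colors <- colors()
length(all_colors)

# Check that a few candidate names are valid before using them in a plot
c("blue", "navy", "turquoise4") %in% all_colors

# Search the list, e.g. for every shade of turquoise
grep("turquoise", all_colors, value = TRUE)
```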
The fill parameter controls the interior color of the histogram bins.
Again, be careful: the fill parameter controls the interior color, and the color parameter controls the border color.
If you provide an argument to this parameter, it must be a string. For example, you can set fill = "red".
Also remember: R has many available colors. You can choose simple colors like "blue", but there are many more interesting colors.
The bins parameter controls the number of bins plotted in the histogram.
By default, it is set to bins = 30.
However, you can increase or decrease the number of bins as needed.
Changing the number of bins changes how much detail the histogram shows. In general, reducing the number of bins smooths out variation in the data, while increasing the number of bins reveals more detail.
Which you choose (more detail or more "smoothness") depends on what you're looking for!
Example: plotting a histogram in R using ggplot2
OK. Now that we understand the syntax, let's look at some examples of creating a histogram in R using ggplot2.
- Create a simple ggplot histogram
- Change the border color
- Change the fill color
- Change the number of histogram bins
First run this code
Before we start, let's load the tidyverse package. Remember that the tidyverse includes ggplot2, and ggplot2 also contains txhousing, which is the dataset we will use.
You can load the tidyverse package with the following code:
#----------------
# LOAD PACKAGES
#----------------
library(tidyverse)
Inspect the data
Next, let's take a quick look at our dataset.
In the examples below, we will use the txhousing dataset, which contains housing data for various Texas cities over a range of years.
We can inspect this data frame with the glimpse() function:
txhousing %>% glimpse()
# Observations: 8602
# Variables: 9
# $ city      "Abilene", "Abilene", "Abilene", "Abilene", "Abilene", "Abilene", "Abilene...
# $ year      2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000,...
# $ month     1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1...
# $ sales     72, 98, 130, 98, 141, 156, 152, 131, 104, 101, 100, 92, 75, 112, 118, 1...
# $ volume    5380000, 6505000, 9285000, 9730000, 10590000, 13910000, 12635000, 10710...
# $ median    71400, 58700, 58100, 68600, 67300, 66900, 73500, 75000, 64500, 59300, 7...
# $ listings  701, 746, 784, 785, 794, 780, 742, 765, 771, 764, 721, 658, 779, 700, 7...
# $ inventory 6.3, 6.6, 6.8, 6.9, 6.8, 6.6, 6.2, 6.4, 6.5, 6.6, 6.2, 5.7, 6.8, 6.0, 6...
# $ date      2000.000, 2000.083, 2000.167, 2000.250, 2000.333, 2000.417, 2000.500, 2...
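If you prefer base R, a few quick checks give a similar overview. This sketch loads ggplot2 directly, since txhousing ships with it. The dimensions should match the glimpse() output above. The median column also contains missing values, which is why ggplot2 warns about removed rows when you plot that variable.

```r
library(ggplot2)

# Number of rows and columns (should match the glimpse() output)
dim(txhousing)

# Column names
names(txhousing)

# The median column has some missing values; ggplot2 drops them
# (with a warning) when it draws the histogram
sum(is.na(txhousing$median))
summary(txhousing$median)
```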
Example 1: Create a simple ggplot histogram
Let's start with a very simple histogram.
Here, we will draw a histogram of the median variable.
ggplot(data = txhousing, aes(x = median)) +
  geom_histogram()
It's very simple, but you need to understand it because it forms the basis for other examples.
Here, we initialize the plot by calling ggplot() with data = txhousing. This indicates that we will plot data from the txhousing data frame.
Next, we have the aes() function. It lets us specify which variables map to which axes and other "aesthetics" of the plot. Here, we set x = median, which means we want to plot the median variable on the x-axis.
Finally, on the second line, we have geom_histogram(). This means that we want to plot the variable as a histogram.
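If you want to see the numbers behind the bars, one option is to save the plot to an object. You can then pass it to ggplot2's layer_data() function, which returns the computed bins. Here is a sketch:

```r
library(ggplot2)

# Same plot as Example 1, saved to an object
p <- ggplot(data = txhousing, aes(x = median)) +
  geom_histogram()

# One row per bin: bin edges (xmin, xmax) and the number of records (count)
bin_table <- layer_data(p)
nrow(bin_table)  # number of bins (30 by default)
head(bin_table[, c("xmin", "xmax", "count")])
```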
Example 2: Changing the border color
Now that we've created the simple histogram in Example 1, let's make some changes.
Here, we will change the border color of the bins.
ggplot(data = txhousing, aes(x = median)) + geom_histogram(color = 'turquoise4')
It's quite simple.
The code is almost the same as in Example 1.
The only difference is that we set color = 'turquoise4' inside geom_histogram(). This changes the border color of the bins to turquoise.
Example 3: Changing the fill color
Next, we will change the color of the bins themselves, that is, their interior color.
To do that, we'll use the fill parameter.
ggplot(data = txhousing, aes(x = median)) + geom_histogram(fill = 'red')
Everything here is almost identical to our simple ggplot histogram in Example 1.
The only major difference is that we set fill = 'red'. This changes the interior color of the bins to red.
Note that there is no visible border between the bins. That's probably fine, but you can also change the border color. To do that, use the color parameter, as shown in Example 2.
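Because fill and color control different parts of each bar, you can set both at once. Here is a sketch that combines them. The border color "white" is an arbitrary choice.

```r
library(ggplot2)

# fill sets the interior of each bin; color sets its border
p <- ggplot(data = txhousing, aes(x = median)) +
  geom_histogram(fill = "red", color = "white")

print(p)

# The built layer records both settings for every bin
unique(layer_data(p)[, c("fill", "colour")])
```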
Example 4: Changing the number of histogram bins
Finally, let's modify the number of histogram bins.
By default, ggplot2 produces a histogram with 30 bins. This is usually fine, but sometimes you'll want to increase or decrease the number of bins.
For that, we can use the bins parameter. Here, we reduce the number of bins to 10:
ggplot(data = txhousing, aes(x = median)) + geom_histogram(bins = 10)
It's very easy.
Here, we created a histogram with 10 bins by setting bins = 10.
As you can see, by reducing the number of bins, we smooth out some of the variance in the data.
You can also try increasing the number of bins if needed. Try setting it to 60 or 70 and see what happens.
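To compare bin counts side by side, this short sketch builds the same histogram with the values mentioned above. It then confirms how many bins each version has:

```r
library(ggplot2)

# Bin counts to compare: the 10 used above, plus the suggested 60 and 70
bin_choices <- c(10, 60, 70)

plots <- lapply(bin_choices, function(b) {
  ggplot(data = txhousing, aes(x = median)) +
    geom_histogram(bins = b)
})

# layer_data() returns one row per bin, so nrow() gives the bin count
sapply(plots, function(p) nrow(layer_data(p)))

# Print each version to compare how smooth or detailed it looks
for (p in plots) print(p)
```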
Remember that choosing the right number of bins is more of an art than a science. It really depends on your goals and what you're looking for in the data.
It's a good reminder that knowing the syntax isn't enough. You also have to know how to use data visualization well!
Leave other questions in the comments below
Have questions about ggplot histograms? Want to know how to do something else that I haven't explained here?
If so, leave your questions in the comments section below.
Sign up to learn more about data analysis in R
This tutorial should give you a good idea of how to create histograms in R using ggplot2.
But there is still much to learn.
If you want to master data visualization in R, you need to learn a lot more about ggplot2.
If you want to learn more about data analysis, you need to know dplyr, tidyr, forcats, etc.
However, if you really want to master data analysis and data visualization in R, I highly recommend signing up to our mailing list. At Sharp Sight, we regularly publish tutorials that explain how to do data analysis using R and Python.