# How to make a histogram in R using ggplot2 (2023)

In this tutorial I will show you how to create a histogram in R using ggplot2.

This post explains the ggplot2 histogram syntax and then walks through step-by-step examples of creating a histogram in ggplot2.

If you need something specific, click on any of the links below.

Contents:

• Introduction to histograms
• Syntax
• Examples

As always, you will learn more if you thoroughly read the blog post from cover to cover.

## A brief introduction to histograms

Let's take a quick look at what histograms are and how they are structured.

If you just want to understand the syntax or see examples, you can skip ahead to the syntax section or the examples section.

### Histograms plot data distributions

Histograms are very important for data visualization, data mining and data analysis.

In fact, this is probably one of the 3 or 4 most important visualization techniques.

They are important because they help us visualize and study the distribution of data.

In particular, a histogram shows us the number of records that fall within each range of a variable's values.

#### The structure of the histogram

Histograms are structured as follows.

Typically, we map a numeric variable to the x-axis. This is the variable we want to visualize so we can see how it is distributed. This numeric variable is then divided into ranges, often called "bins".

From there, we count the number of records in each bin and plot those counts as bars. So each range of the variable has a bar associated with it, and the length of each bar represents the number of records in that bin.

When we plot all those bars together (again, one for each bin), we get a histogram. Together, the collection of bars shows us the shape of the data and helps us understand how the data is distributed.

But, of course, we don't do it manually. As data scientists, we use a programming language like R to do all the calculations for us and plot the results.
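To make the binning step concrete, here is a minimal sketch of what a histogram computes under the hood, using only base R. The `values` vector and the break points are made-up toy data chosen for illustration, not something taken from a real dataset.

```r
# Hypothetical toy data (not from any real dataset)
values <- c(2, 3, 5, 7, 8, 11, 12, 13, 14, 18)

# Divide the variable into 4 equal-width bins covering 0 to 20
breaks <- seq(0, 20, by = 5)
bins <- cut(values, breaks = breaks)

# Count the records in each bin; these counts are the bar heights
bin_counts <- table(bins)
print(bin_counts)
# counts per bin: 3, 2, 4, 1
```

Each count is the length of one bar. A histogram function does this same bookkeeping and then draws the result.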

Let's quickly discuss how to create a histogram in R.

### How to make a histogram in R

There are actually several ways to create a histogram in R.

You can create an "old-fashioned" histogram using base R. Specifically, you can create a histogram with the `hist()` function.

This is the old way of doing things, and I don't recommend it at all.

R's old plotting functions are poorly designed. They are difficult to use. They are difficult to modify. And the graphs they create are relatively ugly.

### To create a histogram in R, use ggplot2

If you want to create histograms in R, I strongly recommend using ggplot2 instead.

ggplot2 is a powerful plotting library that gives you precise control over the appearance and layout of your plots.

The syntax is easier to modify, and the default plots look pretty good.

With that in mind, I'll show you how to create a ggplot histogram.

## ggplot histogram syntax

Now let's look at the syntax for creating a histogram with ggplot2.

I will try to explain everything in detail, but if you are new to ggplot2, you may want to check out our ggplot2 tutorial for beginners. Let me quickly explain the syntax.
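As a rough sketch of the overall shape of a ggplot2 histogram call, the pieces fit together roughly like this. The names `my_data` and `my_variable` are hypothetical placeholders invented for illustration; in practice you would substitute your own data frame and column.

```r
library(ggplot2)

# Hypothetical data frame and column, used only to show the shape of the call
my_data <- data.frame(my_variable = c(1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 5.0))

p <- ggplot(data = my_data, aes(x = my_variable)) +  # data frame + variable mapping
  geom_histogram(bins = 5)                           # draw the mapping as a histogram

print(p)
```

The sections that follow explain each of these pieces in turn.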

### ggplot function

The `ggplot()` function simply initiates plotting with the ggplot2 data visualization system.

You call it whenever you create a visualization with ggplot2. The details of everything else vary from one visualization to the next.

### data parameter

Inside the `ggplot()` function, you'll find the `data` parameter.

The `data` parameter lets you specify the data frame that contains the variable you want to plot.

Note that ggplot2 is designed to visualize data in data frames, so you must provide the name of a data frame as the argument to this parameter.

For example, if you have a data frame named `txhousing`, you would set `data = txhousing`.

### aes function

Also inside the `ggplot()` function, you'll find the `aes()` function.

The `aes()` function lets you "map" variables to the aesthetic properties of the visualization. This may sound complicated, but it really boils down to associating the variables in the data frame with the axes and other properties of the chart.

If you want to see how the `aes()` function works, you should read the aes() explanation in our ggplot2 tutorial.

### x parameter

Inside the `aes()` function, you'll see the `x` parameter.

The `x` parameter lets us specify the numeric variable to map to the x-axis. This is the numeric variable that will be plotted as a histogram.

For example, if your data frame has a variable named `median`, you would set `x = median`.
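Before mapping a variable to `x`, it can help to confirm that it really is numeric and to see its range. A small sketch, assuming the `txhousing` data frame that ships with ggplot2:

```r
library(ggplot2)  # provides the txhousing data frame

# The variable mapped to x should be numeric
is.numeric(txhousing$median)

# A quick numeric summary, including the number of missing values
summary(txhousing$median)
```

If the summary reports missing values, expect ggplot2 to warn that it dropped those rows when it draws the histogram.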

### Histogram "geom"

Finally, we have `geom_histogram()`.

This tells ggplot2 that we want to plot a histogram.

Remember: when we use ggplot2, we specify the data frame and the variable mapping with the `data` parameter, the `aes()` function, and so on.

But to define the type of plot (a histogram, a scatter plot, a bar chart, etc.), we have to specify a "geom".

The geom ultimately determines what kind of chart we create.

To create a histogram, we use `geom_histogram()`.
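One way to see that the geom alone decides the chart type is to keep the data and mapping fixed and swap only the geom. The sketch below uses `geom_freqpoly()` purely as a contrast; it is not part of this tutorial's examples, and the variable names are my own.

```r
library(ggplot2)

# Shared base: data frame and variable mapping, no geom yet
base_plot <- ggplot(data = txhousing, aes(x = median))

# Same data and mapping, two different geoms
histogram_plot <- base_plot + geom_histogram(bins = 30)
freqpoly_plot  <- base_plot + geom_freqpoly(bins = 30)

print(histogram_plot)
print(freqpoly_plot)
```

Both plots bin the same variable. The first draws the counts as bars and the second draws them as a connected line.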

There are also some optional parameters that can be used to control the exact behavior of the histogram. Let's look at them one by one.

#### color

The `color` parameter controls the border color of the histogram bars.

Be careful.

Many people think that it affects the interior color, but that's not the case. It controls the border color. (I'll show this in the examples section.)

Remember: R has many available colors. You can choose simple colors such as `red`, `green`, and `blue`, but there are many more interesting colors, such as `navy`. Have fun and find something you like!

Also, if you provide an argument for this parameter, it must be a string. For example, you can set `color = "red"`.
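If you're not sure whether a color name is valid, base R's `colors()` function returns the full list of built-in names. A short sketch for browsing and checking them (the `all_colors` variable name is my own):

```r
# List the color names that base R recognizes
all_colors <- colors()
length(all_colors)

# Check whether specific names are valid before using them
c("red", "navy", "turquoise4") %in% all_colors

# Search for every named shade of turquoise
grep("turquoise", all_colors, value = TRUE)
```

Any name in that list can be passed as a string, for example `color = "navy"`.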

#### fill

The `fill` parameter controls the interior color of the histogram bars.

Again, be careful. The `fill` parameter controls the interior color, and the `color` parameter controls the border color.

If you provide an argument to this parameter, it must be a string. For example, you can set `fill = "red"`.

Also remember: R has many available colors. You can choose simple colors such as `red`, `green`, and `blue`, but there are many more interesting colors.

#### bins

The `bins` parameter controls the number of bins plotted in the histogram.

By default, it is set to `bins = 30`.

However, you can increase or decrease the number of bins as needed.

Controlling the number of bins is one way to change how the variable is analyzed. In general, reducing the number of bins smooths out variation in the data. Increasing the number of bins reveals more detail.

Which you choose (more detail or more "smoothness") depends on what you're looking for!
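To see the effect of `bins` without relying on the picture alone, you can inspect the binned data that ggplot2 computes. This is a sketch using ggplot2's `layer_data()` helper; the variable names are my own, and the warnings about missing `median` values are suppressed only to keep the output short.

```r
library(ggplot2)

base_plot <- ggplot(data = txhousing, aes(x = median))

# layer_data() returns the values ggplot2 computed for a layer, one row per bin
coarse_bins <- suppressWarnings(layer_data(base_plot + geom_histogram(bins = 10)))
fine_bins   <- suppressWarnings(layer_data(base_plot + geom_histogram(bins = 60)))

# Fewer bins means fewer, wider bars
nrow(coarse_bins)
nrow(fine_bins)

# Both versions count the same records, just grouped differently
sum(coarse_bins$count)
sum(fine_bins$count)
```

The total count is the same in both cases. Only the grouping changes, which is why fewer bins look smoother and more bins show more detail.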

## Example: plotting a histogram in R using ggplot2

OK. Now that we understand the syntax, let's look at some examples of creating a histogram in R using ggplot2.

Examples:

• Create a simple ggplot histogram
• Change the border color
• Change the fill color of the bars
• Change the number of histogram bins

#### First run this code

Before we start, let's load the `tidyverse` package. Remember that the `tidyverse` package includes `ggplot2`.

We'll also inspect `txhousing`, which is the dataset we'll use.

##### Load tidyverse

You can load the `tidyverse` package with the following code:

`library(tidyverse)`

##### Check the data

Next, let's take a quick look at our dataset.

In the examples below, we'll use the `txhousing` dataset, which contains housing data for various cities and years in Texas.

We can inspect this data frame with the `glimpse()` function:

`txhousing %>% glimpse()`

OUT:

`# Observations: 8602`
`# Variables: 9`
`# $ city      <chr> "Abilene", "Abilene", "Abilene", "Abilene", "Abilene", "Abilene", "Abilene...`
`# $ year      <int> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000,...`
`# $ month     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1...`
`# $ sales     <dbl> 72, 98, 130, 98, 141, 156, 152, 131, 104, 101, 100, 92, 75, 112, 118, 1...`
`# $ volume    <dbl> 5380000, 6505000, 9285000, 9730000, 10590000, 13910000, 12635000, 10710...`
`# $ median    <dbl> 71400, 58700, 58100, 68600, 67300, 66900, 73500, 75000, 64500, 59300, 7...`
`# $ listings  <dbl> 701, 746, 784, 785, 794, 780, 742, 765, 771, 764, 721, 658, 779, 700, 7...`
`# $ inventory <dbl> 6.3, 6.6, 6.8, 6.9, 6.8, 6.6, 6.2, 6.4, 6.5, 6.6, 6.2, 5.7, 6.8, 6.0, 6...`
`# $ date      <dbl> 2000.000, 2000.083, 2000.167, 2000.250, 2000.333, 2000.417, 2000.500, 2...`

### Example 1: Create a simple ggplot histogram

Here, we'll plot a histogram of the `median` variable.

`ggplot(data = txhousing, aes(x = median)) + geom_histogram()`

OUT:

[Figure: histogram of `median` with default settings]

##### Explanation

This is very simple, but you need to understand it because it forms the basis for the other examples.

Here we initiate plotting by calling `ggplot()`.

Inside the `ggplot()` function, we set `data = txhousing`. This indicates that we will plot data from the `txhousing` data frame.

Next we have the `aes()` function. It lets us specify which variables map to which axes and other "aesthetic" attributes. Here we set `x = median`, which means we want to plot `median` on the x-axis.

Finally, after the `+`, we see `geom_histogram()`. This means that we want to plot the variable as a histogram.
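Running Example 1 as written typically prints a message that `stat_bin()` is using `bins = 30`, along with a warning that rows with missing `median` values were removed. Here is a minimal sketch of one way to avoid both, assuming you're happy to drop the missing values up front (the `tx_complete` name is hypothetical):

```r
library(ggplot2)

# Keep only rows where median is present (tx_complete is a hypothetical name)
tx_complete <- subset(txhousing, !is.na(median))

# Setting bins explicitly avoids the default-bins message
p <- ggplot(data = tx_complete, aes(x = median)) +
  geom_histogram(bins = 30)

print(p)
```

The plot itself should look the same as in Example 1, since ggplot2 drops the missing values either way. This version just makes that step explicit.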

### Example 2: Changing the border color

Now that we've created the simple histogram in Example 1, let's make some changes.

Here, we'll change the border color of the bars.

`ggplot(data = txhousing, aes(x = median)) + geom_histogram(color = 'turquoise4')`

OUT:

[Figure: the same histogram with turquoise bar borders]

##### Explanation

This is quite simple.

The code is almost the same as in Example 1.

The only difference is that we set `color = 'turquoise4'` inside `geom_histogram()`. This changed the border color of the bars to a shade of turquoise.

### Example 3: Changing the fill color of the bars

Next, we'll change the color of the bars themselves, that is, the interior of each bar.

We'll use the `fill` parameter for that.

Let's take a look:

`ggplot(data = txhousing, aes(x = median)) + geom_histogram(fill = 'red')`

OUT:

[Figure: the same histogram with red bars]

##### Explanation

Everything here is almost identical to our simple ggplot histogram in Example 1.

The only big difference is that we set `fill = 'red'`. As you can see, this changed the color of the bars to `red`.

Note that there is no visible border between the bars. This is probably fine, but you can also change the border color. You can use the `color` parameter for that, as shown in Example 2.
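As a sketch of how the two parameters combine, the call below sets both `fill` and `color` in the same `geom_histogram()`. The white border is an arbitrary choice made here for illustration, not something prescribed by the tutorial.

```r
library(ggplot2)

# fill sets the interior color, color sets the border color
p <- ggplot(data = txhousing, aes(x = median)) +
  geom_histogram(fill = "red", color = "white")

print(p)
```

With a contrasting border color, the individual bars become easier to tell apart.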

### Example 4: Changing the number of histogram bins

Finally, let's modify the number of histogram bins.

By default, ggplot2 produces a histogram with 30 bins. This is usually fine, but sometimes you want to increase or decrease the number of bins.

For that, we can use the `bins` parameter. Here we reduce the number of bins to 10:

`ggplot(data = txhousing, aes(x = median)) + geom_histogram(bins = 10)`

OUT:

[Figure: histogram of `median` with 10 bins]

##### Explanation

This is very easy.

Here, we created a histogram with 10 bins by setting `bins = 10`.

As you can see, by reducing the number of bins, we smooth out some of the variation in the data.

You can also try increasing the number of bins if needed. Try setting it to 60 or 70 and see what happens.

Remember that choosing the right number of bins is more of an art than a science. It really depends on what your goals are and what you're looking for in the data.

It's a good reminder that knowing the syntax isn't enough. You also have to know how to use data visualization well!

### Leave other questions in the comments below

Have questions about ggplot histograms? Want to know how to do something else that I haven't explained here?

This tutorial should give you a good idea of how to create histograms in R using ggplot2.

But there is still much to learn.

If you want to master data visualization in R, you need to learn a lot more about ggplot2.

If you want to learn more about data analysis, you need to know dplyr, tidyr, forcats, etc.

However, if you really want to master data analysis and data visualization in R, I highly recommend signing up to our mailing list. At Sharp Sight, we regularly publish tutorials that explain how to do data analysis using R and Python.

Article information

Author: Prof. Nancy Dach

Last Updated: 09/10/2023
