A quick guide on nice heatmaps in R (2023)

I get a lot of questions from people who want to visualize their data as heat maps - ideally as quickly as possible. This is a common problem in exploratory data analysis, because we often don't have time to digest entire books on specific techniques in various software packages just to get the job done. However, once we're happy with the initial results, it can be worth digging deeper into the topic to further customize our plots and maybe even polish them for publication. In this post, my goal is to briefly introduce one of R's several heat map libraries for a simple data analysis. I chose R because it is one of the most popular free statistical software packages. Of course, there are many more tools that can produce similar results (and even in R there are many different heat map packages), but I will leave this as an open topic for another time.

Sections#

  • Sections
  • Script overview
  • Running the script in R
  • Script parameters in more detail
    • A) Install and load the required packages
    • B) Load the data and convert it to matrix format
    • C) Customizing and plotting the heat map
    • Optional: Choose a custom palette and color breaks
    • Optional: Save the heatmap as a PNG file
    • Plotting the heat map
    • Update 19 February 2014 - Clustering methods
    • Update March 2, 2014 - Categorizing measurements

Note: For those who prefer Python, I also have a short tutorial: "Heatmaps, Hierarchical Clustering and Dendrograms in Python"


The files I used can be downloaded from the GitHub repository: https://github.com/rasbt/R_snippets/tree/master/heatmaps


After this paragraph, you'll see the whole script, so you know what you're dealing with: an R script that uses R's gplots package to create heat maps via the heatmap.2() function. Given that we "just" want to create a simple heat map, this may look like a lot, but don't worry: many of the parameters are not required, and the details are covered in the following sections.

    #########################################################
    ### A) Install and load the required packages
    #########################################################

    if (!require("gplots")) {
      install.packages("gplots", dependencies = TRUE)
      library(gplots)
    }
    if (!require("RColorBrewer")) {
      install.packages("RColorBrewer", dependencies = TRUE)
      library(RColorBrewer)
    }

    #########################################################
    ### B) Load the data and convert it to matrix format
    #########################################################

    data <- read.csv("../datasets/heatmaps_in_r.csv", comment.char="#")
    rnames <- data[,1]                            # assign labels in column 1 to "rnames"
    mat_data <- data.matrix(data[,2:ncol(data)])  # convert columns 2-5 to a matrix
    rownames(mat_data) <- rnames                  # assign row names

    #########################################################
    ### C) Customize and plot the heat map
    #########################################################

    # create your own color palette, from red to green
    my_palette <- colorRampPalette(c("red", "yellow", "green"))(n = 299)

    # (optional) manually define color breaks for a "skewed" color transition
    col_breaks = c(seq(-1,0,length=100),   # for red
      seq(0.01,0.8,length=100),            # for yellow
      seq(0.81,1,length=100))              # for green

    # create a 5 x 5 inch image
    png("../images/heatmaps_in_r.png",     # create PNG for the heat map
      width = 5*300,        # 5 x 300 pixels
      height = 5*300,
      res = 300,            # 300 pixels per inch
      pointsize = 8)        # smaller font size

    heatmap.2(mat_data,
      cellnote = mat_data,  # same data set for cell labels
      main = "Correlation", # heat map title
      notecol="black",      # change font color of cell labels to black
      density.info="none",  # turn off density plot inside color legend
      trace="none",         # turn off trace lines inside the heat map
      margins =c(12,9),     # widen margins around the plot
      col=my_palette,       # use the color palette defined earlier
      breaks=col_breaks,    # enable color transitions at specified limits
      dendrogram="row",     # only draw a row dendrogram
      Colv="NA")            # turn off column clustering

    dev.off()               # close the PNG device

(download script)

To run an R script, start a new R session by typing R in the shell terminal or start R from the Applications folder. You can now execute the script by typing the following command in R:

    source("path/to/the/script/heatmaps_in_R.R")

A) Install and load the required packages#

At first glance, this part may look more complicated than it needs to be, because calling library(packagename) is already enough to load a required R package if it is installed. The extra if (!require(...)) check installs a package first when it is missing.

    if (!require("gplots")) {
      install.packages("gplots", dependencies = TRUE)
      library(gplots)
    }
    if (!require("RColorBrewer")) {
      install.packages("RColorBrewer", dependencies = TRUE)
      library(RColorBrewer)
    }

B) Load the data and convert it to matrix format#

We can import data into R from many different data file formats, including ASCII text files, Excel spreadsheets, and more. For this tutorial, we assume that the data is in Comma-Separated Values (CSV) format, which is probably one of the most common data file formats.


When we open the CSV file in our favorite plain text editor instead of a spreadsheet program (Excel, Numbers, etc.), it looks like this:

    # Example heatmap dataset,,,,
    # 13/12/08 sr,,,,
    ,var1,var2,var3,var4
    measurement1,0.094,0.668,0.4153,0.4613
    measurement2,0.1138,-0.3847,0.2671,0.1529
    measurement3,0.1893,0.3303,0.5821,0.2632
    measurement4,-0.0102,-0.4259,-0.5967,0.18
    measurement5,0.1587,0.2948,0.153,-0.2208
    measurement6,-0.4558,0.2244,0.6619,0.0457
    measurement7,-0.6241,-0.3119,0.3642,0.2003
    measurement8,-0.227,0.499,0.3067,0.3289
    measurement9,0.7365,-0.0872,-0.069,-0.4252
    measurement10,0.9761,0.4355,0.8663,0.8107

(download CSV file)

When we load the data from the CSV file into R and assign it to the variable data, note the two comment lines before the main data in the CSV file, marked with the octothorpe sign (#). Since we don't need these lines to draw the heat map, we can skip them via the comment.char argument of the read.csv() function.

    data <- read.csv("../datasets/heatmaps_in_r.csv", comment.char="#")
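To see the effect of comment.char in isolation, here is a minimal sketch that parses a small CSV string instead of a file. The values and the name demo are made up for illustration; only read.csv() and its text and comment.char arguments come from base R.

```r
# A two-line commented header followed by a tiny table (made-up values).
csv_text <- "# Example data set,,
# some comment,,
,var1,var2
measurement1,0.1,0.2
measurement2,-0.3,0.4"

# text = lets read.csv() parse a string instead of a file on disk.
demo <- read.csv(text = csv_text, comment.char = "#")

print(dim(demo))    # rows and columns left after the comment lines are skipped
print(names(demo))  # the empty first header cell is given a generated name
```

Without comment.char = "#", the two leading lines would be read as part of the table, so it is worth checking dim() after loading.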

One of the trickier parts of the heatmap.2() function is that it requires the data in a numeric matrix format for plotting. By default, data that we read from a file using R's read.table() or read.csv() functions is stored in a data frame. The matrix format differs from the data frame format in that a matrix can only hold one type of data, such as numeric, string, or logical values. Fortunately, we don't have to worry about the row containing the column names (var1, var2, var3, var4), because the read.csv() function treats the first row of data as the table header by default. However, we would run into trouble if we included the row names (measurement1, measurement2, etc.) in the numeric matrix. For convenience, we store the row names from the first column in the variable rnames, which we can use later to assign row names to the converted matrix.

    rnames <- data[,1]

Now we convert the numeric data in the variable data (columns 2 through 5) into a matrix and assign it to the new variable mat_data.

    mat_data <- data.matrix(data[,2:ncol(data)])

Instead of using the rather clunky expression ncol(data), which returns the total number of columns in the data frame, we could also pass the integer 5 directly to specify the last column we want to include. However, ncol(data) is more convenient for larger data sets, because we don't need to count all the columns to get the index of the last one. Next, we assign the row names that we saved as rnames earlier:

    rownames(mat_data) <- rnames
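The three conversion steps can be tried on a small hand-made data frame. This is only a sketch: the data frame demo and the names labels and demo_mat are hypothetical stand-ins for what read.csv() returns in the tutorial.

```r
# Hypothetical stand-in for the data frame returned by read.csv().
demo <- data.frame(
  X    = c("measurement1", "measurement2", "measurement3"),
  var1 = c(0.094, 0.1138, 0.1893),
  var2 = c(0.668, -0.3847, 0.3303),
  stringsAsFactors = FALSE
)

labels   <- demo[, 1]                          # keep the text column aside
demo_mat <- data.matrix(demo[, 2:ncol(demo)])  # numeric columns only
rownames(demo_mat) <- labels                   # re-attach the labels as row names

print(class(demo))           # "data.frame"
print(is.numeric(demo_mat))  # TRUE: the matrix holds a single numeric type
print(demo_mat)
```

Keeping the label column out of data.matrix() matters, since that function would otherwise replace the text labels with numbers.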

C) Customizing and plotting the heat map#

Finally, our data is in the "correct" format to create a heatmap, but before we get to that, let's take a quick look at some of the customization options.

Optional: Choose a custom palette and color breaks#

Instead of using the default colors of the heatmap.2() function, I want to show you how to build your own color palette with colorRampPalette() (this function ships with R's grDevices package; RColorBrewer offers further ready-made palettes). Here we go with the most popular choice for heat maps: a color range from red over yellow to green.

    my_palette <- colorRampPalette(c("red", "yellow", "green"))(n = 299)

There are many ways to specify colors in R. I find it most convenient to specify colors by name. For a detailed overview of the different color names in R, see: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

The argument (n = 299) determines how many individual colors we want in the palette. The more individual colors, the smoother the transition; 299 should be large enough for a smooth transition. By default, the colors are spread evenly over the data range, so that every color in the palette covers an interval of similar size. However, sometimes we want a color range that is skewed according to the data we are analyzing. Suppose our example data set consists of Pearson correlation coefficients (R-values) ranging from -1 to 1, and we are particularly interested in samples with (relatively) high correlations: R-values from 0.8 to 1.0. We want to highlight these samples in the heat map by showing only values between 0.8 and 1 in green. In this case, we can use the following code to define "skewed" color breaks:

    col_breaks = c(seq(-1,0,length=100),   # for red
      seq(0.01,0.8,length=100),            # for yellow
      seq(0.81,1,length=100))              # for green
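A quick sanity check on palette and breaks can save a confusing error later. The sketch below uses only base R and rests on one assumption: that heatmap.2() hands both vectors to image(), which expects exactly one more break than colors. The variable idx is introduced here purely for illustration.

```r
my_palette <- colorRampPalette(c("red", "yellow", "green"))(n = 299)

col_breaks <- c(seq(-1,   0,   length.out = 100),  # red band
                seq(0.01, 0.8, length.out = 100),  # yellow band
                seq(0.81, 1,   length.out = 100))  # green band

print(length(my_palette))        # number of colors
print(length(col_breaks))        # number of breaks, one more than colors
print(all(diff(col_breaks) > 0)) # breaks should be strictly increasing

# Which color bin would a correlation of 0.9 fall into?
idx <- findInterval(0.9, col_breaks, rightmost.closed = TRUE)
print(idx)
print(my_palette[idx])
```

This also explains the choice of 299 colors: three sequences of 100 breaks give 300 break points, which delimit 299 intervals.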

Optional: Save the heatmap as a PNG file#

R supports a wide variety of vector graphics formats, such as SVG, PostScript, and PDF, as well as raster graphics (bitmaps), such as JPEG, PNG, TIFF, BMP, and others. Each format has its advantages and disadvantages, and depending on the specific purpose (web page, journal article, PowerPoint presentation, archive ...) we choose one file format over another. Rather than go into detail about when to use which file format, I will use the popular PNG format for our heat map. I chose PNG over JPEG because PNG offers lossless compression (JPEG is a lossy image format) at the small cost of a slightly larger file size. However, you can omit the png() call in the script if you just want to display the heat map in an interactive R window.

    png("../images/heatmaps_in_r.png",     # create PNG for the heat map
      width = 5*300,        # 5 x 300 pixels
      height = 5*300,
      res = 300,            # 300 pixels per inch
      pointsize = 8)        # smaller font size

With its default parameters, the png() function creates a relatively small PNG file at a very low resolution, which is not practical for heat maps. So we pass additional arguments for the image width, height, and resolution. The units of width and height are pixels, not inches. So if we want to create an image that is 5 x 5 inches at 300 pixels per inch, we need to do a little math: [1500 pixels] / [300 pixels/inch] = 5 inches. We also choose a slightly smaller font size of 8 points.

Make sure you don't forget to close the png() plotting device at the end of the script with the dev.off() function. Otherwise, you may not be able to open the PNG file for viewing.
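The inch-to-pixel arithmetic and the open/close pattern of the PNG device can be sketched as follows. The variable names, the temporary output file, and the placeholder plot are assumptions for illustration; in the tutorial, the heatmap.2() call takes the place of plot().

```r
width_in  <- 5     # desired width in inches
height_in <- 5     # desired height in inches
dpi       <- 300   # desired resolution in pixels per inch

width_px  <- width_in  * dpi   # png() expects pixels by default
height_px <- height_in * dpi

out_file <- tempfile(fileext = ".png")  # hypothetical output location
png_ok   <- capabilities("png")         # not every R build has PNG support

if (png_ok) {
  png(out_file, width = width_px, height = height_px, res = dpi, pointsize = 8)
  plot(1:10, main = "placeholder plot")  # stand-in for the heat map
  invisible(dev.off())                   # close the device so the file is written
}

print(c(width_px, height_px))
```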

Plotting the heat map#

Now let's get down to the nitty-gritty and look at the heatmap.2() function:

    heatmap.2(mat_data,
      cellnote = mat_data,  # same data set for cell labels
      main = "Correlation", # heat map title
      notecol="black",      # change font color of cell labels to black
      density.info="none",  # turn off density plot inside color legend
      trace="none",         # turn off trace lines inside the heat map
      margins =c(12,9),     # widen margins around the plot
      col=my_palette,       # use the color palette defined earlier
      breaks=col_breaks,    # enable color transitions at specified limits
      dendrogram="row",     # only draw a row dendrogram
      Colv="NA")            # turn off column clustering

Update 19 February 2014 - Clustering methods#

If we want to change the default clustering method (complete linkage with a Euclidean distance measure), this can be done as follows. For a square matrix, we can define the distance and the clustering based on our matrix data:

    distance = dist(mat_data, method = "manhattan")
    cluster = hclust(distance, method = "ward.D")  # called "ward" before R 3.1.0

Finally, we plug the result into the heatmap.2() function:

    heatmap.2(mat_data,
      ...
      Rowv = as.dendrogram(cluster), # apply the clustering defined above
      Colv = as.dendrogram(cluster)) # apply the clustering defined above
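The distance and clustering steps can be tried on their own before plugging them into a heat map. The sketch below uses a small made-up square matrix (m, with hypothetical labels s1 to s4) and only functions from base R's stats package.

```r
# Small made-up square matrix with matching row and column labels.
m <- matrix(c( 1.0,  0.8, -0.2,  0.1,
               0.8,  1.0,  0.0, -0.3,
              -0.2,  0.0,  1.0,  0.6,
               0.1, -0.3,  0.6,  1.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))

d    <- dist(m, method = "manhattan")  # pairwise distances between rows
hc   <- hclust(d, method = "ward.D")   # hierarchical clustering of the rows
dend <- as.dendrogram(hc)              # form accepted by Rowv / Colv

print(attr(d, "method"))   # distance measure that was used
print(hc$method)           # linkage method that was used
print(labels(dend))        # leaf order of the dendrogram
```

Because dist() compares rows, reusing the same dendrogram for Rowv and Colv only makes sense when rows and columns describe the same items, which is why the text restricts this to square matrices.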

Update March 2, 2014 - Categorizing measurements#

I was just asked how to categorize the input variables by applying row or column labels. For example, say we want to assign the "measurement" variables to three different categories: measurements 1-3 = category 1, measurements 4-6 = category 2, measurements 7-10 = category 3. My solution is to simply pass RowSideColors as an additional argument to the heatmap.2() function. For example:

    heatmap.2(mat_data,
      ...
      RowSideColors = c(    # group row variables into different
        rep("gray", 3),     # categories, measurements 1-3: gray
        rep("blue", 3),     # measurements 4-6: blue
        rep("black", 4)),   # measurements 7-10: black
      ...

Note that we can also provide similar labels for the column variables via the ColSideColors argument. Another useful addition is a color legend for our new category labels. The code for this example is:

    par(lend = 1)           # square line ends for the color legend
    legend("topright",      # location of the legend on the heat map plot
      legend = c("Category 1", "Category 2", "Category 3"), # category labels
      col = c("gray", "blue", "black"),  # color key
      lty = 1,              # line style
      lwd = 10              # line width
    )
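One way to keep the row colors and the legend from drifting apart is to derive both from a single lookup table. This is a sketch of that idea rather than part of the original script; the names category, category_colors, row_colors, legend_labels, and legend_cols are made up for illustration.

```r
# Hypothetical category assignment for the ten measurements.
category <- rep(c("Category 1", "Category 2", "Category 3"), times = c(3, 3, 4))

# Single lookup table: category name -> color.
category_colors <- c("Category 1" = "gray",
                     "Category 2" = "blue",
                     "Category 3" = "black")

row_colors    <- unname(category_colors[category])  # value for RowSideColors
legend_labels <- names(category_colors)             # value for legend = ...
legend_cols   <- unname(category_colors)            # value for col = ...

print(row_colors)
print(legend_labels)
print(legend_cols)
```

row_colors can then be passed as RowSideColors, and legend_labels and legend_cols as the legend and col arguments of legend(), so a color change only has to be made in one place.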

The image below shows what the modified heatmap looks like with row classification and color legend applied:

[Figure: heat map with row categories and color legend]

The full (working) script can be found at: https://github.com/rasbt/R_snippets/tree/master/heatmaps


