How does R handle missing values? (2023)

Version info: Code for this page was tested in R under development (unstable) (2012-02-22 r58461) on 2012-03-28 with knitr 0.4.

Like other statistical packages, R can handle missing values. However, the way R handles them may require a shift in thinking for those used to working with missing values in other packages. On this page, we first cover the basics of how missing values are represented in R. Then, for those coming from SAS, SPSS, and/or Stata, we outline some of the differences between missing values in R and missing values elsewhere. Finally, we introduce some tools for working with missing values in R, in both data management and analysis.

The very basics

Missing data in R appear as NA. NA is not a string or a number, but an indicator of missingness. We can create vectors with missing values.

x1 <- c(1, 4, 3, NA, 7)
x2 <- c("A", "B", NA, "NA")

NA is one of the few non-numbers that we can include in x1 without generating an error (the other exceptions are names representing numbers or numerical concepts, such as Inf for infinity). In x2, the third value is missing, while the fourth value is the character string "NA". To see which values in each vector R recognizes as missing, we can use the is.na function. It returns a TRUE/FALSE vector with as many elements as the vector we provide.

is.na(x1)
## [1] FALSE FALSE FALSE  TRUE FALSE
is.na(x2)
## [1] FALSE FALSE  TRUE FALSE

We see that R distinguishes between the NA and the "NA" in x2: NA is treated as a missing value, "NA" is not.

Differences from other packages

    • NA cannot be used in comparisons: In other packages, a "missing" value is assigned an extreme numerical value, either very high or very low. As a result, 1) values coded as missing can be compared to other values and 2) other values can be compared to missing. In the SAS code example below, we compare the values in y to 0 and to the missing symbol and find that both comparisons are valid (and that the missing symbol is valued at less than zero).
data test;
input x y;
datalines;
2 .
3 4
5 1
6 0
;
data test; set test;
lowy = (y < 0);
missy = (y = .);
run;
proc print data = test; run;

Obs    x    y    lowy    missy
  1    2    .      1       1
  2    3    4      0       0
  3    5    1      0       0
  4    6    0      0       0

We can try an equivalent method in R.

x1 < 0
## [1] FALSE FALSE FALSE    NA FALSE
x1 == NA
## [1] NA NA NA NA NA

Our missing value cannot be compared to 0, and none of our values can be compared to NA, because NA is not assigned a value: a value is either missing or it is not. Each such comparison therefore returns NA rather than TRUE or FALSE.

    • NA is used for all kinds of missing data: In other packages, missing strings and missing numbers may be represented differently, for example empty quotes for strings and periods for numbers. In R, NA represents all types of missing data. We saw a small example of this in x1 and x2: x1 is a "numeric" object and x2 is a "character" object.
    • Non-NA values cannot be interpreted as missing: Other packages allow you to designate values as "system missing" so that they are interpreted as missing in the analysis. In R, you must explicitly change these values to NA. The is.na function can also be used to make such a change:
is.na(x1) <- which(x1 == 7)
x1
## [1]  1  4  3 NA NA
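
The three differences listed above can be checked in a few lines. This is a minimal sketch that redefines the vectors from the text so it runs on its own; recoding with %in% is an alternative to the is.na<- form shown above, chosen here because %in% never returns NA.

```r
# Same vectors as in the text, redefined so the snippet is self-contained
x1 <- c(1, 4, 3, NA, 7)
x2 <- c("A", "B", NA, "NA")

# One NA marker serves both numeric and character vectors
class(x1)   # "numeric"
class(x2)   # "character"

# == cannot locate NA; is.na() can
x1 == NA            # NA NA NA NA NA
which(is.na(x1))    # 4

# Recode a sentinel value (7, as in the text) to NA.
# %in% returns FALSE rather than NA for the element that is already missing.
x1[x1 %in% 7] <- NA
x1                  # 1 4 3 NA NA
```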

NA options in R

We introduced is.na as a tool for both finding and creating missing values. It is one of several functions built around NA. Most of the other NA functions are options for na.action.

Just as functions have default settings, R as software has similar underlying defaults. You can view the current settings with options(). One of them is "na.action", which describes how missing values should be treated. The possible na.action settings in R include:

  • na.omit and na.exclude: return the object with observations removed if they contain any missing values; the difference between omitting and excluding NAs shows up in some prediction and residual functions
  • na.pass: returns the object unchanged
  • na.fail: returns the object only if it contains no missing values, and signals an error otherwise

To see the na.action currently set in the options, use getOption("na.action"). We can create a data frame with missing values and see how it is treated by each of the above.

(g <- as.data.frame(matrix(c(1:5, NA), ncol = 2)))
##   V1 V2
## 1  1  4
## 2  2  5
## 3  3 NA
na.omit(g)
##   V1 V2
## 1  1  4
## 2  2  5
na.exclude(g)
##   V1 V2
## 1  1  4
## 2  2  5
na.fail(g)
## Error in na.fail.default(g) : missing values in object
na.pass(g)
##   V1 V2
## 1  1  4
## 2  2  5
## 3  3 NA
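
The session-wide default mentioned above can also be changed and later restored. The following is a small sketch, assuming a copy of the built-in anscombe data with a few values set to NA (the names dat, old and n_resid are illustrative only); it relies on lm falling back to the na.action option when no na.action argument is given.

```r
# Illustrative data: copy of anscombe with three missing responses
dat <- within(anscombe, {
    y1[1:3] <- NA
})

# options() returns the previous values, so they can be restored later
old <- options(na.action = "na.exclude")
getOption("na.action")   # "na.exclude"

# lm() is called without an na.action argument and picks up the option,
# so the residuals are padded to the full number of rows
n_resid <- length(resid(lm(y2 ~ y1, data = dat)))
n_resid                  # 11

# Restore the earlier setting
options(old)
```

Changing the option affects every later call that relies on it, so restoring the previous value is usually the safer habit in shared scripts.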

Missing values in analysis

In some R functions, one of the arguments the user can provide is na.action. For example, if you look at the help for the lm command, you will see that na.action is one of the listed arguments. By default, lm uses the na.action specified in the R options. If you want to use a different na.action for a regression, you can specify it in the lm call.

Two common options with lm are the default, na.omit, and na.exclude, which also leaves the missing values out of the fit but retains their positions in the residuals and fitted values.

## use the famous anscombe data and set a few values to NA
anscombe <- within(anscombe, {
    y1[1:3] <- NA
})
anscombe
##    x1 x2 x3 x4    y1   y2    y3    y4
## 1  10 10 10  8    NA 9.14  7.46  6.58
## 2   8  8  8  8    NA 8.14  6.77  5.76
## 3  13 13 13  8    NA 8.74 12.74  7.71
## 4   9  9  9  8  8.81 8.77  7.11  8.84
## 5  11 11 11  8  8.33 9.26  7.81  8.47
## 6  14 14 14  8  9.96 8.10  8.84  7.04
## 7   6  6  6  8  7.24 6.13  6.08  5.25
## 8   4  4  4 19  4.26 3.10  5.39 12.50
## 9  12 12 12  8 10.84 9.13  8.15  5.56
## 10  7  7  7  8  4.82 7.26  6.42  7.91
## 11  5  5  5  8  5.68 4.74  5.73  6.89
model.omit <- lm(y2 ~ y1, data = anscombe, na.action = na.omit)
model.exclude <- lm(y2 ~ y1, data = anscombe, na.action = na.exclude)
## compare effects on residuals
resid(model.omit)
##      4      5      6      7      8      9     10     11 
##  0.727  1.575 -0.799 -0.743 -1.553 -0.425  2.190 -0.971
resid(model.exclude)
##      1      2      3      4      5      6      7      8      9     10 
##     NA     NA     NA  0.727  1.575 -0.799 -0.743 -1.553 -0.425  2.190 
##     11 
## -0.971
## compare effects on fitted (predicted) values
fitted(model.omit)
##    4    5    6    7    8    9   10   11 
## 8.04 7.69 8.90 6.87 4.65 9.55 5.07 5.71
fitted(model.exclude)
##    1    2    3    4    5    6    7    8    9   10   11 
##   NA   NA   NA 8.04 7.69 8.90 6.87 4.65 9.55 5.07 5.71
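
One practical consequence of the difference in lengths shown above is that residuals from an na.exclude fit line up with the rows of the original data. A brief sketch, using a separate copy of the data (dat, fit_omit and fit_excl are illustrative names, not part of the original page):

```r
# Work on a copy so the built-in anscombe data is left untouched
dat <- within(anscombe, {
    y1[1:3] <- NA
})

fit_omit <- lm(y2 ~ y1, data = dat, na.action = na.omit)
fit_excl <- lm(y2 ~ y1, data = dat, na.action = na.exclude)

length(resid(fit_omit))   # 8: rows with NA are dropped
length(resid(fit_excl))   # 11: one entry per row, NA where y1 was missing

# Because the lengths match, the padded residuals can be stored as a column
dat$res <- resid(fit_excl)
```

Assigning resid(fit_omit) in the same way would fail, since its 8 values cannot fill 11 rows.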

Using na.exclude pads the residuals and fitted values with NA where values were missing. Other functions do not use na.action but instead have a different argument (with some default) for handling missing values. For example, the mean command will, by default, return NA if there are any NAs in the passed object.

mean(x1)
## [1] NA

If you want to calculate the mean of the non-missing values in the passed object, you can indicate this with the na.rm argument (set to FALSE by default).

mean(x1, na.rm = TRUE)
## [1] 2.67

Two common commands used in data management and exploration are summary and table. The summary command (when used with numeric vectors) reports the number of NAs in the vector, but the table command ignores NAs by default.

summary(x1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    2.00    3.00    2.67    3.50    4.00       2
table(x1)
## x1
## 1 3 4 
## 1 1 1

To see NA in the table output, you can specify "ifany" or "always" for the useNA argument. The first shows NA in the output only if the object contains missing data. The second includes NA in the output regardless.

table(x1, useNA = "ifany")
## x1
##    1    3    4 <NA> 
##    1    1    1    2
table(1:3, useNA = "always")
## 
##    1    2    3 <NA> 
##    1    1    1    0

Sorting data that contain missing values in R is again different from other packages, because NA cannot be compared to other values. By default, sort removes all NA values and can therefore change the length of a vector.

(x1s <- sort(x1))
## [1] 1 3 4
length(x1s)
## [1] 3

The user can specify whether NA should come last or first in the sorted order by setting the na.last argument to TRUE or FALSE, respectively.

sort(x1, na.last = TRUE)
## [1]  1  3  4 NA NA
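
For completeness, the sketch below shows the other settings of na.last side by side, together with order(), which is not covered above but is the usual way to sort while keeping every element. The vector is redefined so the snippet runs on its own.

```r
# x1 as it stands at this point in the text
x1 <- c(1, 4, 3, NA, NA)

sort(x1)                    # 1 3 4          (default na.last = NA drops NAs)
sort(x1, na.last = FALSE)   # NA NA 1 3 4    (NAs first)

# order() returns positions and, by default, places NAs last,
# so indexing with it keeps the original length
idx <- order(x1)
idx                         # 1 3 2 4 5
x1[idx]                     # 1 3 4 NA NA
```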

Regardless of the purpose of your R code, it is wise both to check for missing values in your data and to consult the help files for every function you use. You should either be aware of and comfortable with the default treatment of missing values, or specify the treatment you want for your analysis.
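
A quick way to carry out such a check might look like the following sketch. The data frame df is made up for illustration, and anyNA, colSums and complete.cases are base R functions that the page itself does not discuss.

```r
# Hypothetical example data with one NA in each column
df <- data.frame(a = c(1, NA, 3),
                 b = c("x", "y", NA))

has_na    <- anyNA(df)              # TRUE if any value at all is missing
na_counts <- colSums(is.na(df))     # number of NAs per column
n_full    <- sum(complete.cases(df))  # rows with no missing values

has_na
na_counts
n_full
```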

Article information

Author: Manual Maggio

Last Updated: 13/10/2023