R null values: NULL, NA, NaN, Inf (2023)

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers.]


R handles several kinds of null-like values, and it is important to understand how they behave during data preprocessing and data wrangling.

In general, R supports:

  • NULL
  • NA
  • NaN
  • Inf / -Inf

NULL is the object that is returned when an expression or function results in an undefined value. In R, NULL (uppercase) is a reserved word and can also be the product of importing data with an unknown data type.

NA is a logical constant of length 1 and is an indicator of a missing value. NA (uppercase) is a reserved word and can be coerced to any other vector type (except raw); it can also be a product of importing data. NA and "NA" (as a string) are not interchangeable. NA stands for Not Available.

NaN stands for Not a Number. It is a numeric constant of length 1 and applies to numeric values, as well as to the real and imaginary parts of complex values, but not to the values of an integer vector. NaN is a reserved word.

Inf and -Inf stand for infinity (or negative infinity) and result from storing a very large number or from dividing by zero. Inf is a reserved word and in most cases is a product of computations in R, and therefore very rarely a product of data import. Inf also tells you that the value is not missing and is a number!

All four null/missing values have accompanying logical functions in R that return TRUE/FALSE: is.null(), is.na(), is.nan(), is.infinite().
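As a quick illustration of these checks, the following minimal sketch applies each function to a small numeric vector. The vector `x` and the variable names are made up for this example and are not part of the original code.

```r
# Hypothetical example vector holding one of each special value
x <- c(1, NA, NaN, Inf, -Inf)

na_flags  <- is.na(x)        # TRUE for NA and also for NaN
nan_flags <- is.nan(x)       # TRUE only for NaN
inf_flags <- is.infinite(x)  # TRUE for Inf and -Inf

print(na_flags)
print(nan_flags)
print(inf_flags)

# is.null() tests the object as a whole, not its elements
null_is_null <- is.null(NULL)
x_is_null    <- is.null(x)
print(null_is_null)
print(x_is_null)
```

Note that is.na() also returns TRUE for NaN, so is.nan() is needed to tell the two apart.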

Just use the following code to get an overview of all values:

    # Read the documentation on all data types:
    ?NULL
    ?NA
    ?NaN
    ?Inf

    # Populate the variables
    a <- "NA"
    b <- "NULL"
    c <- NULL
    d <- NA
    e <- NaN
    f <- Inf

    ### Check whether the variables are the same
    identical(a, d)
    # [1] FALSE

    # NA and NaN are not identical
    identical(d, e)
    # [1] FALSE

    ### Check the length of each data type
    length(c)
    # [1] 0
    length(d)
    # [1] 1
    length(e)
    # [1] 1
    length(f)
    # [1] 1

    ### Check the data types
    str(c); class(c)
    # NULL
    # [1] "NULL"
    str(d); class(d)
    # logi NA
    # [1] "logical"
    str(e); class(e)
    # num NaN
    # [1] "numeric"
    str(f); class(f)
    # num Inf
    # [1] "numeric"

Get data from R

Nullable data types can behave differently when propagated to other structures such as a list, a vector, or a data.frame.

We can test this and observe the behavior by creating a NULL, NA or NaN vector and data frame:

    # NULL, NA and NaN vectors
    v1 <- c(NULL, NULL, NULL)
    v2 <- NULL

    # Both are empty vectors
    str(v1); class(v1); mode(v1)
    str(v2); class(v2); mode(v2)

    v3 <- c(NA, NA, NA)
    v4 <- NA

    str(v3); class(v3); mode(v3)
    str(v4); class(v4); mode(v4)

    v5 <- c(NaN, NaN, NaN)
    v6 <- NaN

    str(v5); class(v5); mode(v5)
    str(v6); class(v6); mode(v6)

Clearly, a NULL vector is always an empty vector, regardless of how many NULL elements are combined into it. NA and NaN vectors have the length of the elements they hold, with one difference: an NA vector has class logical, whereas a NaN vector has class numeric.
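The same point can be seen when the special values are mixed with ordinary numbers. This is a small sketch with made-up vectors (the names `with_null`, `with_na`, and `with_nan` are illustrative only): NULL disappears when combined, while NA and NaN each keep their position.

```r
# NULL elements vanish when combined; NA and NaN keep their slot
with_null <- c(1, NULL, 2)
with_na   <- c(1, NA, 2)
with_nan  <- c(1, NaN, 2)

print(length(with_null))  # 2
print(length(with_na))    # 3
print(length(with_nan))   # 3

# An NA-only vector is logical, a NaN-only vector is numeric
na_class  <- class(c(NA, NA))
nan_class <- class(c(NaN, NaN))
print(na_class)
print(nan_class)
```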

When combined with mathematical operations, NULL vectors do not change size, but they do change type:

    # Operation on a NULL vector
    v1 <- c(NULL, NULL, NULL)
    str(v1)
    # NULL

    v1 <- v1 + 1
    str(v1)
    # num(0)

This changes only the class, not the length, and the vector still holds no data.
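For comparison, here is a short sketch of how the other special values respond to the same kind of arithmetic. The variable names are illustrative; the behavior shown is standard R arithmetic.

```r
null_plus <- NULL + 1   # zero-length numeric vector
na_plus   <- NA + 1     # NA propagates
nan_plus  <- NaN + 1    # NaN propagates
inf_plus  <- Inf + 1    # still Inf
inf_diff  <- Inf - Inf  # undefined, so the result is NaN

print(length(null_plus))
print(na_plus)
print(nan_plus)
print(inf_plus)
print(inf_diff)
```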

Data frames behave in much the same way.

    # data.frame
    df1 <- data.frame(v1 = NA, v2 = NA, v3 = NA)
    df2 <- data.frame(v1 = NULL, v2 = NULL, v3 = NULL)
    df3 <- data.frame(v1 = NaN, v2 = NaN, v3 = NaN)

    str(df1); str(df2); str(df3)

A data frame with NULL in every column is presented as a data frame with 0 observations of 0 variables (0 rows and 0 columns). The NA and NaN data frames each hold 1 observation of 3 variables, of logical and numeric type respectively.
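These dimensions and column types can be checked directly. The sketch below rebuilds the three data frames under different, illustrative names (`df_null`, `df_na`, `df_nan`) so that it runs on its own.

```r
df_null <- data.frame(v1 = NULL, v2 = NULL, v3 = NULL)
df_na   <- data.frame(v1 = NA,   v2 = NA,   v3 = NA)
df_nan  <- data.frame(v1 = NaN,  v2 = NaN,  v3 = NaN)

# Rows and columns of each data frame
print(dim(df_null))  # 0 0
print(dim(df_na))    # 1 3
print(dim(df_nan))   # 1 3

# Column classes: logical for NA, numeric for NaN
na_classes  <- sapply(df_na, class)
nan_classes <- sapply(df_nan, class)
print(na_classes)
print(nan_classes)
```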

NULL, NA, and NaN also behave differently when new observations are added to a data frame.

Adding to the NA data frame:

    # Add a new row to the existing data frame
    df1 <- rbind(df1, data.frame(v1 = 1, v2 = 2, v3 = 3))

    # Explore the data frame
    df1

As expected, the new row is added. Adding a new row (vector) of a different size generates an error, because the data frame definition fixes the dimensions. The same behavior is expected for NaN values. With NULL, on the other hand, the results are different:

    # df2 will get the dimension definition
    df2 <- rbind(df2, data.frame(v1 = 1, v2 = 2))

    # This will generate an error, since the dimension definition is already set
    df2 <- rbind(df2, data.frame(v1 = 1, v2 = NULL))

    # And with NA it should be fine
    df2 <- rbind(df2, data.frame(v1 = 1, v2 = NA))

With the first assignment, df2 gets its dimension definition, even though it was originally constructed from three NULL columns.
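The behavior described here can be reproduced without stopping the script by wrapping the failing call in tryCatch(). This is a minimal sketch; the names `empty_df`, `shaped_df`, `mismatch`, and `extended_df` are introduced only for illustration.

```r
empty_df <- data.frame(v1 = NULL, v2 = NULL, v3 = NULL)

# The first rbind() defines the shape: 1 row, 2 columns
shaped_df <- rbind(empty_df, data.frame(v1 = 1, v2 = 2))
print(dim(shaped_df))

# A row with a different number of columns is rejected;
# the error message is captured instead of halting execution
mismatch <- tryCatch(
  rbind(shaped_df, data.frame(v1 = 1)),
  error = function(e) conditionMessage(e)
)
print(mismatch)

# A row of the right shape is accepted, even if it contains NA
extended_df <- rbind(shaped_df, data.frame(v1 = 1, v2 = NA))
print(dim(extended_df))
```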

NULL also appears when we look up an element that does not exist, for example a list element that is out of bounds:

    l <- list(a = 1:10, b = c("a", "b", "c"), c = seq(0, 10, 0.5))

    l$a
    # [1]  1  2  3  4  5  6  7  8  9 10
    l$c
    # [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0
    l$r
    # NULL

We are calling sublist r of list l, which does not exist. Instead of being reported as missing or non-existent, it is returned as NULL, which is rather contradictory because it was never defined. Doing the same with a vector returns a different result (NA):

    v <- c(1:3)
    v[4]
    # [1] NA

So out-of-bounds access is handled differently in lists and vectors: a list returns NULL, whereas a vector returns NA.
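A short sketch of this difference follows, using an illustrative list and vector (`lst` and `vec` are names chosen for this example). It also shows one further case that is easy to overlook: indexing a list by a position beyond its length with `[[ ]]` raises an error instead of returning NULL.

```r
lst <- list(a = 1:3, b = c("x", "y"))
vec <- 1:3

by_dollar  <- lst$r       # missing name via $   -> NULL
by_bracket <- lst[["r"]]  # missing name via [[ ]] -> NULL
beyond_vec <- vec[4]      # position beyond a vector -> NA

# Position beyond a list with [[ ]] is an error; capture it for inspection
beyond_lst <- tryCatch(lst[[3]], error = function(e) e)

print(is.null(by_dollar))
print(is.null(by_bracket))
print(is.na(beyond_vec))
print(inherits(beyond_lst, "error"))
```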

Retrieving data from SQL Server

I will use several different data types, coming from the following SQL table.

    USE AzureLanding;
    GO

    CREATE TABLE R_Nullables
    (
        ID INT IDENTITY(1,1) NOT NULL
       ,num1 FLOAT
       ,num2 DECIMAL(20,10)
       ,num3 INT
       ,tex1 NVARCHAR(MAX)
       ,tex2 VARCHAR(100)
       ,bin1 VARBINARY(MAX)
    )

    INSERT INTO R_Nullables
              SELECT 1.22, 21.535, 245, 'This is Nvarchar text', 'Varchar text', 0x0342342
    UNION ALL SELECT 3.4534, 25.5, 45, 'This is another Nvarchar text', 'Varchar text 2', 0x03423e3434tg
    UNION ALL SELECT NULL, NULL, NULL, NULL, NULL, NULL
    UNION ALL SELECT 0, 0, 0, '', '', 0x

Import the data into the R environment using the RODBC R library:

    library(RODBC)

    SQLcon <- odbcDriverConnect('driver={SQL Server};server=TOMAZK\\MSSQLSERVER2017;database=AzureLanding;trusted_connection=true')

    # df <- sqlQuery(SQLcon, "SELECT * FROM R_Nullables")
    df <- sqlQuery(SQLcon, "SELECT ID, num1, num2, num3, tex1, tex2 FROM R_Nullables")

    close(SQLcon)

When the SELECT * query is executed, the varbinary data type from SQL Server is represented as a 2 GiB binary object in R, and you will most likely get an error because R cannot allocate the memory.

After changing the column list, the df object is created. Its printed form is straightforward, yet somewhat puzzling:

      ID   num1   num2 num3                          tex1           tex2
    1  1 1.2200 21.535  245         This is Nvarchar text   Varchar text
    2  2 3.4534 25.500   45 This is another Nvarchar text Varchar text 2
    3  3     NA     NA   NA                          <NA>           <NA>
    4  4 0.0000  0.000    0

Putting the SQL Server output and the R output side by side reveals some differences.

What SQL Server stores as NULL is represented as NA in R, but not every NA is the same. The plain NA is a logical constant meaning "not available", whereas the NA in a numeric column is numeric and the NA in a text column belongs to a factor or character vector. So dealing with NA is not only a matter of "not available" but also of the type of "not available" information, and each type needs special attention; otherwise you will keep running into coercion errors when performing calculations or applying functions.
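R exposes these typed missing values as separate reserved constants. The sketch below lists them and shows that an NA inside a vector takes on that vector's type; the vectors `num_col` and `txt_col` are invented for illustration and do not come from the imported data.

```r
# The typed NA constants and their classes
na_classes <- c(
  plain     = class(NA),
  integer   = class(NA_integer_),
  real      = class(NA_real_),
  character = class(NA_character_)
)
print(na_classes)

# An NA stored in a vector adopts the vector's type
num_col <- c(1.22, NA)
txt_col <- c("a", NA)
print(class(num_col[2]))  # "numeric"
print(class(txt_col[2]))  # "character"

# All of them are still detected by is.na()
all_missing <- all(is.na(c(num_col[2], NA_integer_)), is.na(txt_col[2]))
print(all_missing)
```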

Data imported from SQL Server can be used like any other dataset imported into R:

    # Perform a basic calculation
    df$num1 * 2
    # [1] 2.4400 6.9068     NA 0.0000

    is.na(df$num1)
    # [1] FALSE FALSE  TRUE FALSE
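Because NA propagates through arithmetic, aggregate functions return NA unless told otherwise. As a hedged sketch, the vector below copies the four num1 values shown in the output above into a standalone variable (so no database connection is needed) and uses the standard `na.rm` argument to skip the missing value.

```r
# Standalone copy of the num1 values, for illustration only
num1 <- c(1.22, 3.4534, NA, 0)

total_default <- sum(num1)                # NA, because one value is missing
total_clean   <- sum(num1, na.rm = TRUE)  # missing value dropped before summing
mean_clean    <- mean(num1, na.rm = TRUE) # mean of the three remaining values

print(total_default)
print(total_clean)
print(mean_clean)
```

Whether dropping the missing row is appropriate depends on the analysis; `na.rm = TRUE` silently changes the number of observations behind the result.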

The same logic applies to the text1 and text2 fields. Both are factors, and their NULL or NA values can be handled accordingly.

    # Text
    df$text2
    # NULL
    df$text1
    # NULL

This is rather unexpected, since once again the SQL Server data types do not carry over to R as hoped. So let's change the original SQL query to cast the values:

    df <- sqlQuery(SQLcon, "SELECT ID, num1, num2, num3, CAST(tex1 AS VARCHAR(100)) AS text1, CAST(tex2 AS VARCHAR(100)) AS text2 FROM R_Nullables")

After re-running the query to repopulate df, the result of df$text1 is:

    [1] This is Nvarchar text         This is another Nvarchar text
    Levels: This is another Nvarchar text This is Nvarchar text
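One thing that may be worth ruling out whenever `$` returns NULL on a data frame is the column name itself: `$` returns NULL for a name that does not match any column, without raising an error. The sketch below uses a small made-up data frame (`df_demo`) as a stand-in for imported data; it is offered as a possible check, not as a diagnosis of the import above.

```r
# Hypothetical stand-in for an imported data frame
df_demo <- data.frame(tex1 = c("a", NA), tex2 = c("b", ""))

missing_col  <- df_demo$text1  # no column with this name -> NULL, no error
existing_col <- df_demo$tex1   # the actual column

print(is.null(missing_col))
print(existing_col)

# Checking names() first makes the mismatch explicit
has_text1 <- "text1" %in% names(df_demo)
print(has_text1)
```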

Getting data from TXT/CSV files

I have created a sample TXT/CSV file that is imported into R as follows:

    setwd("C:\\Users\\Tomaz\\")
    dft <- read.csv("import_txt_R.txt")
    dft

Placed side by side, the R output and the CSV file show that the data types are handled well.

But only at first glance. Let's examine the last observation by checking its type:

    is.na(dft[5, ])
    #   text1 text2 value1 value2
    # 5  TRUE FALSE  FALSE  FALSE

This is the problem: the two text values will be treated differently, because even though both are of the same type, one is a true NA and the other is not.

    identical(class(dft[5, 2]), class(dft[5, 1]))
    # [1] TRUE

Before you go on your next data adventure, make sure to check all the data types and values.
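One way to make such imports more predictable is to tell read.csv() explicitly which strings should count as missing, using its `na.strings` argument. The sketch below is self-contained: the file contents are invented and passed in through the `text` argument, and the column names are illustrative, so it does not reproduce the actual import_txt_R.txt file.

```r
# Hypothetical file contents, inlined so no file on disk is needed
csv_lines <- c(
  "text1,text2,val1,val2",
  "a,b,1,2",
  "NA,NULL,3,4",
  ",x,NA,5"
)
csv_text <- paste(csv_lines, collapse = "\n")

# Default import: only the literal NA counts as missing in text columns
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(is.na(raw))

# Explicit import: NA, NULL and empty fields all count as missing
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  na.strings = c("NA", "NULL", ""))
print(is.na(clean))
```

With the default settings, the string NULL and the empty text field survive as ordinary values; with the explicit `na.strings` they become NA, so is.na() reports them consistently.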

As always, the code is available on GitHub. Happy coding!

Article information

Author: Pres. Lawanda Wiegand

Last Updated: 08/06/2023
