In this tutorial we will go over the match, identical, unique, and n_distinct functions. Let’s create three vectors to use as examples.
#Unique, Match, Identical, n_distinct
setA <- 5:10
setB <- c(5, 4, 7, 8, 9, 11)
setC <- 5:10
print(setA)
## [1] 5 6 7 8 9 10
print(setB)
## [1] 5 4 7 8 9 11
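We will use the match function to compare setA to setB. For each element of setA, match returns the position of its first occurrence in setB, or NA if the value does not appear there.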
#using match function
match(setA,setB)
## [1] 1 NA 3 4 5 NA
To count how many elements of setA have no match in setB, count the NA values:
sum(is.na(match(setA,setB)))
## [1] 2
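As a small follow-up sketch, the same comparison can show which values are missing rather than only how many. It uses base R's %in% operator and setdiff; the names in_b and missing_vals are illustrative only and do not appear elsewhere in this tutorial.

```r
setA <- 5:10
setB <- c(5, 4, 7, 8, 9, 11)

# %in% returns TRUE or FALSE for each element of setA (never NA)
in_b <- setA %in% setB

# keep only the elements of setA that do not appear in setB
missing_vals <- setA[!in_b]
print(missing_vals)
## [1] 6 10

# setdiff returns the same values, treating both vectors as sets
setdiff(setA, setB)
## [1] 6 10
```

%in% is often easier to read than is.na(match(...)) when you only need membership and not positions.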
identical returns a single TRUE or FALSE indicating whether two objects are exactly the same.
#using identical
identical(setA,setB)
## [1] FALSE
identical(setA,setC)
## [1] TRUE
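One point worth illustrating is that identical also compares storage types, not only values. The sketch below assumes two hypothetical vectors, int_vec and dbl_vec, that hold the same numbers as integers and as doubles. If you only care about the values, all.equal or an element-wise comparison may be a better fit.

```r
int_vec <- 5:10                     # integer vector
dbl_vec <- c(5, 6, 7, 8, 9, 10)     # double vector with the same values

# identical is strict: integer and double are different types
identical(int_vec, dbl_vec)
## [1] FALSE

# element-wise comparison of values
all(int_vec == dbl_vec)
## [1] TRUE

# all.equal compares numeric values within a tolerance
isTRUE(all.equal(int_vec, dbl_vec))
## [1] TRUE
```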
n_distinct, from the dplyr package, counts the unique values.
setD <- c(5,5,6,6,7)
print(setD)
## [1] 5 5 6 6 7
dplyr::n_distinct(setD)
## [1] 3
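If dplyr is not installed, the same count can be obtained with base R. This is only a sketch of an equivalent approach for a simple vector, not a description of how n_distinct is implemented.

```r
setD <- c(5, 5, 6, 6, 7)

# number of distinct values using base R only
length(unique(setD))
## [1] 3

# table() additionally shows how often each distinct value occurs
table(setD)
```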
Use unique to remove duplicates. It keeps the first occurrence of each value.
SetD1 <- unique(setD)
print(SetD1)
## [1] 5 6 7
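To see which elements unique drops, the base R function duplicated can help. The sketch below flags every element that has already appeared earlier in the vector.

```r
setD <- c(5, 5, 6, 6, 7)

# TRUE marks an element that repeats an earlier one
duplicated(setD)
## [1] FALSE TRUE FALSE TRUE FALSE

# dropping the flagged elements gives the same result as unique(setD)
setD[!duplicated(setD)]
## [1] 5 6 7
```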
You can use the names function in combination with identical or match to check whether two data frames have the same column names.
data("iris")
# head(iris)
iris_1 <- iris
iris_2 <- iris
identical(names(iris_1), names(iris_2))
## [1] TRUE
match(names(iris_1), names(iris_2))
## [1] 1 2 3 4 5
Now we will change the first column name of the second dataset and run the same lines of code as above.
names(iris_2)[1] <- "wrong"
names(iris_2)
## [1] "wrong" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
identical(names(iris_1), names(iris_2))
## [1] FALSE
match(names(iris_1), names(iris_2))
## [1] NA 2 3 4 5
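match shows the position of the mismatch, but not the names involved. As a possible next step, setdiff can list the names that appear in one data frame and not the other. The sketch repeats the setup from above so that it runs on its own.

```r
data("iris")
iris_1 <- iris
iris_2 <- iris
names(iris_2)[1] <- "wrong"

# names in iris_1 that are missing from iris_2
setdiff(names(iris_1), names(iris_2))
## [1] "Sepal.Length"

# names in iris_2 that are missing from iris_1
setdiff(names(iris_2), names(iris_1))
## [1] "wrong"
```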
We can remove duplicate rows in a data frame with the unique function. Note that iris itself contains one duplicated row, so the result has 149 rows rather than 150.
iris4 <- rbind(iris, iris_1)
paste0('numbers of rows: ',nrow(iris4))
## [1] "numbers of rows: 300"
iris4 <- unique(iris4)
paste0('numbers of rows after using unique function: ',nrow(iris4))
## [1] "numbers of rows after using unique function: 149"
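To check where a duplicate comes from, duplicated also works on data frames. The sketch below applies it to the original iris data to locate the repeated row.

```r
data("iris")

# number of rows that repeat an earlier row
sum(duplicated(iris))
## [1] 1

# index of the repeated row
which(duplicated(iris))
## [1] 143

# rows remaining after removing duplicates
nrow(unique(iris))
## [1] 149
```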