In this tutorial we will go over the match, identical, unique, and n_distinct functions. Let’s create three vectors to use as examples.

#Unique, Match, Identical, n_distinct 

setA <- c(5:10)

setB <- c(5,4,7,8,9,11)

setC <- c(5:10)

print(setA)
## [1]  5  6  7  8  9 10
print(setB)
## [1]  5  4  7  8  9 11

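We will use the match function to compare setA to setB. For each element of setA, match returns the position of its first occurrence in setB, or NA if the element does not appear in setB.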

#using match function
match(setA,setB)
## [1]  1 NA  3  4  5 NA

To find out how many elements of setA have no match in setB, count the NA values:

sum(is.na(match(setA,setB)))
## [1] 2
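
Counting the NA values tells us how many elements are unmatched, but not which ones. As a minimal sketch, the same NA positions can be used to pull out the unmatched values themselves. The names idx and unmatched are introduced here only for illustration.

```r
setA <- c(5:10)
setB <- c(5, 4, 7, 8, 9, 11)

# positions of setA's elements in setB (NA where there is no match)
idx <- match(setA, setB)

# keep only the elements of setA whose position is NA
unmatched <- setA[is.na(idx)]
print(unmatched)
## [1]  6 10

# %in% is a shorthand built on match that returns TRUE/FALSE instead of positions
setA %in% setB
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE
```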

identical returns a single TRUE or FALSE that tells you whether two objects are exactly the same.

#using identical 
identical(setA,setB)
## [1] FALSE
identical(setA,setC)
## [1] TRUE
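
One thing to keep in mind is that identical is strict: it compares the storage type as well as the values. The sketch below uses a hypothetical vector, setE, that holds the same numbers as setA but is stored as double rather than integer. all.equal is shown as one possible alternative when only the values matter.

```r
setA <- c(5:10)               # created with ":" so it is stored as integer
setE <- c(5, 6, 7, 8, 9, 10)  # hypothetical vector, same values stored as double

typeof(setA)
## [1] "integer"
typeof(setE)
## [1] "double"

# same values, different types, so identical returns FALSE
identical(setA, setE)
## [1] FALSE

# all.equal compares the numeric values and ignores the integer/double difference
isTRUE(all.equal(setA, setE))
## [1] TRUE

# converting to a common type makes identical return TRUE
identical(setA, as.integer(setE))
## [1] TRUE
```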

n_distinct, from the dplyr package, counts the unique values.

setD <- c(5,5,6,6,7)
print(setD)
## [1] 5 5 6 6 7
dplyr::n_distinct(setD)
## [1] 3

Use unique to remove duplicate values from a vector.

SetD1 <- unique(setD)
print(SetD1)
## [1] 5 6 7
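
If dplyr is not available, the same count can be reproduced in base R, and duplicated shows which elements unique would drop. This is only a sketch of one equivalent approach, not the only way to do it.

```r
setD <- c(5, 5, 6, 6, 7)

# duplicated flags every element that repeats an earlier one
duplicated(setD)
## [1] FALSE  TRUE FALSE  TRUE FALSE

# base R count of distinct values: the length of the de-duplicated vector
length(unique(setD))
## [1] 3
```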

You can use the names function in combination with identical or match to check whether two data frames have the same column names.

data("iris")
# head(iris)

iris_1 <- iris
iris_2 <- iris

identical(names(iris_1), names(iris_2))
## [1] TRUE
match(names(iris_1), names(iris_2))
## [1] 1 2 3 4 5

Now we will change the first column name in the second dataset and run the same lines of code as above.

names(iris_2)[1] <- "wrong"
names(iris_2)
## [1] "wrong"        "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
identical(names(iris_1), names(iris_2))
## [1] FALSE
match(names(iris_1), names(iris_2))
## [1] NA  2  3  4  5
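
match shows the position of the mismatch, but it can also be handy to see the offending names directly. As a small sketch, base R's setdiff lists the names that appear in one data frame and not the other.

```r
data("iris")
iris_1 <- iris
iris_2 <- iris
names(iris_2)[1] <- "wrong"

# column names in iris_1 that are missing from iris_2
setdiff(names(iris_1), names(iris_2))
## [1] "Sepal.Length"

# column names in iris_2 that are missing from iris_1
setdiff(names(iris_2), names(iris_1))
## [1] "wrong"
```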

We can remove duplicate rows in a data frame with the unique function.

iris4 <- rbind(iris, iris_1)
paste0('number of rows: ', nrow(iris4))
## [1] "number of rows: 300"
iris4 <- unique(iris4)
paste0('number of rows after using unique function: ', nrow(iris4))
## [1] "number of rows after using unique function: 149"
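
The result is 149 rather than 150 because the original iris data already contains one row that repeats an earlier row, and unique removes that one too. A quick check with duplicated, sketched below, confirms this.

```r
data("iris")

# number of rows in iris that repeat an earlier row
sum(duplicated(iris))
## [1] 1

# index of the repeated row
which(duplicated(iris))
## [1] 143

# so unique on iris alone already drops one row
nrow(unique(iris))
## [1] 149
```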