This tutorial shows several ways to summarize a single column of a data frame.
Let's load a sample dataset.
data("iris")
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
Let's look at the unique species names.
unique(iris$Species)
## [1] setosa versicolor virginica
## Levels: setosa versicolor virginica
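The Levels line appears because Species is stored as a factor. If plain strings are more convenient, a small sketch using only base R (the variable name species_names is just illustrative):

```r
data("iris")

# unique() on a factor returns a factor, so convert the result
# to character to get plain strings without the Levels line.
species_names <- as.character(unique(iris$Species))
species_names
```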
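Let's count the distinct species in the iris dataset with n_distinct() from dplyr, which library(tidyverse) loads. This matters more on a larger dataset, where the unique values are too many to count by eye.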
library(tidyverse)
n_distinct(iris$Species)
## [1] 3
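If loading the tidyverse is not an option, the same count can be had in base R. A minimal sketch, assuming nothing beyond base R (n_species is an illustrative name):

```r
data("iris")

# Base R equivalent of n_distinct(): count the unique values.
n_species <- length(unique(iris$Species))
n_species

# For a factor column, nlevels() counts the declared levels instead,
# which can include levels that no longer occur in the data.
nlevels(iris$Species)
```

For iris both calls agree, but on a filtered data frame nlevels() may report more levels than actually appear, so length(unique()) is usually the safer choice.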
Let's break down the species by the number of times each one appears in the dataset.
table(iris$Species)
##
##     setosa versicolor  virginica
##         50         50         50
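On a column with many values, it can help to sort the counts or to get them back as a data frame. A sketch, assuming dplyr is installed (the names counts_sorted and counts_df are illustrative):

```r
library(dplyr)
data("iris")

# Sort the frequency table so the most common values come first.
counts_sorted <- sort(table(iris$Species), decreasing = TRUE)
counts_sorted

# The same counts as a data frame: one row per species,
# with the count in a column named n.
counts_df <- count(iris, Species, sort = TRUE)
counts_df
```

The data frame form is easier to feed into further dplyr or ggplot2 steps, while table() is quicker for a one-off look.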
Now let's show the same breakdown as proportions of the total.
prop.table(table(iris$Species))
##
##     setosa versicolor  virginica
##  0.3333333  0.3333333  0.3333333
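prop.table() returns proportions between 0 and 1. To report them as percentages, one option is to multiply by 100 and round. A minimal base R sketch (pct is an illustrative name, and one decimal place is an arbitrary choice):

```r
data("iris")

# Convert proportions to percentages, rounded to one decimal place.
pct <- round(100 * prop.table(table(iris$Species)), 1)
pct
```

Because of rounding, the percentages may not add up to exactly 100.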