Examples of Built-in Datasets in R

R ships with many built-in datasets. They live in the datasets package, which is part of the base installation and is attached by default, so they are available in every session. These datasets are commonly used for testing and learning purposes. Here are some examples of built-in datasets in R:

iris: This dataset contains the measurements of the length and width of the petals and sepals of three different species of iris flowers.
mtcars: This dataset contains the fuel consumption and ten other design and performance characteristics of 32 car models.
ChickWeight: This dataset contains information about the weight of chicks over time on different diets.
airquality: This dataset contains information about the air quality measurements in New York from May to September 1973.
faithful: This dataset contains the eruption durations of the Old Faithful geyser in Yellowstone National Park, along with the waiting times between eruptions.
trees: This dataset contains the girth, height, and timber volume of 31 felled black cherry trees.
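CO2: This dataset contains measurements of carbon dioxide uptake in grass plants at different ambient CO2 concentrations. (The atmospheric CO2 concentrations measured over time at Mauna Loa are in a separate dataset named co2, in lowercase.)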
InsectSprays: This dataset contains counts of insects in experimental plots treated with six different insect sprays.
PlantGrowth: This dataset contains the dried weight of plants grown under a control condition and two treatment conditions.
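The datasets listed above can be explored directly from the console. The following is a minimal sketch, assuming a standard R installation with the datasets package available; the variable name available is only illustrative.

```r
# List the names of the datasets shipped in the 'datasets' package.
# data() returns an object whose $results matrix has an "Item" column.
available <- data(package = "datasets")$results[, "Item"]
head(available)

# Inspect the structure of one dataset without printing every row
str(CO2)

# Check the number of rows and columns of two of the datasets named above
dim(iris)
dim(mtcars)
```

Running str() before working with a dataset shows the type of every column, which helps to spot non-numeric columns such as factors early.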

Example of Using a Built-in Dataset

In the code below, we first load the iris dataset using the data() function. We then use head() to view the first few rows of the dataset, and summary() to get a summary of the dataset, which includes the minimum, maximum, mean, median, and quartile values for each numeric column and the number of observations in each level of a factor column such as Species.

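We use the apply() function to calculate the mean and standard deviation of the columns in the dataset. The apply() function applies a function to either the rows or columns of a matrix or data frame, and the 2 argument tells it to apply the function to the columns. Because apply() first converts a data frame to a matrix, a single non-numeric column such as Species turns every value into a character string, and mean() then returns NA with a warning for every column. The function should therefore be given only the four numeric measurement columns.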

# Load the iris dataset
data(iris)

# View the first few rows of the dataset
head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

# Get a summary of the dataset
summary(iris)

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
##

# Calculate the mean of each numeric column (Species is a factor, so it is excluded)
apply(iris[, 1:4], 2, mean)

## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##     5.843333     3.057333     3.758000     1.199333

# Calculate the standard deviation of each numeric column
apply(iris[, 1:4], 2, sd)

## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##    0.8280661    0.4358663    1.7652982    0.7622377
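Since apply() converts a data frame to a matrix, it can help to pick out the numeric columns programmatically rather than by position. The following is a minimal sketch of one way to do that with sapply(); the names numeric_cols, col_means, and col_sds are illustrative and not part of the original example.

```r
# Flag the numeric columns of iris (Species is a factor, so it is FALSE)
numeric_cols <- sapply(iris, is.numeric)

# Column means and standard deviations for the numeric columns only
col_means <- sapply(iris[, numeric_cols], mean)
col_sds   <- sapply(iris[, numeric_cols], sd)

round(col_means, 3)
round(col_sds, 3)

# Mean of each measurement within each species
aggregate(. ~ Species, data = iris, FUN = mean)
```

Unlike apply(), sapply() works on the data frame column by column, so each column keeps its own type and no coercion to character takes place.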

Visualize the Air Quality Data in R

You can load the airquality dataset using the data() function and then create a scatter plot of Ozone vs. Wind using the plot() function, with “Wind Speed” and “Ozone Concentration” as the x and y labels and “Air Quality Data” as the main title.

A regression line is added to the plot using the abline() function, which takes the output of the lm() function as an argument. The lm() function fits a linear regression model to the data, which we use to create the regression line. We also specify the color of the line as “red” using the col argument.

# Load the airquality dataset
data(airquality)

# Create a scatter plot of Ozone vs. Wind
plot(airquality$Wind, airquality$Ozone, xlab = "Wind Speed", ylab = "Ozone Concentration", main = "Air Quality Data")

# Add a regression line to the plot
abline(lm(airquality$Ozone ~ airquality$Wind), col = "red")
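The Ozone column contains missing values, so the regression line is fitted to fewer rows than the dataset holds. The following is a minimal sketch for inspecting the fitted model, assuming R's default setting in which lm() omits rows with missing values; the name fit is illustrative.

```r
# Count the missing Ozone readings; with the default na.action,
# lm() drops these rows before fitting
sum(is.na(airquality$Ozone))

# Fit the same model using the data argument, which keeps the formula short
fit <- lm(Ozone ~ Wind, data = airquality)

# Intercept and slope of the fitted line
coef(fit)

# Number of observations actually used in the fit
nobs(fit)
```

Passing the stored model to abline(fit, col = "red") draws the same line as the call above, and keeping fit around makes it easy to follow up with summary(fit).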

Builtin Data Set in R

Suborna Ahmed

2023-02-26
