R ships with a collection of built-in datasets in the datasets package, which is part of the base installation and attached by default. These datasets are commonly used for testing and learning purposes. Here are some examples of built-in datasets in R:
iris: This dataset contains the measurements of the length and width of the petals and sepals of three different species of iris flowers.
mtcars: This dataset, taken from the 1974 Motor Trend magazine, contains the fuel consumption and ten other design and performance characteristics of 32 car models.
ChickWeight: This dataset contains information about the weight of chicks over time on different diets.
airquality: This dataset contains daily air quality measurements in New York from May to September 1973.
faithful: This dataset contains the eruption durations of the Old Faithful geyser in Yellowstone National Park, together with the waiting times between eruptions.
trees: This dataset contains the girth, height, and timber volume of 31 felled black cherry trees.
CO2: This dataset contains measurements of carbon dioxide uptake in grass plants from Quebec and Mississippi at several ambient carbon dioxide concentrations. (The atmospheric carbon dioxide time series from Mauna Loa is a separate dataset, spelled in lowercase: co2.)
InsectSprays: This dataset contains the counts of insects found in agricultural plots treated with six different insect sprays.
PlantGrowth: This dataset contains the dried weight of plants grown under a control condition and two treatment conditions.
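To see which datasets are available on a given installation, the catalogue of the datasets package can be queried directly. The following is a minimal sketch; the variable name available is only illustrative, and the exact list depends on the R version.

```r
# List the names of the datasets shipped in the 'datasets' package.
# data(package = ...) returns an object whose $results matrix has an "Item" column.
available <- data(package = "datasets")$results[, "Item"]
head(available)

# Built-in datasets can be used directly by name
dim(iris)      # number of rows and columns
names(mtcars)  # column names

# help("iris") opens the documentation page describing the dataset
```

Reading the help page of a dataset before using it is a quick way to check what each column measures and in which units.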
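In the code below, we first load the iris dataset using the data() function. (This step is optional, because the built-in datasets are already available by name, but it makes the intent explicit.) We then use head() to view the first few rows of the dataset, and summary() to get a summary of the dataset, which includes the minimum, maximum, mean, median, and quartile values for each numeric column and the count of each level for the Species factor.
We use the apply() function to calculate the mean and standard deviation of each numeric column in the dataset. The apply() function applies a function to either the rows or columns of a matrix or data frame, and the 2 argument tells it to apply the function to the columns. Note that apply() converts a data frame to a matrix first, so it should be given only the numeric columns: if the Species factor is included, every value is converted to a character string, and mean() returns NA with a warning.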
# Load the iris dataset
data(iris)
# View the first few rows of the dataset
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
# Get a summary of the dataset
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
# Calculate the mean of each numeric column (columns 1 to 4; Species is a factor)
apply(iris[, 1:4], 2, mean)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 5.843333 3.057333 3.758000 1.199333
# Calculate the standard deviation of each numeric column
apply(iris[, 1:4], 2, sd)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 0.8280661 0.4358663 1.7652982 0.7622377
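Because apply() first converts the data frame to a matrix, a factor column such as Species forces every value to character, which is why mean() returns NA when the full data frame is passed. One possible alternative, sketched below, is to select the numeric columns by type rather than by position. The variable names (numeric_cols, col_means, col_sds, species_means) are illustrative only.

```r
# Keep only the numeric columns, selected by type rather than by position
numeric_cols <- iris[sapply(iris, is.numeric)]

# Column means and standard deviations without going through a matrix
col_means <- colMeans(numeric_cols)
col_sds <- sapply(numeric_cols, sd)
col_means
col_sds

# The factor column is better used for grouping: mean of each measurement per species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means
```

Selecting by type keeps the code working if the column order changes, and sapply() works on the columns of a data frame directly, so no coercion to a matrix takes place.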
You can load the airquality dataset using the data() function and then create a scatter plot of Ozone against Wind using the plot() function, with “Wind Speed” and “Ozone Concentration” as the x and y labels, and “Air Quality Data” as the main title.
A regression line is added to the plot using the abline() function, which takes the output of the lm() function as an argument. The lm() function fits a linear regression model to the data, which we use to create the regression line. The Ozone column contains missing values, and by default lm() leaves those rows out of the fit. We also specify the color of the line as “red” using the col argument.
# Load the airquality dataset
data(airquality)
# Create a scatter plot of Ozone vs. Wind
plot(airquality$Wind, airquality$Ozone, xlab = "Wind Speed", ylab = "Ozone Concentration", main = "Air Quality Data")
# Add a regression line to the plot
abline(lm(airquality$Ozone ~ airquality$Wind), col = "red")
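The fitted model behind the regression line can also be stored and inspected on its own. The sketch below uses the data argument of lm() instead of the $ notation, which fits the same model but makes prediction on new data straightforward. The names fit, new_wind and pred are illustrative, and the wind speeds passed to predict() are arbitrary example values.

```r
# Fit the same linear model, referring to columns through the data argument
fit <- lm(Ozone ~ Wind, data = airquality)

# Intercept and slope of the regression line
coef(fit)

# Number of rows actually used; rows with a missing Ozone value are dropped by default
nobs(fit)

# Predicted ozone concentration for a few example wind speeds
new_wind <- data.frame(Wind = c(5, 10, 15))
pred <- predict(fit, newdata = new_wind)
pred
```

Storing the model in a variable also means it can be passed to abline() for plotting and to summary() for a full report without fitting it twice.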