# load libraries
library(dplyr)
library(ggplot2)
library(stringr)
# set URL for downloading data
.dataURL <- paste('https://raw.githubusercontent.com',
                  'rfordatascience',
                  'tidytuesday',
                  'master',
                  'data',
                  '2023',
                  '2023-01-17',
                  'artists.csv',
                  sep = "/")
# download data
artists <- readr::read_csv(.dataURL)

Tidy Tuesday is a weekly data project run by the R4DS online community (Mock 2022). It is intended as a platform for R users to improve their skills in using the tidyverse ecosystem for data-related tasks such as data manipulation and visualisation. I have known about the community for months and have wanted to participate ever since. This is my first time actually working with the data provided by the organizers, so I will use only simple techniques. The main goal here is to get accustomed to the environment and to practise writing and coding quickly.
The topic for this week is art history data from the arthistory data package by Lemus and Stam (2022), which contains data used for a thesis titled Quantifying Art Historical Narratives by Stam (2022). The package was intended to survey the demographic trends among artists in two of the most popular art history textbooks in America, Janson’s History of Art and Gardner’s Art Through the Ages.
Preparation
The first thing to do is load the relevant libraries. Here, I used several packages from the tidyverse, specifically dplyr for data manipulation, ggplot2 for data visualisation, and stringr for string manipulation. I also used a function from the readr package, read_csv, to download the data from the TidyTuesday GitHub repository.
Exploration and Cleaning
The next step is a simple exploration of the data. The most basic function for this is summary from base R.
summary(artists)

 artist_name        edition_number        year      artist_nationality
 Length:3162        Min.   : 1.000   Min.   :1926   Length:3162
 Class :character   1st Qu.: 5.000   1st Qu.:1986   Class :character
 Mode  :character   Median : 8.000   Median :1996   Mode  :character
                    Mean   : 8.223   Mean   :1994
                    3rd Qu.:12.000   3rd Qu.:2009
                    Max.   :16.000   Max.   :2020
 artist_nationality_other artist_gender      artist_race
 Length:3162              Length:3162        Length:3162
 Class :character         Class :character   Class :character
 Mode  :character         Mode  :character   Mode  :character
 artist_ethnicity       book           space_ratio_per_page_total
 Length:3162        Length:3162        Min.   :0.0946
 Class :character   Class :character   1st Qu.:0.3082
 Mode  :character   Mode  :character   Median :0.4093
                                       Mean   :0.5301
                                       3rd Qu.:0.5941
                                       Max.   :3.7967
 artist_unique_id moma_count_to_year whitney_count_to_year artist_race_nwi
 Min.   :  1.0    Min.   : 0.000     Min.   : 0.000        Length:3162
 1st Qu.:108.0    1st Qu.: 0.000     1st Qu.: 0.000        Class :character
 Median :189.0    Median : 1.000     Median : 0.000        Mode  :character
 Mean   :201.8    Mean   : 4.306     Mean   : 1.957
 3rd Qu.:305.8    3rd Qu.: 5.000     3rd Qu.: 0.000
 Max.   :413.0    Max.   :64.000     Max.   :40.000
Furthermore, purrr::map is a helpful and versatile function for exploration. Specifically, I wanted to see the unique values of each variable, which led me to find missing data represented as character values, e.g. "N/A", "N/A1", "N/A2", etc. Since the output was long, I print only the artist_race column.
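To show what this kind of check looks like on a small scale, here is a minimal sketch on made-up data. The toy data frame and the na_like name are my own illustration and not part of the Tidy Tuesday dataset. It lists the unique values per column and then counts how many "N/A"-style placeholders each column holds.

```r
library(stringr)

# made-up data that mimics the placeholder pattern described above
toy <- data.frame(
  artist_race   = c("White", "N/A", "N/A1", "Black or African American"),
  artist_gender = c("Male", "Female", "N/A", "Male")
)

# unique values of every column, returned as a named list
purrr::map(toy, unique)

# count the values starting with "N/A" in each column
# (na_like is a hypothetical name used only for this sketch)
na_like <- purrr::map_int(toy, \(x) sum(str_detect(x, "^N/A")))
na_like
```

A count like this is a quick way to judge how much data will be affected before replacing the placeholders with NA.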
# find unique values for each column
purrr::map(artists, unique)$artist_race

Another function from the purrr::map family is purrr::map_if, which applies a function only to specific columns. Here, all missing values represented as character strings were replaced with NA, and rows containing any NA were then dropped.
# replace missing values with `NA`
artists <- artists |>
  purrr::map_if(is.character,
                function(x) {
                  str_replace_all(x, "^N/A.*", "") |>
                    na_if("")
                }) |>
  as_tibble() |>
  na.omit()

Data Visualisation
To summarise the data visually, I tried to replicate some of the figures in Stam (2022). The first shows the number of artists in Gardner’s Art Through the Ages. The steps for creating the graph are:
- filter the work of Gardner’s Art Through the Ages.
- summarise the number of artists in the book, grouped by year, and store it in the count variable.
- create a plot using the ggplot function by assigning year as the x-axis and count as the y-axis.
- use geom_col to add the graphical element of the bar chart.
- use geom_text to add a label for each count.
- for more customization, theme_minimal is used.
- labels are added using the labs function.
# Visualising artist count of Gardner's Art Through the Ages
artists |>
  filter(book == "Gardner") |>
  group_by(year) |>
  summarise(count = n()) |>
  ggplot(aes(x = year, y = count)) +
  geom_col(width = 2, fill = "#43ac65") +
  geom_text(aes(label = count, y = count + 10), size = 2.5) +
  theme_minimal() +
  labs(title = "Overall Count of Artists in Gardner's Art Through the Ages",
       x = "Year of Publication",
       y = "Count")
The next graph describes the gender distribution of the artists. This can be achieved through the following steps.
- filter the work of Gardner’s Art Through the Ages.
- create a plot using the ggplot function by assigning year as the x-axis and artist_gender as the fill component of the graph.
- use geom_bar to add the graphical element of the bar chart.
- use geom_hline to add a horizontal line with the average male proportion as the y value.
- for more customization, theme_minimal is used.
- labels are added using the labs function.
# calculate the average proportion of male artists
gardner_male_avg <- mean(filter(artists, book == "Gardner")$artist_gender == "Male")

# Visualising gender distribution of Gardner's Art Through the Ages
# (`linewidth` replaces `size` for lines, which was deprecated in ggplot2 3.4.0)
artists |>
  filter(book == "Gardner") |>
  ggplot(aes(x = year, fill = artist_gender)) +
  geom_bar(position = "fill", width = 2) +
  geom_hline(yintercept = gardner_male_avg, linewidth = 1) +
  scale_fill_manual(values = c("#aa4365", "#4365aa")) +
  theme_minimal() +
  labs(title = "Gender of Artists in Gardner's Art Through the Ages",
       x = "Year of Publication",
       y = "Proportion",
       fill = "Artist Gender")

Using the same methods, the code could also be adapted to create the visualisations for the Janson’s History of Art data.
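One way to do this might be to wrap the gender plot in a small function that takes the book name as an argument. This is only a sketch: plot_gender_by_book is a hypothetical helper of my own, the toy tibble is made up so the example runs without downloading anything, and I am assuming the book column labels the second textbook as "Janson". It also assumes ggplot2 3.4.0 or later for the linewidth argument.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (not part of the original analysis): draws the
# gender proportion chart for the textbook named in `book_name`.
plot_gender_by_book <- function(data, book_name) {
  book_data <- filter(data, book == book_name)

  # average proportion of male artists over all rows of this book
  male_avg <- mean(book_data$artist_gender == "Male")

  ggplot(book_data, aes(x = year, fill = artist_gender)) +
    geom_bar(position = "fill", width = 2) +
    geom_hline(yintercept = male_avg, linewidth = 1) +
    scale_fill_manual(values = c("#aa4365", "#4365aa")) +
    theme_minimal() +
    labs(title = paste("Gender of Artists in", book_name),
         x = "Year of Publication",
         y = "Proportion",
         fill = "Artist Gender")
}

# made-up rows, only to show the call; with the real data this would be
# plot_gender_by_book(artists, "Janson")
toy <- tibble(
  book          = c("Janson", "Janson", "Janson", "Gardner"),
  year          = c(1963, 1963, 1977, 1926),
  artist_gender = c("Male", "Female", "Male", "Male")
)
p <- plot_gender_by_book(toy, "Janson")
```

Passing the book name as an argument keeps the two charts consistent, since both are then drawn with exactly the same layers and colours.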


Wrap up
Using the code above, I could recreate the visualisations by Stam (2022). I used mostly functions from the ggplot2 and dplyr packages for data visualisation and manipulation. The tidyverse provides its users with many functions that can handle most data science and analysis tasks. Finally, Tidy Tuesday creates a safe and supportive environment for R users to learn and practise using its functionality.