Bachelor

Bachelor Analysis

During this project, I will be analyzing different aspects of the Bachelor and the Bachelorette data. First, I will install the packages needed for this project.

library(tidyverse)
library(ggplot2)
library(tidytext)
library(leaflet)
library(readr)
library(readxl)
library(quarto)

Next, I will download my external data. I downloaded this data from Kaggle.com.(https://www.kaggle.com/datasets/rachelleperez/the-bachelor-vs-the-bachelorette)

contestants <- read_csv("~/Desktop/The.Bachelor.VS.The.Bachelorette/contestants.csv")
seasons <- read_csv("~/Desktop/The.Bachelor.VS.The.Bachelorette/seasons.csv")
episodes <- read_csv("~/Desktop/The.Bachelor.VS.The.Bachelorette/episodes.csv")
specials <- read_csv("~/Desktop/The.Bachelor.VS.The.Bachelorette/specials.csv")
episodesedit <- read_excel("~/Desktop/The.Bachelor.VS.The.Bachelorette/Book1.xlsx")
Final <- read_excel("~/Desktop/The.Bachelor.VS.The.Bachelorette/Book2.xlsx")

I will first be looking at the average viewers by season. This will allow me to see which season had the most viewers and was watched by the most amount of people I believe one of the later seasons will have the highest amount of viewers because the franchise is growing. I will be making a bar plot to better visualize the average viewers per season.

episodes |> 
  na.omit() |> 
  group_by(Season) |> 
  summarize(total_viewership = sum(`US. Viewers (millions)`)) |> 
  ungroup() |> 
  ggplot(aes(Season, total_viewership, fill = total_viewership)) + geom_col() + 
  geom_text(aes(label = Season), vjust = 2, color = "white")

My hypothesis that one of the later seasons will have the highest viewership was incorrect. The season with the highest viewership was season 2.

I will be looking which each week on different seasons to see which week is the most viewed. Each week,in different seasons, in the Bachelor franchise, is set up the same way.The same events happen in specific weeks each season. By analyzing which week has the highest amount of viewers, I will be able to see which specific event is the most viewed/well-liked. I believe the most viewed week will be the “After the Final Rose” or the “Final Rose.” I will be making a bar chart to better visualize the average viewers by week.

total_viewers_by_week <- episodesedit |> 
  mutate(`US. Viewers (millions)` = as.numeric(`US. Viewers (millions)`)) %>%
  group_by(Title) %>%
  summarise(total_viewers = sum(`US. Viewers (millions)`, na.rm = TRUE), .groups = "drop")

total_viewers_by_week |>
  na.omit() |>
  ggplot(aes(reorder(Title, total_viewers),total_viewers, fill = total_viewers)) + geom_col() + coord_flip()  +
  labs(title = "Total Viewers by Week",
       x = "Week",
       y = "Total Viewers (Millions)")

“Week 6” has the highest average amount of viewers throughout every season. “Week 6 is the week of”hometowns.” In this week, the bachelor/bachelorette visits all the remaining contestants hometowns and meets there family. I will be analyzing the different hometowns in the data. I hypothesize the majority of the contestants will be from major metropolitan areas. I will get a count of each hometown of the contestants that are not eliminated by hometowns. I will make a bar chart to better visualize the different hometowns that are visited during hometowns.

contestants |>
  count(Hometown) -> count_hometown

count_hometown |> 
  arrange(desc(n)) -> hometown_count

contestants |> 
  filter(Eliminated == "Winner") -> winners

winners |> 
  count(Hometown) -> count_hometown

count_hometown |> 
  arrange(desc(n)) -> hometown_count_winners

hometown_count |> 
  head(20) |> 
  ggplot(aes(Hometown, n, fill = n)) + geom_col() + coord_flip()

Next, I wanted to see where the winners from the franchise and the bachelor/ bachelor are from. I made a map to visualize this. I also made the map to show if the couple is still together, what show in the franchise they were on, and who the contestant is.

show_color <- colorFactor(c("blue", "yellow", "pink"), Final$Show)
together_color <- colorFactor(c("red", "green"), Final$`Still together`)

leaflet() |>
  addTiles() |>
  setView( -98.5795,39.8283, zoom=4) |>
  addCircleMarkers(
    data = Final,
    lat =Final$X,
    lng= Final$Y,
    color = ~show_color(Final$Show),
    opacity = .8,
    fillColor = ~together_color(Final$`Still together`),
    fillOpacity = 3,
    label = paste0(
      "Contestant: ", Final$Bachelor, " , ",
      "Still Together: ", Final$`Still together`)
  )

After seeing this map, I recognized there are not many couples who are still together that were on the show. For my final chart, I wanted to see how many contestants are still together and how many are broken up. I did this by making a bar chart.

seasons |>
  na.omit() |> 
  count(`Still together`, sort = TRUE) |> 
  ggplot(aes(`Still together`, n, fill=n)) + geom_col() 

There are only 5 couples who are still together from the bachelor.

While analyzing many different aspects of the Bachelor franchise, I made many observations. I first observed that the most viewed season was season 2. Next I looked into what week of the bachelor has the most viewership. I found that week 6 had the most viewership. Next, I looked into the hometowns of the winners of each season and the bachelor/bachelorette of each season. I found that most of hometowns feature major metropolitan areas. I made a map to visualize this. Finally, I made a bar chart to visualize how many of the couples from the bachelor are still together. I found many interesting observations throughout this full analysis of aspects from the bachelor franchise. This will help people investigate key components of the show and observe interesting findings.