In this project, I analyze the KJV Bible across five subdivisions: history, poetry, prophecy, gospels, and epistles. I want to find the most common words in each subdivision and compare the sentiment of each. I expect poetry to have the most positive sentiment, and either prophecy or history to have the most negative.
First, I have to load all my packages and import my text.
library(tidytext)  # provides unnest_tokens(), used below
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 31102 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): citation, book, text
dbl (2): chapter, verse
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
kjv |> unnest_tokens(word, text) -> kjv2
As an overview, I would like to see the most common words of the entire Bible before breaking it down by subdivision.
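The overview count can be sketched roughly as follows. This is a minimal illustration on a toy data frame, not the code used for the report: `kjv_toy` and `overall_common` are hypothetical names, and removing tidytext's built-in `stop_words` is an assumption about how filler words such as "the" and "and" were kept out of the results.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the kjv data frame (the real one has 31,102 verses)
kjv_toy <- tibble::tibble(
  book = c("Genesis", "Genesis", "John"),
  text = c(
    "In the beginning God created the heaven and the earth.",
    "And God said, Let there be light: and there was light.",
    "For God so loved the world"
  )
)

overall_common <- kjv_toy |>
  unnest_tokens(word, text) |>            # one lowercase word per row
  anti_join(stop_words, by = "word") |>   # assumption: drop common stop words
  count(word, sort = TRUE) |>             # frequency table, most common first
  head(10)

overall_common
```

On the full data, the same pipeline would start from `kjv` instead of `kjv_toy`.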
Next, I would like to see the most common words in each of the five sections the Bible is split into. I will also make a chart for each section so the top ten words are easy to compare. I limit each section to its top ten words to keep the charts readable.
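One way the per-subdivision data frames could be built is with a lookup table that maps each book to a subdivision. The sketch below is an assumption about that step, not the report's actual code: `kjv2_toy` is a toy stand-in for `kjv2`, and the `subdivisions` table is hypothetical and deliberately incomplete, listing only one example book per subdivision.

```r
library(dplyr)

# Toy tokenized data standing in for kjv2 (one row per word, book column kept)
kjv2_toy <- tibble::tibble(
  book = c("Joshua", "Psalms", "Isaiah", "John", "Romans", "Joshua"),
  word = c("israel", "heart", "lord", "jesus", "spirit", "children")
)

# Hypothetical lookup table: one example book per subdivision, not a full list
subdivisions <- tibble::tribble(
  ~book,    ~subdivision,
  "Joshua", "History",
  "Psalms", "Poetry",
  "Isaiah", "Prophecy",
  "John",   "Gospels",
  "Romans", "Epistles"
)

# Attach the subdivision label to every token
kjv2_labeled <- kjv2_toy |>
  left_join(subdivisions, by = "book")

# Split into the data frames used by the charts
History_Books <- kjv2_labeled |> filter(subdivision == "History")
Poetry_Books  <- kjv2_labeled |> filter(subdivision == "Poetry")
```

A full version would need a row for every book, and the placement of books that do not fit neatly into one category is a judgment call.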
# History: top ten words and bar chart
History_Books |>
  count(word, sort = TRUE) |>
  head(10) |>
  arrange(desc(n)) -> History_Common_Words

History_Common_Words |>
  ggplot(aes(word, n, fill = word)) +
  geom_col() +
  theme_classic() +
  labs(title = "Most Common History Words")

# Poetry: top ten words and bar chart
Poetry_Books |>
  count(word, sort = TRUE) |>
  head(10) |>
  arrange(desc(n)) -> Poetry_Common_Words

Poetry_Common_Words |>
  ggplot(aes(word, n, fill = word)) +
  geom_col() +
  theme_classic() +
  labs(title = "Most Common Poetry Words")

# Prophecy: top ten words and bar chart
Prophecy_Books |>
  count(word, sort = TRUE) |>
  head(10) |>
  arrange(desc(n)) -> Prophecy_Common_Words

Prophecy_Common_Words |>
  ggplot(aes(word, n, fill = word)) +
  geom_col() +
  theme_classic() +
  labs(title = "Most Common Prophecy Words")

# Gospels: top ten words and bar chart
Gospels_Books |>
  count(word, sort = TRUE) |>
  head(10) |>
  arrange(desc(n)) -> Gospels_Common_Words

Gospels_Common_Words |>
  ggplot(aes(word, n, fill = word)) +
  geom_col() +
  theme_classic() +
  labs(title = "Most Common Gospel Words")

# Epistles: top ten words and bar chart
Epistles_Books |>
  count(word, sort = TRUE) |>
  head(10) |>
  arrange(desc(n)) -> Epistles_Common_Words

Epistles_Common_Words |>
  ggplot(aes(word, n, fill = word)) +
  geom_col() +
  theme_classic() +
  labs(title = "Most Common Epistle Words")
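Since the same count-and-plot steps repeat for every subdivision, they could also be wrapped in a small function. The sketch below is one possible refactoring, not the code used for the report: `plot_common_words` and `toy_tokens` are hypothetical names, and ordering the bars by count with `reorder()` is an added design choice.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: count words in a tokenized data frame and plot the top n
plot_common_words <- function(tokens, title, n_words = 10) {
  tokens |>
    count(word, sort = TRUE) |>
    head(n_words) |>
    ggplot(aes(reorder(word, n), n, fill = word)) +
    geom_col(show.legend = FALSE) +
    coord_flip() +                 # horizontal bars keep the word labels readable
    theme_classic() +
    labs(title = title, x = "word", y = "count")
}

# Toy tokens for illustration only
toy_tokens <- tibble::tibble(
  word = c("lord", "lord", "israel", "king", "lord", "israel")
)

p <- plot_common_words(toy_tokens, "Most Common History Words")
```

With the real data, a call such as `plot_common_words(History_Books, "Most Common History Words")` would stand in for one of the repeated blocks.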
After viewing the most common words in each subdivision, I noticed both similarities and differences. Every subdivision shares words such as “God”, “Lord”, “ye”, and “thy”. However, each subdivision also has words that make it unique.
The history subdivision has words such as “Israel” and “children.” “Israel” makes sense because these books explain the context and setting of the Bible. “Children” is more interesting: it is not an uncommon word, yet this is the only subdivision where it appears in the top ten. Nouns are prominent among the common words in history. The poetry subdivision has words such as “heart” and “wicked.” In contrast to the nouns of the history subdivision, these are more emotional words that carry sentiment. The prophecy subdivision seems most similar to the history subdivision. It also has “Israel” in the top ten, along with a large number of nouns.
The gospel subdivision has the most variations on the names of God among its common words: “Jesus”, “Lord”, “Father”, and “God” are all in the top ten. Finally, the epistles subdivision has words fairly similar to the rest, but “spirit” stood out to me as one of its top ten. This could be because the term “Holy Spirit” is used often in the epistles. I made a bar plot for each subdivision to visualize these results.
Having looked at the most common words, I next want to see the sentiment in each subdivision. This will show which subdivision carries the most emotion. I will list the top ten negative and positive sentiment words for each subdivision. I will also find the average sentiment of each subdivision and create a bar plot at the end.
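The sentiment steps described above can be sketched as a join against a numeric sentiment lexicon followed by a grouped average. This is a minimal illustration under stated assumptions: the rows in `tokens` are made up, and `lexicon` is a hypothetical three-word table in the same `word`/`value` shape that tidytext's `get_sentiments("afinn")` returns. The report does not name its lexicon, so the choice of an AFINN-style scale and the scores shown are illustrative.

```r
library(dplyr)
library(ggplot2)

# Made-up tokens labeled by subdivision (for illustration only)
tokens <- tibble::tibble(
  subdivision = c(rep("Poetry", 4), rep("Prophecy", 4)),
  word = c("rejoice", "rejoice", "hell", "heart",
           "rejoice", "wicked", "wicked", "wicked")
)

# Hypothetical mini lexicon with numeric scores (illustrative values)
lexicon <- tibble::tribble(
  ~word,     ~value,
  "rejoice",  4,
  "hell",    -4,
  "wicked",  -2
)

# Keep only words that carry a sentiment score
scored <- tokens |>
  inner_join(lexicon, by = "word")

# Most frequent scored words within each subdivision
top_words <- scored |>
  count(subdivision, word, value, sort = TRUE)

# Average sentiment per subdivision
avg_sentiment <- scored |>
  group_by(subdivision) |>
  summarise(mean_value = mean(value), .groups = "drop")

# Bar plot of the averages
avg_plot <- avg_sentiment |>
  ggplot(aes(subdivision, mean_value, fill = subdivision)) +
  geom_col() +
  theme_classic() +
  labs(title = "Average Sentiment by Subdivision")
```

Averaging over scored words only means unscored words such as “heart” do not pull the mean toward zero; dividing by all words instead would be a different, equally defensible choice.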
“God” is the most common word that holds sentiment in every subdivision except the gospels, where the most common word with sentiment is “Jesus.” This makes sense, since Jesus is the subject of each gospel book. In three of the subdivisions (history, prophecy, and epistles), the word with the strongest negative sentiment is “bastard.” In poetry it is “hell”, and in the gospels it is “cock.” In every subdivision, the most common word carrying the strongest positive sentiment is “rejoice.” After evaluating the sentiment words in each subdivision, I wanted to see the average sentiment of each subdivision as a whole. The subdivision with the highest average sentiment is history, and the subdivision with the lowest is prophecy. I made a bar graph to better visualize this.
In conclusion, I was correct that prophecy has the most negative sentiment, but incorrect that poetry has the most positive sentiment; history does.