Horror Movies

Author

Nikhita Purohit

Published

September 28, 2025


Studying Top-Rated Horror Movies

EXPERIMENT OBJECTIVE:

The aim of this experiment is to study how horror movies affect their audience physiologically (heart rate, heart rate variability (HRV), and heart-rate spikes) and to compare those responses with ratings, reviews, and movie characteristics such as release year and sequel status.

1. Setting up R Packages

#| label: setup
#| echo: false
# Setup chunk: load libraries (the #| options must come first in the chunk to take effect)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(mosaic) # Our all-in-one package
Registered S3 method overwritten by 'mosaic':
  method                           from   
  fortify.SpatialPolygonsDataFrame ggplot2

The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

    mean

The following objects are masked from 'package:dplyr':

    count, do, tally

The following object is masked from 'package:purrr':

    cross

The following object is masked from 'package:ggplot2':

    stat

The following objects are masked from 'package:stats':

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
    quantile, sd, t.test, var

The following objects are masked from 'package:base':

    max, mean, min, prod, range, sample, sum
library(skimr) # Looking at data

Attaching package: 'skimr'

The following object is masked from 'package:mosaic':

    n_missing
library(janitor) # Clean the data

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(naniar) # Handle missing data

Attaching package: 'naniar'

The following object is masked from 'package:skimr':

    n_complete
library(visdat) # Visualise missing data
library(tinytable) # Printing Static Tables for our data

Attaching package: 'tinytable'

The following object is masked from 'package:ggplot2':

    theme_void
library(DT) # Interactive Tables for our data
library(crosstable) # Multiple variable summaries

Attaching package: 'crosstable'

The following object is masked from 'package:purrr':

    compact

2. Read Data

horrormovies <- readr::read_csv("../data/HorrorMoviedata.csv") %>%
  # Clean variable names
  janitor::clean_names(case = "snake")
# Keep the cleaned import untouched and work on a copy
horrormovies_modified <- horrormovies
Rows: 50 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Movie, This film is a sequel., This film has at least one sequel
dbl (9): Ranking, Avg resting heart rate (BPM), Avg movie heart rate (BPM), ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
horrormovies_modified
# A tibble: 50 × 12
   ranking movie                   avg_resting_heart_ra…¹ avg_movie_heart_rate…²
     <dbl> <chr>                                    <dbl>                  <dbl>
 1       1 Sinister                                    64                     86
 2       2 Host                                        64                     88
 3       3 Skinamirink                                 64                     84
 4       4 Insidious                                   64                     85
 5       5 The Conjuring                               64                     84
 6       6 Hereditary                                  64                     82
 7       7 Smile                                       64                     83
 8       8 The Excorcism of Emily…                     64                     82
 9       9 Hell House LLC                              64                     81
10      10 Talk To Me                                  64                     79
# ℹ 40 more rows
# ℹ abbreviated names: ¹​avg_resting_heart_rate_bpm, ²​avg_movie_heart_rate_bpm
# ℹ 8 more variables: overall_difference_bpm <dbl>,
#   hrv_difference_percent <dbl>, highest_spike_bpm <dbl>, scare_score <dbl>,
#   this_film_is_a_sequel <chr>, this_film_has_at_least_one_sequel <chr>,
#   rotten_tomato_score <dbl>, year <dbl>

3. Examine Data

dplyr::glimpse(horrormovies_modified)
Rows: 50
Columns: 12
$ ranking                           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1…
$ movie                             <chr> "Sinister", "Host", "Skinamirink", "…
$ avg_resting_heart_rate_bpm        <dbl> 64, 64, 64, 64, 64, 64, 64, 64, 64, …
$ avg_movie_heart_rate_bpm          <dbl> 86, 88, 84, 85, 84, 82, 83, 82, 81, …
$ overall_difference_bpm            <dbl> 22, 24, 20, 21, 20, 18, 19, 18, 17, …
$ hrv_difference_percent            <dbl> 21, 18, 22, 18, 18, 19, 15, 17, 16, …
$ highest_spike_bpm                 <dbl> 131, 130, 113, 133, 132, 104, 114, 9…
$ scare_score                       <dbl> 96, 95, 91, 90, 88, 81, 78, 76, 75, …
$ this_film_is_a_sequel             <chr> "no", "no", "no", "no", "no", "no", …
$ this_film_has_at_least_one_sequel <chr> "yes", "no", "no", "yes", "yes", "no…
$ rotten_tomato_score               <dbl> 63, 99, 72, 67, 86, 90, 79, 45, 75, …
$ year                              <dbl> 2012, 2020, 2022, 2010, 2013, 2018, …
skimr::skim(horrormovies_modified)
Data summary
Name horrormovies_modified
Number of rows 50
Number of columns 12
_______________________
Column type frequency:
character 3
numeric 9
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
movie 0 1 4 38 0 50 0
this_film_is_a_sequel 0 1 2 3 0 2 0
this_film_has_at_least_one_sequel 0 1 2 3 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
ranking 0 1 25.68 14.81 1 13.25 25.5 37.75 50 ▇▇▇▆▇
avg_resting_heart_rate_bpm 0 1 64.00 0.00 64 64.00 64.0 64.00 64 ▁▁▇▁▁
avg_movie_heart_rate_bpm 0 1 76.94 4.62 68 74.00 77.0 79.75 88 ▃▇▇▃▂
overall_difference_bpm 0 1 13.00 4.67 4 10.00 13.0 16.00 24 ▃▇▇▃▂
hrv_difference_percent 0 1 14.36 3.09 7 13.00 14.0 16.00 22 ▂▅▇▃▁
highest_spike_bpm 0 1 102.38 14.92 78 90.25 99.5 114.00 133 ▆▇▅▆▂
scare_score 0 1 64.20 14.80 34 52.50 64.0 74.00 96 ▃▅▇▅▂
rotten_tomato_score 0 1 77.22 19.70 20 71.25 83.5 91.00 99 ▁▁▂▅▇
year 0 1 2011.46 13.18 1973 2010.00 2016.0 2020.00 2023 ▁▁▁▂▇
crosstable(ranking + hrv_difference_percent ~ this_film_is_a_sequel + this_film_has_at_least_one_sequel,
  data = horrormovies_modified
) %>%
  crosstable::as_flextable()

this_film_has_at_least_one_sequel   no                 no                 yes                yes
this_film_is_a_sequel               no                 yes                no                 yes

ranking
  Min / Max                         2.0 / 50.0         28.0 / 42.0        1.0 / 49.0         12.0 / 35.0
  Med [IQR]                         22.5 [9.5;39.0]    38.0 [33.0;40.0]   25.5 [14.0;39.5]   21.0 [18.0;28.5]
  Mean (std)                        24.6 (16.9)        36.0 (7.2)         26.1 (15.3)        23.0 (8.4)
  N (NA)                            20 (0)             3 (0)              20 (0)             7 (0)

hrv_difference_percent
  Min / Max                         7.0 / 22.0         11.0 / 14.0        10.0 / 21.0        13.0 / 17.0
  Med [IQR]                         15.0 [12.2;17.0]   12.0 [11.5;13.0]   14.5 [12.8;16.0]   14.0 [13.5;16.0]
  Mean (std)                        14.3 (3.9)         12.3 (1.5)         14.6 (2.8)         14.7 (1.7)
  N (NA)                            20 (0)             3 (0)              20 (0)             7 (0)

crosstable(scare_score + rotten_tomato_score ~ this_film_is_a_sequel,
  data = horrormovies_modified
) %>%
  crosstable::as_flextable()

label                 variable     this_film_is_a_sequel: no   this_film_is_a_sequel: yes

scare_score           Min / Max    34.0 / 96.0                 49.0 / 74.0
                      Med [IQR]    64.0 [51.0;74.2]            63.5 [58.0;69.5]
                      Mean (std)   64.5 (16.1)                 62.9 (8.3)
                      N (NA)       40 (0)                      10 (0)

rotten_tomato_score   Min / Max    20.0 / 99.0                 38.0 / 91.0
                      Med [IQR]    86.0 [75.0;93.0]            68.5 [52.8;82.2]
                      Mean (std)   80.0 (18.9)                 66.1 (20.0)
                      N (NA)       40 (0)                      10 (0)

4. Data Dictionary

Quantitative Data

  1. ranking (dbl): Overall ranking of the movie
  2. avg_resting_heart_rate_bpm (dbl): Average heart rate of the audience at rest (bpm)
  3. avg_movie_heart_rate_bpm (dbl): Average heart rate of the audience while watching the movie (bpm)
  4. overall_difference_bpm (dbl): Average movie heart rate minus average resting heart rate (bpm)
  5. hrv_difference_percent (dbl): Change in heart rate variability (HRV) from the resting state to watching the movie (%)
  6. highest_spike_bpm (dbl): The highest audience heart rate recorded during the movie (bpm)
  7. scare_score (dbl): How scary the movie was (out of 100)
  8. rotten_tomato_score (dbl): Rotten Tomatoes score of the movie (out of 100)
  9. year (dbl): The year the movie was released

Qualitative Data

  1. movie (fct): Name of the movie
  2. this_film_is_a_sequel (fct): Whether the movie is a sequel (yes or no)
  3. this_film_has_at_least_one_sequel (fct): Whether the movie has at least one sequel (yes or no)
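The dictionary defines overall_difference_bpm as movie heart rate minus resting heart rate, and the skim output shows a standard deviation of 0 for the resting heart rate. A quick consistency check along these lines could confirm both points. This is only a sketch: `sample_rows` is a hypothetical data frame holding the first nine rows printed in the glimpse output, so the full 50-row data should be checked the same way before drawing conclusions.

```r
# First nine rows as printed by glimpse(); hypothetical stand-in for the full data
sample_rows <- data.frame(
  avg_resting_heart_rate_bpm = rep(64, 9),
  avg_movie_heart_rate_bpm   = c(86, 88, 84, 85, 84, 82, 83, 82, 81),
  overall_difference_bpm     = c(22, 24, 20, 21, 20, 18, 19, 18, 17)
)

# Recompute the difference from its definition and compare with the stored column
recomputed <- sample_rows$avg_movie_heart_rate_bpm -
  sample_rows$avg_resting_heart_rate_bpm
all_match <- all(recomputed == sample_rows$overall_difference_bpm)

# Columns with a single unique value carry no information for plots or models
is_constant <- vapply(sample_rows, function(x) length(unique(x)) == 1, logical(1))
constant_cols <- names(sample_rows)[is_constant]

print(all_match)
print(constant_cols)
```

If the resting heart rate really is constant across all movies, overall_difference_bpm and avg_movie_heart_rate_bpm differ only by a fixed offset, so plotting both shows the same pattern twice.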
horrormovies_modified %>%
  dplyr::count(movie,this_film_is_a_sequel,this_film_has_at_least_one_sequel) %>%
  tt()
movie this_film_is_a_sequel this_film_has_at_least_one_sequel n
28 Days Later no yes 1
A Nightmare on Elm Street (1984) no yes 1
A Quiet Place no yes 1
A Quiet Place Part 2 yes yes 1
Alien no yes 1
Barbarian no no 1
Black Phone no no 1
Evil Dead Rise no no 1
Get Out no no 1
Halloween (1978) no yes 1
Hell House LLC no yes 1
Hereditary no no 1
Host no no 1
Hush no no 1
IT (2017) no yes 1
Insidious no yes 1
Insidious 2 yes yes 1
Insidious: The Red Door yes yes 1
It Follows no no 1
Lights Out no no 1
Oculus no yes 1
Ouija: Origin of Evil yes no 1
Paranormal Activity no yes 1
Paranormal Activity 2 yes yes 1
Poltergeist no yes 1
Saw X yes no 1
Scream no yes 1
Sinister no yes 1
Skinamirink no no 1
Smile no no 1
Talk To Me no no 1
Terrifier 2 yes yes 1
Texas Chainsaw Massacre (1974) no yes 1
The Autopsy of Jane Doe no no 1
The Babadook no no 1
The Blair Witch Project no yes 1
The Conjuring no yes 1
The Conjuring 2 yes yes 1
The Conjuring: The Devil Made Me Do It yes yes 1
The Dark and The Wicked no no 1
The Descent no yes 1
The Excorcism of Emily Rose no no 1
The Excorcist no yes 1
The Grudge no yes 1
The Invisible Man no no 1
The Nun 2 yes no 1
The Ring no yes 1
The Thing no no 1
The Visit no no 1
The Witch no no 1
horrormovies_modified %>%
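  dplyr::summarise(across(
    # select the numeric measures to summarise
    .cols = c(
      avg_resting_heart_rate_bpm, avg_movie_heart_rate_bpm,
      overall_difference_bpm, hrv_difference_percent,
      highest_spike_bpm, scare_score, rotten_tomato_score
    ),
    # handle missing values the same way in every summary
    .fns = list(
      mean = ~ mean(.x, na.rm = TRUE),
      sd = ~ sd(.x, na.rm = TRUE),
      min = ~ min(.x, na.rm = TRUE),
      max = ~ max(.x, na.rm = TRUE)
    )
  ))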
# A tibble: 1 × 28
  avg_resting_heart_rate_bpm_mean avg_resting_heart_rat…¹ avg_resting_heart_ra…²
                            <dbl>                   <dbl>                  <dbl>
1                              64                       0                     64
# ℹ abbreviated names: ¹​avg_resting_heart_rate_bpm_sd,
#   ²​avg_resting_heart_rate_bpm_min
# ℹ 25 more variables: avg_resting_heart_rate_bpm_max <dbl>,
#   avg_movie_heart_rate_bpm_mean <dbl>, avg_movie_heart_rate_bpm_sd <dbl>,
#   avg_movie_heart_rate_bpm_min <dbl>, avg_movie_heart_rate_bpm_max <dbl>,
#   overall_difference_bpm_mean <dbl>, overall_difference_bpm_sd <dbl>,
#   overall_difference_bpm_min <dbl>, overall_difference_bpm_max <dbl>, …
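A one-row, 28-column summary is hard to read because most columns are truncated in the printout. One possible way to make it readable is to reshape it so that each variable gets its own row. The sketch below assumes the `_mean`, `_sd`, `_min`, `_max` suffixes produced by `across()`; `sample_rows` is a hypothetical tibble with two columns and the first nine rows from the glimpse output, not the full data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in: first nine rows of two columns, as printed by glimpse()
sample_rows <- tibble::tibble(
  avg_movie_heart_rate_bpm = c(86, 88, 84, 85, 84, 82, 83, 82, 81),
  scare_score = c(96, 95, 91, 90, 88, 81, 78, 76, 75)
)

summary_long <- sample_rows %>%
  # same idea as above: one summary column per variable and statistic
  summarise(across(
    everything(),
    list(mean = ~ mean(.x), sd = ~ sd(.x), min = ~ min(.x), max = ~ max(.x))
  )) %>%
  # split "variable_stat" names: the stat suffix becomes the column (.value),
  # everything before it becomes the row label
  pivot_longer(
    everything(),
    names_to = c("variable", ".value"),
    names_pattern = "^(.*)_(mean|sd|min|max)$"
  )

print(summary_long)
```

The result has one row per variable and one column per statistic, which fits on screen and can be passed straight to tt() or DT::datatable().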
horrormovies_modified <- horrormovies %>% tidyr::drop_na()
horrormovies_modified
# A tibble: 50 × 12
   ranking movie                   avg_resting_heart_ra…¹ avg_movie_heart_rate…²
     <dbl> <chr>                                    <dbl>                  <dbl>
 1       1 Sinister                                    64                     86
 2       2 Host                                        64                     88
 3       3 Skinamirink                                 64                     84
 4       4 Insidious                                   64                     85
 5       5 The Conjuring                               64                     84
 6       6 Hereditary                                  64                     82
 7       7 Smile                                       64                     83
 8       8 The Excorcism of Emily…                     64                     82
 9       9 Hell House LLC                              64                     81
10      10 Talk To Me                                  64                     79
# ℹ 40 more rows
# ℹ abbreviated names: ¹​avg_resting_heart_rate_bpm, ²​avg_movie_heart_rate_bpm
# ℹ 8 more variables: overall_difference_bpm <dbl>,
#   hrv_difference_percent <dbl>, highest_spike_bpm <dbl>, scare_score <dbl>,
#   this_film_is_a_sequel <chr>, this_film_has_at_least_one_sequel <chr>,
#   rotten_tomato_score <dbl>, year <dbl>
# Continue from the drop_na() result so that step is not discarded
horrormovies_modified <- horrormovies_modified %>%
  dplyr::mutate(
    movie = as.factor(movie),
    this_film_is_a_sequel = as.factor(this_film_is_a_sequel),
    this_film_has_at_least_one_sequel = as.factor(this_film_has_at_least_one_sequel)
  ) %>%

  # arrange the Qual variables first, Quant next
  dplyr::relocate(where(is.factor), .after = ranking)

glimpse(horrormovies_modified)
Rows: 50
Columns: 12
$ ranking                           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1…
$ movie                             <fct> Sinister, Host, Skinamirink, Insidio…
$ this_film_is_a_sequel             <fct> no, no, no, no, no, no, no, no, no, …
$ this_film_has_at_least_one_sequel <fct> yes, no, no, yes, yes, no, no, no, y…
$ avg_resting_heart_rate_bpm        <dbl> 64, 64, 64, 64, 64, 64, 64, 64, 64, …
$ avg_movie_heart_rate_bpm          <dbl> 86, 88, 84, 85, 84, 82, 83, 82, 81, …
$ overall_difference_bpm            <dbl> 22, 24, 20, 21, 20, 18, 19, 18, 17, …
$ hrv_difference_percent            <dbl> 21, 18, 22, 18, 18, 19, 15, 17, 16, …
$ highest_spike_bpm                 <dbl> 131, 130, 113, 133, 132, 104, 114, 9…
$ scare_score                       <dbl> 96, 95, 91, 90, 88, 81, 78, 76, 75, …
$ rotten_tomato_score               <dbl> 63, 99, 72, 67, 86, 90, 79, 45, 75, …
$ year                              <dbl> 2012, 2020, 2022, 2010, 2013, 2018, …
horrormovies_modified2 <- horrormovies_modified %>%
  # Display names, in the same order as the columns of horrormovies_modified
  stats::setNames(c(
    "Ranking", "Movie", "This Film is a Sequel", "This Film Has at Least One Sequel",
    "Average Resting Heart Rate (bpm)", "Average Movie Heart Rate (bpm)",
    "Overall Difference (bpm)", "Heart Rate Variability (Difference %)",
    "Highest Spike (bpm)", "Scare Score", "Rotten Tomato Score", "Year of Release"
  ))

horrormovies_modified2 %>%
  DT::datatable(
    style = "default",
    caption = htmltools::tags$caption(
      style = "caption-side: top; text-align: left; color: black; font-size: 100%;", "Horror Movies Dataset (Clean)"
    ),
    options = list(pageLength = 10, autoWidth = TRUE)
  ) %>%
  DT::formatStyle(
    columns = names(horrormovies_modified2),
    fontFamily = "Roboto Condensed",
    fontSize = "12px"
  )

5. Graphs

1. Do Sequels Trigger Different Audience Reactions Compared to Non-Sequels?

horrormovies_long <- horrormovies_modified2 %>%
  tidyr::pivot_longer(
    cols = c(`Overall Difference (bpm)`, `Highest Spike (bpm)`, `Scare Score`, `Average Movie Heart Rate (bpm)`),
    names_to = "measure",
    values_to = "value"
  )

# Create the box plot with facets
horrormovies_plot <- horrormovies_long %>%
  gf_boxplot(value ~ `This Film is a Sequel`, fill = ~ `This Film is a Sequel`) %>%
  gf_refine(scale_fill_brewer(name = "Is it a Sequel?", palette = "Accent")) %>%
  gf_facet_wrap(~measure, scales = "free_y") %>%   # one box plot per numeric variable
  gf_labs(
    title = "Audience Reactions: Sequels vs Non-Sequels",
    x = "Is This Film a Sequel?",
    y = "Value of Measure"
  )

horrormovies_plot

2. Do Movies That Critics Rate More Highly Also Tend To Be Scarier For Audiences?

gf_point(`Scare Score` ~ `Rotten Tomato Score`, data = horrormovies_modified2, color = 'maroon', size = 2) %>%
  gf_labs(title = "Do Higher Critic Ratings Correlate With Scarier Movies?",
          x = "Rotten Tomatoes Score",
          y = "Audience Scare Score")
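A scatter plot shows the shape of the relationship, and a correlation coefficient could put a number on it. The sketch below uses a Spearman rank correlation, which is my choice here rather than something the analysis above prescribes. `sample_rows` is a hypothetical data frame with only the first nine rows from the glimpse output (the nine scariest movies), so its result says nothing reliable about all 50 movies; the same helper would need to be run on horrormovies_modified.

```r
# First nine rows as printed by glimpse(); NOT representative of the full data
sample_rows <- data.frame(
  scare_score         = c(96, 95, 91, 90, 88, 81, 78, 76, 75),
  rotten_tomato_score = c(63, 99, 72, 67, 86, 90, 79, 45, 75)
)

# Hypothetical helper: rank correlation between critic score and scare score.
# stats::cor is called explicitly because mosaic masks cor().
critic_scare_cor <- function(df) {
  stats::cor(df$rotten_tomato_score, df$scare_score, method = "spearman")
}

rho <- critic_scare_cor(sample_rows)
print(rho)
```

A value near 0 would support the reading that critic ratings say little about scariness, while a value near 1 or -1 would point to a strong monotonic relationship.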

3. Do Newer Movies Differ From Older Ones In How Much They Raise BPM?

horrormovies_plot2 <- horrormovies_modified %>%
  gf_col(overall_difference_bpm ~ factor(year), fill = ~this_film_is_a_sequel, position = "dodge") %>%
  gf_labs(title = "Do Newer Movies Differ From Older Ones In How Much They Raise BPM?",
          x = "Movie Release Year",
          y = "Heart Rate Increase over Resting (bpm)",
          fill = "Is it a Sequel?") %>%
  gf_theme(theme(axis.text.x = element_text(angle = 45, hjust = 1)))


horrormovies_plot2
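Because the bar chart treats each year as a separate category, a trend over time is hard to judge by eye. One possible complement is to keep year numeric and fit a straight line. The helper `bpm_trend()` below is hypothetical, and `sample_rows` holds only the first six rows visible in the glimpse output, all from 2010 or later, so the slope it prints is an illustration of the method and not a finding.

```r
# First six rows as printed by glimpse(); the full data has 50 rows back to 1973
sample_rows <- data.frame(
  year = c(2012, 2020, 2022, 2010, 2013, 2018),
  overall_difference_bpm = c(22, 24, 20, 21, 20, 18)
)

# Hypothetical helper: slope of a least-squares line, in bpm per release year
bpm_trend <- function(df) {
  fit <- stats::lm(overall_difference_bpm ~ year, data = df)
  unname(stats::coef(fit)["year"])
}

trend <- bpm_trend(sample_rows)
print(trend)
```

On the full dataset, a clearly positive slope would be consistent with newer movies raising heart rate more, and a slope near zero would suggest no trend with release year.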

6. Summary of Findings

From the data and the graphs plotted, the following patterns appear:

  1. Audience heart rates tend to be slightly higher for non-sequels than for sequels.

  2. Both sequels and non-sequels can cause strong heart-rate spikes.

  3. Non-sequels are, on average, only slightly scarier than sequels: the mean scare score is 64.5 for the 40 non-sequels versus 62.9 for the 10 sequels, and the medians are nearly identical (64.0 versus 63.5).

  4. Newer movies are more likely to be sequels than older movies.

  5. Newer movies tend to raise the audience's average heart rate more than older movies.
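To gauge how much weight the sequel versus non-sequel comparison can carry, the group means, standard deviations, and counts from the second crosstable can be plugged into a Welch two-sample t-statistic. This is a rough sketch: the function name `welch_from_summary()` is hypothetical, the inputs are the rounded values printed in the crosstable, and the test assumes roughly normal group means, which is a strong assumption with only 10 sequels.

```r
# Hypothetical helper: Welch t-test from summary statistics only
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # squared standard error, group 1
  v2 <- s2^2 / n2                      # squared standard error, group 2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p_value <- 2 * stats::pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p = p_value)
}

# Non-sequels (n = 40) versus sequels (n = 10), values from the crosstable
scare  <- welch_from_summary(64.5, 16.1, 40, 62.9, 8.3, 10)
critic <- welch_from_summary(80.0, 18.9, 40, 66.1, 20.0, 10)

print(round(scare, 2))
print(round(critic, 2))
```

The t-statistic for scare score comes out at about 0.44, far smaller than the roughly 1.99 for Rotten Tomatoes score, which suggests that the gap in critic ratings between sequels and non-sequels is more noticeable than the gap in scariness.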

7. Surprising Aspects

  1. A movie's Rotten Tomatoes score does not reliably predict how scary the movie is: scare scores vary widely for both highly rated and poorly rated movies.

  2. Some movies with near-perfect ratings are only moderately scary, while some movies with mediocre ratings are scarier. This suggests that the scare factor is subjective.

  3. Movies that are not sequels are, on average, slightly scarier than movies that are.