Is your dad weird?

Are SMI dads weirder than MIT dads?

EXPERIMENT OBJECTIVE: To find out if SMI dads are weirder than MIT dads.

1. Setting up R Packages

# SETUP CHUNK- LIBRARIES
#| label: setup
#| echo: false
#| warning: false
#| message: false

library(tidyverse)
library(mosaic) # Our all-in-one package
library(skimr) # Looking at data
library(janitor) # Clean the data
library(naniar) # Handle missing data
library(visdat) # Visualise missing data
library(tinytable) # Printing Static Tables for our data
library(DT) # Interactive Tables for our data
library(crosstable) # Multiple variable summaries
library(vcd)
library(visStatistics) # One package to test them all
### Dataset from Chihara and Hesterberg's book (Second Edition)
library(resampledata)

2. Read Data

dad_modified <- dad <- readr::read_csv("../data/3-weird_dads.csv") %>%
  # Clean variable names
  janitor::clean_names(case="snake")

Rows: 65 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gender, college, is_dad_weird, name

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dad_modified

# A tibble: 65 × 4
   gender college is_dad_weird name   
   <chr>  <chr>   <chr>        <chr>  
 1 F      SMI     No           Manya  
 2 F      SMI     No           Sradha 
 3 M      SMI     No           Arun   
 4 F      SMI     Yes          Nidhi  
 5 M      MIT     No           Shaurya
 6 M      MIT     No           Pratham
 7 M      MIT     No           Jeevan 
 8 M      SMI     No           Dhruv  
 9 F      SMI     Yes          Aakrati
10 M      SMI     Yes          Aakrsh 
# ℹ 55 more rows

3. Examine Data

dplyr::glimpse(dad_modified)

Rows: 65
Columns: 4
$ gender       <chr> "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "F", "M…
$ college      <chr> "SMI", "SMI", "SMI", "SMI", "MIT", "MIT", "MIT", "SMI", "…
$ is_dad_weird <chr> "No", "No", "No", "Yes", "No", "No", "No", "No", "Yes", "…
$ name         <chr> "Manya", "Sradha", "Arun", "Nidhi", "Shaurya", "Pratham",…

skimr::skim(dad_modified)

Data summary
Name	dad_modified
Number of rows	65
Number of columns	4
_______________________
Column type frequency:
character	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
gender	0	1.00	1	2	3
college	0	1.00	3	3	2
is_dad_weird	1	0.98	2	3	2
name	0	1.00	4	11	63

names(dad_modified)

[1] "gender"       "college"      "is_dad_weird" "name"

visdat::vis_dat(dad, sort_type = TRUE, palette = "default")

dad_modified <- dad %>% tidyr::drop_na()
dad_modified

# A tibble: 64 × 4
   gender college is_dad_weird name   
   <chr>  <chr>   <chr>        <chr>  
 1 F      SMI     No           Manya  
 2 F      SMI     No           Sradha 
 3 M      SMI     No           Arun   
 4 F      SMI     Yes          Nidhi  
 5 M      MIT     No           Shaurya
 6 M      MIT     No           Pratham
 7 M      MIT     No           Jeevan 
 8 M      SMI     No           Dhruv  
 9 F      SMI     Yes          Aakrati
10 M      SMI     Yes          Aakrsh 
# ℹ 54 more rows

visdat::vis_dat(dad_modified, sort_type = TRUE, palette = "default")

dad_modified %>%
  dplyr::count(gender, college) %>%
  tt()

gender	college	n
F	MIT	16
F	SMI	16
M	MIT	16
M	SMI	15
NB	MIT	1

# Convert the character columns to factors, starting from the cleaned data
# so that the row dropped by drop_na() stays out
dad_modified <- dad_modified %>%
  dplyr::mutate(across(where(is.character), as.factor))
glimpse(dad_modified)

Rows: 64
Columns: 4
$ gender       <fct> F, F, M, F, M, M, M, M, F, M, F, M, F, F, M, M, F, M, M, …
$ college      <fct> SMI, SMI, SMI, SMI, MIT, MIT, MIT, SMI, SMI, SMI, SMI, SM…
$ is_dad_weird <fct> No, No, No, Yes, No, No, No, No, Yes, Yes, Yes, Yes, No, …
$ name         <fct> Manya, Sradha, Arun, Nidhi, Shaurya, Pratham, Jeevan, Dhr…

dad_modified %>%
  stats::setNames(c("Gender", "College", "Is_Your_Dad_Weird","Name"))

# A tibble: 64 × 4
   Gender College Is_Your_Dad_Weird Name   
   <fct>  <fct>   <fct>             <fct>  
 1 F      SMI     No                Manya  
 2 F      SMI     No                Sradha 
 3 M      SMI     No                Arun   
 4 F      SMI     Yes               Nidhi  
 5 M      MIT     No                Shaurya
 6 M      MIT     No                Pratham
 7 M      MIT     No                Jeevan 
 8 M      SMI     No                Dhruv  
 9 F      SMI     Yes               Aakrati
10 M      SMI     Yes               Aakrsh 
# ℹ 54 more rows

dad_modified %>%
  DT::datatable(
    style = "default",
    caption = htmltools::tags$caption(
      style = "caption-side: top; text-align: left; color: black; font-size: 100%;", "Weird Dads Dataset (Clean)"
    ),
    options = list(pageLength = 10, autoWidth = TRUE)
  ) %>%
  DT::formatStyle(
    columns = names(dad_modified),
    fontFamily = "Roboto Condensed",
    fontSize = "12px"
  )

4. Data Dictionary

Qualitative Data

gender (fct): Gender of the student (M/F/NB)
college (fct): Which college the student is from (SMI/MIT)
is_dad_weird (fct): Whether the student thinks their dad is weird (Yes/No)
name (fct): Name of the student

5. Graphs

1. Are SMI dads weirder than MIT dads?

vcd::structable(data = dad_modified, is_dad_weird ~ college) %>% 
  as.matrix() %>%  
  addmargins()

       is_dad_weird
college No Yes Sum
    MIT 28   5  33
    SMI 19  12  31
    Sum 47  17  64

vcd::structable(is_dad_weird ~ college, data = dad_modified) %>%
    vcd::mosaic(gp = shading_max,
                main = "Are SMI dads weirder than MIT dads?")

Inferences

Fewer MIT students find their dads weird than would be expected if college made no difference, and more SMI students do. In both colleges, students are more likely to find their dads normal. In this sample, a larger share of SMI students (12 of 31, about 39%) than MIT students (5 of 33, about 15%) find their dads weird, which suggests that SMI dads are weirder than MIT dads.
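
The "expected" counts behind this reading can be worked out by hand. The sketch below uses base R only and re-enters the four counts from the table above. The object names are illustrative and not part of the original analysis. It computes the counts expected if college and response were independent, the Pearson residuals that drive the mosaic shading, and the share of each response within each college.

```r
# Counts re-entered from the college x is_dad_weird table above
observed <- matrix(
  c(28, 5,
    19, 12),
  nrow = 2, byrow = TRUE,
  dimnames = list(college = c("MIT", "SMI"),
                  is_dad_weird = c("No", "Yes"))
)

# Expected count per cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson residuals: positive means more responses than expected
pearson_residuals <- (observed - expected) / sqrt(expected)

# Share of No/Yes within each college (each row sums to 1)
row_proportions <- prop.table(observed, margin = 1)

round(expected, 2)
round(pearson_residuals, 2)
round(row_proportions, 2)
```

A positive residual in the SMI/Yes cell and a negative one in the MIT/Yes cell correspond to the "more than expected" and "fewer than expected" statements in the inference.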

2. Irrespective of the college, which gender is more likely to find their dads weird?

dad_modified %>% 
  gf_bar(~ is_dad_weird | gender,
         fill = ~gender) %>% 
  gf_labs( title = "Irrespective of the college, which gender is more likely to find their dads weird?",
           x = "Is Your Dad Weird?",
           y = "Count") %>% 
  gf_refine(scale_fill_brewer(palette = "Paired"))

Inferences

In general, women are more likely than men to find their dads weird. This could be because some girl dads can come off as overprotective or ‘embarrassing’. The only non-binary student, who is from MIT, also finds their dad weird. Perhaps their father is not as supportive as they’d like.

3. Are women more likely to find their dads weird than men?

dad_modified2 <- dad_modified %>%
  filter(gender != "NB")

dad_modified2 %>% 
  gf_bar(~ gender | college,
         fill = ~ is_dad_weird,
         position = "fill") %>%
  gf_labs(
    title = "Are women more likely to find their dads weird than men?",
    x = "Gender",
    y = "Proportion",
    fill = "Legend: Is Your Dad Weird?")

Inferences

The mosaic graph only let us infer that SMI dads are weirder than MIT dads; this chart also brings in gender. In SMI, boys are more likely than girls to find their dads weird, and in MIT, girls are more likely than boys to find their dads weird.

The reason behind this could be that some dads are not very supportive of sending their sons to a design school and would rather have them study engineering. Clashes of this kind may be why SMI boys find their dads weird. The same may go for MIT girls who took up engineering.

4. Are students more likely to think their dads are weird?

dad_modified %>% 
  gf_bar(~ is_dad_weird, fill = ~ is_dad_weird) %>% 
  gf_labs( title = "Are students more likely to think their dads are weird?",
           x = "Is Your Dad Weird?",
           y = "Count",
           fill = "Legend: Is Your Dad Weird?") %>% 
  gf_refine(scale_fill_brewer(palette = "Accent", direction =-1))

Inferences

Overall, students are more likely to find their dads normal.

6. Summary of Inferences

Overall, most students across both colleges find their dads normal (47 of 64). However, based on this sample, SMI students are more likely to find their dads weird than MIT students (12 of 31 versus 5 of 33). When looking at gender, MIT girls appear more likely to perceive their dads as weird, while in SMI, boys show this trend more. These differences may reflect family expectations, academic choices, and communication dynamics rather than universal patterns.

7. Surprising Aspects

I thought MIT dads would be weirder than SMI dads because I assumed that dads who allow their children to study in a design school would be fun, easy-going and open-minded, and would not be considered weird by their children. Results could change if a different sample were taken, perhaps after exam results are released.

8. Pearson’s Chi-squared test: Inference test for two proportions

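# Note: addmargins() appends a "Sum" row and column, so this is a 3 x 3 table.
# The test output below therefore reports df = 4 rather than the df = 1
# of the underlying 2 x 2 table (college x is_dad_weird).
contingency_table <- vcd::structable(data = dad_modified, is_dad_weird ~ college) %>% 
  as.matrix() %>%  
  addmargins()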

xq_test_object <- xchisq.test(contingency_table)


    Pearson's Chi-squared test

data:  x
X-squared = 4.5477, df = 4, p-value = 0.3369

 28.00     5.00    33.00  
(24.23)  ( 8.77)  (33.00) 
 [0.59]   [1.62]   [0.00] 
< 0.76>  <-1.27>  < 0.00> 
     
 19.00    12.00    31.00  
(22.77)  ( 8.23)  (31.00) 
 [0.62]   [1.72]   [0.00] 
<-0.79>  < 1.31>  < 0.00> 
     
 47.00    17.00    64.00  
(47.00)  (17.00)  (64.00) 
 [0.00]   [0.00]   [0.00] 
< 0.00>  < 0.00>  < 0.00> 
     
key:
    observed
    (expected)
    [contribution to X-squared]
    <Pearson residual>

xq_test_object %>%
  broom::tidy() %>%
  select(statistic) %>%
  as.numeric() -> X_squared_observed
X_squared_observed

[1] 4.547698

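Let us also evaluate the critical value for the Chi-Square distribution with alpha = 0.05. The test compares two colleges on a Yes/No response, so the table is 2 x 2 and df = (nrows-1)*(ncols-1) = (2-1)*(2-1) = 1. The df = 4 reported above arises only because addmargins() appended a Sum row and column, so xchisq.test() was given a 3 x 3 table.

# Determine the Chi-Square critical value
X_squared_critical <- qchisq(
  p = .05,
  df = 1, # (nrows-1) * (ncols-1) for the 2 x 2 table
  lower.tail = FALSE
)
X_squared_critical

[1] 3.841459

The Sum cells contribute 0 to the statistic (see the [0.00] entries above), so the observed chi-squared value stays at 4.547698, which is larger than the critical value of 3.841459. With df = 1 the p-value is approximately 0.033 rather than 0.3369, so at alpha = 0.05 the NULL hypothesis of equal proportions is rejected: this sample gives some evidence that SMI students are more likely than MIT students to find their dads weird. The evidence is borderline, though. When chisq.test() is run on a 2 x 2 table it applies Yates' continuity correction by default, which lowers the statistic to about 3.42 (p approximately 0.064), and that falls short of significance at the 0.05 level.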
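
As a cross-check, the test can be run on the 2 x 2 table alone, without the Sum margins that addmargins() adds. The following is a minimal sketch in base R that re-enters the four counts from the output above. The object names are illustrative and not part of the original analysis. It calls stats::chisq.test() explicitly because janitor masks chisq.test() when loaded, and it runs the test both without and with Yates' continuity correction so the two results can be compared.

```r
# Counts re-entered from the college x is_dad_weird table (no margins)
weird_by_college <- matrix(
  c(28, 5,
    19, 12),
  nrow = 2, byrow = TRUE,
  dimnames = list(college = c("MIT", "SMI"),
                  is_dad_weird = c("No", "Yes"))
)

# Pearson chi-squared test without continuity correction
test_uncorrected <- stats::chisq.test(weird_by_college, correct = FALSE)

# Default behaviour for a 2 x 2 table: Yates' continuity correction
test_yates <- stats::chisq.test(weird_by_college)

# Degrees of freedom: (2 - 1) * (2 - 1)
test_uncorrected$parameter

# Statistic and p-value for each variant
c(statistic = unname(test_uncorrected$statistic),
  p_value   = test_uncorrected$p.value)
c(statistic = unname(test_yates$statistic),
  p_value   = test_yates$p.value)

# Critical value at alpha = 0.05 for the same degrees of freedom
stats::qchisq(p = 0.05, df = test_uncorrected$parameter, lower.tail = FALSE)
```

Reporting both variants makes it visible when a conclusion at the 0.05 level depends on the choice of correction, which can happen with a sample of this size.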