Fertility

Studying Fertility Decisions

EXPERIMENT OBJECTIVE:

The aim of this experiment could be to understand the characteristics of women of different ethnicity, especially Hispanic and African American women, in order to study their fertility decisions.

1. Setting up R Packages

# SETUP CHUNK- LIBRARIES
#| label: setup
#| echo: false
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(mosaic) # Our all-in-one package

Registered S3 method overwritten by 'mosaic':
  method                           from   
  fortify.SpatialPolygonsDataFrame ggplot2

The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

    mean

The following objects are masked from 'package:dplyr':

    count, do, tally

The following object is masked from 'package:purrr':

    cross

The following object is masked from 'package:ggplot2':

    stat

The following objects are masked from 'package:stats':

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
    quantile, sd, t.test, var

The following objects are masked from 'package:base':

    max, mean, min, prod, range, sample, sum

library(skimr) # Looking at data


Attaching package: 'skimr'

The following object is masked from 'package:mosaic':

    n_missing

library(janitor) # Clean the data


Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test

library(naniar) # Handle missing data


Attaching package: 'naniar'

The following object is masked from 'package:skimr':

    n_complete

library(visdat) # Visualise missing data
library(tinytable) # Printing Static Tables for our data


Attaching package: 'tinytable'

The following object is masked from 'package:ggplot2':

    theme_void

library(DT) # Interactive Tables for our data
library(crosstable) # Multiple variable summaries


Attaching package: 'crosstable'

The following object is masked from 'package:purrr':

    compact

2. Read Data

fertility_modified <- fertility <- readr::read_csv("../data/Fertility.csv")%>%
  # Clean variable names
  janitor::clean_names(case="snake")

Rows: 254654 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): morekids, gender1, gender2, afam, hispanic, other
dbl (3): rownames, age, work

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

fertility_modified

# A tibble: 254,654 × 9
   rownames morekids gender1 gender2   age afam  hispanic other  work
      <dbl> <chr>    <chr>   <chr>   <dbl> <chr> <chr>    <chr> <dbl>
 1        1 no       male    female     27 no    no       no        0
 2        2 no       female  male       30 no    no       no       30
 3        3 no       male    female     27 no    no       no        0
 4        4 no       male    female     35 yes   no       no        0
 5        5 no       female  female     30 no    no       no       22
 6        6 no       male    female     26 no    no       no       40
 7        7 no       female  male       29 no    no       no        0
 8        8 no       male    male       33 no    no       no       52
 9        9 no       female  male       29 no    no       no        0
10       10 no       male    female     27 no    no       no        0
# ℹ 254,644 more rows

3. Examine Data

dplyr::glimpse(fertility_modified)

Rows: 254,654
Columns: 9
$ rownames <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ morekids <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "…
$ gender1  <chr> "male", "female", "male", "male", "female", "male", "female",…
$ gender2  <chr> "female", "male", "female", "female", "female", "female", "ma…
$ age      <dbl> 27, 30, 27, 35, 30, 26, 29, 33, 29, 27, 28, 28, 35, 34, 32, 2…
$ afam     <chr> "no", "no", "no", "yes", "no", "no", "no", "no", "no", "no", …
$ hispanic <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "…
$ other    <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "…
$ work     <dbl> 0, 30, 0, 0, 22, 40, 0, 52, 0, 0, 0, 52, 52, 52, 8, 7, 0, 40,…

skimr::skim(fertility_modified)

Data summary
Name	fertility_modified
Number of rows	254654
Number of columns	9
_______________________
Column type frequency:
character	6
numeric	3
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
morekids	1	2	3	2
gender1	1	4	6	2
gender2	1	4	6	2
afam	1	2	3	2
hispanic	1	2	3	2
other	1	2	3	2

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
rownames	1	127327.50	73512.42	1	63664.25	127327.5	190990.8	254654	▇▇▇▇▇
age	1	30.39	3.39	21	28.00	31.0	33.0	35	▁▃▅▇▇
work	1	19.02	21.87	0	0.00	5.0	44.0	52	▇▁▁▁▃

crosstable(work + age ~ hispanic + afam + other,
           data = fertility_modified
) %>%
    crosstable::as_flextable()

other		no				yes
afam		no		yes		no		yes
hispanic		no	yes	no	yes	no	yes	no	yes
work	Min / Max	0 / 52.0	0 / 52.0	0 / 52.0	0 / 52.0	0 / 52.0	0 / 52.0	NA / NA	NA / NA
	Med [IQR]	3.0 [0;41.0]	0 [0;42.0]	36.0 [0;52.0]	12.0 [0;50.2]	10.0 [0;50.0]	4.0 [0;40.0]	NA [NA;NA]	NA [NA;NA]
	Mean (std)	18.4 (21.7)	18.1 (21.7)	29.2 (22.4)	22.4 (23.2)	21.4 (22.8)	18.3 (21.5)	NA (NA)	NA (NA)
	N (NA)	216033 (0)	11117 (0)	12960 (0)	196 (0)	6764 (0)	7584 (0)	0 (0)	0 (0)
age	Min / Max	21.0 / 35.0	21.0 / 35.0	21.0 / 35.0	22.0 / 35.0	21.0 / 35.0	21.0 / 35.0	NA / NA	NA / NA
	Med [IQR]	31.0 [28.0;33.0]	30.0 [27.0;33.0]	30.0 [27.0;33.0]	30.0 [27.0;33.0]	31.0 [29.0;33.0]	30.0 [26.0;32.0]	NA [NA;NA]	NA [NA;NA]
	Mean (std)	30.5 (3.3)	29.9 (3.6)	29.9 (3.5)	30.0 (3.6)	30.7 (3.3)	29.2 (3.7)	NA (NA)	NA (NA)
	N (NA)	216033 (0)	11117 (0)	12960 (0)	196 (0)	6764 (0)	7584 (0)	0 (0)	0 (0)

4. Data Dictionary

Quantitative Data

rownames(dbl): Number of observations
age(dbl): Age of the mother (years)
work(dbl): Number of hours worked in a week

Qualitative Data

morekids(fct): Does the mother want more kids? (yes or no)
gender1(fct): Gender of the first born child
gender2(fct): Gender of the second born child
afam(fct): If the mother is African American (yes or no)
hispanic(fct): If the mother is Hispanic (yes or no)
other(fct): If the mother is neither African American nor Hispanic (yes or no)

fertility_modified %>%
  dplyr::count(morekids,gender1,gender2,afam,hispanic,other) %>%
  tt()

morekids	gender1	gender2	afam	hispanic	other	n
no	female	female	no	no	no	30506
no	female	female	no	no	yes	846
no	female	female	no	yes	no	1252
no	female	female	no	yes	yes	812
no	female	female	yes	no	no	1619
no	female	female	yes	yes	no	22
no	female	male	no	no	no	35651
no	female	male	no	no	yes	1110
no	female	male	no	yes	no	1451
no	female	male	no	yes	yes	940
no	female	male	yes	no	no	1812
no	female	male	yes	yes	no	23
no	male	female	no	no	no	35926
no	male	female	no	no	yes	1104
no	male	female	no	yes	no	1480
no	male	female	no	yes	yes	894
no	male	female	yes	no	no	1866
no	male	female	yes	yes	no	34
no	male	male	no	no	no	35261
no	male	male	no	no	yes	1123
no	male	male	no	yes	no	1379
no	male	male	no	yes	yes	876
no	male	male	yes	no	no	1730
no	male	male	yes	yes	no	25
yes	female	female	no	no	no	21132
yes	female	female	no	no	yes	726
yes	female	female	no	yes	no	1466
yes	female	female	no	yes	yes	1018
yes	female	female	yes	no	no	1522
yes	female	female	yes	yes	no	25
yes	female	male	no	no	no	17564
yes	female	male	no	no	yes	566
yes	female	male	no	yes	no	1257
yes	female	male	no	yes	yes	923
yes	female	male	yes	no	no	1411
yes	female	male	yes	yes	no	16
yes	male	female	no	no	no	17539
yes	male	female	no	no	yes	609
yes	male	female	no	yes	no	1326
yes	male	female	no	yes	yes	986
yes	male	female	yes	no	no	1401
yes	male	female	yes	yes	no	20
yes	male	male	no	no	no	22454
yes	male	male	no	no	yes	680
yes	male	male	no	yes	no	1506
yes	male	male	no	yes	yes	1135
yes	male	male	yes	no	no	1599
yes	male	male	yes	yes	no	31

fertility_modified %>%
  dplyr::summarise(across(
    .cols = c(age,work), # select columns

    .fns = list(
      mean = ~ mean(., na.rm = T),
      sd = sd,
      min = min, max = max
    )
  ))

# A tibble: 1 × 8
  age_mean age_sd age_min age_max work_mean work_sd work_min work_max
     <dbl>  <dbl>   <dbl>   <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1     30.4   3.39      21      35      19.0    21.9        0       52

fertility_modified <- fertility %>% tidyr::drop_na()
fertility_modified

# A tibble: 254,654 × 9
   rownames morekids gender1 gender2   age afam  hispanic other  work
      <dbl> <chr>    <chr>   <chr>   <dbl> <chr> <chr>    <chr> <dbl>
 1        1 no       male    female     27 no    no       no        0
 2        2 no       female  male       30 no    no       no       30
 3        3 no       male    female     27 no    no       no        0
 4        4 no       male    female     35 yes   no       no        0
 5        5 no       female  female     30 no    no       no       22
 6        6 no       male    female     26 no    no       no       40
 7        7 no       female  male       29 no    no       no        0
 8        8 no       male    male       33 no    no       no       52
 9        9 no       female  male       29 no    no       no        0
10       10 no       male    female     27 no    no       no        0
# ℹ 254,644 more rows

fertility_modified <- fertility %>%
  dplyr::mutate(across(where(is.character), as.factor)) %>% 
  # arrange the Qual variables first, Quant next
  dplyr::relocate(where(is.factor), .after = rownames)

glimpse(fertility_modified)

Rows: 254,654
Columns: 9
$ rownames <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ morekids <fct> no, no, no, no, no, no, no, no, no, no, yes, no, no, no, no, …
$ gender1  <fct> male, female, male, male, female, male, female, male, female,…
$ gender2  <fct> female, male, female, female, female, female, male, male, mal…
$ afam     <fct> no, no, no, yes, no, no, no, no, no, no, no, no, no, no, no, …
$ hispanic <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, n…
$ other    <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, n…
$ age      <dbl> 27, 30, 27, 35, 30, 26, 29, 33, 29, 27, 28, 28, 35, 34, 32, 2…
$ work     <dbl> 0, 30, 0, 0, 22, 40, 0, 52, 0, 0, 0, 52, 52, 52, 8, 7, 0, 40,…

fertility_modified2 <- fertility_modified %>%
  stats::setNames(c("Row ID","Want More Kids","Gender of First Born","Gender of Second Born","African American","Hispanic","Other","Age","Work"))

fertility_modified2 %>%
  dplyr::slice_sample(n = 500) %>%
  DT::datatable(
    style = "default",
    caption = htmltools::tags$caption(
      style = "caption-side: top; text-align: left; color: black; font-size: 100%;", "Fertility Dataset (Clean)"
    ),
    options = list(pageLength = 10, autoWidth = TRUE)
  ) %>%
  DT::formatStyle(
    columns = names(fertility_modified2),
    fontFamily = "Roboto Condensed",
    fontSize = "12px",
  )

5. Graphs

1. Which Ethnic Groups are More Likely to Want More Children?

fertility_modified  %>%
  dplyr::slice_sample(n = 500) %>%
  dplyr::count(morekids,afam,hispanic,other) %>%
  tt()

morekids	afam	hispanic	other	n
no	no	no	no	270
no	no	no	yes	5
no	no	yes	no	12
no	no	yes	yes	10
no	yes	no	no	16
yes	no	no	no	148
yes	no	no	yes	15
yes	no	yes	no	8
yes	no	yes	yes	10
yes	yes	no	no	5
yes	yes	yes	no	1

fertility_chunk <- tibble::tribble(
  ~morekids, ~afam, ~hispanic, ~other, ~n,
  "no",  "no",  "no",  "no",  270,
  "no",  "no",  "no",  "yes", 6,
  "no",  "no",  "yes", "no", 12,
  "no",  "no",  "yes", "yes", 7,
  "no",  "yes", "no",  "no", 18,
  "yes", "no",  "no",  "no", 155,
  "yes", "no",  "no",  "yes", 4,
  "yes", "no",  "yes", "no", 5,
  "yes", "no",  "yes", "yes", 8,
  "yes", "yes", "no", "no", 15
)

fertility_chunk_modified <- fertility_chunk %>%
  filter((afam == "yes" & hispanic == "no" & other == "no") |
         (afam == "no" & hispanic == "yes" & other == "no") |
         (afam == "no" & hispanic == "no" & other == "yes"))

fertility_chunk_plot <- fertility_chunk_modified %>%
  mutate(ethnicity = case_when(
    afam == "yes" ~ "afam",
    hispanic == "yes" ~ "hispanic",
    other == "yes" ~ "other"
  )) %>%
  group_by(ethnicity) %>%
  summarise(count = sum(n))

gf_col(count ~ ethnicity, data = fertility_chunk_plot, fill = ~ ethnicity) %>%
  gf_labs(title = "Which Ethnic Groups are More Likely to Want More Children?",
          x = "Ethnicity",
          y = "Number of Mothers")%>%
  gf_refine(scale_fill_brewer(name = "Legend = Ethnic Group",palette = "Pastel1"))

2. Does a Mother’s Age Relate to How Many Hours She Works, and Does Wanting More Children Make a Difference?

fertility_modified3 <- fertility_modified %>%
  group_by(age, morekids) %>%
  summarize(avg_work = mean(work, na.rm = TRUE))

`summarise()` has grouped output by 'age'. You can override using the `.groups`
argument.

gf_col(avg_work ~ age, fill = ~morekids, data = fertility_modified3, position = "dodge") %>%
  gf_labs(title = "Average Work Hours vs Age by Desire for More Kids",
          x = "Age of Mother",
          y = "Average Hours Worked per Week") %>%
  gf_refine(scale_fill_brewer(name = "Legend = Want More Kids", palette = "Paired"))

3. Are Mothers Who Want More Kids Generally Younger or Older Than Those Who Don’t?

fertility_chunk_modified2 <- fertility_modified %>%
  gf_boxplot(age ~ morekids,fill = ~morekids) %>%
  gf_refine(scale_fill_brewer(name = "Legend = Desire For More Kids", palette = "PiYG")) %>%
  gf_labs(
    title = "Are Mothers Who Want More Kids Younger or Older?",
    x = "Desire For More Kids", y = "Age of Mother"
  )
fertility_chunk_modified2

4. How Does The Distribution Of Weekly Work Hours Vary Based On The Gender Of The First-Born Child?

gf_histogram(~ work | gender1, data = fertility_modified, binwidth = 5, fill = "#c885da") %>%
  gf_labs(title = "Work Hours Distribution by Gender of First Born",
          x = "Hours Worked per Week",
          y = "Count")

6. Summary of Findings

From the data and the graphs plotted, it is evident that

African American women are most likely to want more children, compared to women of other ethnicity.
Younger mothers work for fewer hours than older women, perhaps because they have just started working, or they are prioritizing family.
Women who work for longer hours in a week tend not to want more children.
In the box plot, we see that the IQR for the ‘no’ plot has wider variation as compared to the ‘yes’, suggesting that women of a wide range of ages do not prefer having children.
Most mothers either do not work at all, or work around 50 hours per week.
Whether the first born child is male or female, it does not seem to effect the working hours of the mother.

7. Surprising Aspects

According to the U.S. Census and CDC fertility data, Hispanic families are usually bigger than African American families, both in terms of fertility rates and household size. So, I was surprised to see that African American mothers desire having more kids than Hispanic mothers.
I was not expecting such a gradual rise for the 2nd chart. I thought that older women who work over 15 hours a week would prefer not having more children.
There are very few young mothers, under the age of 24 who want more children. It is evident from the outliers in the box plot.
Only few mothers are into part-time jobs. I assumed that most of the mothers would prefer part-time over full-time jobs.