Cuisines

Author

Nikhita Purohit, Ihina Purohit, Rishi Joseph

Published

October 20, 2025

Studying Different Cuisines

EXPERIMENT OBJECTIVE: To examine how nutritional content, preparation time, and user ratings vary across cuisines and countries of origin.

1. Setting up R Packages

# SETUP CHUNK- LIBRARIES
#| label: setup
#| echo: false
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(mosaic) # Our all-in-one package
Registered S3 method overwritten by 'mosaic':
  method                           from   
  fortify.SpatialPolygonsDataFrame ggplot2

The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

    mean

The following objects are masked from 'package:dplyr':

    count, do, tally

The following object is masked from 'package:purrr':

    cross

The following object is masked from 'package:ggplot2':

    stat

The following objects are masked from 'package:stats':

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
    quantile, sd, t.test, var

The following objects are masked from 'package:base':

    max, mean, min, prod, range, sample, sum
library(skimr) # Looking at data

Attaching package: 'skimr'

The following object is masked from 'package:mosaic':

    n_missing
library(janitor) # Clean the data

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(naniar) # Handle missing data

Attaching package: 'naniar'

The following object is masked from 'package:skimr':

    n_complete
library(visdat) # Visualize missing data
library(tinytable) # Printing Static Tables for our data

Attaching package: 'tinytable'

The following object is masked from 'package:ggplot2':

    theme_void
library(DT) # Interactive Tables for our data
library(crosstable) # Multiple variable summaries

Attaching package: 'crosstable'

The following object is masked from 'package:purrr':

    compact
library(tastyR)
library(dplyr)
library(ggformula)

2. Reading and Cleaning Data

data("cuisines", package = "tastyR")
cuisines_modified <- cuisines %>% 
  janitor::clean_names(case="snake")
cuisines_modified
# A tibble: 2,218 × 17
   name     country url   author date_published ingredients calories   fat carbs
   <chr>    <chr>   <chr> <chr>  <date>         <chr>          <dbl> <dbl> <dbl>
 1 Saganak… Greek   http… John … 2024-02-07     1 (4 ounce…      391    25    15
 2 Coney I… Jewish  http… John … 2024-11-26     2 ¾ cups a…      301    17    31
 3 Diana's… Austra… http… CHIPP… 2022-07-14     1 ½ cups w…       64     3     9
 4 Chilean… Chilean http… Heidi  2025-01-31     ½ cup chop…      106     9     7
 5 Tex-Mex… Tex-Mex http… Ann    2025-02-18     2 cups all…      449    23    58
 6 Newfoun… Canadi… http… MomWh… 2022-08-12     1 (3 pound…      958    24   144
 7 Pasta e… Italian http… Buckw… 2023-12-12     1 cup dry …      378    10    59
 8 Danish … Danish  http… TheOt… 2020-06-19     4 cups all…       90     5    10
 9 Lemon P… Amish … http… Laura… 2025-01-21     2 cups all…      157     6    25
10 Pan con… Spanish http… Luis … 2025-06-02     1 large to…      322    16    39
# ℹ 2,208 more rows
# ℹ 8 more variables: protein <dbl>, avg_rating <dbl>, total_ratings <dbl>,
#   reviews <dbl>, prep_time <dbl>, cook_time <dbl>, total_time <dbl>,
#   servings <dbl>

3. Examining Data

dplyr::glimpse(cuisines_modified)
Rows: 2,218
Columns: 17
$ name           <chr> "Saganaki (Flaming Greek Cheese)", "Coney Island Knishe…
$ country        <chr> "Greek", "Jewish", "Australian and New Zealander", "Chi…
$ url            <chr> "https://www.allrecipes.com/recipe/263750/flaming-greek…
$ author         <chr> "John Mitzewich", "John Mitzewich", "CHIPPENDALE", "Hei…
$ date_published <date> 2024-02-07, 2024-11-26, 2022-07-14, 2025-01-31, 2025-0…
$ ingredients    <chr> "1 (4 ounce) package kasseri cheese, 1 tablespoon water…
$ calories       <dbl> 391, 301, 64, 106, 449, 958, 378, 90, 157, 322, 4, NA, …
$ fat            <dbl> 25, 17, 3, 9, 23, 24, 10, 5, 6, 16, 0, NA, 21, 2, 66, 8…
$ carbs          <dbl> 15, 31, 9, 7, 58, 144, 59, 10, 25, 39, 1, NA, 16, 63, 7…
$ protein        <dbl> 16, 7, 1, 1, 7, 46, 14, 1, 2, 7, 0, NA, 28, 6, 54, 17, …
$ avg_rating     <dbl> 4.8, 4.6, 4.3, 5.0, 3.8, 4.4, 4.3, NA, 4.6, 5.0, 4.7, 4…
$ total_ratings  <dbl> 25, 10, 126, 1, 13, 40, 3, NA, 65, 2, 182, 2, 19, 16, 9…
$ reviews        <dbl> 22, 9, 104, 1, 11, 32, 3, NA, 55, 2, 138, 2, 15, 16, 84…
$ prep_time      <dbl> 10, 30, 20, 10, 30, 30, 30, 40, 0, 5, 5, 5, 10, 10, 20,…
$ cook_time      <dbl> 5, 75, 15, 0, 15, 165, 75, 30, 0, 5, 0, 25, 10, 50, 16,…
$ total_time     <dbl> 15, 180, 180, 10, 45, 675, 585, 155, 0, 10, 5, 30, 50, …
$ servings       <dbl> 2, 16, 12, 6, 15, 6, 6, 84, 24, 1, 21, 8, 4, 10, 4, 8, …
skimr::skim(cuisines_modified)
Data summary
Name cuisines_modified
Number of rows 2218
Number of columns 17
_______________________
Column type frequency:
character 5
Date 1
numeric 11
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
name 0 1 4 87 0 2216 0
country 0 1 4 28 0 49 0
url 0 1 45 120 0 2218 0
author 0 1 1 35 0 1635 0
ingredients 1 1 29 1109 0 2217 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date_published 0 1 2009-02-09 2025-07-29 2024-07-14 751

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
calories 32 0.99 358.41 240.04 3 190.0 319.5 477.0 2266 ▇▃▁▁▁
fat 55 0.98 18.76 16.96 0 7.0 15.0 26.0 225 ▇▁▁▁▁
carbs 35 0.98 31.96 26.06 1 13.0 26.0 45.0 264 ▇▂▁▁▁
protein 39 0.98 16.61 16.30 0 4.0 11.0 25.0 159 ▇▁▁▁▁
avg_rating 97 0.96 4.51 0.40 1 4.3 4.6 4.8 5 ▁▁▁▂▇
total_ratings 97 0.96 85.25 148.24 1 6.0 24.0 87.0 997 ▇▁▁▁▁
reviews 108 0.95 76.93 142.07 1 6.0 21.0 74.0 975 ▇▁▁▁▁
prep_time 0 1.00 21.50 60.72 0 10.0 15.0 25.0 1800 ▇▁▁▁▁
cook_time 0 1.00 41.75 63.18 0 10.0 25.0 45.0 600 ▇▁▁▁▁
total_time 0 1.00 170.98 641.73 0 35.0 60.0 120.0 14440 ▇▁▁▁▁
servings 2 1.00 10.48 13.42 1 4.0 8.0 12.0 240 ▇▁▁▁▁
names(cuisines_modified)
 [1] "name"           "country"        "url"            "author"        
 [5] "date_published" "ingredients"    "calories"       "fat"           
 [9] "carbs"          "protein"        "avg_rating"     "total_ratings" 
[13] "reviews"        "prep_time"      "cook_time"      "total_time"    
[17] "servings"      
cuisines %>%
  dplyr::summarise(across(
    .cols = c(calories, fat, carbs, protein, avg_rating, total_ratings, reviews, prep_time, cook_time, total_time, servings), # select columns
    # na.rm = TRUE in every function: without it, any column with missing values returns NA
    .fns = list(
      mean = ~ mean(.x, na.rm = TRUE),
      sd = ~ sd(.x, na.rm = TRUE),
      min = ~ min(.x, na.rm = TRUE),
      max = ~ max(.x, na.rm = TRUE)
    )
  )) %>%
  # reshape to one row per variable and round to two decimals for display
  tidyr::pivot_longer(
    everything(),
    names_to = c("variable", ".value"),
    names_pattern = "(.*)_(mean|sd|min|max)$"
  ) %>%
  dplyr::mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
  tt()
variable mean sd min max
calories 358.41 240.04 3 2266
fat 18.76 16.96 0 225
carbs 31.96 26.06 1 264
protein 16.61 16.3 0 159
avg_rating 4.51 0.4 1 5
total_ratings 85.25 148.24 1 997
reviews 76.93 142.07 1 975
prep_time 21.5 60.72 0 1800
cook_time 41.75 63.18 0 600
total_time 170.98 641.73 0 14440
servings 10.48 13.42 1 240
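
Summary functions such as `sd()`, `min()`, and `max()` return `NA` for any column that contains a missing value unless `na.rm = TRUE` is passed, which matters here because most numeric columns have a few missing entries. A minimal sketch in base R, using the first twelve `calories` values printed by `glimpse()` above (the vector name `calories_head` is only illustrative):

```r
# First twelve calories values from the glimpse() output; the twelfth is missing
calories_head <- c(391, 301, 64, 106, 449, 958, 378, 90, 157, 322, 4, NA)

# Without na.rm, a single NA makes the whole summary NA
sd_default <- stats::sd(calories_head)

# With na.rm = TRUE, the missing value is dropped before summarising
sd_clean  <- stats::sd(calories_head, na.rm = TRUE)
min_clean <- base::min(calories_head, na.rm = TRUE)
max_clean <- base::max(calories_head, na.rm = TRUE)

print(c(sd_default = sd_default, sd_clean = sd_clean,
        min = min_clean, max = max_clean))
```

The package-qualified calls (`stats::`, `base::`) are used so the sketch behaves the same whether or not `mosaic` has masked these functions.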
visdat::vis_dat(cuisines, sort_type = TRUE, palette = "cb_safe")

cuisines_modified <- cuisines %>% tidyr::drop_na()
cuisines_modified
# A tibble: 2,058 × 17
   name     country url   author date_published ingredients calories   fat carbs
   <chr>    <chr>   <chr> <chr>  <date>         <chr>          <dbl> <dbl> <dbl>
 1 Saganak… Greek   http… John … 2024-02-07     1 (4 ounce…      391    25    15
 2 Coney I… Jewish  http… John … 2024-11-26     2 ¾ cups a…      301    17    31
 3 Diana's… Austra… http… CHIPP… 2022-07-14     1 ½ cups w…       64     3     9
 4 Chilean… Chilean http… Heidi  2025-01-31     ½ cup chop…      106     9     7
 5 Tex-Mex… Tex-Mex http… Ann    2025-02-18     2 cups all…      449    23    58
 6 Newfoun… Canadi… http… MomWh… 2022-08-12     1 (3 pound…      958    24   144
 7 Pasta e… Italian http… Buckw… 2023-12-12     1 cup dry …      378    10    59
 8 Lemon P… Amish … http… Laura… 2025-01-21     2 cups all…      157     6    25
 9 Pan con… Spanish http… Luis … 2025-06-02     1 large to…      322    16    39
10 Traci's… Filipi… http… Traci… 2022-08-28     2 tablespo…        4     0     1
# ℹ 2,048 more rows
# ℹ 8 more variables: protein <dbl>, avg_rating <dbl>, total_ratings <dbl>,
#   reviews <dbl>, prep_time <dbl>, cook_time <dbl>, total_time <dbl>,
#   servings <dbl>
visdat::vis_dat(cuisines_modified, sort_type = TRUE, palette = "cb_safe")
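
The information that `vis_dat()` shows graphically can also be checked numerically. A small sketch in base R, assuming a toy data frame (`toy`, a hypothetical name) built from three columns of the first eleven rows shown by `glimpse()`; here `complete.cases()` plays the role that `tidyr::drop_na()` plays in the pipeline above:

```r
# Three columns from the first eleven rows; row 8 has no rating information
toy <- data.frame(
  calories      = c(391, 301, 64, 106, 449, 958, 378, 90, 157, 322, 4),
  avg_rating    = c(4.8, 4.6, 4.3, 5.0, 3.8, 4.4, 4.3, NA, 4.6, 5.0, 4.7),
  total_ratings = c(25, 10, 126, 1, 13, 40, 3, NA, 65, 2, 182)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Keep only the rows that have no missing value in any column
toy_complete <- toy[complete.cases(toy), ]

print(na_per_column)
print(nrow(toy) - nrow(toy_complete))  # number of rows dropped
```

On the full data the same rule removes every row with at least one `NA`, which is why the tibble shrinks from 2,218 to 2,058 rows.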

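# Note: this starts again from the raw `cuisines` tibble, so the rows removed by
# drop_na() above are back (2,218 rows, missing values included)
cuisines_modified <- cuisines %>%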
  dplyr::mutate(across(where(is.character), as.factor)) %>% 
  relocate(where(is.factor))
glimpse(cuisines_modified)
Rows: 2,218
Columns: 17
$ name           <fct> "Saganaki (Flaming Greek Cheese)", "Coney Island Knishe…
$ country        <fct> Greek, Jewish, Australian and New Zealander, Chilean, T…
$ url            <fct> https://www.allrecipes.com/recipe/263750/flaming-greek-…
$ author         <fct> "John Mitzewich", "John Mitzewich", "CHIPPENDALE", "Hei…
$ ingredients    <fct> "1 (4 ounce) package kasseri cheese, 1 tablespoon water…
$ date_published <date> 2024-02-07, 2024-11-26, 2022-07-14, 2025-01-31, 2025-0…
$ calories       <dbl> 391, 301, 64, 106, 449, 958, 378, 90, 157, 322, 4, NA, …
$ fat            <dbl> 25, 17, 3, 9, 23, 24, 10, 5, 6, 16, 0, NA, 21, 2, 66, 8…
$ carbs          <dbl> 15, 31, 9, 7, 58, 144, 59, 10, 25, 39, 1, NA, 16, 63, 7…
$ protein        <dbl> 16, 7, 1, 1, 7, 46, 14, 1, 2, 7, 0, NA, 28, 6, 54, 17, …
$ avg_rating     <dbl> 4.8, 4.6, 4.3, 5.0, 3.8, 4.4, 4.3, NA, 4.6, 5.0, 4.7, 4…
$ total_ratings  <dbl> 25, 10, 126, 1, 13, 40, 3, NA, 65, 2, 182, 2, 19, 16, 9…
$ reviews        <dbl> 22, 9, 104, 1, 11, 32, 3, NA, 55, 2, 138, 2, 15, 16, 84…
$ prep_time      <dbl> 10, 30, 20, 10, 30, 30, 30, 40, 0, 5, 5, 5, 10, 10, 20,…
$ cook_time      <dbl> 5, 75, 15, 0, 15, 165, 75, 30, 0, 5, 0, 25, 10, 50, 16,…
$ total_time     <dbl> 15, 180, 180, 10, 45, 675, 585, 155, 0, 10, 5, 30, 50, …
$ servings       <dbl> 2, 16, 12, 6, 15, 6, 6, 84, 24, 1, 21, 8, 4, 10, 4, 8, …
cuisines_modified %>%
  # The result is only printed, not assigned, so cuisines_modified keeps its original column names
  stats::setNames(c("Name_of_Cuisine","Country_of_Origin","URL","Author","Ingredients","Date_Published","Calories","Fat","Carbs","Protein","Average_Rating","Total_Ratings","Reviews","Preparation_Time","Cooking_Time","Total_Time","Servings"))
# A tibble: 2,218 × 17
   Name_of_Cuisine     Country_of_Origin URL   Author Ingredients Date_Published
   <fct>               <fct>             <fct> <fct>  <fct>       <date>        
 1 Saganaki (Flaming … Greek             http… John … 1 (4 ounce… 2024-02-07    
 2 Coney Island Knish… Jewish            http… John … 2 ¾ cups a… 2024-11-26    
 3 Diana's Hawaiian B… Australian and N… http… CHIPP… 1 ½ cups w… 2022-07-14    
 4 Chilean Pebre       Chilean           http… Heidi  ½ cup chop… 2025-01-31    
 5 Tex-Mex Sheet Cake  Tex-Mex           http… Ann    2 cups all… 2025-02-18    
 6 Newfoundland Jigg'… Canadian          http… MomWh… 1 (3 pound… 2022-08-12    
 7 Pasta e Ceci (Ital… Italian           http… Buckw… 1 cup dry … 2023-12-12    
 8 Danish Butter Cook… Danish            http… TheOt… 4 cups all… 2020-06-19    
 9 Lemon Poppy Seed A… Amish and Mennon… http… Laura… 2 cups all… 2025-01-21    
10 Pan con Tomate (Sp… Spanish           http… Luis … 1 large to… 2025-06-02    
# ℹ 2,208 more rows
# ℹ 11 more variables: Calories <dbl>, Fat <dbl>, Carbs <dbl>, Protein <dbl>,
#   Average_Rating <dbl>, Total_Ratings <dbl>, Reviews <dbl>,
#   Preparation_Time <dbl>, Cooking_Time <dbl>, Total_Time <dbl>,
#   Servings <dbl>
cuisines_modified %>%
  DT::datatable(
    style = "default",
    caption = htmltools::tags$caption(
      style = "caption-side: top; text-align: left; color: black; font-size: 100%;", "Cuisines Dataset (Clean)"
    ),
    options = list(pageLength = 10, autoWidth = TRUE)
  ) %>%
  DT::formatStyle(
    columns = names(cuisines_modified),
    fontFamily = "Roboto Condensed",
    fontSize = "12px"
  )
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html
crosstable(
  calories + protein + carbs + fat
  ~ country,        
  data = cuisines_modified       
) %>%
  as_flextable()

label

variable

country

Amish and Mennonite

Argentinian

Australian and New Zealander

Austrian

Bangladeshi

Belgian

Brazilian

Cajun and Creole

Canadian

Chilean

Chinese

Colombian

Cuban

Danish

Dutch

Filipino

Finnish

French

German

Greek

Indian

Indonesian

Israeli

Italian

Jamaican

Japanese

Jewish

Korean

Lebanese

Malaysian

Norwegian

Pakistani

Persian

Peruvian

Polish

Portuguese

Puerto Rican

Russian

Scandinavian

Soul Food

South African

Southern Recipes

Spanish

Swedish

Swiss

Tex-Mex

Thai

Turkish

Vietnamese

calories

Min / Max

33.0 / 924.0

70.0 / 758.0

46.0 / 707.0

49.0 / 937.0

187.0 / 671.0

249.0 / 664.0

49.0 / 932.0

3.0 / 1109.0

80.0 / 1106.0

23.0 / 579.0

43.0 / 940.0

7.0 / 2108.0

56.0 / 2010.0

19.0 / 802.0

78.0 / 622.0

4.0 / 983.0

106.0 / 891.0

4.0 / 1326.0

40.0 / 917.0

9.0 / 1213.0

6.0 / 880.0

65.0 / 716.0

66.0 / 551.0

82.0 / 1345.0

4.0 / 983.0

8.0 / 1222.0

63.0 / 1067.0

12.0 / 1096.0

7.0 / 990.0

31.0 / 1238.0

43.0 / 756.0

107.0 / 835.0

33.0 / 785.0

26.0 / 995.0

13.0 / 837.0

16.0 / 2266.0

3.0 / 1220.0

27.0 / 795.0

29.0 / 662.0

10.0 / 1793.0

101.0 / 531.0

65.0 / 1815.0

39.0 / 1564.0

69.0 / 551.0

89.0 / 902.0

7.0 / 975.0

37.0 / 1360.0

6.0 / 735.0

27.0 / 1263.0

Med [IQR]

238.0 [168.0;382.0]

266.0 [171.0;336.0]

210.0 [133.0;402.0]

426.5 [110.2;498.5]

328.5 [255.2;349.0]

354.5 [278.0;452.0]

303.0 [152.0;454.0]

465.0 [248.0;621.5]

344.0 [231.8;489.8]

285.5 [123.5;417.8]

314.0 [195.0;507.0]

127.0 [96.5;572.5]

299.5 [218.0;458.0]

214.0 [97.0;331.0]

270.0 [170.5;450.2]

337.0 [227.0;463.0]

242.5 [178.8;437.8]

325.0 [193.0;490.0]

331.0 [207.0;460.0]

299.0 [178.0;425.0]

293.0 [171.0;416.0]

463.0 [328.2;532.0]

208.0 [175.0;283.5]

456.0 [339.2;569.0]

314.5 [223.8;431.0]

237.0 [100.8;442.0]

278.0 [184.0;378.0]

294.5 [195.0;459.5]

241.0 [108.5;364.5]

436.0 [224.8;609.5]

196.0 [127.0;339.2]

395.0 [305.0;458.0]

307.0 [209.0;506.0]

389.0 [214.2;540.5]

356.0 [234.0;526.0]

387.0 [227.0;540.0]

367.0 [254.5;472.5]

293.0 [167.0;447.0]

288.0 [125.5;406.0]

372.0 [176.5;495.5]

266.0 [234.0;369.5]

443.0 [208.0;618.0]

347.0 [222.5;542.5]

285.5 [187.5;370.5]

400.0 [222.2;468.8]

392.0 [247.0;520.0]

371.5 [206.0;580.0]

322.0 [217.5;458.5]

410.0 [210.0;604.0]

Mean (std)

286.3 (172.9)

292.2 (154.3)

270.9 (171.6)

362.0 (243.0)

331.7 (121.8)

393.2 (157.2)

313.3 (191.0)

449.6 (265.8)

383.1 (220.4)

291.8 (171.7)

369.4 (220.7)

418.5 (607.0)

393.6 (325.9)

247.8 (186.2)

315.4 (172.1)

354.3 (201.8)

364.2 (251.3)

382.6 (265.5)

354.8 (205.4)

328.5 (227.5)

303.0 (178.7)

430.0 (170.9)

247.3 (131.3)

455.5 (232.4)

328.4 (207.3)

308.3 (270.4)

300.1 (170.1)

377.6 (284.0)

260.6 (191.2)

441.5 (282.1)

247.2 (169.9)

406.3 (166.7)

354.5 (222.6)

400.9 (236.7)

375.6 (208.5)

414.1 (325.8)

387.3 (227.4)

312.8 (173.4)

286.6 (175.1)

412.6 (335.6)

290.8 (109.3)

487.9 (373.8)

415.2 (275.2)

278.7 (127.1)

386.5 (238.9)

414.3 (232.4)

409.5 (257.1)

340.1 (184.7)

418.5 (250.2)

N (NA)

61 (0)

29 (1)

65 (0)

22 (0)

12 (0)

6 (0)

65 (2)

63 (0)

66 (1)

18 (4)

65 (0)

11 (0)

64 (1)

33 (0)

22 (0)

65 (1)

18 (0)

65 (0)

61 (1)

61 (1)

65 (0)

24 (0)

23 (0)

62 (2)

42 (1)

62 (1)

61 (0)

56 (0)

51 (0)

24 (0)

26 (0)

25 (0)

43 (2)

38 (0)

61 (0)

53 (0)

59 (1)

65 (0)

35 (3)

63 (0)

19 (0)

49 (1)

55 (6)

30 (1)

10 (0)

53 (2)

62 (0)

36 (0)

62 (0)

protein

Min / Max

0 / 69.0

0 / 44.0

0 / 27.0

1.0 / 32.0

1.0 / 45.0

4.0 / 56.0

1.0 / 45.0

0 / 91.0

0 / 58.0

1.0 / 54.0

1.0 / 60.0

0 / 27.0

0 / 92.0

0 / 35.0

1.0 / 23.0

0 / 66.0

1.0 / 50.0

0 / 77.0

1.0 / 71.0

0 / 65.0

0 / 59.0

1.0 / 42.0

1.0 / 36.0

0 / 91.0

0 / 70.0

0 / 84.0

1.0 / 62.0

1.0 / 66.0

0 / 58.0

0 / 85.0

1.0 / 31.0

0 / 46.0

0 / 48.0

0 / 94.0

0 / 43.0

0 / 101.0

0 / 78.0

0 / 40.0

0 / 71.0

1.0 / 73.0

3.0 / 37.0

0 / 159.0

0 / 95.0

1.0 / 30.0

2.0 / 52.0

0 / 56.0

0 / 53.0

0 / 54.0

1.0 / 72.0

Med [IQR]

4.0 [2.0;9.0]

5.0 [1.0;13.0]

4.0 [2.0;5.0]

7.0 [1.2;11.0]

21.5 [12.0;27.2]

7.5 [5.0;19.8]

7.0 [3.0;16.0]

25.0 [15.0;35.0]

10.5 [4.0;25.0]

9.0 [2.2;15.8]

22.0 [10.0;30.0]

6.0 [1.5;11.5]

21.0 [7.8;31.5]

5.0 [2.0;9.0]

4.0 [2.0;6.0]

16.0 [5.0;28.0]

6.0 [4.2;12.5]

8.0 [4.0;30.0]

10.0 [4.0;25.0]

12.0 [4.0;24.0]

12.0 [5.0;25.0]

25.0 [18.0;36.5]

7.0 [2.5;13.0]

21.0 [9.0;38.8]

22.0 [6.0;29.0]

8.0 [3.0;22.0]

7.0 [4.0;14.0]

16.5 [5.0;26.5]

5.5 [2.0;15.8]

21.5 [13.0;31.8]

4.0 [2.0;10.2]

20.0 [10.0;26.0]

12.0 [4.0;26.5]

21.0 [9.8;25.8]

14.0 [7.0;20.0]

18.0 [7.0;28.2]

11.0 [4.5;25.0]

9.0 [5.0;19.0]

6.0 [1.0;13.5]

17.0 [6.0;33.0]

7.0 [4.5;13.5]

11.5 [4.0;29.0]

14.0 [5.5;28.0]

6.0 [3.0;10.0]

10.5 [5.2;18.2]

23.0 [12.0;33.0]

17.0 [7.0;27.0]

11.0 [6.0;24.2]

21.0 [12.0;31.5]

Mean (std)

8.9 (11.8)

8.8 (10.6)

4.6 (5.0)

9.0 (8.9)

21.7 (13.1)

17.2 (20.3)

11.6 (11.7)

25.5 (19.0)

15.3 (13.4)

12.3 (13.0)

21.6 (14.5)

8.2 (8.3)

24.1 (21.8)

7.6 (8.7)

7.2 (7.7)

19.3 (16.3)

11.9 (13.1)

17.0 (18.3)

16.7 (16.6)

15.9 (14.9)

15.8 (13.0)

24.7 (13.1)

9.5 (9.7)

25.4 (20.0)

19.7 (15.6)

14.0 (16.1)

11.7 (13.3)

18.3 (14.4)

11.6 (13.5)

26.5 (20.7)

7.4 (7.8)

19.5 (11.8)

16.4 (13.5)

20.3 (16.5)

14.7 (10.1)

22.1 (20.9)

15.7 (15.6)

12.8 (10.5)

10.7 (14.5)

21.3 (17.3)

10.7 (10.3)

23.2 (31.9)

19.9 (19.9)

7.9 (7.4)

15.3 (15.5)

23.2 (14.0)

19.0 (14.1)

15.8 (14.0)

23.7 (17.3)

N (NA)

61 (0)

29 (1)

64 (1)

22 (0)

12 (0)

6 (0)

65 (2)

63 (0)

66 (1)

18 (4)

65 (0)

11 (0)

64 (1)

33 (0)

22 (0)

65 (1)

18 (0)

65 (0)

61 (1)

61 (1)

65 (0)

23 (1)

23 (0)

62 (2)

42 (1)

61 (2)

61 (0)

56 (0)

50 (1)

24 (0)

26 (0)

25 (0)

43 (2)

38 (0)

61 (0)

52 (1)

59 (1)

65 (0)

35 (3)

63 (0)

19 (0)

48 (2)

55 (6)

30 (1)

10 (0)

53 (2)

61 (1)

36 (0)

62 (0)

carbs

Min / Max

1.0 / 161.0

1.0 / 76.0

4.0 / 93.0

7.0 / 156.0

6.0 / 56.0

7.0 / 52.0

2.0 / 178.0

1.0 / 79.0

3.0 / 181.0

3.0 / 54.0

1.0 / 101.0

2.0 / 109.0

2.0 / 264.0

2.0 / 114.0

7.0 / 100.0

1.0 / 90.0

15.0 / 83.0

1.0 / 153.0

1.0 / 93.0

1.0 / 73.0

1.0 / 80.0

4.0 / 94.0

5.0 / 60.0

2.0 / 108.0

1.0 / 115.0

1.0 / 159.0

2.0 / 98.0

2.0 / 121.0

1.0 / 77.0

2.0 / 77.0

3.0 / 133.0

6.0 / 105.0

1.0 / 103.0

2.0 / 165.0

1.0 / 98.0

3.0 / 109.0

1.0 / 175.0

2.0 / 85.0

3.0 / 83.0

1.0 / 132.0

13.0 / 60.0

2.0 / 203.0

3.0 / 161.0

1.0 / 93.0

13.0 / 69.0

2.0 / 111.0

4.0 / 144.0

1.0 / 100.0

3.0 / 128.0

Med [IQR]

34.0 [18.0;48.0]

25.0 [3.0;34.0]

32.0 [18.0;46.0]

42.5 [14.0;52.5]

27.5 [11.2;35.2]

34.0 [26.5;46.0]

28.0 [11.8;53.2]

16.0 [5.0;42.0]

33.5 [19.0;51.0]

23.5 [12.0;37.5]

25.0 [11.0;50.0]

19.0 [13.0;39.0]

23.0 [10.0;40.2]

31.0 [10.0;37.0]

27.0 [21.0;61.8]

23.0 [11.0;38.0]

32.5 [24.2;44.5]

21.0 [9.0;35.0]

28.5 [16.0;40.2]

16.0 [4.0;33.0]

19.0 [13.0;28.0]

20.5 [14.8;47.0]

20.0 [10.5;32.5]

29.0 [14.5;49.0]

18.5 [11.5;42.0]

25.0 [8.2;41.0]

30.0 [18.0;39.0]

26.0 [12.8;45.5]

26.0 [7.0;39.0]

19.0 [6.8;40.5]

25.5 [17.2;38.8]

27.0 [11.0;50.0]

28.0 [8.0;51.0]

36.5 [14.5;50.2]

32.0 [15.0;45.0]

28.0 [12.0;45.0]

36.0 [21.5;55.0]

25.0 [15.0;36.0]

20.0 [11.2;39.2]

22.0 [13.0;36.0]

27.0 [16.5;45.0]

33.0 [17.0;64.0]

36.0 [15.0;47.0]

23.0 [17.0;35.2]

48.0 [18.5;56.8]

31.0 [13.0;47.0]

20.5 [14.2;58.8]

28.5 [16.5;48.0]

23.5 [12.5;60.2]

Mean (std)

36.4 (25.6)

24.6 (20.4)

36.4 (22.5)

44.2 (34.7)

25.6 (15.9)

33.5 (16.6)

36.1 (30.4)

24.2 (22.3)

39.7 (31.5)

26.4 (16.5)

32.5 (25.5)

32.0 (33.3)

31.0 (36.8)

27.2 (22.1)

38.4 (25.4)

26.6 (20.1)

38.4 (21.1)

25.2 (24.0)

30.1 (19.9)

21.0 (18.0)

23.6 (17.6)

33.7 (24.9)

24.6 (16.8)

33.2 (23.0)

28.9 (28.3)

29.2 (28.7)

31.5 (19.2)

31.7 (25.7)

26.2 (20.7)

25.2 (21.8)

32.0 (27.7)

32.5 (26.3)

34.6 (30.3)

38.2 (32.3)

33.0 (22.2)

33.4 (24.9)

40.5 (30.3)

28.3 (17.7)

25.7 (18.9)

28.0 (25.5)

31.9 (15.4)

44.5 (35.9)

35.8 (27.0)

29.0 (19.8)

40.4 (21.3)

32.8 (22.6)

38.1 (33.8)

35.2 (26.7)

39.5 (35.0)

N (NA)

61 (0)

29 (1)

65 (0)

22 (0)

12 (0)

6 (0)

64 (3)

63 (0)

66 (1)

18 (4)

65 (0)

11 (0)

64 (1)

33 (0)

22 (0)

65 (1)

18 (0)

65 (0)

60 (2)

61 (1)

65 (0)

24 (0)

23 (0)

62 (2)

42 (1)

62 (1)

61 (0)

56 (0)

51 (0)

24 (0)

26 (0)

25 (0)

43 (2)

38 (0)

61 (0)

53 (0)

59 (1)

65 (0)

34 (4)

63 (0)

19 (0)

49 (1)

55 (6)

30 (1)

10 (0)

53 (2)

62 (0)

36 (0)

62 (0)

fat

Min / Max

0 / 37.0

2.0 / 56.0

0 / 43.0

2.0 / 35.0

4.0 / 39.0

8.0 / 35.0

0 / 57.0

0 / 82.0

0 / 51.0

1.0 / 46.0

1.0 / 92.0

0 / 225.0

0 / 88.0

0 / 47.0

1.0 / 37.0

0 / 75.0

0 / 62.0

0 / 86.0

0 / 62.0

0 / 115.0

0 / 82.0

6.0 / 50.0

0 / 33.0

0 / 101.0

0 / 56.0

0 / 112.0

1.0 / 49.0

0 / 86.0

0 / 65.0

1.0 / 96.0

1.0 / 36.0

0 / 68.0

0 / 52.0

0 / 70.0

0 / 69.0

0 / 190.0

0 / 100.0

0 / 55.0

0 / 59.0

0 / 110.0

3.0 / 31.0

0 / 110.0

0 / 151.0

4.0 / 37.0

3.0 / 49.0

0 / 64.0

0 / 79.0

0 / 39.0

0 / 76.0

Med [IQR]

10.0 [5.0;17.0]

15.0 [10.0;22.0]

10.0 [5.0;18.0]

14.5 [5.2;29.0]

16.5 [9.0;21.2]

21.0 [16.5;26.2]

12.0 [5.0;21.0]

24.0 [12.0;40.0]

15.5 [9.0;25.5]

14.0 [5.2;23.2]

12.0 [8.0;23.0]

5.0 [4.0;19.0]

15.0 [6.5;22.0]

8.0 [5.0;18.0]

15.0 [8.5;20.0]

15.0 [9.0;25.0]

10.0 [5.8;21.8]

19.0 [10.0;31.0]

15.0 [9.0;29.0]

17.0 [7.0;26.0]

15.0 [5.0;23.0]

22.0 [17.5;29.0]

12.0 [8.0;16.5]

20.0 [13.0;33.0]

15.0 [6.0;20.0]

10.5 [4.0;19.8]

11.0 [7.0;22.0]

14.5 [6.8;22.2]

10.0 [4.0;17.0]

26.0 [8.8;39.0]

6.5 [4.0;14.8]

18.0 [15.0;32.0]

15.0 [7.5;24.5]

15.0 [7.0;30.0]

17.0 [10.0;28.0]

18.0 [8.5;31.0]

16.5 [8.0;24.0]

12.0 [7.0;26.0]

12.5 [7.0;22.0]

18.0 [8.5;29.5]

13.0 [8.0;20.0]

20.5 [10.8;29.0]

16.0 [10.0;25.0]

11.0 [6.2;23.8]

17.5 [10.2;21.8]

17.0 [11.0;30.0]

21.0 [12.0;33.0]

16.0 [6.0;25.8]

16.5 [9.0;27.0]

Mean (std)

12.2 (8.7)

17.9 (11.9)

13.1 (10.8)

17.1 (11.7)

16.2 (9.8)

21.3 (9.4)

14.2 (11.4)

28.1 (20.0)

18.5 (12.9)

15.3 (11.8)

17.1 (14.6)

29.4 (65.7)

18.1 (18.0)

12.5 (11.3)

15.3 (8.9)

19.2 (16.2)

18.1 (18.6)

23.6 (18.4)

18.8 (14.6)

20.8 (19.1)

16.9 (14.0)

24.3 (11.9)

13.2 (8.1)

24.7 (17.6)

15.6 (13.0)

16.2 (21.4)

14.6 (10.2)

20.3 (21.2)

13.3 (11.8)

26.9 (22.5)

10.4 (8.9)

22.5 (14.8)

17.6 (12.8)

19.1 (16.1)

20.8 (15.9)

22.0 (27.2)

18.7 (15.8)

16.8 (12.7)

16.5 (14.1)

24.0 (23.4)

14.3 (8.5)

24.4 (21.5)

21.0 (22.7)

14.9 (9.8)

18.4 (13.1)

21.8 (16.3)

22.4 (15.8)

16.8 (11.2)

18.7 (14.8)

N (NA)

61 (0)

29 (1)

63 (2)

22 (0)

12 (0)

6 (0)

65 (2)

63 (0)

66 (1)

18 (4)

65 (0)

11 (0)

63 (2)

33 (0)

22 (0)

65 (1)

18 (0)

65 (0)

61 (1)

61 (1)

65 (0)

23 (1)

23 (0)

61 (3)

41 (2)

58 (5)

61 (0)

56 (0)

49 (2)

24 (0)

26 (0)

25 (0)

43 (2)

37 (1)

61 (0)

51 (2)

58 (2)

65 (0)

34 (4)

63 (0)

19 (0)

48 (2)

53 (8)

30 (1)

10 (0)

53 (2)

61 (1)

34 (2)

62 (0)

4. Data Dictionary

Qualitative Data

  1. name(fct): Name of the recipe
  2. country(fct): Cuisine or country of origin
  3. url(fct): Web link to the full recipe source or page
  4. author(fct): Name or username of the recipe creator or contributor
  5. date_published(date): Date when the recipe was published or posted online
  6. ingredients(fct): List of ingredients required for the recipe

Quantitative Data

  1. calories(dbl): Approximate total calorie content per serving
  2. fat(dbl): Total fat content per serving (in grams)
  3. carbs(dbl): Total carbohydrates per serving (in grams)
  4. protein(dbl): Protein content per serving (in grams)
  5. avg_rating(dbl): Average user rating for the recipe (on a scale from 1-5)
  6. total_ratings(dbl): Total number of user ratings received
  7. reviews(dbl): Number of text reviews or written comments
  8. prep_time(dbl): Estimated preparation time before cooking (in minutes)
  9. cook_time(dbl): Estimated cooking time (in minutes)
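  10. total_time(dbl): Total time required for the recipe, in minutes. It can exceed prep_time + cook_time (for example, 180 against 30 + 75 in the second row), presumably because it includes additional time such as resting or chilling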
  11. servings(dbl): Number of servings the recipe yields
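
The two groups in this dictionary follow the column types, so the split can be derived from the data rather than written by hand. A sketch in base R, assuming a two-row toy data frame (`toy`, a hypothetical name) that holds a few of the columns listed above with values taken from rows 4 and 5 of the printed tibble:

```r
toy <- data.frame(
  name           = factor(c("Chilean Pebre", "Tex-Mex Sheet Cake")),
  country        = factor(c("Chilean", "Tex-Mex")),
  date_published = as.Date(c("2025-01-31", "2025-02-18")),
  calories       = c(106, 449),
  servings       = c(6, 15)
)

# TRUE for columns stored as numbers; factors and dates give FALSE
is_num <- vapply(toy, is.numeric, logical(1))

numeric_cols     <- names(toy)[is_num]
non_numeric_cols <- names(toy)[!is_num]

print(numeric_cols)
print(non_numeric_cols)
```

The same two lines applied to `cuisines_modified` would list the eleven `dbl` columns in one group and the factor and date columns in the other.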

5. Graphs

1. How many recipes come from each country?

cuisines_modified %>%
  gf_bar(~ reorder(country, country, function(x) length(x)),
    fill = "cyan4", position = "dodge") %>%
  gf_labs(title = "Number of Recipes From Each Country",
          x = "Country",
          y = "Number of Recipes") %>%
  gf_theme(theme(axis.text.x = element_text(angle = 45, hjust = 1)))

Inferences

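Belgian cuisine has the fewest recipes (6), while Canadian and Brazilian have the most (67 each, going by the N and NA counts in the crosstable above). Indian cuisine is in the leading group with 65 recipes.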
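
The bar heights can be read off exactly by tabulating the `country` column. A sketch in base R, assuming a short illustrative vector (`country_toy` holds made-up values, not the real column):

```r
# Made-up cuisine labels standing in for cuisines_modified$country
country_toy <- c("Greek", "Italian", "Greek", "Belgian", "Italian", "Greek")

# Count recipes per cuisine and sort from most to least frequent
recipe_counts <- sort(table(country_toy), decreasing = TRUE)

most_common  <- names(recipe_counts)[1]
least_common <- names(recipe_counts)[length(recipe_counts)]

print(recipe_counts)
```

With the real data, a pipeline equivalent would be `cuisines_modified %>% dplyr::count(country, sort = TRUE)`. The `dplyr::` prefix matters in this session because `mosaic` masks `count()`, as the startup messages show.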

2. Does a longer cooking time generally result in higher-calorie recipes?

cuisines_modified %>%
  # formula is y ~ x: calories on the y-axis, cooking time on the x-axis, matching the labels below
  gf_point(calories ~ cook_time, color = 'maroon', size = 2) %>%
  gf_smooth(color = 'black') %>% 
  gf_labs(title = "Does a longer cooking time generally result in higher-calorie recipes?",
          x = "Cooking Time",
          y = "Calories")
`geom_smooth()` using method = 'gam'
Warning: Removed 32 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 32 rows containing missing values or values outside the scale range
(`geom_point()`).

Inferences

The smoother suggests a positive association: recipes with longer cooking times tend to have more calories. Most recipes with short cooking times are also low in calories, although a few outliers stay low in calories despite long cooking times. This is an association only; the plot does not show that cooking longer causes higher calories.
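
A scatter plot with a smoother suggests a trend but does not quantify it. One way to put a number on it is a correlation coefficient. The sketch below uses base R on the first eleven rows shown by `glimpse()` (the vector names are illustrative), so the result describes only that small sample and not the full dataset:

```r
# First eleven cook_time and calories values from the glimpse() output
cook_time <- c(5, 75, 15, 0, 15, 165, 75, 30, 0, 5, 0)
calories  <- c(391, 301, 64, 106, 449, 958, 378, 90, 157, 322, 4)

# Pearson measures linear association
r_pearson <- stats::cor(cook_time, calories, use = "complete.obs")

# Spearman works on ranks, so it is less sensitive to extreme values
r_spearman <- stats::cor(cook_time, calories,
                         method = "spearman", use = "complete.obs")

print(c(pearson = r_pearson, spearman = r_spearman))
```

`use = "complete.obs"` drops pairs with a missing value, which would be needed on the full data because `calories` has 32 missing entries.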

3. Which cuisines have the most protein-rich recipes?

cuisines_modified %>%
  group_by(country) %>%
  dplyr::summarise(avg_protein = mean(protein, na.rm = TRUE)) %>%
  arrange(desc(avg_protein)) %>%
  slice_head(n = 10) %>%
  mutate(country = factor(country, levels = country)) %>%
  gf_col(
    avg_protein ~ country,
    fill = ~ avg_protein
  ) %>%
  gf_labs(
    title = "Top 10 Cuisines by Average Protein Content",
    x = "Country",
    y = "Average Protein Content",
    fill = "Average Protein"
  ) +
  scale_fill_gradient(low = "lightpink", high = "palevioletred") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Inferences

Malaysian cuisine has the highest average protein per serving (26.5 g), followed by Cajun and Creole (25.5 g), Italian (25.4 g), Indonesian (24.7 g), and Cuban (24.1 g).
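
The ranking behind the bar chart is a grouped mean followed by a sort. A base R sketch with made-up numbers (`toy` and its group labels are hypothetical; the real values come from `cuisines_modified`):

```r
# Made-up protein values for three groups, one of them missing
toy <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  protein = c(10, 20, 30, NA, 5)
)

# Mean protein per group, ignoring missing values
avg_protein <- tapply(toy$protein, toy$country, mean, na.rm = TRUE)

# Keep the two groups with the highest mean
top2 <- head(sort(avg_protein, decreasing = TRUE), 2)

print(top2)
```

This mirrors the `group_by()`, `summarise()`, `arrange(desc())`, and `slice_head()` steps in the chunk above, with `n = 2` instead of `n = 10`.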

4. What is the correlation between protein and calories in the food of these 10 countries?

cuisines_modified %>%
  group_by(country) %>%
  dplyr::summarise(avg_protein = mean(protein, na.rm = TRUE)) %>%
  arrange(desc(avg_protein)) %>%
  slice_head(n = 10) %>%
  pull(country) -> top10_protein_countries

cuisines_modified %>%
  filter(country %in% top10_protein_countries) %>%
  gf_point(protein ~ calories, color = ~country) %>%
  gf_facet_wrap(~country, scales = "free") %>%
  gf_labs(
    title = "Protein vs Calories of Cuisines from Top 10 Protein-Rich Countries",
    x = "Calories",
    y = "Protein (g)",
    color = "Country"
  ) %>%
  gf_refine(
    theme_minimal(),
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  )
Warning: Removed 9 rows containing missing values or values outside the scale range
(`geom_point()`).

Inferences

In most of these cuisines, recipes with more calories also tend to have more protein, but the range differs between cuisines: Southern Recipes and Portuguese contain the most protein-rich individual dishes (up to 159 g and 101 g per serving, respectively).

5. Does total time taken to cook affect the number of people served?

cuisines_modified %>%
  gf_point(total_time ~ servings, colour = 'darkolivegreen') %>%
  gf_lm(color = 'black') %>%
  gf_labs(
    title = "Total time vs Serving",
    subtitle = "Does total time taken to cook affect the number of people served?",
    x = 'Servings',
    y = 'Total Time'
  )
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_lm()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

Inferences

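The regression line slopes upward, which suggests a positive association: total time tends to increase with the number of servings. The points are widely scattered, and a few very long recipes (total time of up to 14,440 minutes) can pull the line, so this trend should be treated with caution.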
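
Whether the fitted line rises can be checked from the slope of the same kind of linear model that `gf_lm()` draws. A sketch in base R on the first eleven rows from `glimpse()` (illustrative vectors; the sample is far too small to stand in for the full data):

```r
# First eleven servings and total_time values from the glimpse() output
servings   <- c(2, 16, 12, 6, 15, 6, 6, 84, 24, 1, 21)
total_time <- c(15, 180, 180, 10, 45, 675, 585, 155, 0, 10, 5)

# Simple linear regression of total time on servings
fit <- stats::lm(total_time ~ servings)

# Slope: change in total time (minutes) per additional serving
slope <- unname(stats::coef(fit)["servings"])

# Share of the variance in total time explained by servings
r_squared <- summary(fit)$r.squared

print(c(slope = slope, r_squared = r_squared))
```

A positive slope with a small R-squared would mean the line rises but servings explain little of the variation in total time.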

6. Does total time taken to cook affect the average rating?

cuisines_modified %>%
  gf_point(total_time ~ avg_rating, colour = 'cornflowerblue') %>%
  gf_lm(colour = 'black') %>%
  gf_labs(
    title = "Total Time vs Average Rating",
    subtitle = "Does total time taken to cook affect the average rating?",
    x = 'Average Rating',
    y = 'Total Time'
  )
Warning: Removed 97 rows containing non-finite outside the scale range
(`stat_lm()`).
Warning: Removed 97 rows containing missing values or values outside the scale range
(`geom_point()`).

Inferences

The graph shows little or no correlation between total time and average rating. Many recipes that take a long time are still rated highly.

7. Does the number of reviews affect the average rating?

cuisines_modified %>%
  gf_point(reviews ~ avg_rating, colour = 'brown') %>%
  gf_lm(colour = 'black') %>%
  gf_labs(
    title = "Average Rating vs Number of Reviews",
    subtitle = "Does the number of reviews affect the average rating?",
    x = 'Average Rating',
    y = 'Number of reviews'
  )
Warning: Removed 108 rows containing non-finite outside the scale range
(`stat_lm()`).
Warning: Removed 108 rows containing missing values or values outside the scale range
(`geom_point()`).

Inferences

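The graph suggests a positive association between the number of reviews and the average rating: recipes with many reviews tend to be rated highly. Most recipes are rated highly in any case (the median rating is 4.6), so this does not show that more reviews lead to a higher rating.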
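
One way to probe this pattern is to compare ratings for recipes with few and with many reviews. The sketch below does this in base R on the first eleven rows from `glimpse()`, splitting at the median review count. The split point and the group labels are choices made for this illustration, and the sample is too small to generalise:

```r
# First eleven reviews and avg_rating values; row 8 is missing both
reviews    <- c(22, 9, 104, 1, 11, 32, 3, NA, 55, 2, 138)
avg_rating <- c(4.8, 4.6, 4.3, 5.0, 3.8, 4.4, 4.3, NA, 4.6, 5.0, 4.7)

# Drop rows where either value is missing
keep       <- !is.na(reviews) & !is.na(avg_rating)
reviews    <- reviews[keep]
avg_rating <- avg_rating[keep]

# Label each recipe by whether it has more reviews than the median
group <- ifelse(reviews > stats::median(reviews), "many_reviews", "few_reviews")

# Mean and spread of the rating within each group
mean_by_group <- tapply(avg_rating, group, mean)
sd_by_group   <- tapply(avg_rating, group, stats::sd)

print(round(mean_by_group, 2))
print(round(sd_by_group, 2))
```

In this small sample the two group means are almost equal, while ratings in the few-reviews group are more spread out, which is what one would expect when an average rests on only a handful of ratings.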

8. Does protein rich also mean calorie rich?

cuisines_modified %>%
  gf_point(calories ~ protein, colour = 'darkorchid3') %>%
  gf_lm(colour = 'black') %>%
  gf_labs(
    title = "Calories vs Protein",
    subtitle = "Does protein rich also mean calorie rich?",
    x = 'Protein',
    y = 'Calories'
  )
Warning: Removed 39 rows containing non-finite outside the scale range
(`stat_lm()`).
Warning: Removed 39 rows containing missing values or values outside the scale range
(`geom_point()`).

Inferences

This graph shows a clear positive association between calories and protein content, although no significance test was run. Some points lie well above the rest, suggesting that certain recipes have high calorie counts despite moderate amounts of protein (or vice versa), most likely because other ingredients contribute significant amounts of fat or carbohydrates.
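
Part of the link between protein and calories is arithmetic. Calories are commonly approximated from the three macronutrients using the general Atwater factors: about 4 kcal per gram of protein, 4 per gram of carbohydrate, and 9 per gram of fat. These factors are an assumption brought in for illustration; the dataset does not document how its calorie values were computed. A base R sketch on the first three rows printed above:

```r
# Macronutrients (grams per serving) and listed calories for the first three recipes
fat     <- c(25, 17, 3)
carbs   <- c(15, 31, 9)
protein <- c(16, 7, 1)
calories_listed <- c(391, 301, 64)

# Approximate calories with the general Atwater factors (assumed, see lead-in)
calories_estimated <- 9 * fat + 4 * carbs + 4 * protein

# Fraction of the estimated calories that comes from protein
protein_share <- 4 * protein / calories_estimated

print(data.frame(calories_listed, calories_estimated,
                 protein_share = round(protein_share, 2)))
```

The estimates (349, 305, 67) are close to the listed values (391, 301, 64), and protein accounts for only a modest share in each case, which is consistent with fat and carbohydrates driving the high-calorie, moderate-protein points in the plot.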

6. Summary of Inferences

The analysis shows that the total time taken for a recipe generally increases with the number of servings, suggesting a positive association. However, total time does not appear to affect the average rating much: many time-consuming recipes are rated highly.

There is a positive association between the number of reviews and the average rating, suggesting that popular recipes tend to be rated highly. Among cuisines, Belgian has the fewest recipes, while Canadian has the most (tied with Brazilian at 67), with Indian among the largest groups.

A positive relationship is observed between cooking time and calories: longer cooking times often correspond to higher-calorie dishes. Most quick recipes tend to have lower calorie counts, though there are outliers.

Malaysian cuisine stands out for having the highest average protein content, followed by Cajun and Creole, Italian, Indonesian, and Cuban cuisines. A clear positive association also exists between calories and protein content, implying that protein-rich recipes are often calorie-dense.