UFO Sightings

Author

Nikhita Purohit

Published

September 28, 2025

Studying UFO Encounters

EXPERIMENT OBJECTIVE:

The aim of this experiment is to document UFO sightings and their characteristics, as reported in the USA, Great Britain, Australia and Canada over the years.

1. Setting up R Packages

#| label: setup
#| echo: false
# SETUP CHUNK- LIBRARIES (chunk options must come first in the chunk to take effect)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(mosaic) # Our all-in-one package
Registered S3 method overwritten by 'mosaic':
  method                           from   
  fortify.SpatialPolygonsDataFrame ggplot2

The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

    mean

The following objects are masked from 'package:dplyr':

    count, do, tally

The following object is masked from 'package:purrr':

    cross

The following object is masked from 'package:ggplot2':

    stat

The following objects are masked from 'package:stats':

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
    quantile, sd, t.test, var

The following objects are masked from 'package:base':

    max, mean, min, prod, range, sample, sum
library(skimr) # Looking at data

Attaching package: 'skimr'

The following object is masked from 'package:mosaic':

    n_missing
library(janitor) # Clean the data

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(naniar) # Handle missing data

Attaching package: 'naniar'

The following object is masked from 'package:skimr':

    n_complete
library(visdat) # Visualise missing data
library(tinytable) # Printing Static Tables for our data

Attaching package: 'tinytable'

The following object is masked from 'package:ggplot2':

    theme_void
library(DT) # Interactive Tables for our data
library(crosstable) # Multiple variable summaries

Attaching package: 'crosstable'

The following object is masked from 'package:purrr':

    compact

2. Read Data

ufo_modified <- ufo <- readr::read_csv("../data/ufo_sightings.csv") %>% 
  # Clean variable names
  janitor::clean_names(case="snake")
Rows: 80332 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): date_time, city_area, state, country, ufo_shape, described_encounte...
dbl (3): encounter_length, latitude, longitude

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ufo_modified
# A tibble: 80,332 × 11
   date_time        city_area           state country ufo_shape encounter_length
   <chr>            <chr>               <chr> <chr>   <chr>                <dbl>
 1 10/10/1949 20:30 san marcos          tx    us      cylinder              2700
 2 10/10/1949 21:00 lackland afb        tx    <NA>    light                 7200
 3 10/10/1955 17:00 chester (uk/englan… <NA>  gb      circle                  20
 4 10/10/1956 21:00 edna                tx    us      circle                  20
 5 10/10/1960 20:00 kaneohe             hi    us      light                  900
 6 10/10/1961 19:00 bristol             tn    us      sphere                 300
 7 10/10/1965 21:00 penarth (uk/wales)  <NA>  gb      circle                 180
 8 10/10/1965 23:45 norwalk             ct    us      disk                  1200
 9 10/10/1966 20:00 pell city           al    us      disk                   180
10 10/10/1966 21:00 live oak            fl    us      disk                   120
# ℹ 80,322 more rows
# ℹ 5 more variables: described_encounter_length <chr>, description <chr>,
#   date_documented <chr>, latitude <dbl>, longitude <dbl>

3. Examine Data

dplyr::glimpse(ufo_modified)
Rows: 80,332
Columns: 11
$ date_time                  <chr> "10/10/1949 20:30", "10/10/1949 21:00", "10…
$ city_area                  <chr> "san marcos", "lackland afb", "chester (uk/…
$ state                      <chr> "tx", "tx", NA, "tx", "hi", "tn", NA, "ct",…
$ country                    <chr> "us", NA, "gb", "us", "us", "us", "gb", "us…
$ ufo_shape                  <chr> "cylinder", "light", "circle", "circle", "l…
$ encounter_length           <dbl> 2700, 7200, 20, 20, 900, 300, 180, 1200, 18…
$ described_encounter_length <chr> "45 minutes", "1-2 hrs", "20 seconds", "1/2…
$ description                <chr> "This event took place in early fall around…
$ date_documented            <chr> "4/27/2004", "12/16/2005", "1/21/2008", "1/…
$ latitude                   <dbl> 29.88306, 29.38421, 53.20000, 28.97833, 21.…
$ longitude                  <dbl> -97.941111, -98.581082, -2.916667, -96.6458…
skimr::skim(ufo_modified)
Data summary
Name ufo_modified
Number of rows 80332
Number of columns 11
_______________________
Column type frequency:
character 8
numeric 3
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
date_time 0 1.00 14 16 0 69586 0
city_area 0 1.00 1 69 0 19900 0
state 5797 0.93 2 2 0 67 0
country 9670 0.88 2 2 0 5 0
ufo_shape 1932 0.98 3 9 0 29 0
described_encounter_length 0 1.00 2 31 0 8349 0
description 15 1.00 1 246 0 79996 0
date_documented 0 1.00 8 10 0 317 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
encounter_length 3 1 9017.23 620228.37 0.00 30.00 180.00 600.00 9.7836e+07 ▇▁▁▁▁
latitude 1 1 38.12 10.47 -82.86 34.13 39.41 42.79 7.2700e+01 ▁▁▁▇▅
longitude 0 1 -86.77 39.70 -176.66 -112.07 -87.90 -78.75 1.7844e+02 ▃▇▁▁▁

4. Data Dictionary

Quantitative Data

  1. encounter_length(dbl): Duration of encounter with UFO (seconds)
  2. latitude(dbl): Latitude of UFO location
  3. longitude(dbl): Longitude of UFO location

Qualitative Data

  1. date_time(fct): Date and time when the UFO was spotted
  2. city_area(fct): Where the UFO was spotted
  3. state(fct): State code where the UFO was spotted
  4. country(fct): Country in which the UFO was spotted
  5. ufo_shape(fct): Shape of the UFO
  6. described_encounter_length(fct): Duration of encounter with UFO as described in free text (e.g. "45 minutes", "1-2 hrs")
  7. description(fct): Description and details regarding the encounter
  8. date_documented(fct): The date when the sighting was officially recorded
ufo_modified %>%
  dplyr::summarise(across(
    .cols = c(encounter_length, latitude, longitude), # select columns

    # na.rm = TRUE in every function, otherwise one missing value turns the result into NA
    .fns = list(
      mean = ~ mean(., na.rm = TRUE),
      sd = ~ sd(., na.rm = TRUE),
      min = ~ min(., na.rm = TRUE),
      max = ~ max(., na.rm = TRUE)
    )
  ))
# A tibble: 1 × 12
  encounter_length_mean encounter_length_sd encounter_length_min
                  <dbl>               <dbl>                <dbl>
1                 9017.             620228.                    0
# ℹ 9 more variables: encounter_length_max <dbl>, latitude_mean <dbl>,
#   latitude_sd <dbl>, latitude_min <dbl>, latitude_max <dbl>,
#   longitude_mean <dbl>, longitude_sd <dbl>, longitude_min <dbl>,
#   longitude_max <dbl>
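
Summary functions such as sd(), min() and max() return NA as soon as a column contains a single missing value, unless na.rm = TRUE is passed. The sketch below illustrates this in base R; the durations vector is made up for illustration and is not taken from the dataset.

```r
# Hypothetical durations in seconds with one missing value (not from the dataset)
durations <- c(2700, 7200, 20, NA, 900)

# Without na.rm, a single NA propagates to the result
sd_with_na  <- stats::sd(durations)
min_with_na <- base::min(durations)

# With na.rm = TRUE, the missing value is dropped before the calculation
sd_clean  <- stats::sd(durations, na.rm = TRUE)
min_clean <- base::min(durations, na.rm = TRUE)

print(c(sd_with_na = sd_with_na, min_with_na = min_with_na))
print(c(sd_clean = sd_clean, min_clean = min_clean))
```

The same argument can be passed inside across() by writing each function as a lambda, for example ~ sd(., na.rm = TRUE).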
ufo_modified <- ufo %>% tidyr::drop_na()
ufo_modified
# A tibble: 66,516 × 11
   date_time        city_area  state country ufo_shape encounter_length
   <chr>            <chr>      <chr> <chr>   <chr>                <dbl>
 1 10/10/1949 20:30 san marcos tx    us      cylinder              2700
 2 10/10/1956 21:00 edna       tx    us      circle                  20
 3 10/10/1960 20:00 kaneohe    hi    us      light                  900
 4 10/10/1961 19:00 bristol    tn    us      sphere                 300
 5 10/10/1965 23:45 norwalk    ct    us      disk                  1200
 6 10/10/1966 20:00 pell city  al    us      disk                   180
 7 10/10/1966 21:00 live oak   fl    us      disk                   120
 8 10/10/1968 13:00 hawthorne  ca    us      circle                 300
 9 10/10/1968 19:00 brevard    nc    us      fireball               180
10 10/10/1970 16:00 bellmore   ny    us      disk                  1800
# ℹ 66,506 more rows
# ℹ 5 more variables: described_encounter_length <chr>, description <chr>,
#   date_documented <chr>, latitude <dbl>, longitude <dbl>
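
drop_na() with no arguments removes every row that has a missing value in any column, which is why the row count falls from 80,332 to 66,516. Before dropping rows, it can help to check which columns contribute the missing values. The sketch below does this in base R on a toy data frame; the toy values are invented for illustration, and complete.cases() is used as a stand-in for what drop_na() does by default.

```r
# Toy data frame with invented values that mimics three of the columns
toy <- data.frame(
  state = c("tx", "tx", NA, "hi"),
  country = c("us", NA, "gb", "us"),
  encounter_length = c(2700, 7200, 20, 900),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))

# Keep only the rows with no missing value in any column
toy_complete <- toy[stats::complete.cases(toy), ]

print(missing_per_column)
print(nrow(toy_complete))
```

If only some columns matter for an analysis, drop_na() also accepts column names, so that rows are removed only when those columns are missing.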
ufo_modified <- ufo_modified %>%
  dplyr::mutate(across(where(is.character), as.factor)) %>% 
  relocate(where(is.factor))
glimpse(ufo_modified)
Rows: 66,516
Columns: 11
$ date_time                  <fct> 10/10/1949 20:30, 10/10/1956 21:00, 10/10/1…
$ city_area                  <fct> san marcos, edna, kaneohe, bristol, norwalk…
$ state                      <fct> tx, tx, hi, tn, ct, al, fl, ca, nc, ny, ky,…
$ country                    <fct> us, us, us, us, us, us, us, us, us, us, us,…
$ ufo_shape                  <fct> cylinder, circle, light, sphere, disk, disk…
$ described_encounter_length <fct> 45 minutes, 1/2 hour, 15 minutes, 5 minutes…
$ description                <fct> This event took place in early fall around …
$ date_documented            <fct> 4/27/2004, 1/17/2004, 1/22/2004, 4/27/2007,…
$ encounter_length           <dbl> 2700, 20, 900, 300, 1200, 180, 120, 300, 18…
$ latitude                   <dbl> 29.88306, 28.97833, 21.41806, 36.59500, 41.…
$ longitude                  <dbl> -97.94111, -96.64583, -157.80361, -82.18889…
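
Storing the date columns as factors keeps them as text labels, so sorting and plotting follow label order rather than calendar order. The sketch below shows the difference in base R, using four date_documented values visible in the first glimpse output. It assumes they are written month/day/year, which the source does not state explicitly.

```r
# Four values copied from the date_documented column shown in the glimpse output
date_documented <- c("4/27/2004", "12/16/2005", "1/21/2008", "1/17/2004")

# As a factor, the levels are ordered as text, not by calendar date
levels_as_text <- levels(as.factor(date_documented))

# Parsed as dates (assumed month/day/year), they sort chronologically
dates_sorted <- sort(as.Date(date_documented, format = "%m/%d/%Y"))

print(levels_as_text)
print(dates_sorted)
```

In text order the last level is "4/27/2004", while chronologically the latest date is 21 January 2008.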
ufo_modified2 <- ufo_modified %>%
  stats::setNames(c("Date_and_Time_of_Sighting","City_Area","State_Code","Country","UFO_Shape","Described_Encounter_Length","Description","Date_of_Documentation","Encounter_Length","Latitude","Longitude"))

ufo_modified2 %>%
  dplyr::slice_sample(n = 500) %>%
  DT::datatable(
    style = "default",
    caption = htmltools::tags$caption(
      style = "caption-side: top; text-align: left; color: black; font-size: 100%;", "UFO Sightings Dataset (Clean)"
    ),
    options = list(pageLength = 10, autoWidth = TRUE)
  ) %>%
  DT::formatStyle(
    columns = names(ufo_modified2),
    fontFamily = "Roboto Condensed",
    fontSize = "15px"
  )
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

5. Graphs

1. Which Countries Have Reported the Most UFO Sightings?

ufo_modified2 %>%
  # A single fixed fill colour is used, so no fill scale or dodging is needed
  gf_bar(~Country, fill = "#ffbd59") %>%
  gf_labs(title = "Which Countries Have Reported the Most UFO Sightings?",
          x = "Country",
          y = "Number of Sightings")

2. How Has The Length of UFO Encounters Evolved Over Time?

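ufo_modified2 %>%
  # Date_of_Documentation is a factor; parse it (month/day/year) so the x-axis is chronological
  dplyr::mutate(Date_of_Documentation = lubridate::mdy(as.character(Date_of_Documentation))) %>%
  gf_point(Encounter_Length ~ Date_of_Documentation,
           color = '#0097b2',
           size = 1) %>%
  gf_labs(title = "How Has The Length of UFO Encounters Evolved Over Time?",
          x = "Date Documented",
          y = "Encounter Length (seconds, log scale)") %>%
  gf_refine(scale_y_log10())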
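
A log-scaled axis cannot represent a duration of zero, because log10(0) is -Inf, and the skim output earlier reports a minimum encounter_length of 0. Whether any zero durations remain after the missing rows are dropped is not shown in the source, so it may be worth counting them before plotting. The sketch below uses a made-up vector.

```r
# Hypothetical durations in seconds, including one zero (not from the dataset)
durations <- c(0, 20, 180, 600)

# log10() of zero is -Inf, so such points have no finite position on a log axis
log_durations <- log10(durations)

# Count the non-positive values and keep the positive ones for a log-scale plot
n_non_positive <- sum(durations <= 0)
positive_durations <- durations[durations > 0]

print(log_durations)
print(n_non_positive)
```

On the real data, the same check would be applied to ufo_modified2$Encounter_Length.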

3. Does UFO Encounter Duration Differ Across Countries?

ufo_modified2 %>%
  gf_boxplot(Encounter_Length ~ Country, fill = ~Country, orientation = "x", alpha = 0.5) %>%
  gf_refine(scale_y_log10(), scale_fill_brewer(name = "Country", palette = "Set1")) %>%
  gf_labs(
    title = "Does UFO Encounter Duration Differ Across Countries?",
    x = "Country", y = "Encounter Duration (seconds, log scale)"
  )

4. What is The Most Frequently Sighted UFO Shape?

ufo_modified2 %>%
  gf_bar(~UFO_Shape, fill = ~Country, position = "dodge") %>%
  gf_refine(scale_y_log10(), scale_fill_brewer(name = "Country", palette = "Set1")) %>%
  gf_labs(title = "What is The Most Frequently Sighted UFO Shape?",
          x = "UFO Shape",
          y = "Count (log scale)") %>%
  gf_theme(theme(axis.text.x = element_text(angle = 45, hjust = 1)))

6. Summary of Findings

  1. The USA has reported the most UFO sightings over the years, followed by Canada.

  2. The duration of UFO encounters has remained almost the same over the years, with a few exceptions.

  3. UFO encounters in Great Britain tend to be longer than those in Australia, Canada and the USA.

  4. The duration of UFO encounters varies widely in the USA and very little in Australia.

  5. The most frequently sighted UFO shape is ‘light’. This shape was spotted in all four countries on many different days.

  6. The least frequently sighted UFO shapes include ‘changed’, ‘crescent’, ‘flare’, ‘hexagon’ and ‘pyramid’.
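
The country-level findings above are read off the charts, and the same comparisons can also be checked numerically. The sketch below uses base R on a toy data frame with invented values; on the real data, the same calls would use ufo_modified2$Country and ufo_modified2$Encounter_Length. The median and IQR are used rather than the mean and standard deviation because the skim output shows a heavily skewed duration (mean about 9017 seconds, median 180 seconds).

```r
# Toy data with invented values, using the column names of ufo_modified2
toy <- data.frame(
  Country = c("us", "us", "us", "ca", "gb", "gb", "au"),
  Encounter_Length = c(2700, 20, 900, 300, 180, 1200, 120)
)

# Number of sightings per country, largest first
sightings_by_country <- sort(table(toy$Country), decreasing = TRUE)

# Typical duration and spread per country
median_by_country <- tapply(toy$Encounter_Length, toy$Country, median)
iqr_by_country <- tapply(toy$Encounter_Length, toy$Country, IQR)

print(sightings_by_country)
print(median_by_country)
print(iqr_by_country)
```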

7. Surprising Aspects

  1. Why are more UFOs spotted in the USA? Could observers be mistaking other objects for UFOs?

  2. According to the data set, some UFO sightings were documented only years after they were spotted, which is very strange.

  3. The duration of UFO encounters seems to be longer in the northern countries.

  4. I was unaware of the variety of UFO shapes that existed.
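
The gap between a sighting and its documentation, mentioned in the second point above, can be measured directly. The sketch below uses the first two rows shown in the glimpse output and base R date parsing. It assumes both columns are written month/day/year, which is an assumption based on the printed values. The column name lag_years is hypothetical.

```r
# First two rows of date_time and date_documented, copied from the glimpse output
sightings <- data.frame(
  date_time = c("10/10/1949 20:30", "10/10/1949 21:00"),
  date_documented = c("4/27/2004", "12/16/2005"),
  stringsAsFactors = FALSE
)

# Parse both columns (assumed month/day/year) and reduce the sighting time to a date
sighted_on <- as.Date(as.POSIXct(sightings$date_time,
                                 format = "%m/%d/%Y %H:%M", tz = "UTC"))
documented_on <- as.Date(sightings$date_documented, format = "%m/%d/%Y")

# Lag between sighting and documentation, in years (365.25 days per year)
sightings$lag_years <- as.numeric(documented_on - sighted_on) / 365.25

print(sightings)
```

For these two rows the lag is more than 54 years, which suggests that many reports in the data set were filed retrospectively.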