Data Wrangling and Visualization

Lab 1 ESM 244 Allison Horst

Attach packages, read in and explore the data

Attach (load) packages with library():
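The chunk that attaches the packages is not shown here. Judging from the functions called later in this lab (read_csv(), here(), clean_names(), kable(), kable_styling()), a plausible set is sketched below. The exact list is an assumption, not the original chunk.

```r
# Assumed package list, inferred from the functions used later in this lab
library(tidyverse)   # read_csv(), dplyr verbs, stringr, tidyr, ggplot2
library(here)        # here() for project-relative file paths
library(janitor)     # clean_names()
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), add_header_above()
```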

Read in NOAA commercial fisheries landings data (1950-2017). Accessed from: https://www.st.nmfs.noaa.gov/commercial-fisheries/commercial-landings/. Source: Fisheries Statistics Division, NOAA Fisheries.

# Build a project-relative file path with the here package
us_landings <- read_csv(here("content", "project", "DataWrangling", "data", "noaa_fisheries.csv"))
## Parsed with column specification:
## cols(
##   Year = col_double(),
##   State = col_character(),
##   `AFS Name` = col_character(),
##   `Landings (pounds)` = col_double(),
##   `Dollars (USD)` = col_character()
## )

Go exploring a bit:

summary(us_landings)
View(us_landings)
names(us_landings)
head(us_landings)
tail(us_landings)

Data cleaning to get salmon landings

First: tidying the entire data frame

landings_tidy <- us_landings %>% 
  janitor::clean_names() %>% # convert column names to lower snake case
  mutate(state = str_to_lower(state), # overwrite state with its lower-case version
         afs_name = str_to_lower(afs_name)) %>% # overwrite afs_name with its lower-case version
  mutate(dollars_num = parse_number(dollars_usd)) # new numeric column parsed from the dollar strings
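The column specification printed by read_csv() shows that `Dollars (USD)` was read as character, which is why a numeric copy is needed before any sums. As a minimal sketch of what the two helpers do, here they are applied to made-up values (the strings below are illustrative, not taken from the NOAA file):

```r
library(readr)
library(stringr)

# Made-up values, for illustration only
dollars_chr <- c("$1,250", "$300", NA)
states_chr <- c("ALASKA", "Maine")

# parse_number() drops non-numeric characters such as "$" and the
# grouping mark "," and returns a double vector; NA stays NA
dollars_num <- parse_number(dollars_chr)

# str_to_lower() converts every character to lower case
states_low <- str_to_lower(states_chr)

dollars_num
states_low
```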

Now, getting just the salmon:

salmon_landings <- landings_tidy %>% 
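  mutate(afs_clean = str_remove(afs_name, pattern = "aggregate")) %>% # drop the word "aggregate" from the name
  filter(str_detect(afs_clean, pattern = "salmon")) %>% # keep only rows whose name contains "salmon"
  separate(afs_clean, into = c("group", "species"), sep = ",") # split at the comma into group and species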
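To make the three string steps concrete, here is a small sketch on a toy tibble. The names in `toy` are hypothetical and only assume a "group, species" layout. Splitting on "," alone leaves the space after the comma in `species`, so the sketch adds an optional str_trim() step that is not part of the pipeline above.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical names, assuming a "group, species" layout
toy <- tibble(
  afs_name = c("salmon, chinook", "salmon, coho",
               "tuna, albacore", "salmon, pink aggregate")
)

toy_salmon <- toy %>%
  mutate(afs_clean = str_remove(afs_name, pattern = "aggregate")) %>%
  filter(str_detect(afs_clean, pattern = "salmon")) %>%        # drops the tuna row
  separate(afs_clean, into = c("group", "species"), sep = ",") %>%
  mutate(species = str_trim(species))                          # optional: remove stray spaces

toy_salmon
```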

Find some grouped summary data:

Find annual total US landings and dollar value (summing across all states) for each type of salmon using group_by() + summarize():

salmon_summary <- salmon_landings %>% 
  group_by(year, species) %>% 
  summarize(
    tot_landings = sum(landings_pounds),
    tot_value = sum(dollars_num)
  )
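One thing to watch with sum() inside summarize(): a single NA in a group makes that group's total NA unless na.rm = TRUE is set, as the state-level summary later in this lab does. The sketch below uses made-up numbers to show the grouped sum with missing values removed. The `.groups = "drop"` argument is an optional addition that returns an ungrouped result.

```r
library(dplyr)

# Made-up data, for illustration only
toy <- tibble(
  year = c(2000, 2000, 2000, 2001),
  species = c("coho", "coho", "pink", "coho"),
  landings_pounds = c(10, 20, 5, NA),
  dollars_num = c(100, 200, 50, 30)
)

toy_summary <- toy %>%
  group_by(year, species) %>%
  summarize(
    tot_landings = sum(landings_pounds, na.rm = TRUE),  # NA rows are skipped
    tot_value = sum(dollars_num, na.rm = TRUE),
    .groups = "drop"                                    # return an ungrouped tibble
  )

toy_summary
```

Note that with na.rm = TRUE a group containing only NA values sums to 0, which may or may not be what you want to report.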

Make a graph of US commercial salmon landings by species over time with ggplot2

salmon_landings_graph <- ggplot(salmon_summary, aes(x = year, y = tot_landings, group = species)) +
  geom_line(aes(color = species)) +
  theme_bw() +
  labs(x = "year", y = "US commercial salmon landings (pounds)")

salmon_landings_graph
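The summary also carries a dollar-value column, so the same recipe can plot value instead of pounds by swapping the y aesthetic. A minimal sketch follows, using a small made-up data frame in place of salmon_summary so that it runs on its own. The object names `toy_summary` and `value_graph` are hypothetical.

```r
library(ggplot2)

# Made-up stand-in for salmon_summary, for illustration only
toy_summary <- data.frame(
  year = c(2000, 2001, 2000, 2001),
  species = c("coho", "coho", "pink", "pink"),
  tot_value = c(300, 350, 50, 80)
)

# Same structure as the landings graph, with tot_value on the y axis
value_graph <- ggplot(toy_summary, aes(x = year, y = tot_value, group = species)) +
  geom_line(aes(color = species)) +
  theme_bw() +
  labs(x = "Year", y = "US commercial salmon value (USD)")

# Type value_graph at the console (or end a chunk with it) to draw the plot
```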

2015 commercial fisheries value by state

Now, let’s create a finalized table of the top 5 states (by total commercial fisheries value) for 2015.

state_value <- landings_tidy %>% 
  filter(year == 2015) %>% 
  group_by(state) %>% 
  summarize(
    state_value = sum(dollars_num, na.rm = TRUE),
    state_landings = sum(landings_pounds, na.rm = TRUE)
  ) %>% 
  arrange(desc(state_value)) %>% 
  head(5)
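Sorting in descending order and then taking the first rows is one way to get a top-n table. As an alternative sketch (not the approach used above), dplyr's slice_max() does both steps in one call. The toy values below are made up.

```r
library(dplyr)

# Made-up data, for illustration only
toy_states <- tibble(
  state = c("a", "b", "c", "d"),
  state_value = c(5, 20, 1, 12)
)

# Keep the 2 rows with the largest state_value, ordered from largest down
top2 <- toy_states %>%
  slice_max(state_value, n = 2)

top2
```

By default slice_max() keeps ties (with_ties = TRUE), so it can return more than n rows when values are tied, whereas head() always returns exactly n.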

Making an HTML table

First, we’ll create it as a finalized data frame:

state_table <- state_value %>% 
  mutate(`Fisheries value ($ millions)` = round(state_value / 1e6, 2),
         `Landings (million pounds)` = round(state_landings / 1e6, 1)) %>% 
  select(-state_value, -state_landings) %>% 
  rename(State = state) %>% 
  mutate(State = str_to_title(State))

Now, use kable() + kableExtra to nicely format it for HTML:

kable(state_table) %>% 
  kable_styling(bootstrap_options = "striped", 
                full_width = FALSE) %>% 
  add_header_above(c("", "2015 US commercial fisheries - top 5 states by value" = 2))

2015 US commercial fisheries - top 5 states by value

State          Fisheries value ($ millions)  Landings (million pounds)
Alaska                              1750.20                     6015.1
Maine                                628.95                      252.5
Massachusetts                        523.67                      259.8
Louisiana                            369.62                     1068.5
Washington                           221.54                      148.8