Data Wrangling and Visualization
Lab 1 ESM 244 Allison Horst
Attach packages, read in and explore the data
Attach (load) packages with library():
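The package chunk itself is not shown here. Judging only by the functions called in the rest of this lab, a set like the following would be needed; the exact list is an assumption, not taken from the original lab.

```r
# Assumed package set, inferred from the functions used later in this lab
library(tidyverse)   # read_csv(), dplyr verbs, stringr, tidyr, ggplot2
library(here)        # here() for project-relative file paths
library(janitor)     # clean_names()
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), add_header_above()
```

Because the cleaning step calls janitor::clean_names() with the namespace prefix, attaching janitor is optional as long as it is installed.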
Read in the NOAA Commercial Fisheries Landings data (1950 - 2017).
Accessed from: https://www.st.nmfs.noaa.gov/commercial-fisheries/commercial-landings/
Source: Fisheries Statistics Division of NOAA Fisheries
# Build the file path relative to the project root with the here package
us_landings <- read_csv(here("content", "project", "DataWrangling", "data", "noaa_fisheries.csv"))
## Parsed with column specification:
## cols(
## Year = col_double(),
## State = col_character(),
## `AFS Name` = col_character(),
## `Landings (pounds)` = col_double(),
## `Dollars (USD)` = col_character()
## )
Go exploring a bit:
summary(us_landings)
View(us_landings)
names(us_landings)
head(us_landings)
tail(us_landings)
Data cleaning to get salmon landings
First: tidying the entire data frame
landings_tidy <- us_landings %>%
  janitor::clean_names() %>% # convert column names to lower snake case
  mutate(
    state = str_to_lower(state),       # overwrite state with its lower-case version
    afs_name = str_to_lower(afs_name)  # overwrite afs_name with its lower-case version
  ) %>%
  mutate(dollars_num = parse_number(dollars_usd)) # keep only the numeric part, in a new column
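To see what the lower-casing and parse_number() steps do, here is a minimal sketch on a made-up data frame. The rows and the dollar formatting ("$" and thousands separators) are assumptions for illustration; they are not values from the NOAA file.

```r
library(dplyr)
library(stringr)
library(readr)

# Hypothetical rows, shaped like the data after clean_names()
toy <- tibble(
  state = c("ALASKA", "Maine"),
  afs_name = c("SALMON, CHINOOK", "LOBSTER, AMERICAN"),
  dollars_usd = c("$1,234,567", "$89")
)

toy_tidy <- toy %>%
  mutate(
    state = str_to_lower(state),
    afs_name = str_to_lower(afs_name),
    dollars_num = parse_number(dollars_usd) # drops "$" and ",", returns a double
  )

toy_tidy
```

The original character column dollars_usd is kept alongside the new numeric dollars_num, so the conversion can be checked by eye.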
Now, getting just the salmon:
salmon_landings <- landings_tidy %>%
  mutate(afs_clean = str_remove(afs_name, pattern = "aggregate")) %>% # remove the string "aggregate"
  filter(str_detect(afs_clean, pattern = "salmon")) %>% # keep only rows that mention salmon
  separate(afs_clean, into = c("group", "species"), sep = ",") # split at the comma into two columns
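A small sketch of the filter-and-split pattern used above, on made-up names (hypothetical, not from the NOAA file). One thing to watch for: splitting on "," leaves the space after the comma at the start of the second column. The str_trim() call below is an optional extra that is not part of the original lab.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical species names in "group, species" form
toy_names <- tibble(
  afs_name = c("salmon, chinook", "salmon, coho", "cod, atlantic")
)

toy_salmon <- toy_names %>%
  filter(str_detect(afs_name, pattern = "salmon")) %>%           # keep the salmon rows
  separate(afs_name, into = c("group", "species"), sep = ",") %>% # split at the comma
  mutate(species = str_trim(species))                             # drop the leftover leading space

toy_salmon
```

Without the trim, the species values would be " chinook" and " coho", which still group and plot correctly but can cause surprises when filtering by exact name.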
Find some grouped summary data:
Find annual total US landings and dollar value (summing across all states) for each type of salmon using group_by() + summarize():
salmon_summary <- salmon_landings %>%
  group_by(year, species) %>%
  summarize(
    tot_landings = sum(landings_pounds),
    tot_value = sum(dollars_num)
  )
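The summary above calls sum() without na.rm, so any group containing a missing value gets NA as its total. Whether the salmon data actually contains missing values is not stated here; the sketch below uses made-up numbers only to show the difference between the two forms.

```r
library(dplyr)

# Hypothetical values; one dollar amount is missing in 2000
toy_landings <- tibble(
  year = c(2000, 2000, 2001, 2001),
  species = c("chinook", "chinook", "chinook", "chinook"),
  dollars_num = c(10, NA, 5, 7)
)

toy_summary <- toy_landings %>%
  group_by(year, species) %>%
  summarize(
    tot_value = sum(dollars_num),                  # NA if any value in the group is missing
    tot_value_rm = sum(dollars_num, na.rm = TRUE), # missing values are skipped
    .groups = "drop"                               # return an ungrouped data frame
  )

toy_summary
```

Skipping missing values silently treats them as zero in the total, so it is worth checking how many there are before choosing na.rm = TRUE.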
Make a graph of US commercial salmon landings by species over time with ggplot2
salmon_landings_graph <- ggplot(salmon_summary, aes(x = year, y = tot_landings, group = species)) +
  geom_line(aes(color = species)) +
  theme_bw() +
  labs(x = "year", y = "US commercial salmon landings (pounds)")

salmon_landings_graph
2015 commercial fisheries value by state
Now, let’s create a finalized table of the top 5 states (by total commercial fisheries value) for 2015.
state_value <- landings_tidy %>%
  filter(year == 2015) %>%
  group_by(state) %>%
  summarize(
    state_value = sum(dollars_num, na.rm = TRUE),
    state_landings = sum(landings_pounds, na.rm = TRUE)
  ) %>%
  arrange(desc(state_value)) %>% # sort from highest to lowest value
  head(5)                        # keep the top 5 rows
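As an aside, the sort-then-head pattern for picking the top rows can also be written with dplyr::slice_max() (available in dplyr 1.0.0 and later). This is an alternative sketch on made-up values, not the approach used in the lab.

```r
library(dplyr)

# Hypothetical state totals
toy_states <- tibble(
  state = c("a", "b", "c", "d"),
  state_value = c(3, 10, 7, 1)
)

# Pattern used in the lab: sort descending, then keep the first rows
top2_sorted <- toy_states %>%
  arrange(desc(state_value)) %>%
  head(2)

# Alternative: slice_max() selects and orders the top rows in one step
top2_slice <- toy_states %>%
  slice_max(state_value, n = 2)
```

The two differ when values tie at the cutoff: slice_max() keeps all tied rows by default (with_ties = TRUE), while head() always returns exactly the requested number.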
Making an HTML table
First, we’ll create it as a finalized data frame:
state_table <- state_value %>%
  mutate(
    `Fisheries value ($ millions)` = round(state_value / 1e6, 2),
    `Landings (million pounds)` = round(state_landings / 1e6, 1)
  ) %>%
  select(-state_value, -state_landings) %>%
  rename(State = state) %>%
  mutate(State = str_to_title(State))
Now, use kable() + kableExtra to nicely format it for HTML:
kable(state_table) %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE) %>%
  add_header_above(c("", "2015 US commercial fisheries - top 5 states by value" = 2))
State | Fisheries value ($ millions) | Landings (million pounds) |
---|---|---|
Alaska | 1750.20 | 6015.1 |
Maine | 628.95 | 252.5 |
Massachusetts | 523.67 | 259.8 |
Louisiana | 369.62 | 1068.5 |
Washington | 221.54 | 148.8 |