Extracting Weather Data From the Canadian Archive

To me, spring 2019 seems cooler than spring 2018. I wanted to plot the daily temperature data for my city, and I knew that the Government of Canada publishes such data. The following code extracts the daily mean temperature for the springs of 2018 and 2019.

library(extrafont)
library(tidyverse)
library(glue)

## Set the default ggplot2 font size and font family
## (assumes Poppins is installed and was registered once with extrafont::font_import())

loadfonts(quiet = TRUE)
theme_set(theme_bw(base_size = 12, base_family = "Poppins"))

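The first thing I did was create a tibble with the first day of each month for which I wanted weather data, along with the matching download URL. This was also a good chance to use the glue package, which fills the {year}, {month} and {day} placeholders in the URL template from each date.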

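# Bulk-download URL template (French interface, station 26892, timeframe=2 requests daily data);
# glue() fills in the {year}, {month} and {day} placeholders
url <- "http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?format=csv&stationID=26892&Year={year}&Month={month}&Day={day}&timeframe=2&submit=T%C3%A9l%C3%A9charger+des+donn%C3%A9es"

# One row per month, from January 2018 to May 2019, with its download URL
df <- tibble(
  date = seq(as.Date("2018-01-01"), as.Date("2019-05-15"), by = "1 month"),
  url = glue(
    url,
    year = lubridate::year(date),
    month = lubridate::month(date),
    day = lubridate::day(date)
  )
)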

df
## # A tibble: 17 x 2
##    date       url                                                          
##    <date>     <glue>                                                       
##  1 2018-01-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
##  2 2018-02-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
##  3 2018-03-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
##  4 2018-04-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
##  5 2018-05-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
##  6 2018-06-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
##  7 2018-07-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
##  8 2018-08-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
##  9 2018-09-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
## 10 2018-10-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
## 11 2018-11-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
## 12 2018-12-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
## 13 2019-01-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
## 14 2019-02-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
## 15 2019-03-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
## 16 2019-04-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
## 17 2019-05-01 http://climat.meteo.gc.ca/climate_data/bulk_data_f.html?form~
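To see what glue() does with the template, here is a small sketch on a shortened, hypothetical template that keeps the same {year}, {month} and {day} placeholders. It uses base R format() instead of lubridate so that it only needs glue. Because glue() is vectorized over its named arguments, one call returns one string per date.

```r
library(glue)

# Hypothetical shortened template; the real URL above uses the same placeholders
template <- "stationID=26892&Year={year}&Month={month}&Day={day}"

# First day of each month from January to March 2018
dates <- seq(as.Date("2018-01-01"), as.Date("2018-03-15"), by = "1 month")

# glue() is vectorized: each named vector has one element per date,
# so the result holds one filled-in string per date
urls <- glue(
  template,
  year = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  day = as.integer(format(dates, "%d"))
)

print(urls)
```

Named arguments passed to glue() take precedence over variables of the same name in the calling environment, which is why the post can pass year, month and day directly.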

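Next, I create a function that downloads, reads and cleans each data file. It skips the 25-line header at the top of each CSV, parses numbers with a comma as the decimal mark (as used in the French-language files), and keeps only the date and the daily mean temperature.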

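download_weather <- function(url) {
  # Download the CSV to a temporary file
  t <- tempfile()
  curl::curl_download(url, t, quiet = TRUE)

  # Skip the 25-line header, parse "12,5"-style decimals,
  # standardize the column names and keep the date and mean temperature
  read_csv(t, skip = 25, locale = locale(decimal_mark = ",")) %>%
    janitor::clean_names() %>%
    select(date_heure, temp_moy_c)
}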
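The parsing half of download_weather() can be tried without a network connection. The sketch below writes a small hypothetical file with a two-line header (instead of the real 25 lines) and already-clean column names, then reads it the same way; the decimal_mark = "," locale turns "3,5" into 3.5. The column types are spelled out only to make the example deterministic; the header content and values are invented.

```r
library(readr)

# Hypothetical miniature file: a short header block, then quoted values
# that use a comma as the decimal mark
lines <- c(
  "\"Station name\",\"Example station\"",
  "\"Province\",\"Example\"",
  "\"date_heure\",\"temp_moy_c\"",
  "\"2019-04-10\",\"3,5\"",
  "\"2019-04-11\",\"-1,2\""
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Skip the header lines; locale() switches the grouping mark to "."
# automatically when the decimal mark is ","
weather <- read_csv(
  path,
  skip = 2,
  locale = locale(decimal_mark = ","),
  col_types = cols(date_heure = col_date(), temp_moy_c = col_double())
)

print(weather)
```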

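With the URLs constructed, I can now download all the data. Here I use the rowwise() function from dplyr so that download_weather() is called once per URL; each result is stored in a list column, which unnest() then expands into regular rows. Finally, I extract the day of the year and the year, and keep only the days from day 100 (April 10 in 2018 and 2019) up to today's day of the year.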

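res <- df %>%
  rowwise() %>%
  # One download per URL; list() stores each tibble in a list column
  mutate(data = list(download_weather(url))) %>%
  ungroup() %>%
  unnest(data) %>%
  # Drop duplicated days, in case a request returns more than one month
  distinct(date_heure, .keep_all = TRUE) %>%
  mutate(doy = lubridate::yday(date_heure)) %>%
  mutate(year = lubridate::year(date_heure)) %>%
  # Keep day 100 up to today's day of the year
  filter(between(doy, 100, lubridate::yday(Sys.time())))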
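The rowwise() / list() / unnest() pattern is easier to see on toy data. In this sketch, fake_download() is a hypothetical stand-in for download_weather() that returns a two-row tibble per URL. The sketch also shows that purrr::map() builds the same list column without rowwise(), and checks that day 100 falls on April 10 in a non-leap year such as 2018.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for download_weather(): two rows per URL
fake_download <- function(url) {
  tibble(value = nchar(url) + 0:1)
}

urls <- tibble(url = c("a", "bbb"))

# rowwise() makes mutate() call fake_download() once per row;
# list() wraps each returned tibble so it fits in a list column
res <- urls %>%
  rowwise() %>%
  mutate(data = list(fake_download(url))) %>%
  ungroup() %>%
  unnest(data)

# Same list column built with purrr::map(), which returns a list directly
res_map <- urls %>%
  mutate(data = purrr::map(url, fake_download)) %>%
  unnest(data)

print(res)

# Day 100 of a non-leap year is April 10 (31 + 28 + 31 + 10 = 100)
doy <- lubridate::yday(as.Date("2018-04-10"))
```

Both versions give one row per (URL, downloaded row) pair; map() avoids the rowwise grouping, while rowwise() keeps the call looking like ordinary per-row code.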

With the data downloaded, making the plot was easy. And indeed, this spring is cooler than last year's!

res %>%
  # Remove rows with missing values (e.g. days without a mean temperature)
  drop_na() %>%
  # factor(year) gives one discrete color per year
  ggplot(aes(x = doy, y = temp_moy_c, color = factor(year))) +
  geom_line() +
  geom_point(show.legend = FALSE) +
  scale_y_continuous(breaks = seq(-10, 20, by = 2)) +
  xlab("Day of year") +
  ylab(expression("Average daily temperature" ~ (degree ~ C))) +
  theme(legend.title = element_blank())
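To put a number on "this spring is cooler", one option is to average the daily means per year over the same day-of-year window. The sketch below does this on a tiny made-up tibble with the same column names as res (the values are invented for illustration, not station data); on the real data, the same group_by() and summarise() steps could be applied to res.

```r
library(dplyr)

# Made-up values for illustration only; real data would come from `res`
toy <- tibble(
  year = c(2018, 2018, 2019, 2019),
  doy = c(100, 101, 100, 101),
  temp_moy_c = c(4, 6, 1, NA)
)

# Mean of the daily mean temperatures per year, ignoring missing days,
# plus the number of days that actually contributed
yearly <- toy %>%
  group_by(year) %>%
  summarise(
    mean_temp = mean(temp_moy_c, na.rm = TRUE),
    n_days = sum(!is.na(temp_moy_c))
  )

print(yearly)
```

Reporting n_days alongside the mean makes it visible when one year has more missing days than the other, which can bias the comparison.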
