---
title: "Using epipredict on non-epidemic panel data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using epipredict on non-epidemic panel data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=F}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r libraries}
library(epiprocess)
library(epipredict)
library(dplyr)
library(stringr)
library(parsnip)
library(recipes)
```

[Panel data](https://en.wikipedia.org/wiki/Panel_data), or longitudinal data,
contains cross-sectional measurements of subjects over time. The `epipredict`
package is most suitable for running forecasters on epidemiological data.
However, other datasets with similar structures are also valid candidates for
`epipredict` functionality.

In this vignette, we will demonstrate using `epipredict` with employment data from
Statistics Canada. We will be using
[Table 14-10-0220-01: Employment and average weekly earnings (including overtime) for all employees by industry, monthly, seasonally adjusted, Canada](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1410022001#data).
The full dataset contains monthly employment counts from `r date_start` to `r date_end`,
and presents employment data stratified by geographic region,
[NAICS industries](https://www23.statcan.gc.ca/imdb/p3VD.pl?Function=getVD&TVD=1181553),
and employee type. The full dataset also contains metadata that describes the quality
of the collected data. For demonstration purposes, we make the following modifications
to get a subset of the full dataset:

* Only keep level 1 industries (2-digit codes) in the [NAICS hierarchy](https://www23.statcan.gc.ca/imdb/pUtil.pl?Function=getNote&Id=1181553&NT=45) and remove aggregated industry codes.

To use this data with `epipredict`, we need to convert it into `epi_df` format using
`as_epi_df` with additional keys. In our case, the additional keys are `employee_type`
and `naics_industry`. Note that in the above modifications, we encoded `time_value`
as type `tsibble::yearmonth`. This allows us to set `time_type` to `"yearmonth"` below,
and it ensures that the lag and ahead steps later on use the correct time unit (months).
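
The following package-free sketch shows why the extra keys are needed. It builds a small made-up panel with the same shape as the employment subset. The `geo_value` and industry labels are placeholders, not StatCan codes, and monthly `Date` values stand in for `tsibble::yearmonth`. It then checks which key columns identify each row uniquely. The helper `keys_unique()` is hypothetical and is not part of `epiprocess`.

```r
# Made-up panel shaped like the employment subset: one row per
# (geo_value, employee_type, naics_industry, time_value).
toy <- expand.grid(
  geo_value = "ca",                                   # placeholder region label
  employee_type = c("all_employees", "salaried"),     # placeholder labels
  naics_industry = c("industry_a", "industry_b"),     # placeholder labels
  time_value = seq(as.Date("2020-01-01"), by = "month", length.out = 3),
  stringsAsFactors = FALSE
)
toy$ppl_count <- seq_len(nrow(toy)) * 100             # made-up counts

# Hypothetical helper: TRUE if the given columns identify every row uniquely.
keys_unique <- function(df, keys) !anyDuplicated(df[, keys, drop = FALSE])

# FALSE: each (geo_value, time_value) pair appears 2 x 2 = 4 times
keys_unique(toy, c("geo_value", "time_value"))
# TRUE: adding the two extra keys makes every row unique
keys_unique(toy, c("geo_value", "time_value", "employee_type", "naics_industry"))
```

This uniqueness requirement is the reason `employee_type` and `naics_industry` are declared as additional keys. With only `geo_value` and `time_value` as keys, a single key combination would map to several rows.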

The data contains `r employ_rowcount` rows and `r employ_colcount` columns. Now, we are
ready to use `statcan_employ_subset` with `epipredict`.

```{r preview-data, include=T}
# Rename for simplicity
employ <- statcan_employ_subset
head(employ)
```

In the following sections, we will go over preprocessing the data in the `epi_recipe`
framework, fitting 3 types of models from the `parsnip` package, and making future
predictions.

## Preprocessing

We will create a recipe that adds one `ahead` column and 3 `lag` columns.
113
132
114
-
```{r make-recipe, include = T}
115
-
r <- epi_recipe(statcan_employ_subset) %>%
133
+
```{r make-recipe, include=T}
134
+
r <- epi_recipe(employ) %>%
116
135
step_epi_ahead(ppl_count, ahead = 6) %>% # lag & ahead units in months
117
136
step_epi_lag(ppl_count, lag = c(0, 6, 12)) %>%
118
137
step_epi_naomit()
119
138
r
120
139
```
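
To make these recipe steps concrete, here is a base-R sketch of the columns they roughly correspond to for a single, complete monthly series. It relies on one assumption. `epipredict` shifts by `time_value` within each combination of keys, but this sketch shifts by row position, and the two only agree when no months are missing. The series is made up, and the `shift()` helper and column names are illustrative rather than the names `epipredict` generates.

```r
# One made-up monthly series (a single combination of keys), 24 months long.
series <- data.frame(
  time_value = seq(as.Date("2019-01-01"), by = "month", length.out = 24),
  ppl_count = 1000 + 10 * (1:24)
)

# Shift a complete, regularly spaced series by k rows (= k months here).
# k >= 0 looks back (lag); k < 0 looks forward (ahead). Gaps are filled with NA.
shift <- function(x, k) {
  n <- length(x)
  if (k >= 0) {
    c(rep(NA, k), x[seq_len(n - k)])
  } else {
    c(x[(1 - k):n], rep(NA, -k))
  }
}

features <- data.frame(
  time_value = series$time_value,
  lag_0 = shift(series$ppl_count, 0),     # value at t
  lag_6 = shift(series$ppl_count, 6),     # value at t - 6 months
  lag_12 = shift(series$ppl_count, 12),   # value at t - 12 months
  ahead_6 = shift(series$ppl_count, -6)   # target: value at t + 6 months
)

# Rough analogue of step_epi_naomit() on training data:
# keep only rows where every lag and the target are observed.
complete <- features[complete.cases(features), ]
nrow(complete)  # 6: rows 13 to 18 of the 24 months
```

For training, only rows with a full 12-month lag history and an observed 6-month-ahead target survive. That is why the first 12 and the last 6 months of each series drop out.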

There is one `raw` role, which includes our value column `ppl_count`, and two `key`
roles, which include our additional keys `employee_type` and `naics_industry`. Let's
take a look at these additional columns.

First, we will look at a simple model: `parsnip::linear_reg()` with the default engine
`lm`. We can use `epi_workflow` with the above `epi_recipe` to fit a linear model that
predicts `ppl_count` 6 months ahead from lags at time $t$ (current), $t-6$ months, and
$t-12$ months.
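
Conceptually, this model regresses the value 6 months ahead on the current value and on the values 6 and 12 months earlier. The base-R sketch below fits that regression with `stats::lm()` on a simulated series. It assumes a single series, whereas fitting through `epi_workflow` trains one model on the preprocessed rows from all key combinations. Treat it as an illustration of the model form, not a reproduction of the workflow. The data, `shift()` helper and column names are made up.

```r
set.seed(1)
n <- 60
# Simulated monthly series: linear trend plus noise (not the StatCan data).
ppl_count <- 1000 + 5 * seq_len(n) + rnorm(n, sd = 10)

# Shift by k rows (= k months for a complete monthly series); NA-padded.
shift <- function(x, k) {
  m <- length(x)
  if (k >= 0) {
    c(rep(NA, k), x[seq_len(m - k)])
  } else {
    c(x[(1 - k):m], rep(NA, -k))
  }
}

train <- data.frame(
  ahead_6 = shift(ppl_count, -6),  # outcome: value at t + 6
  lag_0 = ppl_count,               # value at t
  lag_6 = shift(ppl_count, 6),     # value at t - 6
  lag_12 = shift(ppl_count, 12)    # value at t - 12
)
train <- train[complete.cases(train), ]  # drop rows with missing lags or target

# ahead_6 ~ lag_0 + lag_6 + lag_12, with an intercept (lm's default)
fit <- lm(ahead_6 ~ lag_0 + lag_6 + lag_12, data = train)
coef(fit)

# Forecast from the most recent month: all three lags are observed,
# but the value 6 months ahead is not yet known.
newest <- data.frame(
  lag_0 = ppl_count[n],
  lag_6 = ppl_count[n - 6],
  lag_12 = ppl_count[n - 12]
)
pred <- predict(fit, newdata = newest)
pred
```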