+
<!-- README.md is generated from README.Rmd. Please edit that file -->

# epiprocess

+ ## TODO: Condense these paragraphs
+
The [`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) package
works with epidemiological time series data to provide situational
awareness, processing, and transformations in preparation for modeling,
and version-faithful model backtesting. It contains:

- - `epi_df`, a class for working with epidemiological time series data
- which behaves like a tibble (and can be manipulated with
- [`{dplyr}`](https://dplyr.tidyverse.org/)-esque “verbs”) but with
- some additional structure;
- - `epi_archive`, a class for working with the version history of such
- time series data;
- - sample epidemiological data in these formats;
+ - `epi_df`, a class for working with epidemiological time series data
+ which behaves like a tibble (and can be manipulated with
+ [`{dplyr}`](https://dplyr.tidyverse.org/)-esque “verbs”) but with
+ some additional structure;
+ - `epi_archive`, a class for working with the version history of such
+ time series data;
+ - sample epidemiological data in these formats;

This package is provided by the Delphi group at Carnegie Mellon
University. The Delphi group provides many tools also hosts the Delphi
@@ -48,7 +51,7 @@ many common tasks instead.

To install:

- ``` r
+ ``` r
# Stable version
pak::pkg_install("cmu-delphi/epiprocess@main")

@@ -63,7 +66,7 @@ The package is not yet on CRAN.
Once `epiprocess` and `epidatr` are installed, you can use the following
code to get started:

- ``` r
+ ``` r
library(epiprocess)
library(epidatr)
library(dplyr)
@@ -74,7 +77,7 @@ Get COVID-19 confirmed cumulative case data from JHU CSSE for
California, Florida, New York, and Texas, from March 1, 2020 to January
31, 2022

- ``` r
+ ``` r
df <- pub_covidcast(
  source = "jhu-csse",
  signals = "confirmed_cumulative_num",
@@ -101,11 +104,11 @@
#> # ℹ 2,798 more rows
```

- Convert the data to an epi_df object and sort by geo_value and
- time_value. You can work with the epi_df object like a tibble using
+ Convert the data to an epi\_df object and sort by geo\_value and
+ time\_value. You can work with the epi\_df object like a tibble using
dplyr

- ``` r
+ ``` r
edf <- df %>%
  as_epi_df() %>%
  arrange_canonical() %>%
@@ -115,8 +118,8 @@
#> An `epi_df` object, 2,808 x 4 with metadata:
#> * geo_type = state
#> * time_type = day
- #> * as_of = 2024-10-04 13:32:23.730165
- #>
+ #> * as_of = 2024-10-04 22:31:35.502626
+ #>
#> # A tibble: 2,808 × 4
#> # Groups: geo_value [4]
#> geo_value time_value cases_cumulative cases_daily
@@ -134,56 +137,56 @@ edf
#> # ℹ 2,798 more rows
```

- Autoplot the confirmed daily cases for each geo_value
+ Autoplot the confirmed daily cases for each geo\_value

- ``` r
+ ``` r
edf %>%
  autoplot(cases_cumulative)
```

<img src="man/figures/README-unnamed-chunk-6-1.png" width="100%" />

Compute the 7 day moving average of the confirmed daily cases for each
- geo_value
+ geo\_value

- ``` r
+ ``` r
edf %>%
  group_by(geo_value) %>%
  epi_slide_mean(cases_daily, .window_size = 7, na.rm = TRUE)
#> An `epi_df` object, 2,808 x 5 with metadata:
#> * geo_type = state
#> * time_type = day
- #> * as_of = 2024-10-04 13:32:23.730165
- #>
+ #> * as_of = 2024-10-04 22:31:35.502626
+ #>
#> # A tibble: 2,808 × 5
#> # Groups: geo_value [4]
#> geo_value time_value cases_cumulative cases_daily slide_value_cases_daily
#> * <chr> <date> <dbl> <dbl> <dbl>
- #> 1 ca 2020-03-01 19 19 19
- #> 2 ca 2020-03-02 23 4 11.5
+ #> 1 ca 2020-03-01 19 19 19
+ #> 2 ca 2020-03-02 23 4 11.5
#> 3 ca 2020-03-03 29 6 9.67
- #> 4 ca 2020-03-04 40 11 10
- #> 5 ca 2020-03-05 50 10 10
- #> 6 ca 2020-03-06 68 18 11.3
- #> 7 ca 2020-03-07 94 26 13.4
- #> 8 ca 2020-03-08 113 19 13.4
- #> 9 ca 2020-03-09 136 23 16.1
- #> 10 ca 2020-03-10 158 22 18.4
+ #> 4 ca 2020-03-04 40 11 10
+ #> 5 ca 2020-03-05 50 10 10
+ #> 6 ca 2020-03-06 68 18 11.3
+ #> 7 ca 2020-03-07 94 26 13.4
+ #> 8 ca 2020-03-08 113 19 13.4
+ #> 9 ca 2020-03-09 136 23 16.1
+ #> 10 ca 2020-03-10 158 22 18.4
#> # ℹ 2,798 more rows
```

Compute the growth rate of the confirmed cumulative cases for each
- geo_value
+ geo\_value

- ``` r
+ ``` r
edf %>%
  group_by(geo_value) %>%
  mutate(cases_growth = growth_rate(x = time_value, y = cases_cumulative, method = "rel_change", h = 7))
#> An `epi_df` object, 2,808 x 5 with metadata:
#> * geo_type = state
#> * time_type = day
- #> * as_of = 2024-10-04 13:32:23.730165
- #>
+ #> * as_of = 2024-10-04 22:31:35.502626
+ #>
#> # A tibble: 2,808 × 5
#> # Groups: geo_value [4]
#> geo_value time_value cases_cumulative cases_daily cases_growth
@@ -204,7 +207,7 @@ edf %>%
Detect outliers in the growth rate of the confirmed cumulative cases for
each

- ``` r
+ ``` r
edf %>%
  group_by(geo_value) %>%
  mutate(outlier_info = detect_outlr(x = time_value, y = cases_daily)) %>%
@@ -228,8 +231,8 @@ edf %>%
#> An `epi_df` object, 2,808 x 5 with metadata:
#> * geo_type = state
#> * time_type = day
- #> * as_of = 2024-10-04 13:32:23.730165
- #>
+ #> * as_of = 2024-10-04 22:31:35.502626
+ #>
#> # A tibble: 2,808 × 5
#> geo_value time_value cases_cumulative cases_daily outlier_info$rm_geo_value
#> * <chr> <date> <dbl> <dbl> <dbl>
@@ -249,11 +252,11 @@ edf %>%
#> # $combined_replacement <dbl>
```

- Add a column to the epi_df object with the daily deaths for each
- geo_value and compute the correlations between cases and deaths for
- each geo_value
+ Add a column to the epi\_df object with the daily deaths for each
+ geo\_value and compute the correlations between cases and deaths for
+ each geo\_value

- ``` r
+ ``` r
df <- pub_covidcast(
  source = "jhu-csse",
  signals = "deaths_incidence_num",