
compactify changes including adding a vignette #118


Closed
wants to merge 87 commits into from
c6a88f3
Added \n to last sprintf statement
May 10, 2022
a5780ed
Testing out creation of error message (note not yet refined)
May 11, 2022
2c84c10
Probs should be or
May 11, 2022
c3e32b8
Updated error message
May 11, 2022
7ba7434
Update error message again
May 11, 2022
648ee62
Took out print statements that were used for testing
May 12, 2022
7c636a7
Took out print statement and clarified error message
May 12, 2022
dce69d4
Added error message to epi_slide fun to address issue #65.
May 12, 2022
a1a0ec2
Added testing and made stylistic changes as per pull request comments
May 13, 2022
a4bd060
Added testing for epi_slide and made stylistic changes to error message
May 13, 2022
d95a7b6
Added code to make edf and f
May 13, 2022
bcbd37a
Re-worded comment a bit
May 13, 2022
636f212
Added some details for compactify
kenmawer May 13, 2022
fa98b61
Created helper file for testing
May 14, 2022
84a4769
Merging to add helper script for epi_slide tests
May 14, 2022
29678d5
Made sure dplyr fun can be accessed in tests
May 14, 2022
d436ae9
Printed column names of DT as requested
May 14, 2022
74a10a9
Merge this branch with main as added final newline to archive print s…
May 14, 2022
8c1307a
Updated sprintf statement to better accomodate many cols
May 16, 2022
7e757ee
Deleted commented out old code
May 16, 2022
34b0b1d
Re-wrote explan. a bit.
May 16, 2022
75d6d68
Re-wrote explan. some more
May 16, 2022
05a4804
Added compactify variable with check.
kenmawer May 16, 2022
7760ce4
Updated function.
kenmawer May 16, 2022
fe1f2d8
Put !!! to indicate incomplete part.
kenmawer May 16, 2022
434f7eb
Some minor re-wording
May 16, 2022
efe3a2d
More minor re-wording
May 16, 2022
1268bb7
Shortened code
kenmawer May 16, 2022
cccfea9
Converted roxygen comments to Rd file
May 16, 2022
56859cb
More code updates, including updates for vignette.
kenmawer May 17, 2022
462b950
Moved helper file code to test file and deleted helper file
May 17, 2022
6f463b4
Fixed some arrangement of code
May 17, 2022
2999a9e
Simplified code a bit
May 17, 2022
b6d4dc1
R6 class
May 17, 2022
3d3b2a2
Update slide.R
rachlobay May 17, 2022
4df45a4
Unsure why that got deleted so re-add that import
rachlobay May 17, 2022
bd4daf3
Put in new changes to vignette.
kenmawer May 17, 2022
a6d6975
Merge branch 'main' of https://github.com/dajmcdon/epiprocess into km…
kenmawer May 17, 2022
084bf8c
Added function to remove LOCF. This was copy-pasted while working out…
kenmawer May 20, 2022
d6a5fed
Still need to figure out how to get this to not give me a null dataset.
kenmawer May 20, 2022
87c487d
Fixed a typo on the code. I still haven't finished it.
kenmawer May 20, 2022
22937ae
Changed some text and fixed a typo.
kenmawer May 21, 2022
7310d22
Add a comment about code
kenmawer May 21, 2022
f94b87c
Merge pull request #7 from dajmcdon/epi_slide-error_message
dajmcdon May 25, 2022
6f31970
Merge pull request #8 from dajmcdon/epi-archive-print
dajmcdon May 25, 2022
d16d889
Merge pull request #9 from dajmcdon/epi_slide-f-param-doc
dajmcdon May 25, 2022
5eb4bdd
Put in a code stub for compactify testing.
kenmawer May 31, 2022
82756d5
Made tests and fixed a bug.
kenmawer May 31, 2022
97a2161
There's still work to do in ensuring the LOCF rows are printed properly!
kenmawer May 31, 2022
21d3d55
Fixed tests, LOCF rows still need to be printed.
kenmawer May 31, 2022
ffb32fe
Added warnings for LOCF.
kenmawer Jun 1, 2022
fd1c26a
Updated testing and warnings.
kenmawer Jun 1, 2022
cbdf9dd
Updated testing.
kenmawer Jun 1, 2022
c417f56
Still needs updating so tests pass. No longer using delphi.epidata.
kenmawer Jun 9, 2022
fef7377
Changed how LOCF checking will work. (Still needs work.)
kenmawer Jun 9, 2022
b142442
Added updates and changed some names.
kenmawer Jun 9, 2022
f7b353b
Changed LOCF check
kenmawer Jun 9, 2022
46628ca
Updated archive code.
kenmawer Jun 10, 2022
5479dd8
Improved LOCF filtering by treating NA's as if they are LOCF's due to…
kenmawer Jun 11, 2022
4f0ee0d
Modified arrange function to also account for other_keys.
kenmawer Jun 13, 2022
5420261
Update comment on RStudio.
kenmawer Jun 13, 2022
c66cb4e
Updated LOCF checking to take out redundancies.
kenmawer Jun 13, 2022
0a924be
Greatly improved as to actually test LOCF with more control over valu…
kenmawer Jun 13, 2022
b0d68e8
Improved tests to account for multiple values.
kenmawer Jun 14, 2022
2d4cd05
Updated vignette with an example.
kenmawer Jun 14, 2022
e4d15c6
Updated a few changes based on pull request comments.
kenmawer Jun 16, 2022
ff64463
Cleared up ifelse, improved printing and updated descriptions.
kenmawer Jun 16, 2022
e9bd8ab
Added a few more changes on the vignette and with the warning message.
kenmawer Jun 16, 2022
ba4f933
Fixed as_epi_archive on covidcast to be more clear about what is does…
kenmawer Jun 16, 2022
146b99d
Updated.
kenmawer Jun 16, 2022
716f8f3
Added a test (not finished yet) to test as_of.
kenmawer Jun 16, 2022
3712481
Added test for testing as_of in conjunction with compactify.
kenmawer Jun 16, 2022
2443419
Made test more descriptive.
kenmawer Jun 16, 2022
6185b78
Fixed error with if_else
kenmawer Jun 16, 2022
ec2960d
Updated vignette.
kenmawer Jun 16, 2022
07808fe
Updated warning message to be more clear.
kenmawer Jun 16, 2022
f6fde05
Updated comments.
kenmawer Jun 16, 2022
7c4d9b4
Fixed spacing with tests.
kenmawer Jun 17, 2022
1be2b37
Added a feature for time of compactify.
kenmawer Jun 21, 2022
9a5d0a7
Basic template (not documented yet).
kenmawer Jun 21, 2022
cef2e4b
Pulled from main, fixed changes.
kenmawer Jun 21, 2022
f967522
Also need to commit this.
kenmawer Jun 21, 2022
8ec0b0b
Fixed accidentally deleted comment.
kenmawer Jun 21, 2022
07bce50
Broke up the archive vignette to create a vignette for compactify, wh…
kenmawer Jun 21, 2022
a309f0b
Updated incorrect code and vignette for more details.
kenmawer Jun 21, 2022
d7c56d3
Updated documentation.
kenmawer Jun 22, 2022
2f8735a
Updated this to show a plot of times with and without LOCF.
kenmawer Jun 22, 2022
184 changes: 132 additions & 52 deletions R/archive.R

Large diffs are not rendered by default.
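The `R/archive.R` diff is not shown here. Judging only from `tests/testthat/test-compactify.R`, `compactify` appears to accept `NULL`, `TRUE`, or `FALSE`: `TRUE` drops LOCF rows, `FALSE` keeps every row, and `NULL` drops them like `TRUE` but warns when any are found. The following is a minimal, hypothetical sketch of that contract, not the code in this PR; `apply_compactify()` and its `locf_rows` argument are illustrative names, and the warning text is invented.

```r
# Hypothetical sketch of the compactify contract exercised by
# tests/testthat/test-compactify.R; not the code in R/archive.R.
# `locf_rows` is a logical vector flagging the LOCF rows of `df`.
apply_compactify <- function(df, compactify, locf_rows) {
  # Only NULL, TRUE, or FALSE are accepted
  valid <- is.null(compactify) ||
    (is.logical(compactify) && length(compactify) == 1L && !is.na(compactify))
  if (!valid) {
    stop("`compactify` must be NULL, TRUE, or FALSE.")
  }
  if (is.null(compactify)) {
    # Behave like TRUE, but tell the user when rows are being dropped
    if (any(locf_rows)) {
      warning("Found LOCF rows; removing them. ",
              "Set `compactify` explicitly to silence this warning.")
    }
    compactify <- TRUE
  }
  if (compactify) df[!locf_rows, , drop = FALSE] else df
}

# Usage with a toy table where row 2 repeats row 1
df <- data.frame(value = c(1, 1, 2))
locf <- c(FALSE, TRUE, FALSE)
nrow(apply_compactify(df, TRUE, locf))   # 2
nrow(apply_compactify(df, FALSE, locf))  # 3
```

Making `NULL` behave like `TRUE` while warning mirrors what the tests check (`dt_true` identical to `dt_null`, plus a warning only when LOCF rows exist).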

28 changes: 26 additions & 2 deletions R/slide.R
@@ -84,7 +84,27 @@
#' tidy evaluation (first example, above), then the name for the new column is
#' inferred from the given expression and overrides any name passed explicitly
#' through the `new_col_name` argument.
#'
#'
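#' When `f` is a function with additional named arguments, note that a tibble
#' holding the value of the grouping variable for the current group is passed
#' to `f` positionally, right after the data. If `f` has no parameter to
#' receive it, that tibble is matched to the next argument (such as `method`)
#' and overrides its value. To prevent this, include a parameter for the
#' grouping variable in `function()` just before such arguments. For example: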
#' ```
#' # Construct a tibble with a grouping variable (geo_value) and convert it
#' # to an epi_df
#' edf = tibble(geo_value = "ak", time_value = as.Date("2020-01-01") + 1:10,
#'              x1 = 1:10, y = 1:10 + rnorm(10L)) %>%
#'   as_epi_df()
#'
#' # Include a parameter for the grouping variable, here called g, just before
#' # method = "qr". Without g, the group tibble would be passed as `method`,
#' # overriding "qr", as described above
#' edf %>%
#'   group_by(geo_value) %>%
#'   epi_slide(function(x, g, method = "qr", ...) {
#'     tibble(model = list(lm(y ~ x1, x, method = method)))
#'   }, n = 7L)
#' ```
#'
#' @importFrom lubridate days weeks
#' @importFrom rlang .data .env !! enquo enquos sym
#' @export
@@ -121,7 +141,7 @@ epi_slide = function(x, f, ..., n = 7, ref_time_values,
# intersect with observed time values
if (missing(ref_time_values)) {
ref_time_values = unique(x$time_value)
}
}
else {
ref_time_values = ref_time_values[ref_time_values %in%
unique(x$time_value)]
@@ -164,6 +184,10 @@ epi_slide = function(x, f, ..., n = 7, ref_time_values,
time_range = range(unique(x$time_value))
starts = in_range(ref_time_values - before_num, time_range)
stops = in_range(ref_time_values + after_num, time_range)

if (length(starts) == 0 || length(stops) == 0) {
Abort("The starting and/or stopping times for sliding are out of bounds with respect to the range of times in your data. Check your settings for ref_time_values and align (and before, if specified).")
}

# Symbolize new column name
new_col = sym(new_col_name)
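The positional-matching pitfall described in the `epi_slide()` roxygen text can be reproduced in base R without `epiprocess`. In this minimal sketch, `call_f()` is a hypothetical stand-in that calls `f` the way that text describes (the window data first, then the group tibble, then `...`); it is not the actual sliding code.

```r
# Hypothetical stand-in for how `f` is described as being called:
# window data first, then the one-row group tibble, then any extra arguments.
call_f <- function(f, window, group_key, ...) f(window, group_key, ...)

window <- data.frame(x1 = 1:3, y = c(2, 4, 6))
group_key <- data.frame(geo_value = "ak")

# No slot for the group tibble: it is matched positionally to `method`
f_without_g <- function(x, method = "qr", ...) method
# `g` absorbs the group tibble, so `method` keeps its default
f_with_g <- function(x, g, method = "qr", ...) method

res_without <- call_f(f_without_g, window, group_key)  # the group tibble itself
res_with <- call_f(f_with_g, window, group_key)        # "qr"
```

Because R matches unnamed arguments to formals in order, declaring `g` before `method` is enough; passing `method = "qr"` by name at the call site would also avoid the clash.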
19 changes: 14 additions & 5 deletions man/as_epi_archive.Rd

Some generated files are not rendered by default.

55 changes: 36 additions & 19 deletions man/epi_archive.Rd

32 changes: 28 additions & 4 deletions man/epi_slide.Rd

12 changes: 8 additions & 4 deletions man/epix_as_of.Rd

12 changes: 8 additions & 4 deletions man/epix_merge.Rd

12 changes: 8 additions & 4 deletions man/epix_slide.Rd

84 changes: 84 additions & 0 deletions tests/testthat/test-compactify.R
@@ -0,0 +1,84 @@
library(epiprocess)
library(data.table)
library(dplyr)

dt <- archive_cases_dv$DT
test_that("Input for compactify must be NULL or a boolean", {
expect_error(as_epi_archive(dt, compactify = "no"))
})

dt <- filter(dt, geo_value == "ca")
dt$percent_cli <- 1:80
dt$case_rate <- 1:80

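# Replace the two value columns (columns 4 and 5) in a given row of the
# global `dt`, returning the modified copy
row_replace <- function(row, x, y) {
dt[row, 4] <- x
dt[row, 5] <- y
dt
}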

# Row 1 should not be eliminated even though its values are NA
dt <- row_replace(1, NA, NA) # Not LOCF

# NOTE: We assume there are no NAs in geo_value, time_value, and version.
# Although compactify may erroneously remove the first row if all of its
# values are NA, we do not test this behaviour for now, as this dataset has
# problems beyond the scope of this test

# Rows 11 and 12 correspond to different time_values
dt <- row_replace(12, 11, 11) # Not LOCF

# Rows 20 and 21 only differ in version
dt <- row_replace(21, 20, 20) # LOCF

# Rows 21 and 22 only differ in version
dt <- row_replace(22, 20, 20) # LOCF

# Row 39 has NAs where its predecessor, row 38, does not
dt <- row_replace(39, NA, NA) # Not LOCF

# Row 40 has two NAs, just like its lag, row 39
dt <- row_replace(40, NA, NA) # LOCF

# Row 62's values already exist in row 15, but row 15 is not a preceding row
dt <- row_replace(62, 15, 15) # Not LOCF

# Row 74 carries over only one of its two values from row 73
dt <- row_replace(74, 73, 74) # Not LOCF

dt_true <- as_tibble(as_epi_archive(dt, compactify = TRUE)$DT)
dt_false <- as_tibble(as_epi_archive(dt, compactify = FALSE)$DT)
# The LOCF warning for compactify = NULL is tested separately below
dt_null <- suppressWarnings(as_tibble(as_epi_archive(dt, compactify = NULL)$DT))

test_that("Warning for LOCF with compactify as NULL", {
expect_warning(as_epi_archive(dt, compactify = NULL))
})

test_that("No warning when there is no LOCF", {
# Passing NA as the second argument asserts that no warning is raised
expect_warning(as_epi_archive(dt[1:10, ], compactify = NULL), NA)
})

test_that("LOCF rows are kept with compactify=FALSE", {
expect_identical(nrow(dt), nrow(dt_false))
})

test_that("LOCF rows are removed with compactify=TRUE or NULL", {
# Rows 21, 22, and 40 are the LOCF rows constructed above
dt_test <- as_tibble(as_epi_archive(dt[-c(21, 22, 40), ], compactify = FALSE)$DT)

expect_identical(dt_true, dt_null)
expect_identical(dt_null, dt_test)
})

test_that("as_of uses LOCF even after LOCF rows are removed", {
ea_true <- as_epi_archive(dt, compactify = TRUE)
ea_false <- as_epi_archive(dt, compactify = FALSE)

# Row 22, an LOCF row for the date 2020-06-02 at the latest version, is
# omitted from ea_true, but as_of() should still return the same snapshot
as_of_true <- ea_true$as_of(max(ea_true$DT$version))
as_of_false <- ea_false$as_of(max(ea_false$DT$version))

expect_identical(as_of_true, as_of_false)
})
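The tests above hinge on what counts as an LOCF row: within the same `geo_value` and `time_value`, a version whose values all repeat those of the immediately preceding version, with NA after NA counting as a repeat (rows 39 and 40). The sketch below also treats the first version of each key as never LOCF, which is what the row-1 test expects. It is a minimal dplyr illustration, assuming dplyr 1.0.5 or later for `if_all()`, together with a simplified `as_of` that keeps the latest version at or before a given date, to show why dropping LOCF rows leaves snapshots unchanged. `repeats_prev()`, `flag_locf()`, and `snapshot_as_of()` are hypothetical helpers, not the `epiprocess` implementation.

```r
library(dplyr)

# Hypothetical helper: TRUE where `v` repeats the previous element,
# counting NA after NA as a repeat. The first element is never a repeat.
repeats_prev <- function(v) {
  prev <- dplyr::lag(v)
  same <- (v == prev) %in% TRUE | (is.na(v) & is.na(prev))
  same & seq_along(v) > 1
}

# Flag rows whose value columns all repeat the previous version of the
# same (geo_value, time_value) key
flag_locf <- function(df, value_cols) {
  df %>%
    group_by(geo_value, time_value) %>%
    arrange(version, .by_group = TRUE) %>%
    mutate(locf = if_all(all_of(value_cols), repeats_prev)) %>%
    ungroup()
}

# Simplified as_of: latest version at or before `max_version` per key,
# with the version column dropped
snapshot_as_of <- function(df, max_version) {
  df %>%
    filter(version <= max_version) %>%
    group_by(geo_value, time_value) %>%
    slice_max(version, n = 1) %>%
    ungroup() %>%
    select(-version)
}

df <- tibble(
  geo_value   = "ca",
  time_value  = as.Date("2020-06-01") + c(0, 0, 0, 1, 1, 1),
  version     = as.Date("2020-06-02") + c(0, 1, 2, 1, 2, 3),
  percent_cli = c(1, 1, NA, NA, NA, 2),
  case_rate   = c(5, 5, 5, NA, NA, 3)
)

flagged <- flag_locf(df, c("percent_cli", "case_rate"))
flagged$locf  # FALSE TRUE FALSE FALSE TRUE FALSE
compacted <- flagged %>% filter(!locf) %>% select(-locf)

# Snapshots agree with and without the LOCF rows
identical(as.list(snapshot_as_of(df, as.Date("2020-06-03"))),
          as.list(snapshot_as_of(compacted, as.Date("2020-06-03"))))  # TRUE
```

Since an LOCF row carries exactly the values of the version before it, the latest surviving version at any cutoff holds the same values, so only the (dropped) `version` column could differ.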