Skip to content

Strange bug with sync="truncate" in epix_merge() #200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dajmcdon opened this issue Aug 11, 2022 · 3 comments · Fixed by #215
Closed

Strange bug with sync="truncate" in epix_merge() #200

dajmcdon opened this issue Aug 11, 2022 · 3 comments · Fixed by #215
Assignees
Labels
bug Something isn't working

Comments

@dajmcdon
Copy link
Contributor

Below is the output using the package on main.

library(epiprocess)
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

y <- readRDS(
  system.file("extdata", "epi_archive.rds", package = "epipredict", mustWork = TRUE)
)

x <- y[[1]] %>% 
  select(geo_value, time_value, version = issue, percent_cli = value) %>%
  as_epi_archive()
#> Warning: Found rows that appear redundant based on last (version of each) observation
#> carried forward; these rows have been removed to "compactify" and save space:
#>     geo_value time_value    version percent_cli
#>  1:        ca 2020-06-01 2020-06-07    2.140116
#>  2:        ca 2020-06-02 2020-06-07    1.964248
#>  3:        ca 2020-06-03 2020-06-07    1.767582
#> ---                                            
#>  8:        fl 2020-12-09 2021-02-07   10.392244
#>  9:        fl 2021-04-01 2021-06-14    4.875669
#> 10:        fl 2021-06-30 2021-08-11    2.108405
#> Built-in `epi_archive` functionality should be unaffected, but results may
#> change if you work directly with its fields (such as `DT`). See
#> `?as_epi_archive` for details. To silence this warning but keep
#> compactification, you can pass `compactify=TRUE` when constructing the
#> archive.

xx <- epix_merge(
  x, y[[2]] %>% 
    select(geo_value, time_value, version = issue, case_rate = value) %>%
    as_epi_archive(), sync = "truncate")
#> Warning: Found rows that appear redundant based on last (version of each) observation
#> carried forward; these rows have been removed to "compactify" and save space:
#>       geo_value time_value    version case_rate
#>    1:        ca 2020-06-01 2020-06-23  6.628329
#>    2:        ca 2020-06-01 2020-06-27  6.628329
#>    3:        ca 2020-06-01 2020-06-28  6.628329
#>   ---                                          
#> 6011:        fl 2021-10-18 2021-10-22 12.362194
#> 6012:        fl 2021-10-19 2021-10-22 12.362194
#> 6013:        fl 2021-10-20 2021-10-22 12.362194
#> Built-in `epi_archive` functionality should be unaffected, but results may
#> change if you work directly with its fields (such as `DT`). See
#> `?as_epi_archive` for details. To silence this warning but keep
#> compactification, you can pass `compactify=TRUE` when constructing the
#> archive.
#> Error in `[.data.table`(x$DT, x[["DT"]][["version"]] <= new_versions_end, : j must be provided when with=FALSE

Created on 2022-08-11 by the reprex package (v2.0.1)

@dajmcdon dajmcdon added the bug Something isn't working label Aug 11, 2022
@brookslogan
Copy link
Contributor

brookslogan commented Aug 15, 2022

Thanks for catching! I assume this slipped through because I forgot to implement truncate tests, or they were deleted, or maybe some older version allowed this sort of arg passing. I don't know why they don't allow this; i on its own clearly can have with & non-with behavior. I think a fix here (that still avoids CHECK messages) is to feed in names of each DT as the j args.

NOTE: if system.file("extdata", "epi_archive.rds", package = "epipredict", mustWork = TRUE) in epipredict is an epi_archive, we're in a sticky situation. Once this bug is fixed in epiprocess, you'll have to re-build the archive in epipredict. The alternative is to use the pattern in epiprocess's data-raw/archive_cases_dv_subset.R and R/data.R to construct on demand (but then, the user might have an unexpected version of epiprocess... this might be addressed by adding epiprocess version requirements in epipredict DESCRIPTION or checks in whatever script you're running... although we don't really have different epiprocess version numbers right now; that's in a separate issue). [Nvm, I see that you are manually re-wrapping in an epi_archive in the example above. So following the epiprocess pattern would just allow this to happen automatically, if you are able to implement it however system.file is stored. A less complex approach would be just to have the system.file hold the DT, and always do manual epi_archive construction.]

@brookslogan
Copy link
Contributor

brookslogan commented Aug 15, 2022

We'll need to check that my suggestion above doesn't produce something with a different key than the input DTs.

We also need to implement the missing truncate test.

@brookslogan
Copy link
Contributor

To prevent this type of thing from slipping through again, probably want to look to see if there are code coverage tools to help check for testing completeness. (And this might also make me rethink some ways of rewriting things without branching logic, as the branching logic might make it easier to detect tests not covering things.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants