Skip to content

Djm/resids hotfix #294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 6, 2024
Merged

Djm/resids hotfix #294

merged 9 commits into from
Mar 6, 2024

Conversation

dajmcdon
Copy link
Contributor

@dajmcdon dajmcdon commented Feb 20, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main".
  • Request a review from one of the current epipredict main reviewers:
    dajmcdon.
  • Makes sure to bump the version number in DESCRIPTION and NEWS.md.
    Always increment the patch version number (the third number), unless you are
    making a release PR from dev to main, in which case increment the minor
    version number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    0.7.2, then write your changes under the 0.8 heading).

Change explanations for reviewer

If grab_residuals() returns a data frame with some keys, then the by_key argument would cause an error. This is be cause, for example, if geo_value was a key (it always is), and the r object had the columns c("geo_value", ".resid"), then bind_cols(key_cols, r) renames the two geo_value columns to be distinct, and subsequent group_by operations failed.

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

  • Resolves #{issue number}

@dajmcdon dajmcdon requested a review from dsweber2 February 20, 2024 20:48
@dsweber2
Copy link
Contributor

Interesting, a little surprised I haven't actually run into this problem. If we're not doing grouping by geo_value by default, when would we want to do this? Is this a problem that comes up naturally for multi-aheads, e.g. smooth_quantile_regression?

So I tried the test without the change and it seems to have passed; doing a browser(), it looks like grab_residuals only returns a .resid column. My guess is that you had a more complicated situation where grab_residuals actually had too many columns that made this unit test from?

Copy link
Contributor

@dsweber2 dsweber2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need to extract the test case slightly differently. Looking at grab_residuals, seems like it could only happen if ".resid" is already in the residuals of the_fit, which is returning a dataframe.

Were you maybe using a different engine than linear_reg?

@dajmcdon
Copy link
Contributor Author

Here's the failure:

library(epipredict)
#> Loading required package: epiprocess
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> Loading required package: parsnip
flat1 <- flatline_forecaster(case_death_rate_subset, "death_rate")
flat2 <- flatline_forecaster(
  case_death_rate_subset, "death_rate",
  args_list = flatline_args_list(quantile_by_key = "geo_value")
)
#> New names:
#> • `geo_value` -> `geo_value...1`
#> • `geo_value` -> `geo_value...2`
#> Error in `dplyr::group_by()`:
#> ! Must group by variables found in `.data`.
#> ✖ Column `geo_value` is not found.

Created on 2024-02-20 with reprex v2.1.0

And on rebuilding with this fix:

library(epipredict)
#> Loading required package: epiprocess
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> Loading required package: parsnip
flat1 <- flatline_forecaster(case_death_rate_subset, "death_rate")
flat2 <- flatline_forecaster(
  case_death_rate_subset, "death_rate",
  args_list = flatline_args_list(quantile_by_key = "geo_value")
)

Created on 2024-02-20 with reprex v2.1.0

@dajmcdon
Copy link
Contributor Author

It's because of the grab_residuals() function returning a tibble with 2 columns (geo_value and .resid) for the flatline engine. This doesn't (necessarily) happen with other engines.

@dsweber2
Copy link
Contributor

That seems simple enough to include as the unit test maybe?

I tried the straightforward wf <- epi_workflow(r, parsnip::linear_reg() %>% parsnip::set_engine("flatline")) %>% fit(jhu) to get the test to fail on the old version, but that gives the confusing error of

Error in `dplyr::arrange()`:
ℹ In argument: `..1 = time_value`.
Caused by error:
! object 'time_value' not found

* avoids bug when requesting quantile_values close to existing ones but off by a small tolerance
* never creates unsorted quantiles
* extrapolates outside the existing range by linearly interpolating on the logistic scale
Copy link
Contributor

@dsweber2 dsweber2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Half of this is addressing a new issue right? Seems like you changed the extrapolation to use the same method regardless of region and refactored some functions?

I have lingering questions about things but I wouldn't let those block

l <- 1:9 / 10
v <- 1:9
distn <- dist_quantiles(list(v), list(l))
expect_equal(quantile(distn, c(.25, .75)), list(c(2.5, 7.5)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity check, this is 2.5 because that's halfway between 2 and 3, which are the .2 and .3 quantile values?

Comment on lines +62 to +65
expect_equal(
unlist(quantile(distn, c(.01, .05))),
tail_extrapolate(c(.01, .05), head(qv, 2))
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I quite get how this is testing the tail behavior; is it just checking that quantile is using tail_extrapolate to calculate the values at those quantiles?

@dsweber2
Copy link
Contributor

oh, I pushed the change for the styler, which seems to have been the only thing that was breaking the CI

@dajmcdon
Copy link
Contributor Author

dajmcdon commented Mar 6, 2024

@dsweber2 Am I good to merge this?

@dsweber2
Copy link
Contributor

dsweber2 commented Mar 6, 2024

oh, I suppose I should've said that in addition to approving. Would you prefer I just merge if I approve when I'm tagged?

But yes, lgtm!

@dajmcdon
Copy link
Contributor Author

dajmcdon commented Mar 6, 2024

Ah, no problem. I think better for the PR creator to merge once satisfied, but I lost track of whether we were both satisfied!

@dajmcdon dajmcdon merged commit 7a4ea55 into dev Mar 6, 2024
@dajmcdon dajmcdon deleted the djm/resids-hotfix branch September 20, 2024 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants