Skip to content

Change treatment of NAs and odd n in pct_change #17

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
brookslogan opened this issue Oct 26, 2021 · 2 comments
Closed

Change treatment of NAs and odd n in pct_change #17

brookslogan opened this issue Oct 26, 2021 · 2 comments

Comments

@brookslogan
Copy link
Contributor

Proposed semantic changes:

  • use sum instead of Sum; if there is a missing observation in the middle of the computation window, the result should be NA
  • produce an error if n is odd rather than silently changing it to an even number; alternatively make it the half-width of the computation window rather than the full width
  • maybe change the naming semantics; e.g., something like "pct_change_{{x}}" := ...... when they don't provide a name, or requiring the user to explicitly provide a name
  • add some more input validation

Maybe also consider vectorizing it to not rely on slide_by_geo? I don't know if performance is actually a consideration here.

@ryantibs
Copy link
Member

Thanks for these good ideas @brookslogan. I just snuck in, on my last commit on #14, your second bullet point request: fail if n is odd.

The first bullet: this is a reasonable idea as a default, however looking at what I did for the derivative estimation function, all of the methods there actually try to gracefully handle NAs as well (in line with pct_change()). So the overall philosophy I implemented seems to be to give you something that's non-NA if possible. However, I think we should at least make this an argument (as in na_rm = TRUE or FALSE), in all of these functions. I just didn't want to make this change at the moment, since I wanted to wrap up #14 and merge it.

@ryantibs
Copy link
Member

@brookslogan Ok, I snuck in one more commit on #14, which addresses the NA problem. Now we have an argument (and default value): na_rm = TRUE in both pct_change(), estimate_deriv().

I thought about putting an na_rm argument into epi_slide() as well, but then realized that this makes less sense: this would be forced to drop all rows such that any column has an NA, but that would be overkill, since pct_change() and estimate_deriv() would only be accessing certain columns anyway.

Since the main points in this issue are addressed by #14, I'm going to close it with that PR. If you want to make separate issues about speed (vectorization) and/or about better error checking, then I'd say go for it.

My 2c: I don't know that as it is right now, speed is much of an issue with pct_change(). And better error checking would be useful in general, not just for this function, so you could make an issue about that in general. And you could include in the issue the need for unit tests as well. Thanks!

ryantibs added a commit that referenced this issue Oct 28, 2021
- Changing what we highlight about what this function does; also,
  moving "cor" to the front of the name
- While I'm here, I snuck in a fix to the `NA` problem raised by
  Logan on #17. Now `pct_change()` and `estimate_deriv()` take as
  an argument `na_rm` with default value `TRUE`
@ryantibs ryantibs mentioned this issue Oct 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants