Change treatment of `NA`s and odd `n` in `pct_change` #17

brookslogan · 2021-10-26T16:15:46Z

Proposed semantic changes:

- use `sum` instead of `Sum`; if there is a missing observation in the middle of the computation window, the result should be `NA`
- produce an error if `n` is odd rather than silently changing it to an even number; alternatively, make it the half-width of the computation window rather than the full width
- maybe change the naming semantics; e.g., something like `"pct_change_{{x}}" := ...` when they don't provide a name, or requiring the user to explicitly provide a name
- add some more input validation
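To make the first two bullets concrete, here is a minimal sketch, not the package's actual code. It assumes the statistic is `100 * (B - A) / A`, where `A` and `B` are the sums over the first and second halves of an `n`-point window; the helper name `pct_change_window()` is hypothetical.

```r
# Hypothetical helper illustrating the proposed semantics; not the package's code.
# Assumes the statistic is 100 * (B - A) / A, with A and B the sums over the
# first and second halves of a window of n observations.
pct_change_window <- function(x, n = 14) {
  stopifnot(is.numeric(x), is.numeric(n), length(n) == 1)
  # Fail loudly on an odd (or otherwise unusable) n instead of adjusting it.
  if (n < 2 || n %% 2 != 0) {
    stop("`n` must be a positive even number, got ", n)
  }
  if (length(x) != n) {
    stop("`x` must contain exactly `n` observations")
  }
  half <- n / 2
  # Plain sum(): any NA in either half propagates, so the result is NA.
  a <- sum(x[seq_len(half)])
  b <- sum(x[half + seq_len(half)])
  100 * (b - a) / a
}

pct_change_window(c(1, 1, 2, 2), n = 4)   # complete window
pct_change_window(c(1, NA, 2, 2), n = 4)  # NA in the window gives NA
```

With these semantics a gap in the data surfaces as an `NA` result rather than as a percentage computed from fewer observations than the caller asked for.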

Maybe also consider vectorizing it to not rely on slide_by_geo? I don't know if performance is actually a consideration here.
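For reference, a vectorized version could look roughly like the sketch below. It is only an illustration under several assumptions: a single location's values in time order, a right-aligned window, the `100 * (B - A) / A` definition with `A` and `B` as half-window sums, and no missing values (a cumulative sum would carry an `NA` into every later window). The name `pct_change_vec()` is hypothetical.

```r
# Hypothetical vectorized sketch; not the package's implementation.
# Assumes one location's values in time order, a right-aligned window,
# and no missing values (cumsum() would spread an NA to all later windows).
pct_change_vec <- function(x, n = 14) {
  stopifnot(is.numeric(x), !anyNA(x), n >= 2, n %% 2 == 0)
  half <- n / 2
  len <- length(x)
  out <- rep(NA_real_, len)
  if (len < n) {
    return(out)
  }
  cs <- c(0, cumsum(x))   # cs[k + 1] is the sum of x[1:k]
  i <- n:len              # positions where a full window ends
  b <- cs[i + 1] - cs[i - half + 1]      # sum over the second half of the window
  a <- cs[i - half + 1] - cs[i - n + 1]  # sum over the first half of the window
  out[i] <- 100 * (b - a) / a
  out
}

pct_change_vec(c(1, 1, 2, 2, 4, 4), n = 4)
```

The first `n - 1` positions stay `NA` because no complete window ends there.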


ryantibs · 2021-10-27T03:20:43Z

Thanks for these good ideas, @brookslogan. In my last commit on #14, I just snuck in your second bullet point's request: fail if `n` is odd.

On the first bullet: this is a reasonable idea as a default. However, looking at what I did for the derivative estimation function, all of the methods there also try to handle `NA`s gracefully (in line with `pct_change()`). So the overall philosophy I implemented seems to be to give you something that's non-`NA` if possible. Still, I think we should at least make this an argument (as in `na_rm = TRUE` or `FALSE`) in all of these functions. I just didn't want to make this change at the moment, since I wanted to wrap up #14 and merge it.

ryantibs · 2021-10-28T14:56:42Z

@brookslogan Ok, I snuck in one more commit on #14, which addresses the `NA` problem. Now we have an argument `na_rm`, with default value `TRUE`, in both `pct_change()` and `estimate_deriv()`.

I thought about putting an `na_rm` argument into `epi_slide()` as well, but then realized that this makes less sense: it would be forced to drop every row in which any column has an `NA`, and that would be overkill, since `pct_change()` and `estimate_deriv()` only access certain columns anyway.
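A small base R illustration of why row-wise dropping would be overkill; the data frame and its column names are made up for this example.

```r
# Made-up data: `cases` is complete, `deaths` has gaps.
df <- data.frame(
  time_value = 1:4,
  cases = c(10, 12, 15, 18),
  deaths = c(NA, 1, NA, 2)
)

# Row-wise filtering drops every row with an NA in any column,
# even though a computation on `cases` never touches `deaths`.
row_wise <- df[complete.cases(df), ]

# Column-wise filtering only looks at the column the computation uses.
column_wise <- df[!is.na(df$cases), ]

nrow(row_wise)     # 2 rows survive
nrow(column_wise)  # all 4 rows survive
```

Handling `NA`s inside the function that knows which column it reads keeps the rows that are complete for that column.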

Since the main points in this issue are addressed by #14, I'm going to close it with that PR. If you want to make separate issues about speed (vectorization) and/or about better error checking, then I'd say go for it.

My 2c: I'm not sure speed is much of an issue with `pct_change()` as it is right now. Better error checking would be useful in general, not just for this function, so you could make a general issue about that and include the need for unit tests in it as well. Thanks!

- Changing what we highlight about what this function does; also, moving "cor" to the front of the name
- While I'm here, I snuck in a fix to the `NA` problem raised by Logan on #17. Now `pct_change()` and `estimate_deriv()` take as an argument `na_rm` with default value `TRUE`

ryantibs mentioned this issue Oct 28, 2021

Big redesign #14

Merged

ryantibs closed this as completed Oct 28, 2021
