Skip to content

GCD calculation #426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
dajmcdon opened this issue Mar 6, 2024 · 6 comments
Open

GCD calculation #426

dajmcdon opened this issue Mar 6, 2024 · 6 comments
Labels
question Further information is requested

Comments

@dajmcdon
Copy link
Contributor

dajmcdon commented Mar 6, 2024

I'm not entirely sure what the use case for gcd_num() is, but is it worth implementing differently? We don't currently import Rcpp (which the second version uses):

d <- sample(1:1000, 1000, TRUE)

microbenchmark::microbenchmark(
  epiprocess = epiprocess:::gcd_num(d),
  rtestim = rtestim:::gcd(d)
)
#> Warning in microbenchmark::microbenchmark(epiprocess = epiprocess:::gcd_num(d),
#> : less accurate nanosecond times to avoid potential integer overflows
#> Unit: microseconds
#>        expr       min         lq      mean    median       uq      max neval
#>  epiprocess 12249.816 12346.0635 17960.673 12423.984 14510.54 484596.5   100
#>     rtestim    19.024    24.2515  5672.601    34.973    51.66 563304.5   100

Created on 2024-03-06 with reprex v2.1.0

@dajmcdon dajmcdon added the question Further information is requested label Mar 6, 2024
@dsweber2
Copy link
Contributor

dsweber2 commented Mar 6, 2024

That mean median comparison though.

When I search for gcd_num, the only time it shows up is in guess_period, where it is handed a list of diffs between dates, converted to numeric. So I'd expect it to run once per loop of epix_process, which doesn't seem likely to eat too much time.

The lack of tests may mean there's a bug lurking, so that would be a potential gain from switching over to a non-DIY implementation.

@brookslogan
Copy link
Contributor

Tempting speedups and testedness, but Rcpp also means potential extra installation hassle for some users and potential license headaches (assuming we don't already transitively depend on Rcpp). Without evidence that this impacts epix_slide() performance significantly or of a bug, I'm not sure whether we should proceed.

@brookslogan
Copy link
Contributor

We may not even want to have a GCD algorithm at all. It seems like if the guessed period doesn't appear in the diffs of the unique times, then we have some really weird/wrong data. And if we make that an error, then we can move from calculating the GCD to taking the smallest diff and simply checking that it is a CD.

@brookslogan
Copy link
Contributor

brookslogan commented Jul 18, 2024

Actually, further, we might not need a GCD at all. I'm not sure why I used this for epix_slide default ref_time_values (i.e. versions).

  • I guess maybe I was thinking if there were a no-op update that this would still try to generate an output for that update.
  • [Perhaps I was thinking specifically of something beyond the max version with an update but <= the versions_end. So you could get exploration + production forecasts from one slide, and the default would even generate a forecast if the data source didn't deliver an update in time.]
  • [Or perhaps I was thinking in the case of skipped versions, we'd still want to by default include them in version analysis and not give an overly-rosy picture of what things would look like... and skipping them by default seems like something that could mess up latency and revision analysis routines. Though perhaps the forecast times are more relevant than the typical release schedule of an input....]
  • But we'd expect it to completely fail (no GCD) for datetime versions unless they were rounded to some coarse resolution.
  • And even with Dates, if we download roughly weekly but are sometimes off by a day, then we may end up running the computation daily... don't think that's what we want for pseudoprospective forecasting.

And we're not using GCD in guess_time_type() either.

[But we might actually be missing a guess_period() in next_after() impls... though that mainly only messes up.... guess_period() later, and trying to guess the period to create new times and being wrong may go south when using time types that don't enforce the period.]

@dshemetov
Copy link
Contributor

assuming we don't already transitively depend on Rcpp

We do. I wonder if tsibble is pushed into GPL because of Rcpp.

r$> pak::pkg_deps_explain("cmu-delphi/epiprocess", "Rcpp")
epiprocess -> tsibble -> anytime -> Rcpp                                   

@brookslogan
Copy link
Contributor

brookslogan commented Oct 5, 2024

(Side note: we could reduce runtime by 60% by using base R for checks instead of checkmate, and 70%+ by omitting some checks if possible. Still not sure we even want gcd at all anymore though. But this brings to attention that we should take care about validation in inner loops.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants