Skip to content

Clarify naming and fix defaults: epix_slide is over versions #146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
7 tasks done
brookslogan opened this issue Jul 17, 2022 · 8 comments
Closed
7 tasks done

Clarify naming and fix defaults: epix_slide is over versions #146

brookslogan opened this issue Jul 17, 2022 · 8 comments
Assignees
Labels
op-semantics Operational semantics; many potentially breaking changes here P0 high priority

Comments

@brookslogan
Copy link
Contributor

brookslogan commented Jul 17, 2022

  • Make ref_time_values default to something involving versions, not all time values; full_seq up to observed_versions_end with period determined using tsibble::guess_frequency or an alternative to avoid potential licensing complications the default value setting is now epix_slide won't grab the latest version #153; the renaming will be part of a different discussion/issue
  • Rename ref_time_values to something involving "version" part of separate discussion
  • Rename n to something that doesn't imply it's counting something replace with before (+add issue about dot prefixes)
  • Remove any n default value (e.g., 7 doesn't make sense; we won't get 7 time values in typical data sources)
  • Rename time_value column in output to version part of separate discussion
  • Consider making output an epi_archive... but we might need to invent geo and time columns... part of larger discussion
  • Fix up documentation, examples, vignettes, etc. to make sense and work with the above
@brookslogan brookslogan added P0 high priority op-semantics Operational semantics; many potentially breaking changes here labels Jul 17, 2022
@kenmawer kenmawer self-assigned this Jul 18, 2022
@brookslogan
Copy link
Contributor Author

Another thought: do we want to rename epix_slide to epix_version_slide?

@brookslogan
Copy link
Contributor Author

brookslogan commented Jul 18, 2022

We should try to avoid using tsibble::guess_frequency if possible for now. Another issue with the full_seq suggestion is that it could be altered by compactify; we'd at least want to document this, by making sure compactify descriptions don't promise full compatibility and noting specifically for the ref_time_values/new-name documentation. This might also lead us to think of a different post-compactify format that records other versions that had updates reported. (This also hints at making epi_archive provide a different interpretation: the current setup is as if we continuously and perfectly monitored some data source for updates within some time range; an alternative setup is to say that we only observed a certain set of versions. It even more hassle for a user to provide this version set explicitly, but we could default to something like the default I was describing. This alternative interpretation would mean that as_of should not allow us to ask for versions that weren't measured or should return NAs. It's a bit more complex, but also more precise/realistic.)

@ryantibs
Copy link
Member

Thanks for raising these! I'd like to talk some of these through before you guys get too far along to understand what the problems are.

I tried to design epix_slide() to be as close as possible to epi_slide(), thinking this was a feature, rather than a bug. Thus the way I see it is that the name of the function itself, and all its argument names, being close is a good thing.

Hence I like items 1 and 4 in your todo list, but I'm not sure I like the others which are suggesting we rename things (items 2, 3, 5).

Thinking back again about the output type for epix_slide() (item 6) is a good idea, though I'm not sure we'll arrive at any obvious solutions. I'd be happy to be wrong though.

Should we discuss on a call or here on GitHub? Call might be more fluid. We could combine it with discussing #64 #67 whenever you're ready.

@brookslogan
Copy link
Contributor Author

We can discuss on call.

Recording a couple vague thoughts to hash out then:

  • An alternative to n is to try to make it "true", and change the code to given them n time values... but that will be at some varying lag from the queried version, so the user will then need to be aware of and watch out for that...
  • fill_through_version(archive, fill_versions_end, how="na") will also mess up the states version default if the spacing is not the smallest possible for a given class. This could be changed to involve frequency guessing, but then its behavior would also be affected by compactify. (If we had a /set/ (not a clopen range) of observed versions + said that things were unobserved between those, then this could be a no op.)
  • time_value -> version thing: see existing bug, dev misinterpretation, forecasting&nowcasting&backcasting-output ahead/lag consistency. Archive transformers will be part of any metamodeling framework. Having geo and time value columns enforced is a little nuisance. Whether we want to compute the entire result at once, vs. previews, is a question. (Reminds of lazy merge idea as well. And early sketches of combined lazy archive merge-slide.) Think dbplyr might have a relevant interface.

@kenmawer
Copy link
Contributor

kenmawer commented Jul 20, 2022

How would item 6 even work though? I don't see this happening, unless we also the time_value feature along with the version feature.

Also, for item 1, ref_time_values should actually be ref_versions.

Finally, epix_slide won't grab the latest version; however, this is a bug beyond the scope of this update.

@ryantibs
Copy link
Member

I'll leave it up to you Logan to schedule a call whenever you feel is the right time. We can discuss on the call since I think that's the most fluid, given that see some of these ideas differently (and maybe we'll clarify some confusion along the way, from either side).

@brookslogan
Copy link
Contributor Author

brookslogan commented Jul 22, 2022

Summary from call:

  • Agreed-upon changes: clean up the n problem:
    • Remove the default n value, in both epix_slide and epi_slide
    • Replace n with [before in epix_slide and] before and after in epix_slide and epi_slide. n=7 would act like before=6, after=0. [before should have no default in either function. after should have a default of 0 in epi_slide (and doesn't exist in epix_slide).] [Edits: corrections to match Ryan's description below.]
  • Separate discussion: should we rename ref_time_values and time_value output column to something involving version [--- or as_of, to match epi_df metadata?], or keep the former and have the latter turn into two duplicate columns with both names? Should we output an epi_archive?
  • Separate discussion: should we rename max_version parameter of epix_slide to version [or as_of, to match epi_df metadata?]?
  • Separate issue: dot-prefixing args in functions with user-"controlled" ... args (their own functions' parameter names or their own column names)

I don't think many of the tasks in this issue still apply. Closing this and opening other issues.

@ryantibs
Copy link
Member

Thanks for the summary. That all sounds right except for one detail:

Default after should be 0 in epi_slide() and I think after shouldn't even be present as an arg in epix_slide().

(Subtleties for forecast data can be discussed and possible modifications made later, as part of the time vs version discussion)

@brookslogan brookslogan added wontfix This will not be worked on P0 high priority and removed P0 high priority wontfix This will not be worked on labels Aug 17, 2022
@dshemetov dshemetov mentioned this issue Aug 23, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
op-semantics Operational semantics; many potentially breaking changes here P0 high priority
Projects
None yet
Development

No branches or pull requests

3 participants