Skip to content

Code improvements to time processing routines #1010

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Nov 9, 2022

Conversation

rzats
Copy link
Contributor

@rzats rzats commented Oct 26, 2022

Closes #999.

Prerequisites:

  • Unless it is a documentation hotfix it should be merged against the dev branch
  • Branch is up-to-date with the branch to be merged with, i.e. dev
  • Build is successful
  • Code is cleaned up and formatted

Summary

Refactors several parts of the datetime code as specified in #999. Specifically:

  • Use TimePair objects as much as possible instead of lists of dates or individual dates
  • Only use guess_time_value_is_* when absolutely necessary
  • Remove unused methods like where_dates
  • Rename inconsistently named methods: week_value_to_week, etc.
  • Make an alias for the long typing sequence "Sequence[Union[Tuple[int, int], int]]", use it as much as possible

@rzats rzats force-pushed the rzatserkovnyi/datetime-refactor branch 2 times, most recently from 29254b4 to 78f2bce Compare October 27, 2022 13:35
@rzats rzats changed the title [WIP] Code improvements to time processing routines Code improvements to time processing routines Oct 27, 2022
@rzats rzats requested a review from melange396 October 27, 2022 13:49
@melange396
Copy link
Collaborator

This is teriffic! Everything looks great, especially the changes in src/server/endpoints/covidcast.py which make me happy :)

There might be a couple more changes w/ the TimeValues alias that could be made:

  • i think you can do a replacement to TimeValues at src/server/_query.py:191, or maybe even entirely remove the call to cast()
  • a couple List[Union[Tuple[int, int], int]]s might be able to come out too:
    • can the issues argument to src/server/endpoints/covidcast.py:_handle_lag_issues_as_of() be a TimeValues?
    • can values in src/server/_validate.py:extract_dates() be hinted to be type TimeValues? i think you can also remove DateRange from src/server/_validate.py:144 and then change src/server/_validate.py:extract_dates() to return Optional[TimeValues]

Then there are a few remaining bullets in #999...

  • Can you dig around and see if we really need the instances of Sequence[Union[Tuple[str, str], str] or can we just use Sequence[str] in its place instead?
  • There is also the discussion of the lists/sequences of TimePairs -- if changing that stuff out looks like a big job, we could move it to a different PR.

Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, i had these comments pending and forgot to send them!

requirements.txt Outdated
@@ -11,3 +11,4 @@ scipy==1.6.2
tenacity==7.0.0
newrelic
epiweeks==2.1.2
typing-extensions==4.4.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe drop the version specification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, isn't best practice to pin the version in case newer updates unexpectedly break?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the most part, yeah, but this is a fairly fundamental and python-authored package that i expect to "do the right thing" as it matures. pinning can also have negative consequences where you can get stuck with vulns or bugs or poor performance implementations. so... im not actually certain on what the best thing to do here, hence the "maybe" ¯_(ツ)_/¯

@@ -267,13 +267,13 @@ def parse_day_or_week_arg(key: str, default_value: Optional[int] = None) -> Tupl
return parse_week_arg(key), False
return parse_day_arg(key), True

def parse_day_or_week_range_arg(key: str) -> Tuple[Tuple[int, int], bool]:
def parse_day_or_week_range_arg(key: str) -> TimePair: # Tuple[Tuple[int, int], bool]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! great find & replacement :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(you can also get rid of that comment w/ the old type hint)

@rzats
Copy link
Contributor Author

rzats commented Nov 3, 2022

@melange396 made a few more changes to the PR:

  • All of your changes to TimeValues can be implemented, and parse_day_or_week_arg can be converted into TimePair from a tuple too :)
  • Tuple[str, str] can be safely removed; I've taken a look at all of the calls to where_strings/filter_strings and those only deal with Sequence[str]. (Also, SQL's BETWEEN behaves a bit oddly with strings - BETWEEN 'A' AND 'B' will include all the strings that start with A but exclude the ones the start with B, which feels counterintuitive.)
  • As for the sequences of TimePair, see API server code improvements surrounding some "time" processing routines #999 (comment). tl;dr: It's not too hard to implement, but some tests suggest that List[TimePair] should be supported.

@rzats rzats requested review from melange396 and dshemetov November 3, 2022 17:09
Copy link
Contributor

@dshemetov dshemetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good, modulo a few small details

"""
if not pair:
return "FALSE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is this a good code path to have? AFAICT there's an exception raised in covidcast.py:parse_time_pairs if you don't have either: a) time=... or b) time_type=... and time_values=.... So this will technically never happen in the API. So:

  • Pro: function safety, can be reused in the future in other places that didn't have the same validation
  • Con: confusing now, because it makes you think this is a possible code path

not super blocking, i can go either way, but curious what everyone thinks here, cc @melange396 @krivard

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like you've mentioned, I'd rather have the extra failsafe for function safety so I kept this for now (but it should be trivial to remove if needed)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not actually sure what would be most correct here... it could arguably even be reasonable to return "TRUE" -- if we are not filtering on any time, then all time values are valid.

since it is ambiguous, i think its good to leave that condition there but but also add a comment to the effect of "time values are required by the API, so this case should never be reached, but this check further ensures the desired behavior"

@rzats rzats requested a review from dshemetov November 3, 2022 22:29
@rzats
Copy link
Contributor Author

rzats commented Nov 3, 2022

@dshemetov I've addressed your comments! (except for the if not pair, return false code path for now)

dshemetov
dshemetov previously approved these changes Nov 4, 2022
Copy link
Contributor

@dshemetov dshemetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

melange396
melange396 previously approved these changes Nov 8, 2022
Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great work, thank you for doing it. Do you feel like youve reduced the total complexity of the api server code? I feel like you did, and it makes me cringe at the api code less than before. :)

This all looks quite good to me, but i have a couple optional things you could look at; if you are getting burned out on this PR and wanna move on to something else, i can just turn them into new issues:

  • We've made a bunch of changes to the type hinting stuff -- is all that still happy? Is your IDE set up to show mismatches and whatnot? I tried to run a type checker from the command line (with some limited success) and among other things i saw, it suggests it might help to change the type alias for TimePair to use List[...] instead of Sequence[...]
  • Move the *_to_ranges() methods (and similar) to TimePair instance methods, so you can have callers be agnostic to week/day/whatever, then (for example) in _query.py:filter_time_pair(), instead of ranges = weeks_to_ranges(pair.time_values) if pair.is_week else days_to_ranges(pair.time_values) you can just do ranges = pair.to_ranges() and to_ranges() does the appropriate thing for the time type of the TimePair.
  • Rename TimePair (and also various instances of time_pair variable names or similar) to something more appropriate. Youve done the hard work to make it better, you should get to choose a new class name.

Im gonna mark this as approved, but you should also add the comment in _params.py, on the point Dmitry brought up.

"""
if not pair:
return "FALSE"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not actually sure what would be most correct here... it could arguably even be reasonable to return "TRUE" -- if we are not filtering on any time, then all time values are valid.

since it is ambiguous, i think its good to leave that condition there but but also add a comment to the effect of "time values are required by the API, so this case should never be reached, but this check further ensures the desired behavior"

@rzats rzats dismissed stale reviews from melange396 and dshemetov via 2a0d51a November 8, 2022 21:05
@rzats rzats merged commit ccad94a into dev Nov 9, 2022
@rzats rzats deleted the rzatserkovnyi/datetime-refactor branch November 9, 2022 14:48
@rzats rzats mentioned this pull request Nov 10, 2022
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API server code improvements surrounding some "time" processing routines
3 participants