Skip to content

API server code improvements surrounding some "time" processing routines #999

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
melange396 opened this issue Oct 10, 2022 · 8 comments · Fixed by #1010
Closed

API server code improvements surrounding some "time" processing routines #999

melange396 opened this issue Oct 10, 2022 · 8 comments · Fixed by #1010
Assignees
Labels
api change affect the API and its responses code health readability, maintainability, best practices, etc refactor Substantial projects to make the code do the same thing, better.

Comments

@melange396
Copy link
Collaborator

Persuant to #998, theres a lot of needlessly complicated stuff in the api server code...

This is a section of this, focused mostly around time-related objects and routines

  • a TimePair is a bit of a misnomer in that the second object of the "pair" is a date type (either 'day' or 'week'), and the other component is a possibly mixed list of pairs of dates (ranges) or individual dates. "dates" and "TimePair" objects should probably be refactored/combined into one to reduce confusion.
  • guess_time_value_is_* is a potentially dangerous cruch and should only be used if time type was not specified in the request
  • find unused methods like where_dates in server code -- linting can probably help us locate these
  • where_time_pairs() is the only thing that calls filter_time_pairs() (and theyre nowhere near each other in the code), and filter_time_pairs() is just like filter_dates() which is only called by where_dates() [which itself is never used] and by the twitter and wiki endpoints.... similarities probbly also exist for geos and for srcsigs. remove/refactor!
  • rename inconsistently named methods: time_value_to_date <==> date_to_time_value, but week_to_time_value <==> *** week_value_to_week ***
  • make an alias for the long typing sequence "Sequence[Union[Tuple[int, int], int]]" -- something like "TimeValues", add "TimeValues" to "TimePair"... then use TimePair wherever possible instead of S[U[T[i,i,i]]
  • once TimePair has replaced similar object instances, pull the _to_ranges code into it as member methods
  • is this typing alias useful or even necessary? do we ever do BETWEEN type operations on strings? "Sequence[Union[Tuple[str, str], str]"
  • there should probably never be a List[] or Sequence[] of GeoPair or TimePair or SourceSignalPair, because they have sequences inside of them, and in our schema/structure, that would require like a cross product of multiple time_types or geo_types or sources
@rzats rzats self-assigned this Oct 10, 2022
@melange396 melange396 added api change affect the API and its responses code health readability, maintainability, best practices, etc refactor Substantial projects to make the code do the same thing, better. labels Oct 10, 2022
@dshemetov
Copy link
Contributor

fwiw, if we bump to Python 3.10, we could turn Sequence[Union[Tuple[int, int], int]] into Sequence[Tuple[int, int] | int]

@dshemetov
Copy link
Contributor

wrt the last point: if someone requests multiple data sources in a single request, that will translate to a Sequence[SourceSignalPair] query (since each one has a sequence of signals, but only a single data source)

@dshemetov
Copy link
Contributor

agree with the rest (especially the confusing "pair" name)! very happy to see movement in this direction.

@dshemetov
Copy link
Contributor

dshemetov commented Oct 28, 2022

@rzats @melange396 A few ideas:

  • we carry around List[TimePair] instead of just merging their time_value fields, which complicates our interfaces
  • currently, the nice optimizations in time_value_to_range only occur in filter_dates which only gets called in twitter.py and wiki.py; these should be brought to covidcast.py at least
  • JIT would benefit a lot from having a simple TimePair object, so ideally time_value_to_range would get called immediately in _params.py:parse_time_arg and return a single TimePair object
    • a complication to that is the possibility that the user supplies both day and week time types; I think that should return an error, since we don't have indicators that support both

@melange396
Copy link
Collaborator Author

  • we carry around List[TimePair] instead of just merging their time_value fields, which complicates our interfaces

Yes! This is in the last bullet of the initial issue comment. Lets get rid of these -- searching with grep -rn \\[TimePair src/ helps locate them... but read the next section of my comments first:

  • currently, the nice optimizations in time_value_to_range only occur in filter_dates which only gets called in twitter.py and wiki.py; these should be brought to covidcast.py at least
  • JIT would benefit a lot from having a simple TimePair object, so ideally time_value_to_range would get called immediately in _params.py:parse_time_arg and return a single TimePair object

The real work in time_values_to_ranges() is done by days_to_ranges() and weeks_to_ranges(), which are also called by filter_time_pairs(). That method is then in turn called by where_time_pairs(), which itself is used a lot in covidcast.py, but it is done shortly before making a SQL query. I concur that the input processing and coalescing should happen sooner rather than later during the request/response cycle, and parse_time_pair()/parse_time_arg() in _params.py seems like the appropriate place for now... Ultimately the range coalescing code belongs in TimePair methods, as is also mentioned in one of the bullets above.

  • a complication to that is the possibility that the user supplies both day and week time types; I think that should return an error, since we don't have indicators that support both

I completely agree.

@rzats
Copy link
Contributor

rzats commented Nov 3, 2022

@dshemetov @melange396 I've done some more work on datetime refactoring, so the only remaining bullet point is around merging List[TimePair] into a single TimePair. This is pretty straightforward to do, but there are a few tests that suggest mixed day/week time types should be supported right now. I see three options here:

  • Leave the List[TimePair] as-is
  • Merge the List[TimePair] into a tuple of (Optional[TimePair], Optional[TimePair]): one for days, one for weeks.
  • Remove the tests and throw a ValidationFailedException if mixed days/weeks are encountered; but at a risk of breaking existing user workflows. In that case List[TimePair] will be merged into a single TimePair.

What do you think?

@krivard
Copy link
Contributor

krivard commented Nov 3, 2022

Mixed date types will cause problems at the client side, so we should reject those queries.

Mixed star and value types should coalesce to star if we can.

@melange396
Copy link
Collaborator Author

@rzats those tests you linked to make me think of this image that was shared with me recently XD

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change affect the API and its responses code health readability, maintainability, best practices, etc refactor Substantial projects to make the code do the same thing, better.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants