Skip to content

Add covid_hosp_facility lookup integration tests #1030

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 17, 2022

Conversation

krivard
Copy link
Contributor

@krivard krivard commented Nov 16, 2022

Extends #1016

Prerequisites:

  • Unless it is a documentation hotfix it should be merged against the dev branch
  • Branch is up-to-date with the branch to be merged with, i.e. dev
  • Build is successful
  • Code is cleaned up and formatted

Summary

Integration tests

  • mixed-facility-metadata inputs: most-recent collection week should win
    • two lines with same pk but different addresses and first cw > second cw
    • two lines with same pk but different addresses and first cw < second cw
  • facility-metadata updates: most-recent collection week should win

@krivard krivard marked this pull request as ready for review November 16, 2022 20:34
@krivard krivard requested a review from melange396 November 16, 2022 20:36
Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great stuff, but check my comment @ database.py:181 to make sure im not imagining things

# ...and string of VALUES placeholders
values_str = ','.join( ['%s'] * len(ak_cols) )
# use aggregate key table alias
ak_table = self.table_name + '_key'
# assemble full SQL statement
ak_insert_sql = f'INSERT INTO `{ak_table}` ({ak_cols_str}) VALUES ({values_str}) ON DUPLICATE KEY UPDATE {ak_updates_str}'
ak_insert_sql = f'INSERT INTO `{ak_table}` ({ak_cols_str}) VALUES ({values_str}) as v ON DUPLICATE KEY UPDATE {ak_updates_str}'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ak_insert_sql = f'INSERT INTO `{ak_table}` ({ak_cols_str}) VALUES ({values_str}) as v ON DUPLICATE KEY UPDATE {ak_updates_str}'
ak_insert_sql = f'INSERT INTO `{ak_table}` ({ak_cols_str}) VALUES ({values_str}) AS v ON DUPLICATE KEY UPDATE {ak_updates_str}'

Comment on lines 181 to 182
.sort_values(self.KEY_COLS)[ak_cols]
.drop_duplicates())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice, so any later-dated rows for the same PK will go into the INSERT...ON DUPLICATE after the earlier ones, and thus the newest data will be preserved in the ak_table! it might be worth leaving a note to that effect? also (im beginning to hate this data set) should we apply the dataframe_typemap before this sort, because '2020-10-30'<'2020/09/07' but not '20201030'<'20200907'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we apply the dataframe_typemap before this sort

Good call. We've never seen dates without delimiters in the source data, but we have seen both slashes and dashes, and

>>> "2/3/4" > "2-3-5"
True
>>> "2/3/4" > "2/3/5"
False

😩

@krivard krivard merged commit 9e11ada into covid_hosp_lookup_speedup Nov 17, 2022
@krivard krivard deleted the krivard/covid_hosp_facility-lookup branch November 17, 2022 18:04
krivard added a commit that referenced this pull request Feb 27, 2023
* Switch to executemany
* Add new limited_geocode datatype
* Fix test prep missing from #1030
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants