Skip to content

Add CovidcastRow testing util and a few other changes #1044

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Feb 6, 2023
Merged

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented Dec 5, 2022

I made a number of QoL changes to the tests and the server in the process of working on JIT (see also #949). I decided to separate them out, so we can release them separately from JIT and help debug if anything goes wrong.

Built on #1072.

Prerequisites:

  • Unless it is a documentation hotfix it should be merged against the dev branch
  • Branch is up-to-date with the branch to be merged with, i.e. dev
  • Build is successful
  • Code is cleaned up and formatted

Summary

  • Adds CovidcastRow testing dataclass, which serves as an abstraction for a row in the database
    • Supports many constructors (from arguments, from dictionaries)
    • Brings many defaults, so tests only need to specify the fields they care about
    • Has a list class companion CovidcastRows that helps in making a bundle of rows
    • Supports three views as a DataFrame, with consistently enforced data types for each column
  • Integrates this class into tests across the codebase
  • Integrates this class into database.py in acquisition
    • Need to update CovidcastRow to remove default values for safety
  • Adds more_itertools dependency, as it's needed in a few tests and needed later in JIT
  • Adds PANDAS_DTYPES to model.py to synchronize the datatypes used by Pandas in JIT and in CovidcastRow (will remove, if JIT does not end up using Pandas)
  • Removes some unused imports in a few server files
  • Fixes a Pandas warning due to assigning on a view in test_csv_uploading

@dshemetov dshemetov mentioned this pull request Dec 5, 2022
5 tasks
@dshemetov dshemetov force-pushed the ds/merge-docker-web-python branch 2 times, most recently from 08b4589 to 8da9b0f Compare December 5, 2022 23:47
@dshemetov dshemetov force-pushed the ds/covidcast-row branch 2 times, most recently from ec5c6c6 to 163185f Compare December 6, 2022 00:33
@krivard krivard requested a review from rzats December 6, 2022 15:13
Base automatically changed from ds/merge-docker-web-python to dev December 6, 2022 19:52
@krivard
Copy link
Contributor

krivard commented Dec 8, 2022

Oh no, #1043 was squash-merged, so now git won't be able to automatically identify any commits from this PR that were also in that one. You'll need to rebase -i and drop the following:

Squash merging is great if you want to quickly tidy away details of the development process that are no longer relevant, but if you're still working on the branch (like this, with several PRs open that chain off one another), they make a mess. Squash with caution!

@dshemetov
Copy link
Contributor Author

True, fortunately I'm ok with rebasing and force pushing into my branches, since no one else develops on these.

Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review pass up through ~half of covidcast_row.py

We should probably talk in a synchronous format about how to architect CovidcastRow for use in production vs tests; this complete overlap approach makes me hella nervous

@dshemetov dshemetov changed the base branch from dev to ds/refactor-csv-importer January 21, 2023 13:43
@dshemetov dshemetov force-pushed the ds/covidcast-row branch 3 times, most recently from 3bcc0cb to 703d57c Compare January 21, 2023 14:32
@melange396
Copy link
Collaborator

looking at this now... do you wanna base it against dev? that way you might be able to keep the diff on #1072 smaller

Base automatically changed from ds/refactor-csv-importer to dev January 27, 2023 20:49
Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fantastic! it needed to happen and thank you for taking care of it. CovidcastRow came out nice and OO, i really like it.

but im sorry :( there are a whopping 35 comments on this.... a lot are real easy: 3 are just notes you can dismiss, and 17 are nits (almost all w/ pre-populated suggestions). i jumped around a lot while reviewing (not in file-order), but hopefully it should all make sense. the meatier things are: the schema of columns in CovidcastRow, things that may or may not belong in/with the testing base class, semantics around handling of some values, and some readability stuff. i also went a little crazy with the underscores in the date ints.

i am also very happy to get on a zoom to pair up to discuss any or all of my comments.

i must say for the record that i dont love the type annotations everywhere, especially because we are not doing any real rigorous checking or enforcement of them, but i will assume your IDE is telling you that they all check out... i feel a bit like its a plague of locusts coming to descend upon me and maybe i cant get away from it

Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these are tagging places where we could drop in as_api_compatibility_dict for as_dict, but I also have a concern about signing future-us up for more test revision hell the next time we do a major schema change

Comment on lines 29 to 53
"source": "src",
"signal": "sig",
"time_type": "day",
"geo_type": "county",
"time_value": 20200202,
"geo_value": "01234",
"value": 5.0,
"stderr": 10.0,
"sample_size": 10.0,
"missing_value": Nans.NOT_MISSING,
"missing_stderr": Nans.NOT_MISSING,
"missing_sample_size": Nans.NOT_MISSING,
"issue": 20200202,
"lag": 0,
"direction": None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concern: tedious consequences

see "tedious consequences" above -- if any of our defaults ever change, we'll need to update them in 2 places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've addressed this by writing a single expected_df variable at the top of the class and having all the test cases draw from this example.

@dshemetov
Copy link
Contributor Author

Thanks for all the review comments! I believe I have addressed all of them and synced with @melange396 on it yesterday. I rebased and force pushed to clean up the review commit log. Should be good to merge.

Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glorious triumph 💯

@krivard krivard merged commit 766959b into dev Feb 6, 2023
@krivard krivard deleted the ds/covidcast-row branch February 6, 2023 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants