Skip to content

Partitioning Direction Update #161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Aug 5, 2020

Conversation

AlSaeed
Copy link
Contributor

@AlSaeed AlSaeed commented Jul 24, 2020

No description provided.

@AlSaeed
Copy link
Contributor Author

AlSaeed commented Jul 24, 2020

The new direction updater takes an argument partitions: a list of integers in the range [0, 23] representing the indexes of the partitions to be processed. It will then run the direction updater for each one of these partitions in sequence. The default value is the list of all partition indexes.
Example:
python3 -m delphi.epidata.acquisition.covidcast.direction_updater --partitions 0 1 will process the first two partition in sequence.

AlSaeed added 2 commits July 27, 2020 13:34
Changed column names back to timestamp1, timestamp2 for testing purposes.
* Implemented the Quick Fix: Direction should be updated to NA wherever there are no historical data to compute the stdev from.
* Modified the integration test to test for this behavior
@AlSaeed AlSaeed changed the base branch from main to staging July 30, 2020 15:07
AlSaeed added 3 commits August 3, 2020 15:11
* Updated column names back to value_updated_timestamp and  direction_updated_timestamp.
@AlSaeed
Copy link
Contributor Author

AlSaeed commented Aug 3, 2020

To add the is_latest_issue column, run the two following commands:

  1. ALTER TABLE `covidcast` ADD `is_latest_issue` BINARY(1) NOT NULL DEFAULT (0);
  2. python3 -m delphi.epidata.acquisition.covidcast.fill_is_latest_issue

We can drop the fill_is_latest_issue and its corresponding integration test once they are run successfully.

@korlaxxalrok
Copy link
Contributor

@AlSaeed The new column is added. Once merged we can do the second step.

change signal type from VARCHAR(32) to VARCHAR(64) in temporary tables.

# partition configuration
PARTITION_VARIABLE = 'geo_value'
PARTITION_SPLITS = ["'05101'", "'101'", "'13071'", "'15007'", "'17161'", "'19039'", "'20123'", "'21213'", "'24035'",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you come up with these split values?

Copy link
Contributor Author

@AlSaeed AlSaeed Aug 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were chosen so that they split a recent full issue into roughly equally-sized partitions.

finally:
# no catch block so that an exception above will cause the program to
# fail after the following cleanup
database.disconnect(commit)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly consider moving this out of the loop so the db is consistent? i guess having partially updated directions isnt such a bad thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there is an argument for both sides. The other side being this allows to investigate the failure and only update the remaining partitions instead of running the entire job again.

Copy link
Collaborator

@melange396 melange396 Aug 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

¯\(ツ)/¯ its probably fine to leave it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way someone has to go in and fix it manually, so as long as we document what the behavior is and what should be done in a failure, it's fine.

Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice sql magic!

@krivard krivard merged commit 611d200 into cmu-delphi:staging Aug 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants