v4 changes #805

Closed
10 tasks
krivard opened this issue Jan 10, 2022 · 2 comments · Fixed by #954

krivard commented Jan 10, 2022

The v4 changes are needed to handle the additional data load from DSEW-CPR and several other planned indicator additions. They complete the scalability requirements of the data scaling PRD but not the expressiveness requirements; we’ll layer on expressiveness later. This release will include only the following changes:

  • split off signal and geo dimension tables
  • split off a latest-only table
  • create views which query like the existing table (minimizing necessary API server changes)
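
The three changes above could be sketched roughly as follows. This is a minimal illustration, not the actual covidcast.sql: every table, view, and column name here is hypothetical, and SQLite (via Python's `sqlite3`) stands in for the production database so the sketch runs on its own.

```python
import sqlite3

# All names below are hypothetical; SQLite stands in for the real database.
SCHEMA = """
-- Dimension tables: each (source, signal) and (geo_type, geo_value) pair is stored once.
CREATE TABLE signal_dim (
    signal_key_id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    signal TEXT NOT NULL,
    UNIQUE (source, signal)
);
CREATE TABLE geo_dim (
    geo_key_id INTEGER PRIMARY KEY,
    geo_type TEXT NOT NULL,
    geo_value TEXT NOT NULL,
    UNIQUE (geo_type, geo_value)
);
-- History fact table: one row per issue of each data point.
CREATE TABLE signal_history (
    signal_key_id INTEGER NOT NULL REFERENCES signal_dim (signal_key_id),
    geo_key_id INTEGER NOT NULL REFERENCES geo_dim (geo_key_id),
    time_value INTEGER NOT NULL,
    issue INTEGER NOT NULL,
    value REAL,
    PRIMARY KEY (signal_key_id, geo_key_id, time_value, issue)
);
-- Latest-only fact table: one row per data point (its most recent issue).
CREATE TABLE signal_latest (
    signal_key_id INTEGER NOT NULL REFERENCES signal_dim (signal_key_id),
    geo_key_id INTEGER NOT NULL REFERENCES geo_dim (geo_key_id),
    time_value INTEGER NOT NULL,
    issue INTEGER NOT NULL,
    value REAL,
    PRIMARY KEY (signal_key_id, geo_key_id, time_value)
);
-- Views that expose the old flat column layout, so existing queries need few changes.
CREATE VIEW signal_latest_v AS
SELECT s.source, s.signal, g.geo_type, g.geo_value, f.time_value, f.issue, f.value
FROM signal_latest f
JOIN signal_dim s ON s.signal_key_id = f.signal_key_id
JOIN geo_dim g ON g.geo_key_id = f.geo_key_id;
CREATE VIEW signal_history_v AS
SELECT s.source, s.signal, g.geo_type, g.geo_value, f.time_value, f.issue, f.value
FROM signal_history f
JOIN signal_dim s ON s.signal_key_id = f.signal_key_id
JOIN geo_dim g ON g.geo_key_id = f.geo_key_id;
"""


def build_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical dimension tables, fact tables, and views."""
    conn.executescript(SCHEMA)
```

A caller would open a connection with `sqlite3.connect(":memory:")`, call `build_schema(conn)`, and then query `signal_latest_v` with the same flat columns the existing table exposes.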

Four parts:

  1. ddl
  2. acquisition
  3. API server
  4. Stress test

1: ddl - @jgreene1959

  • Changes to covidcast.sql
  • 0.3 -> 0.4 migration: unlike previous migrations, this one will involve substantial data mutation; Joe has a draft of it in Python

2: acquisition - @jgreene1959 + @melange396 to pair

  • convert csv_import and database.py to do initial import into a load table instead of the main fact tables
  • add a new acquisition phase, "loader", which updates the dimension tables, puts new rows from the load table into the latest and history fact tables, and records some stats in a job log, metadata, etc.
  • new loader unit tests
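
One way the loader phase described above might work is sketched below. It is an assumption-laden illustration rather than the planned implementation: the table names, the `job_log` layout, and the `run_loader` function are all hypothetical, and SQLite again stands in for the production database.

```python
import sqlite3

# All table and column names are hypothetical; SQLite stands in for the real database.
SCHEMA = """
CREATE TABLE signal_load (source TEXT, signal TEXT, geo_type TEXT, geo_value TEXT,
                          time_value INTEGER, issue INTEGER, value REAL);
CREATE TABLE signal_dim (signal_key_id INTEGER PRIMARY KEY, source TEXT, signal TEXT,
                         UNIQUE (source, signal));
CREATE TABLE geo_dim (geo_key_id INTEGER PRIMARY KEY, geo_type TEXT, geo_value TEXT,
                      UNIQUE (geo_type, geo_value));
CREATE TABLE signal_history (signal_key_id INTEGER, geo_key_id INTEGER,
                             time_value INTEGER, issue INTEGER, value REAL,
                             PRIMARY KEY (signal_key_id, geo_key_id, time_value, issue));
CREATE TABLE signal_latest (signal_key_id INTEGER, geo_key_id INTEGER,
                            time_value INTEGER, issue INTEGER, value REAL,
                            PRIMARY KEY (signal_key_id, geo_key_id, time_value));
CREATE TABLE job_log (job_id INTEGER PRIMARY KEY, rows_loaded INTEGER);
"""

# Resolve each load row's text identifiers to dimension keys.
KEYED_LOAD = """
SELECT s.signal_key_id, g.geo_key_id, l.time_value, l.issue, l.value
FROM signal_load l
JOIN signal_dim s ON s.source = l.source AND s.signal = l.signal
JOIN geo_dim g ON g.geo_type = l.geo_type AND g.geo_value = l.geo_value
"""


def run_loader(conn: sqlite3.Connection) -> int:
    """Move rows from the load table into the fact tables; return the row count."""
    with conn:  # one transaction: commit on success, roll back on error
        # 1. Dimension updates: add any pairs not seen before.
        conn.execute("INSERT OR IGNORE INTO signal_dim (source, signal) "
                     "SELECT DISTINCT source, signal FROM signal_load")
        conn.execute("INSERT OR IGNORE INTO geo_dim (geo_type, geo_value) "
                     "SELECT DISTINCT geo_type, geo_value FROM signal_load")
        rows = conn.execute(KEYED_LOAD).fetchall()
        # 2. History keeps every issue.
        conn.executemany("INSERT OR REPLACE INTO signal_history VALUES (?, ?, ?, ?, ?)", rows)
        # 3. Latest keeps only the newest issue per data point.
        for row in rows:
            current = conn.execute(
                "SELECT issue FROM signal_latest "
                "WHERE signal_key_id = ? AND geo_key_id = ? AND time_value = ?",
                row[:3],
            ).fetchone()
            if current is None or row[3] >= current[0]:
                conn.execute("INSERT OR REPLACE INTO signal_latest VALUES (?, ?, ?, ?, ?)", row)
        # 4. Record a simple stat in the job log, then empty the load table.
        conn.execute("INSERT INTO job_log (rows_loaded) VALUES (?)", (len(rows),))
        conn.execute("DELETE FROM signal_load")
    return len(rows)
```

The row-by-row update of the latest table is chosen for readability; a production loader would more likely use set-based SQL for that step.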

3: api server - @melange396

  • remove USE INDEX
  • point latest queries to latest view
  • point as-of, issue, and lag queries to history view
  • possibly reformulate metadata query? address this after acquisition is settled
  • update api server unit tests
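
The view routing in the second and third bullets could look something like the sketch below. The view names, the parameter names, and the `choose_view` helper are all hypothetical; the real API server may structure this quite differently.

```python
from typing import Optional, Sequence

# Hypothetical view names; the real ones would come from the DDL.
LATEST_VIEW = "signal_latest_v"
HISTORY_VIEW = "signal_history_v"


def choose_view(
    as_of: Optional[int] = None,
    issues: Optional[Sequence[int]] = None,
    lag: Optional[int] = None,
) -> str:
    """Pick the view a request should read from.

    Requests that constrain the version of the data (as-of, issue, or lag)
    need the full history; everything else can use the smaller latest view.
    """
    # An empty issues list is treated the same as no issues filter.
    needs_history = as_of is not None or bool(issues) or lag is not None
    return HISTORY_VIEW if needs_history else LATEST_VIEW
```

For example, `choose_view()` returns the latest view, while `choose_view(as_of=20220110)` returns the history view.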

4: stress test in qa environment with replication

  • a day's worth of CSV imports - Katie to collect
  • play back a day's worth of traffic - George to collect
  • a batch issue upload - Katie to collect
  • metadata - we get this for free in Loader, but we should time it anyway
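
For timing the individual stress-test steps, a small helper along these lines may be enough. It is only a sketch: the `timed` helper and the step labels are hypothetical, and it measures wall-clock time in the calling process only.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, results: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so failed steps are still timed.
        results[label] = time.perf_counter() - start
```

Usage would be `with timed("metadata", results): ...` around each step, after which `results` maps each label to its elapsed seconds.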

krivard commented Jan 12, 2022

@chinandrew


krivard commented Jan 13, 2022
