Spin up a wee server to run COVIDcast queries and generate CSV files #265

Closed
krivard opened this issue Sep 4, 2020 · 13 comments
@krivard
Contributor

krivard commented Sep 4, 2020

Requires three components:

  1. Some kind of UI to configure the query
  2. A python CGI to run it and generate the CSV data from pandas
  3. Appropriate MIME type support in the web server
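A rough sketch of how components 2 and 3 could fit together, using only the standard library (a WSGI callable and the `csv` module rather than CGI plus pandas, to keep the example dependency-free). `run_query` and the returned rows are hypothetical placeholders; the real server would query the COVIDcast API instead.

```python
import csv
import io
from urllib.parse import parse_qs


def run_query(signal, geo_type):
    """Hypothetical stand-in for the real COVIDcast query; returns fake rows."""
    return [
        {"geo_value": "pa", "signal": signal, "geo_type": geo_type, "value": 1.5},
        {"geo_value": "ny", "signal": signal, "geo_type": geo_type, "value": 2.0},
    ]


def rows_to_csv(rows):
    """Serialize a non-empty list of dicts to CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def app(environ, start_response):
    """Minimal WSGI app: read query parameters, respond with a CSV attachment."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    signal = params.get("signal", ["unknown"])[0]
    geo_type = params.get("geo_type", ["state"])[0]
    body = rows_to_csv(run_query(signal, geo_type)).encode("utf-8")
    start_response("200 OK", [
        # The text/csv MIME type tells the browser this is CSV data.
        ("Content-Type", "text/csv; charset=utf-8"),
        # The attachment disposition prompts a download with an assumed file name.
        ("Content-Disposition", 'attachment; filename="covidcast.csv"'),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For a quick local try, something like `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` should serve it.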
@capnrefsmmat
Contributor

Here's a sketch of a design. I already cooked up a tiny Flask app with a very minimal HTML form that then downloads the data as CSV -- this is what it could be with some JS and extra work.

The idea is to have a "wizard" that shows all sources and all signals (or at least the good ones... we can curate) with brief descriptions, then an option to download as CSV or get code for the R and Python clients. We'd want to include links to the API docs (for signal details) and quick links to see each signal on the map, too.

[Image: design sketch, "Paper Delphi_designs 3"]

Somewhere on the map, maybe in the same area we have all the line graphs, we can have a "Download this data" button that leads directly to this.

I wonder if this could be built directly into the viz tool as a wizard that appears when you press the Download button.

@capnrefsmmat
Contributor

@ryantibs @adamperer Wondering what you think about having the above as part of the interactive map, through a button somewhere, and its relative priority compared to other current tasks. I assume this would take a viz team member some time to build, plus some engineering time to deploy a CSV downloader on a server and ensure it can handle the load, so it's not completely free.

@adamperer

@sgratzl is this something you would have time to help implement? We've gotten requests from certain organizations that do not have the technical ability to use JSON APIs but want CSVs. If you have cycles to spare on this, it'd be great to get your help.

@sgratzl
Member

sgratzl commented Sep 7, 2020

Can you summarize what it should look like and what it should do? Are those all just queries against all regions? Since the API doesn't support pagination, there is a maximum row limit. Where is the server, and how is it deployed?

@ryantibs
Member

ryantibs commented Sep 7, 2020

This looks and sounds great to me; to avoid too-many-cooks, I'll stay out of it until/unless you ask me for feedback.

Just my quick 2c for now: if we give access to all signals through their raw signal names through this interface, it could be confusing to the lay user. (We have a lot to sort through; yes, you could point them to the documentation, but that would just be more to slow the lay user down.) So I recommend listing just a few of the "primetime" signals for each data source (and then maybe having a button to reveal all signals, if they want to see more).

@capnrefsmmat
Contributor

I think, as a first draft, we can use all the signals exposed on the map, since we already have nice descriptions written.

@sgratzl I spun up a demo server here: http://rosmarus.refsmmat.com:5000/

It's just a tiny Flask app that uses the COVIDcast package to fetch the data and return a CSV. It's slow when you fetch many days of data because it takes many API calls. But when we deploy it, maybe we could special-case it or figure out pagination.
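To make the cost concrete, here is a small sketch that counts the requests for a one-month range, assuming (this is an assumption, not something measured here) that the client issues one API call per day of data. `days_in_range` is a hypothetical helper.

```python
from datetime import date, timedelta


def days_in_range(start_day, end_day):
    """Yield each date from start_day to end_day, inclusive."""
    current = start_day
    while current <= end_day:
        yield current
        current += timedelta(days=1)


# Under the one-call-per-day assumption, each element is one API request.
calls = list(days_in_range(date(2020, 8, 15), date(2020, 9, 14)))
print(len(calls))  # 31
```

So a year of data would mean hundreds of round trips under that assumption, which is where special-casing or pagination would help.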

I can make a more detailed sketch than the one I posted above, but basically it just provides a selection of signals, lets the user configure a date range, and then offers either a CSV download or the right R or Python code to fetch that data. It can offer the CSV download by constructing the right URL for the Flask app.
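As a rough illustration of the "offer the right Python code" part, the wizard could fill a template from the user's selections. `python_snippet` is a hypothetical helper; the generated code assumes the `covidcast` Python client's `covidcast.signal(data_source, signal, start_day, end_day, geo_type)` call.

```python
from datetime import date


def python_snippet(data_source, signal, start_day, end_day, geo_type):
    """Return Python source that fetches the selected data via the covidcast client."""
    def lit(d):
        # Render a date as a constructor call the user can paste and run.
        return f"date({d.year}, {d.month}, {d.day})"

    return "\n".join([
        "from datetime import date",
        "import covidcast",
        "",
        "data = covidcast.signal(",
        f"    {data_source!r}, {signal!r},",
        f"    start_day={lit(start_day)}, end_day={lit(end_day)},",
        f"    geo_type={geo_type!r},",
        ")",
    ])


print(python_snippet("doctor-visits", "smoothed_cli",
                     date(2020, 8, 15), date(2020, 9, 14), "state"))
```

An R template could be filled in the same way.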

@sgratzl
Member

sgratzl commented Sep 8, 2020

take a look at cmu-delphi/www-covidcast#422

@korlaxxalrok
Contributor

This application should be up, at least basically, at https://delphi.cmu.edu/csvomatic.

@capnrefsmmat Do we need to restart this nightly?

New bits and bobs in the repo:

  • One small addition to the app to make it easy to run with gunicorn.
  • Jenkins and Ansible necessities to aid with CI/CD.
  • Some small changes to the repo's directory structure.

CI/CD process:

  • Make a branch from main.
  • Make your changes.
  • Submit a PR against main and Jenkins will run the build/test/package phases of its pipeline.
  • Merge to main and Jenkins will run the deploy phase.

Nitty gritty:

  • It is currently being deployed to the Delphi primary server.
  • It runs under a new user account, deploy.
  • It has a systemd service, covidcast-csv-server.service, that handles start/stop/etc.
    • This is restarted during deployments.
  • Nginx proxy handles routing traffic to it.

This is just one way to run it and a decent enough place to start. Can shift it to something else in the future.

@capnrefsmmat
Contributor

Excellent, thanks.

I don't think nightly restarting is necessary. The user interface relies on the signal metadata to know what signals to present, and that will go out of date quickly if not restarted -- but we won't be using that UI in production, instead using the one Sam built in the COVIDcast app: cmu-delphi/www-covidcast#422

Maybe just use gunicorn's max-requests option so it restarts workers occasionally? That will guard against any other breakage-causing problems, and will ensure the built-in UI is vaguely up-to-date.
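Something like this in a gunicorn config file would do it. `max_requests` and `max_requests_jitter` are real gunicorn settings; the file name, bind address, worker count, and numbers below are placeholder assumptions, not what's deployed.

```python
# gunicorn.conf.py (hypothetical values; tune for real traffic)
bind = "127.0.0.1:5000"     # assumed local address behind the Nginx proxy
workers = 2                 # assumed worker count
max_requests = 1000         # recycle each worker after roughly this many requests
max_requests_jitter = 100   # randomize the limit so workers don't all restart at once
```

Each recycled worker re-imports the app, so it would pick up fresh signal metadata on restart.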

@sgratzl
Member

sgratzl commented Sep 14, 2020

Note that the CSV download itself is available at https://delphi.cmu.edu/csv

@krivard
Contributor Author

krivard commented Sep 14, 2020

@sgratzl I get a 502 Bad Gateway at that address

@sgratzl
Member

sgratzl commented Sep 14, 2020

If you go to https://delphi.cmu.edu/csvomatic and download the CSV file, its URL is: https://delphi.cmu.edu/csv?signal=doctor-visits:smoothed_cli&start_day=2020-08-15&end_day=2020-09-14&geo_type=state
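For anyone scripting downloads, a URL of that shape can be assembled like this. `csv_url` is a hypothetical helper; the parameter names are copied from the URL above, and the set of accepted values is not verified here.

```python
from urllib.parse import urlencode

BASE_URL = "https://delphi.cmu.edu/csv"


def csv_url(data_source, signal, start_day, end_day, geo_type):
    """Build a CSV download URL from the query parameters seen in the example URL."""
    params = {
        "signal": f"{data_source}:{signal}",  # source and signal joined by a colon
        "start_day": start_day,               # dates as YYYY-MM-DD strings
        "end_day": end_day,
        "geo_type": geo_type,
    }
    # safe=":" keeps the colon unescaped, matching the URL format shown above.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


print(csv_url("doctor-visits", "smoothed_cli", "2020-08-15", "2020-09-14", "state"))
```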

@krivard
Contributor Author

krivard commented Sep 30, 2020

this went out in 1.9

@krivard krivard closed this as completed Sep 30, 2020