
Backfill SafeGraph 7-day averages, HRRs, and MSAs to cover entire time period #664



Closed
capnrefsmmat opened this issue Dec 26, 2020 · 5 comments
Labels
data quality Missing data, weird data, broken data

Comments

@capnrefsmmat
Contributor

Actual Behavior:

#332 introduced SafeGraph 7dav signals; #416 also added HRR and MSA aggregation. However, these were not backfilled for the entire history of SafeGraph data, resulting in this:

                        signal geo_type   min_time
9         completely_home_prop   county 2019-01-01
10        completely_home_prop      hrr 2020-11-01
11        completely_home_prop      msa 2020-11-01
12        completely_home_prop    state 2019-01-01
13   completely_home_prop_7dav   county 2020-10-01
14   completely_home_prop_7dav      hrr 2020-11-01
15   completely_home_prop_7dav      msa 2020-11-01
16   completely_home_prop_7dav    state 2020-10-01

Expected Behavior:

All SafeGraph signals have consistent min_time across geographic levels.
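The expected behavior can be checked mechanically. Below is a minimal Python sketch, assuming the metadata has already been pulled into (signal, geo_type, min_time) tuples like the table above. The helper name `find_inconsistent_signals` is hypothetical and not part of any Delphi tool.

```python
from collections import defaultdict
from datetime import date


def find_inconsistent_signals(rows):
    """Return {signal: {geo_type: min_time}} for every signal whose
    min_time is not the same across all of its geographic levels.

    `rows` is an iterable of (signal, geo_type, min_time) tuples.
    """
    by_signal = defaultdict(dict)
    for signal, geo_type, min_time in rows:
        by_signal[signal][geo_type] = min_time
    # A signal is consistent only if all geo levels share one min_time.
    return {
        signal: geos
        for signal, geos in by_signal.items()
        if len(set(geos.values())) > 1
    }


# The rows from the table above.
rows = [
    ("completely_home_prop", "county", date(2019, 1, 1)),
    ("completely_home_prop", "hrr", date(2020, 11, 1)),
    ("completely_home_prop", "msa", date(2020, 11, 1)),
    ("completely_home_prop", "state", date(2019, 1, 1)),
    ("completely_home_prop_7dav", "county", date(2020, 10, 1)),
    ("completely_home_prop_7dav", "hrr", date(2020, 11, 1)),
    ("completely_home_prop_7dav", "msa", date(2020, 11, 1)),
    ("completely_home_prop_7dav", "state", date(2020, 10, 1)),
]

bad = find_inconsistent_signals(rows)
for signal, geos in sorted(bad.items()):
    print(signal, geos)
```

Both signals in the table are flagged. Note that this only checks consistency across geo levels; it would not catch a signal whose levels all agree on a late start.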

@capnrefsmmat capnrefsmmat added the data quality Missing data, weird data, broken data label Dec 26, 2020
@nmdefries nmdefries removed their assignment Dec 27, 2020
@chinandrew
Contributor

The hrr_completely_home_prop* and msa_completely_home_prop* files (including 7dav) from before 20201101, and the state_completely_home_prop_7dav* and county_completely_home_prop_7dav* files from before 20201001, have been dropped into receiving/ and should be picked up tomorrow.
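For anyone repeating this kind of backfill, the file selection above can be sketched in Python. This assumes (hypothetically) that receiving/ files are named `YYYYMMDD_<geo_type>_<signal>.csv`; the `needs_backfill` helper and the rule table are illustrative and encode only the cutoffs listed in the comment. Because the dates are fixed-width YYYYMMDD strings, plain string comparison orders them correctly.

```python
import re

# Hypothetical naming convention: YYYYMMDD_<geo_type>_<signal>.csv
FILENAME_RE = re.compile(r"^(\d{8})_([a-z]+)_(\w+)\.csv$")

# (geo_type, signal prefix, exclusive cutoff date) taken from the comment.
# A prefix match mirrors the trailing "*" glob, so "completely_home_prop"
# also covers "completely_home_prop_7dav".
BACKFILL_RULES = [
    ("hrr", "completely_home_prop", "20201101"),
    ("msa", "completely_home_prop", "20201101"),
    ("state", "completely_home_prop_7dav", "20201001"),
    ("county", "completely_home_prop_7dav", "20201001"),
]


def needs_backfill(filename):
    """True if `filename` falls under one of the backfill rules."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return False  # not a data file in the assumed format
    day, geo, signal = match.groups()
    return any(
        geo == rule_geo and signal.startswith(prefix) and day < cutoff
        for rule_geo, prefix, cutoff in BACKFILL_RULES
    )


files = [
    "20200315_hrr_completely_home_prop.csv",
    "20201101_hrr_completely_home_prop.csv",
    "20200930_state_completely_home_prop_7dav.csv",
    "20200930_state_completely_home_prop.csv",
]
print([f for f in files if needs_backfill(f)])
```

Extending the rules to the remaining SafeGraph signals is just a matter of adding rows to `BACKFILL_RULES`.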

@chinandrew
Contributor

This should be resolved, though I'm not sure the metadata has updated yet.

@krivard
Contributor

krivard commented Jan 6, 2021

I don't expect meta to finish until 3. In the meantime your update comment is suspiciously light -- are the completely_home signals the only ones updated? I think Alex's quote was supposed to just be an example; we really do want all safegraph signals to go back to 1/2019.

The database shows the following, if that helps:

MariaDB [epidata]> select `signal`,geo_type,min(time_value) start from covidcast where source="safegraph" group by `signal`,geo_type order by start;
+-------------------------------+----------+----------+
| signal                        | geo_type | start    |
+-------------------------------+----------+----------+
| completely_home_prop_7dav     | msa      | 20190101 |
| completely_home_prop_7dav     | hrr      | 20190101 |
| median_home_dwell_time        | county   | 20190101 |
| completely_home_prop_7dav     | county   | 20190101 |
| completely_home_prop          | state    | 20190101 |
| completely_home_prop          | msa      | 20190101 |
| part_time_work_prop           | state    | 20190101 |
| completely_home_prop          | hrr      | 20190101 |
| completely_home_prop          | county   | 20190101 |
| restaurants_visit_prop        | state    | 20190101 |
| bars_visit_prop               | state    | 20190101 |
| part_time_work_prop           | county   | 20190101 |
| restaurants_visit_prop        | msa      | 20190101 |
| bars_visit_prop               | msa      | 20190101 |
| full_time_work_prop           | state    | 20190101 |
| restaurants_visit_prop        | hrr      | 20190101 |
| bars_visit_prop               | hrr      | 20190101 |
| restaurants_visit_prop        | county   | 20190101 |
| bars_visit_prop               | county   | 20190101 |
| restaurants_visit_num         | state    | 20190101 |
| bars_visit_num                | state    | 20190101 |
| full_time_work_prop           | county   | 20190101 |
| restaurants_visit_num         | msa      | 20190101 |
| bars_visit_num                | msa      | 20190101 |
| restaurants_visit_num         | hrr      | 20190101 |
| bars_visit_num                | hrr      | 20190101 |
| restaurants_visit_num         | county   | 20190101 |
| bars_visit_num                | county   | 20190101 |
| median_home_dwell_time        | state    | 20190101 |
| completely_home_prop_7dav     | state    | 20190101 |
| part_time_work_prop_7d_avg    | county   | 20200901 |
| full_time_work_prop_7d_avg    | state    | 20200901 |
| full_time_work_prop_7d_avg    | county   | 20200901 |
| median_home_dwell_time_7d_avg | state    | 20200901 |
| median_home_dwell_time_7d_avg | county   | 20200901 |
| completely_home_prop_7d_avg   | state    | 20200901 |
| completely_home_prop_7d_avg   | county   | 20200901 |
| part_time_work_prop_7d_avg    | state    | 20200901 |
| part_time_work_prop_7dav      | state    | 20201001 |
| part_time_work_prop_7dav      | county   | 20201001 |
| full_time_work_prop_7dav      | state    | 20201001 |
| full_time_work_prop_7dav      | county   | 20201001 |
| median_home_dwell_time_7dav   | state    | 20201001 |
| median_home_dwell_time_7dav   | county   | 20201001 |
| median_home_dwell_time        | msa      | 20201101 |
| median_home_dwell_time        | hrr      | 20201101 |
| part_time_work_prop_7dav      | msa      | 20201101 |
| part_time_work_prop_7dav      | hrr      | 20201101 |
| full_time_work_prop_7dav      | msa      | 20201101 |
| part_time_work_prop           | msa      | 20201101 |
| full_time_work_prop_7dav      | hrr      | 20201101 |
| part_time_work_prop           | hrr      | 20201101 |
| full_time_work_prop           | msa      | 20201101 |
| full_time_work_prop           | hrr      | 20201101 |
| median_home_dwell_time_7dav   | msa      | 20201101 |
| median_home_dwell_time_7dav   | hrr      | 20201101 |
+-------------------------------+----------+----------+
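A variant of the query above that returns only the (signal, geo_type) pairs still starting after 2019-01-01 saves scanning the whole table. The sketch below adds a `HAVING` clause (an addition, not the query that was actually run) and demonstrates it against an in-memory SQLite stand-in for the `covidcast` table, populated with a few toy rows modeled on the output above. SQLite accepts the MariaDB-style backtick quoting; the `?` placeholder is specific to Python's `sqlite3`, and a MariaDB client library would use its own parameter style. The helper `find_backfill_gaps` is hypothetical.

```python
import sqlite3

# Same grouping as the MariaDB query, filtered to late-starting pairs.
GAP_QUERY = """
SELECT `signal`, geo_type, MIN(time_value) AS start
FROM covidcast
WHERE source = 'safegraph'
GROUP BY `signal`, geo_type
HAVING MIN(time_value) > ?
ORDER BY start, `signal`, geo_type
"""


def find_backfill_gaps(conn, target_start=20190101):
    """Return (signal, geo_type, start) rows whose earliest time_value
    is later than target_start (an integer YYYYMMDD)."""
    return conn.execute(GAP_QUERY, (target_start,)).fetchall()


# Toy stand-in for the covidcast table, with only the columns used here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covidcast "
    "(source TEXT, `signal` TEXT, geo_type TEXT, time_value INTEGER)"
)
conn.executemany(
    "INSERT INTO covidcast VALUES (?, ?, ?, ?)",
    [
        ("safegraph", "completely_home_prop", "county", 20190101),
        ("safegraph", "completely_home_prop", "county", 20190102),
        ("safegraph", "part_time_work_prop", "state", 20190101),
        ("safegraph", "part_time_work_prop", "msa", 20201101),
        ("safegraph", "median_home_dwell_time_7dav", "hrr", 20201101),
    ],
)

gaps = find_backfill_gaps(conn)
print(gaps)
```

Against the real database, the same SQL should work in the MariaDB client with the placeholder replaced by 20190101; an empty result would mean every SafeGraph signal reaches back to the target start.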

@chinandrew
Contributor

Ah ok I misunderstood, sorry about that. I thought it was just the ones in the original comment. I'll drop the rest in now.

@chinandrew
Contributor

Files look to be ingested back to 2019-01-01 for all SafeGraph signals:

    data_source                       signal   min_time
539   safegraph               bars_visit_num 2019-01-01
565   safegraph       median_home_dwell_time 2019-01-01
566   safegraph       median_home_dwell_time 2019-01-01
567   safegraph  median_home_dwell_time_7dav 2019-01-01
568   safegraph  median_home_dwell_time_7dav 2019-01-01
569   safegraph  median_home_dwell_time_7dav 2019-01-01
570   safegraph  median_home_dwell_time_7dav 2019-01-01
571   safegraph          part_time_work_prop 2019-01-01
572   safegraph          part_time_work_prop 2019-01-01
573   safegraph          part_time_work_prop 2019-01-01
564   safegraph       median_home_dwell_time 2019-01-01
574   safegraph          part_time_work_prop 2019-01-01
576   safegraph     part_time_work_prop_7dav 2019-01-01
577   safegraph     part_time_work_prop_7dav 2019-01-01
578   safegraph     part_time_work_prop_7dav 2019-01-01
579   safegraph        restaurants_visit_num 2019-01-01
580   safegraph        restaurants_visit_num 2019-01-01
581   safegraph        restaurants_visit_num 2019-01-01
582   safegraph        restaurants_visit_num 2019-01-01
583   safegraph       restaurants_visit_prop 2019-01-01
584   safegraph       restaurants_visit_prop 2019-01-01
575   safegraph     part_time_work_prop_7dav 2019-01-01
563   safegraph       median_home_dwell_time 2019-01-01
562   safegraph     full_time_work_prop_7dav 2019-01-01
561   safegraph     full_time_work_prop_7dav 2019-01-01
540   safegraph               bars_visit_num 2019-01-01
541   safegraph               bars_visit_num 2019-01-01
542   safegraph               bars_visit_num 2019-01-01
543   safegraph              bars_visit_prop 2019-01-01
544   safegraph              bars_visit_prop 2019-01-01
545   safegraph              bars_visit_prop 2019-01-01
546   safegraph              bars_visit_prop 2019-01-01
547   safegraph         completely_home_prop 2019-01-01
548   safegraph         completely_home_prop 2019-01-01
549   safegraph         completely_home_prop 2019-01-01
550   safegraph         completely_home_prop 2019-01-01
551   safegraph    completely_home_prop_7dav 2019-01-01
552   safegraph    completely_home_prop_7dav 2019-01-01
553   safegraph    completely_home_prop_7dav 2019-01-01
554   safegraph    completely_home_prop_7dav 2019-01-01
555   safegraph          full_time_work_prop 2019-01-01
556   safegraph          full_time_work_prop 2019-01-01
557   safegraph          full_time_work_prop 2019-01-01
558   safegraph          full_time_work_prop 2019-01-01
559   safegraph     full_time_work_prop_7dav 2019-01-01
560   safegraph     full_time_work_prop_7dav 2019-01-01
585   safegraph       restaurants_visit_prop 2019-01-01
586   safegraph       restaurants_visit_prop 2019-01-01

@krivard krivard closed this as completed Jan 8, 2021

4 participants