Pending updates to columns in db_signals.csv #1442

melange396 · 2024-05-18T04:37:36Z

This PR is derived from #1434; i removed all of the new columns but this should include all of the changes to the existing columns (except Available Geography, more on that in a bit).

Please let me know if we need to fix any of these -- the summary of differences appears to me to be:

"day" removed from Time Type column, replaced with empty string: (this seems like it was accidental)
- dsew-cpr:confirmed_admissions_covid_1d_7dav
~~"n/a" removed from Pathogen/Disease Area column, replaced with empty string: (these seem intentional)~~
- ~~nchs-mortality:deaths_allcause_incidence_num~~
- ~~nchs-mortality:deaths_allcause_incidence_prop~~
- ~~nchs-mortality:deaths_percent_of_expected~~
- ~~safegraph-daily:completely_home_prop~~
- ~~safegraph-daily:completely_home_prop_7dav~~
- ~~safegraph-daily:full_time_work_prop~~
- ~~safegraph-daily:full_time_work_prop_7dav~~
- ~~safegraph-daily:median_home_dwell_time~~
- ~~safegraph-daily:median_home_dwell_time_7dav~~
- ~~safegraph-daily:part_time_work_prop~~
- ~~safegraph-daily:part_time_work_prop_7dav~~
- ~~safegraph-weekly:bars_visit_num~~
- ~~safegraph-weekly:bars_visit_prop~~
- ~~safegraph-weekly:restaurants_visit_num~~
- ~~safegraph-weekly:restaurants_visit_prop~~
Newline added at the end of the file (this is effectively inconsequential)

The Available Geography column has some sweeping changes applied to it. In one example from chng, the text was modified from `county,hhs,hrr,msa,nation,state` to `county, hrr (by Delphi), msa (by Delphi), state (by Delphi), hhs (by Delphi), nation (by Delphi)`. I believe this signifies that only county data came from the source, and we computed the various other higher levels of geo aggregation.

This is valuable information, but i would suggest we keep the column the way it was and create a new column called something like `Geographies aggregated by Delphi` or `Post-aggregated geographies` that lists the geography types that were computed by us. There are a few reasons for doing it this way, including that (i believe) the Signal Documentation app expects the structured comma-separated text without the extra annotations, as it was before, and that representing the same information in its own column should save some space. If you agree with this, let me know, as i think i should be able to apply those changes pretty easily.

Also, some entries (quidel, for instance) have " (by Delphi)" attached to every geography in the list; that suggests to me that we did aggregations to produce county-level data from finer-grained locations, but i didn't think that was the case.
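As a rough illustration of the proposed split, the annotated text could be separated into the original comma-separated value plus a second value for the new column. This is only a sketch: the function name `split_geographies` is hypothetical, and sorting the output alphabetically is an assumption based on the one chng example quoted above.

```python
def split_geographies(annotated: str) -> tuple[str, str]:
    """Split an annotated geography list into (all geographies, Delphi-aggregated ones)."""
    suffix = " (by Delphi)"
    all_geos, delphi_geos = [], []
    for item in annotated.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        if item.endswith(suffix):
            item = item[: -len(suffix)].strip()
            delphi_geos.append(item)
        all_geos.append(item)
    # Assumption: values are stored alphabetically, as in the chng example.
    return ",".join(sorted(all_geos)), ",".join(sorted(delphi_geos))


annotated = "county, hrr (by Delphi), msa (by Delphi), state (by Delphi), hhs (by Delphi), nation (by Delphi)"
plain, by_delphi = split_geographies(annotated)
print(plain)      # county,hhs,hrr,msa,nation,state
print(by_delphi)  # hhs,hrr,msa,nation,state
```

The first value matches the existing Available Geography format, so anything that parses that column would not need to change; the second value would go in the new column.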


melange396 · 2024-05-18T05:06:57Z

Just kidding! Those n/a values were not actually removed in the source spreadsheet nor in #1434 -- i inadvertently stripped them due to the way i imported the csv files... I edited the above message to strike through the irrelevant text.
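The comment does not say which import step dropped the values, but one plausible mechanism is pandas' default NA handling: `read_csv` treats the string `n/a` as a missing value unless `na_filter=False` is passed. A minimal sketch with made-up data (the signal name `example_signal` is hypothetical):

```python
import io

import pandas as pd

csv_text = "Signal,Pathogen/Disease Area\nexample_signal,n/a\n"

# Default settings: "n/a" is on pandas' list of NA strings and becomes NaN.
default_read = pd.read_csv(io.StringIO(csv_text))
# na_filter=False: every cell is kept exactly as written.
literal_read = pd.read_csv(io.StringIO(csv_text), na_filter=False)

print(default_read["Pathogen/Disease Area"].isna().iloc[0])  # True
print(literal_read["Pathogen/Disease Area"].iloc[0])         # n/a

# Writing the default-parsed frame back out turns the NaN into an empty field.
print(default_read.to_csv(index=False))
```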

melange396 · 2024-05-20T19:38:05Z

Here is some code that you can paste into a python interpreter to see the (correct) list of differences:

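import pandas as pd

# raw csv file on a given branch of the repo
base_url = 'https://github.com/cmu-delphi/delphi-epidata/raw/{}/src/server/endpoints/covidcast_utils/db_signals.csv'

# na_filter=False keeps strings like "n/a" and "" as-is instead of converting them to NaN
current = pd.read_csv(base_url.format('dev'), na_filter=False)
proposed = pd.read_csv(base_url.format('bot/update-docs'), na_filter=False)

# columns that exist only in the proposed file
new_cols = set(proposed.columns) - set(current.columns)
print(new_cols)

# cell-by-cell comparison over the shared columns
# (only meaningful if both files list the same rows in the same order)
non_matching = (proposed[current.columns] != current)
# number of differing cells in each column
diffs_per_col = non_matching.sum()
print(diffs_per_col)

# list the signals whose 'Time Type' value changed
mismatched_time = pd.concat([current[['Source Subdivision', 'Signal']], non_matching[['Time Type']]], axis=1)
print(mismatched_time[mismatched_time['Time Type']])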

sonarqubecloud · 2024-05-20T19:48:49Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

melange396 · 2024-05-20T19:50:08Z

and then the csv in this PR was produced by following the above code snippet with this:

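import os

# explicit copy, so the assignment below does not trigger pandas' SettingWithCopyWarning
intermediate = proposed[current.columns].copy()
# keep the existing 'Available Geography' values
intermediate['Available Geography'] = current['Available Geography']
intermediate.to_csv('intermediate.csv', index=False)
# pandas writes booleans as True/False; switch them back to the TRUE/FALSE spelling.
# two passes, presumably because adjacent fields share a comma and one pass skips every other match
for _ in range(2):
    os.system("sed -i 's/,False,/,FALSE,/g' intermediate.csv")
    os.system("sed -i 's/,True,/,TRUE,/g' intermediate.csv")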
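One likely reason the substitutions above run twice is that global replacement only finds non-overlapping matches: in `,False,False,` the comma between the two fields is consumed by the first match, so the second field is skipped until the next pass. The same behavior can be sketched with Python's `re` module, which also replaces non-overlapping matches; the lookaround variant at the end is an alternative suggestion, not something the PR used.

```python
import re

row = "a,False,False,b"

# One pass: the shared comma is consumed by the first match, so the second field is missed.
once = re.sub(",False,", ",FALSE,", row)
print(once)   # a,FALSE,False,b

# A second pass picks up the field that was skipped.
twice = re.sub(",False,", ",FALSE,", once)
print(twice)  # a,FALSE,FALSE,b

# Alternative: lookarounds check for the commas without consuming them, so one pass is enough.
single = re.sub(r"(?<=,)False(?=,)", "FALSE", row)
print(single)  # a,FALSE,FALSE,b
```

Like the `sed` commands, none of these patterns touch a value in the first or last field of a line, since those are not surrounded by commas on both sides.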

melange396 · 2024-05-28T19:42:05Z

the source data in the google sheet has changed since this was done; closing this PR to create a new one...

updates to columns: 'base_is_other', 'Pathogen/\nDisease Area', 'Sign…

17f3e20

…al Type', 'Time Type', 'Is Weighted', 'Is Cumulative', 'Has StdErr', 'Has Sample Size'

melange396 added the chore, api change, code health, and data quality labels May 18, 2024

melange396 requested review from tinatownes, nmdefries and carlynvandyke May 18, 2024 04:37

melange396 mentioned this pull request May 18, 2024

Update Google Docs Meta Data #1434

Closed

fix dropped 'n/a'

4643876

melange396 closed this May 28, 2024

melange396 mentioned this pull request May 28, 2024

incorporating other changes prior to new column importation #1450

Merged

nmdefries deleted the piecemeal_db_signals_updates branch June 5, 2024 16:18
