
Update Google Docs Meta Data #1452

Merged: 1 commit, May 28, 2024

@github-actions (github-actions bot, Contributor) commented May 28, 2024

Updating Google Docs Meta Data

  • new columns added for signal discovery reasons
  • many Available Geography lists were reordered


Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@melange396 (Collaborator) commented:

New columns added (in alphabetical order):

  • Data Censoring
  • Delphi-Aggregated Geography
  • Demographic Breakdowns
  • Demographic Scope
  • Geographic Scope
  • License
  • Link to DUA
  • Missingness
  • Reporting Cadence
  • Severity Pyramid Rungs
  • Temporal Scope End
  • Temporal Scope End Note
  • Temporal Scope Start
  • Temporal Scope Start Note
  • Typical Reporting Lag
  • Typical Revision Cadence
  • Use Restrictions
  • Who may access this signal?
  • Who may be told about this signal?

Additionally, the order of the comma-separated values in the Available Geography column has changed for some rows. Only the order differs, not the set of geographies in each row, so this should be inconsequential.
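One way to see why a pure reordering is harmless is to compare each comma-separated list as a set of tokens rather than as a raw string. A minimal sketch, assuming values are separated by commas with possible stray whitespace (whitespace trimming is an assumption, not something the CSV is known to need); the helper names `geo_set` and `same_geographies` and the sample values are hypothetical:

```python
def geo_set(value: str) -> frozenset:
    """Split a comma-separated geography list into a set of trimmed, non-empty tokens."""
    return frozenset(tok.strip() for tok in value.split(",") if tok.strip())


def same_geographies(a: str, b: str) -> bool:
    """True when two comma-separated lists name the same geographies, ignoring order."""
    return geo_set(a) == geo_set(b)


# Hypothetical values: same geographies, different order.
print(same_geographies("county,state,nation", "nation,county,state"))  # True
print(same_geographies("county,state", "county,state,hrr"))  # False
print("county,state,nation" == "nation,county,state")  # False: raw strings differ
```

Unlike sorting and re-joining the tokens, a set also ignores duplicate entries; if duplicates within a list matter, compare sorted lists instead.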

@melange396 (Collaborator) commented:

The items above were verified with this code:

import pandas as pd

base_url = 'https://github.com/cmu-delphi/delphi-epidata/raw/{}/src/server/endpoints/covidcast_utils/db_signals.csv'

current = pd.read_csv(base_url.format('dev'), na_filter=False)
proposed = pd.read_csv(base_url.format('bot/update-docs'), na_filter=False)

# this code assumes columns have not been removed or renamed,
# and that no new rows have been added or had their ordering changed

new_cols = set(proposed.columns) - set(current.columns)
print("new cols:", sorted(new_cols))

# cell-by-cell comparison of the shared columns (relies on identical row order)
non_matching = (proposed[current.columns] != current)
diffs_per_col = non_matching.sum()  # number of differing rows per column
print(diffs_per_col)
# => 400 for 'Available Geography', 0 for every other column

# produce alpha-sorted "Available Geography" from each csv for accurate comparison purposes:
def normalize_geos(g):
    return ','.join(sorted(g.split(',')))

current_ag_norm = current['Available Geography'].apply(normalize_geos)
proposed_ag_norm = proposed['Available Geography'].apply(normalize_geos)
diff_geos = (current_ag_norm != proposed_ag_norm)
print("Number of rows with different geographies:", diff_geos.sum())
# => 0
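The verification script above relies on two preconditions that are only stated in its comments (no columns removed or renamed, no rows added or reordered). A minimal sketch of the same checks packaged as a function that asserts those preconditions up front and runs on in-memory DataFrames, so it can be exercised without fetching the CSVs. The function names `normalize_geos` and `summarize_changes` and the toy rows are hypothetical, not part of the delphi-epidata codebase:

```python
import pandas as pd


def normalize_geos(value: str) -> str:
    """Sort a comma-separated list so that order-only changes compare equal."""
    return ",".join(sorted(value.split(",")))


def summarize_changes(current: pd.DataFrame, proposed: pd.DataFrame,
                      geo_col: str = "Available Geography") -> dict:
    """Compare two versions of the signals table under the script's assumptions."""
    # Make the script's stated assumptions explicit instead of silently relying on them.
    missing = set(current.columns) - set(proposed.columns)
    assert not missing, f"columns removed or renamed: {sorted(missing)}"
    assert len(current) == len(proposed), "row count changed"

    new_cols = sorted(set(proposed.columns) - set(current.columns))

    # Cell-by-cell comparison of the shared columns; row order is assumed unchanged,
    # so both frames are given the same 0..n-1 index before comparing.
    shared = proposed[current.columns].reset_index(drop=True)
    base = current.reset_index(drop=True)
    diffs_per_col = (shared != base).sum()
    changed_cols = sorted(diffs_per_col[diffs_per_col > 0].index)

    # Rows whose geography list still differs after sorting, i.e. a real content change.
    geo_diffs = int((base[geo_col].apply(normalize_geos)
                     != shared[geo_col].apply(normalize_geos)).sum())

    return {"new_cols": new_cols, "changed_cols": changed_cols, "geo_set_diffs": geo_diffs}


# Toy example (hypothetical rows): one reordered geography list plus one new column.
current = pd.DataFrame({"Signal": ["a", "b"],
                        "Available Geography": ["county,state", "nation"]})
proposed = pd.DataFrame({"Signal": ["a", "b"],
                         "Available Geography": ["state,county", "nation"],
                         "License": ["x", "y"]})
print(summarize_changes(current, proposed))
# {'new_cols': ['License'], 'changed_cols': ['Available Geography'], 'geo_set_diffs': 0}
```

Failing fast on a removed column or a changed row count keeps the cell-by-cell comparison from producing misleading counts when the underlying assumptions no longer hold.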

melange396 merged commit 0547255 into dev on May 28, 2024
6 checks passed
melange396 deleted the bot/update-docs branch on May 28, 2024 23:20