
"Apparent" missingness in GHT #292


Closed
huisaddison opened this issue Sep 30, 2020 · 6 comments
Assignees
Labels: data quality (Missing data, weird data, broken data); Engineering (Used to filter issues when synching with Asana)

Comments

@huisaddison
Contributor

Actual Behavior:

"Apparent" missingness: for most days, what appears to be missing data is actually a 0 in the GHT indicator. (There is a meta-issue regarding whether a 0 should be interpreted as a signal that indicates a low but nonzero volume of searchers, or as actual missing data.) The map renders an exact zero as solid gray, rather than the diagonally hatched gray that indicates true missingness, meaning no value in our API. To me, the pipeline is working properly, and the discussion should be whether the terms "anosmia" and "i can't smell or taste" now have low recall, in which case we should adjust the terms in our bag.

Expected behavior

Should we change our bag of terms to something with higher recall now that it's no longer March, and people's search habits have changed?
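
To make the zero-versus-missing distinction from "Actual Behavior" concrete, here is a minimal sketch in plain Python. It assumes the indicator values for one location are available as a mapping from date to value; the function name `classify_days` and the sample numbers are hypothetical and are not taken from the GHT pipeline or our API.

```python
from datetime import date, timedelta


def classify_days(reported, start, end):
    """Label each day in [start, end] as 'missing', 'zero', or 'nonzero'.

    `reported` maps a date to the value returned for that day. A day that is
    absent from the mapping (or mapped to None) is treated as truly missing.
    """
    labels = {}
    day = start
    while day <= end:
        value = reported.get(day)
        if value is None:
            labels[day] = "missing"   # no value at all: hatched gray on the map
        elif value == 0:
            labels[day] = "zero"      # exact 0: solid gray on the map
        else:
            labels[day] = "nonzero"
        day += timedelta(days=1)
    return labels


# Made-up values, for illustration only.
reported = {
    date(2020, 9, 1): 0.0,
    date(2020, 9, 2): 1.7,
    date(2020, 9, 4): 0.0,
}
labels = classify_days(reported, date(2020, 9, 1), date(2020, 9, 4))
```

Counting the "zero" labels over a window would give a rough sense of how often the current terms return no measurable volume, though it cannot by itself say whether a 0 means low volume or no data.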

Context

https://delphi-org.slack.com/archives/C0130CSQRN3/p1601495049023500?thread_ts=1601480482.022300&cid=C0130CSQRN3

cc @ryantibs who pointed this out originally
cc @nmdefries who is working on Google Symptoms, which may replace GHT
cc @jingjtang who is the current pipeline "maintainer" to my knowledge

@huisaddison added the "data quality" label (Missing data, weird data, broken data) on Sep 30, 2020
@jingjtang
Contributor

The GHT pipeline has been automated starting from Aug 11.

@huisaddison
Contributor Author

huisaddison commented Sep 30, 2020

Said something relevant to a different issue - sorry!

@huisaddison
Contributor Author

@jingjtang thanks! I only tagged you here to keep you in the loop (not sure if there are any official maintainers after a pipeline is automated). Your insight would be useful for #293, though

@nmdefries
Contributor

nmdefries commented Sep 30, 2020

> the discussion should be whether anosmia and i can't smell or taste now has low recall and therefore we should adjust the terms in our bag.

The strength of the GHT signal (compared to baseline) does appear to be decaying relative to Google Symptoms, more info here.

So if we intend to continue using the GHT indicator, we should adjust the search terms used, keeping in mind that this would probably need to be repeated periodically. How was the initial set of search terms chosen?

Via Katie: we've already decided not to alter the bag of search terms, and the incoming Google Symptoms (GS) indicator should be more robust to changes in which search terms users choose.

@capnrefsmmat
Contributor

Some useful context is in #138.

@SumitDELPHI added the "Engineering" label (Used to filter issues when synching with Asana) on Dec 6, 2020
@nmdefries
Contributor

GHT has been deprecated (on the Google side) and replaced by Google Symptoms, which we expect to be more robust to shifts in specific search terms used.


5 participants