Google symptoms dap #28

Open: wants to merge 43 commits into main

@jingjtang (Collaborator) commented Feb 18, 2021

Add GS DAP

  • Scripts for sensorization/correlation analysis etc. are included in folder ./scripts.
  • Google_Symptoms_DAP_Final_Report.html is the final report.
  • Appendix1_Correlation_Results.html gives more details on the correlation analysis.
  • Appendix2_Coefficients_and_Intercepts.html gives more details on the coefficients and intercepts returned by sensorization.

nmdefries and others added 30 commits September 30, 2020 10:53
@jingjtang jingjtang requested a review from nmdefries February 18, 2021 14:02
@nmdefries (Collaborator) left a comment

Unfortunately, a lot of these files are too big for me to leave comments directly, so I'll put them here.

Final Report:

In the introduction, it would be good to link to the previous work establishing anosmia/ageusia as useful for a limited geographic range.

Typo: "electrical heath records" should be "electronic health records".

The violin plot shown above and the barplot shown in Appendix II indicate the difference in the spatial distribution of symptoms in the same symptom set. This leads to the fact that when training linear regression models for different counties, the number of features (symptoms) that are "actually" taken into account is different. If we draw the median of coefficients for symptoms across all the counties available for the symptom set, we simply get 0s for symptoms with low geographical coverage. It should be noted that getting median to zero is not an error message, because the default missingness of this data set is caused by the extremely low search volume.

This paragraph is hard to parse, especially if the reader hasn't gotten to the appendix yet. Essentially you're saying this, right:

"Within a symptom set, geographical availability differs widely by symptom. Because of this and the zero-fill procedure used when creating a symptom set, some symptoms are implicitly excluded from modeling for a given county, so the coefficient for that symptom is zero in that county's model. As a result, symptoms with high missingness have a median coefficient of zero; this isn't an error."
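
A small sketch might make this concrete for readers. It is not the code from ./scripts: the data are made up, the model has no intercept, and the fit uses plain gradient descent from a zero start (chosen because a zero-filled column then never receives an update). How a given regression routine treats an all-zero column varies, so treat this only as an illustration of the reasoning.

```python
from statistics import median


def fit_linear(X, y, lr=0.05, steps=5000):
    """Least-squares fit (no intercept) by gradient descent, starting from zero."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(steps):
        # Residuals of the current fit.
        resid = [sum(c * x for c, x in zip(coef, row)) - t for row, t in zip(X, y)]
        # Gradient of the mean squared error; an all-zero column always yields 0.
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(resid, X)) for j in range(p)]
        coef = [c - lr * g for c, g in zip(coef, grad)]
    return coef


# Hypothetical data: columns are [anosmia, ageusia] search volumes.
# Ageusia is only available in county A; it is zero-filled in B and C.
counties = {
    "A": ([[1.0, 0.5], [2.0, 1.5], [3.0, 1.0], [4.0, 2.5]], [2.5, 5.5, 7.0, 10.5]),
    "B": ([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]], [2.0, 4.0, 6.0]),
    "C": ([[2.0, 0.0], [3.0, 0.0], [5.0, 0.0]], [4.0, 6.0, 10.0]),
}

coefs_by_county = {name: fit_linear(X, y) for name, (X, y) in counties.items()}

# Median ageusia coefficient across counties.
ageusia_coefs = [coef[1] for coef in coefs_by_county.values()]
median_ageusia = median(ageusia_coefs)

for name, coef in coefs_by_county.items():
    print(name, [round(c, 3) for c in coef])
print("median ageusia coefficient:", median_ageusia)
```

In this toy setup the ageusia coefficient is nonzero only in county A, so the median across the three counties is zero even though nothing went wrong in the fitting.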

From an organizational perspective, results in the appendices should be auxiliary -- not required to understand the main report, but providing additional tangential results that someone might be interested in. If you're using results from the appendix in the main report, those results should also be in the main report.

Appendix I is particularly interesting -- it makes an even stronger case for using regression over rawsum based on comparisons for non-sensory symptom sets.

Overall, looks good! Thanks for your hard work!
