Improved docs for as_epi_df() #93

Closed
dajmcdon opened this issue Jun 5, 2022 · 4 comments · Fixed by #103
Labels
documentation Improvements or additions to documentation good first issue Good for newcomers P1 medium priority question Further information is requested

Comments

@dajmcdon
Contributor

dajmcdon commented Jun 5, 2022

It would be helpful to add some documentation illustrating how to create an epi_df with additional keys.

  1. Convert a tsibble that has an extra key. Show the metadata.
  2. Use a data frame that has an additional key and maybe misnamed geo_value or time_value (or both).
  3. The same with something added later. Say, a covidcast -> epi_df then add a column that adds a key.

Related: are there other types of additional_metadata than keys?

This could be done in the examples for as_epi_df() or in the main epiprocess vignette (or both).

And just as a flag, note that in the roxygen one can use:

#' @includeRmd man/rmd/<some-name>.Rmd details

The trailing details means to include the Rmd file in the @details section (this is the default). You could also put it in any other section (e.g. @format).

@dajmcdon dajmcdon added documentation Improvements or additions to documentation good first issue Good for newcomers question Further information is requested labels Jun 5, 2022
@ChloeYou ChloeYou self-assigned this Jun 6, 2022
@ChloeYou
Contributor

ChloeYou commented Jun 7, 2022

Hi Daniel! @dajmcdon

I'd like to get some clarifications:

Additional keys refer to additional_metadata, right? Based on the description of additional_metadata in the as_epi_df function:
List of additional metadata to attach to the epi_df object. The metadata will have geo_type, time_type, and as_of fields; named entries from the passed list or will be included as well.
So the metadata can contain (at least) geo_type, time_type, and as_of fields. Are there other types that are commonly used?

  1. Does "show the metadata" mean printing the first couple of rows of data in the documentation for readers to see?
  2. What behaviours are we illustrating for the misnamed geo_value and time_value? Are the misnamed columns additional to an already existing and correctly named geo_value and time_value?
  3. We're pulling some data from covidcast, using as_epi_df to turn it into an epi_df object, and then adding new columns?

Thank you!

@dajmcdon
Contributor Author

dajmcdon commented Jun 7, 2022

So, the metadata is simpler than you're imagining. Here's an example:

attr(jhu_csse_county_level_subset, "metadata")
$geo_type
[1] "county"

$time_type
[1] "day"

$as_of
[1] "2022-05-23 14:35:45 PDT"

Here's some example data from BC that sort of illustrates 1 and 2 (extra keys, misnamed columns):

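remotes::install_github("mountainmath/CanCovidData")
library(dplyr) # provides %>% and count()
bc <- CanCovidData::get_british_columbia_case_data()
# One row per reported date, health authority, age group and sex, with a case count
bc_dated <- bc %>%
  count(`Reported Date`, `Health Authority`, `Age group`, `Sex`, name = "cases")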
bc_dated
# A tibble: 49,955 × 5
   `Reported Date` `Health Authority` `Age group` Sex   cases
   <date>          <chr>              <chr>       <chr> <int>
 1 2020-01-29      Out of Canada      40-49       M         1
 2 2020-02-06      Vancouver Coastal  50-59       F         1
 3 2020-02-10      Out of Canada      20-29       F         1
 4 2020-02-10      Out of Canada      30-39       M         1
 5 2020-02-18      Interior           30-39       F         1
 6 2020-02-24      Fraser             30-39       F         1
 7 2020-02-24      Fraser             40-49       M         1
 8 2020-03-03      Fraser             50-59       M         1
 9 2020-03-03      Vancouver Coastal  30-39       F         1
10 2020-03-03      Vancouver Coastal  60-69       F         1
# … with 49,945 more rows
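
One way a table shaped like this could become an epi_df is sketched below. It rests on several assumptions: bc_small is a hand-typed stand-in holding only the first three rows printed above, the names age_group and sex are illustrative choices, treating health authorities as a "custom" geo_type is a guess, and the additional_metadata = list(other_keys = ...) form is the as_epi_df() interface as I understand it around the time of this issue, so other epiprocess releases may differ.

```r
# Stand-in for bc_dated: the first three rows printed above, typed by hand.
bc_small <- data.frame(
  `Reported Date` = as.Date(c("2020-01-29", "2020-02-06", "2020-02-10")),
  `Health Authority` = c("Out of Canada", "Vancouver Coastal", "Out of Canada"),
  `Age group` = c("40-49", "50-59", "20-29"),
  Sex = c("M", "F", "F"),
  cases = c(1L, 1L, 1L),
  check.names = FALSE # keep the column names with spaces, as in bc_dated
)

# Rename the misnamed columns to the names an epi_df expects (geo_value,
# time_value). age_group and sex are hypothetical names for the extra keys.
names(bc_small) <- c("time_value", "geo_value", "age_group", "sex", "cases")

# Assumption: the installed epiprocess accepts this as_epi_df() signature.
if (requireNamespace("epiprocess", quietly = TRUE)) {
  bc_epi <- epiprocess::as_epi_df(
    tibble::as_tibble(bc_small),
    geo_type = "custom",
    additional_metadata = list(other_keys = c("age_group", "sex"))
  )
  # Inspect the metadata list attached to the result.
  print(attr(bc_epi, "metadata"))
}
```

The renaming is done in base R so that the data preparation does not depend on any package; dplyr::rename() would do the same job.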

On 3, yes. Maybe, do county cases (so that geo_type = "county") then add a column of states. That should also be a key. How do you put that into the metadata?
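
A possible shape for the county-plus-state case, as a sketch only: the rows in county_cases are made up for illustration, and the as_epi_df() call again assumes the additional_metadata = list(other_keys = ...) interface from around the time of this issue. It adds the state column before converting, which sidesteps the open question of how to update the metadata of an existing epi_df.

```r
# Hypothetical county-level rows; the case counts are invented for illustration.
county_cases <- data.frame(
  geo_value = c("06037", "06037", "25025"),
  time_value = as.Date(c("2022-05-01", "2022-05-02", "2022-05-01")),
  cases = c(10L, 12L, 3L)
)

# The first two digits of a five-digit county FIPS code are the state FIPS code.
county_cases$state <- substr(county_cases$geo_value, 1, 2)

# Assumption: the installed epiprocess accepts this as_epi_df() signature.
if (requireNamespace("epiprocess", quietly = TRUE)) {
  county_epi <- epiprocess::as_epi_df(
    tibble::as_tibble(county_cases),
    geo_type = "county",
    additional_metadata = list(other_keys = "state")
  )
  # Inspect the metadata list attached to the result.
  print(attr(county_epi, "metadata"))
}
```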

@brookslogan brookslogan added the P1 medium priority label Jun 7, 2022
@ChloeYou
Contributor

ChloeYou commented Jun 14, 2022

@brookslogan Hey Logan!

When calling attr(jhu_csse_county_level_subset, "metadata") for example, can geo_type have more than one element in it? For example can it return "state" "county"?

@brookslogan
Contributor

The built-in geo guesser does not expect any mixed geo_types, so while we don't validate that geo_type is length 1, I think we do expect it to be length 1. (We should probably add such a check.) So no, I don't think it would return "state" "county".
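
The length-1 check mentioned above could look roughly like this. check_geo_type() is a hypothetical helper written for illustration, not a function that exists in epiprocess.

```r
# Hypothetical validator: geo_type must be exactly one non-missing string.
check_geo_type <- function(geo_type) {
  if (!is.character(geo_type) || length(geo_type) != 1L || is.na(geo_type)) {
    stop(
      "`geo_type` must be a single string, got an object of length ",
      length(geo_type), ".",
      call. = FALSE
    )
  }
  invisible(geo_type)
}

check_geo_type("county") # passes silently
# check_geo_type(c("state", "county")) would stop with an error
```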

3 participants