Consider adding fetch functions returning epiprocess objects #91

Open
brookslogan opened this issue Apr 27, 2023 · 6 comments
Labels
P2 low priority

Comments

@brookslogan
Contributor

To streamline analyses using our new package ecosystem, it might help to be able to get epi_dfs directly out of epidatr. This could be opt-in with Suggests: epiprocess and requiring it inside fetch_edf. However, at minimum, this would involve renaming and setting key columns on a per-endpoint basis, and we might also consider pivoting to a wider format. Thus, it's a nontrivial amount of work. (But also hopefully a nontrivial amount of savings when actually being used.)

Might conflict with the goal of #72.

@brookslogan brookslogan added the P2 low priority label Apr 27, 2023
@brookslogan brookslogan changed the title Consider a fetch_edf() returning an epi_df? Consider adding a fetch_edf()/fetch_epi_df() returning an epi_df? Apr 27, 2023
@brookslogan brookslogan changed the title Consider adding a fetch_edf()/fetch_epi_df() returning an epi_df? Consider adding a fetch_edf()/fetch_epi_df()? Apr 27, 2023
@dshemetov
Contributor

dshemetov commented Apr 27, 2023

I'm thinking we should circle back to this after both #72 is addressed and v1.0.0 is released. A few thoughts on how this might work:

  • this would just consume the output of the new fetch interface from Consolidate fetch_* interfaces #72, which will return either an unstructured list (for classic-only endpoints) or a tibble
  • for tibbles that have the requisite columns to make an epi_df, we can support this interface; otherwise, we give an error
    • having to write endpoint-specific branches in fetch_edf doesn't sound ideal; checking for requisite columns might get complicated if the endpoints have different names for requisite cols
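A minimal sketch of the "check for requisite columns, otherwise error" idea above, in base R. The helper name `check_epi_df_cols` is hypothetical and exists in neither epidatr nor epiprocess. The required column names `geo_value` and `time_value` are an assumption about what an epi_df needs, and the data are made up.

```r
# Hypothetical helper; not part of epidatr or epiprocess.
# Assumption: an epi_df needs at least these two columns.
required_epi_df_cols <- c("geo_value", "time_value")

check_epi_df_cols <- function(tbl, required = required_epi_df_cols) {
  # Collect every required column that the fetched table lacks.
  missing_cols <- setdiff(required, names(tbl))
  if (length(missing_cols) > 0) {
    stop(
      "Cannot build an epi_df; missing column(s): ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(tbl)
}

# Toy stand-in for a fetched table (made-up values).
fetched <- data.frame(
  geo_value = c("ca", "ny"),
  time_value = as.Date(c("2023-04-01", "2023-04-01")),
  value = c(1.5, 2.5)
)

check_epi_df_cols(fetched) # passes silently
# The conversion itself could then be delegated to epiprocess, e.g.
# epiprocess::as_epi_df(fetched)
```

An endpoint that names these columns differently would need a renaming step before this check, which is where the endpoint-specific branching would likely creep in.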

@brookslogan brookslogan changed the title Consider adding a fetch_edf()/fetch_epi_df()? Consider adding fetch functions returning epiprocess objects May 10, 2023
@brookslogan
Contributor Author

Just realized we might want to get epi_archives as well. Also adding a fetch_ea() / fetch_archive() / etc. might be too much. Not sure if what we actually want is a fetch_wide(), which would perform pivoting and maybe renaming. One trouble is the "issue" vs. "version" naming, and the presence of the "lag" column interfering with epi_archive compactification. (These details actually seem like they start to justify that extra fetch_ea() / function...)
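For illustration, the pivoting step that a fetch_wide() might perform could look roughly like the sketch below. It assumes a long table with one row per (geo_value, time_value, signal) and a single `value` column; those column names and the data are assumptions for the example, and real endpoints may differ. It uses `tidyr::pivot_wider()`.

```r
library(tidyr)

# Toy long-format table (made-up values); column names are assumptions.
long <- data.frame(
  geo_value = c("ca", "ca", "ny", "ny"),
  time_value = as.Date("2023-04-01"),
  signal = c("cases", "deaths", "cases", "deaths"),
  value = c(100, 2, 80, 1)
)

# One row per (geo_value, time_value), one column per signal.
wide <- pivot_wider(
  long,
  id_cols = c(geo_value, time_value),
  names_from = signal,
  values_from = value
)

names(wide)
```

Columns such as issue and lag are left out of this toy table on purpose; how to carry them through a pivot is one of the open questions in this thread.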

@dshemetov
Contributor

dshemetov commented May 10, 2023

Adding fetch_ea seems like a medium-difficulty task. Each endpoint would need careful attention to how its particulars (namely, the differences in key columns) are translated to an epi_archive.

@brookslogan
Contributor Author

epi_df also has other_keys metadata, though not essential for construction or some operations. (Though we plan to make it a little more prominent than it is now, e.g., by printing it when we print an epi_df, and making it an explicit parameter of the new_epi_df and as_epi_df functions.) I think epi_archive's key may need to rename issue to version, eliminate lag, and add on version to the epi_df keys.
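A rough base R sketch of the column changes described above (rename issue to version, drop lag). The helper name `to_archive_columns` is hypothetical, the input column names are assumed to match an "issues"-style query result, and the data are made up.

```r
# Toy rows shaped like an "issues" query result (made-up values).
fetched <- data.frame(
  geo_value = c("ca", "ca"),
  time_value = as.Date(c("2023-04-01", "2023-04-01")),
  issue = as.Date(c("2023-04-02", "2023-04-03")),
  lag = c(1L, 2L),
  value = c(10, 12)
)

# Hypothetical helper; not part of epidatr or epiprocess.
to_archive_columns <- function(tbl) {
  stopifnot("issue" %in% names(tbl))
  # Rename issue -> version, the name an epi_archive would use.
  names(tbl)[names(tbl) == "issue"] <- "version"
  # Drop lag; in this toy data it can be recomputed from version and time_value.
  tbl[["lag"]] <- NULL
  tbl
}

archive_ready <- to_archive_columns(fetched)
names(archive_ready)
# The result could then be handed to epiprocess, e.g.
# epiprocess::as_epi_archive(archive_ready)
```

Dropping lag loses nothing when it is simply the difference between version and time_value in days, as in this toy data; whether that holds for every endpoint is an assumption.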

@brookslogan
Contributor Author

It might actually require more thinking about how to handle issue and lag in epi_df, since (a) we need/want to change them in epi_archive and might want some type of consistency, and (b) these columns are more like values if you do a latest or as_of query, but one or the other is like a key when you do issues or lags queries.

@dshemetov
Contributor

Two separate but related points:

  1. how about we provide the tibble -> epi_df / epi_archive conversion in epiprocess instead of here?
  2. how about we provide recipes instead of full on conversion functions?

Advantages:

  • (1) and (2) keep the fetch interface in this client minimal and focused around getting a standard R tibble of data
  • (1) separates epiprocess concerns out of this package; allows epiprocess to focus on delivering a tibble -> epi_df pipeline
  • (2) I'm concerned that there are too many degrees of freedom in this conversion; not wrapping tibble -> epi_df functions means we don't have to worry about constructing an interface, and the user can see the inner workings and modify them as they see fit
