Skip to content

Ignore NAs when printing an archive's min/max time_value #156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 27, 2022

Conversation

nmdefries
Copy link
Contributor

@nmdefries nmdefries commented Jul 21, 2022

Description

Toggle on the na.rm option (via the custom Min and Max functions) so even if an epi_archive's time_value field contains some missing values (NA) printed min/max values are still useful. Prints NA if all time_values are missing and "no non-missing arguments" warning is raised, or if an un-caught warning occurs.

Changes

  • R/archive.R - added na.rm setting and try-catch to handle all-missing case.
  • man/as_epi_df.Rd - not my changes, showed up when I re-document()ed

Fixes

Closes #106

@brookslogan
Copy link
Contributor

brookslogan commented Jul 22, 2022

Thanks, I like the described behavior. Implementation-wise, I'm not sure if the warning message could vary between different versions of R [so the current check might be brittle]. (+ pattern-wise we might want to keep a habit of using Warn with a parent arg rather than warning(w) to rethrow... not sure it's better here but it would be in more complicated cases where we don't want to lose stack traces that we could enable.) Maybe the tryCatch could be replaced with a check on whether the vector is length-0 or all NA (or NaN)?

@brookslogan
Copy link
Contributor

Not sure if I was too afraid of "brittleness" above. Slight tweak to the current logic to get it closer to how the warning is triggered:

|| all(is.na(self$DT$time_value)) || all(is.nan(self$DT$time_value)) looks like it wouldn't catch a mix of NAs and NaNs, so it'd seem like we'd want || all(is.na(self$DT$time_value) | is.nan(self$DT$time_value)) to detect a mix of NA and NaNs. However, it turns out that using is.nan isn't even necessary: ?is.na states that:

The default method for 'is.na' applied to an atomic vector returns a logical vector of the same length as its argument 'x', containing 'TRUE' for those elements marked 'NA' or, for numeric or complex vectors, 'NaN', and 'FALSE' otherwise. (A complex value is regarded as 'NA' if either its real or imaginary part is 'NA' or 'NaN'.) 'dim', 'dimnames' and 'names' attributes are copied to the result.
So we can just change the above to || all(is.na(self$DT$time_value)). I did that in the commit above.

Side note from testing: we can't directly construct an epi_archive with all-NaN time_values, although I see no use cases for this.

as_epi_archive(tibble::tibble(geo_value=1,time_value=NaN,version=1,value=1))
#> Error in if (is.numeric(time_value) && all(time_value == as.integer(time_value)) &&  (from utils.R#93) : 
#>  missing value where TRUE/FALSE needed

I do see a use case for all-NA time values in an epi_archive, to provide version tracking for non-time-series data (or maybe time series in a nested format for some reason?) using epi_archive, which insists on a time_value column.

@brookslogan
Copy link
Contributor

To clarify in case we revisit this thread: we are currently able to directly construct an epi_archive with all NA time_values, just not all NaN ones. This PR properly prints the time_value range of an epi_archive given any mixture of these that we can directly construct or sneak in via mutating the DT contents or reseating the DT field.

@brookslogan brookslogan merged commit ba78d43 into main Jul 27, 2022
@nmdefries
Copy link
Contributor Author

Thanks for the fix.

Side note from testing: we can't directly construct an epi_archive with all-NaN time_values

In my testing, it seems that we can. The problem with construction is that the time_type auto-detection doesn't work -- I'm getting a

Error in if (is.numeric(time_value) && all(time_value == as.integer(time_value)) &&  : 
  missing value where TRUE/FALSE needed

but if we set time_type manually, the archive is successfully created.

@nmdefries nmdefries deleted the ndefries/na-time-printing branch August 1, 2022 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improving printing of epi_archives with NA's
3 participants