Commit 165a15f

Use col spec from input df to read output file
readr's column type guessing uses only the first 1000 lines of a file by default. If a column is entirely missing in those first 1000 lines, it is read in as a logical, which causes parsing failures when the column contains non-boolean values later in the file, outside the type guessing range. This happens when reading an output file to which an indicator was newly added. To specify these columns correctly, use the column specification from the input file(s). All columns included in the input files are at least partially non-missing and sorted alphabetically (independent of missingness), so we should always see non-missing values in the first 1000 lines.
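The failure mode described above can be sketched with a small synthetic file. This is only an illustration: the file, the column names (`geo_id`, `new_indicator`), and the row counts are hypothetical, and which rows readr inspects when guessing may differ between readr versions, so no particular guessed type is claimed here.

```r
library(readr)

# Hypothetical data: `new_indicator` is empty for the first 1200 rows,
# so no non-missing value appears in the first 1000 lines of the file.
path <- tempfile(fileext = ".csv")
late_values <- data.frame(
  geo_id = sprintf("%05d", 1:1500),
  new_indicator = c(rep(NA_character_, 1200), rep("high", 300)),
  stringsAsFactors = FALSE
)
write_csv(late_values, path, na = "")

# Relying on guessing: the type chosen for `new_indicator` depends on the
# rows readr happens to inspect, and may not be character.
guessed <- read_csv(path, guess_max = 1000)
print(class(guessed$new_indicator))

# Supplying an explicit column specification removes that dependence.
typed <- read_csv(path, col_types = cols(.default = col_character()))
print(class(typed$new_indicator))
```

With explicit types, the late `"high"` values are read as strings regardless of where they first appear in the file.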
1 parent f4a8159 commit 165a15f

1 file changed, +3 -2 lines changed


facebook/contingency-combine.R

Lines changed: 3 additions & 2 deletions
@@ -114,12 +114,12 @@ combine_tables <- function(seen_file, input_dir, input_files, output_file) {
     county_fips = col_character()
   )
 
-  # Get input data. Make sure `issue_date` is the last column after combining.
+  # Get input data.
   input_df <- map_dfr(
     file.path(input_dir, input_files),
     function(f) {
       read_csv(f, col_types = cols)
-    }) %>% relocate(issue_date, .after=last_col())
+    })
 
   seen_files <- load_seen_file(seen_file)
   if (any(input_files %in% seen_files)) {
@@ -128,6 +128,7 @@ combine_tables <- function(seen_file, input_dir, input_files, output_file) {
          " files using the same grouping variables have been seen before."))
   }
 
+  cols <- cols_condense(spec(input_df))
   if ( file.exists(output_file) ) {
     output_df <- read_csv(output_file, col_types = cols)
   } else {
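
The added line reuses the input's column specification when reading the output file. A minimal standalone sketch of that pattern follows, assuming two small hypothetical CSV files rather than the pipeline's real inputs. It reads a single file directly instead of combining several with `map_dfr`, so it does not show how the specification behaves after combining.

```r
library(readr)

input_file <- tempfile(fileext = ".csv")
output_file <- tempfile(fileext = ".csv")

# Hypothetical input: every column has non-missing values from the first row.
writeLines(c("geo_id,indicator,val", "01001,high,1.5", "01003,low,2.5"), input_file)
# Hypothetical output: `indicator` is entirely empty, so guessing alone
# would have nothing to infer its type from.
writeLines(c("geo_id,indicator,val", "01001,,1.5", "01003,,2.5"), output_file)

input_df <- read_csv(input_file, col_types = cols(geo_id = col_character()))

# spec() extracts the column specification readr used for `input_df`;
# cols_condense() folds the most common column type into the default.
input_spec <- cols_condense(spec(input_df))
print(input_spec)

# Read the output file with the input's specification instead of guessing.
output_df <- read_csv(output_file, col_types = input_spec)
```

Here `indicator` is read as character in the output file even though it contains no values, because its type comes from the input specification.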
