Commit ffcad02

Strip Trailing Whitespace From Dumped Pandas DataFrames
We recently upgraded from Pandas 1.1.5 to Pandas 1.4.3. Pandas 1.2 fixed a bug that had prevented a trailing newline from being added to the end of dumped NDJSON output. See [here](pandas-dev/pandas#36898) for additional information. This change calls `rstrip` on the dumped NDJSON output so that all trailing whitespace is removed before the data is printed to stdout (or a destination file). Note that the trailing newline broke our assumptions about the structure of NDJSON files and thus prevented files created after the upgrade from being parsed without error.
1 parent 4085db6 commit ffcad02

1 file changed, +1 -1 lines changed

lib/id3c/cli/io/pandas.py

@@ -42,7 +42,7 @@ def dump_ndjson(df: pd.DataFrame, file = None, columns_to_mask: List[str] = None
     if columns_to_mask:
         mask_values(df, columns_to_mask)
 
-    print(df.to_json(orient = "records", lines = True, date_format = "iso"), file = file)
+    print(df.to_json(orient = "records", lines = True, date_format = "iso").rstrip(), file = file)
 
 
 def load_file_as_dataframe(filename: str) -> pd.DataFrame:
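
The effect of the one-line change can be illustrated with a minimal, standard-library-only sketch. It assumes that the string returned by `df.to_json(orient = "records", lines = True)` on Pandas 1.2 and later ends with a newline, as the commit message describes; the helpers `to_ndjson` and `dump` are hypothetical stand-ins for illustration and are not part of `lib/id3c/cli/io/pandas.py`. Because `print` appends its own newline, the unstripped output ends with a blank line, while the stripped output ends with exactly one newline.

```python
import io
import json


def to_ndjson(records):
    # Hypothetical stand-in for df.to_json(orient="records", lines=True)
    # on Pandas >= 1.2: one JSON object per line, newline after the last one.
    return "".join(json.dumps(record) + "\n" for record in records)


def dump(records, file, strip):
    # Mirrors the shape of the patched call: optionally rstrip(), then print().
    text = to_ndjson(records)
    if strip:
        text = text.rstrip()
    print(text, file=file)  # print() appends one newline of its own


records = [{"id": 1}, {"id": 2}]

before = io.StringIO()
dump(records, before, strip=False)

after = io.StringIO()
dump(records, after, strip=True)

print(repr(before.getvalue()))  # '{"id": 1}\n{"id": 2}\n\n'
print(repr(after.getvalue()))   # '{"id": 1}\n{"id": 2}\n'
```

Note that `rstrip()` with no arguments removes every kind of trailing whitespace, not only the final newline. Whitespace inside JSON string values is unaffected because those values are enclosed in quotes.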
