Commit d9d35a2

add capability to pass options to to_csv method
On Python 3.10 and 3.11, an upstream bug in pandas causes serializing a DataFrame to CSV to fail when the DataFrame contains a null byte. This pull request leaves the default behaviour unchanged but lets users pass options through to `to_csv`, including the `escapechar` parameter, which works around that issue. See pandas-dev/pandas#47871.
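
As a rough illustration of the pass-through described above, the sketch below uses a hypothetical `rows_to_csv` helper and the standard library `csv` writer in place of `Source.df` and `DataFrame.to_csv`, so it runs without pandas or a Socrata session. The names `rows_to_csv` and `writer_params` are assumptions made for illustration; only the pop-then-unpack pattern mirrors this commit.

```python
import csv
import io


def rows_to_csv(rows, **kwargs):
    """Hypothetical stand-in for Source.df (not part of socrata-py).

    Pops an optional dict of writer options out of kwargs and unpacks it
    into the CSV writer, leaving the remaining kwargs untouched for the
    caller (in the real method they go on to the chunked upload).
    """
    # An absent key falls back to an empty dict, so default behaviour is unchanged.
    writer_params = kwargs.pop("writer_params", {})
    buf = io.StringIO()
    writer = csv.writer(buf, **writer_params)
    writer.writerows(rows)
    return buf.getvalue(), kwargs


if __name__ == "__main__":
    rows = [["a", "b"], [1, 2]]
    # Default call: no writer options supplied.
    print(rows_to_csv(rows, max_retries=3))
    # Same call with an option forwarded to the writer.
    print(rows_to_csv(rows, writer_params={"delimiter": ";"}, max_retries=3))
```

Because the options dict is removed from `kwargs` before anything else sees it, the downstream call never receives an unexpected keyword.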
1 parent acfae68 commit d9d35a2

1 file changed: +11 -1 lines changed
socrata/sources.py

Lines changed: 11 additions & 1 deletion
@@ -445,6 +445,8 @@ def df(self, dataframe, **kwargs):
 
         max_retries (integer): Optional retry limit per chunk in the upload. Defaults to 5.
         backoff_seconds (integer): Optional amount of time to backoff upon a chunk upload failure. Defaults to 2.
+
+        pd_to_csv_params (dict): Optional keyword arguments passed to pd.DataFrame.to_csv method. Defaults to {}.
         ```
 
         Returns:
@@ -458,9 +460,17 @@ def df(self, dataframe, **kwargs):
         df = pandas.read_csv('test/fixtures/simple.csv')
         upload = upload.df(df)
         ```
+        ```python
+        import pandas
+        # assume test/fixtures/simple.csv contains the null byte \x00
+        # see https://github.com/pandas-dev/pandas/issues/47871
+        df = pandas.read_csv('test/fixtures/simple.csv')
+        upload = upload.df(df, pd_to_csv_params={"escapechar": "\\"})
+        ```
         """
         s = io.StringIO()
-        dataframe.to_csv(s, index=False)
+        pd_to_csv_params = kwargs.pop("pd_to_csv_params", {})
+        dataframe.to_csv(s, index=False, **pd_to_csv_params)
         return self._chunked_bytes(bytes(s.getvalue().encode()),"text/csv", **kwargs)
 
     def add_to_revision(self, uri, revision):
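
One consequence of the new call shape may be worth noting: `index=False` is passed explicitly next to the unpacked dict, so a caller who also puts `"index"` in `pd_to_csv_params` would hit Python's duplicate-keyword `TypeError`. The sketch below shows this with a hypothetical `fake_to_csv` stand-in (an assumption, not the pandas function) and one possible alternative that merges the default into the dict first. The commit itself does not do this.

```python
def fake_to_csv(buf, index=True, **options):
    """Hypothetical stand-in with the same keyword shape as DataFrame.to_csv."""
    return {"index": index, **options}


def forward_as_in_commit(params):
    # Mirrors the call in the diff: an explicit index=False plus the unpacked dict.
    return fake_to_csv(None, index=False, **params)


def forward_with_merge(params):
    # Alternative sketch: merge the default first so a caller-supplied "index" wins.
    merged = {"index": False, **params}
    return fake_to_csv(None, **merged)


if __name__ == "__main__":
    print(forward_as_in_commit({"escapechar": "\\"}))
    try:
        forward_as_in_commit({"index": True})
    except TypeError as exc:
        print("duplicate keyword:", exc)
    print(forward_with_merge({"index": True}))
```

Whether overriding `index` should be allowed is a design choice. Keeping the explicit `index=False` fails loudly, which may be the intended behaviour.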

0 commit comments