-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Add errors argument to to_csv() call to enable error handling for encoders #32702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Add errors argument to to_csv() call to enable error handling for encoders #32702
Conversation
Cool makes sense. Can you add tests to make sure this works? |
c576828
to
5b51c51
Compare
@WillAyd Test code is now added, pls. review |
doc/source/whatsnew/v1.1.0.rst
Outdated
@@ -69,7 +69,7 @@ Other enhancements | |||
- `OptionError` is now exposed in `pandas.errors` (:issue:`27553`) | |||
- :func:`timedelta_range` will now infer a frequency when passed ``start``, ``stop``, and ``periods`` (:issue:`32377`) | |||
- Positional slicing on a :class:`IntervalIndex` now supports slices with ``step > 1`` (:issue:`31658`) | |||
- | |||
- :meth:`to_csv` now accepts an ``error`` argument (:issue:`22610`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use :meth:`DataFrame.to_csv`
amd :meth:`Series.to_csv`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pandas/core/generic.py
Outdated
encoding: str = "UTF-8", | ||
errors: str = "strict", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason you changed this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This order is in line with the order of arguments in open()
of the Python standard library. I thought it would be nice to have a consistent order of arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s revert, to not break users passing them by position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are kidding right, this is the 13'th argument of this method. Nobody in their right minds would ever use this one as a positional argument. (masochists excluded)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'm not kidding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever, I'll revert
pandas/io/common.py
Outdated
@@ -345,6 +346,10 @@ def get_handle( | |||
Mode to open path_or_buf with. | |||
encoding : str or None | |||
Encoding to use. | |||
errors : str, default 'strict' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we called this encoding_errors elsewhere (as errors is too generic here). cc @WillAyd @TomAugspurger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The discussion is at #27899. I don't recall the resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed for passing that to a handle prior to the call in that conversation and just documenting that. But I think this is OK as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is identical to the keyword argument names of the standard Python open()
function. In my opinion this allows us to point users to the standard python library documentation for more information on what these arguments mean.
3ea8ebb
to
5424787
Compare
f26a492
to
efd6b9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment. errors
is appealing because it mirrors open
, but we use errors
in different context more generally, so kind of +0 on errors (and +1 on encoding_errors), but could be persuaded.
doc/source/whatsnew/v1.1.0.rst
Outdated
@@ -70,7 +70,8 @@ Other enhancements | |||
- :func:`timedelta_range` will now infer a frequency when passed ``start``, ``stop``, and ``periods`` (:issue:`32377`) | |||
- Positional slicing on a :class:`IntervalIndex` now supports slices with ``step > 1`` (:issue:`31658`) | |||
- :meth:`DataFrame.sample` will now also allow array-like and BitGenerator objects to be passed to ``random_state`` as seeds (:issue:`32503`) | |||
- | |||
- :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` now accept an ``error`` argument (:issue:`22610`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error -> errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pandas/core/generic.py
Outdated
@@ -3086,6 +3087,12 @@ def to_csv( | |||
encoding : str, optional | |||
A string representing the encoding to use in the output file, | |||
defaults to 'utf-8'. | |||
errors : str, default 'strict' | |||
Specifies how encoding and decoding errors are to be handled. | |||
See the errors argument for :func:`open` for a full list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this render (open)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It renders as: open()
See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_hdf.html
That is where I stole it from
There is prior art for the |
8e95f27
to
23668cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on arg position otherwise lgtm
pandas/core/generic.py
Outdated
@@ -3030,6 +3030,7 @@ def to_csv( | |||
index_label: Optional[Union[bool_t, str, Sequence[Label]]] = None, | |||
mode: str = "w", | |||
encoding: Optional[str] = None, | |||
errors: str = "strict", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @TomAugspurger mentioned this before but should add this to the end of the function; while seemingly pedantic we have a lot of users doing different things, so no real upside to change the order of arguments here
If an interest point for you there are other issues we have to make larger functions keyword only via a deprecation that could use some help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to the end, however the logic behind the argument order is now lost (just as it was for the other method/function
If you can point me to the other issues, I can see if I can help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
23668cd
to
d4ed3b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
ed48735
to
ecc604b
Compare
ecc604b
to
1218163
Compare
@roberthdevries this looked fine, can you merge master and ping on green. @gfyoung if you have comments. |
@jreback: Agreed that this looks good. Just need to merge with master. |
1218163
to
6fa2290
Compare
Thanks @roberthdevries |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff