Skip to content

BUG: Add errors argument to to_csv() call to enable error handling for encoders #32702

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

roberthdevries
Copy link
Contributor

@roberthdevries roberthdevries commented Mar 14, 2020

@WillAyd
Copy link
Member

WillAyd commented Mar 14, 2020

Cool makes sense. Can you add tests to make sure this works?

@WillAyd WillAyd added Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv labels Mar 14, 2020
@roberthdevries roberthdevries force-pushed the fix-22610-add-errors-argument-to-to_csv branch from c576828 to 5b51c51 Compare March 15, 2020 07:23
@roberthdevries roberthdevries changed the title WIP: Add errors argument to to_csv() call to enable error handling for encoders Add errors argument to to_csv() call to enable error handling for encoders Mar 15, 2020
@roberthdevries
Copy link
Contributor Author

@WillAyd Test code is now added, pls. review

@@ -69,7 +69,7 @@ Other enhancements
- `OptionError` is now exposed in `pandas.errors` (:issue:`27553`)
- :func:`timedelta_range` will now infer a frequency when passed ``start``, ``stop``, and ``periods`` (:issue:`32377`)
- Positional slicing on a :class:`IntervalIndex` now supports slices with ``step > 1`` (:issue:`31658`)
-
- :meth:`to_csv` now accepts an ``error`` argument (:issue:`22610`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use :meth:`DataFrame.to_csv` amd :meth:`Series.to_csv`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

encoding: str = "UTF-8",
errors: str = "strict",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you changed this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This order is in line with the order of arguments in open() of the Python standard library. I thought it would be nice to have a consistent order of arguments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s revert, to not break users passing them by position.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are kidding right, this is the 13'th argument of this method. Nobody in their right minds would ever use this one as a positional argument. (masochists excluded)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'm not kidding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever, I'll revert

@@ -345,6 +346,10 @@ def get_handle(
Mode to open path_or_buf with.
encoding : str or None
Encoding to use.
errors : str, default 'strict'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we called this encoding_errors elsewhere (as errors is too generic here). cc @WillAyd @TomAugspurger

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The discussion is at #27899. I don't recall the resolution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed for passing that to a handle prior to the call in that conversation and just documenting that. But I think this is OK as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is identical to the keyword argument names of the standard Python open() function. In my opinion this allows us to point users to the standard python library documentation for more information on what these arguments mean.

@roberthdevries roberthdevries force-pushed the fix-22610-add-errors-argument-to-to_csv branch from 3ea8ebb to 5424787 Compare March 16, 2020 20:14
@roberthdevries roberthdevries requested a review from jreback March 17, 2020 20:55
@roberthdevries roberthdevries force-pushed the fix-22610-add-errors-argument-to-to_csv branch 2 times, most recently from f26a492 to efd6b9b Compare March 18, 2020 19:48
@roberthdevries roberthdevries changed the title Add errors argument to to_csv() call to enable error handling for encoders BUG: Add errors argument to to_csv() call to enable error handling for encoders Mar 18, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment. errors is appealing because it mirrors open, but we use errors in different context more generally, so kind of +0 on errors (and +1 on encoding_errors), but could be persuaded.

@@ -70,7 +70,8 @@ Other enhancements
- :func:`timedelta_range` will now infer a frequency when passed ``start``, ``stop``, and ``periods`` (:issue:`32377`)
- Positional slicing on a :class:`IntervalIndex` now supports slices with ``step > 1`` (:issue:`31658`)
- :meth:`DataFrame.sample` will now also allow array-like and BitGenerator objects to be passed to ``random_state`` as seeds (:issue:`32503`)
-
- :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` now accept an ``error`` argument (:issue:`22610`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error -> errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -3086,6 +3087,12 @@ def to_csv(
encoding : str, optional
A string representing the encoding to use in the output file,
defaults to 'utf-8'.
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for :func:`open` for a full list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this render (open)?

Copy link
Contributor Author

@roberthdevries roberthdevries Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It renders as: open()
See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_hdf.html
That is where I stole it from

@jreback jreback added this to the 1.1 milestone Mar 19, 2020
@roberthdevries
Copy link
Contributor Author

There is prior art for the errors argument for this purpose in DataFrame.to_hdf and DataFrame:read_hdf

@roberthdevries roberthdevries requested a review from jreback March 19, 2020 19:25
@roberthdevries roberthdevries force-pushed the fix-22610-add-errors-argument-to-to_csv branch 2 times, most recently from 8e95f27 to 23668cd Compare March 20, 2020 18:56
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on arg position otherwise lgtm

@@ -3030,6 +3030,7 @@ def to_csv(
index_label: Optional[Union[bool_t, str, Sequence[Label]]] = None,
mode: str = "w",
encoding: Optional[str] = None,
errors: str = "strict",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @TomAugspurger mentioned this before but should add this to the end of the function; while seemingly pedantic we have a lot of users doing different things, so no real upside to change the order of arguments here

If an interest point for you there are other issues we have to make larger functions keyword only via a deprecation that could use some help

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to the end, however the logic behind the argument order is now lost (just as it was for the other method/function

If you can point me to the other issues, I can see if I can help

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roberthdevries roberthdevries force-pushed the fix-22610-add-errors-argument-to-to_csv branch from 23668cd to d4ed3b0 Compare March 25, 2020 20:53
@roberthdevries roberthdevries requested a review from WillAyd March 25, 2020 20:56
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@roberthdevries roberthdevries force-pushed the fix-22610-add-errors-argument-to-to_csv branch 2 times, most recently from ed48735 to ecc604b Compare April 2, 2020 12:22
@roberthdevries roberthdevries force-pushed the fix-22610-add-errors-argument-to-to_csv branch from ecc604b to 1218163 Compare April 19, 2020 12:29
@jreback
Copy link
Contributor

jreback commented May 25, 2020

@roberthdevries this looked fine, can you merge master and ping on green.

@gfyoung if you have comments.

@gfyoung
Copy link
Member

gfyoung commented May 25, 2020

@jreback: Agreed that this looks good. Just need to merge with master.

@roberthdevries roberthdevries force-pushed the fix-22610-add-errors-argument-to-to_csv branch from 1218163 to 6fa2290 Compare May 28, 2020 21:39
@WillAyd WillAyd merged commit 43a463d into pandas-dev:master Jun 8, 2020
@WillAyd
Copy link
Member

WillAyd commented Jun 8, 2020

Thanks @roberthdevries

@roberthdevries roberthdevries deleted the fix-22610-add-errors-argument-to-to_csv branch June 9, 2020 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_csv() surrogates not allowed
5 participants