-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
EHN: Add encoding_errors option in pandas.DataFrame.to_csv (#27750) #27899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Can you add a release note to doc/source/whatsnew/v1.0.0? Under enhancements.
ff42832
to
ea27b7f
Compare
@TomAugspurger @jreback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very thorough on the tests :)
Do you have time to add the new parameter to user_guide/io.rst
? We repeat most of the parameters there, since there are so many. You can place it near encoding
(~line 331). You likely just copy-paste the docstring description.
Can you also make it clear that this is the errors
argument passed to :func:\open\
?
What do you mean? |
Include that explicit reference in the docstring.
… On Aug 19, 2019, at 07:00, Michihito Shigemura ***@***.***> wrote:
@TomAugspurger
Can you also make it clear that this is the errors argument passed to :func:\open?
What do you mean?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Do you mean to add to_csv's example in https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#writing-out-data ? |
I add documents for error options both in https://dev.pandas.io/user_guide/io.html#quoting-compression-and-file-format and https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#writing-out-data . |
@TomAugspurger @jreback |
Late to review but would we rather just document how to do this with the underlying file handle? Personally against adding new keywords to the read_* functions if they don't add functionality that can't already be done with the underlying fp object |
@WillAyd that was my initial reaction too. The thing that changed my mind was that we already accept the Though I didn't realize |
Yea certainly a grey area. If we are only at 4/8 I think we would ideally either commit to all of them or start pushing people to passing through directly to the file constructor. The latter of the two seems more reasonable though hence why I think documentation would be better (but not a hang up if you and other core devs feel strongly otherwise) |
According to https://docs.python.org/3/library/functions.html#open, current(0.25.x) pandas already supports 5/8 open's aguments in _get_handle function.
https://github.com/pandas-dev/pandas/blob/0.25.x/pandas/io/common.py#L396-L405 Lines 418 to 429 in 3bbb6f3
So, I think we should support errors parameter because _get_handle already uses errors argument. |
@WillAyd @TomAugspurger |
…v#27750) encoding_errors : str, default 'strict' Behavior when the input string can’t be converted according to the encoding’s rules (strict, ignore, replace, etc.) See: https://docs.python.org/3/library/codecs.html#codec-base-classes
3bbb6f3
to
b4f6929
Compare
I'm leaning towards accepting the PR. @WillAyd does that sound OK? |
While I think this PR in a nutshell is well done I think I'm still against the principal of adding this. For consistency wouldn't we then want to add to read_csv and other text based methods? Also do we see any concern with the fact that I just think to comprehensively support this would take a lot more effort than its worth so would rather just document how to do it instead of semi-supporting it via this method |
Thanks @WillAyd. All fair points. In the past, we've come up against issues where we want to pass additional arguments through to the underlying filesystem calls. I wonder
Thoughts on that kind of API? It's a bit fuzzy on where exactly |
Maybe we just add kwargs and dictate that kwargs get passed through to the filepath_or_buffer argument? Not sure if that has been discussed in the past but seems natural |
Not sure if it's been discussed before. It's certainly possible. Which do you think is easier to document, Do any of our readers / writers already accept |
@WillAyd @TomAugspurger |
We want to provide a consistent API for all our readers, though. We need to
design something that would work for
more than just to_csv.
…On Thu, Sep 5, 2019 at 12:34 AM Michihito Shigemura < ***@***.***> wrote:
@WillAyd <https://github.com/WillAyd> @TomAugspurger
<https://github.com/TomAugspurger>
IMO, we should talk about only to_csv in order not to go off the rails.
Of course, we have to talk about raed_csv (and read_excel) error handling.
But my PR does not have any impact on read_csv.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#27899?email_source=notifications&email_token=AAKAOIS2PXSECA33JNG2PLDQICK7LA5CNFSM4ILJ4OTKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD554FOA#issuecomment-528204472>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIWXJUCANZKAW3VK4W3QICK7LANCNFSM4ILJ4OTA>
.
|
I like @TomAugspurger suggestion of a dict of We could accept a dict (simpler), or make a named tuple |
So as far as this PR goes I would suggest going to the documentation route, showing how you can pass this parameter to a file handle prior to a Does that sound good @shigemk2 ? |
@WillAyd |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff