-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: avoid "b" prefix for bytes in to_csv() on Python 3 (#9712) #13890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1378,6 +1378,12 @@ def __init__(self, obj, path_or_buf=None, sep=",", na_rep='', | |||
self.has_mi_columns = (isinstance(obj.columns, MultiIndex) and | |||
not self.tupleize_cols) | |||
|
|||
# in Python 3, decode bytes to str so strings print without b'' | |||
if sys.version_info[0] >= 3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use compat.PY3
if self.bytes_encoding: | ||
if ix.dtype == object: | ||
lib.object_array_decode_bytes(ix, self.bytes_encoding) | ||
for arr in self.data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be pushed down to to_native_types
as an addtional parameter.
Current coverage is 85.31% (diff: 90.00%)@@ master #13890 diff @@
==========================================
Files 139 139
Lines 50143 50161 +18
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 42777 42793 +16
- Misses 7366 7368 +2
Partials 0 0
|
@@ -2020,6 +2020,14 @@ def re_replacer(s): | |||
|
|||
return block | |||
|
|||
def to_native_types(self, slicer=None, na_rep='nan', quoting=None, | |||
bytes_encoding=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to Block
, where to_native_types
already exists, you just need to incorporate this argument after stringifying (we have another conditional if quoting or not, but and am not sure if you need to do this or not for that one as I think the csv writer handles that).
around line 590.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it in ObjectBlock
because that's the only class that could contain objects--correct? Given that a precondition of the code I'm adding is dtype==object
, are you sure you'd want the logic to go in the base class for all types (even if it's overridden by some)?
OK so let me summarize where this PR is at:
It might be interesting to try to get support in Python 3.7 (or whatever) to write bytes as plain strings in the I hope we can get #9712 fixed somehow. PRs welcome! ;-) |
""" slice and dice then format """ | ||
values = self | ||
if slicer is not None: | ||
values = values[slicer] | ||
return values._format_native_types(**kwargs) | ||
result = values._format_native_types(**kwargs) | ||
if bytes_encoding is not None and result.dtype == object: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_object_dtype
function
@jzwinck I have made some more comments. Please use the existing functions. It is not clear that adding code is needed here, it is just more repeated code and adds to the maintenance burden. |
can you rebase / update? |
@jreback I'm not sure what you're asking for. I know what rebase means but the last time we spoke about this PR you had fundamental problems with the way I was approaching it. I don't have much time these days to do anything more with it. |
@jzwinck rebase and update means exactly what it says. rebase on master and respond to comments. pls reopen if you want to continue work. |
@jzwinck Looking back at the comments, I think one of the main suggestions of @jreback is to use the existing
|
@jorisvandenbossche As you say, In any case, I have limited time to work on this further now. |
the mixed-dtype case in not relevant (nor tested). expanding the code base at the expense of readability is not necessary nor warranted. as I said, if you want to fix according to comments, pls do so. or maybe someone else will pick this up. |
@jreback If I am the user, and I am telling you I want support for mixed dtypes, how do you then conclude it is "not relevant?" Also you are mistaken when you say it is "not tested" because it plainly is in the unit test I added: |
@jzwinck mixed-types (in a single array) are a complete disaster. I would rather not make them worse by adding additional code which makes them worse. |
@jzwinck Although the wording of @jreback was maybe a bit harsh on the 'relevant' aspect, there is a difference between "relevant for you as a user" and "relevant enough in general to warrant additional code-complexity in pandas" (and to be clear: I am not not saying it is one of either, just noting that it is debatable) |
git diff upstream/master | flake8 --diff