to_csv and bytes on Python 3. #9712
I'd say this is not intended, but I haven't worked on this part of the code. It's being written to file anyway, so (python 3) bytes written to |
FWIW I think that's actually the output I'd expect in 3. |
I guess I would expect behavior similar to

    with open('tmp.txt', 'wb') as f:
        f.write('abc'.encode('utf-8'))

which doesn't have the … The caveat here is that you have to explicitly open the file in … |
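To make the contrast concrete, here is a small standard-library sketch with no pandas involved. The file name tmp.txt follows the comment above, placed in a temporary directory here. Writing bytes to a file opened in binary mode stores the raw bytes. Turning a bytes object into text with str() produces its repr instead, with the b'' marker included.

```python
import os
import tempfile

# Write encoded bytes to a file opened in binary mode: the raw bytes land on disk.
path = os.path.join(tempfile.mkdtemp(), "tmp.txt")
with open(path, "wb") as f:
    f.write("abc".encode("utf-8"))

with open(path, "rb") as f:
    raw = f.read()

# str() on a bytes object returns its repr, including the b'' marker.
as_text = str("abc".encode("utf-8"))

print(len(raw), raw == b"abc")   # 3 True
print(len(as_text), as_text)     # 6 b'abc'
```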
I think you just need to pass the … |
It would be better if pandas had a configurable parameter in …

    df['Column'] = df['Column'].astype(str)
    df.to_csv('output.csv')
|
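Here is a decode-first variant, assuming the column holds UTF-8 encoded bytes. Series.str.decode converts each bytes value to str, so the written CSV contains plain text. The frame and the column name 'Column' are illustrative only.

```python
import pandas as pd

# Illustrative frame with a column of UTF-8 encoded bytes.
df = pd.DataFrame({"Column": [b"x", b"y"]})

# Decode bytes to str before writing, so no b'...' marker reaches the file.
df["Column"] = df["Column"].str.decode("utf-8")

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```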
I have this problem also. Here's a trivial example that I think most regular users would expect to work differently:
That is, the CSV is created with Python-specific … |
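The b'...' text in the CSV is consistent with each value being passed through str(). Python's own csv module documents exactly that rule for non-string fields, so the following sketch reproduces the symptom without pandas. Whether pandas' writer follows the same path internally is an assumption here.

```python
import csv
import io

# csv.writer stringifies non-str fields with str(), and str(b"x") is "b'x'",
# so the bytes marker ends up in the written text.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["value"])
for item in [b"x", b"y"]:
    writer.writerow([item])

print(buf.getvalue())
```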
@zhuoqiang What I think you meant is you have to do this:
Simply doing |
I totally agree with @jzwinck. When you use

    import pandas as pd
    fname = './blah.csv'
    pd.Series([b'x', b'y']).to_csv(fname)

    >>> pd.read_csv(fname, dtype='S5')
          0     b'x'
    0  b'1'  b"b'y'"

Using

    >>> pd.read_csv(fname, dtype='S')
       0  b'x'
    0  1  b'y'

I actually find even ^ unexpected, since it seems to be interpreting the values as Python string literals automatically? If a user chooses to load CSV data as

    >>> pd.Series(['x', 'y']).to_csv(fname)
    >>> pd.read_csv(fname)
       0  x
    0  1  y
    >>> pd.read_csv(fname, dtype='S10')
       0  b'x'
    0  1  b'y'
|
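One way to get a clean round trip, assuming every bytes value is valid UTF-8, is to skip dtype='S' entirely. Decode before writing, then encode again after reading. The series name "col" and the choice of UTF-8 are illustrative assumptions.

```python
import io

import pandas as pd

original = pd.Series([b"x", b"y"], name="col")

# Write plain text: decode bytes -> str first.
csv_text = original.str.decode("utf-8").to_csv(index=False)

# Read back as text, then re-encode to recover the original bytes.
restored = pd.read_csv(io.StringIO(csv_text))["col"].str.encode("utf-8")

print(restored.tolist())  # [b'x', b'y']
```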
I think everyone agrees that writing out the … |
@TomAugspurger My vote's for 1. Since the … I'm getting worried, though (especially being new to py3), because apparently even print does this:

    >>> print(b'doggy')
    b'doggy'

So maybe @dsm054 was right? |
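print() shows the same behavior because it converts its argument with str(), and for a bytes object str() returns the repr. Decoding first gives the plain text. This is a minimal illustration using only built-ins.

```python
data = b"doggy"

# str() of bytes is its repr, so print shows the b'' marker.
assert str(data) == "b'doggy'"

# Decoding yields a str with the marker gone.
text = data.decode("utf-8")
print(text)  # doggy
```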
@TomAugspurger: I prefer your number 1: just decode, because that's what most users would want. @tgoodlet: It doesn't matter what |
Proposal to fix this issue: We introduce a new parameter passed to … We use the … Do note that after the decoding of the bytes happens using the … |
Add a new optional parameter named bytes_encoding to allow a specific encoding scheme to be used to decode the bytes.
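The thread does not show bytes_encoding landing in pandas, so what follows is only a user-side sketch of the same idea. It defines a hypothetical wrapper, to_csv_with_bytes_encoding, which decodes every bytes value with the given encoding and then delegates to DataFrame.to_csv. The function name and its exact behavior are assumptions for illustration, not pandas API.

```python
import pandas as pd


def to_csv_with_bytes_encoding(df, path_or_buf=None, bytes_encoding="utf-8", **kwargs):
    """Hypothetical helper: decode bytes cells, then delegate to DataFrame.to_csv.

    Values that are not bytes are left untouched, so mixed columns still work.
    """

    def decode(value):
        # Only bytes are decoded; str, numbers and NaN pass through unchanged.
        if isinstance(value, bytes):
            return value.decode(bytes_encoding)
        return value

    decoded = df.copy()
    for column in decoded.columns:
        # Only object columns can hold bytes, so numeric columns are skipped.
        if decoded[column].dtype == object:
            decoded[column] = decoded[column].map(decode)
    return decoded.to_csv(path_or_buf, **kwargs)


frame = pd.DataFrame({"name": [b"caf\xc3\xa9", "plain"], "n": [1, 2]})
print(to_csv_with_bytes_encoding(frame, index=False))
```

Decoding on a copy keeps the caller's frame unchanged, which is roughly what a to_csv option would need to guarantee.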
Is this desired behavior that I need to work around, or a bug? Notice that the bytes type marker (the b'' prefix) is written to disk, so you can't round-trip the data. This works fine in Python 2 with unicode, AFAICT.