to_csv does not always handle line_terminator correctly #17365
Comments
@kevinsa5 : Thanks for reporting this! The fact that you only have this issue on Windows I think is telling about the issue and could indicate that it is not a If you can find a version of |
@gfyoung : Thanks for the quick reply! I am familiar with how Windows uses If you take a close look at the script output I posted above, for some values of I should mention: I am immensely grateful for all the work the pandas devs have put into this library. Thank you very much for your continued effort. |
@gfyoung After further reading, I think you may be correct. Is it true that on Windows, if you write the character '\n' to a file, the OS may actually insert '\r\n' into the file? Does this depend on how the file is opened? Maybe I've been spoilt by Linux, but it seems unacceptable to me for an OS to silently change bytes that you send to disk. |
Not entirely sure, to be honest. However, as someone who has worked on a Windows computer from a Linux-based repository, I can say for certain that I have seen these carriage returns sneak into diffs merely from cloning the repository.
As somebody who has worked in both the Linux and Windows world, you are perfectly entitled to bytes not being "corrupted" like this. There's a reason why Linux is generally preferred for developers 😄 |
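On the question of whether writing '\n' can end up as '\r\n' on disk: in Python 3 this depends on the `newline` argument to `open()` in text mode. The following is a minimal sketch, not the pandas code path. It passes `newline="\r\n"` explicitly to imitate, on any OS, the translation that text mode applies by default on Windows, and `newline=""` to turn translation off. The helper name `written_bytes` is made up for illustration.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write text in text mode with the given newline setting and
    return the raw bytes that ended up on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# newline="\r\n" translates every "\n" written into "\r\n". This is what
# the default (newline=None) does on Windows, where os.linesep is "\r\n".
translated_lf = written_bytes("a,b\n", newline="\r\n")
translated_crlf = written_bytes("a,b\r\n", newline="\r\n")

# newline="" disables translation, so the text is written as given.
untouched = written_bytes("a,b\r\n", newline="")

print(repr(translated_lf))    # b'a,b\r\n'
print(repr(translated_crlf))  # b'a,b\r\r\n'
print(repr(untouched))        # b'a,b\r\n'
```

The second result matches the `\r\r\n` symptom reported in this issue: the `\n` inside an already complete `\r\n` gets translated a second time.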
https://stackoverflow.com/questions/3191528/csv-in-python-adding-an-extra-carriage-return On python2 looks like we should be opening the file in mode |
Indeed changing the above code to https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L1448 Thank you both. Pandas is a fantastic piece of software. |
Actually, if you look at the SO post, the other highly starred answer might work better, e.g. passing line_terminator to the csv.writer itself.
Note that the argument to https://github.com/pandas-dev/pandas/blob/master/pandas/io/formats/format.py#L1586 |
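For reference, the standard-library `csv.writer` takes the line ending as a `lineterminator` keyword, and the `csv` documentation recommends opening the file with `newline=""` so that the text layer does not translate it again. The snippet below is a small Python 3 sketch of that combination using only the standard library. It is not the pandas implementation, and the sample rows are invented.

```python
import csv
import os
import tempfile

rows = [["a", "b"], [1, 2]]

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

# newline="" stops the text layer from translating line endings, so the
# lineterminator given to csv.writer is written to disk unchanged.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, lineterminator="\n")
    writer.writerows(rows)

with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

print(repr(raw))  # b'a,b\n1,2\n'
```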
@kevinsa5 if you want to, submit a PR with the change (opening the file in binary mode on Windows when line_terminator is specified) and see if it passes tests.
Hi everyone, I'm experiencing the same problem on pandas 0.20.3 on Windows 7. However, mode='wb' might be a dangerous fix: it crashes when an encoding such as encoding='utf-8' is also set, with "ValueError: binary mode doesn't take an encoding argument". It would be nice if there was a workaround making both line_terminator and encoding work at the same time.
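One possible workaround, sketched here under assumptions, builds on the observation in the issue description that the CSV returned as a string is not affected. The idea is to get the CSV as a string, encode it yourself, and write the bytes in binary mode, so no encoding argument is passed to `open()`. The helper `write_exact` and the sample text are hypothetical. In real use the string would come from a `to_csv` call made without a path.

```python
import os
import tempfile


def write_exact(path, text, encoding="utf-8"):
    """Encode the text ourselves, then write the bytes in binary mode.
    Binary mode never translates line endings."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


# Stand-in for the string returned by to_csv when no path is given.
csv_text = u"name,place\ncaf\u00e9,x\n"

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
write_exact(path, csv_text, encoding="utf-8")

with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

print(repr(raw))
```

This keeps the whole CSV in memory as one string, which may not suit very large frames.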
I am experiencing the same issue. With When I am using
And the same as @ingmars when I try to set an encoding with I settled at the end with not specifying any line_terminator. Not happy, but I will fix this with other tools after the file has been written. Everything with Win10, Python 3.5, Pandas 0.19.2 |
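For fixing a file after it has been written, a rough sketch is to normalise the line endings on the raw bytes. The function name `normalize_line_endings` is invented for illustration. It rewrites every line ending, including newlines embedded in quoted fields, so it is only safe when the data contains no such fields.

```python
import re


def normalize_line_endings(data, terminator=b"\n"):
    """Replace any run of carriage returns followed by a line feed
    (b"\\r\\n", b"\\r\\r\\n", or a bare b"\\n") with the terminator."""
    # A function is used as the replacement so the terminator bytes are
    # inserted literally, without any escape processing by re.sub.
    return re.sub(br"\r*\n", lambda match: terminator, data)


broken = b"a,b\r\r\n1,2\r\n3,4\n"
as_lf = normalize_line_endings(broken)
as_crlf = normalize_line_endings(broken, terminator=b"\r\n")

print(repr(as_lf))    # b'a,b\n1,2\n3,4\n'
print(repr(as_crlf))  # b'a,b\r\n1,2\r\n3,4\r\n'
```

To apply it to a file, read the file in 'rb' mode, pass the bytes through the function, and write the result back in 'wb' mode.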
Per #20353 (comment) from @jreback , let's move the discussion to #20353. |
Code Sample, a copy-pastable example if possible
Problem description
It seems that to_csv does not always handle the line_terminator argument correctly. The above code prints out the hexified CSV data produced from several different calls to to_csv. In particular, passing \n in fact produces \r\n, and \r\n becomes \r\r\n. Note also that this only happens when writing to a file, not when directly returning the CSV data as a string.

However, this seems to be OS-dependent as well -- I have reproduced it on several machines running Windows 7, Python 2.7, and various versions of pandas (including 0.20.1), but on a Linux VM, it works as expected.
Output of above code:
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 35.0.1
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.5
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.1.7
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None