Commit 31f86d6

deflatSOCO authored and jreback committed
BUG: Use correct line terminator on Windows (#21406)
* Use OS line terminator if none is provided
* Enforce line terminator selection if one is

Originally authored by @deflatSOCO, but reapplied by @gfyoung due to enormous merge conflicts.

Closes gh-20353.
1 parent 29e586c commit 31f86d6

9 files changed, +468 -104 lines changed

doc/source/whatsnew/v0.24.0.txt

+91
@@ -235,6 +235,97 @@ If installed, we now require:
 | scipy           | 0.18.1          |          |
 +-----------------+-----------------+----------+

+.. _whatsnew_0240.api_breaking.csv_line_terminator:
+
+`os.linesep` is used for ``line_terminator`` of ``DataFrame.to_csv``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`DataFrame.to_csv` now uses :func:`os.linesep` rather than ``'\n'``
+for the default line terminator (:issue:`20353`).
+This change only has an effect on Windows, where ``'\r\n'`` was used as the line terminator
+even when ``'\n'`` was passed in ``line_terminator``.
+
+Previous Behavior on Windows:
+
+.. code-block:: ipython
+
+    In [1]: data = pd.DataFrame({
+       ...:     "string_with_lf": ["a\nbc"],
+       ...:     "string_with_crlf": ["a\r\nbc"]
+       ...: })
+
+    In [2]: # When passing file PATH to to_csv, line_terminator does not work, and csv is saved with '\r\n'.
+       ...: # Also, this converts all '\n's in the data to '\r\n'.
+       ...: data.to_csv("test.csv", index=False, line_terminator='\n')
+
+    In [3]: with open("test.csv", mode='rb') as f:
+       ...:     print(f.read())
+    b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n'
+
+    In [4]: # When passing file OBJECT with newline option to to_csv, line_terminator works.
+       ...: with open("test2.csv", mode='w', newline='\n') as f:
+       ...:     data.to_csv(f, index=False, line_terminator='\n')
+
+    In [5]: with open("test2.csv", mode='rb') as f:
+       ...:     print(f.read())
+    b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'
+
+
+New Behavior on Windows:
+
+- By passing ``line_terminator`` explicitly, line terminator is set to that character.
+- The value of ``line_terminator`` only affects the line terminator of CSV,
+  so it does not change the value inside the data.
+
+.. code-block:: ipython
+
+    In [1]: data = pd.DataFrame({
+       ...:     "string_with_lf": ["a\nbc"],
+       ...:     "string_with_crlf": ["a\r\nbc"]
+       ...: })
+
+    In [2]: data.to_csv("test.csv", index=False, line_terminator='\n')
+
+    In [3]: with open("test.csv", mode='rb') as f:
+       ...:     print(f.read())
+    b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'
+
+
+- On Windows, the value of ``os.linesep`` is ``'\r\n'``,
+  so if ``line_terminator`` is not set, ``'\r\n'`` is used for line terminator.
+- Again, it does not affect the value inside the data.
+
+.. code-block:: ipython
+
+    In [1]: data = pd.DataFrame({
+       ...:     "string_with_lf": ["a\nbc"],
+       ...:     "string_with_crlf": ["a\r\nbc"]
+       ...: })
+
+    In [2]: data.to_csv("test.csv", index=False)
+
+    In [3]: with open("test.csv", mode='rb') as f:
+       ...:     print(f.read())
+    b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'
+
+
+- For file objects, specifying ``newline`` is not sufficient to set the line terminator.
+  You must pass in the ``line_terminator`` explicitly, even in this case.
+
+.. code-block:: ipython
+
+    In [1]: data = pd.DataFrame({
+       ...:     "string_with_lf": ["a\nbc"],
+       ...:     "string_with_crlf": ["a\r\nbc"]
+       ...: })
+
+    In [2]: with open("test2.csv", mode='w', newline='\n') as f:
+       ...:     data.to_csv(f, index=False)
+
+    In [3]: with open("test2.csv", mode='rb') as f:
+       ...:     print(f.read())
+    b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'
+
 .. _whatsnew_0240.api_breaking.interval_values:

 ``IntervalIndex.values`` is now an ``IntervalArray``
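
Reviewer note: the whatsnew entry says the new behavior no longer changes line breaks inside the data. One way to check that claim is to parse the byte strings quoted in the entry with the standard-library `csv` module. This is only a sketch: `parse_csv_bytes` is a hypothetical helper, not part of pandas, and the two byte strings are copied from the "Previous Behavior" and "New Behavior" examples (default terminator case).

```python
import csv
import io


def parse_csv_bytes(data):
    """Parse raw CSV bytes without any newline translation (hypothetical helper)."""
    # newline="" keeps "\n" and "\r\n" inside quoted fields exactly as stored.
    text = io.StringIO(data.decode("utf-8"), newline="")
    return list(csv.reader(text))


# Byte strings copied from the whatsnew examples in this commit.
old_output = b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n'
new_output = b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'

old_rows = parse_csv_bytes(old_output)
new_rows = parse_csv_bytes(new_output)

# The old output carries an extra "\r" in both fields; the new one round-trips.
print(old_rows[1])
print(new_rows[1])
```

With the old output the parsed fields differ from the original data (`"a\nbc"` comes back as `"a\r\nbc"`), while the new output returns the values that were put into the frame.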

pandas/core/generic.py

+6 -3
@@ -9518,7 +9518,7 @@ def last_valid_index(self):
     def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
                columns=None, header=True, index=True, index_label=None,
                mode='w', encoding=None, compression='infer', quoting=None,
-               quotechar='"', line_terminator='\n', chunksize=None,
+               quotechar='"', line_terminator=None, chunksize=None,
                tupleize_cols=None, date_format=None, doublequote=True,
                escapechar=None, decimal='.'):
         r"""
@@ -9583,9 +9583,12 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
             will treat them as non-numeric.
         quotechar : str, default '\"'
             String of length 1. Character used to quote fields.
-        line_terminator : string, default ``'\n'``
+        line_terminator : string, optional
             The newline character or character sequence to use in the output
-            file.
+            file. Defaults to `os.linesep`, which depends on the OS in which
+            this method is called (e.g. '\n' for Linux, '\r\n' for Windows).
+
+            .. versionchanged:: 0.24.0
         chunksize : int or None
             Rows to write at a time.
         tupleize_cols : bool, default False

pandas/io/common.py

+3 -2
@@ -417,13 +417,14 @@ def _get_handle(path_or_buf, mode, encoding=None, compression=None,
     elif is_path:
         if compat.PY2:
             # Python 2
+            mode = "wb" if mode == "w" else mode
             f = open(path_or_buf, mode)
         elif encoding:
             # Python 3 and encoding
-            f = open(path_or_buf, mode, encoding=encoding)
+            f = open(path_or_buf, mode, encoding=encoding, newline="")
         elif is_text:
             # Python 3 and no explicit encoding
-            f = open(path_or_buf, mode, errors='replace')
+            f = open(path_or_buf, mode, errors='replace', newline="")
         else:
             # Python 3 and binary mode
             f = open(path_or_buf, mode)
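
Reviewer note: the `newline=""` arguments above turn off text-mode newline translation, which appears to be what produced the doubled `\r` on Windows. The sketch below reproduces the effect on any OS with the standard library only. It assumes `newline="\r\n"` on an `io.TextIOWrapper` is a fair stand-in for the Windows text-mode default, and `write_rows` is a hypothetical helper, not pandas code.

```python
import csv
import io


def write_rows(rows, newline, lineterminator):
    """Write rows with csv.writer to an in-memory text stream and return the bytes."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    csv.writer(text, lineterminator=lineterminator).writerows(rows)
    text.flush()  # push buffered text down into the BytesIO
    return raw.getvalue()


rows = [["string_with_lf", "string_with_crlf"], ["a\nbc", "a\r\nbc"]]

# newline="\r\n" translates every "\n" written, including those inside the data.
translated = write_rows(rows, newline="\r\n", lineterminator="\n")

# newline="" writes exactly what csv.writer emits.
untranslated = write_rows(rows, newline="", lineterminator="\n")

print(translated)
print(untranslated)
```

The translated stream matches the "Previous Behavior" bytes from the whatsnew entry, and the untranslated stream matches the "New Behavior" bytes for `line_terminator='\n'`.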

pandas/io/formats/csvs.py

+2 -1
@@ -11,6 +11,7 @@
 from zipfile import ZipFile

 import numpy as np
+import os

 from pandas._libs import writers as libwriters

@@ -73,7 +74,7 @@ def __init__(self, obj, path_or_buf=None, sep=",", na_rep='',
         self.doublequote = doublequote
         self.escapechar = escapechar

-        self.line_terminator = line_terminator
+        self.line_terminator = line_terminator or os.linesep

         self.date_format = date_format
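
Reviewer note: the `line_terminator or os.linesep` fallback relies on truthiness, so it applies to any falsy value, not only `None`. A minimal sketch of that behavior follows; `resolve_line_terminator` is a hypothetical name used for illustration and does not exist in pandas.

```python
import os


def resolve_line_terminator(line_terminator=None):
    """Hypothetical helper mirroring ``line_terminator or os.linesep``."""
    # Any falsy value (None, but also the empty string) falls back to os.linesep.
    return line_terminator or os.linesep


default_terminator = resolve_line_terminator()
explicit_terminator = resolve_line_terminator("\n")
empty_terminator = resolve_line_terminator("")

print(repr(default_terminator))
print(repr(explicit_terminator))
print(repr(empty_terminator))
```

An explicit `'\n'` is kept as given, while an empty string is treated the same as omitting the argument. An `is None` check would be the alternative if an empty terminator ever needed to be honored.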
