@@ -195,7 +195,158 @@ Other Enhancements
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- - :func:`DataFrame.to_csv` now uses :func:`os.linesep` rather than ``'\n'`` for the default line terminator.(:issue:`20353`)
+
+ .. _whatsnew_0240.api_breaking.csv_line_terminator:
+
+ `os.linesep` is used for ``line_terminator`` of ``DataFrame.to_csv``
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ :func:`DataFrame.to_csv` now uses :data:`os.linesep` rather than ``'\n'``
+ for the default line terminator (:issue:`20353`).
+
+ - This change only affects Windows, where ``'\r\n'`` was used as the line terminator
+   even when ``'\n'`` was passed in ``line_terminator``.
+ - Strictly speaking, every ``'\n'`` in the output, both inside the data and as the line terminator, was converted to ``'\r\n'``.
+ - Previously, this could be worked around by passing a file object opened with ``newline='\n'`` as the output, rather than a file name.
+
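The conversion described in the bullets above matches how Python's text layer translates newlines on write. The following is a minimal, standard-library-only sketch of that mechanism, not the pandas implementation: the helper name ``write_text`` is hypothetical, and ``newline="\r\n"`` is used to stand in for the Windows default so that the effect is visible on any platform.

```python
import io


def write_text(payload: str, newline: str) -> bytes:
    """Write payload through a text layer using the given newline mode
    and return the raw bytes that would land in the file."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    text.write(payload)
    text.flush()
    data = raw.getvalue()
    text.detach()  # release the buffer without closing it
    return data


# One CSV record whose fields contain an embedded LF and an embedded CRLF.
row = 'abc,"d\nef","d\r\nef"\n'

# newline="\r\n" rewrites every "\n" on write, including the ones inside fields.
translated = write_text(row, newline="\r\n")

# newline="\n" leaves the payload untouched.
untouched = write_text(row, newline="\n")

print(translated)
print(untouched)
```

In the translated output the embedded ``\r\n`` becomes ``\r\r\n``, which is the same pattern that appears in the previous-behavior bytes for ``test.csv``.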
+ Previous Behavior on Windows:
+
+ .. code-block:: ipython
+
+     In [1]: import pandas as pd
+
+     In [2]: data = pd.DataFrame({
+        ...:     "string_with_lf": ["abc", "d\nef", "g\nh\n\ni"],
+        ...:     "string_with_crlf": ["abc", "d\r\nef", "g\r\nh\r\n\r\ni"]
+        ...: })
+
+     In [3]: data.to_csv("test.csv", index=False, line_terminator='\n')
+
+     In [4]: print(pd.read_csv("test.csv"))
+         string_with_lf       string_with_crlf
+     0              abc                    abc
+     1          d\r\nef              d\r\r\nef
+     2  g\r\nh\r\n\r\ni  g\r\r\nh\r\r\n\r\r\ni
+
+     In [5]: with open("test.csv", mode='rb') as f:
+        ...:     print(f.read())
+     b'string_with_lf,string_with_crlf\r\nabc,abc\r\n"d\r\nef","d\r\r\nef"\r\n"g\r\nh
+     \r\n\r\ni","g\r\r\nh\r\r\n\r\r\ni"\r\n'
+
+     In [6]: with open("test2.csv", mode='w', newline='\n') as f:
+        ...:     data.to_csv(f, index=False, line_terminator='\n')
+
+     In [7]: print(pd.read_csv("test2.csv"))
+        string_with_lf  string_with_crlf
+     0             abc               abc
+     1           d\nef           d\r\nef
+     2       g\nh\n\ni   g\r\nh\r\n\r\ni
+
+     In [8]: with open("test2.csv", mode='rb') as f:
+        ...:     print(f.read())
+     b'string_with_lf,string_with_crlf\nabc,abc\n"d\nef","d\r\nef"\n"g\nh\n\ni","g\r\
+     nh\r\n\r\ni"\n'
+
+ New Behavior on Windows:
+
+ - When ``line_terminator`` is passed explicitly, the line terminator is set to that value.
+ - The value of ``line_terminator`` only affects the line terminator of the CSV output;
+   it does not change newlines inside the data.
+
+ .. code-block:: ipython
+
+     In [1]: import pandas as pd
+
+     In [2]: data = pd.DataFrame({
+        ...:     "string_with_lf": ["abc", "d\nef", "g\nh\n\ni"],
+        ...:     "string_with_crlf": ["abc", "d\r\nef", "g\r\nh\r\n\r\ni"]
+        ...: })
+
+     In [3]: data.to_csv("test.csv", index=False, line_terminator='\n')
+
+     In [4]: pd.read_csv("test.csv")
+     Out[4]:
+        string_with_lf  string_with_crlf
+     0             abc               abc
+     1           d\nef           d\r\nef
+     2       g\nh\n\ni   g\r\nh\r\n\r\ni
+
+     In [5]: with open("test.csv", mode='rb') as f:
+        ...:     binary_str = f.read()
+        ...: binary_str
+     Out[5]: b'string_with_lf,string_with_crlf\nabc,abc\n"d\nef","d\r\nef"\n"g\nh\n\n
+     i","g\r\nh\r\n\r\ni"\n'
+
+ - On Windows, the value of ``os.linesep`` is ``'\r\n'``,
+   so if ``line_terminator`` is not set, ``'\r\n'`` is used as the line terminator.
+ - Again, this does not affect the values inside the data.
+
+ .. code-block:: ipython
+
+     In [1]: import pandas as pd
+
+     In [2]: data = pd.DataFrame({
+        ...:     "string_with_lf": ["abc", "d\nef", "g\nh\n\ni"],
+        ...:     "string_with_crlf": ["abc", "d\r\nef", "g\r\nh\r\n\r\ni"]
+        ...: })
+
+     In [3]: data.to_csv("test.csv", index=False)
+
+     In [4]: pd.read_csv("test.csv")
+     Out[4]:
+        string_with_lf  string_with_crlf
+     0             abc               abc
+     1           d\nef           d\r\nef
+     2       g\nh\n\ni   g\r\nh\r\n\r\ni
+
+     In [5]: with open("test.csv", mode='rb') as f:
+        ...:     binary_str = f.read()
+        ...: binary_str
+     Out[5]: b'string_with_lf,string_with_crlf\r\nabc,abc\r\n"d\nef","d\r\nef"\r\n"g\
+     nh\n\ni","g\r\nh\r\n\r\ni"\r\n'
+
+ - Because the default value of ``line_terminator`` has changed, passing a file object opened with ``newline='\n'`` is no longer enough to get ``'\n'`` as the line terminator.
+   Pass ``line_terminator='\n'`` explicitly.
+
+ .. code-block:: ipython
+
+     In [1]: import pandas as pd
+
+     In [2]: data = pd.DataFrame({
+        ...:     "string_with_lf": ["abc", "d\nef", "g\nh\n\ni"],
+        ...:     "string_with_crlf": ["abc", "d\r\nef", "g\r\nh\r\n\r\ni"]
+        ...: })
+
+     In [3]: with open("test2.csv", mode='w', newline='\n') as f:
+        ...:     data.to_csv(f, index=False)
+
+     In [4]: pd.read_csv("test2.csv")
+     Out[4]:
+        string_with_lf  string_with_crlf
+     0             abc               abc
+     1           d\nef           d\r\nef
+     2       g\nh\n\ni   g\r\nh\r\n\r\ni
+
+     In [5]: with open("test2.csv", mode='rb') as f:
+        ...:     binary_str = f.read()
+        ...: binary_str
+     Out[5]: b'string_with_lf,string_with_crlf\r\nabc,abc\r\n"d\nef","d\r\nef"\r\n"g\
+     nh\n\ni","g\r\nh\r\n\r\ni"\r\n'
+
+     In [6]: with open("test2.csv", mode='w', newline='\n') as f:
+        ...:     data.to_csv(f, index=False, line_terminator='\n')
+
+     In [7]: pd.read_csv("test2.csv")
+     Out[7]:
+        string_with_lf  string_with_crlf
+     0             abc               abc
+     1           d\nef           d\r\nef
+     2       g\nh\n\ni   g\r\nh\r\n\r\ni
+
+     In [8]: with open("test2.csv", mode='rb') as f:
+        ...:     binary_str = f.read()
+        ...: binary_str
+     Out[8]: b'string_with_lf,string_with_crlf\nabc,abc\n"d\nef","d\r\nef"\n"g\nh\n\n
+     i","g\r\nh\r\n\r\ni"\n'
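
The rule illustrated above, that the terminator setting applies only between records and leaves newlines inside fields alone, can also be checked without pandas. The sketch below is an analogy using the standard-library ``csv`` module and its ``lineterminator`` option; it is not the code path pandas uses, and the helper name ``dump`` is hypothetical.

```python
import csv
import io

rows = [
    ["string_with_lf", "string_with_crlf"],
    ["abc", "abc"],
    ["d\nef", "d\r\nef"],
]


def dump(records, lineterminator):
    """Serialize records to CSV text using the given record terminator."""
    buf = io.StringIO(newline="")  # newline="" disables newline translation
    csv.writer(buf, lineterminator=lineterminator).writerows(records)
    return buf.getvalue()


lf = dump(rows, "\n")
crlf = dump(rows, "\r\n")

# Only the record terminators differ; quoted fields keep their own newlines.
print(repr(lf))
print(repr(crlf))

# Reading either variant back recovers the original field values.
roundtrip_lf = list(csv.reader(io.StringIO(lf, newline="")))
roundtrip_crlf = list(csv.reader(io.StringIO(crlf, newline="")))
```

Both round trips return the original rows, which mirrors the ``read_csv`` outputs shown for the new behavior.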
.. _whatsnew_0240.api_breaking.interval_values:
@@ -755,7 +906,6 @@ MultiIndex
I/O
^^^

- - Bug in :meth:`DataFrame.to_csv`, in which all `\n`s are converted to `\r\n` on Windows (:issue:`20353`)
- :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
- :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
- :func:`read_csv()` and :func:`read_table()` will throw ``UnicodeError`` and not coredump on badly encoded strings (:issue:`22748`)