Commit 8bde6c6

DOC: expanded whatsnew docs
PR #21406

* Expanded whatsnew documents about the change of to_csv
* Resolved duplication
1 parent 023e4ab commit 8bde6c6

1 file changed, +152 -2 lines changed

doc/source/whatsnew/v0.24.0.txt

@@ -195,7 +195,158 @@ Other Enhancements
 
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- :func:`DataFrame.to_csv` now uses :func:`os.linesep` rather than ``'\n'`` for the default line terminator.(:issue:`20353`)
+
+.. _whatsnew_0240.api_breaking.csv_line_terminator:
+
+``os.linesep`` is used for ``line_terminator`` of ``DataFrame.to_csv``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`DataFrame.to_csv` now uses :data:`os.linesep` rather than ``'\n'``
+for the default line terminator (:issue:`20353`).
+- This change only affects Windows, where ``'\r\n'`` was used as the line terminator
+  even when ``'\n'`` was passed in ``line_terminator``.
+- Strictly speaking, every ``'\n'`` in the output, both inside the data and as the line terminator, was converted into ``'\r\n'``.
+- Previously, this could be avoided by passing a file object opened with ``newline='\n'`` as the output, rather than a file name.
+
+Previous Behavior on Windows:
+
+.. code-block:: ipython
+
+    In [1]: import pandas as pd
+
+    In [2]: data = pd.DataFrame({
+       ...:     "string_with_lf":["abc","d\nef","g\nh\n\ni"],
+       ...:     "string_with_crlf":["abc","d\r\nef","g\r\nh\r\n\r\ni"]
+       ...: })
+
+    In [3]: data.to_csv("test.csv",index=False,line_terminator='\n')
+
+    In [4]: print(pd.read_csv("test.csv"))
+        string_with_lf       string_with_crlf
+    0              abc                    abc
+    1          d\r\nef              d\r\r\nef
+    2  g\r\nh\r\n\r\ni  g\r\r\nh\r\r\n\r\r\ni
+
+    In [5]: with open("test.csv", mode='rb') as f:
+       ...:     print(f.read())
+    b'string_with_lf,string_with_crlf\r\nabc,abc\r\n"d\r\nef","d\r\r\nef"\r\n"g\r\nh
+    \r\n\r\ni","g\r\r\nh\r\r\n\r\r\ni"\r\n'
+
+    In [6]: with open("test2.csv", mode='w', newline='\n') as f:
+       ...:     data.to_csv(f,index=False,line_terminator='\n')
+
+    In [7]: print(pd.read_csv("test2.csv"))
+       string_with_lf  string_with_crlf
+    0             abc               abc
+    1           d\nef           d\r\nef
+    2       g\nh\n\ni   g\r\nh\r\n\r\ni
+
+    In [8]: with open("test2.csv", mode='rb') as f:
+       ...:     print(f.read())
+    b'string_with_lf,string_with_crlf\nabc,abc\n"d\nef","d\r\nef"\n"g\nh\n\ni","g\r\
+    nh\r\n\r\ni"\n'
+
+New Behavior on Windows:
+
+- Passing ``line_terminator`` explicitly sets the line terminator to that value.
+- The value of ``line_terminator`` only affects the line terminator of the CSV,
+  so it does not change the values inside the data.
+
+.. code-block:: ipython
+
+    In [1]: import pandas as pd
+
+    In [2]: data = pd.DataFrame({
+       ...:     "string_with_lf":["abc","d\nef","g\nh\n\ni"],
+       ...:     "string_with_crlf":["abc","d\r\nef","g\r\nh\r\n\r\ni"]
+       ...: })
+
+    In [3]: data.to_csv("test.csv",index=False,line_terminator='\n')
+
+    In [4]: pd.read_csv("test.csv")
+    Out[4]:
+       string_with_lf  string_with_crlf
+    0             abc               abc
+    1           d\nef           d\r\nef
+    2       g\nh\n\ni   g\r\nh\r\n\r\ni
+
+    In [5]: with open("test.csv", mode='rb') as f:
+       ...:     binary_str=f.read()
+       ...: binary_str
+    Out[5]: b'string_with_lf,string_with_crlf\nabc,abc\n"d\nef","d\r\nef"\n"g\nh\n\n
+    i","g\r\nh\r\n\r\ni"\n'
+
+- On Windows, the value of ``os.linesep`` is ``'\r\n'``,
+  so if ``line_terminator`` is not set, ``'\r\n'`` is used as the line terminator.
+- Again, this does not affect the values inside the data.
+
+.. code-block:: ipython
+
+    In [1]: import pandas as pd
+
+    In [2]: data = pd.DataFrame({
+       ...:     "string_with_lf":["abc","d\nef","g\nh\n\ni"],
+       ...:     "string_with_crlf":["abc","d\r\nef","g\r\nh\r\n\r\ni"]
+       ...: })
+
+    In [3]: data.to_csv("test.csv",index=False)
+
+    In [4]: pd.read_csv("test.csv")
+    Out[4]:
+       string_with_lf  string_with_crlf
+    0             abc               abc
+    1           d\nef           d\r\nef
+    2       g\nh\n\ni   g\r\nh\r\n\r\ni
+
+    In [5]: with open("test.csv", mode='rb') as f:
+       ...:     binary_str=f.read()
+       ...: binary_str
+    Out[5]: b'string_with_lf,string_with_crlf\r\nabc,abc\r\n"d\nef","d\r\nef"\r\n"g\
+    nh\n\ni","g\r\nh\r\n\r\ni"\r\n'
+
+- Because the default value of ``line_terminator`` has changed, passing a file object opened with ``newline='\n'`` no longer makes ``'\n'`` the line terminator.
+  Pass ``line_terminator='\n'`` explicitly.
+
+.. code-block:: ipython
+
+    In [1]: import pandas as pd
+
+    In [2]: data = pd.DataFrame({
+       ...:     "string_with_lf":["abc","d\nef","g\nh\n\ni"],
+       ...:     "string_with_crlf":["abc","d\r\nef","g\r\nh\r\n\r\ni"]
+       ...: })
+
+    In [3]: with open("test2.csv", mode='w', newline='\n') as f:
+       ...:     data.to_csv(f,index=False)
+
+    In [4]: pd.read_csv("test2.csv")
+    Out[4]:
+       string_with_lf  string_with_crlf
+    0             abc               abc
+    1           d\nef           d\r\nef
+    2       g\nh\n\ni   g\r\nh\r\n\r\ni
+
+    In [5]: with open("test2.csv", mode='rb') as f:
+       ...:     binary_str=f.read()
+       ...: binary_str
+    Out[5]: b'string_with_lf,string_with_crlf\r\nabc,abc\r\n"d\nef","d\r\nef"\r\n"g\
+    nh\n\ni","g\r\nh\r\n\r\ni"\r\n'
+
+    In [6]: with open("test2.csv", mode='w', newline='\n') as f:
+       ...:     data.to_csv(f,index=False,line_terminator='\n')
+
+    In [7]: pd.read_csv("test2.csv")
+    Out[7]:
+       string_with_lf  string_with_crlf
+    0             abc               abc
+    1           d\nef           d\r\nef
+    2       g\nh\n\ni   g\r\nh\r\n\r\ni
+
+    In [8]: with open("test2.csv", mode='rb') as f:
+       ...:     binary_str=f.read()
+       ...: binary_str
+    Out[8]: b'string_with_lf,string_with_crlf\nabc,abc\n"d\nef","d\r\nef"\n"g\nh\n\n
+    i","g\r\nh\r\n\r\ni"\n'
 
 
 .. _whatsnew_0240.api_breaking.interval_values:
@@ -755,7 +906,6 @@ MultiIndex
 I/O
 ^^^
 
-- Bug in :meth:`DataFrame.to_csv`, in which all `\n`s are converted to `\r\n` on Windows (:issue:`20353`)
 - :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
 - :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
 - :func:`read_csv()` and :func:`read_table()` will throw ``UnicodeError`` and not coredump on badly encoded strings (:issue:`22748`)
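
The `'\n'` to `'\r\n'` conversion shown in the "Previous Behavior on Windows" example can be reproduced on any platform with the standard library alone. The sketch below is not pandas code and makes two assumptions: a text stream created with `newline='\r\n'` stands in for Windows default text mode, and `csv.writer` stands in for `DataFrame.to_csv`. The names `ROWS` and `write_csv_bytes` are hypothetical and exist only for this illustration.

```python
import csv
import io

# Hypothetical sample data: one field containing LF, one containing CRLF.
ROWS = [
    ["string_with_lf", "string_with_crlf"],
    ["d\nef", "d\r\nef"],
]


def write_csv_bytes(newline, lineterminator):
    """Write ROWS as CSV through a text stream and return the raw bytes."""
    raw = io.BytesIO()
    # `newline` controls whether the text layer rewrites '\n' on output.
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    csv.writer(text, lineterminator=lineterminator).writerows(ROWS)
    text.flush()
    data = raw.getvalue()
    text.detach()  # keep `raw` open when the wrapper is discarded
    return data


# Assumed stand-in for Windows text mode: the stream rewrites every '\n'.
translated = write_csv_bytes(newline="\r\n", lineterminator="\n")

# No translation: the bytes are exactly what the CSV writer produced.
untranslated = write_csv_bytes(newline="", lineterminator="\n")

# The text layer cannot tell data from line terminators, so both change.
assert translated == untranslated.replace(b"\n", b"\r\n")

print(translated)
print(untranslated)
```

In the translated output the CRLF inside the data becomes `\r\r\n`, the same doubling the previous-behavior example shows. In the untranslated output the data is left alone and only the requested terminator appears.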
