Commit 97fd744

committed
DOC: reorder dialect section in io.rst
closes pandas-dev#15179
1 parent a8e216b commit 97fd744

1 file changed (+56, -89 lines)


doc/source/io.rst

@@ -357,95 +357,6 @@ warn_bad_lines : boolean, default ``True``
   If error_bad_lines is ``False``, and warn_bad_lines is ``True``, a warning for
   each "bad line" will be output (only valid with C parser).

-.. ipython:: python
-   :suppress:
-
-   f = open('foo.csv','w')
-   f.write('date,A,B,C\n20090101,a,1,2\n20090102,b,3,4\n20090103,c,4,5')
-   f.close()
-
-Consider a typical CSV file containing, in this case, some time series data:
-
-.. ipython:: python
-
-   print(open('foo.csv').read())
-
-The default for `read_csv` is to create a DataFrame with simple numbered rows:
-
-.. ipython:: python
-
-   pd.read_csv('foo.csv')
-
-In the case of indexed data, you can pass the column number or column name you
-wish to use as the index:
-
-.. ipython:: python
-
-   pd.read_csv('foo.csv', index_col=0)
-
-.. ipython:: python
-
-   pd.read_csv('foo.csv', index_col='date')
-
-You can also use a list of columns to create a hierarchical index:
-
-.. ipython:: python
-
-   pd.read_csv('foo.csv', index_col=[0, 'A'])
-
-.. _io.dialect:
-
-The ``dialect`` keyword gives greater flexibility in specifying the file format.
-By default it uses the Excel dialect but you can specify either the dialect name
-or a :class:`python:csv.Dialect` instance.
-
-.. ipython:: python
-   :suppress:
-
-   data = ('label1,label2,label3\n'
-           'index1,"a,c,e\n'
-           'index2,b,d,f')
-
-Suppose you had data with unenclosed quotes:
-
-.. ipython:: python
-
-   print(data)
-
-By default, ``read_csv`` uses the Excel dialect and treats the double quote as
-the quote character, which causes it to fail when it finds a newline before it
-finds the closing double quote.
-
-We can get around this using ``dialect``
-
-.. ipython:: python
-   :okwarning:
-
-   dia = csv.excel()
-   dia.quoting = csv.QUOTE_NONE
-   pd.read_csv(StringIO(data), dialect=dia)
-
-All of the dialect options can be specified separately by keyword arguments:
-
-.. ipython:: python
-
-   data = 'a,b,c~1,2,3~4,5,6'
-   pd.read_csv(StringIO(data), lineterminator='~')
-
-Another common dialect option is ``skipinitialspace``, to skip any whitespace
-after a delimiter:
-
-.. ipython:: python
-
-   data = 'a, b, c\n1, 2, 3\n4, 5, 6'
-   print(data)
-   pd.read_csv(StringIO(data), skipinitialspace=True)
-
-The parsers make every attempt to "do the right thing" and not be very
-fragile. Type inference is a pretty big deal. So if a column can be coerced to
-integer dtype without altering the contents, it will do so. Any non-numeric
-columns will come through as object dtype as with the rest of pandas objects.
-
 .. _io.dtypes:

 Specifying column data types
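The central example in the section removed above can be reproduced outside of Sphinx. This is a minimal sketch, assuming pandas is installed; it shows how setting ``csv.QUOTE_NONE`` on the default Excel dialect lets ``read_csv`` accept the unenclosed quote that would otherwise break parsing:

```python
import csv
from io import StringIO

import pandas as pd

# The doc's example data: note the stray, unclosed double quote in row one.
data = ('label1,label2,label3\n'
        'index1,"a,c,e\n'
        'index2,b,d,f')

# Start from the Excel dialect (read_csv's default) and turn off quote
# handling, so the '"' is read as ordinary character data.
dia = csv.excel()
dia.quoting = csv.QUOTE_NONE

# With four fields per data row and only three header labels, read_csv
# treats the first column as the index.
df = pd.read_csv(StringIO(data), dialect=dia)
print(df)
```

As the doc notes via ``:okwarning:``, pandas may emit a parser warning here; the result still parses, with the leftover quote kept inside the first field.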
@@ -1239,6 +1150,62 @@ data that appear in some lines but not others:
    1  4   5   6
    2  8   9  10

+.. _io.dialect:
+
+Dialect
+'''''''
+
+The ``dialect`` keyword gives greater flexibility in specifying the file format.
+By default it uses the Excel dialect but you can specify either the dialect name
+or a :class:`python:csv.Dialect` instance.
+
+.. ipython:: python
+   :suppress:
+
+   data = ('label1,label2,label3\n'
+           'index1,"a,c,e\n'
+           'index2,b,d,f')
+
+Suppose you had data with unenclosed quotes:
+
+.. ipython:: python
+
+   print(data)
+
+By default, ``read_csv`` uses the Excel dialect and treats the double quote as
+the quote character, which causes it to fail when it finds a newline before it
+finds the closing double quote.
+
+We can get around this using ``dialect``
+
+.. ipython:: python
+   :okwarning:
+
+   dia = csv.excel()
+   dia.quoting = csv.QUOTE_NONE
+   pd.read_csv(StringIO(data), dialect=dia)
+
+All of the dialect options can be specified separately by keyword arguments:
+
+.. ipython:: python
+
+   data = 'a,b,c~1,2,3~4,5,6'
+   pd.read_csv(StringIO(data), lineterminator='~')
+
+Another common dialect option is ``skipinitialspace``, to skip any whitespace
+after a delimiter:
+
+.. ipython:: python
+
+   data = 'a, b, c\n1, 2, 3\n4, 5, 6'
+   print(data)
+   pd.read_csv(StringIO(data), skipinitialspace=True)
+
+The parsers make every attempt to "do the right thing" and not be very
+fragile. Type inference is a pretty big deal. So if a column can be coerced to
+integer dtype without altering the contents, it will do so. Any non-numeric
+columns will come through as object dtype as with the rest of pandas objects.
+
 .. _io.quoting:

 Quoting and Escape Characters
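The two keyword-argument dialect options shown in the relocated section can also be checked directly. A small sketch, again assuming pandas; the variable names ``df_rows`` and ``df_spaces`` are mine, not from the doc:

```python
from io import StringIO

import pandas as pd

# lineterminator: '~' separates rows instead of '\n' (C parser only).
df_rows = pd.read_csv(StringIO('a,b,c~1,2,3~4,5,6'), lineterminator='~')
print(df_rows)

# skipinitialspace: drop the whitespace that follows each delimiter, so
# headers come back as 'a', 'b', 'c' and the values parse as integers.
df_spaces = pd.read_csv(StringIO('a, b, c\n1, 2, 3\n4, 5, 6'),
                        skipinitialspace=True)
print(df_spaces)
```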
