IO Tools (Text, CSV, HDF5, ...)
===============================

- The pandas I/O API is a set of top level ``reader`` functions accessed like
- :func:`pandas.read_csv` that generally return a pandas object. The corresponding
- ``writer`` functions are object methods that are accessed like
- :meth:`DataFrame.to_csv`. Below is a table containing available ``readers`` and
+ The pandas I/O API is a set of top level ``reader`` functions accessed like
+ :func:`pandas.read_csv` that generally return a pandas object. The corresponding
+ ``writer`` functions are object methods that are accessed like
+ :meth:`DataFrame.to_csv`. Below is a table containing available ``readers`` and
``writers``.

.. csv-table::
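As a quick sketch of this reader/writer pattern (the file name is illustrative; the imports here are assumed in the later sketches):

.. code-block:: python

   import pandas as pd
   from io import StringIO

   df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

   # writer: an object method
   df.to_csv('example.csv', index=False)

   # reader: a top level function that returns a pandas object
   df2 = pd.read_csv('example.csv')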
@@ -74,7 +74,7 @@ intelligently convert tabular data into a ``DataFrame`` object. See the
Parsing options
'''''''''''''''

- The functions :func:`read_csv` and :func:`read_table` accept the following
+ The functions :func:`read_csv` and :func:`read_table` accept the following
common arguments:

Basic
@@ -351,8 +351,8 @@ Error Handling

error_bad_lines : boolean, default ``True``
  Lines with too many fields (e.g. a csv line with too many commas) will by
-   default cause an exception to be raised, and no ``DataFrame`` will be
-   returned. If ``False``, then these "bad lines" will be dropped from the
+   default cause an exception to be raised, and no ``DataFrame`` will be
+   returned. If ``False``, then these "bad lines" will be dropped from the
  ``DataFrame`` that is returned. See :ref:`bad lines <io.bad_lines>`
  below.
warn_bad_lines : boolean, default ``True``
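A minimal sketch of dropping such lines instead of raising (illustrative data; a warning is emitted for each dropped line while ``warn_bad_lines=True``):

.. code-block:: python

   data = 'a,b,c\n1,2,3\n4,5,6,7\n8,9,10'

   # the row ``4,5,6,7`` has too many fields and is skipped
   pd.read_csv(StringIO(data), error_bad_lines=False)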
@@ -364,7 +364,7 @@ warn_bad_lines : boolean, default ``True``
Specifying column data types
''''''''''''''''''''''''''''

- You can indicate the data type for the whole ``DataFrame`` or individual
+ You can indicate the data type for the whole ``DataFrame`` or individual
columns:

.. ipython:: python
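For instance, a minimal sketch of both forms (illustrative data):

.. code-block:: python

   data = 'a,b,c\n1,2,3\n4,5,6'

   # one dtype for every column
   pd.read_csv(StringIO(data), dtype=object).dtypes

   # or a dict mapping column names to dtypes
   pd.read_csv(StringIO(data), dtype={'b': object, 'c': 'float64'}).dtypes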
@@ -463,7 +463,7 @@ Specifying Categorical dtype
   pd.read_csv(StringIO(data)).dtypes
   pd.read_csv(StringIO(data), dtype='category').dtypes

- Individual columns can be parsed as a ``Categorical`` using a dict
+ Individual columns can be parsed as a ``Categorical`` using a dict
specification:

.. ipython:: python
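A sketch of that dict form (column names illustrative):

.. code-block:: python

   data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'

   # only ``col1`` is parsed as a Categorical; the rest keep their inferred dtypes
   pd.read_csv(StringIO(data), dtype={'col1': 'category'}).dtypes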
@@ -562,17 +562,17 @@ If the header is in a row other than the first, pass the row number to
Duplicate names parsing
'''''''''''''''''''''''

- If the file or header contains duplicate names, pandas will by default
+ If the file or header contains duplicate names, pandas will by default
distinguish between them so as to prevent overwriting data:

.. ipython:: python

   data = 'a,b,a\n0,1,2\n3,4,5'
   pd.read_csv(StringIO(data))

- There is no more duplicate data because ``mangle_dupe_cols=True`` by default,
- which modifies a series of duplicate columns 'X', ..., 'X' to become
- 'X', 'X.1', ..., 'X.N'. If ``mangle_dupe_cols=False``, duplicate data can
+ There is no more duplicate data because ``mangle_dupe_cols=True`` by default,
+ which modifies a series of duplicate columns 'X', ..., 'X' to become
+ 'X', 'X.1', ..., 'X.N'. If ``mangle_dupe_cols=False``, duplicate data can
arise:

.. code-block:: python
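For reference, a sketch of the renaming that the default performs (the trailing comment shows the expected result):

.. code-block:: python

   data = 'a,b,a\n0,1,2\n3,4,5'

   # the second 'a' becomes 'a.1', so no column is overwritten
   pd.read_csv(StringIO(data)).columns.tolist()  # ['a', 'b', 'a.1']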
@@ -927,9 +927,9 @@ Note that performance-wise, you should try these methods of parsing dates in ord
For optimal performance, this should be vectorized, i.e., it should accept arrays
as arguments.
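For instance, a vectorized parser might be sketched like this (the format string and column names are illustrative):

.. code-block:: python

   data = 'date,value\n2000-01-01,1\n2000-01-02,2'

   # receives an array of strings, returns an array of timestamps
   parser = lambda x: pd.to_datetime(x, format='%Y-%m-%d')
   pd.read_csv(StringIO(data), parse_dates=['date'], date_parser=parser)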

- You can explore the date parsing functionality in
- `date_converters.py <https://github.com/pandas-dev/pandas/blob/master/pandas/io/date_converters.py>`__
- and add your own. We would love to turn this module into a community supported
+ You can explore the date parsing functionality in
+ `date_converters.py <https://github.com/pandas-dev/pandas/blob/master/pandas/io/date_converters.py>`__
+ and add your own. We would love to turn this module into a community supported
set of date/time parsers. To get you started, ``date_converters.py`` contains
functions to parse dual date and time columns, year/month/day columns,
and year/month/day/hour/minute/second columns. It also contains a
@@ -1073,11 +1073,11 @@ The ``thousands`` keyword allows integers to be parsed correctly:
NA Values
'''''''''

- To control which values are parsed as missing values (which are signified by
- ``NaN``), specify a string in ``na_values``. If you specify a list of strings,
- then all values in it are considered to be missing values. If you specify a
- number (a ``float``, like ``5.0``, or an ``integer``, like ``5``), the
- corresponding equivalent values will also imply a missing value (in this case
+ To control which values are parsed as missing values (which are signified by
+ ``NaN``), specify a string in ``na_values``. If you specify a list of strings,
+ then all values in it are considered to be missing values. If you specify a
+ number (a ``float``, like ``5.0``, or an ``integer``, like ``5``), the
+ corresponding equivalent values will also imply a missing value (in this case
effectively ``[5.0, 5]`` are recognized as ``NaN``).

To completely override the default values that are recognized as missing, specify ``keep_default_na=False``.
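A sketch of the full override (illustrative data):

.. code-block:: python

   data = 'a,b\nNA,5'

   # 'NA' stays a string because the default sentinels are discarded
   pd.read_csv(StringIO(data), keep_default_na=False)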
@@ -1094,7 +1094,7 @@ Let us consider some examples:
   read_csv(path, na_values=[5])

In the example above ``5`` and ``5.0`` will be recognized as ``NaN``, in
- addition to the defaults. A string will first be interpreted as a numerical
+ addition to the defaults. A string will first be interpreted as a numerical
``5``, then as a ``NaN``.

.. code-block:: python
@@ -1113,7 +1113,7 @@ Above, both ``NA`` and ``0`` as strings are ``NaN``.

   read_csv(path, na_values=["Nope"])

- The default values, in addition to the string ``"Nope"``, are recognized as
+ The default values, in addition to the string ``"Nope"``, are recognized as
``NaN``.

.. _io.infinity:
@@ -1272,8 +1272,8 @@ after a delimiter:
   print(data)
   pd.read_csv(StringIO(data), skipinitialspace=True)

- The parsers make every attempt to "do the right thing" and not be fragile. Type
- inference is a pretty big deal. If a column can be coerced to integer dtype
+ The parsers make every attempt to "do the right thing" and not be fragile. Type
+ inference is a pretty big deal. If a column can be coerced to integer dtype
without altering the contents, the parser will do so. Any non-numeric
columns will come through as object dtype as with the rest of pandas objects.
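A sketch of that inference (illustrative data):

.. code-block:: python

   data = 'a,b,c\n1,2.5,x\n3,4.5,y'

   # 'a' is coerced to int64, 'b' to float64, and 'c' stays object
   pd.read_csv(StringIO(data)).dtypes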
@@ -1814,7 +1814,7 @@ Writing to a file, with a date index and a date column:
Fallback Behavior
+++++++++++++++++

- If the JSON serializer cannot handle the container contents directly, it will
+ If the JSON serializer cannot handle the container contents directly, it will
fall back in the following manner:

- if the dtype is unsupported (e.g. ``np.complex``) then the ``default_handler``, if provided, will be called
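For instance, a sketch of that handler fallback (converting each unsupported value with ``str`` is just one possible choice):

.. code-block:: python

   # complex values are not natively serializable, so each one is
   # passed through the handler instead of raising
   s = pd.Series([1 + 1j, 2 + 2j])
   s.to_json(default_handler=str)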
@@ -1908,10 +1908,10 @@ overview.
Data Conversion
+++++++++++++++

- The default of ``convert_axes=True``, ``dtype=True``, and ``convert_dates=True``
- will try to parse the axes, and all of the data into appropriate types,
- including dates. If you need to override specific dtypes, pass a dict to
- ``dtype``. ``convert_axes`` should only be set to ``False`` if you need to
+ The default of ``convert_axes=True``, ``dtype=True``, and ``convert_dates=True``
+ will try to parse the axes, and all of the data into appropriate types,
+ including dates. If you need to override specific dtypes, pass a dict to
+ ``dtype``. ``convert_axes`` should only be set to ``False`` if you need to
preserve string-like numbers (e.g. '1', '2') in the axes.
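A sketch of overriding those defaults (the JSON string is illustrative):

.. code-block:: python

   json = '{"0": {"x": "1"}, "1": {"x": "2"}}'

   # keep the string index labels, but still coerce column 'x' to int64
   pd.read_json(json, orient='index', convert_axes=False, dtype={'x': 'int64'})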

.. note::
@@ -2675,7 +2675,7 @@ The :func:`~pandas.read_excel` method can read Excel 2003 (``.xls``) and
Excel 2007+ (``.xlsx``) files using the ``xlrd`` Python
module. The :meth:`~DataFrame.to_excel` instance method is used for
saving a ``DataFrame`` to Excel. Generally the semantics are
- similar to working with :ref:`csv <io.read_csv_table>` data.
+ similar to working with :ref:`csv <io.read_csv_table>` data.
See the :ref:`cookbook <cookbook.excel>` for some advanced strategies.
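A minimal round trip might look like this (file and sheet names are illustrative; writing assumes an installed engine such as ``openpyxl`` or ``xlsxwriter``):

.. code-block:: python

   df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

   df.to_excel('report.xlsx', sheet_name='Sheet1')
   pd.read_excel('report.xlsx', sheet_name='Sheet1', index_col=0)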
2680
2680
2681
2681
.. _io.excel_reader :
@@ -3065,9 +3065,9 @@ The look and feel of Excel worksheets created from pandas can be modified using
Clipboard
---------

- A handy way to grab data is to use the :func:`~pandas.read_clipboard` function,
- which takes the contents of the clipboard buffer and passes them to the
- ``read_table`` function. For instance, you can copy the following text to the
+ A handy way to grab data is to use the :func:`~pandas.read_clipboard` function,
+ which takes the contents of the clipboard buffer and passes them to the
+ ``read_table`` function. For instance, you can copy the following text to the
clipboard (CTRL-C on many operating systems):

.. code-block:: python
@@ -4550,7 +4550,7 @@ Several caveats.
  on an attempt at serialization.

You can specify an ``engine`` to direct the serialization. This can be one of ``pyarrow``, ``fastparquet``, or ``auto``.
- If the engine is NOT specified, then the ``pd.options.io.parquet.engine`` option is checked; if this is also ``auto``,
+ If the engine is NOT specified, then the ``pd.options.io.parquet.engine`` option is checked; if this is also ``auto``,
then ``pyarrow`` is tried, falling back to ``fastparquet``.

See the documentation for `pyarrow <http://arrow.apache.org/docs/python/>`__ and `fastparquet <https://fastparquet.readthedocs.io/en/latest/>`__.
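A sketch of directing the engine explicitly (the file name is illustrative):

.. code-block:: python

   df.to_parquet('example.parquet', engine='pyarrow')

   # or leave the choice to ``pd.options.io.parquet.engine``
   pd.read_parquet('example.parquet', engine='auto')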
@@ -5200,7 +5200,7 @@ ignored.
   dtypes: float64(1), int64(1)
   memory usage: 15.3 MB

- When writing, the top three functions in terms of speed are
+ When writing, the top three functions in terms of speed are
``test_pickle_write``, ``test_feather_write`` and ``test_hdf_fixed_write_compress``.

.. code-block:: ipython
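For context, a sketch of how such a timing could be reproduced (the DataFrame ``df``, the wrapper name, and the file name are illustrative, mirroring the benchmark functions above):

.. code-block:: python

   import timeit

   def test_pickle_write(df):
       df.to_pickle('test.pkl')

   # seconds for ten consecutive writes
   timeit.timeit(lambda: test_pickle_write(df), number=10)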