Commit 4392824

DOC: fix inconsistencies in read_csv docstring type descriptions (#53834)
* update type descriptions for read_csv parameters
* make read_csv parameter descriptions consistent with valid type options
* correct minor docstring formatting
1 parent af804a9 commit 4392824

1 file changed, +63 -57 lines changed
pandas/io/parsers/readers.py

+63-57
@@ -101,7 +101,8 @@
  By file-like object, we refer to objects with a ``read()`` method, such as
  a file handle (e.g. via builtin ``open`` function) or ``StringIO``.
  sep : str, default {_default_sep}
- Delimiter to use. If ``sep=None``, the C engine cannot automatically detect
+ Character or regex pattern to treat as the delimiter. If ``sep=None``, the
+ C engine cannot automatically detect
  the separator, but the Python parsing engine can, meaning the latter will
  be used and automatically detect the separator from only the first valid
  row of the file by Python's builtin sniffer tool, ``csv.Sniffer``.
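
Illustration (not part of the commit): a minimal sketch of the ``sep=None`` behavior described in this hunk, assuming pandas is installed. The semicolon-separated sample data is made up for the example.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: semicolon-delimited text held in memory.
data = "name;age;city\nAda;36;London\nLinus;28;Helsinki\n"

# sep=None asks pandas to sniff the delimiter from the first valid row.
# Only the Python engine can do this, so it is requested explicitly.
df = pd.read_csv(StringIO(data), sep=None, engine="python")

columns = list(df.columns)
print(columns)
```

Passing ``engine="python"`` explicitly avoids relying on the automatic fallback away from the C engine.
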
@@ -111,9 +112,9 @@
  to ignoring quoted data. Regex example: ``'\r\t'``.
  delimiter : str, optional
  Alias for ``sep``.
- header : int, list of int, None, default 'infer'
- Row number(s) to use as the column names, and the start of the
- data. Default behavior is to infer the column names: if no ``names``
+ header : int, Sequence of int, 'infer' or None, default 'infer'
+ Row number(s) containing column labels and marking the start of the
+ data (zero-indexed). Default behavior is to infer the column names: if no ``names``
  are passed the behavior is identical to ``header=0`` and column
  names are inferred from the first line of the file, if column
  names are passed explicitly to ``names`` then the behavior is identical to
@@ -125,20 +126,21 @@
  parameter ignores commented lines and empty lines if
  ``skip_blank_lines=True``, so ``header=0`` denotes the first line of
  data rather than the first line of the file.
- names : array-like, optional
- List of column names to use. If the file contains a header row,
+ names : Sequence of Hashable, optional
+ Sequence of column labels to apply. If the file contains a header row,
  then you should explicitly pass ``header=0`` to override the column names.
  Duplicates in this list are not allowed.
- index_col : int, str, sequence of int / str, or False, optional
- Column(s) to use as the row labels of the :class:`~pandas.DataFrame`, either given as
- string name or column index. If a sequence of ``int`` / ``str`` is given, a
- :class:`~pandas.MultiIndex` is used.
+ index_col : Hashable, Sequence of Hashable or False, optional
+ Column(s) to use as row label(s), denoted either by column labels or column
+ indices. If a sequence of labels or indices is given, :class:`~pandas.MultiIndex`
+ will be formed for the row labels.

  Note: ``index_col=False`` can be used to force ``pandas`` to *not* use the first
- column as the index, e.g. when you have a malformed file with delimiters at
+ column as the index, e.g., when you have a malformed file with delimiters at
  the end of each line.
- usecols : list-like or callable, optional
- Return a subset of the columns. If list-like, all elements must either
+ usecols : list of Hashable or Callable, optional
+ Subset of columns to select, denoted either by column labels or column indices.
+ If list-like, all elements must either
  be positional (i.e. integer indices into the document columns) or strings
  that correspond to column names provided either by the user in ``names`` or
  inferred from the document header row(s). If ``names`` are given, the document
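
Illustration (not part of the commit): a small sketch of the ``index_col=False`` note in this hunk, assuming pandas is installed. The sample data, with a trailing delimiter on each data row, is invented for the example.

```python
from io import StringIO

import pandas as pd

# Hypothetical malformed file: every data row ends with an extra delimiter,
# so data rows have one more field than the header.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# By default pandas treats the first column as the index in this situation.
default = pd.read_csv(StringIO(data))

# index_col=False tells pandas not to use the first column as the index.
forced = pd.read_csv(StringIO(data), index_col=False)

print(default)
print(forced)
```
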
@@ -156,9 +158,9 @@
  example of a valid callable argument would be ``lambda x: x.upper() in
  ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster
  parsing time and lower memory usage.
- dtype : Type name or dict of column -> type, optional
- Data type for data or columns. E.g., ``{{'a': np.float64, 'b': np.int32,
- 'c': 'Int64'}}``
+ dtype : dtype or dict of {{Hashable : dtype}}, optional
+ Data type(s) to apply to either the whole dataset or individual columns.
+ E.g., ``{{'a': np.float64, 'b': np.int32, 'c': 'Int64'}}``
  Use ``str`` or ``object`` together with suitable ``na_values`` settings
  to preserve and not interpret ``dtype``.
  If ``converters`` are specified, they will be applied INSTEAD
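
Illustration (not part of the commit): a sketch combining the callable form of ``usecols`` quoted in this hunk with a per-column ``dtype`` mapping, assuming pandas is installed. The column names and values are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical three-column sample.
data = "aaa,bbb,ccc\n1,2,3\n4,5,6\n"

df = pd.read_csv(
    StringIO(data),
    # The callable receives each column name and returns True to keep it.
    usecols=lambda name: name.upper() in ["AAA", "BBB", "DDD"],
    # Keys are column labels, values are dtypes (given here as strings).
    dtype={"aaa": "float64", "bbb": "Int64"},
)

print(df.dtypes)
```
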
@@ -176,18 +178,18 @@

  .. versionadded:: 1.4.0

- The "pyarrow" engine was added as an *experimental* engine, and some features
+ The 'pyarrow' engine was added as an *experimental* engine, and some features
  are unsupported, or may not work correctly, with this engine.
- converters : dict, optional
- ``dict`` of functions for converting values in certain columns. Keys can either
- be integers or column labels.
+ converters : dict of {{Hashable : Callable}}, optional
+ Functions for converting values in specified columns. Keys can either
+ be column labels or column indices.
  true_values : list, optional
- Values to consider as ``True`` in addition to case-insensitive variants of "True".
+ Values to consider as ``True`` in addition to case-insensitive variants of 'True'.
  false_values : list, optional
- Values to consider as ``False`` in addition to case-insensitive variants of "False".
+ Values to consider as ``False`` in addition to case-insensitive variants of 'False'.
  skipinitialspace : bool, default False
  Skip spaces after delimiter.
- skiprows : list-like, int or callable, optional
+ skiprows : int, list of int or Callable, optional
  Line numbers to skip (0-indexed) or number of lines to skip (``int``)
  at the start of the file.

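
Illustration (not part of the commit): a sketch of the three ``skiprows`` forms named in the updated type description (``int``, list of ``int``, ``Callable``), assuming pandas is installed. The sample text is invented.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with one junk line before the header.
data = "junk line\na,b\n1,2\n3,4\n"

# int: skip that many lines at the start of the file.
by_count = pd.read_csv(StringIO(data), skiprows=1)

# list of int: skip exactly these 0-indexed line numbers.
by_list = pd.read_csv(StringIO(data), skiprows=[0, 2])

# Callable: called with each line index, returns True to skip that line.
by_func = pd.read_csv(StringIO(data), skiprows=lambda i: i == 0)

print(by_count, by_list, by_func, sep="\n\n")
```
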
@@ -198,7 +200,7 @@
  Number of lines at bottom of file to skip (Unsupported with ``engine='c'``).
  nrows : int, optional
  Number of rows of file to read. Useful for reading pieces of large files.
- na_values : scalar, str, list-like, or dict, optional
+ na_values : Hashable, Iterable of Hashable or dict of {{Hashable : Iterable}}, optional
  Additional strings to recognize as ``NA``/``NaN``. If ``dict`` passed, specific
  per-column ``NA`` values. By default the following values are interpreted as
  ``NaN``: '"""
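
Illustration (not part of the commit): a sketch of the scalar and ``dict`` forms of ``na_values``, assuming pandas is installed. The marker strings ``missing`` and ``-1`` are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample where "missing" and "-1" stand for absent values.
data = "a,b\nmissing,1\nx,-1\n"

# Scalar form: the marker applies to every column.
scalar_form = pd.read_csv(StringIO(data), na_values="missing")

# dict form: each column gets its own set of NA markers.
per_column = pd.read_csv(
    StringIO(data),
    na_values={"a": ["missing"], "b": ["-1"]},
)

print(scalar_form)
print(per_column)
```
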
@@ -227,7 +229,7 @@
  Indicate number of ``NA`` values placed in non-numeric columns.
  skip_blank_lines : bool, default True
  If ``True``, skip over blank lines rather than interpreting as ``NaN`` values.
- parse_dates : bool or list of int or names or list of lists or dict, \
+ parse_dates : bool, list of Hashable, list of lists or dict of {{Hashable : list}}, \
  default False
  The behavior is as follows:

@@ -258,7 +260,7 @@
  keep_date_col : bool, default False
  If ``True`` and ``parse_dates`` specifies combining multiple columns then
  keep the original columns.
- date_parser : function, optional
+ date_parser : Callable, optional
  Function to use for converting a sequence of string columns to an array of
  ``datetime`` instances. The default uses ``dateutil.parser.parser`` to do the
  conversion. ``pandas`` will try to call ``date_parser`` in three different ways,
@@ -273,9 +275,9 @@
  Use ``date_format`` instead, or read in as ``object`` and then apply
  :func:`~pandas.to_datetime` as-needed.
  date_format : str or dict of column -> format, optional
- If used in conjunction with ``parse_dates``, will parse dates according to this
- format. For anything more complex,
- please read in as ``object`` and then apply :func:`~pandas.to_datetime` as-needed.
+ Format to use for parsing dates when used in conjunction with ``parse_dates``.
+ For anything more complex, please read in as ``object`` and then apply
+ :func:`~pandas.to_datetime` as-needed.

  .. versionadded:: 2.0.0
  dayfirst : bool, default False
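
Illustration (not part of the commit): a sketch of ``date_format`` used together with ``parse_dates``, assuming pandas 2.0 or later (the version in which the docstring says ``date_format`` was added). The column name ``day`` and the day-first sample dates are made up.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with day/month/year dates.
data = "day,value\n31/12/2023,1\n01/02/2024,2\n"

df = pd.read_csv(
    StringIO(data),
    parse_dates=["day"],      # which column(s) to parse as dates
    date_format="%d/%m/%Y",   # strftime-style format applied to them
)

print(df.dtypes)
```
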
@@ -306,50 +308,53 @@

  .. versionchanged:: 1.4.0 Zstandard support.

- thousands : str, optional
- Thousands separator.
- decimal : str, default '.'
- Character to recognize as decimal point (e.g. use ',' for European data).
+ thousands : str (length 1), optional
+ Character acting as the thousands separator in numerical values.
+ decimal : str (length 1), default '.'
+ Character to recognize as decimal point (e.g., use ',' for European data).
  lineterminator : str (length 1), optional
- Character to break file into lines. Only valid with C parser.
+ Character used to denote a line break. Only valid with C parser.
  quotechar : str (length 1), optional
- The character used to denote the start and end of a quoted item. Quoted
+ Character used to denote the start and end of a quoted item. Quoted
  items can include the ``delimiter`` and it will be ignored.
- quoting : int or csv.QUOTE_* instance, default 0
- Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of
- ``QUOTE_MINIMAL`` (0), ``QUOTE_ALL`` (1), ``QUOTE_NONNUMERIC`` (2) or
- ``QUOTE_NONE`` (3).
+ quoting : {{0 or csv.QUOTE_MINIMAL, 1 or csv.QUOTE_ALL, 2 or csv.QUOTE_NONNUMERIC, \
+ 3 or csv.QUOTE_NONE}}, default csv.QUOTE_MINIMAL
+ Control field quoting behavior per ``csv.QUOTE_*`` constants. Default is
+ ``csv.QUOTE_MINIMAL`` (i.e., 0) which implies that only fields containing special
+ characters are quoted (e.g., characters defined in ``quotechar``, ``delimiter``,
+ or ``lineterminator``.
  doublequote : bool, default True
  When ``quotechar`` is specified and ``quoting`` is not ``QUOTE_NONE``, indicate
  whether or not to interpret two consecutive ``quotechar`` elements INSIDE a
  field as a single ``quotechar`` element.
  escapechar : str (length 1), optional
- One-character string used to escape other characters.
- comment : str, optional
- Indicates remainder of line should not be parsed. If found at the beginning
+ Character used to escape other characters.
+ comment : str (length 1), optional
+ Character indicating that the remainder of line should not be parsed.
+ If found at the beginning
  of a line, the line will be ignored altogether. This parameter must be a
  single character. Like empty lines (as long as ``skip_blank_lines=True``),
  fully commented lines are ignored by the parameter ``header`` but not by
  ``skiprows``. For example, if ``comment='#'``, parsing
  ``#empty\\na,b,c\\n1,2,3`` with ``header=0`` will result in ``'a,b,c'`` being
  treated as the header.
- encoding : str, optional, default "utf-8"
+ encoding : str, optional, default 'utf-8'
  Encoding to use for UTF when reading/writing (ex. ``'utf-8'``). `List of Python
  standard encodings
  <https://docs.python.org/3/library/codecs.html#standard-encodings>`_ .

  .. versionchanged:: 1.2

- When ``encoding`` is ``None``, ``errors="replace"`` is passed to
- ``open()``. Otherwise, ``errors="strict"`` is passed to ``open()``.
- This behavior was previously only the case for ``engine="python"``.
+ When ``encoding`` is ``None``, ``errors='replace'`` is passed to
+ ``open()``. Otherwise, ``errors='strict'`` is passed to ``open()``.
+ This behavior was previously only the case for ``engine='python'``.

  .. versionchanged:: 1.3.0

  ``encoding_errors`` is a new argument. ``encoding`` has no longer an
  influence on how encoding errors are handled.

- encoding_errors : str, optional, default "strict"
+ encoding_errors : str, optional, default 'strict'
  How encoding errors are treated. `List of possible values
  <https://docs.python.org/3/library/codecs.html#error-handlers>`_ .

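
Illustration (not part of the commit): a sketch of two parameters touched in this hunk, assuming pandas is installed. The first call reproduces the ``comment='#'`` example from the docstring; the second uses invented European-style numbers to show single-character ``decimal`` and ``thousands`` values.

```python
from io import StringIO

import pandas as pd

# The docstring's own example: the fully commented first line is ignored,
# so header=0 picks up "a,b,c".
commented = pd.read_csv(StringIO("#empty\na,b,c\n1,2,3"), comment="#", header=0)

# Hypothetical European-style sample: ';' separates fields, ',' is the
# decimal point and '.' groups thousands.
european = pd.read_csv(
    StringIO("x;y\n1.234,5;2\n"),
    sep=";",
    decimal=",",
    thousands=".",
)

print(commented)
print(european)
```
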
@@ -361,7 +366,7 @@
  ``skipinitialspace``, ``quotechar``, and ``quoting``. If it is necessary to
  override values, a ``ParserWarning`` will be issued. See ``csv.Dialect``
  documentation for more details.
- on_bad_lines : {{'error', 'warn', 'skip'}} or callable, default 'error'
+ on_bad_lines : {{'error', 'warn', 'skip'}} or Callable, default 'error'
  Specifies what to do upon encountering a bad line (a line with too many fields).
  Allowed values are :

@@ -379,11 +384,11 @@
  If the function returns ``None``, the bad line will be ignored.
  If the function returns a new ``list`` of strings with more elements than
  expected, a ``ParserWarning`` will be emitted while dropping extra elements.
- Only supported when ``engine="python"``
+ Only supported when ``engine='python'``

  delim_whitespace : bool, default False
  Specifies whether or not whitespace (e.g. ``' '`` or ``'\\t'``) will be
- used as the sep. Equivalent to setting ``sep='\\s+'``. If this option
+ used as the ``sep`` delimiter. Equivalent to setting ``sep='\\s+'``. If this option
  is set to ``True``, nothing should be passed in for the ``delimiter``
  parameter.
  low_memory : bool, default True
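
Illustration (not part of the commit): a sketch of a callable ``on_bad_lines`` handler and of the ``sep='\s+'`` spelling that the docstring gives as the equivalent of ``delim_whitespace=True``, assuming pandas 1.4 or later. The sample data and the truncating handler are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the last row has one field too many.
bad = "a,b\n1,2\n3,4,5\n"

# The callable receives the bad line as a list of strings and returns a
# repaired list (or None to drop the line). Here it keeps the first two fields.
repaired = pd.read_csv(
    StringIO(bad),
    on_bad_lines=lambda fields: fields[:2],
    engine="python",
)

# Whitespace-delimited sample parsed with a regex separator.
spaced = pd.read_csv(StringIO("a  b\tc\n1 2   3\n"), sep=r"\s+")

print(repaired)
print(spaced)
```
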
@@ -397,7 +402,7 @@
  If a filepath is provided for ``filepath_or_buffer``, map the file object
  directly onto memory and access the data directly from there. Using this
  option can improve performance because there is no longer any I/O overhead.
- float_precision : str, optional
+ float_precision : {{'high', 'legacy', 'round_trip'}}, optional
  Specifies which converter the C engine should use for floating-point
  values. The options are ``None`` or ``'high'`` for the ordinary converter,
  ``'legacy'`` for the original lower precision ``pandas`` converter, and
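
Illustration (not part of the commit): a sketch of ``float_precision='round_trip'``, one of the three values now listed in the type description, assuming pandas is installed. The long decimal literal is an arbitrary test value.

```python
from io import StringIO

import pandas as pd

# Arbitrary long decimal literal used as the single data value.
text = "0.3066101993807095471566981359501369297504425048828125"

df = pd.read_csv(StringIO("x\n" + text + "\n"), float_precision="round_trip")

# With the round-trip converter the parsed value should match Python's own
# float() conversion of the same text.
parsed = df["x"].iloc[0]
print(parsed == float(text))
```
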
@@ -409,13 +414,14 @@

  .. versionadded:: 1.2

- dtype_backend : {{"numpy_nullable", "pyarrow"}}, defaults to NumPy backed DataFrame
- Which ``dtype_backend`` to use, e.g. whether a :class:`~pandas.DataFrame` should
- have NumPy arrays, nullable ``dtypes`` are used for all ``dtypes`` that have a
- nullable implementation when ``"numpy_nullable"`` is set, pyarrow is used for all
- dtypes if ``"pyarrow"`` is set.
+ dtype_backend : {{'numpy_nullable', 'pyarrow'}}, defaults to NumPy backed DataFrame
+ Back-end data type to use for the :class:`~pandas.DataFrame`. For
+ ``'numpy_nullable'``, have NumPy arrays, nullable ``dtypes`` are used for all
+ ``dtypes`` that have a
+ nullable implementation when ``'numpy_nullable'`` is set, pyarrow is used for all
+ dtypes if ``'pyarrow'`` is set.

- The ``dtype_backends`` are still experimential.
+ The ``dtype_backends`` are still experimental.

  .. versionadded:: 2.0

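
Illustration (not part of the commit): a sketch of ``dtype_backend='numpy_nullable'``, assuming pandas 2.0 or later. The sample data, with one missing integer, is invented to show the difference from the default NumPy-backed result.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: column "a" has a missing value in the second row.
data = "a,b\n1,x\n,y\n"

# Default: the missing integer forces the column to float64 with NaN.
default = pd.read_csv(StringIO(data))

# Nullable backend: the column stays integer-typed and the gap becomes <NA>.
nullable = pd.read_csv(StringIO(data), dtype_backend="numpy_nullable")

print(default.dtypes)
print(nullable.dtypes)
```
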
