@@ -101,7 +101,8 @@
  By file-like object, we refer to objects with a ``read()`` method, such as
  a file handle (e.g. via builtin ``open`` function) or ``StringIO``.
  sep : str, default {_default_sep}
- Delimiter to use. If ``sep=None``, the C engine cannot automatically detect
+ Character or regex pattern to treat as the delimiter. If ``sep=None``, the
+ C engine cannot automatically detect
  the separator, but the Python parsing engine can, meaning the latter will
  be used and automatically detect the separator from only the first valid
  row of the file by Python's builtin sniffer tool, ``csv.Sniffer``.
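The `sep=None` sniffing and regex separators described in this hunk can be tried directly. This is a minimal sketch that assumes pandas is installed and uses made-up data in an in-memory `io.StringIO` buffer. It first calls `csv.Sniffer` by hand for comparison and then lets the Python engine detect the delimiter itself.

```python
import csv
import io

import pandas as pd

data = "a;b;c\n1;2;3\n4;5;6\n"

# The stdlib sniffer that the Python engine relies on when sep=None.
dialect = csv.Sniffer().sniff(data)

# sep=None: the Python engine sniffs the delimiter from the first valid row.
sniffed = pd.read_csv(io.StringIO(data), sep=None, engine="python")

# A regex separator (comma or semicolon, followed by optional whitespace)
# is likewise handled by the Python engine.
regex = pd.read_csv(io.StringIO("a, b;c\n1, 2;3\n"), sep=r"[,;]\s*", engine="python")

print(dialect.delimiter)          # ;
print(sniffed.columns.tolist())   # ['a', 'b', 'c']
```

Passing `engine="python"` explicitly avoids the `ParserWarning` that pandas emits when it has to fall back from the C engine on its own.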
@@ -111,9 +112,9 @@
  to ignoring quoted data. Regex example: ``'\r\t'``.
  delimiter : str, optional
  Alias for ``sep``.
- header : int, list of int, None, default 'infer'
- Row number(s) to use as the column names, and the start of the
- data. Default behavior is to infer the column names: if no ``names``
+ header : int, Sequence of int, 'infer' or None, default 'infer'
+ Row number(s) containing column labels and marking the start of the
+ data (zero-indexed). Default behavior is to infer the column names: if no ``names``
  are passed the behavior is identical to ``header=0`` and column
  names are inferred from the first line of the file, if column
  names are passed explicitly to ``names`` then the behavior is identical to
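The zero-indexed `header` row and the `header=None` case can be illustrated with a small sketch. It assumes pandas and uses invented sample data in which line 0 is a preamble and line 1 holds the labels.

```python
import io

import pandas as pd

# Line 0 is a preamble; line 1 (zero-indexed) holds the column labels.
text = "x,y\na,b\n1,2\n3,4\n"
labelled = pd.read_csv(io.StringIO(text), header=1)

# header=None: no row supplies labels, so integer labels are assigned.
unlabelled = pd.read_csv(io.StringIO("1,2\n3,4\n"), header=None)

print(labelled.columns.tolist())    # ['a', 'b']
print(unlabelled.columns.tolist())  # [0, 1]
```

Rows above the chosen header row are discarded, which is why `x,y` does not appear in the result.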
@@ -125,20 +126,21 @@
  parameter ignores commented lines and empty lines if
  ``skip_blank_lines=True``, so ``header=0`` denotes the first line of
  data rather than the first line of the file.
- names : array-like, optional
- List of column names to use. If the file contains a header row,
+ names : Sequence of Hashable, optional
+ Sequence of column labels to apply. If the file contains a header row,
  then you should explicitly pass ``header=0`` to override the column names.
  Duplicates in this list are not allowed.
- index_col : int, str, sequence of int / str, or False, optional
- Column(s) to use as the row labels of the :class:`~pandas.DataFrame`, either given as
- string name or column index. If a sequence of ``int`` / ``str`` is given, a
- :class:`~pandas.MultiIndex` is used.
+ index_col : Hashable, Sequence of Hashable or False, optional
+ Column(s) to use as row label(s), denoted either by column labels or column
+ indices. If a sequence of labels or indices is given, :class:`~pandas.MultiIndex`
+ will be formed for the row labels.

  Note: ``index_col=False`` can be used to force ``pandas`` to *not* use the first
- column as the index, e.g. when you have a malformed file with delimiters at
+ column as the index, e.g., when you have a malformed file with delimiters at
  the end of each line.
- usecols : list-like or callable, optional
- Return a subset of the columns. If list-like, all elements must either
+ usecols : list of Hashable or Callable, optional
+ Subset of columns to select, denoted either by column labels or column indices.
+ If list-like, all elements must either
  be positional (i.e. integer indices into the document columns) or strings
  that correspond to column names provided either by the user in ``names`` or
  inferred from the document header row(s). If ``names`` are given, the document
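The `names`, `index_col` and `usecols` semantics in this hunk fit into one short sketch. It assumes pandas and uses made-up data. The trailing-delimiter case mirrors the malformed file described in the `index_col=False` note.

```python
import io

import pandas as pd

text = "id,name,score\n1,ann,90\n2,bob,85\n"

# names + header=0: the file's own header row is replaced by these labels.
renamed = pd.read_csv(io.StringIO(text), header=0, names=["key", "who", "pts"])

# index_col and usecols given as column labels.
scores = pd.read_csv(io.StringIO(text), index_col="id", usecols=["id", "score"])

# A sequence passed to index_col produces MultiIndex row labels.
multi = pd.read_csv(io.StringIO(text), index_col=["id", "name"])

# usecols as a callable, evaluated against each column label.
picked = pd.read_csv(
    io.StringIO(text), usecols=lambda col: col.upper() in {"NAME", "SCORE"}
)

# index_col=False: each data line ends with a stray delimiter, and the
# first column should still not become the index.
trailing = pd.read_csv(
    io.StringIO("a,b,c\n4,apple,bat,\n8,orange,cow,\n"), index_col=False
)
```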
@@ -156,9 +158,9 @@
  example of a valid callable argument would be ``lambda x: x.upper() in
  ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster
  parsing time and lower memory usage.
- dtype : Type name or dict of column -> type, optional
- Data type for data or columns. E.g., ``{{'a': np.float64, 'b': np.int32,
- 'c': 'Int64'}}``
+ dtype : dtype or dict of {{Hashable : dtype}}, optional
+ Data type(s) to apply to either the whole dataset or individual columns.
+ E.g., ``{{'a': np.float64, 'b': np.int32, 'c': 'Int64'}}``
  Use ``str`` or ``object`` together with suitable ``na_values`` settings
  to preserve and not interpret ``dtype``.
  If ``converters`` are specified, they will be applied INSTEAD
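The docstring's own `dtype` mapping can be exercised directly. This sketch assumes pandas and NumPy, and the data, including the extra `zip` column, is invented for illustration. The second call shows `str` combined with `keep_default_na=False`, one way to "preserve and not interpret" raw values.

```python
import io

import numpy as np
import pandas as pd

text = "a,b,c,zip\n1,2,3,01234\n4,5,,00501\n"

# Per-column dtypes keyed by column label; 'Int64' is a nullable integer dtype.
typed = pd.read_csv(
    io.StringIO(text),
    dtype={"a": np.float64, "b": np.int32, "c": "Int64", "zip": str},
)

# One dtype for the whole dataset; keep_default_na=False keeps empty fields
# as empty strings instead of turning them into NaN.
as_text = pd.read_csv(io.StringIO(text), dtype=str, keep_default_na=False)
```

Declaring `zip` as `str` keeps its leading zeros, which an integer dtype would drop.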
@@ -176,18 +178,18 @@

  .. versionadded:: 1.4.0

- The "pyarrow" engine was added as an *experimental* engine, and some features
+ The 'pyarrow' engine was added as an *experimental* engine, and some features
  are unsupported, or may not work correctly, with this engine.
- converters : dict, optional
- ``dict`` of functions for converting values in certain columns. Keys can either
- be integers or column labels.
+ converters : dict of {{Hashable : Callable}}, optional
+ Functions for converting values in specified columns. Keys can either
+ be column labels or column indices.
  true_values : list, optional
- Values to consider as ``True`` in addition to case-insensitive variants of "True".
+ Values to consider as ``True`` in addition to case-insensitive variants of 'True'.
  false_values : list, optional
- Values to consider as ``False`` in addition to case-insensitive variants of "False".
+ Values to consider as ``False`` in addition to case-insensitive variants of 'False'.
  skipinitialspace : bool, default False
  Skip spaces after delimiter.
- skiprows : list-like, int or callable, optional
+ skiprows : int, list of int or Callable, optional
  Line numbers to skip (0-indexed) or number of lines to skip (``int``)
  at the start of the file.

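`converters`, `true_values`/`false_values` and both the integer and callable forms of `skiprows` can be shown on one small invented file. This sketch assumes pandas. The converter is keyed by column label here, although a column index would also be accepted.

```python
import io

import pandas as pd

text = "exported by tool\nname,active,amount\n ann ,yes,1\nbob,no,2\n"

df = pd.read_csv(
    io.StringIO(text),
    skiprows=1,                      # int: drop the first line of the file
    converters={"name": str.strip},  # applied to the raw string values
    true_values=["yes"],
    false_values=["no"],
)

# Callable form: receives each 0-indexed line number, returns True to skip it.
# Here line 0 (the header) is kept and the even-numbered data lines are dropped.
odd_lines = pd.read_csv(
    io.StringIO("x\n1\n2\n3\n4\n"), skiprows=lambda i: i > 0 and i % 2 == 0
)
```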
@@ -198,7 +200,7 @@
  Number of lines at bottom of file to skip (Unsupported with ``engine='c'``).
  nrows : int, optional
  Number of rows of file to read. Useful for reading pieces of large files.
- na_values : scalar, str, list-like, or dict, optional
+ na_values : Hashable, Iterable of Hashable or dict of {{Hashable : Iterable}}, optional
  Additional strings to recognize as ``NA``/``NaN``. If ``dict`` passed, specific
  per-column ``NA`` values. By default the following values are interpreted as
  ``NaN``: '"""
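The difference between a list and a dict for `na_values` is easiest to see side by side. This sketch assumes pandas and uses made-up data. The default markers such as `"NA"` still apply in both calls because `keep_default_na` is left at its default.

```python
import io

import pandas as pd

text = "a,b\n-1,-1\n2,NA\n"

# List: -1 is treated as missing in every column.
everywhere = pd.read_csv(io.StringIO(text), na_values=[-1])

# dict: -1 is treated as missing only in column 'a'.
per_column = pd.read_csv(io.StringIO(text), na_values={"a": [-1]})
```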
@@ -227,7 +229,7 @@
  Indicate number of ``NA`` values placed in non-numeric columns.
  skip_blank_lines : bool, default True
  If ``True``, skip over blank lines rather than interpreting as ``NaN`` values.
- parse_dates : bool or list of int or names or list of lists or dict, \
+ parse_dates : bool, list of Hashable, list of lists or dict of {{Hashable : list}}, \
  default False
  The behavior is as follows:

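This sketch covers the two simplest `parse_dates` forms, a list of labels and `True` combined with `index_col`. It assumes pandas and uses invented ISO-formatted dates. The list-of-lists and dict forms, which combine several columns, are not shown because later pandas releases deprecate them.

```python
import io

import pandas as pd

text = "date,value\n2023-01-05,1\n2023-02-10,2\n"

# List of labels: parse those columns as datetimes.
by_column = pd.read_csv(io.StringIO(text), parse_dates=["date"])

# True: parse the index (here the first column, selected with index_col=0).
by_index = pd.read_csv(io.StringIO(text), index_col=0, parse_dates=True)
```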
@@ -258,7 +260,7 @@
  keep_date_col : bool, default False
  If ``True`` and ``parse_dates`` specifies combining multiple columns then
  keep the original columns.
- date_parser : function, optional
+ date_parser : Callable, optional
  Function to use for converting a sequence of string columns to an array of
  ``datetime`` instances. The default uses ``dateutil.parser.parser`` to do the
  conversion. ``pandas`` will try to call ``date_parser`` in three different ways,
@@ -273,9 +275,9 @@
  Use ``date_format`` instead, or read in as ``object`` and then apply
  :func:`~pandas.to_datetime` as-needed.
  date_format : str or dict of column -> format, optional
- If used in conjunction with ``parse_dates``, will parse dates according to this
- format. For anything more complex,
- please read in as ``object`` and then apply :func:`~pandas.to_datetime` as-needed.
+ Format to use for parsing dates when used in conjunction with ``parse_dates``.
+ For anything more complex, please read in as ``object`` and then apply
+ :func:`~pandas.to_datetime` as-needed.

  .. versionadded:: 2.0.0
  dayfirst : bool, default False
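`date_format` together with `parse_dates`, and the suggested fallback of reading as `object` and calling `pandas.to_datetime`, should give the same result. This sketch assumes pandas 2.0 or later, since `date_format` was added in 2.0.0, and uses invented day/month/year strings.

```python
import io

import pandas as pd

text = "when,value\n05/01/2023,1\n10/02/2023,2\n"

# date_format spells out the exact layout (day/month/year here).
parsed = pd.read_csv(io.StringIO(text), parse_dates=["when"], date_format="%d/%m/%Y")

# Fallback: read the column as object, then convert it explicitly.
raw = pd.read_csv(io.StringIO(text), dtype={"when": object})
converted = pd.to_datetime(raw["when"], format="%d/%m/%Y")
```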
@@ -306,50 +308,53 @@

  .. versionchanged:: 1.4.0 Zstandard support.

- thousands : str, optional
- Thousands separator.
- decimal : str, default '.'
- Character to recognize as decimal point (e.g. use ',' for European data).
+ thousands : str (length 1), optional
+ Character acting as the thousands separator in numerical values.
+ decimal : str (length 1), default '.'
+ Character to recognize as decimal point (e.g., use ',' for European data).
  lineterminator : str (length 1), optional
- Character to break file into lines. Only valid with C parser.
+ Character used to denote a line break. Only valid with C parser.
  quotechar : str (length 1), optional
- The character used to denote the start and end of a quoted item. Quoted
+ Character used to denote the start and end of a quoted item. Quoted
  items can include the ``delimiter`` and it will be ignored.
- quoting : int or csv.QUOTE_* instance, default 0
- Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of
- ``QUOTE_MINIMAL`` (0), ``QUOTE_ALL`` (1), ``QUOTE_NONNUMERIC`` (2) or
- ``QUOTE_NONE`` (3).
+ quoting : {{0 or csv.QUOTE_MINIMAL, 1 or csv.QUOTE_ALL, 2 or csv.QUOTE_NONNUMERIC, \
+ 3 or csv.QUOTE_NONE}}, default csv.QUOTE_MINIMAL
+ Control field quoting behavior per ``csv.QUOTE_*`` constants. Default is
+ ``csv.QUOTE_MINIMAL`` (i.e., 0) which implies that only fields containing special
+ characters are quoted (e.g., characters defined in ``quotechar``, ``delimiter``,
+ or ``lineterminator``).
  doublequote : bool, default True
  When ``quotechar`` is specified and ``quoting`` is not ``QUOTE_NONE``, indicate
  whether or not to interpret two consecutive ``quotechar`` elements INSIDE a
  field as a single ``quotechar`` element.
  escapechar : str (length 1), optional
- One-character string used to escape other characters.
- comment : str, optional
- Indicates remainder of line should not be parsed. If found at the beginning
+ Character used to escape other characters.
+ comment : str (length 1), optional
+ Character indicating that the remainder of the line should not be parsed.
+ If found at the beginning
  of a line, the line will be ignored altogether. This parameter must be a
  single character. Like empty lines (as long as ``skip_blank_lines=True``),
  fully commented lines are ignored by the parameter ``header`` but not by
  ``skiprows``. For example, if ``comment='#'``, parsing
  ``#empty\\na,b,c\\n1,2,3`` with ``header=0`` will result in ``'a,b,c'`` being
  treated as the header.
- encoding : str, optional, default "utf-8"
+ encoding : str, optional, default 'utf-8'
  Encoding to use for UTF when reading/writing (ex. ``'utf-8'``). `List of Python
  standard encodings
  <https://docs.python.org/3/library/codecs.html#standard-encodings>`_ .

  .. versionchanged:: 1.2

- When ``encoding`` is ``None``, ``errors="replace"`` is passed to
- ``open()``. Otherwise, ``errors="strict"`` is passed to ``open()``.
- This behavior was previously only the case for ``engine="python"``.
+ When ``encoding`` is ``None``, ``errors='replace'`` is passed to
+ ``open()``. Otherwise, ``errors='strict'`` is passed to ``open()``.
+ This behavior was previously only the case for ``engine='python'``.

  .. versionchanged:: 1.3.0

  ``encoding_errors`` is a new argument. ``encoding`` has no longer an
  influence on how encoding errors are handled.

- encoding_errors : str, optional, default "strict"
+ encoding_errors : str, optional, default 'strict'
  How encoding errors are treated. `List of possible values
  <https://docs.python.org/3/library/codecs.html#error-handlers>`_ .

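Several of the single-character options in this hunk usually appear together in "European" exports. This sketch assumes pandas and uses invented data. The file uses `;` as the separator, `.` for thousands and `,` for decimals, and it contains a commented preamble and a quoted field that includes the delimiter. The last lines decode Latin-1 bytes as UTF-8 with `encoding_errors="replace"` to show substitution instead of a `UnicodeDecodeError`.

```python
import csv
import io

import pandas as pd

text = (
    "# exported prices\n"
    "item;price\n"
    '"Widget; large";1.234,5\n'
    "Gadget;99,0\n"
)

prices = pd.read_csv(
    io.StringIO(text),
    sep=";",
    comment="#",                # the fully commented first line is ignored
    thousands=".",              # '.' groups thousands ...
    decimal=",",                # ... and ',' marks the decimal point
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,  # the default: only fields that need it are quoted
)

# The same bytes read with the right encoding, and with the wrong one plus
# encoding_errors='replace' (undecodable bytes become U+FFFD).
raw = "name\ncafé\n".encode("latin-1")
latin = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
lossy = pd.read_csv(io.BytesIO(raw), encoding="utf-8", encoding_errors="replace")
```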
@@ -361,7 +366,7 @@
  ``skipinitialspace``, ``quotechar``, and ``quoting``. If it is necessary to
  override values, a ``ParserWarning`` will be issued. See ``csv.Dialect``
  documentation for more details.
- on_bad_lines : {{'error', 'warn', 'skip'}} or callable, default 'error'
+ on_bad_lines : {{'error', 'warn', 'skip'}} or Callable, default 'error'
  Specifies what to do upon encountering a bad line (a line with too many fields).
  Allowed values are :

@@ -379,11 +384,11 @@
  If the function returns ``None``, the bad line will be ignored.
  If the function returns a new ``list`` of strings with more elements than
  expected, a ``ParserWarning`` will be emitted while dropping extra elements.
- Only supported when ``engine="python"``
+ Only supported when ``engine='python'``

  delim_whitespace : bool, default False
  Specifies whether or not whitespace (e.g. ``' '`` or ``'\\t'``) will be
- used as the sep. Equivalent to setting ``sep='\\s+'``. If this option
+ used as the ``sep`` delimiter. Equivalent to setting ``sep='\\s+'``. If this option
  is set to ``True``, nothing should be passed in for the ``delimiter``
  parameter.
  low_memory : bool, default True
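The `on_bad_lines` options and the whitespace separator can be checked on tiny invented inputs. This sketch assumes pandas 1.4 or later, because the callable form of `on_bad_lines` requires it and works only with the Python engine. `truncate` is a hypothetical helper name. The whitespace case uses `sep=r"\s+"`, which the docstring gives as equivalent to `delim_whitespace=True`.

```python
from __future__ import annotations

import io

import pandas as pd

text = "a,b\n1,2\n3,4,5\n6,7\n"  # the third line has one field too many

# 'skip': drop bad lines without raising or warning.
skipped = pd.read_csv(io.StringIO(text), on_bad_lines="skip")


def truncate(bad_line: list[str]) -> list[str] | None:
    """Hypothetical handler: keep the first two fields of a bad line."""
    return bad_line[:2]


# Callable form: return a repaired list of fields, or None to drop the line.
repaired = pd.read_csv(io.StringIO(text), on_bad_lines=truncate, engine="python")

# Any run of spaces or tabs acts as a single delimiter.
spaced = pd.read_csv(io.StringIO("a  b\tc\n1 2    3\n"), sep=r"\s+")
```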
@@ -397,7 +402,7 @@
  If a filepath is provided for ``filepath_or_buffer``, map the file object
  directly onto memory and access the data directly from there. Using this
  option can improve performance because there is no longer any I/O overhead.
- float_precision : {{'high', 'legacy', 'round_trip'}}, optional
+ float_precision : {{'high', 'legacy', 'round_trip'}}, optional
  Specifies which converter the C engine should use for floating-point
  values. The options are ``None`` or ``'high'`` for the ordinary converter,
  ``'legacy'`` for the original lower precision ``pandas`` converter, and
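A quick check of `float_precision="round_trip"` compares the parsed values with Python's own `float()` on the same strings. This sketch assumes pandas and uses arbitrary sample values. It does not compare `'high'` against `'legacy'`, because the docstring does not state their exact last-digit behavior.

```python
import io

import pandas as pd

values = ["0.1", "1.0000000000000002", "3.14159265358979"]
text = "x\n" + "\n".join(values) + "\n"

# 'round_trip' delegates to Python's float parsing, so each value should
# equal float(s) exactly.
df = pd.read_csv(io.StringIO(text), float_precision="round_trip")
```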
@@ -409,13 +414,13 @@

  .. versionadded:: 1.2

- dtype_backend : {{"numpy_nullable", "pyarrow"}}, defaults to NumPy backed DataFrame
- Which ``dtype_backend`` to use, e.g. whether a :class:`~pandas.DataFrame` should
- have NumPy arrays, nullable ``dtypes`` are used for all ``dtypes`` that have a
- nullable implementation when ``"numpy_nullable"`` is set, pyarrow is used for all
- dtypes if ``"pyarrow"`` is set.
+ dtype_backend : {{'numpy_nullable', 'pyarrow'}}, defaults to NumPy backed DataFrame
+ Back-end data type to use for the :class:`~pandas.DataFrame`. If
+ ``'numpy_nullable'`` is set, nullable ``dtypes`` are used for all ``dtypes``
+ that have a nullable implementation. If ``'pyarrow'`` is set, pyarrow is used
+ for all ``dtypes``.

- The ``dtype_backends`` are still experimential.
+ The ``dtype_backends`` are still experimental.

  .. versionadded:: 2.0

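The effect of `dtype_backend="numpy_nullable"` shows most clearly on a column with a missing integer. This sketch assumes pandas 2.0 or later and uses invented data. The `'pyarrow'` backend is not exercised here because it also requires the optional `pyarrow` package.

```python
import io

import pandas as pd

text = "a,b\n1,x\n,y\n"

# Default NumPy-backed result: the missing value forces column 'a' to float64.
default = pd.read_csv(io.StringIO(text))

# Nullable backend: 'a' becomes Int64 with <NA>, and 'b' a string dtype.
nullable = pd.read_csv(io.StringIO(text), dtype_backend="numpy_nullable")
```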