
Commit 8a9e643

gfyoung authored and jreback committed
ENH: Python parser now accepts delim_whitespace=True
Title is self-explanatory.

Author: gfyoung <[email protected]>

Closes #12958 from gfyoung/delim-whitespace-python and squashes the following commits:

08da127 [gfyoung] ENH: Python parser now accepts delim_whitespace=True
1 parent ed324e8 commit 8a9e643


4 files changed (+172 -165 lines)


doc/source/io.rst

+3 -2
@@ -101,8 +101,9 @@ delim_whitespace : boolean, default False
 Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``)
 will be used as the delimiter. Equivalent to setting ``sep='\s+'``.
 If this option is set to True, nothing should be passed in for the
-``delimiter`` parameter. This parameter is currently supported for
-the C parser only.
+``delimiter`` parameter.
+
+.. versionadded:: 0.18.1 support for the Python parser.
 
 Column and Index Locations and Names
 ++++++++++++++++++++++++++++++++++++
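The documentation above describes ``delim_whitespace=True`` as equivalent to ``sep='\s+'``: any run of spaces or tabs separates two fields. As a rough, stdlib-only illustration (not pandas' tokenizer, which also deals with quoting, comments, and many other options), the hypothetical helper ``split_on_whitespace`` below splits each line on that regex and contrasts the result with a naive single-space split.

```python
import re

# Any run of one or more whitespace characters (spaces, tabs) acts as a
# single delimiter, mirroring the regex sep='\s+'.
WHITESPACE_RUN = re.compile(r"\s+")


def split_on_whitespace(text):
    """Hypothetical helper: tokenize whitespace-delimited text into rows.

    Leading/trailing whitespace is stripped first so it does not produce
    empty fields, and blank lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            rows.append(WHITESPACE_RUN.split(stripped))
    return rows


data = "a  b\tc\n1 2   3\n4\t5 6\n"

rows = split_on_whitespace(data)
# A naive split on a single space keeps empty strings and embedded tabs.
naive = [line.split(" ") for line in data.splitlines()]

print(rows)      # [['a', 'b', 'c'], ['1', '2', '3'], ['4', '5', '6']]
print(naive[0])  # ['a', '', 'b\tc']
```

The mixed spaces and tabs in ``data`` are exactly the case where a fixed single-character delimiter breaks down and a whitespace-run delimiter does not.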

doc/source/whatsnew/v0.18.1.txt

+1
@@ -74,6 +74,7 @@ Partial string indexing now matches on ``DateTimeIndex`` when part of a ``MultiI
 Other Enhancements
 ^^^^^^^^^^^^^^^^^^
 
+- ``pd.read_csv()`` now supports ``delim_whitespace=True`` for the Python engine (:issue:`12958`)
 - ``pd.read_csv()`` now supports opening ZIP files that contain a single CSV, via extension inference or explicit ``compression='zip'`` (:issue:`12175`)
 - ``pd.read_csv()`` now supports opening files using xz compression, via extension inference or when ``compression='xz'`` is explicitly specified; ``xz`` compression is also supported by ``DataFrame.to_csv`` in the same way (:issue:`11852`)
 - ``pd.read_msgpack()`` now always gives writeable ndarrays even when compression is used (:issue:`12359`).

pandas/io/parsers.py

+28 -4
@@ -57,7 +57,10 @@
     Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be
     used as the sep. Equivalent to setting ``sep='\s+'``. If this option
     is set to True, nothing should be passed in for the ``delimiter``
-    parameter. This parameter is currently supported for the C parser only.
+    parameter.
+
+    .. versionadded:: 0.18.1 support for the Python parser.
+
 header : int or list of ints, default 'infer'
     Row number(s) to use as the column names, and the start of the data.
     Default behavior is as if set to 0 if no ``names`` passed, otherwise
@@ -390,7 +393,20 @@ def _read(filepath_or_buffer, kwds):
 }
 
 _c_unsupported = set(['skip_footer'])
-_python_unsupported = set(_c_parser_defaults.keys())
+_python_unsupported = set([
+    'as_recarray',
+    'na_filter',
+    'compact_ints',
+    'use_unsigned',
+    'low_memory',
+    'memory_map',
+    'buffer_lines',
+    'error_bad_lines',
+    'warn_bad_lines',
+    'dtype',
+    'decimal',
+    'float_precision',
+])
 
 
 def _make_parser_function(name, sep=','):
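Before this commit, ``_python_unsupported`` was derived as every key of ``_c_parser_defaults``, so any C-parser option, ``delim_whitespace`` included, was rejected by the Python engine. The new code spells the unsupported options out explicitly. The sketch below models that change with plain sets. The old key set is an assumption (the new list plus ``delim_whitespace``), because the diff does not show ``_c_parser_defaults`` itself.

```python
# Assumed key set of _c_parser_defaults at the time (not shown in the diff):
# the new explicit list below plus delim_whitespace.
C_PARSER_OPTIONS = {
    "delim_whitespace", "as_recarray", "na_filter", "compact_ints",
    "use_unsigned", "low_memory", "memory_map", "buffer_lines",
    "error_bad_lines", "warn_bad_lines", "dtype", "decimal",
    "float_precision",
}

# Before: every C-parser option counted as unsupported by the Python engine.
old_python_unsupported = set(C_PARSER_OPTIONS)

# After: the explicit list copied from the diff.
new_python_unsupported = {
    "as_recarray", "na_filter", "compact_ints", "use_unsigned",
    "low_memory", "memory_map", "buffer_lines", "error_bad_lines",
    "warn_bad_lines", "dtype", "decimal", "float_precision",
}

# Options the Python engine now accepts that it previously rejected.
newly_supported = old_python_unsupported - new_python_unsupported
print(sorted(newly_supported))  # ['delim_whitespace']
```

Under that assumption the only behavioral difference is ``delim_whitespace``. A side effect of the explicit list is that a C-parser option added later would no longer be rejected by the Python engine automatically; it has to be added to ``_python_unsupported`` deliberately.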
@@ -647,8 +663,13 @@ def _get_options_with_defaults(self, engine):
                 value = kwds[argname]
 
                 if engine != 'c' and value != default:
-                    raise ValueError('The %r option is not supported with the'
-                                     ' %r engine' % (argname, engine))
+                    if ('python' in engine and
+                            argname not in _python_unsupported):
+                        pass
+                    else:
+                        raise ValueError(
+                            'The %r option is not supported with the'
+                            ' %r engine' % (argname, engine))
             else:
                 value = default
             options[argname] = value
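The rewritten check in ``_get_options_with_defaults`` raises only when a non-default C-parser option reaches an engine that cannot honor it. A standalone sketch of that decision follows; ``check_option`` is a hypothetical name, the unsupported set is abridged, and the real code runs inside a loop over ``_c_parser_defaults`` in the reader class. The diff's nested ``if ...: pass`` / ``else: raise`` is collapsed into one negated condition, which should be logically equivalent.

```python
# Abridged, hypothetical stand-in for _python_unsupported; the real set
# in the diff has more entries.
PYTHON_UNSUPPORTED = {"low_memory", "dtype", "decimal", "float_precision"}


def check_option(argname, value, default, engine):
    """Return value if the engine can honor it, otherwise raise ValueError.

    Only non-default values are checked. The C engine accepts every option;
    engines whose name contains 'python' (e.g. 'python', 'python-fwf')
    accept any option that is not in PYTHON_UNSUPPORTED.
    """
    if engine != "c" and value != default:
        python_ok = "python" in engine and argname not in PYTHON_UNSUPPORTED
        if not python_ok:
            raise ValueError("The %r option is not supported with the"
                             " %r engine" % (argname, engine))
    return value


# delim_whitespace is not listed as unsupported, so the Python engine accepts it.
print(check_option("delim_whitespace", True, False, "python"))  # True

# low_memory is still C-only, so a non-default value is rejected.
try:
    check_option("low_memory", False, True, "python")
except ValueError as exc:
    print(exc)  # The 'low_memory' option is not supported with the 'python' engine
```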
@@ -691,6 +712,9 @@ def _clean_options(self, options, engine):
                                   " different from '\s+' are"\
                                   " interpreted as regex)"
                 engine = 'python'
+        elif delim_whitespace:
+            if 'python' in engine:
+                result['delimiter'] = '\s+'
 
         if fallback_reason and engine_specified:
             raise ValueError(fallback_reason)
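In ``_clean_options``, the new ``elif delim_whitespace:`` branch turns ``delim_whitespace=True`` into the regex delimiter ``'\s+'`` when a Python engine is in use, so the Python parser can treat it like any other regex separator. The sketch below is a hypothetical reduction (``clean_delimiter`` is not a pandas function) that keeps only the two relevant keys and stubs out the other branches. It uses the raw string ``r'\s+'``, since newer Python versions warn about the invalid escape in a plain ``'\s+'``.

```python
import re


def clean_delimiter(options, engine):
    """Hypothetical reduction of the delimiter logic in _clean_options.

    Only the branch added by this commit is implemented; the fallback and
    regex-separator branches of the real method are stubbed out.
    """
    result = dict(options)
    sep = options["delimiter"]
    delim_whitespace = options["delim_whitespace"]

    if sep is None and not delim_whitespace:
        pass  # real code: the C engine may fall back to 'python' here
    elif sep is not None and len(sep) > 1:
        pass  # real code: '\s+' handling for C, regex fallback otherwise
    elif delim_whitespace and "python" in engine:
        # Added by this commit: express delim_whitespace as a regex
        # delimiter that the Python parser can use directly.
        result["delimiter"] = r"\s+"
    return result


opts = clean_delimiter({"delimiter": None, "delim_whitespace": True}, "python")
fields = re.split(opts["delimiter"], "x   y\tz")
print(opts["delimiter"], fields)  # \s+ ['x', 'y', 'z']

# The C engine handles delim_whitespace natively, so its delimiter is untouched.
c_opts = clean_delimiter({"delimiter": None, "delim_whitespace": True}, "c")
print(c_opts["delimiter"])  # None
```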
