Commit b3b166a

gfyoung authored and jreback committed
BUG, DOC: Allow custom line terminator with delim_whitespace=True
Title is self-explanatory. Closes #12912.

Author: gfyoung

Closes #12939 from gfyoung/delim-whitespace-fix and squashes the following commits:

78cf922 [gfyoung] MAINT: Refactor C engine tokenizing
62d6260 [gfyoung] BUG: Parse custom terminator with whitespace delimiter
fdbc768 [gfyoung] DOC: Add documentation for delim_whitespace
1 parent 1617244 commit b3b166a

5 files changed: +218 additions, -769 deletions


doc/source/io.rst

+6
@@ -97,6 +97,12 @@ sep : str, defaults to ``','`` for :func:`read_csv`, ``\t`` for :func:`read_tabl
   Regex example: ``'\\r\\t'``.
 delimiter : str, default ``None``
   Alternative argument name for sep.
+delim_whitespace : boolean, default False
+  Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``)
+  will be used as the delimiter. Equivalent to setting ``sep='\s+'``.
+  If this option is set to True, nothing should be passed in for the
+  ``delimiter`` parameter. This parameter is currently supported for
+  the C parser only.
 
 Column and Index Locations and Names
 ++++++++++++++++++++++++++++++++++++
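Whitespace delimiting, as documented in this hunk, treats any run of spaces or tabs as a single field separator, which is what the regular expression `\s+` matches. The snippet below is only an illustration with Python's standard `re` module, not pandas' C tokenizer; `split_fields` is a hypothetical helper, and stripping leading/trailing whitespace first is an assumption made so that no empty fields appear at the edges of a line.

```python
import re

# One or more whitespace characters (spaces, tabs, ...) form a single separator.
WHITESPACE_SEP = re.compile(r"\s+")


def split_fields(line: str) -> list[str]:
    """Hypothetical helper: split one line into fields on runs of whitespace.

    Leading/trailing whitespace is stripped first so that it does not
    produce empty fields at the start or end of the line.
    """
    return WHITESPACE_SEP.split(line.strip())


print(split_fields("a  b\tc"))     # ['a', 'b', 'c']
print(split_fields(" 1\t\t2 3 "))  # ['1', '2', '3']
```

In pandas itself the same splitting is requested with `delim_whitespace=True` or, equivalently, `sep=r'\s+'`.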

doc/source/whatsnew/v0.18.1.txt

+1
@@ -302,6 +302,7 @@ Bug Fixes
 - Bug in ``value_counts`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)
 - Bug in ``Panel.fillna()`` ignoring ``inplace=True`` (:issue:`12633`)
 - Bug in ``read_csv`` when specifying ``names``, ``usecols``, and ``parse_dates`` simultaneously with the C engine (:issue:`9755`)
+- Bug in ``read_csv`` when specifying ``delim_whitespace=True`` and ``lineterminator`` simultaneously with the C engine (:issue:`12912`)
 - Bug in ``Series.rename``, ``DataFrame.rename`` and ``DataFrame.rename_axis`` not treating ``Series`` as mappings to relabel (:issue:`12623`).
 - Clean in ``.rolling.min`` and ``.rolling.max`` to enhance dtype handling (:issue:`12373`)
 - Bug in ``groupby`` where complex types are coerced to float (:issue:`12902`)

pandas/io/parsers.py

+6-1
@@ -51,8 +51,13 @@
     file. For file URLs, a host is expected. For instance, a local file could
     be file ://localhost/path/to/table.csv
 %s
-delimiter : str, default None
+delimiter : str, default ``None``
     Alternative argument name for sep.
+delim_whitespace : boolean, default False
+    Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be
+    used as the sep. Equivalent to setting ``sep='\s+'``. If this option
+    is set to True, nothing should be passed in for the ``delimiter``
+    parameter. This parameter is currently supported for the C parser only.
 header : int or list of ints, default 'infer'
     Row number(s) to use as the column names, and the start of the data.
     Default behavior is as if set to 0 if no ``names`` passed, otherwise
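The docstring states that `delim_whitespace=True` should not be combined with an explicit `delimiter`. One way such a constraint could be enforced is sketched below; `resolve_separator` is a hypothetical function, not pandas' own argument handling, and rejecting an explicit `sep` as well (treating `delimiter` as its alias) is an assumption of this sketch.

```python
from typing import Optional


def resolve_separator(sep: Optional[str] = None,
                      delimiter: Optional[str] = None,
                      delim_whitespace: bool = False,
                      default: str = ",") -> str:
    """Hypothetical helper: return the effective separator or raise
    ValueError when the options conflict."""
    # ``delimiter`` is documented as an alternative name for ``sep``,
    # so this sketch refuses to accept both at once.
    if sep is not None and delimiter is not None:
        raise ValueError("specify only one of sep and delimiter")
    explicit = sep if sep is not None else delimiter

    if delim_whitespace:
        # Whitespace mode must not be mixed with an explicit separator.
        if explicit is not None:
            raise ValueError("delim_whitespace=True cannot be combined "
                             "with an explicit sep/delimiter")
        # Documented as equivalent to a one-or-more-whitespace regex.
        return r"\s+"

    return explicit if explicit is not None else default


print(resolve_separator(delim_whitespace=True))  # \s+
print(resolve_separator(delimiter=";"))          # ;
```

Failing fast on the conflicting combination avoids silently preferring one option over the other.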

pandas/io/tests/test_parsers.py

+9
@@ -3878,6 +3878,15 @@ def test_buffer_rd_bytes(self):
         except Exception as e:
             pass
 
+    def test_delim_whitespace_custom_terminator(self):
+        # See gh-12912
+        data = """a b c~1 2 3~4 5 6~7 8 9"""
+        df = self.read_csv(StringIO(data), lineterminator='~',
+                           delim_whitespace=True)
+        expected = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
+                             columns=['a', 'b', 'c'])
+        tm.assert_frame_equal(df, expected)
+
 
 class TestCParserHighMemory(CParserTests, CompressionTests, tm.TestCase):
     engine = 'c'
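The new test expects `a b c~1 2 3~4 5 6~7 8 9` to parse into a header row and three data rows when `lineterminator='~'` is combined with whitespace delimiting. The C tokenizer changes are not shown on this page, so the following is only a pure-Python sketch of the expected behaviour, not the refactored C engine: a character-by-character loop in which the custom terminator ends a row and runs of spaces or tabs end a field. `tokenize` is a hypothetical name, and treating only space and tab as whitespace is an assumption.

```python
def tokenize(data: str, lineterminator: str = "\n") -> list[list[str]]:
    """Hypothetical sketch: split ``data`` into rows of fields.

    Rows end at the single-character ``lineterminator``; within a row,
    runs of spaces or tabs separate fields. Blank rows are skipped.
    """
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []

    def end_field() -> None:
        # Emit a field only if characters were collected, so consecutive
        # whitespace collapses into a single separator.
        if field:
            row.append("".join(field))
            field.clear()

    def end_row() -> None:
        end_field()
        if row:
            rows.append(row.copy())
            row.clear()

    for ch in data:
        # The terminator is checked first so that it always ends the row,
        # even if it is also one of the whitespace characters below.
        if ch == lineterminator:
            end_row()
        elif ch in " \t":
            end_field()
        else:
            field.append(ch)
    end_row()  # flush the last row when there is no trailing terminator
    return rows


# Same input as the test above: first row is the header.
header, *body = tokenize("a b c~1 2 3~4 5 6~7 8 9", lineterminator="~")
values = [[int(x) for x in r] for r in body]
print(header)  # ['a', 'b', 'c']
print(values)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

The point the sketch isolates is that the row boundary comes from `lineterminator` rather than a hard-coded `\n`; a tokenizer that only recognised `\n` in whitespace mode would keep `~` inside field values instead of ending the row.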
