
Commit 28b4b01

gfyoung authored and jreback committed
BUG, COMPAT: Fix read_csv for multi-char sep and non-utf8 data in Python 2.x (#13812)
Closes gh-3404. [ci skip]
1 parent 12c8ce6 commit 28b4b01


3 files changed: +21 -0 lines


doc/source/whatsnew/v0.19.0.txt (+1 line)
@@ -788,3 +788,4 @@ Bug Fixes
 - Bugs in ``Index.difference`` and ``DataFrame.join`` raise in Python3 when using mixed-integer indexes (:issue:`13432`, :issue:`12814`)
 
 - Bug in ``.to_excel()`` when DataFrame contains a MultiIndex which contains a label with a NaN value (:issue:`13511`)
+- Bug in ``pd.read_csv`` in Python 2.x with non-UTF8 encoded, multi-character separated data (:issue:`3404`)
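The whatsnew entry states the symptom only briefly. Judging from the diff, the first line read by the multi-character-separator path was split while still encoded, so on Python 2 (where `str` is a byte string) an ASCII separator never matched its encoded form. The minimal sketch below emulates that with explicit Python 3 `bytes`; it uses `cp037`, one of the encodings in the new test, and the variable names are illustrative only.

```python
import re

# Illustration only: Python 3 bytes stand in for a Python 2 byte string.
sep = "::"
raw = ("1" + sep + "2").encode("cp037")  # EBCDIC: ':' and the digits are not ASCII bytes here

# Splitting the still-encoded line on the separator's ASCII bytes finds no match.
naive = re.split(re.escape(sep.encode("ascii")), raw)

# Decoding first, as the patch now does on Python 2, yields the two fields.
decoded = re.split(sep, raw.decode("cp037").strip())

print(len(naive))  # 1: the line was not split at all
print(decoded)     # ['1', '2']
```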

pandas/io/parsers.py (+4 lines)
@@ -1871,6 +1871,10 @@ class MyDialect(csv.Dialect):
         else:
             def _read():
                 line = f.readline()
+
+                if compat.PY2 and self.encoding:
+                    line = line.decode(self.encoding)
+
                 pat = re.compile(sep)
                 yield pat.split(line.strip())
                 for line in f:
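The patch decodes the first line in `_read()` before the regex split. For readers on Python 3, here is a sketch of the same decode-before-split ordering applied to a whole binary stream. It is not the pandas implementation: the helper name `read_multichar_sep` is hypothetical, and wrapping the stream in `io.TextIOWrapper` so that every line is decoded is an assumption of this sketch.

```python
import io
import re
from typing import BinaryIO, Iterator, List


def read_multichar_sep(f: BinaryIO, sep: str, encoding: str) -> Iterator[List[str]]:
    """Hypothetical helper: yield the fields of each line of a binary stream.

    The bytes are decoded before the multi-character separator (treated as a
    regular expression) is applied, mirroring the ordering in the patch.
    """
    pat = re.compile(sep)
    # TextIOWrapper decodes the whole stream (consuming any BOM), so multi-byte
    # encodings such as UTF-16 are split into lines correctly.
    text = io.TextIOWrapper(f, encoding=encoding)
    for line in text:
        yield pat.split(line.strip())


# Usage: two UTF-16 encoded rows separated by "::".
raw = "1::2\n3::4\n".encode("utf-16")
print(list(read_multichar_sep(io.BytesIO(raw), "::", "utf-16")))  # [['1', '2'], ['3', '4']]
```

Decoding the full stream, rather than only the first line, is a choice made for this sketch; the diff alone does not show how the remaining lines reach `_read()` inside pandas.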

pandas/io/tests/parser/python_parser_only.py (+16 lines)
@@ -201,3 +201,19 @@ def test_skipfooter_with_decimal(self):
         result = self.read_csv(StringIO(data), names=['a'],
                                decimal='#', skipfooter=1)
         tm.assert_frame_equal(result, expected)
+
+    def test_encoding_non_utf8_multichar_sep(self):
+        # see gh-3404
+        expected = DataFrame({'a': [1], 'b': [2]})
+
+        for sep in ['::', '#####', '!!!', '123', '#1!c5',
+                    '%!c!d', '@@#4:2', '_!pd#_']:
+            data = '1' + sep + '2'
+
+            for encoding in ['utf-16', 'utf-16-be', 'utf-16-le',
+                             'utf-32', 'cp037']:
+                encoded_data = data.encode(encoding)
+                result = self.read_csv(BytesIO(encoded_data),
+                                       sep=sep, names=['a', 'b'],
+                                       encoding=encoding)
+                tm.assert_frame_equal(result, expected)
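The new test runs eight separators against five non-UTF-8 encodings through `self.read_csv`. A pandas-free sketch of the same matrix can help show what each case exercises: it encodes every `'1' + sep + '2'` string, then decodes and splits it. `decode_then_split` is a hypothetical helper, not part of pandas, and the separator is compiled as a regular expression on the assumption that this matches how the Python parser treats a multi-character `sep`.

```python
import re

# Same separator and encoding lists as test_encoding_non_utf8_multichar_sep.
SEPARATORS = ['::', '#####', '!!!', '123', '#1!c5', '%!c!d', '@@#4:2', '_!pd#_']
ENCODINGS = ['utf-16', 'utf-16-be', 'utf-16-le', 'utf-32', 'cp037']


def decode_then_split(raw: bytes, sep: str, encoding: str) -> list:
    # Hypothetical helper: decode first, then apply the separator as a regex.
    return re.compile(sep).split(raw.decode(encoding).strip())


failures = []
for sep in SEPARATORS:
    data = '1' + sep + '2'
    for encoding in ENCODINGS:
        fields = decode_then_split(data.encode(encoding), sep, encoding)
        if fields != ['1', '2']:
            failures.append((sep, encoding, fields))

print(failures)  # []
```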
