
Commit 210fea9

bdrosen96 authored and jreback committed
ERR: csv parser exceptions will now bubble up
closes #13652

Author: Brett Rosen <[email protected]>

Closes #13693 from bdrosen96/brett/dont_swallow_exc and squashes the following commits:

0efe18b [Brett Rosen] Address review comments
6ed3a2e [Brett Rosen] Flake
e966c26 [Brett Rosen] Test case for patch, plus fix to not swallow exceptions
1 parent 634e95d commit 210fea9


5 files changed, +59 -7 lines changed


doc/source/whatsnew/v0.19.0.txt

+5 -6
@@ -309,15 +309,15 @@ Other enhancements
 - A function :func:`union_categorical` has been added for combining categoricals, see :ref:`Unioning Categoricals<categorical.union>` (:issue:`13361`)
 - ``Series`` has gained the properties ``.is_monotonic``, ``.is_monotonic_increasing``, ``.is_monotonic_decreasing``, similar to ``Index`` (:issue:`13336`)
 - ``Series.append`` now supports the ``ignore_index`` option (:issue:`13677`)
-- ``.to_stata()`` and ```StataWriter`` can now write variable labels to Stata dta files using a dictionary to make column names to labels (:issue:`13535`, :issue:`13536`)
+- ``.to_stata()`` and ``StataWriter`` can now write variable labels to Stata dta files using a dictionary to make column names to labels (:issue:`13535`, :issue:`13536`)
 
 .. _whatsnew_0190.api:
 
 API changes
 ~~~~~~~~~~~
 
 
-- ``Index.reshape`` will raise a ``NotImplementedError`` exception when called (:issue: `12882`)
+- ``Index.reshape`` will raise a ``NotImplementedError`` exception when called (:issue:`12882`)
 - Non-convertible dates in an excel date column will be returned without conversion and the column will be ``object`` dtype, rather than raising an exception (:issue:`10001`)
 - ``eval``'s upcasting rules for ``float32`` types have been updated to be more consistent with NumPy's rules. New behavior will not upcast to ``float64`` if you multiply a pandas ``float32`` object by a scalar float64. (:issue:`12388`)
 - An ``UnsupportedFunctionCall`` error is now raised if NumPy ufuncs like ``np.mean`` are called on groupby or resample objects (:issue:`12811`)
@@ -330,7 +330,7 @@ API changes
 - ``__setitem__`` will no longer apply a callable rhs as a function instead of storing it. Call ``where`` directly to get the previous behavior. (:issue:`13299`)
 - Passing ``Period`` with multiple frequencies to normal ``Index`` now returns ``Index`` with ``object`` dtype (:issue:`13664`)
 - ``PeriodIndex.fillna`` with ``Period`` has different freq now coerces to ``object`` dtype (:issue:`13664`)
-
+- More informative exceptions are passed through the csv parser. The exception type would now be the original exception type instead of ``CParserError``. (:issue:`13652`)
 
 .. _whatsnew_0190.api.tolist:
 
@@ -595,7 +595,6 @@ Deprecations
 
 Removal of prior version deprecations/changes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
 - The ``pd.sandbox`` module has been removed in favor of the external library ``pandas-qt`` (:issue:`13670`)
 - ``DataFrame.to_csv()`` has dropped the ``engine`` parameter, as was deprecated in 0.17.1 (:issue:`11274`, :issue:`13419`)
 - ``DataFrame.to_dict()`` has dropped the ``outtype`` parameter in favor of ``orient`` (:issue:`13627`, :issue:`8486`)
@@ -689,8 +688,8 @@ Bug Fixes
 - Bug in ``pd.read_csv()`` with ``engine='python'`` when reading from a ``tempfile.TemporaryFile`` on Windows with Python 3 (:issue:`13398`)
 - Bug in ``pd.read_csv()`` that prevents ``usecols`` kwarg from accepting single-byte unicode strings (:issue:`13219`)
 - Bug in ``pd.read_csv()`` that prevents ``usecols`` from being an empty set (:issue:`13402`)
-- Bug in ``pd.read_csv()`` with ``engine=='c'`` in which null ``quotechar`` was not accepted even though ``quoting`` was specified as ``None`` (:issue:`13411`)
-- Bug in ``pd.read_csv()`` with ``engine=='c'`` in which fields were not properly cast to float when quoting was specified as non-numeric (:issue:`13411`)
+- Bug in ``pd.read_csv()`` with ``engine='c'`` in which null ``quotechar`` was not accepted even though ``quoting`` was specified as ``None`` (:issue:`13411`)
+- Bug in ``pd.read_csv()`` with ``engine='c'`` in which fields were not properly cast to float when quoting was specified as non-numeric (:issue:`13411`)
 - Bug in ``pd.pivot_table()`` where ``margins_name`` is ignored when ``aggfunc`` is a list (:issue:`13354`)
 - Bug in ``pd.Series.str.zfill``, ``center``, ``ljust``, ``rjust``, and ``pad`` when passing non-integers, did not raise ``TypeError`` (:issue:`13598`)
 - Bug in checking for any null objects in a ``TimedeltaIndex``, which always returned ``True`` (:issue:`13603`)
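
Under the API-change entry on csv parser exceptions, code that treated `CParserError` as a catch-all for failed reads may now see the underlying exception instead, for example a `UnicodeDecodeError`. The sketch below shows one way a caller might take advantage of that. It runs without pandas. `CParserError`, `parse_rows` and `read_with_fallback` are hypothetical stand-ins: `parse_rows` plays the role of `pd.read_csv`, and its error message only imitates the C parser's wording. The retry-with-another-encoding policy is an illustration and is not part of this commit.

```python
class CParserError(ValueError):
    """Hypothetical stand-in for pandas' CParserError."""


def parse_rows(raw, encoding="utf-8"):
    """Hypothetical stand-in for read_csv on a tiny comma-separated input.

    Decoding errors are not caught here, so they propagate as
    UnicodeDecodeError, mirroring the post-change C parser behaviour.
    """
    text = raw.decode(encoding)
    rows = [line.split(",") for line in text.splitlines()]
    width = len(rows[0])
    for lineno, row in enumerate(rows, start=1):
        if len(row) != width:
            # Wording imitates the C parser; pandas is not involved here.
            raise CParserError(
                "Error tokenizing data. C error: Expected %d fields in line %d, saw %d"
                % (width, lineno, len(row)))
    return rows


def read_with_fallback(raw, encodings=("utf-8", "shift_jis")):
    """Retry on decoding errors only; malformed rows stay fatal."""
    for encoding in encodings:
        try:
            return parse_rows(raw, encoding)
        except UnicodeDecodeError:
            # This clause only works because the original exception type
            # surfaces; a blanket CParserError would hide the cause.
            continue
    raise ValueError("none of %r could decode the input" % (encodings,))


raw = "num,text\n1,サウロン\n".encode("shift_jis")
print(read_with_fallback(raw))
```

Malformed rows still raise `CParserError` on the first encoding attempt, because only `UnicodeDecodeError` triggers a retry.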

pandas/io/tests/parser/common.py

+22
@@ -3,6 +3,7 @@
 import csv
 import os
 import platform
+import codecs
 
 import re
 import sys
@@ -45,6 +46,27 @@ def test_empty_decimal_marker(self):
         with tm.assertRaisesRegexp(ValueError, msg):
             self.read_csv(StringIO(data), decimal='')
 
+    def test_bad_stream_exception(self):
+        # Issue 13652:
+        # This test validates that both python engine
+        # and C engine will raise UnicodeDecodeError instead of
+        # c engine raising CParserError and swallowing exception
+        # that caused read to fail.
+        handle = open(self.csv_shiftjs, "rb")
+        codec = codecs.lookup("utf-8")
+        utf8 = codecs.lookup('utf-8')
+        # stream must be binary UTF8
+        stream = codecs.StreamRecoder(
+            handle, utf8.encode, utf8.decode, codec.streamreader,
+            codec.streamwriter)
+        if compat.PY3:
+            msg = "'utf-8' codec can't decode byte"
+        else:
+            msg = "'utf8' codec can't decode byte"
+        with tm.assertRaisesRegexp(UnicodeDecodeError, msg):
+            self.read_csv(stream)
+        stream.close()
+
     def test_read_csv(self):
         if not compat.PY3:
             if compat.is_platform_windows():
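
The new test wraps a Shift-JIS file in a `codecs.StreamRecoder` whose reader side decodes UTF-8. The first read from the stream therefore fails before any parsing happens, and the test asserts that this `UnicodeDecodeError` reaches the caller. The following standard-library sketch reproduces only that stream setup, without pandas. It assumes a small in-memory `io.BytesIO` payload in place of `sauron.SHIFT_JIS.csv`. It also uses a single `utf8` codec object where the test looks up the same UTF-8 codec twice (`codec` and `utf8`).

```python
import codecs
import io

# Assumed stand-in for the sauron.SHIFT_JIS.csv fixture: an ASCII header
# followed by one row of Japanese text encoded as Shift-JIS, not UTF-8.
raw = "num, text\n1,サウロン\n".encode("shift_jis")

utf8 = codecs.lookup("utf-8")
# As in the test: the recoder decodes the underlying bytes with a UTF-8
# StreamReader and re-encodes them as UTF-8, so readers see a binary stream.
stream = codecs.StreamRecoder(
    io.BytesIO(raw), utf8.encode, utf8.decode,
    utf8.streamreader, utf8.streamwriter)

error = None
try:
    stream.read()
except UnicodeDecodeError as exc:
    # The Shift-JIS lead byte 0x83 is not a valid UTF-8 start byte.
    error = exc
finally:
    stream.close()

print(error)
```

The reported position, 12, is the offset of the first Shift-JIS lead byte (0x83), just after the ASCII header and the `1,` prefix.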
@@ -0,0 +1,14 @@
+num, text
(rows 1-13: Japanese text about Sauron, encoded as Shift-JIS; the bytes are not valid UTF-8 and the page rendered them only as mojibake, so they are omitted here)

pandas/io/tests/parser/test_parsers.py

+1
@@ -44,6 +44,7 @@ def setUp(self):
         self.csv1 = os.path.join(self.dirpath, 'test1.csv')
         self.csv2 = os.path.join(self.dirpath, 'test2.csv')
         self.xls1 = os.path.join(self.dirpath, 'test.xls')
+        self.csv_shiftjs = os.path.join(self.dirpath, 'sauron.SHIFT_JIS.csv')
 
 
 class TestCParserHighMemory(BaseParser, CParserTests, tm.TestCase):

pandas/parser.pyx

+17 -1
@@ -10,7 +10,9 @@ import warnings
 from csv import QUOTE_MINIMAL, QUOTE_NONNUMERIC, QUOTE_NONE
 from cpython cimport (PyObject, PyBytes_FromString,
                       PyBytes_AsString, PyBytes_Check,
-                      PyUnicode_Check, PyUnicode_AsUTF8String)
+                      PyUnicode_Check, PyUnicode_AsUTF8String,
+                      PyErr_Occurred, PyErr_Fetch)
+from cpython.ref cimport PyObject, Py_XDECREF
 from io.common import CParserError, DtypeWarning, EmptyDataError
 
 
@@ -1878,6 +1880,20 @@ cdef kh_float64_t* kset_float64_from_list(values) except NULL:
 
 
 cdef raise_parser_error(object base, parser_t *parser):
+    cdef:
+        object old_exc
+        PyObject *type
+        PyObject *value
+        PyObject *traceback
+
+    if PyErr_Occurred():
+        PyErr_Fetch(&type, &value, &traceback);
+        Py_XDECREF(type)
+        Py_XDECREF(traceback)
+        if value != NULL:
+            old_exc = <object> value
+            Py_XDECREF(value)
+            raise old_exc
     message = '%s. C error: ' % base
     if parser.error_msg != NULL:
         if PY3:
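
The patched `raise_parser_error` first asks the interpreter whether an exception is already pending (`PyErr_Occurred`), such as one raised by the Python-level stream while the tokenizer was pulling data. If one is pending, it re-raises that exception object instead of building a `CParserError`. A pure-Python analogue of that control flow follows. It is only an approximation, because Python code cannot inspect the C-level error indicator. The pending exception and `parser.error_msg` are therefore passed in explicitly as the hypothetical parameters `pending_exc` and `error_msg`, and `CParserError` is a local stand-in class.

```python
class CParserError(ValueError):
    """Hypothetical stand-in for pandas' CParserError."""


def raise_parser_error(base, pending_exc=None, error_msg=None):
    """Pure-Python analogue of the patched Cython function.

    pending_exc stands in for the exception that PyErr_Occurred() detects
    and PyErr_Fetch() retrieves; error_msg stands in for parser.error_msg.
    """
    if pending_exc is not None:
        # New behaviour: let the original exception bubble up unchanged.
        raise pending_exc
    # Unchanged path: nothing is pending, so wrap the tokenizer's message.
    message = "%s. C error: " % base
    if error_msg is not None:
        message += error_msg
    raise CParserError(message)


decode_error = UnicodeDecodeError("utf-8", b"\x83T", 0, 1, "invalid start byte")
try:
    raise_parser_error("Error tokenizing data", pending_exc=decode_error)
except UnicodeDecodeError as exc:
    print(type(exc).__name__)
```

The Cython version needs extra steps because `PyErr_Fetch` clears the error indicator and hands the caller owned references to the exception type, value and traceback. The patch therefore releases `type` and `traceback` with `Py_XDECREF`, and releases `value` only after the `<object>` cast has given `old_exc` its own reference.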
