-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: raise for invalid dtypes per issue #15520 #16047
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/generic.py
Outdated
@@ -165,12 +166,15 @@ def _validate_dtype(self, dtype): | |||
|
|||
if dtype is not None: | |||
dtype = _coerce_to_dtype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think can just directly use pandas._dtype
here. In fact I think you can replace all usages of _coerce_to_dtype
with pandas_dtype
. (you might have to back off some if the tests don't pass), but can investigate those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what you mean. This is a fairly generic routine.
In [1]: pandas.core.dtypes.common.pandas_dtype(object)
Out[1]: dtype('O')
In [2]: pandas.core.dtypes.common.pandas_dtype('object')
Out[2]: dtype('O')
@@ -13,6 +14,15 @@ | |||
|
|||
class TestPandasDtype(tm.TestCase): | |||
|
|||
# Passing invalid dtype, both as a string or object, must raise TypeError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a list for checking these cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want to convert to using more pytest style would be graet as well (could be a followup as well), e.g.: http://pandas-docs.github.io/pandas-docs-travis/contributing.html#transitioning-to-pytest
pandas/core/dtypes/common.py
Outdated
@@ -737,5 +737,7 @@ def pandas_dtype(dtype): | |||
pass | |||
elif isinstance(dtype, ExtensionDtype): | |||
return dtype | |||
elif np.dtype(dtype).kind == 'O': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be more defensive
try:
dtype = np.dtype(dtype)
except (TypeError, ValueError):
raise informative msg here
if dtype.kind == 'O':
....
return dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.dtype
doesn't differentiate between [1] and [2]. Any thoughts on how to raise errors for [1] without breaking [2]? Since the change I made to pandas_dtype
would raise an error in both cases (pandas_dtype(pd.Timestamp)
and pandas_dtype(dtype='object')
).
[1]
>>> np.dtype(pd.Timestamp)
dtype('O')
[2]
>>> np.dtype('object')
dtype('O')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add a few more tests cases, FYI (e.g. np.float64
, float64
, np.dtype('float64')
, IOW some objects and strings
maybe something like
if isinstance(dtype, (np.dtype, compat.string_types)):
return np.dtype(dtype)
raise TypeError
@jreback Updated tests as well as pandas_dtype() function as per your suggestions. |
cfcc4c2
to
a77ab66
Compare
1d5f4b7
to
1f553e2
Compare
Codecov Report
@@ Coverage Diff @@
## master #16047 +/- ##
==========================================
- Coverage 90.84% 90.84% -0.01%
==========================================
Files 159 159
Lines 50790 50794 +4
==========================================
+ Hits 46142 46143 +1
- Misses 4648 4651 +3
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #16047 +/- ##
==========================================
- Coverage 90.86% 90.83% -0.03%
==========================================
Files 159 159
Lines 50771 50779 +8
==========================================
- Hits 46133 46127 -6
- Misses 4638 4652 +14
Continue to review full report at Codecov.
|
@jreback Can this PR be reviewed please? |
pandas/tests/dtypes/test_common.py
Outdated
with tm.assertRaisesRegexp(TypeError, msg): | ||
pandas_dtype(dtype) | ||
|
||
valid_list = [object, 'float64', np.object_, list, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add np.dtype('object')
; list
is not a valid dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's wrong it should raise
list is not a valid dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I modify this particular test and remove the checks involving list
dtype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
add list to the invalid types checking
pandas/tests/dtypes/test_common.py
Outdated
@@ -13,6 +14,19 @@ | |||
|
|||
class TestPandasDtype(tm.TestCase): | |||
|
|||
# Passing invalid dtype, both as a string or object, must raise TypeError | |||
def test_invalid_dtype_error(self): | |||
msg = 'not understood' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number as a comment here
pandas/core/generic.py
Outdated
@@ -164,13 +164,15 @@ def _validate_dtype(self, dtype): | |||
""" validate the passed dtype """ | |||
|
|||
if dtype is not None: | |||
dtype = _coerce_to_dtype(dtype) | |||
# This would raise an error if an invalid dtype was passed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment not necessary
pandas/core/dtypes/common.py
Outdated
np.dtype(dtype) | ||
except (TypeError, ValueError): | ||
raise | ||
if dtype in [object, np.object_, list]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list is not valid
pandas/core/dtypes/common.py
Outdated
@@ -731,11 +731,23 @@ def pandas_dtype(dtype): | |||
except TypeError: | |||
pass | |||
|
|||
elif dtype == 'object': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
catch this lower (where you catch object, np.dtype('object')
pandas/core/dtypes/common.py
Outdated
raise | ||
if dtype in [object, np.object_, list]: | ||
return np.dtype(dtype) | ||
elif np.dtype(dtype).kind == 'O': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't np.dtype(dtype).kind == 'O'
catch object, np.object_, 'object'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback From what I have understood, any invalid dtype (such as pd.Timestamp
) should raise an error. np.dtype(invalid_type).kind = 0
for such objects. However, this will also catch some valid dtypes
such as object, np.object_ and 'object'
which we safeguard against by catching them earlier and returning np.dtype(valid_dtype)
before this condition is evaluated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok add a comment then as this is a tricky case
1f553e2
to
72a53ff
Compare
pandas/core/dtypes/common.py
Outdated
@@ -737,5 +737,20 @@ def pandas_dtype(dtype): | |||
pass | |||
elif isinstance(dtype, ExtensionDtype): | |||
return dtype | |||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this in the else, it IS the else.
then do
try:
npdtype = np.dtype(dtype)
except ...
....
return npdtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you can replace usages of np.dtype(dtype)
with npdtype
pandas/tests/test_strings.py
Outdated
@@ -1210,7 +1210,6 @@ def test_empty_str_methods(self): | |||
empty_str = empty = Series(dtype=str) | |||
empty_int = Series(dtype=int) | |||
empty_bool = Series(dtype=bool) | |||
empty_list = Series(dtype=list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, this is the same as Series(dtype=str)
, so just change that one to Series(dtype=object)
and done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments, ping when pushed and green.
this is ok for 0.20.0, pls add a whatsnew note (bug fix section is fine). |
3aeee36
to
9d46787
Compare
Changes made to pandas_dtype() in pandas/core/dtypes/common.py
Changes made to series.py, generic.py in pandas/core and cast.py in pandas/core/dtypes
This change ensures that no error is raised when type 'object' is passed.
94a1f87
to
84ee765
Compare
@jreback Added the whatsnew note. |
doc/source/whatsnew/v0.20.0.txt
Outdated
@@ -1510,6 +1510,12 @@ Conversion | |||
- Bug in the return type of ``pd.unique`` on a ``Categorical``, which was returning an ndarray and not a ``Categorical`` (:issue:`15903`) | |||
- Bug in ``Index.to_series()`` where the index was not copied (and so mutating later would change the original), (:issue:`15949`) | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just put it in conversion, no need to make a different section.
tm.assert_series_equal(empty_int, empty.str.find('a')) | ||
tm.assert_series_equal(empty_int, empty.str.rfind('a')) | ||
tm.assert_series_equal(empty_str, empty.str.pad(42)) | ||
tm.assert_series_equal(empty_str, empty.str.center(42)) | ||
tm.assert_series_equal(empty_list, empty.str.split('a')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add back tests with empty_str
so that all of the original functions are convered. don't duplicate tests though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added back the tests after looking out for duplicates; all green now.
Changes made to: - test_empty_str_methods in test_strings.py - test_invalid_dtype_error in dtypes/test_common.py
84ee765
to
61e614e
Compare
1a35178
to
b3c2fbb
Compare
add a test for the original issue as well
|
@jreback Should I add the test in |
put in |
8148868
to
ae7ecb0
Compare
ae7ecb0
to
3646eb6
Compare
@jreback Anything else which is missing? |
thanks @analyticalmonk nice fixes! keem em coming! |
closes pandas-dev#15520 Author: Akash Tandon <[email protected]> Author: root <[email protected]> Author: analyticalmonk <[email protected]> Author: Akash Tandon <[email protected]> Closes pandas-dev#16047 from analyticalmonk/patch_for_15520 and squashes the following commits: 3646eb6 [analyticalmonk] TST: check for invalid dtype for Series constructor per GH15520 73d980a [Akash Tandon] Merge branch 'master' into patch_for_15520 b3c2fbb [root] BUG: Added 'O' to pandas_dtype's valid list c3699fb [root] DOC: added whatsnew entry for PR#16047 addressing GH15520 fbed5a6 [Akash Tandon] TST: Added list to invalid dtype ad9f345 [Akash Tandon] CLN: refactored code related to issue GH15520 a358181 [Akash Tandon] BUG: Added numpy.dtype_ to valid pandas_dtype() type list 3eaa432 [Akash Tandon] TST: Added numpy.object_ dtype to valid pandas_dtype list f858726 [Akash Tandon] style fix d4971cd [Akash Tandon] BUG: pandas_dtype() to raise error for invalid dtype per GH15520 ee0030f [Akash Tandon] TST: added more test-cases for pandas_dtype() test 3700259 [Akash Tandon] CLN: Replace _coerce_to_dtype() with pandas_dtype() c10e1d4 [Akash Tandon] TST: maintain list containing dtypes in TestPandasDtype fecba12 [Akash Tandon] BUG: Raise when invalid dtype passed to pandas_dtype 99fb660 [Akash Tandon] TST: wrote test representing bug fix result for pandas-dev#15520
closes #15520
git diff upstream/master --name-only -- '*.py' | flake8 --diff