Skip to content

BUG: raise for invalid dtypes per issue #15520 #16047

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

analyticalmonk
Copy link
Contributor

@analyticalmonk analyticalmonk commented Apr 18, 2017

closes #15520

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
  • whatsnew entry

@analyticalmonk analyticalmonk changed the title bug fix to raise for invalid dtypes #15520 bug fix to raise for invalid dtypes per issue #15520 Apr 18, 2017
@@ -165,12 +166,15 @@ def _validate_dtype(self, dtype):

if dtype is not None:
dtype = _coerce_to_dtype(dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think can just directly use pandas._dtype here. In fact I think you can replace all usages of _coerce_to_dtype with pandas_dtype. (you might have to back off some if the tests don't pass), but can investigate those.

Copy link
Contributor Author

@analyticalmonk analyticalmonk Apr 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using pandas_dtype() here will mean that [1] will throw an error. This has been used across various tests, and all of them will break. For example, a number of tests here.

[1] pd.Series(values, dtype=object)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you mean. This is a fairly generic routine.

In [1]: pandas.core.dtypes.common.pandas_dtype(object)
Out[1]: dtype('O')

In [2]: pandas.core.dtypes.common.pandas_dtype('object')
Out[2]: dtype('O')

@@ -13,6 +14,15 @@

class TestPandasDtype(tm.TestCase):

# Passing invalid dtype, both as a string or object, must raise TypeError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a list for checking these cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to convert to using more pytest style would be graet as well (could be a followup as well), e.g.: http://pandas-docs.github.io/pandas-docs-travis/contributing.html#transitioning-to-pytest

@@ -737,5 +737,7 @@ def pandas_dtype(dtype):
pass
elif isinstance(dtype, ExtensionDtype):
return dtype
elif np.dtype(dtype).kind == 'O':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be more defensive

try:
   dtype = np.dtype(dtype)
except (TypeError, ValueError):
   raise informative msg here
if dtype.kind == 'O':
    ....
return dtype

Copy link
Contributor Author

@analyticalmonk analyticalmonk Apr 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.dtype doesn't differentiate between [1] and [2]. Any thoughts on how to raise errors for [1] without breaking [2]? Since the change I made to pandas_dtype would raise an error in both cases (pandas_dtype(pd.Timestamp) and pandas_dtype(dtype='object')).

[1]

>>> np.dtype(pd.Timestamp)
dtype('O')

[2]

>>> np.dtype('object')
dtype('O')

Copy link
Contributor

@jreback jreback Apr 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a few more tests cases, FYI (e.g. np.float64, float64, np.dtype('float64'), IOW some objects and strings

maybe something like

if isinstance(dtype, (np.dtype, compat.string_types)):
    return np.dtype(dtype)
raise TypeError

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Error Reporting Incorrect or improved errors from pandas labels Apr 18, 2017
@analyticalmonk
Copy link
Contributor Author

@jreback Updated tests as well as pandas_dtype() function as per your suggestions.

@analyticalmonk analyticalmonk force-pushed the patch_for_15520 branch 3 times, most recently from cfcc4c2 to a77ab66 Compare April 21, 2017 08:02
@analyticalmonk analyticalmonk changed the title bug fix to raise for invalid dtypes per issue #15520 BUG: raise for invalid dtypes per issue #15520 Apr 21, 2017
@codecov
Copy link

codecov bot commented Apr 21, 2017

Codecov Report

Merging #16047 into master will decrease coverage by <.01%.
The diff coverage is 80%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16047      +/-   ##
==========================================
- Coverage   90.84%   90.84%   -0.01%     
==========================================
  Files         159      159              
  Lines       50790    50794       +4     
==========================================
+ Hits        46142    46143       +1     
- Misses       4648     4651       +3
Flag Coverage Δ
#multiple 88.62% <80%> (-0.01%) ⬇️
#single 40.35% <60%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/dtypes/cast.py 86.89% <ø> (ø) ⬆️
pandas/core/series.py 94.97% <ø> (ø) ⬆️
pandas/core/generic.py 91.29% <100%> (ø) ⬆️
pandas/core/dtypes/common.py 92.76% <75%> (-0.87%) ⬇️
pandas/io/formats/format.py 95.02% <0%> (-0.03%) ⬇️
pandas/core/common.py 91.03% <0%> (+0.34%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f16ca7...1f553e2. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 21, 2017

Codecov Report

Merging #16047 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16047      +/-   ##
==========================================
- Coverage   90.86%   90.83%   -0.03%     
==========================================
  Files         159      159              
  Lines       50771    50779       +8     
==========================================
- Hits        46133    46127       -6     
- Misses       4638     4652      +14
Flag Coverage Δ
#multiple 88.61% <100%> (-0.03%) ⬇️
#single 40.32% <66.66%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/dtypes/common.py 93.24% <100%> (-0.39%) ⬇️
pandas/core/generic.py 91.29% <100%> (ø) ⬆️
pandas/core/series.py 94.98% <100%> (ø) ⬆️
pandas/core/dtypes/cast.py 86.89% <100%> (ø) ⬆️
pandas/plotting/_converter.py 63.54% <0%> (-1.82%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d122e6...3646eb6. Read the comment docs.

@analyticalmonk
Copy link
Contributor Author

@jreback Can this PR be reviewed please?

with tm.assertRaisesRegexp(TypeError, msg):
pandas_dtype(dtype)

valid_list = [object, 'float64', np.object_, list,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add np.dtype('object'); list is not a valid dtype.

Copy link
Contributor Author

@analyticalmonk analyticalmonk Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'list is not a valid dtype'

@jreback That would result in code such as Series(dtype=list) raising an error.
Among other things, this test would break.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's wrong it should raise
list is not a valid dtype

Copy link
Contributor Author

@analyticalmonk analyticalmonk Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I modify this particular test and remove the checks involving list dtype?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

add list to the invalid types checking

@@ -13,6 +14,19 @@

class TestPandasDtype(tm.TestCase):

# Passing invalid dtype, both as a string or object, must raise TypeError
def test_invalid_dtype_error(self):
msg = 'not understood'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment here

@@ -164,13 +164,15 @@ def _validate_dtype(self, dtype):
""" validate the passed dtype """

if dtype is not None:
dtype = _coerce_to_dtype(dtype)
# This would raise an error if an invalid dtype was passed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment not necessary

np.dtype(dtype)
except (TypeError, ValueError):
raise
if dtype in [object, np.object_, list]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list is not valid

@@ -731,11 +731,23 @@ def pandas_dtype(dtype):
except TypeError:
pass

elif dtype == 'object':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

catch this lower (where you catch object, np.dtype('object')

raise
if dtype in [object, np.object_, list]:
return np.dtype(dtype)
elif np.dtype(dtype).kind == 'O':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't np.dtype(dtype).kind == 'O' catch object, np.object_, 'object'?

Copy link
Contributor Author

@analyticalmonk analyticalmonk Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback From what I have understood, any invalid dtype (such as pd.Timestamp) should raise an error. np.dtype(invalid_type).kind = 0 for such objects. However, this will also catch some valid dtypes such as object, np.object_ and 'object' which we safeguard against by catching them earlier and returning np.dtype(valid_dtype) before this condition is evaluated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok add a comment then as this is a tricky case

@@ -737,5 +737,20 @@ def pandas_dtype(dtype):
pass
elif isinstance(dtype, ExtensionDtype):
return dtype
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this in the else, it IS the else.
then do

try:
   npdtype = np.dtype(dtype)
except ...

....

return npdtype

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you can replace usages of np.dtype(dtype) with npdtype

@@ -1210,7 +1210,6 @@ def test_empty_str_methods(self):
empty_str = empty = Series(dtype=str)
empty_int = Series(dtype=int)
empty_bool = Series(dtype=bool)
empty_list = Series(dtype=list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, this is the same as Series(dtype=str), so just change that one to Series(dtype=object) and done

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, ping when pushed and green.

@jreback
Copy link
Contributor

jreback commented Apr 24, 2017

this is ok for 0.20.0, pls add a whatsnew note (bug fix section is fine).

@jreback jreback added this to the 0.20.0 milestone Apr 24, 2017
@analyticalmonk
Copy link
Contributor Author

@jreback Added the whatsnew note.

@@ -1510,6 +1510,12 @@ Conversion
- Bug in the return type of ``pd.unique`` on a ``Categorical``, which was returning an ndarray and not a ``Categorical`` (:issue:`15903`)
- Bug in ``Index.to_series()`` where the index was not copied (and so mutating later would change the original), (:issue:`15949`)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just put it in conversion, no need to make a different section.

tm.assert_series_equal(empty_int, empty.str.find('a'))
tm.assert_series_equal(empty_int, empty.str.rfind('a'))
tm.assert_series_equal(empty_str, empty.str.pad(42))
tm.assert_series_equal(empty_str, empty.str.center(42))
tm.assert_series_equal(empty_list, empty.str.split('a'))
Copy link
Contributor

@jreback jreback Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add back tests with empty_str so that all of the original functions are convered. don't duplicate tests though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added back the tests after looking out for duplicates; all green now.

Changes made to:
 - test_empty_str_methods in test_strings.py
 - test_invalid_dtype_error in dtypes/test_common.py
@analyticalmonk analyticalmonk force-pushed the patch_for_15520 branch 2 times, most recently from 1a35178 to b3c2fbb Compare April 25, 2017 07:27
@jreback
Copy link
Contributor

jreback commented Apr 25, 2017

add a test for the original issue as well

pd.Series([], name='time',dtype=pd.Timestamp)
which works (should raise)

@analyticalmonk
Copy link
Contributor Author

@jreback Should I add the test in core/tests/series/test_dtypes.py?

@jreback
Copy link
Contributor

jreback commented Apr 25, 2017

put in tests/series/test_constructors I think (this is an additional test), leave what you have

@analyticalmonk
Copy link
Contributor Author

@jreback Anything else which is missing?

@jreback jreback closed this in 61ca022 Apr 26, 2017
@jreback
Copy link
Contributor

jreback commented Apr 26, 2017

thanks @analyticalmonk

nice fixes!

keem em coming!

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017
closes pandas-dev#15520

Author: Akash Tandon <[email protected]>
Author: root <[email protected]>
Author: analyticalmonk <[email protected]>
Author: Akash Tandon <[email protected]>

Closes pandas-dev#16047 from analyticalmonk/patch_for_15520 and squashes the following commits:

3646eb6 [analyticalmonk] TST: check for invalid dtype for Series constructor per GH15520
73d980a [Akash Tandon] Merge branch 'master' into patch_for_15520
b3c2fbb [root] BUG: Added 'O' to pandas_dtype's valid list
c3699fb [root] DOC: added whatsnew entry for PR#16047 addressing GH15520
fbed5a6 [Akash Tandon] TST: Added list to invalid dtype
ad9f345 [Akash Tandon] CLN: refactored code related to issue GH15520
a358181 [Akash Tandon] BUG: Added numpy.dtype_ to valid pandas_dtype() type list
3eaa432 [Akash Tandon] TST: Added numpy.object_ dtype to valid pandas_dtype list
f858726 [Akash Tandon] style fix
d4971cd [Akash Tandon] BUG: pandas_dtype() to raise error for invalid dtype per GH15520
ee0030f [Akash Tandon] TST: added more test-cases for pandas_dtype() test
3700259 [Akash Tandon] CLN: Replace _coerce_to_dtype() with pandas_dtype()
c10e1d4 [Akash Tandon] TST: maintain list containing dtypes in TestPandasDtype
fecba12 [Akash Tandon] BUG: Raise when invalid dtype passed to pandas_dtype
99fb660 [Akash Tandon] TST: wrote test representing bug fix result for pandas-dev#15520
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: invalid dtypes should raise
2 participants