-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Remove old warnings (plus some useless code) #18022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I guess we never put these deprecations in #6581 |
pandas/core/categorical.py
Outdated
if ordered is None: | ||
ordered = values.ordered | ||
if categories is None: | ||
categories = values.categories | ||
values = values.get_values() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this get hit? its prob pretty inefficient (as converting to an array then back), we already catch the dtype conversion above (which is why you can remove this code). maybe try to remove this part as well?
# after 0.18/ in 2016 | ||
if (is_integer_dtype(values) and | ||
not is_integer_dtype(dtype.categories)): | ||
warn("Values and categories have different dtypes. Did you " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these get hit anywhere in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine with removing them. pls add a note in the deprecations removal section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done)
Codecov Report
@@ Coverage Diff @@
## master #18022 +/- ##
=========================================
Coverage ? 91.22%
=========================================
Files ? 163
Lines ? 50089
Branches ? 0
=========================================
Hits ? 45693
Misses ? 4396
Partials ? 0
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18022 +/- ##
=========================================
Coverage ? 91.22%
=========================================
Files ? 163
Lines ? 50089
Branches ? 0
=========================================
Hits ? 45693
Misses ? 4396
Partials ? 0
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18022 +/- ##
=========================================
Coverage ? 91.22%
=========================================
Files ? 163
Lines ? 50089
Branches ? 0
=========================================
Hits ? 45693
Misses ? 4396
Partials ? 0
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18022 +/- ##
==========================================
- Coverage 91.43% 91.39% -0.05%
==========================================
Files 163 163
Lines 50091 50083 -8
==========================================
- Hits 45800 45771 -29
- Misses 4291 4312 +21
Continue to review full report at Codecov.
|
754ced2
to
b20fbe5
Compare
pandas/core/categorical.py
Outdated
@@ -2243,6 +2219,12 @@ def _get_codes_for_values(values, categories): | |||
""" | |||
|
|||
from pandas.core.algorithms import _get_data_algo, _hashtables | |||
if is_categorical_dtype(values.dtype): | |||
codes = (values.cat.codes if isinstance(values, ABCSeries) | |||
else values.codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There must be a cleaner way... but I couldn't find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the question could be: why does this get passed values that are already categorical and don't need a factorize but a recode ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meaning that the check could also be done before (which I think is how it was before?)
something like
if dtype.categories is None:
codes, categories = ....
elif is_categorical_dtype(values):
.... handle this case
else:
codes = _get_codes_for_values(values, dtype.categories)
I personally would find that easier to follow the logic (and this does not necessarily mean you don't need the values.cat.codes
vs values.codes
..., so my comment went a bit sideways :-))
doc/source/whatsnew/v0.21.1.txt
Outdated
@@ -30,7 +30,7 @@ Other Enhancements | |||
Deprecations | |||
~~~~~~~~~~~~ | |||
|
|||
- | |||
- Warnings for deprecated initialization style of ``Categorical`` (``Categorical(codes, categories)``) have been removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move to the "Removal of prior version deprecations/changes" section (in 0.22.0)
pandas/core/categorical.py
Outdated
categories = values.categories | ||
values = values.get_values() | ||
if dtype.categories is None: | ||
dtype = CategoricalDtype(values.categories, dtype.ordered) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some lines above, there is already a
elif is_categorical(values):
dtype = values.dtype._from_categorical_dtype(values.dtype,
categories, ordered)
which should catch the same ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... but instead testing dtype.categories
must be done here (that is, after all the branches of the above if...elif...else
). Viceversa, the dtype
must be defined before the fastpath
test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but help me understand in what case this line is needed and is doing something different as the already defined dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instance the case in which values
is already a Categorical
, categories
is None
and dtype
is "category"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and in that case
elif is_categorical(values):
dtype = values.dtype._from_categorical_dtype(values.dtype,
categories, ordered)
will already have created an appropriate dtype
?
So I still don't understand why in such a case it would need to be re-determined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will already have created an appropriate dtype ?
No, precisely because categories
is None
. Not just in theory: you can actually try and verify that tests break.
It could be possible to first have a different set of checks which finds the right value for categories
, but I doubt it would result in simpler code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume this small example:
In [1]: values = pd.Categorical(['a', 'b', 'c', 'a'])
In [2]: values
Out[2]:
[a, b, c, a]
Categories (3, object): [a, b, c]
In [4]: categories = None
In [5]: dtype = 'category'
In [9]: ordered = None
In [10]: dtype = values.dtype._from_categorical_dtype(values.dtype, categories, ordered)
In [11]: dtype
Out[11]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)
So if you pass categorical values, it will by definition have categories (can a Categorical have categories of None?), and those will be passed to the resulting dtype
object returned from values.dtype._from_categorical_dtype
.
So still wondering, in what case can you have categorical values, but where the resulting dtype
from above still has dtype.categories is None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when removing that check, it is the case where a CategoricalDType
instance is passed that differs from the dtype of the values:
dtype = CategoricalDtype(None, ordered=True)
values = Categorical(['a', 'b', 'd'])
Categorical(values, dtype=dtype)
I would find that more logical to handle in the if dtype is not None:
path (to clearly see which takes precedence in such a case). But I assume that when fastpath=True
we don't want to check that. Although I think that if you use that, dtype.categories will never be None.
pandas/core/categorical.py
Outdated
@@ -2243,6 +2219,12 @@ def _get_codes_for_values(values, categories): | |||
""" | |||
|
|||
from pandas.core.algorithms import _get_data_algo, _hashtables | |||
if is_categorical_dtype(values.dtype): | |||
codes = (values.cat.codes if isinstance(values, ABCSeries) | |||
else values.codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the question could be: why does this get passed values that are already categorical and don't need a factorize but a recode ?
pandas/core/categorical.py
Outdated
@@ -2243,6 +2219,12 @@ def _get_codes_for_values(values, categories): | |||
""" | |||
|
|||
from pandas.core.algorithms import _get_data_algo, _hashtables | |||
if is_categorical_dtype(values.dtype): | |||
codes = (values.cat.codes if isinstance(values, ABCSeries) | |||
else values.codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meaning that the check could also be done before (which I think is how it was before?)
something like
if dtype.categories is None:
codes, categories = ....
elif is_categorical_dtype(values):
.... handle this case
else:
codes = _get_codes_for_values(values, dtype.categories)
I personally would find that easier to follow the logic (and this does not necessarily mean you don't need the values.cat.codes
vs values.codes
..., so my comment went a bit sideways :-))
b20fbe5
to
6df2ea0
Compare
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -60,7 +60,7 @@ Deprecations | |||
Removal of prior version deprecations/changes | |||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||
|
|||
- | |||
- Warnings for deprecated initialization style of ``Categorical`` (``Categorical(codes, categories)``) have been removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reference the issue (this PR is good, we also like to reference the original issue if you can find it).
The warnings for construction of a
Categorical of the form (......) have been removed
@@ -294,22 +293,10 @@ def __init__(self, values, categories=None, ordered=None, dtype=None, | |||
|
|||
# sanitize input | |||
if is_categorical_dtype(values): | |||
if dtype.categories is None: | |||
dtype = CategoricalDtype(values.categories, dtype.ordered) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what kind of construction actually hits this?
it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?
what if dtype differs from the dtype of values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?
Yes, it does... above (if dtype is not None
). That is: dtype
here is not necessarily the dtype
argument to the function.
what if dtype differs from the dtype of values?
Then it is given priority - rightly so, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, can you add a comment to that effect here.
it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?
is there a case which hits this where the passed dtype has a different ordered than values.ordered
? (one is False, one is True). do/should we check this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we not simply using dtype = values
at this point (values
is already a dtype)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we not simply using dtype = values at this point (values is already a dtype)?
values
can have a dtype...
is there a case which hits this where the passed dtype has a different ordered than values.ordered? (one is False, one is True). do/should we check this?
Sure, whenever you pass values with given categories, unordered, and dtype with the same categories (and possibly more), ordered. Anyway, I added a couple of comments.
pandas/core/categorical.py
Outdated
# we're inferring from values | ||
dtype = CategoricalDtype(categories, dtype.ordered) | ||
|
||
elif is_categorical_dtype(values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a repeat of the first if?? (this one actually looks good), wondering why it is hitting twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a repeat, as here the actual 'codes' construction happens (before it was to check the categories)
@@ -361,20 +353,6 @@ def __init__(self, values, categories=None, ordered=None, dtype=None, | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some comments on the lines above that can be cleaned up (mentioning the checks for the warnings)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done)
6df2ea0
to
3d8f173
Compare
@@ -294,22 +293,10 @@ def __init__(self, values, categories=None, ordered=None, dtype=None, | |||
|
|||
# sanitize input | |||
if is_categorical_dtype(values): | |||
if dtype.categories is None: | |||
dtype = CategoricalDtype(values.categories, dtype.ordered) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, can you add a comment to that effect here.
it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?
is there a case which hits this where the passed dtype has a different ordered than values.ordered
? (one is False, one is True). do/should we check this?
@@ -294,22 +293,10 @@ def __init__(self, values, categories=None, ordered=None, dtype=None, | |||
|
|||
# sanitize input | |||
if is_categorical_dtype(values): | |||
if dtype.categories is None: | |||
dtype = CategoricalDtype(values.categories, dtype.ordered) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we not simply using dtype = values
at this point (values
is already a dtype)?
pandas/core/categorical.py
Outdated
pass | ||
|
||
else: | ||
elif not isinstance(values, (ABCIndexClass, ABCSeries)): | ||
|
||
# on numpy < 1.6 datetimelike get inferred to all i8 by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove this commnet (about numpy 1.6); also I think we can eliminate this logic (here or antother PR) as we don't care about old numpy any longer
pandas/tests/test_categorical.py
Outdated
@@ -306,20 +306,19 @@ def f(): | |||
assert len(cat.codes) == 1 | |||
assert cat.codes[0] == 0 | |||
|
|||
# Catch old style constructor useage: two arrays, codes + categories | |||
# We can only catch two cases: | |||
# Catches - now disabled - for old style constructor useage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the 'Catches - now disabled', this is not longer relevant to a reader of the code
3b335f7
to
79ea551
Compare
I tried to remove the call to |
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -60,7 +60,7 @@ Deprecations | |||
Removal of prior version deprecations/changes | |||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||
|
|||
- | |||
- The warnings for construction of a ``Categorical`` in the form ``Categorical(codes, categories)`` have been removed (:issue:`8074`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just tweak this a bit, saying that when codes is an integer dtype; say this is in favor of Categorical.from_codes
, otherwise lgtm. ping on green.
as far as #18022 (comment) |
79ea551
to
27978f2
Compare
27978f2
to
56656e7
Compare
@jreback ping |
thanks @toobaz nice PR! |
git diff upstream/master -u -- "*.py" | flake8 --diff
This removes some warnings which @jreback suggested to split from #17934 , plus some other useless, probably obsolete, lines of code.