Skip to content

Remove old warnings (plus some useless code) #18022

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 12, 2017

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Oct 29, 2017

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

This removes some warnings which @jreback suggested to split from #17934 , plus some other useless, probably obsolete, lines of code.

@jreback
Copy link
Contributor

jreback commented Oct 29, 2017

I guess we never put these deprecations in #6581

if ordered is None:
ordered = values.ordered
if categories is None:
categories = values.categories
values = values.get_values()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this get hit? its prob pretty inefficient (as converting to an array then back), we already catch the dtype conversion above (which is why you can remove this code). maybe try to remove this part as well?

# after 0.18/ in 2016
if (is_integer_dtype(values) and
not is_integer_dtype(dtype.categories)):
warn("Values and categories have different dtypes. Did you "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these get hit anywhere in the tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine with removing them. pls add a note in the deprecations removal section.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done)

@jreback jreback added Categorical Categorical Data Type Deprecate Functionality to remove in pandas labels Oct 29, 2017
@codecov
Copy link

codecov bot commented Oct 29, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@a355ed2). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #18022   +/-   ##
=========================================
  Coverage          ?   91.22%           
=========================================
  Files             ?      163           
  Lines             ?    50089           
  Branches          ?        0           
=========================================
  Hits              ?    45693           
  Misses            ?     4396           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.03% <100%> (?)
#single 40.25% <50%> (?)
Impacted Files Coverage Δ
pandas/core/categorical.py 95.74% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a355ed2...754ced2. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 29, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@a355ed2). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #18022   +/-   ##
=========================================
  Coverage          ?   91.22%           
=========================================
  Files             ?      163           
  Lines             ?    50089           
  Branches          ?        0           
=========================================
  Hits              ?    45693           
  Misses            ?     4396           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.03% <100%> (?)
#single 40.25% <50%> (?)
Impacted Files Coverage Δ
pandas/core/categorical.py 95.74% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a355ed2...754ced2. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 29, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@a355ed2). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #18022   +/-   ##
=========================================
  Coverage          ?   91.22%           
=========================================
  Files             ?      163           
  Lines             ?    50089           
  Branches          ?        0           
=========================================
  Hits              ?    45693           
  Misses            ?     4396           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.03% <100%> (?)
#single 40.25% <50%> (?)
Impacted Files Coverage Δ
pandas/core/categorical.py 95.74% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a355ed2...754ced2. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 29, 2017

Codecov Report

Merging #18022 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18022      +/-   ##
==========================================
- Coverage   91.43%   91.39%   -0.05%     
==========================================
  Files         163      163              
  Lines       50091    50083       -8     
==========================================
- Hits        45800    45771      -29     
- Misses       4291     4312      +21
Flag Coverage Δ
#multiple 89.19% <100%> (-0.03%) ⬇️
#single 40.36% <37.5%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/categorical.py 95.75% <100%> (-0.05%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 96a5274...56656e7. Read the comment docs.

@toobaz toobaz force-pushed the remove_old_warnings branch from 754ced2 to b20fbe5 Compare October 30, 2017 08:56
@@ -2243,6 +2219,12 @@ def _get_codes_for_values(values, categories):
"""

from pandas.core.algorithms import _get_data_algo, _hashtables
if is_categorical_dtype(values.dtype):
codes = (values.cat.codes if isinstance(values, ABCSeries)
else values.codes)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There must be a cleaner way... but I couldn't find it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the question could be: why does this get passed values that are already categorical and don't need a factorize but a recode ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meaning that the check could also be done before (which I think is how it was before?)

something like

if dtype.categories is None:
   codes, categories = ....
elif is_categorical_dtype(values):
    .... handle this case
else:
    codes = _get_codes_for_values(values, dtype.categories)

I personally would find that easier to follow the logic (and this does not necessarily mean you don't need the values.cat.codes vs values.codes ..., so my comment went a bit sideways :-))

@@ -30,7 +30,7 @@ Other Enhancements
Deprecations
~~~~~~~~~~~~

-
- Warnings for deprecated initialization style of ``Categorical`` (``Categorical(codes, categories)``) have been removed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move to the "Removal of prior version deprecations/changes" section (in 0.22.0)

categories = values.categories
values = values.get_values()
if dtype.categories is None:
dtype = CategoricalDtype(values.categories, dtype.ordered)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some lines above, there is already a

         elif is_categorical(values):
             dtype = values.dtype._from_categorical_dtype(values.dtype,
                                                          categories, ordered)

which should catch the same ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... but instead testing dtype.categories must be done here (that is, after all the branches of the above if...elif...else). Viceversa, the dtype must be defined before the fastpath test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but help me understand in what case this line is needed and is doing something different as the already defined dtype

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance the case in which values is already a Categorical, categories is None and dtype is "category".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and in that case

         elif is_categorical(values):
             dtype = values.dtype._from_categorical_dtype(values.dtype,
                                                          categories, ordered)

will already have created an appropriate dtype ?
So I still don't understand why in such a case it would need to be re-determined.

Copy link
Member Author

@toobaz toobaz Oct 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will already have created an appropriate dtype ?

No, precisely because categories is None. Not just in theory: you can actually try and verify that tests break.
It could be possible to first have a different set of checks which finds the right value for categories, but I doubt it would result in simpler code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume this small example:

In [1]: values = pd.Categorical(['a', 'b', 'c', 'a'])

In [2]: values
Out[2]: 
[a, b, c, a]
Categories (3, object): [a, b, c]

In [4]: categories = None

In [5]: dtype = 'category'

In [9]: ordered = None

In [10]: dtype = values.dtype._from_categorical_dtype(values.dtype, categories, ordered)

In [11]: dtype
Out[11]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)

So if you pass categorical values, it will by definition have categories (can a Categorical have categories of None?), and those will be passed to the resulting dtype object returned from values.dtype._from_categorical_dtype.
So still wondering, in what case can you have categorical values, but where the resulting dtype from above still has dtype.categories is None?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when removing that check, it is the case where a CategoricalDType instance is passed that differs from the dtype of the values:

dtype = CategoricalDtype(None, ordered=True)
values = Categorical(['a', 'b', 'd'])
Categorical(values, dtype=dtype)

I would find that more logical to handle in the if dtype is not None: path (to clearly see which takes precedence in such a case). But I assume that when fastpath=True we don't want to check that. Although I think that if you use that, dtype.categories will never be None.

@@ -2243,6 +2219,12 @@ def _get_codes_for_values(values, categories):
"""

from pandas.core.algorithms import _get_data_algo, _hashtables
if is_categorical_dtype(values.dtype):
codes = (values.cat.codes if isinstance(values, ABCSeries)
else values.codes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the question could be: why does this get passed values that are already categorical and don't need a factorize but a recode ?

@@ -2243,6 +2219,12 @@ def _get_codes_for_values(values, categories):
"""

from pandas.core.algorithms import _get_data_algo, _hashtables
if is_categorical_dtype(values.dtype):
codes = (values.cat.codes if isinstance(values, ABCSeries)
else values.codes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meaning that the check could also be done before (which I think is how it was before?)

something like

if dtype.categories is None:
   codes, categories = ....
elif is_categorical_dtype(values):
    .... handle this case
else:
    codes = _get_codes_for_values(values, dtype.categories)

I personally would find that easier to follow the logic (and this does not necessarily mean you don't need the values.cat.codes vs values.codes ..., so my comment went a bit sideways :-))

@toobaz toobaz force-pushed the remove_old_warnings branch from b20fbe5 to 6df2ea0 Compare October 30, 2017 14:37
@@ -60,7 +60,7 @@ Deprecations
Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-
- Warnings for deprecated initialization style of ``Categorical`` (``Categorical(codes, categories)``) have been removed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reference the issue (this PR is good, we also like to reference the original issue if you can find it).

The warnings for construction of a Categorical of the form (......) have been removed

@@ -294,22 +293,10 @@ def __init__(self, values, categories=None, ordered=None, dtype=None,

# sanitize input
if is_categorical_dtype(values):
if dtype.categories is None:
dtype = CategoricalDtype(values.categories, dtype.ordered)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of construction actually hits this?

it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?
what if dtype differs from the dtype of values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?

Yes, it does... above (if dtype is not None). That is: dtype here is not necessarily the dtype argument to the function.

what if dtype differs from the dtype of values?

Then it is given priority - rightly so, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, can you add a comment to that effect here.

it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?

is there a case which hits this where the passed dtype has a different ordered than values.ordered? (one is False, one is True). do/should we check this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we not simply using dtype = values at this point (values is already a dtype)?

Copy link
Member Author

@toobaz toobaz Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we not simply using dtype = values at this point (values is already a dtype)?

values can have a dtype...

is there a case which hits this where the passed dtype has a different ordered than values.ordered? (one is False, one is True). do/should we check this?

Sure, whenever you pass values with given categories, unordered, and dtype with the same categories (and possibly more), ordered. Anyway, I added a couple of comments.

# we're inferring from values
dtype = CategoricalDtype(categories, dtype.ordered)

elif is_categorical_dtype(values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a repeat of the first if?? (this one actually looks good), wondering why it is hitting twice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a repeat, as here the actual 'codes' construction happens (before it was to check the categories)

@@ -361,20 +353,6 @@ def __init__(self, values, categories=None, ordered=None, dtype=None,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some comments on the lines above that can be cleaned up (mentioning the checks for the warnings)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done)

@toobaz toobaz force-pushed the remove_old_warnings branch from 6df2ea0 to 3d8f173 Compare November 10, 2017 18:46
@@ -294,22 +293,10 @@ def __init__(self, values, categories=None, ordered=None, dtype=None,

# sanitize input
if is_categorical_dtype(values):
if dtype.categories is None:
dtype = CategoricalDtype(values.categories, dtype.ordered)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, can you add a comment to that effect here.

it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?

is there a case which hits this where the passed dtype has a different ordered than values.ordered? (one is False, one is True). do/should we check this?

@@ -294,22 +293,10 @@ def __init__(self, values, categories=None, ordered=None, dtype=None,

# sanitize input
if is_categorical_dtype(values):
if dtype.categories is None:
dtype = CategoricalDtype(values.categories, dtype.ordered)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we not simply using dtype = values at this point (values is already a dtype)?

pass

else:
elif not isinstance(values, (ABCIndexClass, ABCSeries)):

# on numpy < 1.6 datetimelike get inferred to all i8 by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove this commnet (about numpy 1.6); also I think we can eliminate this logic (here or antother PR) as we don't care about old numpy any longer

@@ -306,20 +306,19 @@ def f():
assert len(cat.codes) == 1
assert cat.codes[0] == 0

# Catch old style constructor useage: two arrays, codes + categories
# We can only catch two cases:
# Catches - now disabled - for old style constructor useage:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the 'Catches - now disabled', this is not longer relevant to a reader of the code

@toobaz toobaz force-pushed the remove_old_warnings branch 2 times, most recently from 3b335f7 to 79ea551 Compare November 11, 2017 00:42
@toobaz
Copy link
Member Author

toobaz commented Nov 11, 2017

I tried to remove the call to maybe_infer_to_datetimelike, but some tests break. In another PR I can see if we can fix this inside _sanitize_array.

@@ -60,7 +60,7 @@ Deprecations
Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-
- The warnings for construction of a ``Categorical`` in the form ``Categorical(codes, categories)`` have been removed (:issue:`8074`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just tweak this a bit, saying that when codes is an integer dtype; say this is in favor of Categorical.from_codes, otherwise lgtm. ping on green.

@jreback jreback added this to the 0.22.0 milestone Nov 11, 2017
@jreback
Copy link
Contributor

jreback commented Nov 11, 2017

as far as #18022 (comment)
e.g. trying to remove some sanitizing code, would take that as a followup

@toobaz toobaz force-pushed the remove_old_warnings branch from 79ea551 to 27978f2 Compare November 11, 2017 18:13
@toobaz toobaz force-pushed the remove_old_warnings branch from 27978f2 to 56656e7 Compare November 12, 2017 12:23
@toobaz
Copy link
Member Author

toobaz commented Nov 12, 2017

@jreback ping

@jreback jreback merged commit aebe2a9 into pandas-dev:master Nov 12, 2017
@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

thanks @toobaz nice PR!

@toobaz toobaz deleted the remove_old_warnings branch November 12, 2017 16:02
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants