API: Allow equality comparisons of Series with a categorical dtype and object type are allowed (GH8938) #8946

Merged
merged 2 commits into from
Dec 4, 2014

jreback
Contributor

@jreback jreback commented Nov 30, 2014

closes #8938

@jreback jreback added API Design Categorical Categorical Data Type labels Nov 30, 2014
@jreback jreback added this to the 0.15.2 milestone Nov 30, 2014
@jreback jreback added the Bug label Nov 30, 2014
@jreback
Contributor Author

jreback commented Nov 30, 2014

@jorisvandenbossche
cc @shoyer
cc @JanSchulz
cc @immerr

?

@shoyer
Member

shoyer commented Nov 30, 2014

Probably want to make sure != works, too.

@jreback
Contributor Author

jreback commented Nov 30, 2014

@shoyer good point, fixed up.

self.assertTrue(((~(f==a)==(f!=a)).all()))

# non-equality is not comparable
self.assertRaises(TypeError, lambda: a < b)
Contributor

I'm not sure whether the codepath is different, but just to make sure that this doesn't slip in:

self.assertRaises(TypeError, lambda: a > b)
self.assertRaises(TypeError, lambda: b > a)
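
For anyone who wants to try the behaviour these assertions cover, here is a minimal sketch. The Series `a` and `b` and the helper `raises_type_error` are hypothetical stand-ins (the actual test fixtures are not shown in this thread), and the expectations reflect my understanding of how pandas treats a categorical on the left-hand side, not a copy of the code under review.

```python
import pandas as pd

# Hypothetical stand-ins for the fixtures used in the test above.
a = pd.Series(list("abc"), dtype="category")  # unordered categorical
b = pd.Series(list("abd"), dtype=object)      # plain object dtype

# Equality and inequality are positional, element-wise comparisons.
eq = a == b
ne = a != b


def raises_type_error(op):
    """Return True if calling op() raises TypeError (illustrative helper)."""
    try:
        op()
    except TypeError:
        return True
    return False


# Ordering comparisons between a categorical and object data should raise.
lt_raises = raises_type_error(lambda: a < b)
gt_raises = raises_type_error(lambda: a > b)

print(eq.tolist(), ne.tolist(), lt_raises, gt_raises)
```

The check `((~eq) == ne).all()` mirrors the `assertTrue` line in the diff: `!=` should always be the exact negation of `==`.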

@jankatins
Contributor

This should also get some updates in categorical.rst. Want to get a patch?

@jreback
Contributor Author

jreback commented Nov 30, 2014

@JanSchulz if you want to post a patch for the docs, I'm all ears! I'll fix the other in a minute.

@jankatins
Contributor

Patch is at #8952

@jankatins
Contributor

What I started to wonder when I did the docs: if cat > scalar is allowed, and cat == list is also allowed because it basically compares each row as if it were the scalar case, then by that logic cat > list should also be allowed: each row in that comparison would treat the element from the list as a scalar.

On the other hand, a scalar comparison with the categorical only makes sense if the scalar can be treated as a category (for any other value it's basically a "not of the same type" comparison, which would raise on Python 3), so the scalar must be in categories and this should not work:

In[4]: df = pd.DataFrame({"a":[1,3,3,3,np.nan]})
In[6]: df["b"] = df.a.astype("category")
In[7]: df.b
Out[7]: 
0     1
1     3
2     3
3     3
4   NaN
Name: b, dtype: category
Categories (2, float64): [1 < 3]
In[8]: df.b > 2
Out[8]: 
0    False
1     True
2     True
3     True
4    False
Name: b, dtype: bool
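
A small sketch for probing this question: it compares an ordered categorical against a scalar that is one of the categories and against one that is not. The helper name `try_compare` is made up for illustration. Whether the second case returns a boolean Series (as in the output above) or raises depends on the pandas version, so the sketch reports either outcome instead of assuming one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 3, 3, np.nan]})
# Ask for an ordered categorical explicitly; the default ordering
# has not been the same across pandas versions.
df["b"] = df["a"].astype(pd.CategoricalDtype(ordered=True))


def try_compare(series, scalar):
    """Return `series > scalar` as a list, or 'TypeError' if it raises."""
    try:
        return (series > scalar).tolist()
    except TypeError:
        return "TypeError"


in_categories = try_compare(df["b"], 1)      # 1 is one of the categories
not_in_categories = try_compare(df["b"], 2)  # 2 is not a category

print(in_categories)
print(not_in_categories)
```

The first comparison is the uncontroversial one: the scalar maps to a category, so the ordering of the categories decides the result, and the missing value compares as False.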

@jreback
Contributor Author

jreback commented Dec 2, 2014

@JanSchulz I can kind of buy that we would 'like' that last example to work (it seems natural), but it does violate the Categorical principles.

OK, how about we merge this to fix the equality inconsistency with merging, then open a new issue to discuss non-equality comparisons?

@shoyer
@jorisvandenbossche

?

@jreback
Contributor Author

jreback commented Dec 3, 2014

@JanSchulz ok on this?

@jankatins
Contributor

Yep, I'll open a new issue for that.

@jankatins
Contributor

Oh, one more thing: according to that thought, df.b == 2 (-> The "equality" case) should also NOT work, because 2 is not in categories and therefore a "different type".

@jreback
Contributor Author

jreback commented Dec 4, 2014

@JanSchulz gr8 thanks

@jreback
Contributor Author

jreback commented Dec 4, 2014

In [5]: df.b==2
Out[5]: 
0    False
1    False
2    False
3    False
4    False
Name: b, dtype: bool

This is actually consistent (i.e. it returns False). An equality comparison shouldn't raise, so this is a reasonable result. I think this is de facto like the following, and is useful.

In [7]: Series(['a','b','c'])==2
Out[7]: 
0    False
1    False
2    False
dtype: bool
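
To make the analogy concrete, here is a minimal sketch (the variable names are illustrative). Comparing either an object Series or a categorical Series for equality with a value that appears nowhere in the data returns all False instead of raising, and `!=` is its negation.

```python
import pandas as pd

obj = pd.Series(["a", "b", "c"], dtype=object)
cat = obj.astype("category")

obj_eq = obj == 2  # object dtype: nothing equals 2
cat_eq = cat == 2  # categorical: 2 is not one of the categories
cat_ne = cat != 2  # the negation of the line above

print(obj_eq.tolist())
print(cat_eq.tolist())
print(cat_ne.tolist())
```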

jreback added a commit that referenced this pull request Dec 4, 2014
API: Allow equality comparisons of Series with a categorical dtype and object type are allowed (GH8938)
@jreback jreback merged commit 73c44a8 into pandas-dev:master Dec 4, 2014

.. ipython:: python

cat == cat_base2
Member

@JanSchulz @jreback This generates an error in the docs: TypeError: Categoricals can only be compared if 'categories' are the same

Contributor Author

Change to this (or make the rhs [2,2,4] or whatever); it just cannot be a categorical with different categories.

In [14]: cat == np.array(cat_base2)
Out[14]: 
0    False
1     True
2    False
dtype: bool

Member

or change to cat_base instead of cat_base2 I suppose
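
The three options discussed here can be sketched as follows. The definitions of `cat`, `cat_base` and `cat_base2` are assumptions chosen to match the output shown above; the thread itself does not show how the docs define them.

```python
import numpy as np
import pandas as pd

# Assumed definitions; not taken from this thread.
cat = pd.Series(pd.Categorical([1, 2, 3], categories=[3, 2, 1]))
cat_base = pd.Series(pd.Categorical([2, 2, 2], categories=[3, 2, 1]))
cat_base2 = pd.Series(pd.Categorical([2, 2, 2]))  # categories inferred as [2]

# Categorical vs. categorical with different categories raises.
try:
    cat == cat_base2
    different_categories_raise = False
except TypeError:
    different_categories_raise = True

# Same categories on both sides: the comparison is allowed.
same_categories = (cat == cat_base).tolist()

# Converting the right-hand side to a plain array makes it a
# positional, element-wise equality check.
via_array = (cat == np.array(cat_base2)).tolist()

print(different_categories_raise, same_categories, via_array)
```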

Labels
API Design Bug Categorical Categorical Data Type
Development

Successfully merging this pull request may close these issues.

API: relax categorical equality when comparing against object
4 participants