
API: comparisons of categorical data and (scalar or list-like) #8995


Closed
jankatins opened this issue Dec 4, 2014 · 3 comments
Labels
API Design Bug Categorical Categorical Data Type Closing Candidate May be closeable, needs more eyeballs Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@jankatins
Contributor

From #8946:

If cat > scalar is allowed, and cat == list is allowed too because it basically compares each row as if it were the scalar case, then by that logic cat > list should also be allowed: each row in that comparison would treat the corresponding element of the list as a scalar.
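The "a list is just the scalar case row by row" argument can be sketched in plain Python. `compare_rowwise` is a hypothetical helper, not a pandas API: a scalar right-hand side is broadcast to every row, and a same-length list supplies one scalar per row, so both shapes go through the same scalar comparison.

```python
import operator
from typing import Any, Callable, List, Sequence


def compare_rowwise(
    values: Sequence[Any],
    other: Any,
    op: Callable[[Any, Any], bool],
) -> List[bool]:
    """Apply the scalar comparison `op` row by row (hypothetical helper).

    A scalar `other` is broadcast to every row; a list/tuple `other` must
    have the same length and supplies one scalar per row, so
    `values > list` reduces to the scalar case.
    """
    if isinstance(other, (list, tuple)):
        if len(other) != len(values):
            raise ValueError("Lengths must match.")
        rhs = list(other)
    else:
        rhs = [other] * len(values)  # broadcast the scalar to every row
    return [op(v, r) for v, r in zip(values, rhs)]


# The same rule covers both shapes of right-hand side.
print(compare_rowwise([1, 3, 3], 2, operator.gt))          # [False, True, True]
print(compare_rowwise([1, 3, 3], [0, 3, 4], operator.gt))  # [True, False, False]
```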

On the other hand, a scalar comparison with the categorical only makes sense if the scalar can be treated as a category (any other value amounts to a "not of the same type" comparison, which would raise on python3). So the scalar must be in the categories, and this should not work:

In[4]: df = pd.DataFrame({"a":[1,3,3,3,np.nan]})
In[6]: df["b"] = df.a.astype("category")
In[7]: df.b
Out[7]: 
0     1
1     3
2     3
3     3
4   NaN
Name: b, dtype: category
Categories (2, float64): [1 < 3]
In[8]: df.b > 2
Out[8]: 
0    False
1     True
2     True
3     True
4    False
Name: b, dtype: bool
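Under the rule proposed above, df.b > 2 would raise because 2 is not a category, while comparisons against an existing category keep working. A minimal sketch in plain Python, assuming the categories are listed in ascending order and `None` stands in for NaN; `ordered_compare_scalar` is a hypothetical name and this is not how pandas implements the comparison:

```python
import operator
from typing import Any, Callable, List, Optional, Sequence


def ordered_compare_scalar(
    values: Sequence[Optional[Any]],
    categories: Sequence[Any],
    other: Any,
    op: Callable[[int, int], bool],
) -> List[bool]:
    """Hypothetical sketch of the proposed scalar rule (not pandas' code).

    The scalar must itself be a category; otherwise the comparison is a
    "different type" comparison and raises. Rows are compared by their
    position in the ordered categories, and missing rows compare as False.
    """
    position = {cat: i for i, cat in enumerate(categories)}
    if other not in position:
        raise TypeError(f"{other!r} is not one of the categories {list(categories)}")
    rhs = position[other]
    return [False if v is None else op(position[v], rhs) for v in values]


values = [1.0, 3.0, 3.0, 3.0, None]  # mirrors df.a from the session above
print(ordered_compare_scalar(values, [1.0, 3.0], 1.0, operator.gt))
# [False, True, True, True, False]
```

Calling it with 2 instead of 1.0 raises TypeError, which is the behaviour argued for in this issue.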

Oh, one more thing: by the same reasoning, df.b == 2 (the "equality" case) should also NOT work, because 2 is not in the categories and is therefore a "different type".

Current code results in this:

In [5]: df.b==2
Out[5]: 
0    False
1    False
2    False
3    False
4    False
Name: b, dtype: bool

This is actually consistent (i.e. it returns False). A comparison shouldn't raise, so this is a reasonable result. I think this is de facto the same as the following, which is useful:

In [7]: Series(['a','b','c'])==2
Out[7]: 
0    False
1    False
2    False
dtype: bool
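The lenient equality rule can be sketched in plain Python as well. `categorical_eq` is a hypothetical helper and `None` stands in for NaN; the printed result is meant to match df.b == 2 in the session above, but this only illustrates the semantics and is not pandas' implementation. A value outside the categories can never match, so == is all False and != all True instead of raising.

```python
from typing import Any, List, Optional, Sequence


def categorical_eq(
    values: Sequence[Optional[Any]],
    categories: Sequence[Any],
    other: Any,
    negate: bool = False,
) -> List[bool]:
    """Hypothetical sketch of the lenient ==/!= rule discussed above.

    If `other` is not a category, no row can match: return all False for
    == (or all True for != when `negate` is set). Missing rows (None)
    never compare equal to anything.
    """
    if other not in categories:
        return [negate] * len(values)
    result = [v is not None and v == other for v in values]
    return [not r for r in result] if negate else result


print(categorical_eq([1.0, 3.0, 3.0, 3.0, None], [1.0, 3.0], 2))
# [False, False, False, False, False]
```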
@jankatins
Contributor Author

@jreback: does Series(['a','b','c'])==2 work on both Py2 and Py3? If it works on both, I'm OK with that, but I think I remember that this raises on Py3, and I think we should model the API similarly (i.e. it should raise on Py3).
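For context on the Py2/Py3 question, here is the core-Python behaviour (no pandas involved): on Python 3, == between unrelated types returns False rather than raising, and only ordering comparisons such as < or > raise TypeError. Python 2 ordered mixed types consistently but arbitrarily. `mixed_type_comparisons` is just an illustrative name.

```python
def mixed_type_comparisons() -> tuple:
    """Return (result of 'a' == 2, whether 'a' < 2 raised TypeError)."""
    eq_result = "a" == 2  # equality across unrelated types: simply False
    try:
        _ = "a" < 2  # ordering across unrelated types
        ordering_raised = False
    except TypeError:
        # Python 3 raises here; Python 2 ordered mixed types arbitrarily
        ordering_raised = True
    return eq_result, ordering_raised


print(mixed_type_comparisons())  # (False, True) on Python 3
```

This suggests that an all-False result for == mirrors the language itself, while the Py3 precedent for raising applies to the ordering case (cat > scalar).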

@jreback jreback added API Design Categorical Categorical Data Type labels Dec 4, 2014
@jreback jreback added this to the 0.16.0 milestone Dec 4, 2014
@jreback
Contributor

jreback commented Dec 4, 2014

Seems to work in py3 as well.

In [1]: Series(['a','b','c'])==2
Out[1]: 
0    False
1    False
2    False
dtype: bool

In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Darwin
OS-release: 13.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.1-84-gf1b270f

@jbrockmendel
Member

AFAICT this works as expected. @jankatins can you confirm if this is still an issue?

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Sep 26, 2020

4 participants