Pandas 0.25.0 breaks np.isin #31080

Open
lamourj opened this issue Jan 16, 2020 · 4 comments
Labels
Bug Compat pandas objects compatability with Numpy or Python functions

lamourj commented Jan 16, 2020

Observed with numpy 1.17.4 as well as 1.18.1 (latest):

>>> import numpy as np
>>> import pandas as pd

# pandas 0.24.2:
>>> l = [pd.Timestamp]
>>> pd.Timestamp == pd.Timestamp
True
>>> np.isin(l, l)
array([ True])

# pandas 0.25.0:
>>> l = [pd.Timestamp]
>>> pd.Timestamp == pd.Timestamp
True
>>> np.isin(l, l)
array([False])

Problem description

Since 0.25.0, pd.Timestamp == pd.Timestamp still evaluates to True, but np.isin no longer reports the class as a member of a list that contains it.
The same behaviour is observed under pandas 0.25.3.
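
One possible workaround, sketched here as an assumption rather than a recommended fix, is to build the membership mask with Python's own `in` operator, which checks identity before equality and never goes through the array comparison. The helper name `isin_by_python_eq` is hypothetical, and the example uses built-in classes so that it runs without pandas:

```python
import datetime

import numpy as np


def isin_by_python_eq(elements, test_elements):
    """Return a boolean mask: True where an element is found in test_elements.

    Membership is decided by Python's `in` operator on a plain list
    (identity first, then ==), not by comparing NumPy arrays.
    """
    test_list = list(test_elements)
    return np.array([e in test_list for e in elements], dtype=bool)


classes = [datetime.datetime, object]
result = isin_by_python_eq(classes, [datetime.datetime])
print(result)  # [ True False]
```

This is linear in the size of both inputs, so it is only sensible for short lists of objects such as the one in the report.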

Contributor

jreback commented Jan 17, 2020

I suppose so; we have literally zero support for this right now.

A PR that patches this, with tests, would be welcome.

gfyoung added the Compat label (pandas objects compatability with Numpy or Python functions) on Jan 19, 2020

Author

lamourj commented Jan 28, 2020

Thanks for the answer. Any ideas on how you would approach this?

Contributor

jreback commented Jan 28, 2020

I have no idea what np.isin actually does with non-numpy types, and I'm not even sure why you would want to do this; Series.isin is well supported, tested, and type-aware.

We do have numpy ufunc compatibility, but I don't know how np.isin behaves.
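
For reference, a minimal sketch of the Series.isin route mentioned above. It tests Timestamp values rather than the Timestamp class itself, and the dates are made up for illustration:

```python
import pandas as pd

# A datetime64 Series built from illustrative dates.
stamps = pd.Series(pd.to_datetime(["2020-01-16", "2020-01-17", "2020-01-28"]))

# Values to look for; only the first one occurs in the Series.
wanted = [pd.Timestamp("2020-01-17"), pd.Timestamp("2020-07-18")]

mask = stamps.isin(wanted)
print(mask.tolist())  # [False, True, False]
```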

mroeschke added the Bug label on Apr 10, 2020

Member

simonjayhawkins commented Jul 18, 2020

After a quick investigation, it does look like a pandas issue.

>>> np.array([pd.Timestamp]) == pd.Timestamp
False
>>> np.array([object]) == object
array([ True])
>>> np.array([5]) == 5
array([ True])
>>> import datetime
>>> np.array([datetime.datetime]) == datetime.datetime
array([ True])

> I have no idea what np.isin actually does with non-numpy types

It basically does the following:

>>> l = [pd.Timestamp]
>>> # both inputs are converted to flat arrays
>>> ar1 = np.asarray(l)
>>> ar1
array([<class 'pandas._libs.tslibs.timestamps.Timestamp'>], dtype=object)
>>> ar1 = np.asarray(ar1).ravel()
>>> ar1
array([<class 'pandas._libs.tslibs.timestamps.Timestamp'>], dtype=object)
>>> ar2 = np.asarray(l).ravel()
>>> ar2
array([<class 'pandas._libs.tslibs.timestamps.Timestamp'>], dtype=object)
>>> # object dtype selects the element-by-element comparison loop below
>>> contains_object = ar1.dtype.hasobject or ar2.dtype.hasobject
>>> contains_object
True
>>> mask = np.zeros(len(ar1), dtype=bool)
>>> mask
array([False])
>>> for a in ar2:
...     mask |= ar1 == a
...
>>> mask
array([False])
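
The steps in the session above can be wrapped into a small function for checking other objects. This is a simplified sketch of the object-dtype path as described in this comment, not NumPy's actual implementation; the name `isin_object_fallback` is hypothetical, and the example uses classes that the earlier comparisons show behaving as expected:

```python
import datetime

import numpy as np


def isin_object_fallback(elements, test_elements):
    """Mimic the comparison loop described above for object-dtype inputs."""
    ar1 = np.asarray(elements).ravel()
    ar2 = np.asarray(test_elements).ravel()
    mask = np.zeros(len(ar1), dtype=bool)
    for a in ar2:
        # The whole array is compared against one test element at a time,
        # so the result depends on how `ar1 == a` behaves for that element.
        mask |= ar1 == a
    return mask


fallback_mask = isin_object_fallback([datetime.datetime, object], [datetime.datetime])
print(fallback_mask)  # [ True False]
```

Because the mask is built from `ar1 == a`, any object for which that array comparison does not return an element-wise result will be reported as absent, which matches the `False` seen for `pd.Timestamp` above.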
