DEPR: Remove pandas.np #30296

Closed
datapythonista opened this issue Dec 17, 2019 · 20 comments · Fixed by #30386
Labels
API Design, Deprecate (Functionality to remove in pandas), good first issue

Comments

@datapythonista
Member

Not sure if it was added intentionally, but it's possible to access numpy through the np attribute of the pandas module:

import pandas
x = pandas.np.array([1, 2, 3])

While this is not documented, I've seen a couple of places suggesting this as a "trick" to avoid importing numpy directly.

I personally find this hacky, and I think it should be removed.

@datapythonista datapythonista added the API Design, Deprecate and Needs Discussion labels Dec 17, 2019
@jorisvandenbossche jorisvandenbossche added this to the 1.0 milestone Dec 17, 2019
@AlexKirko
Member

AlexKirko commented Dec 17, 2019

There is a chance that removing this will break something, in case it was added deliberately, but I believe it should still be removed. It's ugly, and any issues that arise should be easy to fix.
As far as I can see, all this would entail is a one-line edit to __init__.py. I found no explanation of why np was added, either in the code or in the API reference.
Edit: I did find a colleague who relied on this, so even if our code doesn't break anything in the library (or if we fix it), this change will still break backward compatibility for some users.

@datapythonista
Member Author

We remove everything gradually, by first raising warnings.

I think there are other things we may also want to check for removal. I saw a pandas.array that I guess is an alias for numpy.array.

@mroeschke
Member

Similar but more minor: it looks like import pandas also gives users access to datetime.datetime (as pandas.datetime), which I find odd.

In [1]: import pandas

In [2]: pandas.datetime
Out[2]: datetime.datetime

@AlexKirko
Member

AlexKirko commented Dec 17, 2019

Fair point about the deprecation warning.

pd.array, however, isn't just an alias. It lets you create arrays of pandas-specific data types.

This works:

pd.array([1,2,3], dtype=pd.Int64Dtype())

<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

This doesn't:

np.array([1,2,3], dtype=pd.Int64Dtype())

TypeError                                 Traceback (most recent call last)
<ipython-input-7-6ae38424b9f7> in <module>
----> 1 np.array([1,2,3], dtype=pd.Int64Dtype())

TypeError: data type not understood
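The difference is easiest to see with a missing value. A minimal sketch, assuming a pandas version with nullable integer support (0.24 or later) and the "Int64" string alias for pd.Int64Dtype():

```python
import numpy as np
import pandas as pd

# pd.array builds a pandas extension array; with the nullable Int64 dtype
# a missing value stays missing and the dtype stays Int64.
ext = pd.array([1, None, 3], dtype="Int64")
print(ext.dtype, ext.isna().tolist())  # Int64 [False, True, False]

# np.array knows nothing about pandas dtypes; with a None in the data,
# NumPy falls back to a generic object array.
raw = np.array([1, None, 3])
print(raw.dtype)  # object
```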

@xhochy
Contributor

xhochy commented Dec 17, 2019

The numpy import is actually explicit:

import numpy as np

but it is simply a redirect to normal numpy. Still, I have often met users who swear that pandas.np is different from np because it provides a compatibility layer between NumPy and pandas. Removing this alias would also dispel that myth. As it is just an alias, the breaking change is really easy to resolve.

The datetime import

from datetime import datetime

is also only there to make pandas.datetime importable. It is not used anywhere in __init__.py.
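That such an alias is the very same object, not a wrapper, can be checked with the standard library alone: any name bound by a top-level import becomes an attribute of the importing module. The sketch below uses os.sys (os.py itself runs import sys) and a hypothetical throwaway module, fake_pkg, standing in for pandas, so it does not depend on any pandas version.

```python
import math
import os
import sys
import types
from datetime import datetime

# os.py runs `import sys` at top level, so sys leaks out as os.sys.
assert os.sys is sys

# Same effect in any module: fake_pkg (hypothetical) mimics an __init__.py
# that imports something only to re-export it.
fake_pkg = types.ModuleType("fake_pkg")
exec("import math\nfrom datetime import datetime", fake_pkg.__dict__)

# The "re-exported" names are identical objects, not compatibility layers.
print(fake_pkg.math is math, fake_pkg.datetime is datetime)  # True True
```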

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Dec 17, 2019

For Python 3.7+, we can actually deprecate this with the module-level __getattr__ trick (the same one we use for the Panel dummy class right now). So I think we can go through a deprecation cycle instead of removing it directly (for Python 3.6 this is more difficult, though).

@datapythonista
Member Author

pd.array, however, isn't just an alias.

Yep, I got confused, it's obviously our own array.

@datapythonista datapythonista added the good first issue label and removed the Needs Discussion label Dec 17, 2019
@lithomas1
Member

take

@TomAugspurger
Contributor

Are we wanting to do this for 1.0, or should it wait, or does it not matter?

@datapythonista
Member Author

Would be nice, but I don't think it's important, since it won't be removed until 2.0 I guess.

@jbrockmendel
Member

datetime was also suggested for this treatment. What else doesn't belong in the top-level namespace? Some candidates:

  • __docformat__ (is this a legacy thing or is it actually used by sphinx or something?)
  • _hashtable, _lib, _tslib
  • datetime
  • isnull, notnull (weren't these deprecated a while back?)
  • _np_version_under1p*
  • _version
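One rough way to hunt for more candidates is to list top-level names whose objects come from outside the package. The helper foreign_names below is hypothetical and only a heuristic: it flags modules outside the package and objects whose __module__ points outside it, skips dunder names, and does not flag the package's own submodules. Its output on pandas would depend on the installed version, so the usage sticks to a synthetic module.

```python
import inspect
import types
from typing import List


def foreign_names(module: types.ModuleType) -> List[str]:
    """Hypothetical helper: top-level names of `module` imported from elsewhere."""
    pkg = module.__name__

    def inside(qualname: str) -> bool:
        # True for the package itself and any of its submodules.
        return qualname == pkg or qualname.startswith(pkg + ".")

    found = []
    for name, obj in vars(module).items():
        if name.startswith("__"):  # skip __doc__, __builtins__, etc.
            continue
        if inspect.ismodule(obj):
            if not inside(obj.__name__):
                found.append(name)
        else:
            # Classes and functions record where they were defined;
            # plain values such as strings have no __module__.
            origin = getattr(obj, "__module__", None)
            if isinstance(origin, str) and not inside(origin):
                found.append(name)
    return sorted(found)


# Synthetic module: two re-exported imports plus one locally defined function.
demo = types.ModuleType("demo")
exec(
    "import math\nfrom datetime import datetime\ndef helper():\n    pass\n",
    demo.__dict__,
)
print(foreign_names(demo))  # ['datetime', 'math']
```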

@jreback
Contributor

jreback commented Dec 21, 2019

The private modules don't show up anyhow, so those are really not a big deal.

We didn't actually deprecate isnull/notnull.

@ryankarlos
Copy link
Contributor

Can I have a go at datetime or isnull/notnull?

@jorisvandenbossche
Member

I think it is better to first finalize the open PR: #30386.

Also, if we want to deprecate isnull/notnull, let's first discuss that in a separate issue, as this is a quite different thing. Here we are discussing shortcuts for external packages.

@ryankarlos
Contributor

OK, I'll wait for this PR to be merged. If datetime still requires treatment in this issue, I'm happy to work on that.

@datapythonista
Member Author

I think we want to get rid of pandas.datetime, and I don't think there should be significant conflicts with #30386 if you open the PR in parallel.

@ryankarlos
Contributor

OK, will do. Thanks @datapythonista!

@CharlyWargnier

Hi guys,

I've got this code:

df['SEBotClass'] = pd.np.where(df.userAgent.str.contains("YandexBot"), "YandexBot", pd.np.where(df.userAgent.str.contains("bingbot"), "BingBot", pd.np.where(df.userAgent.str.contains("DuckDuckBot"), "DuckDuckGo", pd.np.where(df.userAgent.str.contains("Baiduspider"), "Baidu", pd.np.where(df.userAgent.str.contains("Googlebot/2.1"), "GoogleBot", "Else")))))
nan = pd.np.nan

Which now gives this warning message:

FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead

What do I need to do to avoid that warning?

Thanks! :)

@jorisvandenbossche
Member

Replace pd.np with np in your code above (after doing import numpy as np)
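For reference, a sketch of the question's code with pd.np swapped for np, plus an equivalent written with np.select to avoid the deep nesting. The sample user agents are made up, and regex=False in the np.select version assumes the patterns are meant literally (the nested version keeps the default regex matching, where the "." in "Googlebot/2.1" matches any character).

```python
import numpy as np
import pandas as pd

# Made-up sample user agents (hypothetical data, one per expected class).
df = pd.DataFrame({"userAgent": [
    "Mozilla/5.0 (compatible; YandexBot/3.0)",
    "Mozilla/5.0 (compatible; bingbot/2.0)",
    "DuckDuckBot/1.0",
    "Baiduspider/2.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
    "Mozilla/5.0 (Windows NT 10.0)",
]})

# 1) The original expression with pd.np replaced by np: no FutureWarning.
ua = df.userAgent.str
df["SEBotClass"] = np.where(ua.contains("YandexBot"), "YandexBot",
                   np.where(ua.contains("bingbot"), "BingBot",
                   np.where(ua.contains("DuckDuckBot"), "DuckDuckGo",
                   np.where(ua.contains("Baiduspider"), "Baidu",
                   np.where(ua.contains("Googlebot/2.1"), "GoogleBot", "Else")))))
nan = np.nan  # was: nan = pd.np.nan

# 2) Same result with np.select: the first matching condition wins,
#    exactly like the nested np.where above.
bots = {
    "YandexBot": "YandexBot",
    "bingbot": "BingBot",
    "DuckDuckBot": "DuckDuckGo",
    "Baiduspider": "Baidu",
    "Googlebot/2.1": "GoogleBot",
}
conditions = [ua.contains(pattern, regex=False) for pattern in bots]
df["SEBotClass2"] = np.select(conditions, list(bots.values()), default="Else")
```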

@CharlyWargnier

Thanks Joris!
