fromnumeric.py compatibility with GroupBy, Window, and tslib functions #12811

Closed
gfyoung opened this issue Apr 6, 2016 · 8 comments
Labels
Compat pandas objects compatability with Numpy or Python functions
Comments

@gfyoung
Member

gfyoung commented Apr 6, 2016

In #12810, it was decided that compatibility for groupby (including resample.py) and window functions would be left for a separate PR / discussion, which seems reasonable given how massive #12810 already is. This issue serves as a reminder to tackle this after landing #12810, as it seems like this can be easily addressed afterwards.

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Apr 6, 2016
@jreback jreback added this to the 0.18.1 milestone Apr 6, 2016
@gfyoung
Member Author

gfyoung commented Apr 6, 2016

@jreback : Adding timestamps and timedeltas as well to this issue given my question about tslib.pyx (i.e. what sort of compatibility should we give to methods with numpy counterparts?)

@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 27, 2016
@gfyoung
Member Author

gfyoung commented May 7, 2016

With my initial fromnumeric.py PR merged, it seems like a good idea to revisit this. The major files that I think merit examination for numpy compatibility are:

pandas/core/window.py
pandas/tseries/resample.py
pandas/core/groupby.py
pandas/tslib.pyx

@gfyoung gfyoung changed the title fromnumeric.py compatibility with GroupBy and Window functions fromnumeric.py compatibility with GroupBy, Window, and tslib functions May 7, 2016
@jreback
Contributor

jreback commented May 7, 2016

how so, I don't really care to be compat with numpy for anything beyond very basic stuff. pls show an example.

@gfyoung
Member Author

gfyoung commented May 7, 2016

Examples:

tslib.round(self, freq) vs. np.round(a, decimals=0, out=None)

window.max(self, how=None, **kwargs) vs. np.max(a, axis=None, out=None, keepdims=False)

resample.var(self, ddof=1) vs. np.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)

groupby.mean(self) vs. np.mean(a, axis=None, dtype=None, out=None, keepdims=False)

All I was thinking of doing was putting validation calls in the implementation, similar to what was done in my previous PR and nothing more than that. I'm also perfectly fine leaving them as is since numpy decoupling is also one of our objectives with pandas.
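A minimal sketch of what such a validation call could look like, assuming a helper that tolerates numpy's keywords only while they are left at their defaults. The names `validate_numpy_kwargs`, `NUMPY_MEAN_DEFAULTS`, and `ToyGroupBy` are hypothetical and exist only for illustration; this is not the code from the earlier PR.

```python
# Hypothetical sketch: none of these names come from pandas itself.
NUMPY_MEAN_DEFAULTS = {"axis": None, "dtype": None, "out": None}


def validate_numpy_kwargs(fname, kwargs, defaults):
    """Accept numpy-style keywords only when they keep their default values."""
    for key, value in kwargs.items():
        if key not in defaults:
            # Unknown keyword: behave like an ordinary Python signature would.
            raise TypeError(
                "{}() got an unexpected keyword argument '{}'".format(fname, key)
            )
        if value is not defaults[key]:
            # Known numpy keyword, but the caller asked for behaviour we lack.
            raise ValueError(
                "the '{}' parameter is not supported by this {}()".format(key, fname)
            )


class ToyGroupBy(object):
    """Stand-in for a groupby-like object; maps group key -> list of numbers."""

    def __init__(self, groups):
        self.groups = groups

    def mean(self, **kwargs):
        # The validation call sits at the top of the implementation.
        validate_numpy_kwargs("mean", kwargs, NUMPY_MEAN_DEFAULTS)
        return {key: sum(vals) / float(len(vals)) for key, vals in self.groups.items()}
```

With this shape, a call that forwards `axis=None, dtype=None, out=None` passes through untouched, while something like `mean(axis=1)` fails with an explicit message instead of being silently ignored.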

@jreback
Contributor

jreback commented May 7, 2016

In [1]: df = DataFrame({'A' : [1,2,1], 'B' : [1,2,3]})

In [2]: g = df.groupby('A')

In [3]: g.mean()
Out[3]: 
   B
A   
1  2
2  2

In [4]: np.mean(g)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-fdcd38489fcb> in <module>()
----> 1 np.mean(g)

/Users/jreback/miniconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in mean(a, axis, dtype, out, keepdims)
   2878         try:
   2879             mean = a.mean
-> 2880             return mean(axis=axis, dtype=dtype, out=out)
   2881         except AttributeError:
   2882             pass

TypeError: mean() got an unexpected keyword argument 'axis'
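The failure above can be reproduced without pandas or numpy. The sketch below assumes a simplified stand-in, `numpy_style_mean`, for the dispatch visible in the traceback (look up `a.mean`, then call it with `axis`, `dtype`, and `out`); the two toy classes are hypothetical and only differ in their `mean` signature.

```python
def numpy_style_mean(a, axis=None, dtype=None, out=None):
    """Simplified stand-in for the dispatch in the traceback, not numpy's code."""
    try:
        mean = a.mean
    except AttributeError:
        raise TypeError("this toy example only handles objects with a .mean method")
    # The keywords are always forwarded, even when left at their defaults.
    return mean(axis=axis, dtype=dtype, out=out)


class StrictGroupBy(object):
    """Mimics a method with the signature mean(self)."""

    def mean(self):
        return "grouped mean"


class LenientGroupBy(object):
    """Mimics a method that tolerates the forwarded numpy keywords."""

    def mean(self, *args, **kwargs):
        return "grouped mean"


try:
    numpy_style_mean(StrictGroupBy())
except TypeError as exc:
    print("strict signature fails:", exc)

print("lenient signature works:", numpy_style_mean(LenientGroupBy()))
```

The point of the sketch is that the error comes from the forwarded keywords rather than from the data: any method whose signature cannot absorb `axis`, `dtype`, and `out` will fail the same way.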

@jreback
Contributor

jreback commented May 7, 2016

ok that doesn't seem unreasonable

@jorisvandenbossche
Member

Do we actually want something like np.mean(g) to work?
A groupby object is not an array-like such as a Series. IMO we shouldn't put effort into enabling such usage.

@gfyoung
Copy link
Member Author

gfyoung commented May 12, 2016

@jorisvandenbossche : I'll leave that for you to debate with @jreback . To reiterate, I am perfectly fine either way. This is not as serious a compatibility issue as the previous one I raised in #12644.

gfyoung added a commit to forking-repos/pandas that referenced this issue May 19, 2016
Expands compatibility with fromnumeric.py in tslib.pyx and
puts checks in window.py, groupby.py, and resample.py to
ensure that pandas functions such as 'mean' are not called
via the numpy library.

Closes gh-12811.
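
One way to read the "checks" described in this commit message is a guard that refuses any arguments a numpy wrapper would forward and points the caller back to the method itself. The sketch below works under that assumption; `UnsupportedNumpyCall`, `reject_numpy_call`, and `CheckedGroupBy` are hypothetical names, not the identifiers used in the commit.

```python
class UnsupportedNumpyCall(TypeError):
    """Hypothetical error type used only in this sketch."""


def reject_numpy_call(fname, args, kwargs):
    """Raise if the method received arguments that a numpy wrapper would pass."""
    if args or kwargs:
        raise UnsupportedNumpyCall(
            "numpy operations are not valid here; "
            "call .{}() on the object directly instead".format(fname)
        )


class CheckedGroupBy(object):
    """Stand-in for a groupby-like object with a guarded mean()."""

    def mean(self, *args, **kwargs):
        # Accept the extra arguments syntactically, then reject them explicitly
        # so the caller sees a targeted message rather than a signature error.
        reject_numpy_call("mean", args, kwargs)
        return "grouped mean"
```

Compared with a bare `mean(self)` signature, the guard changes only the error message: the call through numpy still fails, but the failure explains what to do instead.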