-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Add the moment function as DataFrame and Series method WITH namespacing #10702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ok, this looks reasonable.
|
While in principle I agree that it would be consistent to have these on the objects, I am just not sure if I find it a good idea to add yet another 37 methods at once to DataFrame/Series. |
I share @jorisvandenbossche's concern. We could do it in a |
ok, these are good points. Not real strong on this, so tabling for now, @TomAugspurger of namespacing makes more sense. |
I like the idea of adding a namespace to avoid adding 37 names into DataFrame. IMHO, there are two clean ways of doing this:
I prefer the solution 2 since it would also make the moments library cleaner. I am not very comfortable with pandas' code yet. Could you tell me whether one of those ideas could be acceptable or if another solution could be preferable? |
I gathered the rolling and expanding methods into separate namespaces and added them to DataFrame and Series This way, the dataframes methods are available via: DataFrame.rolling., Considering the ewm* function, only 8 objects are added to the DataFrames. I did |
Hmm, the last implementation just does not work because df.rolling.mean provides the rolling object to the rolling_mean method as self argument instead of the dataframe. In order to use the rolling_ and expanding_ methods as is but still aggregate them into separate namespaces, I don't see an easy solution. https://github.com/dalejung/trtools/blob/master/trtools/monkey/__init__.py provides a way to do so, but it looks a bit complicated to do a so simple stuff. |
@Konubinix We have some machinery for this (which is also used by dt/cat/str). Look for But maybe wait a moment to first let some other people react and settle the discussion on which we actually want (as methods, as namespace methods or leave as is) before you put too much work in it. |
For |
I like the idea of namespace accessors for rolling and expanding aggregations. It has always seemed a little strange to me how these are functions instead of methods (like most things in pandas). |
I took into account your remarks and comments. |
# Add all the methods to DataFrame and Series | ||
import sys | ||
thismodule = sys.modules[__name__] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are creating lots of machinery that can simply use PandasDelegate
, see pandas/tseries.common.py
for an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback, thank you for the PandasDelegate pointer. I checked into that possibility. If I understood correctly, it consists in subclassing PandasDelegate with a I my situation, since I don't propose to copy some methods from one delegate For instance, to allow to use df.rolling.mean, I would need to create an Then, I would have to call RollingDelegator._add_delegate_accessors(delegate=RollingMethods, accessors=[...the list of rolling methods...], typ="method"). Finally, I would have to add to do the same thing as in the previous patch: I would also need to do that for expanding and ewm. I think I can do this, but it results in much more code that is not more I may have understood something wrong. If so, could you please give me some hint on how to use PandasDelegate? Thanks again for your help. |
I pushed a version making use of the PandasDelegate mechanism. I am not sure I use it correctly. |
This makes the usage of those methods more object oriented. In order not to pollute the DataFrame and Series namespaces, the rolling and expanding methods were gathered into separate namespaces, using as far as possible the dedicated PandasDelegate mechanism. Also, delete some trailing whitespaces.
Just to point out another possible API here -- Instead of writing Conceptually, this makes some sense -- rolling window operations are very similar to grouped operations with overlapping groups. One possible downside is that methods off an accessor property are more discoverable -- IPython can do direct auto-complete. |
@shoyer +1 |
@shoyer, I like the idea. Nevertheless, it looks way more ambitious than this The purpose of this pull request is to make the functions in the moments You are now suggesting to change the API. I agree with this idea, but I think it |
"Can only use .{} accessor with floats or int values".format(name) | ||
) | ||
if isinstance(self, DataFrame) \ | ||
and not True in [ # check whether there is one dtype float or integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you not any(.....)
more idiomatic
this need a test along the lines of this: https://github.com/pydata/pandas/blob/master/pandas/tests/test_series.py#L79 IOW you are testing that the I think your impl is fine. We can discuss doing a |
@Konubinix @jreback I think this change itself is a great idea, but I'm raising the issue of the alternate API for a few reasons --
|
closing in favor of #11603 |
API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702
This makes the usage of those methods more object oriented
Also, delete some trailing whitespaces.