ENH: Add the moment function as DataFrame and Series method WITH namespacing #10702

Konubinix · 2015-07-30T10:47:07Z

This makes the usage of those methods more object oriented

Also, delete some trailing whitespaces.

jreback · 2015-07-30T15:03:29Z

ok, this looks reasonable.

I would like to have some smoke tests along the lines of this.
pls add to the API.rst as well (maybe have to do that somewhat manually).
Further I think that computation.rst could be modified to use this syntax.
finally would need a release note showing these changes

jorisvandenbossche · 2015-07-30T16:09:50Z

While in principle I agree that it would be consistent to have these on the objects, I am just not sure if I find it a good idea to add yet another 37 methods at once to DataFrame/Series.

TomAugspurger · 2015-07-30T16:11:07Z

I share @jorisvandenbossche's concern. We could do it in a .rolling and .expanding namespace. df.rolling.mean(periods=2), but I don't have a strong preference for that over where they are now.

jreback · 2015-07-30T16:28:20Z

ok, these are good points. Not real strong on this, so tabling for now, @TomAugspurger of namespacing makes more sense.

Konubinix · 2015-07-31T06:25:40Z

I like the idea of adding a namespace to avoid adding 37 names into DataFrame.

IMHO, there are two clean ways of doing this:

create on the fly the rolling and expanding namespaces in DataFrame, something close to what is done here https://github.com/dalejung/trtools/blob/master/trtools/core/rolling.py (though I must admit I did not take the time to understand in detail this solution)
change the moments library to gather all the rolling methods in a rolling namespace and all the expanding methods in an expanding namespace

I prefer the solution 2 since it would also make the moments library cleaner.

I am not very comfortable with pandas' code yet. Could you tell me whether one of those ideas could be acceptable or if another solution could be preferable?

Konubinix · 2015-07-31T06:52:00Z

I gathered the rolling and expanding methods into separate namespaces and added them to DataFrame and Series

This way, the dataframes methods are available via: DataFrame.rolling.,
DataFrame.expanding., DataFrame.ewma. Same for Series.

Considering the ewm* function, only 8 objects are added to the DataFrames. I did
not not whether it was a good idea to put the ewm* method in a separate
namespace.

Konubinix · 2015-07-31T07:33:05Z

Hmm, the last implementation just does not work because df.rolling.mean provides the rolling object to the rolling_mean method as self argument instead of the dataframe.

In order to use the rolling_ and expanding_ methods as is but still aggregate them into separate namespaces, I don't see an easy solution.

https://github.com/dalejung/trtools/blob/master/trtools/monkey/__init__.py provides a way to do so, but it looks a bit complicated to do a so simple stuff.

jorisvandenbossche · 2015-07-31T10:00:36Z

@Konubinix We have some machinery for this (which is also used by dt/cat/str). Look for AccessorProperty in core/base.py.

But maybe wait a moment to first let some other people react and settle the discussion on which we actually want (as methods, as namespace methods or leave as is) before you put too much work in it.

jreback · 2015-07-31T11:10:45Z

.rolling, .expaning, .ewma make name spaces for Series make sense, though these only should show up for int/float dtypes (similar to how .str/.cat/.dt work).

For DataFrame this is much tricker. You only really would want them to be visible for a single dtype frame (of int/float), otherwise they don't make sense.

shoyer · 2015-08-02T05:51:38Z

I like the idea of namespace accessors for rolling and expanding aggregations. It has always seemed a little strange to me how these are functions instead of methods (like most things in pandas).

Konubinix · 2015-08-04T14:13:13Z

I took into account your remarks and comments.
Now the methods are gathered into separate namespaces (ewm, rolling and expanding) and a check is made whether a Serie is of dtype int of float and a DataFrame contains at least one dtype int or float.

jreback · 2015-08-04T23:13:37Z

pandas/stats/moments.py

+# Add all the methods to DataFrame and Series
+import sys
+thismodule = sys.modules[__name__]
+


you are creating lots of machinery that can simply use PandasDelegate, see pandas/tseries.common.py for an example

Thank you for the comment. I will take a look at PandasDelegate.

Konubinix · 2015-08-10T14:31:06Z

@jreback, thank you for the PandasDelegate pointer.

I checked into that possibility.

If I understood correctly, it consists in subclassing PandasDelegate with a
class that performs the computation for each delegated name. Then we have to
call the method class _add_delegate_accessors to indicate from which class to
copy the methods signatures and docstring.

I my situation, since I don't propose to copy some methods from one delegate
class, it looks not really well suited.

For instance, to allow to use df.rolling.mean, I would need to create an
intermediate class RollingMethods with the method mean (avg, count, ...) that
would wrap the rolling_mean (rolling_avg, rolling_count, ...) methods, then
create a delegator class RollingDelegator that would implement the abstract
method _delegate_method inherited from PandasDelegate.

Then, I would have to call RollingDelegator._add_delegate_accessors(delegate=RollingMethods, accessors=[...the list of rolling methods...], typ="method").

Finally, I would have to add to do the same thing as in the previous patch:
create an accessor for the rolling namespace that would check the dtypes and
return an instanciated RollingDelegator.

I would also need to do that for expanding and ewm.

I think I can do this, but it results in much more code that is not more
readable.

I may have understood something wrong. If so, could you please give me some hint on how to use PandasDelegate?

Thanks again for your help.

Konubinix · 2015-08-10T15:08:07Z

I pushed a version making use of the PandasDelegate mechanism. I am not sure I use it correctly.

This makes the usage of those methods more object oriented. In order not to pollute the DataFrame and Series namespaces, the rolling and expanding methods were gathered into separate namespaces, using as far as possible the dedicated PandasDelegate mechanism. Also, delete some trailing whitespaces.

shoyer · 2015-08-17T00:34:14Z

Just to point out another possible API here --

Instead of writing df.rolling.mean(periods=2), we might move the periods argument earlier: df.rolling(periods=2).mean(). df.rolling(periods=2) could return a Rolling object with a groupby like API (e.g., supporting iteration and automatic method dispatch).

Conceptually, this makes some sense -- rolling window operations are very similar to grouped operations with overlapping groups. One possible downside is that methods off an accessor property are more discoverable -- IPython can do direct auto-complete.

max-sixty · 2015-08-17T00:49:07Z

@shoyer +1

Konubinix · 2015-08-17T07:17:11Z

@shoyer, I like the idea. Nevertheless, it looks way more ambitious than this
pull request.

The purpose of this pull request is to make the functions in the moments
namespace available in the dataframe and series namespaces. It was based on the
observation that all those functions take a dataframe or serie as first argument
and hence are particularly designed to work like instance methods. It then does
no more that inspecting the moments namespace and make its methods accessible
from dataframe and series instances.

You are now suggesting to change the API. I agree with this idea, but I think it
should be the purpose of another issue that could be entitled "Refactor the
moments namespace".

jreback · 2015-08-17T11:23:45Z

pandas/stats/moments.py

+            "Can only use .{} accessor with floats or int values".format(name)
+        )
+    if isinstance(self, DataFrame) \
+       and not True in [ # check whether there is one dtype float or integer


you not any(.....) more idiomatic

jreback · 2015-08-17T11:28:52Z

this need a test along the lines of this: https://github.com/pydata/pandas/blob/master/pandas/tests/test_series.py#L79

IOW you are testing that the __dir__ is correct (and that they execute the same as the pd.* functions.

I think your impl is fine. We can discuss doing a Rolling in another issue. Its a nice idea but is a pretty big expansion.

shoyer · 2015-08-17T16:44:23Z

@Konubinix @jreback I think this change itself is a great idea, but I'm raising the issue of the alternate API for a few reasons --

it's generally better to hold off API changes if they would cause awkward compatibility issues with a different, superior API that we might want to use in the future (that isn't really the case here since .rolling as an accessor and as a method can co-exist).
from a marketing perspective, it's awkward to make APIs that may only live for a few versions
it might not actually be that hard to make a basic Rolling object. I would certainly rather have a Rolling implementation missing some features we'd like for the final version (e.g., with only support for aggregation for now) than a new API that we might immediately deprecate.

jreback · 2015-11-15T16:29:49Z

closing in favor of #11603

…calculations, xref pandas-dev#10702

API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702

jreback added Docs API Design Numeric Operations Arithmetic, Comparison, and Logical operations labels Jul 30, 2015

jreback added this to the 0.17.0 milestone Jul 30, 2015

jreback modified the milestones: Next Major Release, 0.17.0 Jul 30, 2015

jreback changed the title ~~ENH: Add the moment function as DataFrame and Series method~~ ENH: Add the moment function as DataFrame and Series method WITH namespacing Jul 30, 2015

jreback added the Needs Discussion Requires discussion from core team before further action label Jul 30, 2015

jreback reviewed Aug 4, 2015
View reviewed changes

jreback reviewed Aug 17, 2015
View reviewed changes

jreback modified the milestones: 0.17.1, Next Major Release Nov 14, 2015

jreback modified the milestones: 0.18.0, 0.17.1 Nov 15, 2015

jreback closed this Nov 15, 2015

jreback added a commit to jreback/pandas that referenced this pull request Dec 19, 2015

API: provide Rolling/Expanding/EWM objects for deferred rolling type …

3c23dc9

…calculations, xref pandas-dev#10702

jreback added a commit that referenced this pull request Dec 19, 2015

Merge pull request #11603 from jreback/rolling

2a1d9f2

API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702

Uh oh!

ENH: Add the moment function as DataFrame and Series method WITH namespacing #10702

ENH: Add the moment function as DataFrame and Series method WITH namespacing #10702

Uh oh!

Conversation

Konubinix commented Jul 30, 2015

Uh oh!

jreback commented Jul 30, 2015

Uh oh!

jorisvandenbossche commented Jul 30, 2015

Uh oh!

TomAugspurger commented Jul 30, 2015

Uh oh!

jreback commented Jul 30, 2015

Uh oh!

Konubinix commented Jul 31, 2015

Uh oh!

Konubinix commented Jul 31, 2015

Uh oh!

Konubinix commented Jul 31, 2015

Uh oh!

jorisvandenbossche commented Jul 31, 2015

Uh oh!

jreback commented Jul 31, 2015

Uh oh!

shoyer commented Aug 2, 2015

Uh oh!

Konubinix commented Aug 4, 2015

Uh oh!

jreback Aug 4, 2015

Choose a reason for hiding this comment

Uh oh!

Konubinix Aug 5, 2015 via email

Choose a reason for hiding this comment

Uh oh!

Konubinix commented Aug 10, 2015

Uh oh!

Konubinix commented Aug 10, 2015

Uh oh!

shoyer commented Aug 17, 2015

Uh oh!

max-sixty commented Aug 17, 2015

Uh oh!

Konubinix commented Aug 17, 2015

Uh oh!

jreback Aug 17, 2015

Choose a reason for hiding this comment

Uh oh!

jreback commented Aug 17, 2015

Uh oh!

shoyer commented Aug 17, 2015

Uh oh!

jreback commented Nov 15, 2015

Uh oh!

Uh oh!