WIP/ENH: Weightby #15031

Closed
wants to merge 2 commits into from

jreback
Contributor

@jreback jreback commented Jan 1, 2017

closes #10030

this is on top of #14483

Provides a groupby-like API for weighted calculations. The weights are calculated lazily and cached.
Deprecates the weights parameter of .sample() and implements all of that logic inside .weightby.

TODO:

  • only sum/mean are implemented ATM, but the logic for other ops (std, var, kurt, skew) is straightforward.
  • no logic ATM for .groupby(...).weightby(...) or .groupby(...).rolling(...), but this should be a straightforward enhancement.
  • easy enhancement to add 'other' weight calculations at some point (just adding args to the .weightby constructor), see here
In [1]: df = DataFrame({'A': [1, 2, 3, 4],
   ...:                 'B': [1, 2, 3, 4]})
   ...: 

In [2]: df
Out[2]: 
   A  B
0  1  1
1  2  2
2  3  3
3  4  4

In [3]: df.weightby('A').B.sum()
Out[3]: 3.0

In [4]: df.weightby('A').sum()
Out[4]: 
B    3.0
dtype: float64

In [5]: df.weightby('A').sample(n=2)
Out[5]: 
   B
3  4
1  2

In [7]: w = df.weightby('A')

In [8]: w.mean()
Out[8]: 
B    0.75
dtype: float64

In [9]: w._weights
Out[9]: array([ 0.1,  0.2,  0.3,  0.4])
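
For readers without this branch, the numbers in the session above can be reproduced with released pandas and NumPy. This is a minimal sketch, not the PR's implementation: it assumes the weights are the weight column divided by its sum (which matches `w._weights`) and that `.mean()` divides the weighted sum by the row count (which is what `0.75` is consistent with). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [1, 2, 3, 4]})

# Assumption: weights are the 'A' column normalized to sum to 1,
# which reproduces array([0.1, 0.2, 0.3, 0.4]) from the session.
weights = (df["A"] / df["A"].sum()).to_numpy()

# Weighted sum of the remaining column: sum(w_i * B_i).
weighted_sum = (df["B"] * weights).sum()

# Assumption: the 0.75 shown for .mean() equals the weighted sum
# divided by the number of rows.
pr_style_mean = weighted_sum / len(df)

# For comparison, the conventional weighted average sum(w*x) / sum(w).
conventional_mean = np.average(df["B"], weights=df["A"])

# Weighted sampling through the existing .sample(weights=...) parameter.
sampled = df[["B"]].sample(n=2, weights=df["A"], random_state=0)
```

Note that the conventional weighted average (`np.average`) gives a different value from the `.mean()` output shown in the session, so the intended definition of a weighted mean is worth pinning down before other ops are added.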

function application that mimics the groupby(..).agg/.aggregate
interface

.apply is now a synonym for .agg, and will accept dict/list-likes
for aggregations

CLN: rename .name attr -> ._selection_name from SeriesGroupby for compat (didn't exist on DataFrameGroupBy)
resolves conflicts w.r.t. setting .name on a groupby object

closes pandas-dev#1623
closes pandas-dev#14464

custom .describe
closes pandas-dev#14483
closes pandas-dev#15015
closes pandas-dev#7014
@jreback jreback added Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations labels Jan 1, 2017
@jreback
Contributor Author

jreback commented Jan 1, 2017

cc @josef-pkt

@codecov-io

codecov-io commented Jan 1, 2017

Current coverage is 84.82% (diff: 94.75%)

Merging #15031 into master will increase coverage by 0.05%

@@             master     #15031   diff @@
==========================================
  Files           145        146     +1   
  Lines         51131      51343   +212   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43346      43552   +206   
- Misses         7785       7791     +6   
  Partials          0          0          

Powered by Codecov. Last update 6141754...d843a4e

@jreback
Contributor Author

jreback commented Jan 2, 2017

This is quite straightforward to change to an API like the following (in fact the actual implementation already works like this, but I pass in a hidden kwarg, _weights).

df.sum(...., weights=...)
df.groupby(...).sum(...., weights=...)
df.sample(...., weights=...) (would stay the same)

I would rip out the weight validation code and put it elsewhere (as a function).
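
A rough sketch of what a standalone validation function plus a `weights=` style reduction could look like. `validate_weights` and `weighted_sum` are hypothetical names, not functions in pandas or in this PR; the validation rules are assumptions modeled on the documented behavior of `.sample(weights=...)` (no infinite or negative values, NaN treated as zero, normalized to sum to 1).

```python
import numpy as np
import pandas as pd


def validate_weights(obj, weights):
    """Hypothetical helper: return normalized weights aligned to obj.index."""
    if isinstance(weights, str):
        # A string is taken to be a column name in obj.
        w = obj[weights].astype("float64")
    else:
        w = pd.Series(np.asarray(weights, dtype="float64"), index=obj.index)
    if np.isinf(w).any():
        raise ValueError("weights may not contain inf values")
    if (w < 0).any():
        raise ValueError("weights may not contain negative values")
    w = w.fillna(0.0)  # missing weights count as zero
    total = w.sum()
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return w / total


def weighted_sum(df, weights):
    """Hypothetical stand-in for df.sum(weights=...)."""
    w = validate_weights(df, weights)
    # Drop the weight column when it was given by name, as .weightby does above.
    values = df.drop(columns=[weights]) if isinstance(weights, str) else df
    return values.mul(w, axis=0).sum()


df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [1, 2, 3, 4]})
result = weighted_sum(df, "A")
```

Keeping validation in one function means `.sum`, `.groupby(...).sum` and `.sample` could share it, whichever public API is chosen.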

The only downside is that it would then be awkward to specify different kinds of weights (like the link in the top section), because we would need to add other kwargs like iweights, aweights, etc., which I think is not nice.

@jreback jreback mentioned this pull request Jan 2, 2017
@mattayes
Contributor

mattayes commented Jan 2, 2017

Could you alleviate the different weights kwargs by having one weight_type kwarg?
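
To make the suggestion concrete, here is a small sketch of a single `weight_type` kwarg that dispatches to different weight semantics. `weighted_var` and `weight_type` are hypothetical names for illustration; the two kinds shown are the ones NumPy's `np.cov` already exposes as separate `fweights` (integer frequency weights) and `aweights` (observation weights) parameters, which is the kind of kwarg proliferation discussed above.

```python
import numpy as np


def weighted_var(values, weights, weight_type="aweights"):
    """Hypothetical sketch: one kwarg selects how weights are interpreted."""
    values = np.asarray(values, dtype="float64")
    if weight_type == "fweights":
        # Frequency weights: each value is treated as repeated `weight` times.
        return float(np.cov(values, fweights=np.asarray(weights)))
    if weight_type == "aweights":
        # Observation weights: only relative magnitudes matter.
        return float(np.cov(values, aweights=np.asarray(weights, dtype="float64")))
    raise ValueError(f"unknown weight_type: {weight_type!r}")


x = np.array([1, 2, 3, 4])
w = np.array([1, 2, 3, 4])

var_freq = weighted_var(x, w, weight_type="fweights")
var_obs = weighted_var(x, w, weight_type="aweights")
```

The same weights give different variances under the two interpretations, so the kind of weight has to be specified somewhere, whether as one `weight_type` kwarg or as separate kwargs.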

@jreback
Contributor Author

jreback commented Jan 2, 2017

@mattayes yes that might be one way of doing it.

@jreback
Contributor Author

jreback commented Jan 2, 2017

closing in favor of #15039

@jreback jreback closed this Jan 2, 2017