Empty product not equal to 1 #7889

anderslundstedt · 2014-07-31T16:42:43Z

In [1]: import pandas as pd
In [2]: pd.Series().product()
Out[2]: nan

I expected:

Out[2]: 1

http://en.wikipedia.org/wiki/Empty_product
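The convention follows from requiring a product to split over concatenation: for any sequences a and b, prod(a + b) == prod(a) * prod(b). Taking a to be empty forces prod([]) to be the multiplicative identity, 1. The following is a minimal pure-Python sketch of that identity using the standard library's math.prod (Python 3.8+), not pandas; product_splits is a hypothetical helper name:

```python
import math

def product_splits(a, b):
    """Check that the product of a concatenation equals the product of the parts."""
    return math.prod(a + b) == math.prod(a) * math.prod(b)

# math.prod returns its start value (1 by default) for an empty iterable.
print(math.prod([]))                # 1
print(product_splits([], [2, 3]))   # True
print(product_splits([4], [2, 3]))  # True
```

If prod([]) were NaN instead, the first check would fail, because NaN * 6 is NaN, not 6.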


jreback · 2014-07-31T16:47:33Z

hmm, that is different from numpy....

want to do a pull-request?

ischwabacher · 2014-08-01T19:47:18Z

xref #7869

If you want to fix this, the problem is in the bottleneck_switch decorator in nanops.py; otherwise, you can wait for me to get to it. That decorator's constructor takes a zero_value argument, meant to be the value an operation returns when the array it operates on is empty, but then ignores it and always uses 0. We should rename the parameter to empty_value.
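To make the pattern concrete, here is a minimal sketch assuming a simplified decorator. The names with_empty_value, nansum_sketch and nanprod_sketch are hypothetical; this is not pandas' bottleneck_switch or its nanops functions. The point is that each reduction supplies its own identity element and the wrapper actually uses it:

```python
import functools
import math

def with_empty_value(empty_value):
    """Hypothetical decorator factory: return `empty_value` for an empty input
    instead of calling the wrapped reduction."""
    def decorate(reduction):
        @functools.wraps(reduction)
        def wrapper(values):
            if len(values) == 0:
                # The behaviour described above returned 0 here regardless of the argument.
                return empty_value
            return reduction(values)
        return wrapper
    return decorate

@with_empty_value(0)
def nansum_sketch(values):
    # Sum of the non-NaN entries.
    return math.fsum(v for v in values if not math.isnan(v))

@with_empty_value(1)
def nanprod_sketch(values):
    # Product of the non-NaN entries.
    return math.prod(v for v in values if not math.isnan(v))

print(nansum_sketch([]))                     # 0
print(nanprod_sketch([]))                    # 1
print(nanprod_sketch([2.0, math.nan, 3.0]))  # 6.0
```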

anderslundstedt · 2014-08-04T12:18:51Z

I do not feel comfortable fixing this myself, so I will wait. Thanks for the prompt responses!

anderslundstedt · 2014-08-04T12:19:58Z

Oops, I accidentally closed this. It should be open again now, I hope. (I am not very experienced with GitHub.)

cpcloud · 2014-08-04T13:06:46Z

I can take this.

cpcloud · 2014-08-04T18:00:58Z

@anderslundstedt @jreback Fixing this to be consistent with numpy breaks tested behavior of TimeGrouper aggregations. See tseries/tests/test_resample.py:TestTimeGrouper.test_aggregate_with_nat for the test that breaks. Is this an API that should be broken?

jreback · 2014-08-04T18:08:59Z

I would just fix that test ('prod') as its 'wrong' too

cpcloud · 2014-08-04T18:09:24Z

i did that in the pr

cpcloud · 2014-08-04T18:09:33Z

whoops forgot to xref this issue

jorisvandenbossche · 2014-08-04T20:44:08Z

@cpcloud Can you give an example of a resample that would break?

Eg, would the NaN in the following example become 1?

In [3]: s1 = pd.Series(np.arange(5), index=pd.date_range('2012-01-01', periods=5))
In [4]: s2 = pd.Series(np.arange(5), index=pd.date_range('2012-01-10', periods=5))
In [5]: s = pd.concat([s1, s2])

In [10]: s.resample('2D', how='prod')
Out[10]:
2012-01-01     0
2012-01-03     6
2012-01-05     4
2012-01-07   NaN
2012-01-09     0
2012-01-11     2
2012-01-13    12
Freq: 2D, dtype: float64

ischwabacher · 2014-08-04T21:02:56Z

Do we care that type(pd.Series([], dtype='f64').product()) is still int?

cpcloud · 2014-08-04T21:36:20Z

@jorisvandenbossche no, if the group key is nan (or nat) then the value at that key will be 1 instead of nan or nat. Using your example s:

In [44]: s
Out[44]:
2012-01-01    0
2012-01-02    1
2012-01-03    2
2012-01-04    3
2012-01-05    4
2012-01-10    0
2012-01-11    1
2012-01-12    2
2012-01-13    3
2012-01-14    4
dtype: int64

In [45]: df = s.reset_index(name='col').rename(columns={'index': 'date'})

In [46]: df.loc[2, 'date'] = pd.NaT

In [47]: df
Out[47]:
        date  col
0 2012-01-01    0
1 2012-01-02    1
2        NaT    2
3 2012-01-04    3
4 2012-01-05    4
5 2012-01-10    0
6 2012-01-11    1
7 2012-01-12    2
8 2012-01-13    3
9 2012-01-14    4

In [48]: df.groupby(pd.TimeGrouper(key='date', freq='D')).prod()
Out[48]:
            col
date
2012-01-01    0
2012-01-02    1
2012-01-03    1
2012-01-04    3
2012-01-05    4
...         ...
2012-01-10    0
2012-01-11    1
2012-01-12    2
2012-01-13    3
2012-01-14    4

[14 rows x 1 columns]

jorisvandenbossche · 2014-08-05T07:46:17Z

@cpcloud Hmm, I don't fully get that. What is the difference? When the third row is set to NaT in your example, this then actually means that the 2012-01-03 row is removed, and so is missing, just like the others (2012-01-06 to 2012-01-09). So if that date gives 1 in the result, the others should also?

With 0.14.1, your example gives:

In [8]: df.groupby(pd.TimeGrouper(key='date', freq='D')).prod()
Out[8]:
            col
date
2012-01-01    0
2012-01-02    1
2012-01-03  NaN
2012-01-04    3
2012-01-05    4
2012-01-06  NaN
2012-01-07  NaN
2012-01-08  NaN
2012-01-09  NaN
2012-01-10    0
2012-01-11    1
2012-01-12    2
2012-01-13    3
2012-01-14    4

So the NaN in the third row becomes 1 according to your example, but not the other NaN values? That seems inconsistent to me. But if they all become 1, that also seems like a rather large API break, no?

cpcloud · 2014-08-05T12:55:21Z

Those will be 1 as well; I just have my display max rows set to 10. I'll post the full example in a bit.

cpcloud · 2014-08-05T13:01:14Z

Yes, it's a breaking change. I'm not sure it's the way to go, as I'm not sure this brings any benefits other than consistency with numpy and Wikipedia. OTOH, I would've expected more breakage from changing the zero value default in nanops.

jorisvandenbossche · 2014-08-05T13:10:24Z

In [23]: np.array([]).prod()
Out[23]: 1.0

In [24]: np.array([np.nan]).prod()
Out[24]: nan

So would it in some way be possible to regard this example above as a product of NaN (which stays NaN) instead of an empty product?
(warning, very naive view without looking at the actual code is following: that you follow a reasoning like first 'resample' and add the missing dates with NaN values, and only then groupby to reduce with the product operator)
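The two readings of a day with no rows can be sketched without touching pandas internals. daily_prod and its fill_missing_with_nan flag are hypothetical and only model the distinction discussed here, not what resample or TimeGrouper actually do; the values mirror the NaT example above, where 2012-01-03 has no rows:

```python
import datetime as dt
import math

def daily_prod(obs, start, end, fill_missing_with_nan):
    """Hypothetical daily 'prod' resample over the days start..end inclusive.

    obs maps datetime.date -> list of values observed on that day.
    A day with no observations is either an empty product (result 1) or,
    if fill_missing_with_nan is True, first filled with a single NaN that
    the reduction does not skip, so the result stays NaN.
    """
    out = {}
    day = start
    while day <= end:
        values = obs.get(day, [])
        if not values and fill_missing_with_nan:
            values = [math.nan]
        out[day] = math.prod(values)
        day += dt.timedelta(days=1)
    return out

obs = {
    dt.date(2012, 1, 1): [0],
    dt.date(2012, 1, 2): [1],
    dt.date(2012, 1, 4): [3],  # 2012-01-03 has no rows
}
as_empty = daily_prod(obs, dt.date(2012, 1, 1), dt.date(2012, 1, 4), False)
as_nan = daily_prod(obs, dt.date(2012, 1, 1), dt.date(2012, 1, 4), True)
print(as_empty[dt.date(2012, 1, 3)])  # 1
print(as_nan[dt.date(2012, 1, 3)])    # nan
```

Under the second reading, making the empty product 1 does not change resample output, because no bin is ever empty by the time the product is taken.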

anderslundstedt · 2014-08-05T13:11:47Z

@cpcloud It is more than "consistency with Wikipedia", it is a matter of consistency with any mathematical identity involving a possibly empty product.

cpcloud · 2014-08-05T13:15:11Z

It was a tongue-in-cheek comment. I don't like introducing "valid" values where previously there were nans. This seems like it would break a ton of existing code. @anderslundstedt what operations are you doing that allowed you to find this inconsistency?

anderslundstedt · 2014-08-05T13:22:54Z

Well, I want to take the product of a possibly empty series. The context is that I want to compute historical stock prices adjusted for dividends etc.; when there has been no dividend after the date of the price, this involves an empty product.
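A hedged sketch of why such an adjustment hits the empty product. The poster did not share their formula, so this assumes one common convention: each dividend paid after the price date scales the price by (1 - amount / prior_close). adjusted_price, the tuple layout and the numbers below are hypothetical:

```python
import math

def adjusted_price(price, t, dividends):
    """Back-adjust `price` observed at time `t` for dividends paid after t.

    Assumed convention: each later dividend contributes a factor
    (1 - amount / prior_close). `dividends` is a list of
    (time, amount, prior_close) tuples.
    """
    factors = [1 - amount / prior_close
               for when, amount, prior_close in dividends
               if when > t]
    # With no dividends after t, `factors` is empty and math.prod returns 1,
    # so the price is left unchanged: exactly the empty-product case.
    return price * math.prod(factors)

dividends = [(5, 1.0, 50.0), (10, 2.0, 40.0)]  # hypothetical (time, amount, prior_close)
print(adjusted_price(100.0, 12, dividends))  # 100.0
print(adjusted_price(100.0, 7, dividends))   # only the dividend at time 10 applies
```

If the empty product returned NaN instead of 1, every price after the last dividend would be adjusted to NaN.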

cpcloud · 2014-08-05T13:23:48Z

Can you show some code and data in an example? Thanks.

anderslundstedt · 2014-08-05T13:28:20Z

I do not see the point and I am not able to share the code I work with. My initial post basically sums up the problem.

It is not difficult to work around, so it is not a problem (for me) if you do not change the behaviour. I just thought it would be nice to have empty products equal to one, but I understand that you may not want to break existing code.

jorisvandenbossche · 2014-08-05T13:28:49Z

What I wanted to say with my comment above: whether missing dates end up as 1 or NaN in a resample with product is more an "implementation detail" of resample than an insurmountable consequence of the mathematical identity of an empty product, since you can also view the missing dates in resample as a product of NaN, which is NaN.

So it would maybe be possible to change the result of Series([]).product() to 1 while keeping the same result in resample/TimeGrouper (giving NaNs)? (again, without knowing the code)

Note: in R, an empty product also gives 1 by definition.

cpcloud · 2014-08-05T13:30:55Z

That's easy to do: we can just special-case the empty product and leave the resample code as is.

anderslundstedt · 2014-08-05T13:33:59Z

Yes, I meant empty products of series that are empty in the strongest sense (neither values nor missing values). I have no opinion how one best would treat products of a series with only missing values.
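The distinction between "empty in the strongest sense" and "only missing values" can be written down as a small rule. prod_strict is a hypothetical helper, and returning NaN for an all-missing input is only one of the options left open here:

```python
import math

def prod_strict(values):
    """Hypothetical product that separates the two kinds of 'empty'.

    - no entries at all     -> 1 (the empty product)
    - entries, but all NaN  -> NaN (nothing observed to multiply)
    - otherwise             -> product of the non-NaN entries
    """
    if len(values) == 0:
        return 1
    present = [v for v in values if not math.isnan(v)]
    if not present:
        return math.nan
    return math.prod(present)

print(prod_strict([]))                     # 1
print(prod_strict([math.nan, math.nan]))   # nan
print(prod_strict([2.0, math.nan, 4.0]))   # 8.0
```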

jreback · 2014-08-05T13:36:11Z

as an aside, you can always do: s.fillna(1).product() to guarantee that you can do some sort of multiplication

anderslundstedt · 2014-08-05T13:40:59Z

@jreback I do not know if you meant that it would solve the original case, which it does not:

In [1]: import pandas as pd
In [2]: pd.Series().fillna(1).product()
Out[2]: nan

jreback · 2014-08-05T13:42:47Z

@anderslundstedt hmm, so the empty series is a special case, I guess. OK @cpcloud, why don't we just fix the scalar case? What does that do to the resample case? (Obviously the test case has to change.)

cpcloud · 2014-08-08T22:35:26Z

this is far less trivial than i thought, mostly bc arith methods are added after the class is defined so i can't override it properly. i guess i'll just have to monkey patch it. i have a big refactor of this arith stuff that defines them on ndframe abstractly so that a subclass can override them for specific cases if necessary
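A minimal sketch of the design problem, assuming simplified classes. add_reductions, PatchedSeries, FrameBase and SeriesLike are hypothetical names, not pandas' arithmetic machinery. A helper that attaches methods after the class is defined overwrites whatever the class body declared, whereas defining the reduction once on a base class lets a subclass override only the piece it needs:

```python
import math

# Design 1: a helper attaches reductions after the class body has run.
def add_reductions(cls):
    """Hypothetical helper mimicking methods added after class definition."""
    def prod(self):
        return math.prod(self.values) if self.values else math.nan
    cls.prod = prod  # overwrites any prod written in the class body
    return cls

@add_reductions
class PatchedSeries:
    def __init__(self, values):
        self.values = list(values)

    def prod(self):  # intended override for the empty case ...
        return math.prod(self.values)  # ... silently replaced by add_reductions

# Design 2: define the reduction once on a base class with an overridable hook.
class FrameBase:
    empty_prod = math.nan  # what an empty reduction returns by default

    def __init__(self, values):
        self.values = list(values)

    def prod(self):
        return math.prod(self.values) if self.values else self.empty_prod

class SeriesLike(FrameBase):
    empty_prod = 1  # the subclass changes only the empty-case value

print(PatchedSeries([]).prod())  # nan: the override was lost
print(SeriesLike([]).prod())     # 1
```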

jreback · 2014-09-09T23:39:20Z

@cpcloud status?

TomAugspurger · 2017-12-13T22:22:19Z

This is being handled in #18678

jreback added Bug labels Jul 31, 2014

jreback added this to the 0.15.0 milestone Jul 31, 2014

anderslundstedt closed this as completed Aug 4, 2014

anderslundstedt reopened this Aug 4, 2014

cpcloud self-assigned this Aug 4, 2014

cpcloud mentioned this issue Aug 4, 2014

BUG: define empty product on Series and DataFrame to be 1 #7928

Closed

jreback modified the milestones: 0.15.1, 0.15.0 Sep 14, 2014

jreback modified the milestones: Next Major Release, 0.16.0 Mar 2, 2015

TomAugspurger closed this as completed Dec 13, 2017

TomAugspurger modified the milestones: Next Major Release, No action Dec 13, 2017
