possible error in mean of dataframe with nan values? #763

CRP · 2012-02-08T11:42:26Z

I am not sure if I am doing something wrong, but this looks like a bug to me:

In [116]: ra.ix[:5,:5]
Out[116]:
                      005665 023740 028758            040828            045449
2011-05-09               NaN    NaN    NaN               NaN               NaN
2011-05-10 -0.00284467268624    0.0    0.0   -0.002844627571 0.000353248710642
2011-05-11 -0.00674436854779    0.0    0.0 -0.00674432257973               0.0
2011-05-12   0.0117105570627    0.0    0.0   0.0117104073878               0.0
2011-05-13  0.00367090573068    0.0    0.0 -0.00102855609881               0.0

In [117]: ra.ix[:5,:5].mean(axis=0,skipna=True)
Out[117]:
005665 0.0014481
023740 0.0000000
028758 0.0000000
040828 0.0002732
045449 8.831e-05

In [118]: ra.ix[:5,:5].mean(axis=1,skipna=True)
Out[118]:
2011-05-09 NaN
2011-05-10 NaN
2011-05-11 NaN
2011-05-12 NaN
2011-05-13 NaN

In [122]: ra.ix[1:5,:5].mean(axis=1,skipna=True)
Out[122]:
2011-05-10 -0.0010672
2011-05-11 -0.0026977
2011-05-12 0.0046842
2011-05-13 0.0005285

In step 118, I would expect the same output I get at step 122. Maybe the check of whether the values on a given row are all NaN is done on the wrong axis?
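For anyone trying to reproduce this, here is a minimal sketch of the expected behaviour. The frame is a made-up stand-in for `ra` (rounded values, three columns), and it uses `.iloc` because `.ix` was removed in later pandas releases; both are assumptions, not part of the original session.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `ra`: the first row is all NaN, the rest is numeric.
idx = pd.date_range("2011-05-09", periods=5, freq="D")
frame = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan],
        [-0.0028, 0.0, 0.0004],
        [-0.0067, 0.0, 0.0],
        [0.0117, 0.0, 0.0],
        [0.0037, 0.0, 0.0],
    ],
    index=idx,
    columns=["005665", "023740", "045449"],
)

# Row-wise mean that skips NaN. Only the all-NaN first row should come back
# as NaN; every other row has valid observations and should get a number.
row_means = frame.iloc[:5, :3].mean(axis=1, skipna=True)
print(row_means)
```

With float columns like these, only the first row should be NaN, which is the output expected at step 118.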

CRP · 2012-02-08T11:54:47Z

The problem is apparently in _nanmean, where it does
the_mean = the_sum / count
Since element 0 of count is 0, this throws an exception.
I see that the case of count == 0 is handled in subsequent lines. Is there some assumption here about numpy behaviour? I have version 1.6.1.
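To make the pattern concrete, here is a rough sketch of a NaN-skipping mean that divides first and patches the `count == 0` entries afterwards. `nanmean_sketch` is a hypothetical helper written for illustration; it is not the actual pandas `_nanmean`, and forcing the input to float is an assumption made so that the division yields NaN instead of raising.

```python
import numpy as np


def nanmean_sketch(values, axis=0):
    """Hypothetical NaN-skipping mean; not the real pandas _nanmean."""
    # Cast to float so that dividing by a zero count does not raise.
    values = np.asarray(values, dtype=float)
    mask = np.isnan(values)

    # Number of valid (non-NaN) observations along the axis.
    count = (~mask).sum(axis=axis)
    # Sum with NaN entries replaced by zero.
    the_sum = np.where(mask, 0.0, values).sum(axis=axis)

    # Divide first, silencing the 0/0 warning, then fix up empty slots.
    with np.errstate(divide="ignore", invalid="ignore"):
        the_mean = the_sum / count
    return np.where(count == 0, np.nan, the_mean)


result = nanmean_sketch(
    [[np.nan, np.nan], [1.0, 3.0], [np.nan, 4.0]],
    axis=1,
)
print(result)
```

The divide-then-patch order only works if the division itself cannot raise, which holds for float arrays but not necessarily for object arrays.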

CRP · 2012-02-08T12:01:13Z

ok, this happens when the dataframe that I use is of type object:

In [209]: np.array((1,2,3),dtype=object)/np.array((0,0,0))

ZeroDivisionError Traceback (most recent call last)
/Users/c.prinoth/ in ()
----> 1 np.array((1,2,3),dtype=object)/np.array((0,0,0))

ZeroDivisionError: integer division or modulo by zero

In [210]: np.array((1,2,3),dtype=float)/np.array((0,0,0))
Out[210]: array([ inf, inf, inf])

so not really a pandas problem
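The difference between the two dtypes can be checked in a small script. This is a sketch of the same comparison as steps 209 and 210; the variable names are made up, and the `np.errstate` context is only there to silence the divide-by-zero warning that NumPy emits for the float case.

```python
import numpy as np

numerators = (1, 2, 3)
zeros = np.array((0, 0, 0))

# Object dtype: NumPy falls back to Python's division for each element,
# so dividing by zero raises instead of producing inf.
try:
    np.array(numerators, dtype=object) / zeros
    object_raised = False
except ZeroDivisionError:
    object_raised = True

# Float dtype: IEEE floating-point rules apply and the result is inf.
with np.errstate(divide="ignore"):
    float_result = np.array(numerators, dtype=float) / zeros

print(object_raised)
print(float_result)
```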

adamklein · 2012-02-08T15:39:17Z

Seems like even as object, works now in latest git master:

In [6]: df = df.astype('O')

In [7]: df
Out[7]: 
                 005665 023740 028758       040828        045449
2011-05-09          NaN    NaN    NaN          NaN           NaN
2011-05-10 -0.002844673      0      0 -0.002844628  0.0003532487
2011-05-11 -0.006744369      0      0 -0.006744323             0
2011-05-12   0.01171056      0      0   0.01171041             0
2011-05-13  0.003670906      0      0 -0.001028556             0

In [8]: df.ix[1:5,:5].mean(axis=1, skipna=True)
Out[8]: 
2011-05-10   -0.001067
2011-05-11   -0.002698
2011-05-12    0.004684
2011-05-13    0.000528

In [9]: df.ix[0:5,:5].mean(axis=1, skipna=True)
Out[9]: 
2011-05-09         NaN
2011-05-10   -0.001067
2011-05-11   -0.002698
2011-05-12    0.004684
2011-05-13    0.000528
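For anyone stuck on a build where object columns still fail, one possible workaround is to cast the frame to float before taking the mean. This is only a sketch: the frame below is invented for illustration, and it assumes every value in the object frame is numeric or NaN so that `astype(float)` succeeds.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype frame with an all-NaN first row.
obj_frame = pd.DataFrame(
    {"a": [np.nan, 1.0, 2.0], "b": [np.nan, 3.0, 6.0]}
).astype(object)

# Cast back to float so the reduction uses floating-point division.
float_frame = obj_frame.astype(float)
row_means = float_frame.mean(axis=1, skipna=True)
print(row_means)
```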

wesm · 2012-02-08T20:11:55Z

I recall working on this sometime in the last month. @CRP can you verify that things work on git master?

CRP · 2012-02-09T08:34:49Z

just did and it appears to work fine

thanks

CRP closed this as completed Feb 8, 2012

dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019

Merge pull request pandas-dev#763 from shashank88/pickle_fix

4d04943

Do not use bytes when loading back pickled objects
