
possible error in mean of dataframe with nan values? #763


Closed
CRP opened this issue Feb 8, 2012 · 5 comments

@CRP
Contributor

CRP commented Feb 8, 2012

I am not sure if I am doing something wrong, but this looks like a bug to me:

In [116]: ra.ix[:5,:5]
Out[116]:
                       005665  023740  028758             040828             045449
2011-05-09                NaN     NaN     NaN                NaN                NaN
2011-05-10  -0.00284467268624     0.0     0.0    -0.002844627571  0.000353248710642
2011-05-11  -0.00674436854779     0.0     0.0  -0.00674432257973                0.0
2011-05-12    0.0117105570627     0.0     0.0    0.0117104073878                0.0
2011-05-13   0.00367090573068     0.0     0.0  -0.00102855609881                0.0

In [117]: ra.ix[:5,:5].mean(axis=0,skipna=True)
Out[117]:
005665 0.0014481
023740 0.0000000
028758 0.0000000
040828 0.0002732
045449 8.831e-05

In [118]: ra.ix[:5,:5].mean(axis=1,skipna=True)
Out[118]:
2011-05-09 NaN
2011-05-10 NaN
2011-05-11 NaN
2011-05-12 NaN
2011-05-13 NaN

In [122]: ra.ix[1:5,:5].mean(axis=1,skipna=True)
Out[122]:
2011-05-10 -0.0010672
2011-05-11 -0.0026977
2011-05-12 0.0046842
2011-05-13 0.0005285

In step 118, I would expect the same output I get in step 122, with NaN only for 2011-05-09, whose row is entirely NaN. Maybe the check for all-NaN rows is done on the wrong axis?
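A minimal sketch of the behaviour the report expects, using a small float DataFrame with made-up values in place of `ra` (whose full contents are not shown). It assumes a current pandas, where `.ix` has been removed, so plain whole-frame calls are used instead of the original slicing. With `skipna=True`, an all-NaN row should average to NaN, while every other row (and every column) averages over its non-NaN entries only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `ra`: the first row is entirely NaN,
# like 2011-05-09 in the report.
df = pd.DataFrame(
    [[np.nan, np.nan, np.nan],
     [1.0, 0.0, 2.0],
     [-3.0, 0.0, 0.0]],
    index=pd.date_range("2011-05-09", periods=3),
    columns=["a", "b", "c"],
)

# Column-wise mean (step 117): NaNs are skipped within each column.
col_means = df.mean(axis=0, skipna=True)

# Row-wise mean over all rows (step 118): the all-NaN row yields NaN,
# and the other rows are unaffected by it.
row_means = df.mean(axis=1, skipna=True)

# Row-wise mean with the all-NaN row sliced off (step 122).
row_means_tail = df.iloc[1:].mean(axis=1, skipna=True)

print(col_means)
print(row_means)
print(row_means_tail)
```

If the reduction behaves as expected, the last rows of `row_means` match `row_means_tail` exactly.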

@CRP
Contributor Author

CRP commented Feb 8, 2012

The problem is apparently in _nanmean, which computes
the_mean = the_sum / count
Since element 0 of count is 0, this raises an exception.
I see that the case count == 0 is handled in the subsequent lines. Is there some assumption here about NumPy behaviour? I am on NumPy 1.6.1.
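To make the sum-then-divide pattern concrete, here is a hypothetical NaN-skipping mean, `nanmean_sketch`, that is not pandas' actual `_nanmean`. It assumes a 2-D input and an integer axis. The key step is casting to float64 up front: float division by a zero count produces NaN instead of raising, and the `count == 0` slots are then set to NaN explicitly, which is the case the comment above says is handled after the division.

```python
import numpy as np


def nanmean_sketch(values, axis):
    """Hypothetical NaN-skipping mean for a 2-D array along an int axis.

    Not pandas' real _nanmean; a sketch of the same sum / count idea.
    """
    # Cast to float64 first: with dtype=object the division below would
    # use Python arithmetic element by element and raise ZeroDivisionError.
    values = np.asarray(values, dtype=np.float64)
    mask = np.isnan(values)
    # Sum with NaNs treated as 0, and count the non-NaN entries.
    the_sum = np.where(mask, 0.0, values).sum(axis=axis)
    count = (~mask).sum(axis=axis)
    # Float 0.0 / 0 gives nan rather than raising; silence the warning,
    # since those slots are overwritten on the next line anyway.
    with np.errstate(invalid="ignore", divide="ignore"):
        the_mean = the_sum / count
    # Slices with no valid values are explicitly NaN.
    the_mean[count == 0] = np.nan
    return the_mean


# Object-dtype input whose first row is all NaN, as in the report.
data = np.array([[np.nan, np.nan], [1.0, 3.0]], dtype=object)
by_row = nanmean_sketch(data, axis=1)
by_col = nanmean_sketch(data, axis=0)
print(by_row)
print(by_col)
```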

@CRP
Contributor Author

CRP commented Feb 8, 2012

OK, this happens when the DataFrame I use has dtype object:

In [209]: np.array((1,2,3),dtype=object)/np.array((0,0,0))

ZeroDivisionError Traceback (most recent call last)
/Users/c.prinoth/ in ()
----> 1 np.array((1,2,3),dtype=object)/np.array((0,0,0))

ZeroDivisionError: integer division or modulo by zero

In [210]: np.array((1,2,3),dtype=float)/np.array((0,0,0))
Out[210]: array([ inf, inf, inf])

So it is not really a pandas problem.
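The same contrast can be reproduced in a self-contained script. For an object array, NumPy applies Python's own `/` to each pair of elements, and Python raises on integer division by zero. For a float array, IEEE 754 rules apply and `x / 0` is `inf`; NumPy emits a RuntimeWarning for this, which is silenced here with `np.errstate`. The variable names are illustrative only.

```python
import numpy as np

numer_obj = np.array((1, 2, 3), dtype=object)
numer_float = np.array((1, 2, 3), dtype=float)
zeros = np.array((0, 0, 0))

# Object dtype: element-wise Python division, so 1 / 0 raises.
try:
    numer_obj / zeros
    raised = False
except ZeroDivisionError:
    raised = True

# Float dtype: IEEE 754 division gives inf instead of raising.
with np.errstate(divide="ignore"):
    result = numer_float / zeros

print(raised)  # True
print(result)  # [inf inf inf]
```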

@CRP CRP closed this as completed Feb 8, 2012
@adamklein
Contributor

It seems that, even with dtype object, this now works in the latest git master:

In [6]: df = df.astype('O')

In [7]: df
Out[7]: 
                 005665 023740 028758       040828        045449
2011-05-09          NaN    NaN    NaN          NaN           NaN
2011-05-10 -0.002844673      0      0 -0.002844628  0.0003532487
2011-05-11 -0.006744369      0      0 -0.006744323             0
2011-05-12   0.01171056      0      0   0.01171041             0
2011-05-13  0.003670906      0      0 -0.001028556             0

In [8]: df.ix[1:5,:5].mean(axis=1, skipna=True)
Out[8]: 
2011-05-10   -0.001067
2011-05-11   -0.002698
2011-05-12    0.004684
2011-05-13    0.000528

In [9]: df.ix[0:5,:5].mean(axis=1, skipna=True)
Out[9]: 
2011-05-09         NaN
2011-05-10   -0.001067
2011-05-11   -0.002698
2011-05-12    0.004684
2011-05-13    0.000528
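If a frame has ended up with dtype object, as after `df.astype('O')` above, one defensive option (an assumption, not something proposed in this thread) is to convert it back to float64 before reducing, so the NaN-skipping mean always runs on float data. The frame below is a small hypothetical example, not the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype frame, as produced by df.astype('O').
df_obj = pd.DataFrame(
    [[np.nan, np.nan], [0.5, 0.0], [-1.0, 0.0]],
    columns=["x", "y"],
).astype(object)

# Convert back to float64 so the reduction uses float arithmetic.
df_float = df_obj.astype(float)
row_means = df_float.mean(axis=1, skipna=True)
print(row_means)
```

The all-NaN first row again yields NaN, and the remaining rows average over their non-NaN values.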

@wesm
Member

wesm commented Feb 8, 2012

I recall working on this sometime in the last month. @CRP, can you verify that things work on git master?

@CRP
Contributor Author

CRP commented Feb 9, 2012

Just did, and it appears to work fine.

Thanks!

dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019
Do not use bytes when loading back pickled objects

3 participants