BUG: Series.argsort() fails with datetime64[ns] with NaT / dtypes are odd #2967

Closed
jreback opened this issue Mar 5, 2013 · 3 comments

jreback commented Mar 5, 2013

Pretty simple fix though: just need to make sure the result array is
typed float64 (to accommodate the NaNs) or int64 (if there are no NaNs),
rather than the same as the input type, which could give weird results
in some cases. E.g. you wouldn't want a float array back just because
you fed it NaNs....

Here's the question that inspired this:
http://stackoverflow.com/questions/15207279/return-sorted-indexes-skipping-nan-values-in-pandas

In [29]: s = pd.Series([pd.Timestamp('201301%02d'% (i+1)) for i in range(5)])

In [30]: s
Out[30]: 
0   2013-01-01 00:00:00
1   2013-01-02 00:00:00
2   2013-01-03 00:00:00
3   2013-01-04 00:00:00
4   2013-01-05 00:00:00
dtype: datetime64[ns]

In [31]: s.argsort()
Out[31]: 
0    0
1    1
2    2
3    3
4    4
dtype: int64

In [32]: s.shift().argsort()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-61a586f08c06> in <module>()
----> 1 s.shift().argsort()

/mnt/home/jreback/pandas/pandas/core/series.pyc in argsort(self, axis, kind, order)
   2119             result = values.copy()
   2120             notmask = -mask
-> 2121             result[notmask] = np.argsort(values[notmask], kind=kind)
   2122             return Series(result, index=self.index, name=self.name)
   2123         else:

TypeError: array cannot be safely cast to required type
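
The traceback suggests the failure comes from `result = values.copy()` keeping the datetime64[ns] dtype, so the integer positions from `np.argsort` cannot be safely cast into it. A minimal sketch of the float64/int64 idea described above, assuming a recent pandas (`isna`, `to_numpy`); the helper name `argsort_skipna_float` is hypothetical and this is not necessarily what the eventual fix does:

```python
import numpy as np
import pandas as pd


def argsort_skipna_float(s):
    """Hypothetical helper: argsort the non-missing values only."""
    mask = s.isna().to_numpy()
    if mask.any():
        # float64 so the missing positions can hold NaN
        result = np.full(len(s), np.nan, dtype="float64")
    else:
        # nothing missing: plain integer positions
        result = np.empty(len(s), dtype="int64")
    # positions are relative to the non-missing values, as in the tracebacked code
    result[~mask] = np.argsort(s.to_numpy()[~mask])
    return pd.Series(result, index=s.index, name=s.name)


s = pd.Series(pd.date_range("2013-01-01", periods=5))
no_nat = argsort_skipna_float(s)           # int64 result
with_nat = argsort_skipna_float(s.shift())  # float64 result, NaN at position 0
```

The key point is that the result buffer is allocated with its own dtype instead of being copied from the datetime values.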

jreback commented Mar 5, 2013

Could also put -1 in the places where NaN/NaT are, so we could always return an int64 Series, but is that weird?
We do this (put -1) with idxmin/max, so maybe not so weird....

@wesm @changhiskhan

?
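
A rough sketch of the -1 sentinel variant, again assuming a recent pandas; `argsort_skipna_int` is a hypothetical name used only for illustration, and the behavior shown is the proposal from this comment, not a confirmed description of the library:

```python
import numpy as np
import pandas as pd


def argsort_skipna_int(s):
    """Hypothetical helper: always int64, with -1 where the input is NaN/NaT."""
    mask = s.isna().to_numpy()
    # start with the sentinel everywhere, then fill the non-missing slots
    result = np.full(len(s), -1, dtype="int64")
    result[~mask] = np.argsort(s.to_numpy()[~mask])
    return pd.Series(result, index=s.index, name=s.name)


s = pd.Series(pd.date_range("2013-01-01", periods=5))
shifted = argsort_skipna_int(s.shift())
```

The trade-off is that the dtype stays int64 regardless of the input, but callers have to treat -1 as "missing" rather than as a valid position.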

hayd commented Mar 5, 2013

I don't think we got this error in 0.10.1:

In [6]: s.shift().argsort()
Out[6]: 
0    NaN
1      0
2      1
3      2
4      3

In [7]: s.shift().argsort().dtype
Out[7]: dtype('object')

jreback commented Mar 6, 2013

Yep... but in 0.10.1 the dtype of the series is itself wrong (it's object),
so s.shift(1) 'works'.

Easy fix anyhow..

In [2]: s = pd.Series([pd.Timestamp('201301%02d'% (i+1)) for i in range(5)])

In [3]: s
Out[3]: 
0    2013-01-01 00:00:00
1    2013-01-02 00:00:00
2    2013-01-03 00:00:00
3    2013-01-04 00:00:00
4    2013-01-05 00:00:00

In [4]: s.dtype
Out[4]: dtype('object')

In [5]: s.shift(1).dtype
Out[5]: dtype('object')

In [6]: s.shift(1).argsort()
Out[6]: 
0    NaN
1      0
2      1
3      2
4      3

jreback added a commit that referenced this issue Mar 6, 2013
BUG: Series.argsort failing on datetime64[ns] when NaT present, GH #2967
jreback closed this as completed Mar 6, 2013