
unstack with DatetimeIndex with NaN gives "ValueError: cannot convert float NaN to integer" #7401


Closed
jorisvandenbossche opened this issue Jun 9, 2014 · 9 comments · Fixed by #9292
Labels
Bug; Dtype Conversions (Unexpected or buggy dtype conversions); MultiIndex; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@jorisvandenbossche
Member

With the following code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('aaaaabbbbb'),
                   'B': pd.date_range('2012-01-01', periods=5).tolist()*2,
                   'C': np.zeros(10)})
df.iloc[3, 1] = np.nan  # stored as NaT in the datetime column
df = df.set_index(['A', 'B'])

unstacking gives:

In [6]: df.unstack()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-9a91d46cdd8d> in <module>()
----> 1 df.unstack()
...
c:\users\vdbosscj\scipy\pandas-joris\pandas\tseries\index.pyc in _simple_new(cls, values, name, freq, tz)
    461     def _simple_new(cls, values, name, freq=None, tz=None):
    462         if values.dtype != _NS_DTYPE:
--> 463             values = com._ensure_int64(values).view(_NS_DTYPE)
    464
    465         result = values.view(cls)

c:\users\vdbosscj\scipy\pandas-joris\pandas\algos.pyd in pandas.algos.ensure_int64 (pandas\algos.c:47444)()

c:\users\vdbosscj\scipy\pandas-joris\pandas\algos.pyd in pandas.algos.ensure_int64 (pandas\algos.c:47349)()

ValueError: cannot convert float NaN to integer

I don't know whether this should work (unstacking with NaN in the index), but at the very least the error message does not make clear what is going wrong.

@jreback
Contributor

jreback commented Jun 9, 2014

a secondary issue is that the display of df.set_index(['A','B']) is wrong; that should be NaT (it actually IS a NaT, as the index values show):

In [18]: df.set_index(['A','B']).index.values
Out[18]: 
array([('a', Timestamp('2012-01-01 00:00:00')),
       ('a', Timestamp('2012-01-02 00:00:00')),
       ('a', Timestamp('2012-01-03 00:00:00')), ('a', NaT),
       ('a', Timestamp('2012-01-05 00:00:00')),
       ('b', Timestamp('2012-01-01 00:00:00')),
       ('b', Timestamp('2012-01-02 00:00:00')),
       ('b', Timestamp('2012-01-03 00:00:00')),
       ('b', Timestamp('2012-01-04 00:00:00')),
       ('b', Timestamp('2012-01-05 00:00:00'))], dtype=object)

AND I think this is the cause of the error itself (in unstack)

in that _make_index in core/reshape.py is not returning a correct index (it has a nan intermixed with the i8 values)
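
To illustrate why a nan among i8 values ends in this particular message, here is a minimal sketch using only NumPy and plain Python. It does not reproduce the pandas internals (`_make_index` is not called); it only shows that a missing datetime has an integer encoding while a float NaN does not.

```python
import numpy as np

# datetime64[ns] values are stored as int64 nanoseconds since the epoch;
# a missing datetime (NaT) is encoded as the smallest int64.
stamps = np.array(["2012-01-01", "NaT"], dtype="datetime64[ns]")
as_i8 = stamps.view("i8")
nat_is_int_min = as_i8[1] == np.iinfo(np.int64).min

# A float NaN has no integer encoding, so converting it fails
# with the same message as in the traceback above.
try:
    int(float("nan"))
    message = None
except ValueError as exc:
    message = str(exc)

print(nat_is_int_min, message)
```

So an index built from i8 values needs the NaT sentinel in the missing slot rather than a float nan.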

@cpcloud
Member

cpcloud commented Jun 9, 2014

i'll take this one

@cpcloud cpcloud self-assigned this Jun 9, 2014
@cpcloud
Member

cpcloud commented Jun 9, 2014

hm @jreback is null indexing supposed to work at all? e.g.:

This works:

In [103]: df = DataFrame({'x': range(3)}, index=[pd.NaT, Timestamp('now', offset='D'), Timestamp('20130101', offset='D')])

In [104]: df
Out[104]:
                     x
NaT                  0
2014-06-09 14:18:18  1
2013-01-01 00:00:00  2

In [105]: df.loc[pd.NaT]
Out[105]:
x    0
Name: NaT, dtype: int64

this raises:

In [108]: df = DataFrame({'x': range(4)}, index=[pd.NaT, Timestamp('now', offset='D'), Timestamp('20130101', offset='D'), pd.NaT])

In [110]: df
Out[110]:
                     x
NaT                  0
2014-06-09 14:18:34  1
2013-01-01 00:00:00  2
NaT                  3

In [109]: df.loc[pd.NaT]

with ValueError: cannot use label indexing with a null key

@jreback
Contributor

jreback commented Jun 9, 2014

only a single null indexer is supported (for getitem)
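
For selecting rows with a null label regardless of how many there are, a boolean mask avoids label lookup altogether. This is a sketch of an alternative, assuming a pandas version that provides `Index.isna`; it is not a statement about what `.loc[pd.NaT]` does in any given release.

```python
import pandas as pd

# Index with two missing labels, as in the failing example above
# (fixed dates are used instead of Timestamp('now')).
df = pd.DataFrame(
    {"x": range(4)},
    index=pd.DatetimeIndex([pd.NaT, "2014-06-09", "2013-01-01", pd.NaT]),
)

# Boolean mask of the null labels; works for one or many NaT entries.
null_rows = df[df.index.isna()]
print(null_rows)
```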

@cpcloud
Member

cpcloud commented Jun 9, 2014

ok thx just checking my sanity

@cpcloud
Member

cpcloud commented Jun 10, 2014

@jreback @jorisvandenbossche can we define what should happen in the first case?

so this:

df = pd.DataFrame({'A': list('aaaaabbbbb'),
                   'B':pd.date_range('2012-01-01', periods=5).tolist()*2,
                   'C':np.zeros(10)})
df.iloc[3,1] = np.nan
df = df.set_index(['A', 'B'])

df looks like:

              C
A B
a 2012-01-01  0
  2012-01-02  0
  2012-01-03  0
  NaT         0
  2012-01-05  0
b 2012-01-01  0
  2012-01-02  0
  2012-01-03  0
  2012-01-04  0
  2012-01-05  0

df.unstack(0)

which currently raises, should yield:

              C
A             a    b
B
2012-01-01    0    0
2012-01-02    0    0
2012-01-03    0    0
NaT           0  NaN
2012-01-04  NaN    0
2012-01-05    0    0

is that correct?

@jreback
Contributor

jreback commented Jun 10, 2014

looks right.

(w/o the NaT) this works

In [34]: df.reset_index().pivot('B','A','C')
Out[34]: 
A           a  b
B               
2012-01-01  0  0
2012-01-02  0  0
2012-01-03  0  0
2012-01-04  0  0
2012-01-05  0  0
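
In the same spirit, one way to sidestep the missing label is to drop the rows whose 'B' label is null before unstacking. The sketch below assumes a pandas version that provides `Index.notna`; the names `has_date` and `wide` are illustrative only, and the NaT row is discarded rather than kept as in the proposed output above.

```python
import numpy as np
import pandas as pd

# Same data as in the report, with the missing date set directly to NaT.
dates = pd.date_range("2012-01-01", periods=5).tolist() * 2
dates[3] = pd.NaT
df = pd.DataFrame({"A": list("aaaaabbbbb"), "B": dates, "C": np.zeros(10)})
df = df.set_index(["A", "B"])

# Keep only rows whose 'B' label is a real timestamp, then unstack 'A'.
has_date = df.index.get_level_values("B").notna()
wide = df[has_date].unstack("A")
print(wide)
```

The cell for 'a' on 2012-01-04 comes out as NaN, because that was the row whose date was missing.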

@jorisvandenbossche
Member Author

I think this is more related to issue #7403 than this one (as it is a general issue with NaNs, not per se with datetimes).

Apart from that, I think the ordering of the index is also a problem. How is it defined that 2012-01-04 should be inserted after NaN? If the index is sorted in the unstack operation (but is it?), shouldn't the NaN come last?
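
For comparison, explicit sorting of a flat index makes the placement of missing labels a choice of the caller. The sketch below assumes a recent pandas in which `sort_index` accepts `na_position`; it says nothing about the order `unstack` itself produces.

```python
import pandas as pd

s = pd.Series(
    [0, 1, 2],
    index=pd.DatetimeIndex(["2012-01-02", pd.NaT, "2012-01-01"]),
)

# Missing labels placed after the sorted dates ...
nat_last = s.sort_index(na_position="last")
# ... or before them.
nat_first = s.sort_index(na_position="first")
print(nat_last, nat_first, sep="\n")
```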

@cpcloud
Member

cpcloud commented Jun 10, 2014

@jorisvandenbossche was just trying to work out a rough idea of what should happen, but yes not clear how nan should be sorted in these kinds of ops.

in that case my knee-jerk reaction is to disallow it, for fear of a swarm of users wanting nans first and another swarm wanting nans last. of course to make people happy, someone will inevitably suggest a keyword argument but for consistency that would have to go in most index sorting ops which sounds like a lot of pain for very little benefit.

i am however in favor of a more informative error message
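
A rough sketch of what a more informative error could look like, written as a user-side pre-check. The helper `check_no_null_labels` and its message are hypothetical and are not part of pandas or of the fix that closed this issue.

```python
import pandas as pd


def check_no_null_labels(index):
    """Hypothetical pre-check: raise a descriptive error for null index labels."""
    for level in range(index.nlevels):
        values = index.get_level_values(level)
        if values.isna().any():
            name = index.names[level]
            raise ValueError(
                f"index level {name!r} contains missing labels (NaN/NaT); "
                "drop or fill them before unstacking"
            )


index = pd.MultiIndex.from_arrays(
    [list("ab"), pd.DatetimeIndex(["2012-01-01", pd.NaT])], names=["A", "B"]
)
try:
    check_no_null_labels(index)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

Calling such a check before `df.unstack()` would name the offending level instead of surfacing the low-level conversion error.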

3 participants