unstack with DatetimeIndex with NaN gives "ValueError: cannot convert float NaN to integer" #7401

jorisvandenbossche · 2014-06-09T12:06:31Z

With the following code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('aaaaabbbbb'),
                   'B': pd.date_range('2012-01-01', periods=5).tolist()*2,
                   'C': np.zeros(10)})
df.iloc[3, 1] = np.nan  # stored as NaT in the datetime column
df = df.set_index(['A', 'B'])

unstacking gives:

In [6]: df.unstack()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-9a91d46cdd8d> in <module>()
----> 1 df.unstack()
...
c:\users\vdbosscj\scipy\pandas-joris\pandas\tseries\index.pyc in _simple_new(cls, values, name, freq, tz)
    461     def _simple_new(cls, values, name, freq=None, tz=None):
    462         if values.dtype != _NS_DTYPE:
--> 463             values = com._ensure_int64(values).view(_NS_DTYPE)
    464
    465         result = values.view(cls)

c:\users\vdbosscj\scipy\pandas-joris\pandas\algos.pyd in pandas.algos.ensure_int64 (pandas\algos.c:47444)()

c:\users\vdbosscj\scipy\pandas-joris\pandas\algos.pyd in pandas.algos.ensure_int64 (pandas\algos.c:47349)()

ValueError: cannot convert float NaN to integer

I don't know whether this should work (unstacking with NaN in the index), but at the very least the error message does not make clear what is going wrong.
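The message itself comes from Python's float-to-integer conversion: the datetime values are expected as int64 nanoseconds, but a float NaN slipped in. A minimal pure-Python sketch, assuming a hypothetical `ensure_int64_checked` helper (not a pandas function), that reproduces the message and shows the kind of up-front check that would give a clearer error:

```python
import math

# Plain Python raises the same message that surfaces from ensure_int64.
try:
    int(float("nan"))
except ValueError as exc:
    print(exc)  # cannot convert float NaN to integer


def ensure_int64_checked(values):
    """Hypothetical helper: convert values to ints, rejecting NaN with a clearer message."""
    out = []
    for pos, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            # Report where the missing value is instead of failing inside int().
            raise ValueError(
                f"missing value (NaN) at position {pos}; cannot build an "
                "integer-backed (e.g. datetime64[ns]) index from it"
            )
        out.append(int(v))
    return out
```

With this check, `ensure_int64_checked([1, float("nan")])` fails with a message pointing at the NaN rather than the bare conversion error.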


jreback · 2014-06-09T12:35:30Z

a secondary issue is that the display of df.set_index(['A','B']) is wrong: that entry should be shown as NaT. It actually IS a NaT, as you can see from:

In [18]: df.set_index(['A','B']).index.values
Out[18]: 
array([('a', Timestamp('2012-01-01 00:00:00')),
       ('a', Timestamp('2012-01-02 00:00:00')),
       ('a', Timestamp('2012-01-03 00:00:00')), ('a', NaT),
       ('a', Timestamp('2012-01-05 00:00:00')),
       ('b', Timestamp('2012-01-01 00:00:00')),
       ('b', Timestamp('2012-01-02 00:00:00')),
       ('b', Timestamp('2012-01-03 00:00:00')),
       ('b', Timestamp('2012-01-04 00:00:00')),
       ('b', Timestamp('2012-01-05 00:00:00'))], dtype=object)

AND I think this is the cause of the error itself (in unstack):

_make_index in core/reshape.py is not returning a correct index (it has a nan intermixed with the i8 values)
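The point here is that a datetime level is stored as int64 ("i8") nanoseconds since the epoch, and NaT is itself an integer sentinel (the smallest int64, which is how NumPy and pandas encode it), so a float nan should never end up among those values. A minimal pure-Python sketch of that encoding; `to_i8` is a hypothetical helper and `None` stands in for a missing datetime:

```python
from datetime import datetime

# NaT sentinel: the minimum int64, as used by NumPy/pandas for datetime64[ns].
INAT = -(2**63)
EPOCH = datetime(1970, 1, 1)


def to_i8(values):
    """Hypothetical helper: naive datetimes -> int64 ns since epoch; None -> NaT sentinel."""
    out = []
    for v in values:
        if v is None:
            out.append(INAT)  # stay integer; never emit float('nan') here
        else:
            delta = v - EPOCH
            seconds = delta.days * 86_400 + delta.seconds
            out.append(seconds * 10**9 + delta.microseconds * 1_000)
    return out


codes = to_i8([datetime(2012, 1, 1), None, datetime(2012, 1, 5)])
print(codes)
```

Because the missing entry is encoded as `INAT`, every value stays an integer and can be viewed as datetime64[ns] without any float conversion.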

cpcloud · 2014-06-09T13:56:54Z

i'll take this one

cpcloud · 2014-06-09T14:19:37Z

hm @jreback, is null indexing supposed to work at all? e.g.:

This works:

In [103]: df = DataFrame({'x': range(3)}, index=[pd.NaT, Timestamp('now', offset='D'), Timestamp('20130101', offset='D')])

In [104]: df
Out[104]:
                     x
NaT                  0
2014-06-09 14:18:18  1
2013-01-01 00:00:00  2

In [105]: df.loc[pd.NaT]
Out[105]:
x    0
Name: NaT, dtype: int64

this raises:

In [108]: df = DataFrame({'x': range(4)}, index=[pd.NaT, Timestamp('now', offset='D'), Timestamp('20130101', offset='D'), pd.NaT])

In [110]: df
Out[110]:
                     x
NaT                  0
2014-06-09 14:18:34  1
2013-01-01 00:00:00  2
NaT                  3

In [109]: df.loc[pd.NaT]

with ValueError: cannot use label indexing with a null key

jreback · 2014-06-09T14:38:06Z

only a single null indexer is supported (for getitem)
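The rule can be modeled as follows: a null key can be looked up by label only when it identifies exactly one row. This is a toy sketch in plain Python, not pandas internals; `lookup_label` is hypothetical, `None` stands in for NaT, and the error text simply mirrors the message quoted above:

```python
def lookup_label(index, key):
    """Hypothetical label lookup returning the position of `key` in `index`.

    `None` stands in for NaT. A null key is only accepted when the index holds
    exactly one null label; with several, the lookup is ambiguous.
    """
    if key is None:
        null_positions = [i for i, label in enumerate(index) if label is None]
        if not null_positions:
            raise KeyError(key)
        if len(null_positions) > 1:
            raise ValueError("cannot use label indexing with a null key")
        return null_positions[0]
    return index.index(key)  # assumes non-null labels are unique


single = [None, "2014-06-09", "2013-01-01"]
double = [None, "2014-06-09", "2013-01-01", None]

print(lookup_label(single, None))          # 0
print(lookup_label(double, "2013-01-01"))  # 2
```

Calling `lookup_label(double, None)` raises, matching the second session above.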

cpcloud · 2014-06-09T14:39:14Z

ok thx just checking my sanity

cpcloud · 2014-06-10T14:26:32Z

@jreback @jorisvandenbossche can we define what should happen in the first case?

so this:

df = pd.DataFrame({'A': list('aaaaabbbbb'),
                   'B':pd.date_range('2012-01-01', periods=5).tolist()*2,
                   'C':np.zeros(10)})
df.iloc[3, 1] = np.nan
df = df.set_index(['A', 'B'])

df looks like:

              C
A B
a 2012-01-01  0
  2012-01-02  0
  2012-01-03  0
  NaT         0
  2012-01-05  0
b 2012-01-01  0
  2012-01-02  0
  2012-01-03  0
  2012-01-04  0
  2012-01-05  0

df.unstack(0)

which currently raises, should yield:

              C
A             a    b
B
2012-01-01    0    0
2012-01-02    0    0
2012-01-03    0    0
NaT           0  NaN
2012-01-04  NaN    0
2012-01-05    0    0

is that correct?
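The proposed result treats NaT as an ordinary row label and fills combinations that do not occur with NaN. A pure-Python sketch of that semantics, assuming a hypothetical `unstack_level0` helper and `None` for NaT. Rows here keep their order of first appearance, which is only one possible choice: it places 2012-01-04 last rather than right after NaT as in the proposed table, so the row order is exactly the part left undefined.

```python
NAN = float("nan")


def unstack_level0(records):
    """Hypothetical toy unstack: {(a, b): value} -> (rows, cols, table).

    Level 0 (a) becomes the columns, level 1 (b) the rows. None (standing in
    for NaT) is kept as an ordinary row label; missing cells become NaN.
    Rows and columns keep their order of first appearance.
    """
    cols, rows = [], []
    for a, b in records:
        if a not in cols:
            cols.append(a)
        if b not in rows:
            rows.append(b)
    table = [[records.get((a, b), NAN) for a in cols] for b in rows]
    return rows, cols, table


# Mirrors the example frame: column C is all zeros, one 'a' date is NaT (None).
a_dates = ["2012-01-01", "2012-01-02", "2012-01-03", None, "2012-01-05"]
b_dates = ["2012-01-01", "2012-01-02", "2012-01-03", "2012-01-04", "2012-01-05"]
records = {("a", d): 0.0 for d in a_dates}
records.update({("b", d): 0.0 for d in b_dates})

rows, cols, table = unstack_level0(records)
for label, row in zip(rows, table):
    print(label, row)
```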

jreback · 2014-06-10T14:33:15Z

looks right.

(w/o the NaT) this works

In [34]: df.reset_index().pivot(index='B', columns='A', values='C')
Out[34]: 
A           a  b
B               
2012-01-01  0  0
2012-01-02  0  0
2012-01-03  0  0
2012-01-04  0  0
2012-01-05  0  0

jorisvandenbossche · 2014-06-10T14:45:21Z

I think this is more related to issue #7403 than to this one (as it is a general issue with NaNs, not specifically with datetimes).

Beyond that, I think another problem is the ordering of the index. How is it decided that 2012-01-04 should be inserted after the NaT? If the index is sorted in the unstack operation (but is it?), shouldn't the NaT come last?

cpcloud · 2014-06-10T15:23:25Z

@jorisvandenbossche i was just trying to work out a rough idea of what should happen, but yes, it's not clear how nans should be sorted in these kinds of ops.

in that case my knee-jerk reaction is to disallow it, for fear of one swarm of users wanting nans first and another swarm wanting nans last. of course, to make people happy, someone will inevitably suggest a keyword argument, but for consistency that would have to go into most index sorting ops, which sounds like a lot of pain for very little benefit.

i am, however, in favor of a more informative error message.
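The "nans first vs nans last" choice can be made explicit with a sort option. Later pandas versions expose this as the `na_position` argument ('first' or 'last') of `sort_values`/`sort_index`; the sketch below borrows that name but is plain Python, with `None` standing in for NaT, and `sort_labels` is hypothetical:

```python
def sort_labels(labels, na_position="last"):
    """Hypothetical sort of index labels where None stands in for NaT/NaN."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    nulls = [x for x in labels if x is None]
    values = sorted(x for x in labels if x is not None)
    # Nulls are not comparable with the other labels, so they are placed as a block.
    return nulls + values if na_position == "first" else values + nulls


labels = ["2012-01-03", None, "2012-01-01", "2012-01-04"]
print(sort_labels(labels))                       # ['2012-01-01', '2012-01-03', '2012-01-04', None]
print(sort_labels(labels, na_position="first"))  # [None, '2012-01-01', '2012-01-03', '2012-01-04']
```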

jreback added Bug labels Jun 9, 2014

jreback added this to the 0.14.1 milestone Jun 9, 2014

jorisvandenbossche mentioned this issue Jun 9, 2014

BUG: incorrect unstacking with NaNs in the index #7403

Closed

cpcloud self-assigned this Jun 9, 2014

cpcloud mentioned this issue Jun 9, 2014

multiindex repr of datetime nat is incorrect #7409

Closed

jreback modified the milestones: 0.15.0, 0.14.1 Jul 1, 2014

behzadnouri mentioned this issue Jan 18, 2015

TST: tests for GH4862, GH7401, GH7403, GH7405 #9292

Merged

jreback closed this as completed in #9292 Jan 26, 2015

wesm unassigned cpcloud Oct 12, 2016
