
Stacking MultiIndex DataFrame columns with Timestamps levels fails #8039

Closed
lydakis opened this issue Aug 15, 2014 · 14 comments · Fixed by #8043

Comments

@lydakis

lydakis commented Aug 15, 2014

You can see the bug in the following code:

import pandas as pd
import datetime as dt

key = pd.MultiIndex.from_tuples([(
                            dt.datetime(2014,8,1,0,0,0),
                            'SomeColumnName',
                            'AnotherOne')])

data = {
    '1' : 34204,
    '2' : 43580,
    '3' : 84329,
    '5' : 23485
}


ts = pd.Series(data=data)
df = pd.DataFrame(data=ts, columns=key)

stacked = df.stack()

print(stacked)

We would expect the data to be unchanged; however, the returned DataFrame is empty.

The pandas version used was 0.11.0.

@TomAugspurger TomAugspurger added this to the 0.15.0 milestone Aug 15, 2014
@jreback
Contributor

jreback commented Aug 15, 2014

When you pass columns while constructing a DataFrame from a Series, the frame is reindexed by the passed columns. Since the data has the name 0 (the column assigned by default), it disappears. This is undocumented (and doesn't work at all in versions > 0.13.0).

Use this to create a frame

In [23]: result = ts.to_frame()

# if you are on pandas < 0.13.0
In [31]: result = DataFrame(ts)

In [33]: result
Out[33]: 
       0
1  34204
2  43580
3  84329
5  23485

And simply set the columns.

In [26]: key
Out[26]: 
MultiIndex(levels=[[2014-08-01 00:00:00], [u'SomeColumnName'], [u'AnotherOne']],
           labels=[[0], [0], [0]])

In [28]: result.columns = key

In [29]: result
Out[29]: 
       2014-08-01
   SomeColumnName
       AnotherOne
1           34204
2           43580
3           84329
5           23485

In [30]: result.unstack()
Out[30]: 
2014-08-01  SomeColumnName  AnotherOne  1    34204
                                        2    43580
                                        3    84329
                                        5    23485
dtype: int64
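
For anyone landing here later, a minimal self-contained sketch of the workaround above. It assumes a pandas version where `Series.to_frame` is available; the variable names follow the original report.

```python
import datetime as dt
import pandas as pd

key = pd.MultiIndex.from_tuples(
    [(dt.datetime(2014, 8, 1), "SomeColumnName", "AnotherOne")]
)
ts = pd.Series({"1": 34204, "2": 43580, "3": 84329, "5": 23485})

# to_frame() on an unnamed Series yields a single column labelled 0.
result = ts.to_frame()
default_columns = list(result.columns)

# Assign the MultiIndex afterwards instead of passing columns= to the constructor.
result.columns = key
print(result)
```

Setting `result.columns` relabels the existing column rather than reindexing, so the data is kept.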

@jreback
Contributor

jreback commented Aug 15, 2014

@TomAugspurger not a bug, but a usage issue.

@TomAugspurger
Contributor

yeah just read your comment.

@TomAugspurger
Contributor

@ldkge's problem was with stack though. Not sure why

In [67]: result.stack(0)
Out[67]: 
              SomeColumnName
                  AnotherOne
1 2014-08-01           34204
2 2014-08-01           43580
3 2014-08-01           84329
5 2014-08-01           23485

In [68]: result.stack(1)
Out[68]: 
Empty DataFrame
Columns: [(2014-08-01 00:00:00, AnotherOne)]
Index: []

would be different.

@jreback
Contributor

jreback commented Aug 15, 2014

This works in master (recently added feature).

In [54]: result.stack([0,1])
Out[54]: 
                             AnotherOne
1 2014-08-01 SomeColumnName       34204
2 2014-08-01 SomeColumnName       43580
3 2014-08-01 SomeColumnName       84329
5 2014-08-01 SomeColumnName       23485

I am not sure what stack(1) would/should actually do.

What would you expect?

@TomAugspurger
Contributor

I thought it should shift level 1 of the columns' MultiIndex down to the row labels, so the expected output would be

>>> df.stack(1)
                  2014-08-01
                  AnotherOne
1 SomeColumnName           34204
2 SomeColumnName           43580
3 SomeColumnName           84329
5 SomeColumnName           23485
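
A short sketch for checking this expectation, assuming a pandas version that already includes the fix for this issue. The exact header layout may differ between versions, so it only inspects the row count and the new inner row level.

```python
import datetime as dt
import pandas as pd

key = pd.MultiIndex.from_tuples(
    [(dt.datetime(2014, 8, 1), "SomeColumnName", "AnotherOne")]
)
df = pd.Series({"1": 34204, "2": 43580, "3": 84329, "5": 23485}).to_frame()
df.columns = key

# Move level 1 of the column MultiIndex ('SomeColumnName') into the row index.
stacked = df.stack(1)
inner_labels = set(stacked.index.get_level_values(-1))
print(stacked)
```

If the level is stacked correctly, the frame keeps its four rows and the remaining columns hold the timestamp and 'AnotherOne' levels.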

@jreback
Contributor

jreback commented Aug 15, 2014

cc @onesandzeroes what do you think?

@jreback jreback reopened this Aug 15, 2014
@jreback
Contributor

jreback commented Aug 15, 2014

ok, I think I agree this could be a bug

@TomAugspurger
Contributor

I'll submit a PR once I figure out what's wrong.

@TomAugspurger
Contributor

@jreback it has to do with how the MultiIndex is storing the timestamp.

Any idea offhand why with

In [6]: idx = pd.MultiIndex.from_tuples([(pd.datetime(2014, 1, 1), 'A', 'B')])

these two aren't equal?

In [10]: idx.values[0][0]
Out[10]: Timestamp('2014-01-01 00:00:00')

In [8]: idx.levels[0].values
Out[8]: array(['2013-12-31T18:00:00.000000000-0600'], dtype='datetime64[ns]')

edit:

or even clearer, why isn't

In [34]: idx.levels[0].values[0]
Out[34]: numpy.datetime64('2013-12-31T18:00:00.000000000-0600')

equal to

In [33]: idx.levels[0][0]
Out[33]: Timestamp('2014-01-01 00:00:00')

I'm going to go digging in index.py
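
A sketch for comparing the two values directly, assuming a recent pandas and NumPy. It uses `datetime.datetime` because `pd.datetime` is not available in newer pandas releases. The `-0600` offset in the output above may simply be an older NumPy rendering `datetime64` in local time rather than a different instant, but that is an assumption about the environment.

```python
import datetime as dt
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([(dt.datetime(2014, 1, 1), "A", "B")])

raw = idx.levels[0].values[0]   # element of the underlying NumPy array
boxed = idx.levels[0][0]        # element boxed by the DatetimeIndex

print(type(raw), type(boxed))

# Wrapping the raw value in a Timestamp makes the comparison explicit.
same_instant = pd.Timestamp(raw) == boxed
print(same_instant)
```

The two objects have different types (`numpy.datetime64` vs `Timestamp`), which matters wherever they are used as dict keys or column labels.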

@jreback
Contributor

jreback commented Aug 15, 2014

.values on an index returns the underlying data (it's a DatetimeIndex).

where is this type of comparison?

@TomAugspurger
Contributor

(I think) they're compared when constructing the new dataframe in core/reshape.py(661)_stack_multi_columns

ipdb> new_data
{(numpy.datetime64('2013-12-31T18:00:00.000000000-0600'), 'B'): array([1, 2, 3, 4])}
ipdb> new_columns
MultiIndex(levels=[[2014-01-01 00:00:00], ['B']],
           labels=[[0], [0]])

ipdb> result = DataFrame(new_data, index=new_index, columns=new_columns)
ipdb> result
    2014-01-01
             B
0 C        NaN
1 C        NaN
2 C        NaN
3 C        NaN

I'll see why new_data is a dict instead of an array.

@jreback
Contributor

jreback commented Aug 15, 2014

I can't see exactly where you are pointing to...

levels should be using .equals for comparisons.....an Index method, so maybe need to wrap them
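
A rough sketch of what wrapping could look like. This is only an illustration of `Index.equals` on re-wrapped level values, not the actual change made in `reshape.py`; the names `level` and `rewrapped` are hypothetical.

```python
import datetime as dt
import pandas as pd

idx = pd.MultiIndex.from_tuples([(dt.datetime(2014, 1, 1), "A", "B")])
level = idx.levels[0]

# Re-wrap the raw datetime64 values in an Index before comparing,
# so the comparison goes through Index.equals instead of element identity.
rewrapped = pd.Index(level.values)
levels_match = rewrapped.equals(level)
print(levels_match)
```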

@onesandzeroes
Contributor

@jreback I agree with TomAugspurger about what the expected behaviour of df.stack(1) should be, so if that's not happening at the moment I think it's a bug.
