Unstacked data in unexpected order #6565

Closed
sniemi opened this issue Mar 6, 2014 · 1 comment
Labels: Categorical (Categorical Data Type), Duplicate Report (Duplicate issue or pull request)

Comments


sniemi commented Mar 6, 2014

After unstacking data that were grouped with groupby, the rows and columns appear in an unexpected order. The grouped data are in ascending order, but after unstacking this is no longer true.

The following code demonstrates:
import numpy as np
import pandas as pd

print(pd.__version__)

# random data
data = pd.DataFrame()
data['a'] = np.random.random(1000)*10. - 2
data['b'] = np.random.random(1000)*10. - 2
data['c'] = np.random.random(1000)*10. - 2

# bins in a and b
ba = np.linspace(-5., 5, 8)
bb = np.linspace(-5., 5, 9)

# groupby bins
groups = data.groupby([pd.cut(data['a'], ba), pd.cut(data['b'], bb)])

# mask, needs more than 10 samples per bin to get a robust std estimate
msk = groups.count()['c'] > 10

# std(c) per bin, these are ordered as expected
stdc = groups['c'].agg([np.std, len])[msk]
print(stdc)

# now unstack
result = stdc['std'].unstack()

# and the results are not ordered as expected
print(result)

and the result is:
b                  (-1.25, 0]  (-2.5, -1.25]  (0, 1.25]  (1.25, 2.5]
a
(-0.714, 0.714]      2.878018            NaN   3.019754     2.908735
(-2.143, -0.714]     2.490683            NaN   2.809243     2.754959
(0.714, 2.143]       2.656060       2.940749   2.421975     2.677205
(2.143, 3.571]       3.128538       2.879978   3.038488     3.191041
(3.571, 5]           2.534340       2.943076   3.215626     2.687091

b                  (2.5, 3.75]  (3.75, 5]
a
(-0.714, 0.714]       3.328777   3.305938
(-2.143, -0.714]      2.716552   3.090601
(0.714, 2.143]        3.093307   1.669262
(2.143, 3.571]        2.885055   3.295693
(3.571, 5]            3.333355   2.972684

[5 rows x 6 columns]

This seems to happen if the bins cross zero.

jreback (Contributor) commented Mar 6, 2014

this is a duplicate of #5314

ATM it's not easy to sort categoricals; these are being treated as 'strings' right now
because categorical support at the frame level is not fully there yet.
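
To illustrate the 'strings' point, here is a minimal sketch in plain Python (no pandas involved). It assumes the bin labels behave like ordinary strings when sorted; the variable names are made up for this example, and the labels are copied from the output reported in the issue.

```python
# Bin labels in ascending numeric order, copied from the reported output.
rows_numeric = [
    "(-2.143, -0.714]",
    "(-0.714, 0.714]",
    "(0.714, 2.143]",
    "(2.143, 3.571]",
    "(3.571, 5]",
]
cols_numeric = [
    "(-2.5, -1.25]",
    "(-1.25, 0]",
    "(0, 1.25]",
    "(1.25, 2.5]",
    "(2.5, 3.75]",
    "(3.75, 5]",
]

# String sorting compares character by character, so "(-0..." sorts
# before "(-2..." and "(-1..." sorts before "(-2...".
rows_as_strings = sorted(rows_numeric)
cols_as_strings = sorted(cols_numeric)

for label in rows_as_strings:
    print(label)
for label in cols_as_strings:
    print(label)
```

The string-sorted lists come out in the same order as the rows and columns of the unstacked frame, which would also explain why only bins with negative edges (those crossing zero) look misplaced.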

stay tuned and/or pls contribute to this effort!
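
Until that lands, one possible workaround is to reorder the labels by their numeric left edge. The sketch below is an assumption, not something proposed in this thread: `left_edge` is a hypothetical helper, and it only handles labels in the "(a, b]" form shown in the output.

```python
def left_edge(label):
    """Return the numeric left edge of a label such as '(-2.143, -0.714]'."""
    return float(label.strip("(]").split(",")[0])


# Labels in the order they appear after unstacking (copied from the issue).
row_labels = [
    "(-0.714, 0.714]",
    "(-2.143, -0.714]",
    "(0.714, 2.143]",
    "(2.143, 3.571]",
    "(3.571, 5]",
]
col_labels = [
    "(-1.25, 0]",
    "(-2.5, -1.25]",
    "(0, 1.25]",
    "(1.25, 2.5]",
    "(2.5, 3.75]",
    "(3.75, 5]",
]

# Sort by the parsed number instead of by the raw string.
rows_sorted = sorted(row_labels, key=left_edge)
cols_sorted = sorted(col_labels, key=left_edge)

print(rows_sorted)
print(cols_sorted)
```

If the index and column labels of `result` really are plain strings, the sorted lists could then be passed to `result.reindex(index=rows_sorted, columns=cols_sorted)`; that step is not run here and depends on the pandas version in use.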

jreback closed this as completed Mar 6, 2014