BUG - sparse dataframes lose multi-index column names #11600

Closed
Ezekiel-Kruglick opened this issue Nov 14, 2015 · 8 comments
@Ezekiel-Kruglick

From SO: http://stackoverflow.com/questions/33702198/do-python-pandas-sparse-dataframes-lose-multi-index-column-names-or-am-i-doing-i

The bug is simple in concept: a dataframe whose columns are a multi-index with level names loses those names when it is converted to a sparse dataframe.

Minimal example - first create a multi-index dataframe:

In[2]: import pandas as pd
In[3]: miindex = pd.MultiIndex.from_product([["x","y"], ["10","20"]],names=['row-foo', 'row-bar'])
micol = pd.MultiIndex.from_product([['a','b','c'], ["1","2"]],names=['col-foo', 'col-bar'])
df = pd.DataFrame(index=miindex, columns=micol).sortlevel().sortlevel(axis=1)
df = df.fillna(value=3.14)
df
Out[3]: 
col-foo             a           b           c      
col-bar             1     2     1     2     1     2
row-foo row-bar                                    
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14

This gives us a nice test dataframe with both column and row level names. Now if I make a sparse dataframe out of that and show it, the column level names are gone, while the row level names survive.

In[4]: ds = df.to_sparse()
ds
Out[4]: 
                    a           b           c      
                    1     2     1     2     1     2
row-foo row-bar                                    
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14

And if I convert the sparse version back to dense, the column level names are still gone.

In[6]: ds.to_dense()
Out[6]: 
                    a           b           c      
                    1     2     1     2     1     2
row-foo row-bar                                    
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14

I AM aware that displaying the sparse version calls to_dense(), but the loss appears to happen at the conversion to sparse. I'm exploring a move to sparse to reduce memory usage for a code base, and my attempts to access the column levels by name within the sparse dataframe raise "KeyError: 'Level not found'".
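
A possible stopgap, sketched under the assumption that only the level names are dropped while the column labels themselves survive the conversion: copy the names back from the original frame. `restore_column_names` is a hypothetical helper, not a pandas function, and the `stripped` frame below only simulates the lossy conversion by clearing the names on a copy, so the snippet does not depend on `to_sparse()` being available.

```python
import pandas as pd


def restore_column_names(original, converted):
    """Copy column level names from `original` onto `converted`.

    Hypothetical helper for illustration; assumes both frames have the
    same number of column levels.
    """
    converted.columns = converted.columns.set_names(list(original.columns.names))
    return converted


miindex = pd.MultiIndex.from_product(
    [["x", "y"], ["10", "20"]], names=["row-foo", "row-bar"]
)
micol = pd.MultiIndex.from_product(
    [["a", "b", "c"], ["1", "2"]], names=["col-foo", "col-bar"]
)
df = pd.DataFrame(3.14, index=miindex, columns=micol)

# Stand-in for the lossy conversion: a copy whose column level names are cleared.
stripped = df.copy()
stripped.columns = stripped.columns.set_names([None, None])

# Put the names back, then look a level up by name again.
fixed = restore_column_names(df, stripped)
level_values = fixed.columns.get_level_values("col-foo")
print(list(fixed.columns.names))
print(list(level_values))
```

In the scenario from this report, `stripped` would be replaced by the result of the sparse conversion. Whether the restored names then persist through later operations on the sparse frame is not something this sketch verifies.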

@jreback added the Bug, Sparse (Sparse Data Type), and MultiIndex labels Nov 14, 2015
@jreback added this to the Next Major Release milestone Nov 14, 2015
@jreback
Contributor

jreback commented Nov 14, 2015

sparse has not gotten a lot of love, so pull-requests are welcome.

@Ezekiel-Kruglick
Author

I've been looking for a way to get involved with pandas and contribute; maybe this will be my start. Although once I get set up, I see some other stuff on the list that looks less challenging :)

@jreback
Contributor

jreback commented Nov 14, 2015

that would be great!

there are a bunch of sparse issues which are on the easier side as well (though I don't think this is terribly involved)

lmk when u need help

@jreback
Contributor

jreback commented Nov 14, 2015

@Ezekiel-Kruglick
Author

Yup, I already have it forked and cloned to my desktop and am exploring how the code connects. That page was quite useful!

@Ezekiel-Kruglick
Author

Okay, I think I have it fixed. The test code is longer than the fix code by far.

It passes the most obvious tests. I'm going to run the whole suite of tests, then will get you a pull request to review.

@Ezekiel-Kruglick
Author

OK, pull request submitted. Ran all nosetests and got OK (SKIP=116) on 9172 tests. If this gets integrated I'll post a short update as an answer on SO.

@jreback
Contributor

jreback commented Nov 19, 2015

closed by #11606

@jreback closed this as completed Nov 19, 2015
2 participants