BUG: Inconsistency in make_axis_dummies (and/or Panel.to_frame()) with categorical index level #14017

Closed
pijucha opened this issue Aug 16, 2016 · 3 comments
Labels: Categorical, Reshaping

Comments

@pijucha
Contributor

pijucha commented Aug 16, 2016

make_axis_dummies gives inconsistent results when an axis contains a CategoricalIndex with unused categories.

Code Sample, a copy-pastable example if possible

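# Runs on pandas 0.18.1 (see the version output below); Panel has since been removed.
import pandas as pd
from pandas.core.reshape import make_axis_dummies

# category `z` is declared but not used
cidx = pd.CategoricalIndex(list("xy"), categories=list("xyz"))
df = pd.DataFrame([[10, 11]], columns=cidx)
# to_frame() stacks the panel into a frame with a (major, minor) MultiIndex
ldf = pd.Panel({'A': df, 'B': df}).to_frame()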

make_axis_dummies(ldf)
Out[9]: 
minor          x    y
major minor          
0     x      1.0  0.0
      y      0.0  1.0

make_axis_dummies(ldf, transform=lambda x: x)
Out[10]: 
               x    y    z
major minor               
0     x      1.0  0.0  0.0
      y      0.0  1.0  0.0

Expected Output

I believe make_axis_dummies(ldf) and make_axis_dummies(ldf, transform=lambda x: x) should be equal.
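
For reference, a minimal sketch of the alignment being asked for, written without Panel or make_axis_dummies so that it runs on current pandas. It assumes that reindexing the dummy columns against the declared categories is an acceptable stand-in for the expected behaviour; the variable names (`observed_only`, `all_categories`) are illustrative and not part of pandas.

```python
import pandas as pd

# Same index as in the report: category "z" is declared but never used.
cidx = pd.CategoricalIndex(list("xy"), categories=list("xyz"), name="minor")

# Dummies built from the observed labels only (plain Python strings),
# which mirrors the two-column shape of Out[9].
observed_only = pd.get_dummies(pd.Series(list(cidx))).astype(float)

# Reindex the columns against every declared category so that the unused
# category "z" gets an all-zero column, mirroring the shape of Out[10].
all_categories = observed_only.reindex(
    columns=list(cidx.categories), fill_value=0.0
)

print(all_categories)
```

Reindexing against `cidx.categories` keeps the column set stable regardless of which labels happen to occur in the data, which is the consistency the report asks for.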

output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 5d791cc7d955c0b074ad602eb03fa32bd3e17503
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.20-1
machine: x86_64
processor: Intel(R)_Core(TM)_i5-2520M_CPU_@_2.50GHz
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.18.1+368.g5d791cc
nose: 1.3.7
pip: 8.1.2
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
...

In fact, this may be an issue with Panel.to_frame() rather than make_axis_dummies:

ldf.index.levels[1]
Out[13]: CategoricalIndex(['x', 'y'], categories=['x', 'y', 'z'], ordered=False, name='minor', dtype='category')

I'd expect this level to contain all categories: CategoricalIndex(['x', 'y', 'z'], categories=['x', 'y', 'z'], ...), even if 'z' is not used. If it did, both outputs of make_axis_dummies would match Out[10].
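
The distinction between the values of a categorical index and its declared categories can be checked directly. The snippet below is a small sketch on a standalone CategoricalIndex (not on the MultiIndex level produced by Panel.to_frame(), which cannot be rebuilt on current pandas); the names `declared`, `observed`, `unused` and `trimmed` are only for illustration.

```python
import pandas as pd

cidx = pd.CategoricalIndex(list("xy"), categories=list("xyz"), name="minor")

declared = list(cidx.categories)  # every category the dtype allows
observed = list(cidx)             # labels that actually occur in the index
unused = [c for c in declared if c not in observed]

# Dropping unused categories narrows the dtype to the observed labels.
trimmed = cidx.remove_unused_categories()

print(declared, observed, unused, list(trimmed.categories))
```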

(Somewhat related to #13854.)

jreback added the Reshaping, Categorical, and Difficulty Intermediate labels on Aug 17, 2016
jreback added this to the Next Major Release milestone on Aug 17, 2016
jreback changed the title from "Inconsistency in make_axis_dummies (and/or Panel.to_frame()) with categorical index level" to "BUG: Inconsistency in make_axis_dummies (and/or Panel.to_frame()) with categorical index level" on Aug 17, 2016
@Aylr

Aylr commented Oct 31, 2017

@jreback have you seen any progress or had further thoughts on this issue? I agree with @pijucha that the high-level .get_dummies() API should create dummy columns from the categories themselves, which may include categories not represented in the data, rather than only from the categories that appear in the data.
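
As a point of comparison, a small sketch of what the public pd.get_dummies does with a categorical Series. On the pandas versions I am aware of, it appears to emit one column per declared category, used or not. Treat this as an observation to verify on your own version; it says nothing about make_axis_dummies itself.

```python
import pandas as pd

# Categorical data where "z" is a declared but unused category.
s = pd.Series(pd.Categorical(list("xy"), categories=list("xyz")))

dummies = pd.get_dummies(s)

print(list(dummies.columns))     # column labels of the dummy frame
print(int(dummies["z"].sum()))   # number of rows flagged as "z"
```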

I'm happy to work on a PR if the maintainers agree.

@jreback
Contributor

jreback commented Oct 31, 2017

@Aylr pls be my guest. progress would be posted here (if there were any).

@mroeschke
Member

make_axis_dummies is essentially a private function and Panel has been removed. Not sure if there is anything further to address in this issue. Feel free to reopen if so.
