API/ENH: df.stack(level=[0,1], dropna=False) behavior #8851

Closed
seth-p opened this issue Nov 18, 2014 · 3 comments
Labels
API Design, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@seth-p
Contributor

seth-p commented Nov 18, 2014

df.stack(level=[0,1], dropna=False) appears to be equivalent to df.stack(level=0, dropna=False).stack(level=0, dropna=False). While this makes a certain amount of sense, it results in additional rows that in many cases are probably not expected/desired. In particular, df.stack(level=[0,1], dropna=False) is not equivalent to df.stack(level=[0,1], dropna=True) even when df contains no missing values, which seems counterintuitive.

I think that when stacking multiple levels, one may want them stacked in one go, rather than sequentially -- so that when there are no missing values, df.stack(level=[0,1], dropna=False) would produce the same result as df.stack(level=[0,1], dropna=True).

Here is an example of current behavior. Since df has no missing values, I would want [6] to produce the same results as [5], not [7].

Python 3.4.2 (v3.4.2:ab2c023a9432, Oct  6 2014, 22:16:31) [MSC v.1600 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 2.3.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.zeros((2,3)), columns=pd.MultiIndex.from_tuples([('A','x'), ('A','y'), ('B','z')], names=['Upper', 'Lower']))

In [4]: df
Out[4]:
Upper  A     B
Lower  x  y  z
0      0  0  0
1      0  0  0

In [5]: df.stack(level=[0,1], dropna=True)
Out[5]:
   Upper  Lower
0  A      x        0
          y        0
   B      z        0
1  A      x        0
          y        0
   B      z        0
dtype: float64

In [6]: df.stack(level=[0,1], dropna=False)
Out[6]:
   Upper  Lower
0  A      x         0
          y         0
          z       NaN
   B      x       NaN
          y       NaN
          z         0
1  A      x         0
          y         0
          z       NaN
   B      x       NaN
          y       NaN
          z         0
dtype: float64

In [7]: df.stack(level=0, dropna=False).stack(level=0, dropna=False)
Out[7]:
   Upper  Lower
0  A      x         0
          y         0
          z       NaN
   B      x       NaN
          y       NaN
          z         0
1  A      x         0
          y         0
          z       NaN
   B      x       NaN
          y       NaN
          z         0
dtype: float64
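
The extra NaN rows in Out[6] and Out[7] appear to correspond to (Upper, Lower) pairings that exist in the cross product of the two column levels but not among the actual columns. A minimal sketch of that bookkeeping, using only the column index from the example (the variable names `full` and `missing` are illustrative, not part of pandas):

```python
import pandas as pd

# Column index from the example above.
columns = pd.MultiIndex.from_tuples(
    [("A", "x"), ("A", "y"), ("B", "z")], names=["Upper", "Lower"]
)

# Every (Upper, Lower) pairing in the cross product of the two levels;
# sequential stacking with dropna=False ends up with one row per pairing.
full = pd.MultiIndex.from_product(columns.levels, names=columns.names)

# Pairings in the cross product that are not actual columns of df.
missing = full.difference(columns)

print(len(columns), len(full))
print(list(missing))
```

Here `full` has six entries against three real columns, and the three entries in `missing` are the pairings shown as NaN for each row in Out[6] and Out[7].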

In [13]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Windows
OS-release: 8
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.1
nose: 1.3.4
Cython: 0.21.1
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.6.0
IPython: 2.3.1
sphinx: None
patsy: 0.3.0
dateutil: 2.2
pytz: 2014.9
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.6.3
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None
@seth-p
Contributor Author

seth-p commented Nov 20, 2014

Perhaps specifying a list (or tuple) of levels should produce the current behavior, but specifying a set of levels should result in my proposed behavior? This seems somewhat logical, and would be backwards-compatible.

@seth-p changed the title from "API/BUG: df.stack(level=[0,1], dropna=False) behavior" to "API/ENH: df.stack(level=[0,1], dropna=False) behavior" Dec 5, 2014
@jreback added the API Design and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Dec 6, 2014
@jreback added this to the 0.16.0 milestone Dec 6, 2014
@seth-p
Contributor Author

seth-p commented Feb 3, 2015

In #9023 I added support for (a) level=Index.ALL_LEVELS, to easily specify that all levels should be stacked; and (b) a new boolean sequentially argument (default True, for backwards compatibility) specifying whether multiple levels should be stacked sequentially or simultaneously.
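
To make "simultaneously" concrete, here is a rough sketch of stacking all column levels in one step. It is not the code from #9023; `stack_simultaneously` is a hypothetical standalone helper that assumes a single-level row index, and it builds the result directly rather than calling `DataFrame.stack`. Missing values already present in the data are kept, but no new (Upper, Lower) pairings are introduced.

```python
import numpy as np
import pandas as pd


def stack_simultaneously(df):
    """Hypothetical helper: stack every column level at once.

    Assumes df has a single-level row index. Only column tuples that
    actually exist in df.columns appear in the result.
    """
    # One index entry per (row label, existing column tuple), in row-major order.
    index = pd.MultiIndex.from_tuples(
        [(row, *col) for row in df.index for col in df.columns],
        names=[df.index.name, *df.columns.names],
    )
    # ravel() is row-major by default, which matches the index order above.
    return pd.Series(df.to_numpy().ravel(), index=index)


df = pd.DataFrame(
    np.zeros((2, 3)),
    columns=pd.MultiIndex.from_tuples(
        [("A", "x"), ("A", "y"), ("B", "z")], names=["Upper", "Lower"]
    ),
)

result = stack_simultaneously(df)
print(result)
```

For the example frame this yields six rows, the same shape as Out[5], regardless of whether the data contains missing values.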

@jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@wesm added the Won't Fix label Jul 6, 2018
@wesm
Member

wesm commented Jul 6, 2018

Closing as won't fix. The community is welcome to reopen and revisit later.

3 participants