df.stack(level=[0,1], dropna=False) appears to be equivalent to df.stack(level=0, dropna=False).stack(level=0, dropna=False). While this makes a certain amount of sense, it results in additional rows that in many cases are probably not expected/desired. In particular, df.stack(level=[0,1], dropna=False) is not equivalent to df.stack(level=[0,1], dropna=True) even when df contains no missing values, which seems counterintuitive.
I think that when stacking multiple levels, one may want them stacked in one go rather than sequentially, so that when there are no missing values, df.stack(level=[0,1], dropna=False) would produce the same result as df.stack(level=[0,1], dropna=True).
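A minimal sketch of the "in one go" semantics for the case where every column level is stacked, assuming a single-level row index and MultiIndex columns. `stack_all_levels` is a hypothetical helper, not a pandas API: it emits one row per (row, column) pair that actually exists in `df`, so absent pairings such as ('A', 'z') never appear as NaN rows.

```python
import numpy as np
import pandas as pd

def stack_all_levels(df):
    """Hypothetical helper (not a pandas API): stack every column level of
    `df` at once. Assumes a single-level row index and MultiIndex columns.
    Only (row, column) pairs present in `df` become rows."""
    tuples = [(row,) + col for row in df.index for col in df.columns]
    names = [df.index.name] + list(df.columns.names)
    index = pd.MultiIndex.from_tuples(tuples, names=names)
    # ravel() is row-major, which matches the (row, column) order of `tuples`.
    return pd.Series(df.to_numpy().ravel(), index=index)

df = pd.DataFrame(
    np.zeros((2, 3)),
    columns=pd.MultiIndex.from_tuples(
        [("A", "x"), ("A", "y"), ("B", "z")], names=["Upper", "Lower"]
    ),
)
s = stack_all_levels(df)
print(s)  # six rows, no NaN
```

With no missing values in `df`, this gives the same six entries as the dropna=True result, regardless of any dropna setting, which is the behaviour requested here.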
Here is an example of current behavior. Since df has no missing values, I would want [6] to produce the same results as [5], not [7].
Python 3.4.2 (v3.4.2:ab2c023a9432, Oct 6 2014, 22:16:31) [MSC v.1600 64 bit (AMD64)]
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.zeros((2,3)), columns=pd.MultiIndex.from_tuples([('A','x'), ('A','y'), ('B','z')], names=['Upper', 'Lower']))
In [4]: df
Out[4]:
Upper  A     B
Lower  x  y  z
0      0  0  0
1      0  0  0
In [5]: df.stack(level=[0,1], dropna=True)
Out[5]:
   Upper  Lower
0  A      x        0
          y        0
   B      z        0
1  A      x        0
          y        0
   B      z        0
dtype: float64
In [6]: df.stack(level=[0,1], dropna=False)
Out[6]:
   Upper  Lower
0  A      x          0
          y          0
          z        NaN
   B      x        NaN
          y        NaN
          z          0
1  A      x          0
          y          0
          z        NaN
   B      x        NaN
          y        NaN
          z          0
dtype: float64
In [7]: df.stack(level=0, dropna=False).stack(level=0, dropna=False)
Out[7]:
   Upper  Lower
0  A      x          0
          y          0
          z        NaN
   B      x        NaN
          y        NaN
          z          0
1  A      x          0
          y          0
          z        NaN
   B      x        NaN
          y        NaN
          z          0
dtype: float64
In [13]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Windows
OS-release: 8
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.15.1
nose: 1.3.4
Cython: 0.21.1
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.6.0
IPython: 2.3.1
sphinx: None
patsy: 0.3.0
dateutil: 2.2
pytz: 2014.9
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.6.3
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None
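In this example, Out[6] and Out[7] contain one row for every pairing of an Upper value with a Lower value, not only for the pairings that exist as columns. The short count below (plain Python over the example frame, no stacking involved) shows where the extra rows come from: 2 rows × 3 existing columns gives the 6 rows of Out[5], while 2 rows × (2 × 3) pairings gives the 12 rows of Out[6]/Out[7], and the three "phantom" pairings are exactly the NaN rows.

```python
import itertools

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.zeros((2, 3)),
    columns=pd.MultiIndex.from_tuples(
        [("A", "x"), ("A", "y"), ("B", "z")], names=["Upper", "Lower"]
    ),
)

# Distinct values actually used by each column level.
level_values = [
    df.columns.get_level_values(i).unique() for i in range(df.columns.nlevels)
]
all_combos = list(itertools.product(*level_values))  # every Upper x Lower pairing
existing = set(df.columns)                           # pairings that are real columns
phantom = [c for c in all_combos if c not in existing]

print(phantom)  # [('A', 'z'), ('B', 'x'), ('B', 'y')]
print(len(df.index) * len(existing), len(df.index) * len(all_combos))  # 6 12
```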
Perhaps specifying a list (or tuple) of levels should produce the current behavior, but specifying a set of levels should result in my proposed behavior? This seems somewhat logical, and would be backwards-compatible.
seth-p changed the title from "API/BUG: df.stack(level=[0,1], dropna=False) behavior" to "API/ENH: df.stack(level=[0,1], dropna=False) behavior" on Dec 5, 2014.
In #9023 I added support for (a) level=Index.ALL_LEVELS, to easily specify that all levels should be stacked; and (b) a new boolean argument, sequentially (default=True for backwards compatibility), specifying whether multiple levels should be stacked sequentially or simultaneously.
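One way to picture the simultaneous mode (which the set-based proposal in the earlier comment would also select) for an arbitrary subset of column levels: stack every column level into one long Series, so only existing columns produce rows, then unstack the column levels that were not requested. This is a minimal sketch assuming a single-level row index, MultiIndex columns and integer level positions; `stack_simultaneously` is a hypothetical helper, not the code from #9023 and not part of pandas.

```python
import numpy as np
import pandas as pd

def stack_simultaneously(df, levels):
    """Hypothetical helper (not a pandas API and not the #9023 code).

    Stacks the column levels at integer positions `levels` in one step.
    Assumes a single-level row index and MultiIndex columns.
    """
    if not isinstance(df.columns, pd.MultiIndex):
        raise TypeError("expected MultiIndex columns")
    nlev = df.columns.nlevels
    wanted = {lev % nlev for lev in levels}  # allow negative positions
    # Step 1: one entry per existing (row, column) pair; nothing is invented.
    tuples = [(row,) + col for row in df.index for col in df.columns]
    names = [df.index.name] + list(df.columns.names)
    long = pd.Series(df.to_numpy().ravel(),
                     index=pd.MultiIndex.from_tuples(tuples, names=names))
    # Step 2: move the column levels that were not requested back to columns.
    # Position 0 of `long`'s index is the original row label, hence the +1.
    keep = [i + 1 for i in range(nlev) if i not in wanted]
    return long.unstack(level=keep) if keep else long

df = pd.DataFrame(
    np.zeros((2, 3)),
    columns=pd.MultiIndex.from_tuples(
        [("A", "x"), ("A", "y"), ("B", "z")], names=["Upper", "Lower"]
    ),
)
both = stack_simultaneously(df, [0, 1])     # 6 rows, no NaN
upper_only = stack_simultaneously(df, [0])  # 4 rows x 3 columns (x, y, z)
```

Unstacking in step 2 can still produce NaN, but only inside (row, stacked-values) rows that actually occur, e.g. column z of the (0, 'A') row in `upper_only`.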