Unexpected Behavior when Setting partial row in MultiIndex-columned-Dataframe with Series #15310
Comments
Well, you can do this:
Your question is confusing, but I think you want to do this, yes?
Yes, that's right. I want to use a dictionary-like object (such as a Series, or anything with a key:value structure) to safely set some values in a row of df, without worrying about the order of the values or how many of them there are. Sorry it's so confusing.

I think the main issue, or at least a big clue, is that your Series has to have a MultiIndex for this to work:

>>> good_series_1.index  # Ready to set for 'Bob'
MultiIndex(levels=[[u'Bob', u'Jon'], [u'hours', u'sales']],
           labels=[[0, 0], [0, 1]])
>>> good_series_2.index
MultiIndex(levels=[[u'Bob'], [u'hours', u'sales']],
           labels=[[0, 0], [1, 0]])

Otherwise, every way of setting the values results in NaN. So the bug may be the inability to match an Index against a MultiIndex. Right now, the best working syntax I have found is:

>>> s = pd.Series([101.34, 2], index=pd.MultiIndex.from_product([['Bob'], ['sales', 'hours']]))
>>> df[1] = s

But, strangely, if the index is a DatetimeIndex and I want to insert the row at the end, the above syntax sometimes destroys all the other data, and I have to use the super-redundant

>>> df[last_time_in_index, 'Bob'] = s  # Even though s, above, already has 'Bob'

The setting functionality of MultiIndex DataFrames is clearly buggy.
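A possible way around the DatetimeIndex case, sketched here rather than taken from the thread: append the new timestamp explicitly first, then write by explicit (outer, inner) column tuples, so the assignment neither enlarges the frame nor has to align a Series. The daily index, the dates, and the Bob/Jon by hours/sales columns are illustrative assumptions, and this sketch does not claim to explain the data loss reported above.

```python
import pandas as pd

# Illustrative frame (assumed): daily DatetimeIndex, Bob/Jon x hours/sales columns.
cols = pd.MultiIndex.from_product([['Bob', 'Jon'], ['hours', 'sales']])
idx = pd.date_range('2017-01-01', periods=2, freq='D')
df = pd.DataFrame(0.0, index=idx, columns=cols)

# New timestamp one day after the last existing row.
new_ts = idx[-1] + pd.Timedelta(days=1)

# Step 1: append the row explicitly; its cells start out as NaN.
df = df.reindex(df.index.append(pd.DatetimeIndex([new_ts])))

# Step 2: write by explicit (outer, inner) column tuples. Keys and values
# come from the same Series, so their pairing cannot get out of order.
values = pd.Series({'sales': 101.34, 'hours': 2.0})
df.loc[new_ts, [('Bob', k) for k in values.index]] = values.to_numpy()
print(df)
```

The untouched Jon cells of the new row stay NaN; fill them afterwards if a default such as 0 is wanted.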
I recall an issue about this, but it is sometimes hard to find things. OK, in theory this can work, but the code is a bit messy at the moment. You are welcome to dive in and have a look. IIRC it has been this way for a long time.
You also get this strange effect where assigning itself introduces NaNs:
But I think the original example from @joseortiz3 should not work? (So it should not be considered a bug.)
@jorisvandenbossche It might not be considered a bug, but to clarify, I think it should work, because there is no ambiguous way that instruction could be interpreted. Note that in your example, the assignment only works if the thing on the right-hand side has a MultiIndex (even if it doesn't match the index of the thing on the left):

>>> df.loc[0,'Bob'] = df.loc[0,pd.IndexSlice['Bob',:]]+1
>>> df
    Bob         Jon
  hours sales hours sales
0     1     1     0     0
1     0     0     0     0
2     0     0     0     0
>>> df.loc[0,pd.IndexSlice['Bob',:]].index  # RHS index
MultiIndex(levels=[[u'Bob', u'Jon'], [u'hours', u'sales']],
           labels=[[0, 0], [0, 1]])
>>> df.loc[0,'Bob'].index  # LHS index
Index([u'hours', u'sales'], dtype='object')

for assigning to Dataframes with a MultiIndex axis
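The index mismatch above can be made explicit instead of relying on implicit alignment. This is a minimal sketch, not the approach proposed in the thread: drop the outer level of the RHS, reorder it to the LHS labels, and assign a bare array to an explicit list of column tuples. The 3-row float frame is an assumption for illustration.

```python
import pandas as pd

# Illustrative frame (assumed): Bob/Jon x hours/sales, all zeros.
cols = pd.MultiIndex.from_product([['Bob', 'Jon'], ['hours', 'sales']])
df = pd.DataFrame(0.0, index=range(3), columns=cols)

lhs = df.loc[0, 'Bob']                    # inner labels only: hours, sales
rhs = df.loc[0, pd.IndexSlice['Bob', :]]  # full tuples: ('Bob', 'hours'), ...
print(type(lhs.index).__name__, type(rhs.index).__name__)

# Make the labels agree explicitly: drop the RHS outer level, reorder it to
# the LHS labels, then assign a bare array so no implicit alignment happens.
new_vals = (rhs + 1).droplevel(0).reindex(lhs.index)
df.loc[0, [('Bob', k) for k in lhs.index]] = new_vals.to_numpy()
print(df)
```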
Not Working:
Here I demonstrate that creating a pandas Series and using it to safely set values in a DataFrame with MultiIndex columns does not work with the most natural syntax, and that currently the only way to do this is needlessly cumbersome. The motivation for using a Series to set values in the DataFrame is that this method relies on the Series' key:value structure, which lets one ignore the order of the keys, and even whether the keys span the full second level of the DataFrame's column MultiIndex.
Edit: In other words, I want to use a dictionary-like object (such as a Series, or anything with a key:value structure) to safely set some values in a row of df, without worrying about the order of the values or how many of them there are.
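A minimal sketch of that dictionary-style assignment, under the assumption that the keys are inner-level labels belonging to one outer label: `pd.concat({top: series})` lifts the flat keys into full column tuples, which is the same MultiIndex shape the working examples in this thread need. The helper name `set_from_mapping` is hypothetical, not a pandas API.

```python
import pandas as pd

def set_from_mapping(df, row, top, values):
    """Hypothetical helper: write dict-like `values` under outer column label `top`."""
    # Lift flat keys such as 'sales' to full column tuples such as ('Bob', 'sales').
    lifted = pd.concat({top: pd.Series(values, dtype=float)})
    # Assign positionally to the listed columns; keys and values come from the
    # same Series, so they cannot drift out of order.
    df.loc[row, list(lifted.index)] = lifted.to_numpy()
    return df

cols = pd.MultiIndex.from_product([['Bob', 'Jon'], ['hours', 'sales']])
df = pd.DataFrame(0.0, index=range(3), columns=cols)

# Keys in a different order than the columns, and only part of a block.
set_from_mapping(df, 1, 'Bob', {'sales': 101.34, 'hours': 2})
set_from_mapping(df, 2, 'Jon', {'sales': 5})
print(df)
```

Columns not named in the mapping are left untouched, which covers the "how many of them there are" part of the request.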
Code Sample
Problem description
Setting a few values in a sub-row of a DataFrame with MultiIndex columns using a Series should work even when the indices of the DataFrame sub-row and the Series do not match, as long as there is no ambiguous way to interpret the assignment, as in the example.
In general, the only objects that can currently be assigned, by any method, to a sub-row of a MultiIndexed DataFrame are objects that either have no index (like tuples or arrays) or have a MultiIndex.
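Given that restriction, one hedged workaround is to do the label matching yourself and hand pandas only an index-less array. The sketch below copies the flat-indexed sub-row, merges the new values into it with `Series.update` (label-aligned, so order does not matter), rejects keys that do not exist instead of silently dropping them, and writes the result back as a bare array. The helper name `update_sub_row` and the example frame are assumptions for illustration.

```python
import pandas as pd

def update_sub_row(df, row, top, values):
    """Hypothetical helper: overwrite only the given inner keys of df.loc[row, top]."""
    values = pd.Series(values, dtype=float)
    current = df.loc[row, top].copy()   # flat index, e.g. ['hours', 'sales']
    unknown = values.index.difference(current.index)
    if len(unknown) > 0:
        # Refuse keys that would otherwise be silently ignored.
        raise KeyError(f"not in {top!r}: {list(unknown)}")
    current.update(values)              # label-aligned; key order is irrelevant
    # Assign a bare array (no index), so pandas attempts no further alignment.
    df.loc[row, [(top, k) for k in current.index]] = current.to_numpy()
    return df

cols = pd.MultiIndex.from_product([['Bob', 'Jon'], ['hours', 'sales']])
df = pd.DataFrame(0.0, index=range(3), columns=cols)
update_sub_row(df, 1, 'Bob', {'sales': 101.34})
print(df)
```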
Expected Output
If there is another method that works better than this, please tell me. But it seems to me that this should just work.
pandas: 0.19.1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.7.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.1