Adding support for indexing a MultiIndex with a DataFrame and/or bi-dimensional np.array #15438
Comments
Can you show a copy-paste example that creates the actual frame you are talking about? A generic, simple version is fine; just give generic names to levels and such.
s = pd.Series(range(4),
              index=pd.MultiIndex.from_product([[1, 2], ['a', 'b']],
                                               names=['first', 'second']))
df = pd.DataFrame([[1, 'a'], [2, 'b']],
                  columns=['first', 'second'])

What I would like to do:

s.loc[df]

What I end up doing:

to_mi = lambda df: df.set_index(list(df.columns)).index
s.loc[to_mi(df)]

which indeed outputs:

first  second
1      a         0
2      b         3
dtype: int64

However, only now do I realize that column names do not matter above. That is: when you index a

idf = pd.MultiIndex.from_tuples([[1, 'a'], [2, 'b']], names=['other', 'names'])
s.loc[idf]

which outputs

other  names
1      a        0
2      b        3
dtype: int64

So either this current behavior is undesired, or the answer to @jorisvandenbossche's question is trivial: "we don't respect names for
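For reference, the two possible answers to the name-alignment question can be sketched roughly as follows. This is only an illustration, not a proposed implementation: `select_by_frame` and its `align_names` flag are hypothetical names, and the lookup goes through a plain list of tuples rather than any new indexing code path.

```python
import pandas as pd

s = pd.Series(
    range(4),
    index=pd.MultiIndex.from_product([[1, 2], ["a", "b"]],
                                     names=["first", "second"]),
)


def select_by_frame(series, frame, align_names=True):
    """Hypothetical helper: index a MultiIndex-ed Series with a DataFrame."""
    if align_names:
        # "Yes" answer: reorder the columns so they match the level names.
        frame = frame[list(series.index.names)]
    # "No" answer (and the final step of "Yes"): one tuple per row,
    # matched against the levels purely by position.
    keys = list(frame.itertuples(index=False, name=None))
    return series.loc[keys]


df = pd.DataFrame([[1, "a"], [2, "b"]], columns=["first", "second"])
swapped = df[["second", "first"]]  # same data, columns in the other order

by_name = select_by_frame(s, swapped, align_names=True)
by_position = select_by_frame(s, df, align_names=False)
```

With `align_names=True` the column order of the indexer stops mattering; with `align_names=False` the column names are ignored entirely, which mirrors how the `MultiIndex` indexer above behaves.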
Oh, by the way,

In [2]: s = pd.Series(range(4), index=pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']]))
In [3]: s.loc[pd.DataFrame([['a', 'c'], ['b', 'c']]).values] = [100, 101]
Exception ignored in: 'pandas._libs.lib.is_bool_array'
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
In [4]: s
Out[4]:
a  c    100
   d      1
b  c    101
   d      3
dtype: int64

I don't exactly understand what's going on with the "Exception ignored", but setter works with 2-d
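As a possible way to get the same assignment without handing a 2-d array to `.loc` directly, each row can be turned into a tuple first. This is only a sketch of a workaround under that assumption; it says nothing about where the "Exception ignored" message comes from. `keys_2d` and `keys` are illustrative names.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(4),
              index=pd.MultiIndex.from_product([["a", "b"], ["c", "d"]]))

# A 2-d array where each row is one full key of the MultiIndex.
keys_2d = np.array([["a", "c"], ["b", "c"]])

# Convert each row into a tuple of plain Python strings, so that .loc
# receives a 1-d list of keys instead of a 2-d array.
keys = [tuple(row) for row in keys_2d.tolist()]

s.loc[keys] = [100, 101]
```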
xref #12550
(From #15425)

Currently, (non-Multi)Indexes can be indexed with Series indexers. This actually also applies to MultiIndexes, in which case you select from the first level. Hence, it seems a natural consequence for MultiIndexes to be indexable with DataFrame indexers.

Moreover, once #15434 is fixed, we will have a bi-dimensional object (MultiIndex) which can be indexed with np.arrays... but only one-dimensional ones! This is also strange.

The feature per se is certainly useful. As a simple real-world example, I am currently working with a subjects DataFrame to which I must attribute two columns from design, another DataFrame, depending on the group and time columns of subjects, which are also levels of the MultiIndex of design. I would like to just do

Now, I know this could be solved by .joining the two DataFrames... but this is conceptually more complicated (I don't even know at the moment whether I can join one DataFrame on columns and the other on index levels... but this is OT), to the point that I'm rather doing:

@jorisvandenbossche suggests this feature would add complexity to indexing, "eg, should the column names align on the level names?". I'm personally fine with both answers:

to_mi above (transforming a DataFrame into a MultiIndex, and then using it to actually index)

DataFrame into tuples - I had actually already done this in Mi indexing #15425 before rolling back)

"Yes" is probably the cleanest answer (possibly together with allowing indexing with bi-dimensional np.arrays, to obtain the equivalent of the "No" answer). In any case, once we decide, I can take care of this.