Merge on single level of MultiIndex #3662
Comments
is |
No, I don't think so...
Hoped for:
|
yes this would most def be useful IMHO |
@hayd I think that df1.join(df2, how='inner') actually should do what you want. Right now it doesn't do this and in fact OT: should the empty ndframe be a singleton? i haven't really checked, but these objects return a different id every time they are instantiated... |
Down in the guts of this there's a function called |
from some comments on the mailing list: https://groups.google.com/forum/#!topic/pydata/LBSFq6of8ao What would help move this along would be:
Would love for someone to attempt 2), as then we can have a speed/memory benchmark and see whether 3) is even worthwhile (I don't know how much gain this would really have, so I'm not sure how much effort it needs). |
I'm not quite sure I understand the difference between 2) and 3) - is this just the underlying implementation or do you mean something else by "an actual indexing merge"? As for 2), are you saying that all that needs to be done would be to hide the SO solution from the user and allow specifying the index? Test cases: In principle this should amount to re-using test cases for joins with regular columns, no? I had a very brief look into that but couldn't find much -- any pointers? Looking at the expected behavior above, I am not sure I'd quite agree (at least for outer joins, which seem to be the more natural use case to me) -- I would expect a MultiIndex on the resulting dataframe with the index being the union of all index constituents of the two original tables. |
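For readers following along, here is a minimal sketch of the column-based workaround discussed above, assuming it amounts to the usual `reset_index` / `merge` / `set_index` round trip. The frames `df1` and `df2`, the level names `key`, `n`, `m`, and the data are all made up for illustration; they are not the examples originally posted in this thread. With an outer merge, the result carries a MultiIndex made of the union of the levels of both inputs.

```python
import pandas as pd

# Hypothetical frames: both have a MultiIndex, sharing only the level "key".
df1 = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0]},
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1)], names=["key", "n"]
    ),
)
df2 = pd.DataFrame(
    {"y": [10.0, 20.0]},
    index=pd.MultiIndex.from_tuples(
        [("a", "p"), ("c", "q")], names=["key", "m"]
    ),
)

# Turn the index levels into regular columns, merge on the shared one,
# then rebuild a MultiIndex from the union of all levels.
merged = (
    pd.merge(df1.reset_index(), df2.reset_index(), on="key", how="outer")
    .set_index(["key", "n", "m"])
)

print(merged)
```

Rows that exist on only one side get missing values in the other side's levels and columns, which is the cost of doing this through regular columns rather than through the index directly.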
Maybe give me a small example of what you want to do, e.g. input and output (e.g. the normal use case) Here's a number of issues relating to this (and why we need specific test cases):
|
Here comes an example, see these slides for some context (p. 19+).
(So I would argue this is not handled) I would expect:
I would be fine requiring the user to name the levels consistently. Else you can always fall back on the existing solution based on regular columns. |
@hmgaudecker give #6356 a try. I need some more test cases, for example joining on non-level 0. I also need to see if it's necessary (as I think it's pretty complicated) to join 2 multi-indexes. This joins a single to a multi on an inferred level. |
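A small sketch of what joining a single index to a MultiIndex on an inferred level looks like in pandas versions that include this feature. The frames, level names, and values below are hypothetical; the only requirement assumed here is that the name of the single index matches the name of one level of the MultiIndex.

```python
import pandas as pd

# Hypothetical frame indexed by a single level named "household".
households = pd.DataFrame(
    {"income": [100, 200, 300]},
    index=pd.Index([1, 2, 3], name="household"),
)

# Hypothetical frame indexed by (household, person).
persons = pd.DataFrame(
    {"age": [30, 31, 40, 50]},
    index=pd.MultiIndex.from_tuples(
        [(1, "a"), (1, "b"), (2, "a"), (4, "a")],
        names=["household", "person"],
    ),
)

# The join level is inferred from the shared index name "household".
# An inner join keeps only households present in both frames.
joined = persons.join(households, how="inner")

print(joined)
```

The result keeps the two-level index of `persons`, and each person row picks up the `income` of its household; household 3 (no persons) and household 4 (no income row) drop out of the inner join.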
I suppose the |
Sweet! Exactly what I was expecting.
I don't quite get what you mean here.
I would think so. Testcase below. [Sorry, gotta run now.]
right and left would become meaningful testcases here as well, but I don't have time to set this up right now. |
@hmgaudecker so I merged in the single join, see #6363, along with some docs on how to do a multi-multi join. That's fairly complicated to actually implement, and IMHO not worth the effort, as it really doesn't change the memory usage/speed that much at all. That said, I created #6360 to track that use case. Feel free to submit a PR if you'd like. |
Here's a partial implementation: jreback@0c38215 |
@jreback The motivating / original example doesn't work atm (am I missing something?):
|
That got 'moved' to #6360 (maybe put that at the top), as I was using @hmgaudecker's examples. Maybe add that example there as well. You can 'do' that with the doc example though (just not directly) ATM. |
ah, sorry I thought that one was closed. Edited into top. How do you do it atm (indirectly)? |
see the docs link above |
I don't think we can do this now, but it potentially sounds useful...
http://stackoverflow.com/questions/16650945/merge-on-single-level-of-multiindex
https://groups.google.com/forum/#!topic/pydata/LBSFq6of8ao
Example:
join/merge on 2.