Skip to content

Join on level between two UNIQUE MultiIndex objects with unique level broadcast #15417

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
blalterman opened this issue Feb 15, 2017 · 2 comments
Labels
Duplicate Report Duplicate issue or pull request MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@blalterman
Copy link

Performing simple arithmetic operations between two DataFrames that have a MultiIndex on the columns is disabled because the join on level is deemed ambiguous. This is not the case when the labels in both DataFrames are unique and there is a unique mapping between the two and the MultiIndex levels are uniquely labeled.

First, let's just construct the DataFrame.

>>> data = {
('b', 'x', ''): {0: 0.5, 1: 0.59999999999999998, 2: 0.69999999999999996},
('b', 'y', ''): {0: -0.25, 1: -0.26000000000000001, 2: 0.27000000000000002},
('b', 'z', ''): {0: 0.29999999999999999, 1: 0.40000000000000002, 2: -0.69999999999999996},
('v', 'x', 'a'): {0: 125.0, 1: 250.0, 2: 375.0},
('v', 'x', 'p1'): {0: 100.0, 1: 200.0, 2: 300.0},
('v', 'x', 'p2'): {0: 150.0, 1: 300.0, 2: 450.0},
('v', 'y', 'a'): {0: 250.0, 1: 375.0, 2: 750.0},
('v', 'y', 'p1'): {0: 200.0, 1: 300.0, 2: 600.0},
('v', 'y', 'p2'): {0: 300.0, 1: 450.0, 2: 900.0},
('v', 'z', 'a'): {0: 500.0, 1: 750.0, 2: 1000.0},
('v', 'z', 'p1'): {0: 400.0, 1: 600.0, 2: 800.0},
('v', 'z', 'p2'): {0: 600.0, 1: 900.0, 2: 1200.0}}
>>> data = pd.DataFrame.from_dict(data, orient="columns").sort_index(axis=1)
>>> data.columns.names = ["Measurement", "Component", "Species"]
>>> data.T
                                    0       1        2
Measurement Component Species                         
b           x                    0.50    0.60     0.70
            y                   -0.25   -0.26     0.27
            z                    0.30    0.40    -0.70
v           x         a        125.00  250.00   375.00
                      p1       100.00  200.00   300.00
                      p2       150.00  300.00   450.00
            y         a        250.00  375.00   750.00
                      p1       200.00  300.00   600.00
                      p2       300.00  450.00   900.00
            z         a        500.00  750.00  1000.00
                      p1       400.00  600.00   800.00
                      p2       600.00  900.00  1200.00

In the case that there is a unique mapping between the two DataFrames such that there is a single level over which to broadcast, a TypeError is currently raised.

>>> mag = data.pow(2).sum(axis=1, level=["Measurement", "Species"]).pipe(np.sqrt)
>>> mag
Measurement         b            v                          
Species                          a           p1           p2
0            0.634429   572.821962   458.257569   687.386354
1            0.766551   875.000000   700.000000  1050.000000
2            1.026109  1305.038314  1044.030651  1566.045976

>>> data.divide(mag, axis=1, level="Component").T
Traceback (most recent call last)
...
TypeError: Join on level between two MultiIndex objects is ambiguous

This solution provides a simple tool for many users, removing the necessity of a common for loop and manual index selection. The expected output would be something like

>>> data.divide(mag, axis=1, level="Component").T
                                      0         1         2
Measurement Component Species                              
b           x                  0.788110  0.782727  0.682189
            y                 -0.394055 -0.339182  0.263130
            z                  0.472866  0.521818 -0.682189
v           x         a        0.218218  0.285714  0.287348
                      p1       0.218218  0.285714  0.287348
                      p2       0.218218  0.285714  0.287348
            y         a        0.436436  0.428571  0.574696
                      p1       0.436436  0.428571  0.574696
                      p2       0.436436  0.428571  0.574696
            z         a        0.872872  0.857143  0.766261
                      p1       0.872872  0.857143  0.766261
                      p2       0.872872  0.857143  0.766261

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 2.7.12.final.0 python-bits: 64 OS: Darwin OS-release: 14.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 26.1.1
Cython: None
numpy: 1.11.1
scipy: 0.18.0
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@jreback
Copy link
Contributor

jreback commented Feb 15, 2017

@bla1089 you are asking for #6360.

Its not really that difficult and there is a partial impl there. A community PR would go a long way here.

@jreback jreback added Difficulty Advanced Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 15, 2017
@jreback jreback added this to the Next Major Release milestone Feb 15, 2017
@jreback jreback added Duplicate Report Duplicate issue or pull request MultiIndex labels Feb 15, 2017
@mroeschke
Copy link
Member

Looks like #6360 has been solved by #20356, so I'm assuming that solves this issue. Please let us know if it doesn't and happy to reopen this issue if it doesn't

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Duplicate Report Duplicate issue or pull request MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

4 participants