Binary operations don't broadcast across multiindex #5645

Closed
roblevy opened this issue Dec 5, 2013 · 17 comments
Labels: Enhancement, MultiIndex, Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@roblevy
Contributor

roblevy commented Dec 5, 2013

related #6360

Based on this SO question

Consider the following two Series:

x = pd.DataFrame({'year':[1,1,1,1,2,2,2,2],
                  'country':['A','A','B','B','A','A','B','B'],
                  'prod':[1,2,1,2,1,2,1,2],
                  'val':[10,20,15,25,20,30,25,35]})
x = x.set_index(['year','country','prod']).squeeze()

y = pd.DataFrame({'year':[1,1,2,2],'prod':[1,2,1,2],
                  'mul':[10,0.1,20,0.2]})
y = y.set_index(['year','prod']).squeeze()

which look like:

    year  country  prod
    1     A        1       10
                   2       20
          B        1       15
                   2       25
    2     A        1       20
                   2       30
          B        1       25
                   2       35

year  prod
1     1       10.0
      2        0.1
2     1       20.0
      2        0.2

A task I run into all the time is performing a binary operation by distributing the values of y over a particular level of x. For example, I'd like to multiply all values of product 1 in year 1 by 10.0, regardless of country.

The required result is therefore as follows:

    year  country  prod
    1     A        1       100.0
                   2       2.0
          B        1       150.0
                   2       2.5
    2     A        1       400.0
                   2       6.0
          B        1       500.0
                   2       7.0

The binary operation .mul() doesn't work as expected:

>>> x.mul(y, level=['year','prod'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 334, in f
    return self._binop(other, op, level=level, fill_value=fill_value)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 2075, in _binop
    this, other = self.align(other, level=level, join='outer')
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 2570, in align
    return_indexers=True)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 954, in join
    return_indexers=return_indexers)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/index.py", line 1058, in _join_level
    raise Exception('Join on level between two MultiIndex objects '
Exception: Join on level between two MultiIndex objects is ambiguous

To create the required result, the user currently has to do this:

x = x.reset_index('country').sort_index()
x.val = x.val * y
x = x.reset_index().set_index(['year',
                               'country',
                               'prod']).sortlevel()
@jreback
Contributor

jreback commented Dec 5, 2013

@roblevy can you put your solution in the top section as well? (Then it's easy to create the expected result.)

@roblevy
Contributor Author

roblevy commented Dec 5, 2013

Done!

@jreback
Contributor

jreback commented Dec 5, 2013

Great! I'd love for you to work on this; otherwise it probably won't be tackled for a while. It's not too tricky and would get you a lot of exposure to the code.

@roblevy
Contributor Author

roblevy commented Dec 5, 2013

:D Great

@ghost

ghost commented Dec 6, 2013

I think the example result has editing errors in it. Where did the 400 come from? It doesn't match your description of the operation.

A very powerful operation to have added. Is there a sugary way to do it you had in mind?
I like the idea anyway.

@roblevy
Contributor Author

roblevy commented Dec 9, 2013

Hi @y-p. The example result is correct.

The 400 is x[year 2, country A, product 1] * y[year 2, product 1] = 20 * 20 = 400

Notice how the values of x are multiplied by different numbers from y depending on year and product.

As far as I'm concerned, this should "just work" when the user does x.mul(y)
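
The arithmetic above can be checked for every row with a short sketch. It assumes a recent pandas and uses `reindex` to look up the matching multiplier for each row of `x`; that lookup is an illustration, not something proposed in this thread, and the names `keys`, `multiplier` and `expected` are made up for the example.

```python
import pandas as pd

x = pd.DataFrame({'year': [1, 1, 1, 1, 2, 2, 2, 2],
                  'country': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
                  'prod': [1, 2, 1, 2, 1, 2, 1, 2],
                  'val': [10, 20, 15, 25, 20, 30, 25, 35]})
x = x.set_index(['year', 'country', 'prod']).squeeze()

y = pd.DataFrame({'year': [1, 1, 2, 2], 'prod': [1, 2, 1, 2],
                  'mul': [10, 0.1, 20, 0.2]})
y = y.set_index(['year', 'prod']).squeeze()

# Drop 'country' to get the (year, prod) key of every row of x.
keys = x.index.droplevel('country')

# Look each key up in y. y's index is unique, so repeated keys are fine.
multiplier = y.reindex(keys).to_numpy()

# Element-wise product; x keeps its own index and row order.
expected = x * multiplier
print(expected.loc[(2, 'A', 1)])  # x is 20 and y is 20 for this row
```

Because the multiplier is attached positionally, the level order and row order of `x` are left untouched.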

@ghost

ghost commented Dec 9, 2013

Thanks, I see now.

@0o-de-lally

@roblevy @jreback This feature is an important component of a project I'm working on. Any chance we can get this implemented in the 0.14 release as well? I see that a related bug was fixed for the same release, and it seems like they belong together. How can I help get this released sooner?

@jreback
Contributor

jreback commented Apr 21, 2014

@keyscores Well, this depends on #6360, so that needs to be fixed first. That said, you can simply use the solution above. It's just as efficient, if not as syntax-friendly, and you can wrap it in a function.

Of course, a PR would be welcome.
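
One way the suggestion to wrap the workaround in a function might look is sketched below. The helper name `mul_on_shared_levels` is hypothetical, and the sketch swaps the index-alignment step of the original workaround for a column `merge` (my substitution, not from the thread), which keeps the rows of `x` in their original order.

```python
import pandas as pd

def mul_on_shared_levels(x, y):
    """Multiply Series x by Series y, matching rows on y's index levels.

    Hypothetical helper: assumes every level name of y is also a level of x.
    """
    shared = list(y.index.names)
    # Move all index levels into ordinary columns.
    left = x.rename('val').reset_index()
    right = y.rename('mul').reset_index()
    # A left merge keeps every row of x, in x's original order.
    merged = left.merge(right, on=shared, how='left')
    merged['val'] = merged['val'] * merged['mul']
    # Rebuild x's original MultiIndex.
    return merged.set_index(list(x.index.names))['val']

x = pd.DataFrame({'year': [1, 1, 1, 1, 2, 2, 2, 2],
                  'country': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
                  'prod': [1, 2, 1, 2, 1, 2, 1, 2],
                  'val': [10, 20, 15, 25, 20, 30, 25, 35]})
x = x.set_index(['year', 'country', 'prod']).squeeze()

y = pd.DataFrame({'year': [1, 1, 2, 2], 'prod': [1, 2, 1, 2],
                  'mul': [10, 0.1, 20, 0.2]})
y = y.set_index(['year', 'prod']).squeeze()

result = mul_on_shared_levels(x, y)
print(result)
```

Rows of `x` with no match in `y` would come out as NaN under this sketch, since the merge is a left join.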

@0o-de-lally

@jreback I'll try the solution above. I can donate a small bounty as an incentive to move #6360 up to v0.14. Would that help? :)

@jreback
Contributor

jreback commented Apr 21, 2014

@keyscores Unfortunately no... :) That IS being worked on, though.

@0o-de-lally

Ok it was worth a try. At least you know you have a cheerleader for that fix. Would love to see it in .14 :)

@roblevy
Contributor Author

roblevy commented Jul 6, 2015

This may or may not be relevant, but I've found a comparatively neat workaround for this problem: unstack the index level(s) that do not appear in the other Series.

In [81]: x.unstack('country').mul(y, axis=0).stack()
Out[81]: 
year  prod  country
1     1     A          100.0
            B          150.0
      2     A            2.0
            B            2.5
2     1     A          400.0
            B          500.0
      2     A            6.0
            B            7.0
dtype: float64

This is not ideal, since it unexpectedly changes the order of the index levels of x, but it should be pretty efficient.

@jreback
Contributor

jreback commented Jul 6, 2015

Yes, this would be quite efficient, and reorder_levels restores the original level order:

In [31]: x.unstack('country').mul(y, axis=0).stack().reorder_levels(x.index.names)
Out[31]: 
year  country  prod
1     A        1       100.0
      B        1       150.0
      A        2         2.0
      B        2         2.5
2     A        1       400.0
      B        1       500.0
      A        2         6.0
      B        2         7.0
dtype: float64

@roblevy
Contributor Author

roblevy commented Jul 13, 2015

Looks to me like you might (sometimes at least) need a .sortlevel() on the end of that command.

In [41]: z = pd.DataFrame(dict(country=['A', 'A', 'B', 'B'], prod=[1,2,1,2], val=[1,2,3,4]))
In [42]: z = z.set_index(['country', 'prod']).squeeze()
In [43]: x.unstack('year').mul(z, axis=0).stack().reorder_levels(x.index.names)
Out[43]: 
year  country  prod
1     A        1        10
2     A        1        20
1     A        2        40
2     A        2        60
1     B        1        45
2     B        1        75
1     B        2       100
2     B        2       140
dtype: int64
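
A sketch of the same command with the sort added, assuming a recent pandas: `sortlevel` was later removed from pandas, and `sort_index` is the current equivalent. The variable name `result` is illustrative.

```python
import pandas as pd

x = pd.DataFrame({'year': [1, 1, 1, 1, 2, 2, 2, 2],
                  'country': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
                  'prod': [1, 2, 1, 2, 1, 2, 1, 2],
                  'val': [10, 20, 15, 25, 20, 30, 25, 35]})
x = x.set_index(['year', 'country', 'prod']).squeeze()

z = pd.DataFrame(dict(country=['A', 'A', 'B', 'B'],
                      prod=[1, 2, 1, 2],
                      val=[1, 2, 3, 4]))
z = z.set_index(['country', 'prod']).squeeze()

result = (
    x.unstack('year')                # rows: (country, prod), columns: year
     .mul(z, axis=0)                 # align z on the (country, prod) rows
     .stack()                        # back to a Series: (country, prod, year)
     .reorder_levels(x.index.names)  # level order: (year, country, prod)
     .sort_index()                   # row order to match the sorted x
)
print(result)
```

Here `x` is already sorted, so sorting the result is enough to line it up with `x` row for row.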

@znicholls
Contributor

This appears to have been fixed at some point; on the latest master you can do:

import pandas as pd

x = pd.DataFrame({'year':[1,1,1,1,2,2,2,2],
                  'country':['A','A','B','B','A','A','B','B'],
                  'prod':[1,2,1,2,1,2,1,2],
                  'val':[10,20,15,25,20,30,25,35]})
x = x.set_index(['year','country','prod']).squeeze()

y = pd.DataFrame({'year':[1,1,2,2],'prod':[1,2,1,2],
                  'mul':[10,0.1,20,0.2]})
y = y.set_index(['year','prod']).squeeze()

x.mul(y, axis=0)

# result below
year  prod  country
1     1     A          100.0
            B          150.0
      2     A            2.0
            B            2.5
2     1     A          400.0
            B          500.0
      2     A            6.0
            B            7.0
dtype: float64
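
To get the result back in the level order of `x`, the direct multiplication can be followed by `reorder_levels` and `sort_index`. This is a minimal sketch that assumes a pandas version in which `x.mul(y)` aligns on the shared levels, as reported above; `result` is an illustrative name.

```python
import pandas as pd

x = pd.DataFrame({'year': [1, 1, 1, 1, 2, 2, 2, 2],
                  'country': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
                  'prod': [1, 2, 1, 2, 1, 2, 1, 2],
                  'val': [10, 20, 15, 25, 20, 30, 25, 35]})
x = x.set_index(['year', 'country', 'prod']).squeeze()

y = pd.DataFrame({'year': [1, 1, 2, 2], 'prod': [1, 2, 1, 2],
                  'mul': [10, 0.1, 20, 0.2]})
y = y.set_index(['year', 'prod']).squeeze()

# The product comes back with the shared levels first (year, prod, country),
# so restore x's level order and then sort the rows.
result = x.mul(y).reorder_levels(x.index.names).sort_index()
print(result)
```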

mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022
@mroeschke
Member

Closing as this seems addressed
