DOC/API: clarify usage of where, mask broadcast arguments #15558

Open
adbull opened this issue Mar 3, 2017 · 5 comments
Labels
Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@adbull
Contributor

adbull commented Mar 3, 2017

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> x = pd.Series(0, pd.MultiIndex.from_product([[0], [1, 2]]))
>>> y = pd.Series(True, [0])
>>> x.where(y, level=0)

Traceback (most recent call last):
  File "bug.py", line 1, in <module>
    x.where(y, level=0)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 4813, in where
    raise_on_error)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 4557, in _where
    cond, _ = cond.align(self, join='right', broadcast_axis=1)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/series.py", line 2350, in align
    broadcast_axis=broadcast_axis)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 4419, in align
    fill_axis=fill_axis)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 4488, in _align_series
    return_indexers=True)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 2646, in join
    return_indexers=return_indexers)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 2738, in _join_multi
    raise ValueError("cannot join with no level specified and no "
ValueError: cannot join with no level specified and no overlapping names

>>> x = pd.DataFrame(0, pd.MultiIndex.from_product([[0], [1, 2]]), [3])
>>> y = pd.Series(True, [0])
>>> x.where(y, axis=0, level=0)

    3
0 1 NaN
  2 NaN

>>> y = pd.Series(True, [3])
>>> x.where(y, axis=1)

      3
0 1 NaN
  2 NaN
  
>>> y = pd.DataFrame(True, [0], [3])
>>> x.where(y, level=0)

Traceback (most recent call last):
  File "bug.py", line 3, in <module>
    x.where(y, level=0)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 4813, in where
    raise_on_error)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 4557, in _where
    cond, _ = cond.align(self, join='right', broadcast_axis=1)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 2816, in align
    broadcast_axis=broadcast_axis)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 4414, in align
    fill_axis=fill_axis)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 4436, in _align_frame
    other.index, how=join, level=level, return_indexers=True)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 2646, in join
    return_indexers=return_indexers)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 2738, in _join_multi
    raise ValueError("cannot join with no level specified and no "
ValueError: cannot join with no level specified and no overlapping names

Problem description

where() and mask() ignore the broadcasting arguments axis and level

Expected output

0  1    0
   2    0
dtype: int64

    3
0 1 0
  2 0

    3
0 1 0
  2 0

    3
0 1 0
  2 0

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.8-100.fc24.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.8.0
xarray: 0.9.1
IPython: 4.2.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

@adbull adbull changed the title BUG: DataFrame.where ignores level argument when passed frame BUG: DataFrame.where ignores axis, level arguments Mar 3, 2017
@adbull adbull changed the title BUG: DataFrame.where ignores axis, level arguments BUG: where ignores axis, level arguments Mar 3, 2017
@adbull adbull changed the title BUG: where ignores axis, level arguments BUG: where, mask ignore axis, level arguments Mar 3, 2017
@jorisvandenbossche
Member

Well, yes, axis and level are used to align other, not to align cond. But I agree this is not at all clear from the docs, and IMO it is also confusing.
We actually just had a discussion about this a few days ago: #15414 (comment)
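
A minimal sketch of that point, assuming a recent pandas release; the frame and values are made up for illustration and do not come from the report above. Here cond already has the same shape as the calling frame, and axis only controls how other is aligned and broadcast:

```python
import pandas as pd

# Hypothetical data, chosen only to make the alignment visible.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
cond = df > 2  # same shape as df, so cond needs no alignment
other = pd.Series([10, 20], index=df.index)

# axis=0 aligns `other` with the row index and broadcasts it across
# the columns; it does not change how `cond` is aligned.
result = df.where(cond, other, axis=0)
print(result)
```

Entries where cond is False take the value of other for that row; the remaining entries keep their original values.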

@adbull
Contributor Author

adbull commented Mar 3, 2017

Ah, I see. That does seem inconsistent with other uses of these keywords, e.g. for numeric data I would expect x.where(y, axis=axis, level=level) to agree with x.mul(y.replace({False: np.nan, True: 1}), axis=axis, level=level).

I'd definitely agree this should be documented somewhere.
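
For comparison, a small sketch of the mul() behaviour referred to above, assuming a recent pandas release. The values in x are changed from the report so that the result is easier to read, and Series.map stands in for the replace() call as an illustrative choice:

```python
import numpy as np
import pandas as pd

x = pd.Series([10, 20], index=pd.MultiIndex.from_product([[0], [1, 2]]))
y = pd.Series([True], index=[0])

# Turn the boolean condition into a multiplier: True -> 1.0, False -> NaN.
multiplier = y.map({True: 1.0, False: np.nan})

# mul() broadcasts `multiplier` across level 0 of x's MultiIndex.
result = x.mul(multiplier, level=0)
print(result)
```

The result keeps the two-level index of x, which is the kind of broadcasting the report expected from where(y, level=0).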

@jreback jreback added Difficulty Novice Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 3, 2017
@jreback
Contributor

jreback commented Mar 3, 2017

@adbull can you do a PR to update the docs?

@jreback jreback added this to the Next Major Release milestone Mar 3, 2017
@jorisvandenbossche
Member

jorisvandenbossche commented Mar 3, 2017

Yep, as I said I also find it confusing.
The docs can certainly be clarified already (PR welcome!); the broader question is whether we also want to do something about the behaviour itself.

Another inconsistency is that cond and other are aligned differently to the calling object (cond.align(self, ..) vs self.align(other, ..)).
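
A rough sketch of the two call directions, using made-up flat-indexed Series and leaving out the broadcast_axis argument that appears in the traceback. It only illustrates which object is the caller in each case and is not a copy of the pandas internals:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
cond = pd.Series([True, False], index=["a", "b"])
other = pd.Series([-1, -2], index=["b", "c"])

# cond is the caller; a right join keeps the index of s.
aligned_cond, _ = cond.align(s, join="right")

# s is the caller; a left join keeps the index of s.
_, aligned_other = s.align(other, join="left")

print(aligned_cond)
print(aligned_other)
```

Both results end up on the index of s, with missing labels filled with NaN, but any extra keywords are handed to a different caller in each case.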

@jreback
Contributor

jreback commented Mar 3, 2017

Yes, we should apply these to the alignment of other. cond is a boolean array that is virtually always the same shape as the calling frame; if it is not (or if it is a scalar), then I would say we should simply raise.

Alternatively, we could add an additional keyword, broadcast_axis, that applies to cond (and passes through to .align), but this quickly gets complicated.
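
The "simply raise" option might look roughly like the sketch below. strict_where is a hypothetical helper written for illustration; it is not part of pandas and not something proposed verbatim in this thread:

```python
import numpy as np
import pandas as pd


def strict_where(obj, cond, other=np.nan):
    """Hypothetical wrapper: reject any cond whose shape differs from obj's."""
    if np.shape(cond) != obj.shape:
        raise ValueError(
            f"cond has shape {np.shape(cond)}, expected shape {obj.shape}"
        )
    return obj.where(cond, other)


x = pd.Series([0, 0], index=pd.MultiIndex.from_product([[0], [1, 2]]))

# The condition from the report has one element, so it is rejected
# instead of being silently aligned.
try:
    strict_where(x, pd.Series([True], index=[0]))
except ValueError as exc:
    message = str(exc)
    print(message)
```

With a check like this, callers would have to broadcast cond to the full shape themselves before calling where().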

@adbull adbull changed the title BUG: where, mask ignore axis, level arguments DOC/API: clarify usage of where, mask broadcast arguments Mar 3, 2017
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
5 participants