Skip to content

Bug in iloc aligned objects #37728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Nov 19, 2020
Merged

Bug in iloc aligned objects #37728

merged 19 commits into from
Nov 19, 2020

Conversation

phofl
Copy link
Member

@phofl phofl commented Nov 10, 2020

If iloc should be purely position based, this PR fixes the bug described in the issue. One test depends on the wrong behavior, so I had to adjust it.

@phofl phofl added the Indexing Related to indexing on series/frames, not to indexes themselves label Nov 10, 2020
if name == "iloc":
for index, loc in zip(range(len(ilocs)), ilocs):
val = value.iloc[:, index]
self._setitem_single_column(loc, val, plane_indexer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be solved with an appropriate call to _align_frame or _align_series? id really prefer to avoid passing name around. by the time we get to setitem_with_indexer we should always be working with positional indexers (i.e. iloc)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm no not to my knowledge. The issue is, that we want to achieve oposing things between loc and iloc, but the objects do look the same. setitem_with_indexer could be solved with a default value, that is not a problem.

But I can not see a way to do this without passing some kind of flag to the align calls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into this a bit again. The indexer in setitem_with_indexer already is positional, but the value may not be properly aligned for loc, hence we call the align functions there. This is not necessary for the iloc path, but this information is lost there, if we do not pass it through somehow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it sounds like we should be doing some appropriate alignment within Loc before passing it into iLoc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be an option, but would result in a bigger rewrite and I am not sure if that would introduce more complexity, because the align_series method is called in various places. Should I give it a try?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this PR is making it so that we never call align_frame/align_series if setitem_with_indexer was called from iloc, only if it was called from loc. setitem_with_indexer is only called from one place within loc. Can we just do that alignment right before that call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, we could try, but it is not that simple. For example before calling setitem_with_indexer from loc, value may be a DataFrame with different dtypes. Currently _align_series is called for every column to get a ndarray with the correct dtype. If we call this before calling setitem_with_indexer we would have to call _align_frame or copy the logic, which decides if we align every colum at once or separately. _align_frame returns an object dtype array, which obviously would introduce a lot of dtype errors. That is what I meant with a bigger rewrite here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for explaining, i think i have a better handle on it now.

ive got an upcoming PR to make all 2D cases go through split_path. let's revisit this after that goes through and see if we can make it cleaner

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great. Just ping me when merged, then I will have a look.

@pep8speaks
Copy link

pep8speaks commented Nov 17, 2020

Hello @phofl! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-18 20:26:46 UTC

# GH: 22046
df1 = DataFrame({"a2": [11, 12, 13], "b2": [14, 15, 16]})
df2 = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
df2.iloc[:, [1]] = df1.iloc[:, [0]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there other indexer/value type combinations that are relevant here? e.g. the value being set here is a DataFrame, would a Series trigger the same bug? what if instead of [1] the indexer was slice(1, 2)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df1 = DataFrame({"a2": [11, 12, 13], "b2": [14, 15, 16]})
df2 = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
df2.iloc[:, [1]] = df1.iloc[:, 0]

raises

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 132, in <module>
    df2.iloc[:, [1]] = df1.iloc[:, 0]
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 684, in __setitem__
    iloc._setitem_with_indexer(indexer, value)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1637, in _setitem_with_indexer
    self._setitem_single_block(indexer, value)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1851, in _setitem_single_block
    self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
  File "/home/developer/PycharmProjects/pandas/pandas/core/internals/managers.py", line 562, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
  File "/home/developer/PycharmProjects/pandas/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/developer/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1005, in setitem
    values[indexer] = value
ValueError: shape mismatch: value array of shape (3,) could not be broadcast to indexing result of shape (1,3)

Process finished with exit code 1

Did not check slice previously, this triggered the bug too. Will add test

@@ -1816,13 +1821,13 @@ def _setitem_single_block(self, indexer, value):

indexer = maybe_convert_ix(*indexer)

if isinstance(value, (ABCSeries, dict)):
if isinstance(value, (ABCSeries, dict)) and name != "iloc":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have a dict here dont we still want to wrap it in a Series?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, have no tests getting here with iloc. Will add one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LMK when these tests are all ready and ill take another look. this one is almost ready

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are all in

@jreback jreback added this to the 1.2 milestone Nov 18, 2020
@jreback jreback added the Bug label Nov 18, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. minor comments can be a followup. @jbrockmendel merge when good.

@@ -1714,7 +1714,7 @@ def _setitem_with_indexer_2d_value(self, indexer, value):
# setting with a list, re-coerces
self._setitem_single_column(loc, value[:, i].tolist(), pi)

def _setitem_with_indexer_frame_value(self, indexer, value: "DataFrame"):
def _setitem_with_indexer_frame_value(self, indexer, value: "DataFrame", name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type name here (and above)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1787,7 +1792,7 @@ def _setitem_single_column(self, loc: int, value, plane_indexer):
# reset the sliced object if unique
self.obj._iset_item(loc, ser)

def _setitem_single_block(self, indexer, value):
def _setitem_single_block(self, indexer, value, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1854,7 +1858,7 @@ def _setitem_with_indexer_missing(self, indexer, value):
if index.is_unique:
new_indexer = index.get_indexer([new_index[-1]])
if (new_indexer != -1).any():
return self._setitem_with_indexer(new_indexer, value)
return self._setitem_with_indexer(new_indexer, value, name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double check me on this, but i think it might be the case that we only get here with name = "loc"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you are right. Iloc raises when indexer is a dict. That is the only way we can get in there. Removed it

@jreback jreback merged commit 3d259e6 into pandas-dev:master Nov 19, 2020
@jreback
Copy link
Contributor

jreback commented Nov 19, 2020

thanks @phofl nice!

@phofl phofl deleted the 22046 branch November 19, 2020 15:43
expected = DataFrame({"a": [1, 2, 3], "b": [11, 12, 13], "c": [7, 8, 9]})
tm.assert_frame_equal(df2, expected)

def test_setitem_iloc_dictionary_value(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick for future: setitem_iloc -> iloc_setitem

@jbrockmendel
Copy link
Member

agreed thanks @phofl, this wasnt easy, glad to see it fixed

@@ -1815,14 +1821,13 @@ def _setitem_single_block(self, indexer, value):
return

indexer = maybe_convert_ix(*indexer)

if isinstance(value, (ABCSeries, dict)):
if isinstance(value, ABCSeries) and name != "iloc" or isinstance(value, dict):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for follow-up, can you put parens where appropriate so i dont have to think about whether the "and" gets evaluated before the "or"

@@ -1854,7 +1859,7 @@ def _setitem_with_indexer_missing(self, indexer, value):
if index.is_unique:
new_indexer = index.get_indexer([new_index[-1]])
if (new_indexer != -1).any():
return self._setitem_with_indexer(new_indexer, value)
return self._setitem_with_indexer(new_indexer, value, "loc")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment explaining that we only get here with name=="loc" and thats why its hardcoded (or assertion)

@phofl
Copy link
Member Author

phofl commented Nov 19, 2020

Thanks @jbrockmendel and @jreback
Will add a followup to adress comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
4 participants