
BUG: df.loc[[x], :] fails if df has zero rows #41170


Closed
2 of 3 tasks
RagBlufThim opened this issue Apr 26, 2021 · 7 comments · Fixed by #41358
Labels
Bug; Error Reporting (Incorrect or improved errors from pandas); Indexing (Related to indexing on series/frames, not to indexes themselves); MultiIndex
Comments

@RagBlufThim

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
df = pd.DataFrame({"A": [12,23,34,45]}, index = [list("aabb"), [0,1,2,3]])
print(df)
print("- - -")
print(df.loc[df.A < 30, :].loc[["b"], :])  # empty as expected
print("- - -")
print(df.loc[df.A < 10, :].loc[["b"], :])  # raises ValueError

Complete Output of Code Sample

      A
a 0  12
  1  23
b 2  34
  3  45
- - -
Empty DataFrame
Columns: [A]
Index: []
- - -
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 889, in __getitem__
    return self._getitem_tuple(key)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1060, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 791, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 865, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1113, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1053, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1254, in _get_listlike_indexer
    indexer, keyarr = ax._convert_listlike_indexer(key)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexes\multi.py", line 2559, in _convert_listlike_indexer
    _, indexer = self.reindex(keyarr, level=level)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexes\multi.py", line 2470, in reindex
    target, indexer, _ = self._join_level(
  File "C:\test\venv124\lib\site-packages\pandas\core\indexes\base.py", line 3924, in _join_level
    ngroups = 1 + new_lev_codes.max()
  File "C:\test\venv124\lib\site-packages\numpy\core\_methods.py", line 39, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity

Problem description

In both cases, df.loc[df.A...] returns a DataFrame that contains no rows with an index value of "b".
Accordingly, in the first case the result of .loc[["b"], :] is an empty DataFrame, but in the second case a ValueError is raised. The only difference is that in the first case df.loc[df.A...] returns a DataFrame with some rows (though none with index value "b"), while in the second case it returns a DataFrame with zero rows.
I think that shouldn't make a difference.

In the original code, .loc[df.A...] and .loc[["b"], :] are not chained in a single expression. The first creates a selection of rows, that selection is processed further, and a later expression applies the second .loc to it.

The traceback looks very similar to the one in #40235. Maybe both bugs have a common root cause.
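
The last frames of the traceback can be reproduced without pandas. The following is a minimal sketch, assuming only that `new_lev_codes` is an empty integer array when the frame has zero rows; the array `codes` is a stand-in, and the `initial=-1` guard is a hypothetical illustration, not necessarily the change made in #41358.

```python
import numpy as np

# Stand-in for the empty level codes; the real array is built inside pandas.
codes = np.array([], dtype=np.intp)

try:
    ngroups = 1 + codes.max()  # same expression as in the traceback
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Hypothetical guard: `initial=-1` gives the reduction a starting value,
# so an empty input yields zero groups instead of raising.
ngroups_guarded = 1 + int(codes.max(initial=-1))
print(ngroups_guarded)
```

With a non-empty array, `initial=-1` does not change the result as long as all codes are at least -1.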

Expected Output

df.loc[df.A < 10, :].loc[["b"], :] returns an empty dataframe like df.loc[df.A < 30, :].loc[["b"], :] does.
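
Until the behaviour is settled, one possible workaround is to select by a boolean mask on the first index level instead of passing a list of labels to .loc. This is only a sketch: the helper name `select_level0` is made up for illustration, and it always returns an empty frame when nothing matches, which may not be the behaviour pandas ends up with.

```python
import pandas as pd


def select_level0(frame, labels):
    """Return the rows whose first index level is in `labels`.

    Hypothetical helper: returns an empty frame when nothing matches,
    including when `frame` itself has zero rows.
    """
    mask = frame.index.get_level_values(0).isin(labels)
    return frame[mask]


df = pd.DataFrame({"A": [12, 23, 34, 45]}, index=[list("aabb"), [0, 1, 2, 3]])

some_rows = select_level0(df.loc[df.A < 30, :], ["b"])  # input has rows, none under "b"
no_rows = select_level0(df.loc[df.A < 10, :], ["b"])    # input has zero rows
all_rows = select_level0(df, ["b"])                     # input contains "b"

print(some_rows)
print(no_rows)
print(all_rows)
```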

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.9.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
...

pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
...

RagBlufThim added the Bug and Needs Triage labels Apr 26, 2021
@phofl
Member

phofl commented Apr 26, 2021

Hi, thanks for your report. I think both cases should raise a KeyError rather than return an empty DataFrame. But we are not as consistent as we would like with MultiIndexes.
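
For comparison, this is how a list of missing labels behaves on a flat (non-MultiIndex) index. The sketch assumes pandas 1.0 or later; the Series `ser` is made up for illustration.

```python
import pandas as pd

ser = pd.Series([1, 2], index=["a", "b"])

try:
    ser.loc[["c"]]  # "c" is not in the index
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```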

phofl added the Error Reporting, Indexing, and MultiIndex labels and removed the Needs Triage label Apr 26, 2021
@phofl
Member

phofl commented May 4, 2021

@jbrockmendel

We are running through def _get_listlike_indexer(self, key, axis: int, raise_missing: bool = False) here which still has raise_missing. Should this have been removed with 1.0?

def _get_listlike_indexer(self, key, axis: int, raise_missing: bool = False):

@jreback
Contributor

jreback commented May 4, 2021

> @jbrockmendel
>
> We are running through def _get_listlike_indexer(self, key, axis: int, raise_missing: bool = False) here which still has raise_missing. Should this have been removed with 1.0?
>
> def _get_listlike_indexer(self, key, axis: int, raise_missing: bool = False):

Hmm, yeah, I think we did remove that (or thought we did).

@phofl
Member

phofl commented May 4, 2021

Thanks, I will check what impact this would have. Then we can decide whether to remove it now or with 2.0.

@jbrockmendel
Member

> We are running through def _get_listlike_indexer(self, key, axis: int, raise_missing: bool = False) here which still has raise_missing. Should this have been removed with 1.0?

AFAICT raise_missing only affects the message that goes with the KeyError, not whether an exception is raised at all. Am I reading that wrong?

@phofl
Member

phofl commented May 6, 2021

Yes, you are right. I had only read the docstring, which says:

raise_missing: bool
    Whether to raise a KeyError if some labels are not found. Will be
    removed in the future, and then this method will always behave as
    if raise_missing=True.

so I figured this would affect whether an error is thrown at all. Should we remove this keyword and always raise with the same message? If we want to wait with this, I would update the docstring to reflect the actual behavior.
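
The distinction discussed here, a keyword that changes only the error message and not whether an error is raised, can be illustrated with a toy function. This is not pandas code: the name `validate_labels` and both messages are invented for the sketch.

```python
def validate_labels(index_labels, requested, raise_missing=False):
    """Toy stand-in (not the pandas implementation).

    Raises KeyError whenever a requested label is missing;
    `raise_missing` only selects which message is used.
    """
    missing = [label for label in requested if label not in index_labels]
    if not missing:
        return None
    if raise_missing:
        raise KeyError(f"{missing} not in index")
    raise KeyError("list-like lookup with missing labels is not supported")


for flag in (True, False):
    try:
        validate_labels(["a"], ["a", "b"], raise_missing=flag)
    except KeyError as exc:
        print(flag, exc.args[0])
```

Both calls raise, so a docstring promising that the flag controls "whether to raise" would be misleading for a function shaped like this.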

@jbrockmendel
Member

No real opinion here, will trust your judgement. I'll be happy with just about anything that simplifies the code.
