BUG: df.loc[[x], :] fails if df has zero rows #41170

RagBlufThim · 2021-04-26T20:28:56Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
df = pd.DataFrame({"A": [12, 23, 34, 45]}, index=[list("aabb"), [0, 1, 2, 3]])
print(df)
print("- - -")
print(df.loc[df.A < 30, :].loc[["b"], :])  # empty as expected
print("- - -")
print(df.loc[df.A < 10, :].loc[["b"], :])  # raises ValueError

Complete Output of Code Sample

      A
a 0  12
  1  23
b 2  34
  3  45
- - -
Empty DataFrame
Columns: [A]
Index: []
- - -
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 889, in __getitem__
    return self._getitem_tuple(key)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1060, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 791, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 865, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1113, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1053, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexing.py", line 1254, in _get_listlike_indexer
    indexer, keyarr = ax._convert_listlike_indexer(key)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexes\multi.py", line 2559, in _convert_listlike_indexer
    _, indexer = self.reindex(keyarr, level=level)
  File "C:\test\venv124\lib\site-packages\pandas\core\indexes\multi.py", line 2470, in reindex
    target, indexer, _ = self._join_level(
  File "C:\test\venv124\lib\site-packages\pandas\core\indexes\base.py", line 3924, in _join_level
    ngroups = 1 + new_lev_codes.max()
  File "C:\test\venv124\lib\site-packages\numpy\core\_methods.py", line 39, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity

Problem description

In both cases, df.loc[df.A < ..., :] returns a DataFrame that contains no rows with the index value "b".
In the first case, .loc[["b"], :] accordingly returns an empty DataFrame, but in the second case it raises a ValueError. The only difference is that in the first case the filter leaves some rows (though none with index value "b"), while in the second case it leaves zero rows.
I think that shouldn't make a difference.

In the original code, .loc[df.A...] and .loc[["b"], :] are not combined in a single expression. The first creates a selection of rows, that selection is processed further, and a later expression applies the second .loc to it.
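A short sketch of what differs between the two cases. It assumes, based on the traceback above, that the failure comes from taking `.max()` of an empty array of level codes; the variable names `some_rows`, `no_rows`, and `reduction_error` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [12, 23, 34, 45]}, index=[list("aabb"), [0, 1, 2, 3]])

some_rows = df.loc[df.A < 30, :]  # two rows, both under "a"
no_rows = df.loc[df.A < 10, :]    # zero rows

# Boolean filtering keeps the original level values of the MultiIndex,
# including values that no remaining row uses ("b" in both cases).
print(list(some_rows.index.levels[0]))
print(list(no_rows.index.levels[0]))

# The codes map each row to a level value; with zero rows there are no codes.
print(len(some_rows.index.codes[0]), len(no_rows.index.codes[0]))

# NumPy cannot reduce a zero-size array with max(), which matches the
# last frame of the traceback (assumed to be the same failure mode).
reduction_error = None
try:
    np.array([], dtype=np.intp).max()
except ValueError as exc:
    reduction_error = str(exc)
print(reduction_error)
```

So "b" is still a known level value in both filtered frames, and only the number of codes differs.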

The traceback looks very similar to the one in #40235. Maybe both bugs have a common root cause.

Expected Output

df.loc[df.A < 10, :].loc[["b"], :] returns an empty dataframe like df.loc[df.A < 30, :].loc[["b"], :] does.
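For code that needs an empty result in both cases regardless of how `.loc` treats missing labels, one possible caller-side approach is a boolean mask on the first index level. This is only a sketch: `select_first_level` is a hypothetical helper, not part of pandas, and it sidesteps the list-like `.loc` lookup rather than fixing it.

```python
import pandas as pd


def select_first_level(frame, labels):
    """Return the rows whose first index level is in `labels`.

    Hypothetical helper: yields an empty frame when nothing matches,
    including when `frame` itself has zero rows.
    """
    mask = frame.index.get_level_values(0).isin(labels)
    return frame.loc[mask, :]


df = pd.DataFrame({"A": [12, 23, 34, 45]}, index=[list("aabb"), [0, 1, 2, 3]])
some_rows = df.loc[df.A < 30, :]
no_rows = df.loc[df.A < 10, :]

print(select_first_level(df, ["b"]))               # the two "b" rows
print(select_first_level(some_rows, ["b"]).empty)  # no "b" rows left
print(select_first_level(no_rows, ["b"]).empty)    # zero rows to begin with
```

The mask is computed from the labels the rows actually carry, so unused level values and empty frames do not need special handling.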

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 2cb9652
python : 3.9.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
...

pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
...


phofl · 2021-04-26T20:34:57Z

Hi, thanks for your report. I think both cases should raise a KeyError rather than return an empty DataFrame, but we are not as consistent as we would like with MultiIndexes.

phofl · 2021-05-04T21:26:25Z

@jbrockmendel

We are running through def _get_listlike_indexer(self, key, axis: int, raise_missing: bool = False) here which still has raise_missing. Should this have been removed with 1.0?

pandas/pandas/core/indexing.py

Line 1269 in 88ce933

def _get_listlike_indexer(self, key, axis: int, raise_missing: bool = False):

jreback · 2021-05-04T21:38:59Z

> @jbrockmendel
>
> We are running through def _get_listlike_indexer(self, key, axis: int, raise_missing: bool = False) here which still has raise_missing. Should this have been removed with 1.0?
>
> pandas/pandas/core/indexing.py
>
> Line 1269 in 88ce933
>
> def _get_listlike_indexer(self, key, axis: int, raise_missing: bool = False):

Hmm, yeah, I think we did remove that (or thought we did).

phofl · 2021-05-04T21:40:49Z

Thanks, I will check what impact this would have; then we can decide whether to remove it now or with 2.0.

jbrockmendel · 2021-05-04T23:23:21Z

> We are running through def _get_listlike_indexer(self, key, axis: int, raise_missing: bool = False) here which still has raise_missing. Should this have been removed with 1.0?

AFAICT raise_missing only affects the message that goes with the KeyError, not whether an exception is raised at all. Am I reading that wrong?

phofl · 2021-05-06T21:29:49Z

Yes, you are right. I had only read the docstring, which says:

raise_missing: bool
    Whether to raise a KeyError if some labels are not found. Will be
    removed in the future, and then this method will always behave as
    if raise_missing=True.

so I figured this would affect the actual error thrown. Should we remove this keyword and always raise with the same message? If we want to wait on this, I would update the docstring to reflect the actual behavior.
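To make the distinction concrete, here is a toy illustration of a flag that changes only the KeyError message, not whether a KeyError is raised. It is not pandas code: `find_positions`, `verbose_message`, and both message strings are made up for this sketch.

```python
def find_positions(index_labels, wanted, verbose_message=False):
    """Toy lookup (hypothetical, not the pandas implementation).

    Missing labels always raise KeyError; `verbose_message` only
    selects which message text accompanies the exception.
    """
    missing = [label for label in wanted if label not in index_labels]
    if missing:
        if verbose_message:
            raise KeyError(f"{missing} not in index")
        raise KeyError("some requested labels are missing")
    return [index_labels.index(label) for label in wanted]


labels = ["a", "b", "c"]
print(find_positions(labels, ["c", "a"]))

for flag in (False, True):
    try:
        find_positions(labels, ["b", "z"], verbose_message=flag)
    except KeyError as exc:
        print(flag, exc)
```

With a flag like this, removing it and always using one message simplifies the code without changing which calls fail.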

jbrockmendel · 2021-05-07T01:42:36Z

No real opinion here, will trust your judgement. I'll be happy with just about anything that simplifies the code.

RagBlufThim added the Bug and Needs Triage labels Apr 26, 2021

phofl added the Error Reporting, Indexing, and MultiIndex labels and removed the Needs Triage label Apr 26, 2021

phofl mentioned this issue May 6, 2021

Bug in loc not raising KeyError with MultiIndex containing no longer used levels #41358 (merged)

jreback added this to the 1.3 milestone May 7, 2021

phofl mentioned this issue May 7, 2021

CLN: Remove raise if missing only controlling the error message #41371 (merged)

jreback closed this as completed in #41358 May 12, 2021
