
BUG: DataFrame.drop raising TypeError when passing empty DatetimeIndex/list #28114


Closed
doc/source/whatsnew/v1.0.0.rst (2 changes: 1 addition & 1 deletion)
@@ -184,7 +184,7 @@ Groupby/resample/rolling
Reshaping
^^^^^^^^^

-
- Bug in :meth:`DataFrame.drop` when passing empty :class:`DatetimeIndex` or ``list`` (:issue:`27994`)
-

Sparse

pandas/core/generic.py (3 changes: 3 additions & 0 deletions)
@@ -3916,6 +3916,9 @@ def drop(

        for axis, labels in axes.items():
            if labels is not None:
                # Check for empty Index, GH 27994
                if is_list_like(labels) and not len(labels):
Contributor

Rather than doing this, you can just handle the empty case in _drop_axis.

Author

Are you sure, @jreback? I tried a little, but I couldn't get it to work while still making sure that a KeyError is raised when passing a str or List[str] to DataFrame.drop() on a DataFrame with a DatetimeIndex.

Contributor

I think the suggestion is to move the not len(labels) check into NDFrame._drop_axis, and to return self from _drop_axis when labels is empty.
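A minimal sketch of that suggestion, assuming a plain Python list stands in for the axis. The helpers is_list_like and drop_axis below are hypothetical and only model the control flow of NDFrame._drop_axis, not the pandas implementation. An empty list-like returns the input unchanged, while a str or List[str] that is not on a date axis still raises KeyError, which is the constraint the author raised.

```python
from collections.abc import Iterable, Sized
from datetime import datetime


def is_list_like(obj):
    # Simplified stand-in for pandas' is_list_like (assumption): a sized
    # iterable that is not a string counts as list-like.
    return (
        isinstance(obj, Iterable)
        and isinstance(obj, Sized)
        and not isinstance(obj, (str, bytes))
    )


def drop_axis(axis_labels, labels, errors="raise"):
    """Hypothetical model of NDFrame._drop_axis on a plain list of labels.

    Not the pandas implementation: it only shows where the empty-labels
    check would sit if it moved into _drop_axis.
    """
    # Suggested fix: empty labels mean "drop nothing", so hand back the
    # input unchanged before any label lookup can fail.
    if is_list_like(labels) and not len(labels):
        return axis_labels
    if not is_list_like(labels):
        labels = [labels]
    # Non-empty labels are still looked up, so a str against date labels
    # keeps raising KeyError.
    missing = [lab for lab in labels if lab not in axis_labels]
    if missing and errors == "raise":
        raise KeyError(f"{missing} not found in axis")
    return [lab for lab in axis_labels if lab not in labels]


index = [datetime(2019, 1, 1), datetime(2019, 1, 1)]
assert drop_axis(index, []) is index  # empty labels: unchanged
```

In this sketch the empty case never reaches the label lookup, so only non-empty labels can produce a KeyError; the open question in the thread is whether the real lookup path behaves the same way.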

Contributor

Can you try this?

                    continue
                obj = obj._drop_axis(labels, axis, level=level, errors=errors)

        if inplace:

pandas/core/indexes/base.py (4 changes: 4 additions & 0 deletions)
@@ -4720,6 +4720,10 @@ def get_indexer_non_unique(self, target):
            return pself.get_indexer_non_unique(ptarget)

        if self.is_all_dates:
            # If `target` doesn't consist of dates, `target.asi8` will return
            # None, which will raise TypeError. GH 27994
            if not target.is_all_dates:
Contributor

This is likely not necessary if you implement it as per my comment above.

Author

When passing a str, for example, to a DataFrame with a DatetimeIndex, type(self) == type(target) returns False, in which case I believe a KeyError should be raised. But the underlying arrays aren't empty, so the empty-labels check alone doesn't cover this case.
If the check doesn't go here, I suspect the Cython code would need changing, though I'm not confident about that.

Contributor

What exception do you get without this? Index.is_all_dates can be surprisingly expensive.
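The comment doesn't say why is_all_dates is costly. One plausible reason, offered as an assumption rather than something confirmed in this thread: for an index holding generic Python objects, deciding "all dates" means inspecting elements one by one, whereas a typed index can answer from its dtype alone. The two hypothetical functions below (not pandas APIs) contrast those two costs.

```python
from datetime import datetime


def is_all_dates_by_scan(values):
    """Hypothetical object-dtype check: elements are inspected one by one.

    Returns (result, number_of_elements_inspected) so the linear cost is
    visible; it stops early at the first non-date value.
    """
    inspected = 0
    for v in values:
        inspected += 1
        if not isinstance(v, datetime):
            return False, inspected
    return True, inspected


def is_all_dates_by_dtype(dtype_name):
    """Hypothetical typed-index check: one look at the dtype name,
    independent of how many labels the index holds."""
    return dtype_name.startswith("datetime64")


dates = [datetime(2019, 1, 1)] * 100_000
print(is_all_dates_by_scan(dates))  # (True, 100000)
print(is_all_dates_by_dtype("datetime64[ns]"))  # True
```

The worst case for the scan is an index that really is all dates, which is exactly when the branch in get_indexer_non_unique is taken.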

                raise KeyError("{} not found in axis".format(target.to_numpy()))
            tgt_values = target.asi8
        else:
            tgt_values = target._ndarray_values
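To make the failure mode in the diff comment concrete, here is a toy model. It assumes a hypothetical ToyIndex class whose asi8 returns integers for date labels and None otherwise, as the comment says target.asi8 does for a non-date target (this toy uses microseconds; pandas' asi8 uses nanoseconds). The get_indexer_non_unique function below is likewise a simplified stand-in, not the pandas method. With guard=False, iterating over the None integer view raises TypeError; with the guard, a str label against a date index raises KeyError, which is what test_drop_empty_index_str expects.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


class ToyIndex:
    """Hypothetical stand-in for a pandas Index; not the real class."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def is_all_dates(self):
        return all(isinstance(v, datetime) for v in self.values)

    @property
    def asi8(self):
        # Integer view (microseconds since the epoch in this toy) for date
        # labels, None otherwise, as described in the diff comment.
        if not self.is_all_dates:
            return None
        return [(v - EPOCH) // timedelta(microseconds=1) for v in self.values]


def get_indexer_non_unique(index, target, guard=True):
    """Hypothetical model of Index.get_indexer_non_unique.

    Returns (indexer, missing): every matching position for each target
    label (-1 if none) and the positions of target labels not found.
    """
    if index.is_all_dates:
        # The check added in this diff: a non-date target has no integer
        # view, so report the labels as missing instead of failing later.
        if guard and not target.is_all_dates:
            raise KeyError(f"{target.values} not found in axis")
        self_values, tgt_values = index.asi8, target.asi8
    else:
        self_values, tgt_values = index.values, target.values

    indexer, missing = [], []
    for i, t in enumerate(tgt_values):  # TypeError if tgt_values is None
        positions = [j for j, v in enumerate(self_values) if v == t]
        if positions:
            indexer.extend(positions)
        else:
            indexer.append(-1)
            missing.append(i)
    return indexer, missing


idx = ToyIndex([datetime(2019, 1, 1), datetime(2019, 1, 1)])
print(get_indexer_non_unique(idx, ToyIndex([datetime(2019, 1, 1)])))  # ([0, 1], [])
```

The non-unique index in the usage line mirrors the duplicated Timestamp index in the new tests: both positions match the single target label.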

pandas/tests/frame/test_axis_select_reindex.py (26 changes: 26 additions & 0 deletions)
@@ -117,6 +117,32 @@ def test_drop(self):
        df.drop(labels=df[df.b > 0].index, inplace=True)
        assert_frame_equal(df, expected)

    def test_drop_empty_index(self):
        # GH 27994
        df = pd.DataFrame(
            {"a": [0, 1]},
            index=[pd.Timestamp("2019-01-01"), pd.Timestamp("2019-01-01")],
        )
        result_empty_list = df.drop([])
        assert_frame_equal(result_empty_list, df)

        result_empty_array = df.drop(np.array([]))
        assert_frame_equal(result_empty_array, df)

        result_empty_index = df.drop(df[:0])
        assert_frame_equal(result_empty_index, df)

    @pytest.mark.parametrize("labels", ["a", ["a"]])
    def test_drop_empty_index_str(self, labels):
        # Passing str to DataFrame.drop() for dataframe with DatetimeIndex
        # should raise KeyError, GH 27994
        df = pd.DataFrame(
            {"a": [0, 1]},
            index=[pd.Timestamp("2019-01-01"), pd.Timestamp("2019-01-01")],
        )
        with pytest.raises(KeyError, match="not found in axis"):
            df.drop(labels)

    def test_drop_multiindex_not_lexsorted(self):
        # GH 11640
