DataFrame.drop with empty DatetimeIndex behaves differently than passing empty Index #27994

Closed
katsuya-horiuchi opened this issue Aug 18, 2019 · 10 comments
Labels
DataFrame (DataFrame data structure), good first issue, Needs Tests (Unit test(s) needed to prevent regressions)

Comments

@katsuya-horiuchi
katsuya-horiuchi commented Aug 18, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame(
    {'value': [1, 2]},
    index=[pd.Timestamp('2019-01-01'), pd.Timestamp('2019-01-01')]
)
df = df.drop(df[df['value'] > 10].index)

Problem description

When calling drop on a DataFrame with a DatetimeIndex, the function may raise an exception. See the following traceback:

Traceback (most recent call last):
  File "main.py", line 7, in <module>
    df = df.drop(df[df['value'] > 10].index)
  File "pandas/pandas/core/frame.py", line 4042, in drop
    errors=errors,
  File "pandas/pandas/core/generic.py", line 3891, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "pandas/pandas/core/generic.py", line 3940, in _drop_axis
    labels_missing = (axis.get_indexer_for(labels) == -1).any()
  File "pandas/pandas/core/indexes/base.py", line 4769, in get_indexer_for
    indexer, _ = self.get_indexer_non_unique(target, **kwargs)
  File "pandas/pandas/core/indexes/base.py", line 4752, in get_indexer_non_unique
    indexer, missing = self._engine.get_indexer_non_unique(tgt_values)
  File "pandas/_libs/index.pyx", line 291, in pandas._libs.index.IndexEngine.get_indexer_non_unique
    stargets = set(targets)
TypeError: 'NoneType' object is not iterable

This can be avoided by not using DataFrame.drop like this:

 df = df[df['value'] <= 10]

But nonetheless I'm reporting the issue as I expect drop to not raise the exception.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 9f93d57
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.0-5-amd64
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.0.dev0+1184.g9f93d5702
numpy : 1.16.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.1.0
Cython : 0.29.13
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

@mroeschke
Member

Your example is equivalent to df.drop([]). Our documentation states that we should be raising a KeyError instead in this case, since [] is not found in the axis: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

@mroeschke added the Error Reporting (Incorrect or improved errors from pandas) label Aug 18, 2019
@katsuya-horiuchi
Author

Thank you @mroeschke

My initial thought was that the issue is caused by duplicated values in the index (i.e. pd.Timestamp('2019-01-01')).
However, the following works despite the duplication in the index:

df = pd.DataFrame(
    {'value': [1, 2]},
    index=[0, 0]
)
df = df.drop(df[df['value'] > 10].index)

So I'm wondering if this difference was intentional.

@mroeschke changed the title from "Cannot use DataFrame.drop for DatetimeIndex" to "DataFrame.drop with empty DatetimeIndex behaves differently than passing empty Index" Aug 19, 2019
@mroeschke
Member

Ah, I see. Thanks for the counterexample. I agree, then: I would expect passing an empty DatetimeIndex to operate the same as passing an empty Index.
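
For reference, a short sketch of what the two filters in the examples above hand to drop. The variable names here are illustrative only; the point is that the same empty selection arrives as a different index type depending on the frame's index.

```python
import pandas as pd

# Frame with a duplicated datetime index (the failing case in this report).
dt_df = pd.DataFrame(
    {'value': [1, 2]},
    index=[pd.Timestamp('2019-01-01'), pd.Timestamp('2019-01-01')]
)
# Frame with a duplicated integer index (the counterexample).
int_df = pd.DataFrame({'value': [1, 2]}, index=[0, 0])

# No row has value > 10, so both selections are empty.
dt_labels = dt_df[dt_df['value'] > 10].index
int_labels = int_df[int_df['value'] > 10].index

# Same length (zero), different index classes.
print(type(dt_labels).__name__, len(dt_labels))
print(type(int_labels).__name__, len(int_labels))
```

The exact class name printed for the integer case may vary between pandas versions, so no output is shown here.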

@mroeschke added the Bug and DataFrame (DataFrame data structure) labels and removed the Error Reporting (Incorrect or improved errors from pandas) label Aug 19, 2019
@katsuya-horiuchi
Author

Interestingly, I got a TypeError when trying df.drop([]). The following is the traceback.

Traceback (most recent call last):
  File "main.py", line 10, in <module>
    print(df.drop([]))
  File "pandas/pandas/core/frame.py", line 4042, in drop
    errors=errors,
  File "pandas/pandas/core/generic.py", line 3893, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "pandas/pandas/core/generic.py", line 3942, in _drop_axis
    labels_missing = (axis.get_indexer_for(labels) == -1).any()
  File "pandas/pandas/core/indexes/base.py", line 4769, in get_indexer_for
    indexer, _ = self.get_indexer_non_unique(target, **kwargs)
  File "pandas/pandas/core/indexes/base.py", line 4752, in get_indexer_non_unique
    indexer, missing = self._engine.get_indexer_non_unique(tgt_values)
  File "pandas/_libs/index.pyx", line 291, in pandas._libs.index.IndexEngine.get_indexer_non_unique
    stargets = set(targets)
TypeError: 'NoneType' object is not iterable

With that said, would it make sense to not raise these exceptions when passing an empty list/index to DataFrame.drop()?

If so, I might be able to contribute.
The change I'm thinking would align with my expectation: df = df.drop(df[df['value'] > 10].index) would behave the same way as df = df[df['value'] <= 10]
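
Until such a change lands, one way to get that behavior is to guard the empty case before calling drop. This is only a minimal sketch: `drop_labels` is a hypothetical helper name, not part of pandas, and it is not the proposed fix itself.

```python
import pandas as pd


def drop_labels(df, labels):
    """Hypothetical helper: treat an empty label collection as a no-op."""
    if len(labels) == 0:
        # Nothing to drop, so return a copy without calling DataFrame.drop.
        return df.copy()
    return df.drop(labels)


df = pd.DataFrame(
    {'value': [1, 2]},
    index=[pd.Timestamp('2019-01-01'), pd.Timestamp('2019-01-01')]
)

# Empty selection: the guarded drop and the boolean mask keep the same rows.
dropped = drop_labels(df, df[df['value'] > 10].index)
masked = df[df['value'] <= 10]
print(dropped.equals(masked))
```

Note that drop is label-based: with a duplicated index, a non-empty selection removes every row sharing a dropped label, so the two forms only coincide when the labels identify rows uniquely or, as here, when nothing matches.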

@mroeschke
Member

Agreed with your conclusion, I think DataFrame.drop([]) should just be a no-op like you demonstrated in #27994 (comment)

@mroeschke
Member

This looks to work on master now. Could use a test

In [9]: df = pd.DataFrame(
   ...:     {'value': [1, 2]},
   ...:     index=[pd.Timestamp('2019-01-01'), pd.Timestamp('2019-01-01')]
   ...: )

In [10]: df = df.drop(df[df['value'] > 10].index)
In [12]: df
Out[12]:
            value
2019-01-01      1
2019-01-01      2
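
A regression test for this could look roughly like the sketch below. The function name and the set of empty inputs are assumptions made for illustration, not the test from any actual pull request, and it relies on a pandas version where the call no longer raises.

```python
import pandas as pd
import pandas.testing as tm


def test_drop_empty_labels_is_noop():
    # Duplicated datetime index, as in the original report.
    df = pd.DataFrame(
        {'value': [1, 2]},
        index=[pd.Timestamp('2019-01-01'), pd.Timestamp('2019-01-01')]
    )
    # Each empty list-like should leave the frame unchanged.
    for empty in ([], pd.Index([]), pd.DatetimeIndex([])):
        result = df.drop(empty)
        tm.assert_frame_equal(result, df)
    return True
```

Under pytest the function would be collected by its `test_` prefix; the `return True` is there only so it can also be called directly.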

@mroeschke added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels and removed the Bug label Jul 11, 2021
@ghost
ghost commented Jul 21, 2021

@mroeschke I'm looking for an issue to pick up for my first contribution and was wondering if I could take this one.

@mroeschke
Member

Yes go for it @mikephung122.

@ghost
ghost commented Jul 27, 2021

@mroeschke I submitted a Pull Request (#42746) for this issue, could you take a look when you get the chance?

@jreback jreback added this to the 1.4 milestone Jul 28, 2021
@jreback jreback modified the milestones: 1.4, Contributions Welcome Jan 8, 2022
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@phofl
Member

phofl commented Nov 16, 2022

fixed in #42746

@phofl phofl closed this as completed Nov 16, 2022