
BUG: Datetime MultiIndex Regression #35858


Closed
3 tasks done
matthewgilbert opened this issue Aug 22, 2020 · 5 comments · Fixed by #36675
Labels: Bug; Datetime (Datetime data dtype); Indexing (Related to indexing on series/frames, not to indexes themselves); MultiIndex; Regression (Functionality that used to work in a prior pandas version)

@matthewgilbert
Contributor

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas

date = pandas.Timestamp("2000")
x = pandas.DataFrame([
    ["a", date, 1],
], columns=["a", "b", "c"]).set_index(["a", "b"])["c"]
x.loc[:, [date]]


InvalidIndexError: [Timestamp('2000-01-01 00:00:00')]

Problem description

The problem stems from pandas.DatetimeIndex.get_loc raising an InvalidIndexError in 1.* instead of a TypeError, as it did in 0.25.3.

0.25.3

import pandas
level_index = pandas.DatetimeIndex(['2000-01-01'], dtype='datetime64[ns]', name='b', freq=None)
key = [pandas.Timestamp('2000-01-01 00:00:00')]
level_index.get_loc(key)

...
TypeError: Cannot convert input [[Timestamp('2000-01-01 00:00:00')]] of type <class 'list'> to Timestamp

1.1.0

import pandas
level_index = pandas.DatetimeIndex(['2000-01-01'], dtype='datetime64[ns]', name='b', freq=None)
key = [pandas.Timestamp('2000-01-01 00:00:00')]
level_index.get_loc(key)

...
InvalidIndexError: [Timestamp('2000-01-01 00:00:00')]

Previously the TypeError was handled by

except TypeError:
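
To illustrate why the change of exception type matters, here is a minimal pure-Python sketch. It does not use pandas source code: `InvalidIndexErrorStandIn`, `get_loc_old`, `get_loc_new`, and `lookup_with_fallback` are hypothetical names, and the per-element fallback is an assumption made only for the example. It shows that an `except TypeError:` clause stops catching the failure once the raised exception no longer derives from `TypeError`.

```python
# Stand-in classes and functions: hypothetical names, not pandas internals.
class InvalidIndexErrorStandIn(Exception):
    """Mimics an exception type that does not inherit from TypeError."""


def get_loc_old(key):
    # 0.25.3-style behaviour: a list key fails with TypeError.
    if isinstance(key, list):
        raise TypeError(f"Cannot convert input {key!r} to Timestamp")
    return 0


def get_loc_new(key):
    # 1.1.0-style behaviour: a list key is rejected with a different exception.
    if isinstance(key, list):
        raise InvalidIndexErrorStandIn(key)
    return 0


def lookup_with_fallback(get_loc, key):
    """Try a direct lookup; fall back to per-element lookup on TypeError."""
    try:
        return get_loc(key)
    except TypeError:
        # Assumed fallback for list-like keys (illustrative only).
        return [get_loc(k) for k in key]


# With the old behaviour the handler catches the error and the fallback runs.
print(lookup_with_fallback(get_loc_old, ["2000-01-01"]))

# With the new behaviour the exception bypasses the handler entirely.
try:
    lookup_with_fallback(get_loc_new, ["2000-01-01"])
except InvalidIndexErrorStandIn as err:
    print("escaped the handler:", type(err).__name__)
```

A handler written as `except (TypeError, InvalidIndexErrorStandIn):` would restore the fallback in this sketch; whether the same approach is right for pandas itself is a separate question.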

Expected Output

I would expect this to behave as it did in 0.25.3, where this is valid indexing syntax.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.6.11.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.16.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0.post20200814
Cython : 0.29.21
pytest : 5.3.5
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.1
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.12.0-RAY
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.1
sqlalchemy : 1.3.19
tables : 3.5.1
tabulate : None
xarray : 0.15.0
xlrd : None
xlwt : None
numba : None

@matthewgilbert matthewgilbert added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Aug 22, 2020
@matthewgilbert
Contributor Author

Fixing this seems like a one-liner, which I implemented here: master...matthewgilbert:master.

The tests continue to pass, but I am unsure about a few things. Any guidance on the following from someone more familiar with the code base would be appreciated:

  • Where would be the best place to test for this behavior? It seems like this is related to _handle_lowerdim_multi_index_axis0 not properly handling the error; however, there is no test coverage for this function and it is a private method.
  • Is the comment # slices are unhashable still accurate with this change?
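
One way to cover the behavior without touching the private method is to test it through the public `.loc` API. The following is only a sketch: the test name is hypothetical, the file it would live in is left open, and it assumes a pandas version in which this indexing works (it is expected to raise InvalidIndexError on the affected 1.1.x releases).

```python
import pandas as pd
import pandas.testing as tm


def test_loc_list_key_on_datetime_level():
    # Hypothetical regression test for the report above (GH 35858).
    date = pd.Timestamp("2000")
    ser = pd.Series(
        1,
        index=pd.MultiIndex.from_tuples([("a", date)], names=["a", "b"]),
        name="c",
    )

    # A list key on the datetime level should select the matching row
    # instead of raising.
    result = ser.loc[:, [date]]

    # The Series has a single row, so the selection equals the Series itself.
    tm.assert_series_equal(result, ser)


test_loc_list_key_on_datetime_level()
```

Going through `.loc` exercises the same code path as the original report while keeping the test independent of how the private helper is implemented.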

@matthewgilbert matthewgilbert changed the title from "BUG:" to "BUG: Datetime MultiIndex Regression in 1.*" Aug 22, 2020
@matthewgilbert matthewgilbert changed the title from "BUG: Datetime MultiIndex Regression in 1.*" to "BUG: Datetime MultiIndex Regression" Aug 22, 2020
@simonjayhawkins simonjayhawkins added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Regression (Functionality that used to work in a prior pandas version) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Sep 3, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.2 milestone Sep 3, 2020
@simonjayhawkins
Member

Thanks @matthewgilbert for the report. This was giving the correct output in 1.0.5

>>> import pandas
>>>
>>> date = pandas.Timestamp("2000")
>>> x = pandas.DataFrame([
...     ["a", date, 1],
... ], columns=["a", "b", "c"]).set_index(["a", "b"])["c"]
>>> x.loc[:, [date]]
a  b
a  2000-01-01    1
Name: c, dtype: int64
>>>
>>> pandas.__version__
'1.0.5'
>>>

@simonjayhawkins simonjayhawkins added Datetime Datetime data dtype MultiIndex labels Sep 3, 2020
@simonjayhawkins simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 7, 2020
@simonjayhawkins
Member

moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Sep 7, 2020
@simonjayhawkins
Member

first bad commit: [0b0cd08] BUG: DatetimeIndex.get_loc/get_value raise InvalidIndexError (#31257) cc @jbrockmendel

https://github.com/simonjayhawkins/pandas/runs/1081704526?check_suite_focus=true

@simonjayhawkins
Member

Thanks @matthewgilbert for the report.

just to be explicit about the changes in behaviour for other readers.

>>> pd.__version__
'1.0.5'
>>>
>>> date = pandas.Timestamp("2000")
>>> x = pandas.DataFrame([["a", date, 1],], columns=["a", "b", "c"]).set_index(
...     ["a", "b"]
... )["c"]
>>> x.loc[:, [date]]
a  b
a  2000-01-01    1
Name: c, dtype: int64
>>>
>>> pd.__version__
'1.1.2'
>>>
>>> date = pandas.Timestamp("2000")
>>> x = pandas.DataFrame([["a", date, 1],], columns=["a", "b", "c"]).set_index(
...     ["a", "b"]
... )["c"]
>>> x.loc[:, [date]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\indexing.py", line 873, in __getitem__
    return self._getitem_tuple(key)
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\indexing.py", line 1044, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\indexing.py", line 766, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\indexing.py", line 826, in _getitem_nested_tuple
    result = self._handle_lowerdim_multi_index_axis0(tup)
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\indexing.py", line 1066, in _handle_lowerdim_multi_index_axis0
    return self._get_label(tup, axis=axis)
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\indexing.py", line 1059, in _get_label
    return self.obj.xs(label, axis=axis)
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\generic.py", line 3486, in xs
    loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\indexes\multi.py", line 2859, in get_loc_level
    k = self._get_level_indexer(k, level=i)
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\indexes\multi.py", line 2966, in _get_level_indexer
    code = self._get_loc_single_level_index(level_index, key)
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\indexes\multi.py", line 2634, in _get_loc_single_level_index
    return level_index.get_loc(key)
  File "C:\Users\simon\Anaconda3\envs\pandas-1.1.2\lib\site-packages\pandas\core\indexes\datetimes.py", line 586, in get_loc
    raise InvalidIndexError(key)
pandas.errors.InvalidIndexError: [Timestamp('2000-01-01 00:00:00')]
>>>
