BUG: ValueError: Given date string not likely a datetime when grouping by PeriodIndex level #34010

Closed
3 tasks done
TomAugspurger opened this issue May 5, 2020 · 2 comments · Fixed by #34049
Labels: Bug, Groupby, Period (Period data type), Regression (Functionality that used to work in a prior pandas version)
Milestone: Contributions Welcome

Comments

@TomAugspurger
Contributor

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

In [10]: t = pd.Series([1], index=pd.PeriodIndex(['2000'], name="A", freq="D"))

In [11]: t.groupby("A").size()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in get_loc(self, key, method, tolerance)
    593         try:
--> 594             return self._engine.get_loc(key)
    595         except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'A'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-11-ca2eac2f77ff> in <module>
----> 1 t.groupby("A").size()

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/series.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed)
   1685             group_keys=group_keys,
   1686             squeeze=squeeze,
-> 1687             observed=observed,
   1688         )
   1689

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated)
    407                 sort=sort,
    408                 observed=observed,
--> 409                 mutated=self.mutated,
    410             )
    411

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate)
    588
    589         elif is_in_axis(gpr):  # df.groupby('name')
--> 590             if gpr in obj:
    591                 if validate:
    592                     obj._check_label_or_level_ambiguity(gpr, axis=axis)

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/generic.py in __contains__(self, key)
   1848     def __contains__(self, key) -> bool_t:
   1849         """True if the key is in the info axis"""
-> 1850         return key in self._info_axis
   1851
   1852     @property

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in __contains__(self, key)
    384         else:
    385             try:
--> 386                 self.get_loc(key)
    387                 return True
    388             except (TypeError, KeyError):

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in get_loc(self, key, method, tolerance)
    598
    599             try:
--> 600                 asdt, parsed, reso = parse_time_string(key, self.freq)
    601                 key = asdt
    602             except TypeError:

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

ValueError: Given date string not likely a datetime.
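The traceback shows a two-stage failure: the first lookup raises KeyError, the fallback then tries to parse the key as a date and raises ValueError, and the `except (TypeError, KeyError)` clause in `__contains__` lets that ValueError escape. The mechanism can be sketched without pandas. Everything below is a hypothetical stand-in (`StrictLookupIndex`, `TolerantLookupIndex`, and the `int()` fallback are illustrative only, not pandas code), and the tolerant variant is only one possible remedy, not necessarily the change made in #34049.

```python
class StrictLookupIndex:
    """Hypothetical stand-in for an index whose lookup falls back to parsing."""

    def __init__(self, labels):
        self._positions = {label: i for i, label in enumerate(labels)}

    def get_loc(self, key):
        try:
            return self._positions[key]
        except KeyError:
            # Fallback: try to interpret the key as an integer year. This
            # loosely mirrors the string-parsing fallback in the traceback.
            # int("A") raises ValueError while the KeyError is being handled.
            return self._positions[int(key)]

    def __contains__(self, key):
        try:
            self.get_loc(key)
            return True
        except (TypeError, KeyError):
            # ValueError is not listed here, so a parse failure propagates
            # to the caller instead of meaning "not present".
            return False


class TolerantLookupIndex(StrictLookupIndex):
    """Same lookup, but a parse failure counts as 'key not present'."""

    def __contains__(self, key):
        try:
            self.get_loc(key)
            return True
        except (TypeError, KeyError, ValueError):
            return False


strict = StrictLookupIndex([2000])
tolerant = TolerantLookupIndex([2000])

try:
    "A" in strict
    strict_raised = False
except ValueError:
    strict_raised = True

tolerant_result = "A" in tolerant
```

With the strict class, a membership test on an unparseable key raises instead of returning False, which is the same shape as `gpr in obj` failing inside `get_grouper`.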

Problem description

With a regular Index, we see the following

In [13]: s = pd.Series([1], index=pd.Index(['a'], name='A'))

In [14]: s.groupby(level="A").size()
Out[14]:
A
a    1
dtype: int64

This seems to only affect PeriodIndex. DatetimeIndex works fine.
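For comparison, a minimal sketch of the same grouping on a DatetimeIndex, which is the case reported above as working. The variable names are illustrative, and the checks assume nothing beyond the behavior described in this report.

```python
import pandas as pd

# Same shape as the failing example, but with a DatetimeIndex named "A".
d = pd.Series([1], index=pd.DatetimeIndex(["2000-01-01"], name="A"))

# Group by the index level and count the rows in each group.
result = d.groupby(level="A").size()

# One group containing one row is expected.
assert len(result) == 1
assert result.iloc[0] == 1
```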

Expected Output

A
2000-01-01    1
Freq: D, dtype: int64
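The expected output can also be written as a check, sketched below under the assumption that it runs on a pandas version that is not affected by this regression (for example 0.25, or a release containing the fix). It covers both the level-name spelling and the `level=` spelling; the `expected` series is built by hand from the output shown above.

```python
import pandas as pd

t = pd.Series([1], index=pd.PeriodIndex(["2000"], name="A", freq="D"))

# Hand-built expectation: one daily period, 2000-01-01, with a count of 1.
expected = pd.Series(
    [1], index=pd.PeriodIndex(["2000-01-01"], name="A", freq="D")
)

# Grouping by the index name and by the level keyword should agree.
by_name = t.groupby("A").size()
by_level = t.groupby(level="A").size()

pd.testing.assert_series_equal(by_name, expected)
pd.testing.assert_series_equal(by_level, expected)
```

On an affected version, the `t.groupby("A")` line is expected to raise the ValueError from the traceback rather than reach the comparison.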
@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone May 5, 2020
@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label May 6, 2020
@jorisvandenbossche
Member

Tagged as a regression, since this works fine in 0.25, but is broken in 1.0.3

@simonjayhawkins
Member

simonjayhawkins commented May 7, 2020

> Tagged as a regression, since this works fine in 0.25, but is broken in 1.0.3

regression from #27909 (i.e. 1.0.0)

9cb5de0 is the first bad commit
commit 9cb5de0
Author: jbrockmendel
Date: Tue Sep 3 04:39:20 2019 -0700

CLN: Catch more specific exceptions in groupby (#27909)

cc @jbrockmendel
