Skip to content

Bug in Series.groupby would raise ValueError when grouping by PeriodIndex level #34049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 19, 2020
6 changes: 2 additions & 4 deletions pandas/core/groupby/grouper.py
Original file line number Diff line number Diff line change
Expand Up @@ -607,11 +607,9 @@ def is_in_axis(key) -> bool:

# if the grouper is obj[name]
def is_in_obj(gpr) -> bool:
if not hasattr(gpr, "name"):
return False
try:
return gpr is obj[gpr.name]
except (KeyError, IndexError):
return id(gpr) == id(obj[gpr.name])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just reverting the change from the original PR as opening gambit. plan to backport so either the old code or minimal changes works for me.

except Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't want to do this. I think the fix should rather be in Series.__contains__.

In [11]: s = pd.Series([1, 2, 3], index=pd.period_range('2000', periods=3, name='A'))

In [12]: "A" in s
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-6445c229fb58> in <module>
----> 1 "A" in s

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/generic.py in __contains__(self, key)
   1737     def __contains__(self, key) -> bool_t:
   1738         """True if the key is in the info axis"""
-> 1739         return key in self._info_axis
   1740
   1741     @property

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    353         @wraps(func)
    354         def wrapper(*args, **kwargs) -> Callable:
--> 355             return func(*args, **kwargs)
    356
    357         # collecting docstring and docstring templates

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in __contains__(self, key)
    332             hash(key)
    333             try:
--> 334                 self.get_loc(key)
    335                 return True
    336             except KeyError:

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in get_loc(self, key, method, tolerance)
    501
    502             try:
--> 503                 asdt, reso = parse_time_string(key, self.freq)
    504             except DateParseError as err:
    505                 # A string with invalid format

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

ValueError: Given date string not likely a datetime.

should raise a KeyError?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps the fix would be in Series.__getitem__.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the plan is to backport this, so hoping to keep changes minimal. (if not reverting)

added a TODO as suggested #34049 (comment) for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. longer-term i think Tom's approach is optimal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test this adds raises a ValueError here. instead of catching Exception, lets add ValueError and a comment about how it gets raised

return False

for i, (gpr, level) in enumerate(zip(keys, levels)):
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/groupby/test_function.py
Original file line number Diff line number Diff line change
Expand Up @@ -1307,6 +1307,14 @@ def test_size_groupby_all_null():
tm.assert_series_equal(result, expected)


def test_size_period_index():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this to the new test_size.py file

# https://github.com/pandas-dev/pandas/issues/34010
ser = Series([1], index=pd.PeriodIndex(["2000"], name="A", freq="D"))
grp = ser.groupby(level="A")
result = grp.size()
tm.assert_series_equal(result, ser)


# quantile
# --------------------------------

Expand Down