-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Bug in Series.groupby would raise ValueError when grouping by PeriodIndex level #34049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
857e576
086cb47
f3c16fc
0f5fb58
0afc3b4
af642d3
b5f4dfd
effc7c5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -607,11 +607,9 @@ def is_in_axis(key) -> bool: | |
|
||
# if the grouper is obj[name] | ||
def is_in_obj(gpr) -> bool: | ||
if not hasattr(gpr, "name"): | ||
return False | ||
try: | ||
return gpr is obj[gpr.name] | ||
except (KeyError, IndexError): | ||
return id(gpr) == id(obj[gpr.name]) | ||
except Exception: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably don't want to do this. I think the fix should rather be in In [11]: s = pd.Series([1, 2, 3], index=pd.period_range('2000', periods=3, name='A'))
In [12]: "A" in s
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-6445c229fb58> in <module>
----> 1 "A" in s
~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/generic.py in __contains__(self, key)
1737 def __contains__(self, key) -> bool_t:
1738 """True if the key is in the info axis"""
-> 1739 return key in self._info_axis
1740
1741 @property
~/Envs/dask-dev/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
353 @wraps(func)
354 def wrapper(*args, **kwargs) -> Callable:
--> 355 return func(*args, **kwargs)
356
357 # collecting docstring and docstring templates
~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in __contains__(self, key)
332 hash(key)
333 try:
--> 334 self.get_loc(key)
335 return True
336 except KeyError:
~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in get_loc(self, key, method, tolerance)
501
502 try:
--> 503 asdt, reso = parse_time_string(key, self.freq)
504 except DateParseError as err:
505 # A string with invalid format
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()
ValueError: Given date string not likely a datetime. should raise a KeyError? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or perhaps the fix would be in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the plan is to backport this, so hoping to keep changes minimal. (if not reverting) added a TODO as suggested #34049 (comment) for now There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. make sense. longer-term i think Tom's approach is optimal There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the test this adds raises a ValueError here. instead of catching Exception, lets add ValueError and a comment about how it gets raised |
||
return False | ||
|
||
for i, (gpr, level) in enumerate(zip(keys, levels)): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1307,6 +1307,14 @@ def test_size_groupby_all_null(): | |
tm.assert_series_equal(result, expected) | ||
|
||
|
||
def test_size_period_index(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you move this to the new test_size.py file |
||
# https://github.com/pandas-dev/pandas/issues/34010 | ||
ser = Series([1], index=pd.PeriodIndex(["2000"], name="A", freq="D")) | ||
grp = ser.groupby(level="A") | ||
result = grp.size() | ||
tm.assert_series_equal(result, ser) | ||
|
||
|
||
# quantile | ||
# -------------------------------- | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just reverting the change from the original PR as opening gambit. plan to backport so either the old code or minimal changes works for me.