-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Bug in Series.groupby would raise ValueError when grouping by PeriodIndex level #34049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in Series.groupby would raise ValueError when grouping by PeriodIndex level #34049
Conversation
pandas/core/groupby/grouper.py
Outdated
try: | ||
return gpr is obj[gpr.name] | ||
except (KeyError, IndexError): | ||
return id(gpr) == id(obj[gpr.name]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just reverting the change from the original PR as opening gambit. plan to backport so either the old code or minimal changes works for me.
pandas/core/groupby/grouper.py
Outdated
return gpr is obj[gpr.name] | ||
except (KeyError, IndexError): | ||
return id(gpr) == id(obj[gpr.name]) | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably don't want to do this. I think the fix should rather be in Series.__contains__
.
In [11]: s = pd.Series([1, 2, 3], index=pd.period_range('2000', periods=3, name='A'))
In [12]: "A" in s
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-6445c229fb58> in <module>
----> 1 "A" in s
~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/generic.py in __contains__(self, key)
1737 def __contains__(self, key) -> bool_t:
1738 """True if the key is in the info axis"""
-> 1739 return key in self._info_axis
1740
1741 @property
~/Envs/dask-dev/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
353 @wraps(func)
354 def wrapper(*args, **kwargs) -> Callable:
--> 355 return func(*args, **kwargs)
356
357 # collecting docstring and docstring templates
~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in __contains__(self, key)
332 hash(key)
333 try:
--> 334 self.get_loc(key)
335 return True
336 except KeyError:
~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in get_loc(self, key, method, tolerance)
501
502 try:
--> 503 asdt, reso = parse_time_string(key, self.freq)
504 except DateParseError as err:
505 # A string with invalid format
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()
ValueError: Given date string not likely a datetime.
should raise a KeyError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps the fix would be in Series.__getitem__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the plan is to backport this, so hoping to keep changes minimal. (if not reverting)
added a TODO as suggested #34049 (comment) for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense. longer-term i think Tom's approach is optimal
pandas/core/groupby/grouper.py
Outdated
return gpr is obj[gpr.name] | ||
except (KeyError, IndexError): | ||
return id(gpr) == id(obj[gpr.name]) | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test this adds raises a ValueError here. instead of catching Exception, lets add ValueError and a comment about how it gets raised
@@ -1307,6 +1307,14 @@ def test_size_groupby_all_null(): | |||
tm.assert_series_equal(result, expected) | |||
|
|||
|
|||
def test_size_period_index(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this to the new test_size.py file
doc/source/whatsnew/v1.1.0.rst
Outdated
@@ -721,6 +721,7 @@ Groupby/resample/rolling | |||
- Bug in :meth:`DataFrame.groupby` where a ``ValueError`` would be raised when grouping by a categorical column with read-only categories and ``sort=False`` (:issue:`33410`) | |||
- Bug in :meth:`GroupBy.first` and :meth:`GroupBy.last` where None is not preserved in object dtype (:issue:`32800`) | |||
- Bug in :meth:`Rolling.min` and :meth:`Rolling.max`: Growing memory usage after multiple calls when using a fixed window (:issue:`30726`) | |||
- Bug in :meth:`Series.groupby` would raise ValueError when grouping by PeriodIndex level (:issue:`34010`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use double backticks on ValueError and PeriodIndex
I think all comments addressed. will merge tomorrow if no further comments. |
Opened #34240 for the followup. LGTM. |
…ould raise ValueError when grouping by PeriodIndex level)
… ValueError when grouping by PeriodIndex level) (#34247)
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff