Skip to content

Bug in Series.groupby would raise ValueError when grouping by PeriodIndex level #34049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 19, 2020

Conversation

simonjayhawkins
Copy link
Member

@simonjayhawkins simonjayhawkins added Groupby Period Period data type labels May 7, 2020
try:
return gpr is obj[gpr.name]
except (KeyError, IndexError):
return id(gpr) == id(obj[gpr.name])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just reverting the change from the original PR as opening gambit. plan to backport so either the old code or minimal changes works for me.

return gpr is obj[gpr.name]
except (KeyError, IndexError):
return id(gpr) == id(obj[gpr.name])
except Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't want to do this. I think the fix should rather be in Series.__contains__.

In [11]: s = pd.Series([1, 2, 3], index=pd.period_range('2000', periods=3, name='A'))

In [12]: "A" in s
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-6445c229fb58> in <module>
----> 1 "A" in s

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/generic.py in __contains__(self, key)
   1737     def __contains__(self, key) -> bool_t:
   1738         """True if the key is in the info axis"""
-> 1739         return key in self._info_axis
   1740
   1741     @property

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    353         @wraps(func)
    354         def wrapper(*args, **kwargs) -> Callable:
--> 355             return func(*args, **kwargs)
    356
    357         # collecting docstring and docstring templates

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in __contains__(self, key)
    332             hash(key)
    333             try:
--> 334                 self.get_loc(key)
    335                 return True
    336             except KeyError:

~/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/indexes/period.py in get_loc(self, key, method, tolerance)
    501
    502             try:
--> 503                 asdt, reso = parse_time_string(key, self.freq)
    504             except DateParseError as err:
    505                 # A string with invalid format

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()

ValueError: Given date string not likely a datetime.

should raise a KeyError?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps the fix would be in Series.__getitem__.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the plan is to backport this, so hoping to keep changes minimal. (if not reverting)

added a TODO as suggested #34049 (comment) for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. longer-term i think Tom's approach is optimal

return gpr is obj[gpr.name]
except (KeyError, IndexError):
return id(gpr) == id(obj[gpr.name])
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test this adds raises a ValueError here. instead of catching Exception, lets add ValueError and a comment about how it gets raised

@simonjayhawkins simonjayhawkins changed the title Revert part "CLN: Catch more specific exceptions in groupby (#27909)" Bug in Series.groupby would raise ValueError when grouping by PeriodIndex level May 7, 2020
@jreback jreback added this to the 1.1 milestone May 9, 2020
@@ -1307,6 +1307,14 @@ def test_size_groupby_all_null():
tm.assert_series_equal(result, expected)


def test_size_period_index():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this to the new test_size.py file

@@ -721,6 +721,7 @@ Groupby/resample/rolling
- Bug in :meth:`DataFrame.groupby` where a ``ValueError`` would be raised when grouping by a categorical column with read-only categories and ``sort=False`` (:issue:`33410`)
- Bug in :meth:`GroupBy.first` and :meth:`GroupBy.last` where None is not preserved in object dtype (:issue:`32800`)
- Bug in :meth:`Rolling.min` and :meth:`Rolling.max`: Growing memory usage after multiple calls when using a fixed window (:issue:`30726`)
- Bug in :meth:`Series.groupby` would raise ValueError when grouping by PeriodIndex level (:issue:`34010`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use double backticks on ValueError and PeriodIndex

@simonjayhawkins
Copy link
Member Author

I think all comments addressed. will merge tomorrow if no further comments.

@TomAugspurger
Copy link
Contributor

Opened #34240 for the followup. LGTM.

@simonjayhawkins simonjayhawkins merged commit 8cd8ed3 into pandas-dev:master May 19, 2020
@simonjayhawkins simonjayhawkins deleted the period-regression branch May 19, 2020 08:45
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request May 19, 2020
…ould raise ValueError when grouping by PeriodIndex level)
simonjayhawkins added a commit that referenced this pull request May 19, 2020
… ValueError when grouping by PeriodIndex level) (#34247)
@simonjayhawkins simonjayhawkins modified the milestones: 1.1, 1.0.4 May 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: ValueError: Given date string not likely a datetime when grouping by PeriodIndex level
4 participants