-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
fix a indices bug for categorical-datetime columns #26860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is to fix a bug reported in pandas-dev#26859
Codecov Report
@@ Coverage Diff @@
## master #26860 +/- ##
==========================================
- Coverage 91.88% 90.46% -1.42%
==========================================
Files 179 179
Lines 50696 50697 +1
==========================================
- Hits 46581 45865 -716
- Misses 4115 4832 +717
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26860 +/- ##
=========================================
Coverage ? 91.87%
=========================================
Files ? 179
Lines ? 50697
Branches ? 0
=========================================
Hits ? 46578
Misses ? 4119
Partials ? 0
Continue to review full report at Codecov.
|
Can you add a test? That's typically the first thing we look for with PRs |
Adds a test for a bug fix for DataFrameGroupby.indices in pandas-dev#26860
cleans up the test to adhere to pep8 formatting
I need to find a better way to test to equality of the dictionaries in the output for 3.5. |
The test no longer depends on ordering of a dictionary. Also, the test matches the timestamp/datetime outputs that are the current standard in the code.
Handle Py3.5 dict ordering issues. Cleanup for Pep8. No longer using numpy testing utility.
This should also have a entry in the whatsnew. |
Thanks for the feedback. Will update. |
@alexifm can you merge master and update |
@alexifm can you merge master and ill take a look |
Updated it and addressed the comments in the review. Sorry it took so long. |
doc/source/whatsnew/v0.25.2.rst
Outdated
@@ -99,7 +99,7 @@ Other | |||
^^^^^ | |||
|
|||
- Compatibility with Python 3.8 in :meth:`DataFrame.query` (:issue:`27261`) | |||
- | |||
- Bug in :func:`get_indexer_dict` when passed keys are not numpy array. (:issue:`26860`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a public function. Can you rephrase this to make sense for an end user?
And can you move the release note to the 1.0.0 whatsenew?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to put it in terms of gb.indices
which was where the problem originally came about?
@@ -305,6 +305,8 @@ def get_flattened_iterator(comp_ids, ngroups, levels, labels): | |||
|
|||
def get_indexer_dict(label_list, keys): | |||
""" return a diction of {labels} -> {indexers} """ | |||
# address GH 26860 | |||
keys = [np.asarray(key) for key in keys] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the types on key
here? Series, Index, Array?
I worry a bit about doing this on a DatetimeIndex with tz. That will emit a warning, since we're changing how we handle datetimes in np.asarray
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I'm not all that sure what is going into get_indexer_dict
which was why I put the fix under the indices
property since it was more about fixing that particular input.
pandas/tests/groupby/test_groupby.py
Outdated
]), | ||
ids=lambda cols: ",".join(cols) | ||
) | ||
def test_groupby_indices(gb_cols): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems very complicated. I haven't gone through it yet, but I would appreciate at least one test as simple as the example from the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I think it makes sense to simplify. My idea was that gb.indices
fails under certain combinations of types of columns and I wanted to enumerate as many of the combinations as possible. The original iteration was inside the test but it was a mess. I can still do the iteration inside the test but in a much cleaner manner.
…est; address py3.5 issues
pandas/tests/groupby/test_groupby.py
Outdated
ids=lambda cols: ",".join(cols) | ||
) | ||
def test_groupby_indices(gb_cols): | ||
def test_groupby_indices_output(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is parametrizing no longer viable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I can revert it or find a middle ground so the parameterization isn't overkill. Thoughts on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified the test and parametrized it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, there's a merge conflict in the whatsnew. Can you merge master and resolve that?
@jbrockmendel do you have thoughts here?
@@ -176,7 +176,7 @@ Groupby/resample/rolling | |||
^^^^^^^^^^^^^^^^^^^^^^^^ | |||
|
|||
- | |||
- | |||
- Bug in :meth:`DataFrameGroupBy.indices` raises exception when grouping on multiple columns and one is a categorical with datetime values. (:issue:`26860`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"raises" -> "raising an"
"and one" -> "when one"
i think we need the test cases to include Categorical[datetimetz] |
can you merge master and update to comments |
can you merge master |
@alexifm can you rebase. this would be nice to get in |
closing as stale, if you want to continue pls ping. |
This is to fix a bug reported in #26859
git diff upstream/master -u -- "*.py" | flake8 --diff