fix a indices bug for categorical-datetime columns #26860

alexifm · 2019-06-14T20:17:38Z

This is to fix a bug reported in #26859

- closes Groupby indices error with datetime categorical #26859
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry
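
For readers without the linked issue at hand, the failing pattern can be sketched roughly as follows. The column names and data are made up for illustration and are not taken from #26859; the snippet only records whether `.indices` succeeds on the installed pandas version and does not assume a particular exception type.

```python
import pandas as pd

# Hypothetical data: column "b" is a categorical whose categories are datetimes.
dates = pd.to_datetime(["2018-01-01", "2018-02-01", "2018-01-01"])
df = pd.DataFrame(
    {
        "a": ["x", "y", "x"],
        "b": pd.Categorical(dates),
        "value": [1, 2, 3],
    }
)

# Group on several columns, one of which is the datetime categorical.
grouped = df.groupby(["a", "b"], observed=True)

try:
    indices = grouped.indices
    outcome = "ok"
except Exception as exc:  # the exact exception type is deliberately not assumed
    indices = None
    outcome = "raised " + type(exc).__name__

print(outcome)
```

On a version that contains a fix, `indices` should be a dict mapping each group key to the row positions in that group.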

codecov · 2019-06-14T20:58:51Z

Codecov Report

Merging #26860 into master will decrease coverage by 1.41%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26860      +/-   ##
==========================================
- Coverage   91.88%   90.46%   -1.42%     
==========================================
  Files         179      179              
  Lines       50696    50697       +1     
==========================================
- Hits        46581    45865     -716     
- Misses       4115     4832     +717

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.46% <100%> (ø)` | ⬆️ |
| #single | `?` | |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/groupby/ops.py | `96% <100%> (ø)` | ⬆️ |
| pandas/core/computation/pytables.py | `62.5% <0%> (-27.75%)` | ⬇️ |
| pandas/io/pytables.py | `63.82% <0%> (-26.48%)` | ⬇️ |
| pandas/io/gbq.py | `88.88% <0%> (-11.12%)` | ⬇️ |
| pandas/core/computation/common.py | `84.21% <0%> (-5.27%)` | ⬇️ |
| pandas/core/computation/expr.py | `94.78% <0%> (-3.03%)` | ⬇️ |
| pandas/io/clipboard/clipboards.py | `31.88% <0%> (-2.9%)` | ⬇️ |
| pandas/io/formats/printing.py | `84.49% <0%> (-1.07%)` | ⬇️ |
| pandas/core/indexes/datetimes.py | `96.21% <0%> (-0.17%)` | ⬇️ |
| pandas/core/arrays/categorical.py | `95.8% <0%> (-0.13%)` | ⬇️ |
| ... and 2 more | | |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 430f0fd...6f8fdc0. Read the comment docs.

codecov · 2019-06-14T20:58:51Z

Codecov Report

❗ No coverage uploaded for pull request base (master@d91ffa6). Click here to learn what that means.
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #26860   +/-   ##
=========================================
  Coverage          ?   91.87%           
=========================================
  Files             ?      179           
  Lines             ?    50697           
  Branches          ?        0           
=========================================
  Hits              ?    46578           
  Misses            ?     4119           
  Partials          ?        0

| Flag | Coverage Δ |
|---|---|
| #multiple | `90.46% <100%> (?)` |
| #single | `41.11% <0%> (?)` |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/groupby/ops.py | `96% <100%> (ø)` |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d91ffa6...85b3b1a. Read the comment docs.

WillAyd · 2019-06-15T00:56:50Z

Can you add a test? That's typically the first thing we look for with PRs

Adds a test for a bug fix for DataFrameGroupby.indices in pandas-dev#26860

pep8speaks · 2019-06-15T02:04:53Z

Hello @alexifm! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-29 03:58:29 UTC

cleans up the test to adhere to pep8 formatting

alexifm · 2019-06-15T02:36:22Z

I need to find a better way to test the equality of the dictionaries in the output for 3.5.

The test no longer depends on ordering of a dictionary. Also, the test matches the timestamp/datetime outputs that are the current standard in the code.

Handle Py3.5 dict ordering issues. Cleanup for Pep8. No longer using numpy testing utility.
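
One way to compare two `indices`-style dictionaries without relying on key order or on a NumPy testing utility is sketched below. The helper name `indices_match` is hypothetical and is not the code from this PR; it assumes each value is a sequence of row positions (a list or a NumPy array).

```python
def indices_match(result, expected):
    """Return True if both mappings have the same keys and the same positions per key."""
    # Comparing key sets ignores whatever order the dictionaries were built in.
    if set(result) != set(expected):
        return False
    # list(...) lets NumPy arrays and plain lists be compared element by element.
    return all(list(result[key]) == list(expected[key]) for key in expected)


# The same content inserted in a different order still compares equal.
result = {("x", 1): [0, 2], ("y", 2): [1]}
expected = {("y", 2): [1], ("x", 1): [0, 2]}
print(indices_match(result, expected))
```

Converting the values with `list(...)` sidesteps the ambiguous truth value that a direct `==` on dictionaries holding multi-element arrays would raise.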

pandas/core/groupby/ops.py

pandas/tests/groupby/test_groupby.py

topper-123 · 2019-06-16T21:08:28Z

This should also have an entry in the whatsnew.

alexifm · 2019-06-17T02:17:58Z

Thanks for the feedback. Will update.

jreback · 2019-06-28T14:50:56Z

@alexifm can you merge master and update

jbrockmendel · 2019-08-28T17:30:09Z

@alexifm can you merge master and I'll take a look

alexifm · 2019-08-28T19:45:36Z

Updated it and addressed the comments in the review. Sorry it took so long.

TomAugspurger · 2019-08-28T20:47:18Z

doc/source/whatsnew/v0.25.2.rst

@@ -99,7 +99,7 @@ Other
 ^^^^^

 - Compatibility with Python 3.8 in :meth:`DataFrame.query` (:issue:`27261`)
-
+- Bug in :func:`get_indexer_dict` when passed keys are not numpy array. (:issue:`26860`)


I don't think this is a public function. Can you rephrase this to make sense for an end user?

And can you move the release note to the 1.0.0 whatsnew?

Would it make more sense to put it in terms of gb.indices which was where the problem originally came about?

TomAugspurger · 2019-08-28T20:51:01Z

pandas/core/sorting.py

@@ -305,6 +305,8 @@ def get_flattened_iterator(comp_ids, ngroups, levels, labels):

 def get_indexer_dict(label_list, keys):
    """ return a diction of {labels} -> {indexers} """
+    # address GH 26860
+    keys = [np.asarray(key) for key in keys]


What are the types on key here? Series, Index, Array?

I worry a bit about doing this on a DatetimeIndex with tz. That will emit a warning, since we're changing how we handle datetimes in np.asarray.

Honestly, I'm not all that sure what is going into get_indexer_dict which was why I put the fix under the indices property since it was more about fixing that particular input.
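
To make the question about key types concrete, a small sketch of what `np.asarray` returns for a few candidate inputs is given below. Which of these types actually reach `get_indexer_dict` is not established in this thread, so the list is an assumption chosen for illustration, and the resulting dtypes depend on the installed pandas version.

```python
import numpy as np
import pandas as pd

naive = pd.date_range("2019-01-01", periods=2)  # DatetimeIndex without a timezone
aware = naive.tz_localize("UTC")                # tz-aware DatetimeIndex

# Candidate key types (assumed for illustration only).
keys = [pd.Categorical(naive), naive, aware, pd.Series([1, 2])]

# The conversion proposed in the diff above, applied to each key.
converted = [np.asarray(key) for key in keys]

for key, arr in zip(keys, converted):
    print(type(key).__name__, "->", arr.dtype)
```

Comparing the printed dtype for the tz-aware index across pandas versions shows whether the conversion behaves as the reviewer was concerned it might.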

TomAugspurger · 2019-08-28T20:51:43Z

pandas/tests/groupby/test_groupby.py

+    ]),
+    ids=lambda cols: ",".join(cols)
+)
+def test_groupby_indices(gb_cols):


This test seems very complicated. I haven't gone through it yet, but I would appreciate at least one test as simple as the example from the issue.

Yea, I think it makes sense to simplify. My idea was that gb.indices fails under certain combinations of types of columns and I wanted to enumerate as many of the combinations as possible. The original iteration was inside the test but it was a mess. I can still do the iteration inside the test but in a much cleaner manner.

jbrockmendel · 2019-08-28T21:34:09Z

pandas/tests/groupby/test_groupby.py

-    ids=lambda cols: ",".join(cols)
-)
-def test_groupby_indices(gb_cols):
+def test_groupby_indices_output():


is parametrizing no longer viable?

Yea, I can revert it or find a middle ground so the parameterization isn't overkill. Thoughts on that?

I simplified the test and parametrized it.
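
A simplified, parametrized test along the lines discussed here might look like the sketch below. It is not the test from this PR: the frame, the column names, and the test name are hypothetical, and it assumes a pandas version in which `.indices` works for these groupers. The check is that every position listed under a key points at a row whose grouping columns equal that key.

```python
import pandas as pd
import pytest


def make_frame():
    # Hypothetical frame with a plain, a datetime, and a datetime-categorical column.
    dates = pd.to_datetime(["2018-01-01", "2018-02-01", "2018-01-01"])
    return pd.DataFrame(
        {"plain": ["x", "y", "x"], "dt": dates, "cat_dt": pd.Categorical(dates)}
    )


@pytest.mark.parametrize(
    "gb_cols", [["plain"], ["plain", "dt"], ["plain", "cat_dt"]]
)
def test_indices_point_at_matching_rows(gb_cols):
    df = make_frame()
    indices = df.groupby(gb_cols, observed=True).indices

    covered = []
    for key, positions in indices.items():
        # A single grouping column may produce scalar keys; normalise to tuples.
        key = key if isinstance(key, tuple) else (key,)
        rows = df.iloc[list(positions)]
        for col, expected_value in zip(gb_cols, key):
            assert (rows[col] == expected_value).all()
        covered.extend(int(pos) for pos in positions)

    # Every row of the frame belongs to exactly one group.
    assert sorted(covered) == list(range(len(df)))
```

Checking rows against keys avoids hard-coding an expected dictionary, so the test does not depend on dictionary ordering or on how timestamps are represented in the keys.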

pandas/tests/groupby/test_groupby.py

TomAugspurger

Sorry, there's a merge conflict in the whatsnew. Can you merge master and resolve that?

@jbrockmendel do you have thoughts here?

TomAugspurger · 2019-09-11T16:01:24Z

doc/source/whatsnew/v1.0.0.rst

@@ -176,7 +176,7 @@ Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^

 -
-
+- Bug in :meth:`DataFrameGroupBy.indices` raises exception when grouping on multiple columns and one is a categorical with datetime values. (:issue:`26860`)


"raises" -> "raising an"

"and one" -> "when one"

jbrockmendel · 2019-09-11T22:22:56Z

i think we need the test cases to include Categorical[datetimetz]
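
For reference, a Categorical[datetimetz] input of the kind requested here can be built as in the sketch below. The data and column names are hypothetical, and the snippet only records whether `.indices` succeeds rather than asserting a particular result.

```python
import pandas as pd

# A categorical whose categories are tz-aware datetimes.
aware = pd.date_range("2019-01-01", periods=3, tz="UTC")
cat = pd.Categorical(aware)
print(cat.categories.dtype)

df = pd.DataFrame({"a": ["x", "y", "x"], "b": cat})

try:
    tz_indices = df.groupby(["a", "b"], observed=True).indices
    tz_outcome = "ok"
except Exception as exc:  # exception type intentionally left open
    tz_indices = None
    tz_outcome = "raised " + type(exc).__name__

print(tz_outcome)
```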

jreback · 2019-10-06T23:41:02Z

can you merge master and update to comments

jreback · 2019-10-18T21:34:12Z

can you merge master

jbrockmendel · 2019-11-02T00:39:07Z

@alexifm can you rebase. this would be nice to get in

jreback · 2019-11-13T18:19:22Z

closing as stale, if you want to continue pls ping.

fix a indices bug for categorical-datetime columns

6f8fdc0

This is to fix a bug reported in pandas-dev#26859

WillAyd added the Groupby label Jun 15, 2019

test for DataFrameGroupby.indices

cfa0fef

Adds a test for a bug fix for DataFrameGroupby.indices in pandas-dev#26860

pep8 formatting for test

806f980

cleans up the test to adhere to pep8 formatting

alexifm added 2 commits June 14, 2019 20:14

Fix tests to handly Py3.5

27086ba

The test no longer depends on ordering of a dictionary. Also, the test matches the timestamp/datetime outputs that are the current standard in the code.

Fixing Py3.5 and other checks.

0d5385f

Handle Py3.5 dict ordering issues. Cleanup for Pep8. No longer using numpy testing utility.

jreback requested changes Jun 16, 2019

topper-123 added the Bug label Jun 16, 2019

topper-123 added this to the 0.25.0 milestone Jun 16, 2019

jreback removed this from the 0.25.0 milestone Jun 28, 2019

alexifm and others added 4 commits August 28, 2019 12:33

updated groupby indices test to address comments

3820a94

move test position

51bda3a

move bug fix to address comment

b6ba161

Merge branch 'master' into patch-1

f833ece

alexifm added 4 commits August 28, 2019 13:03

undo pep8 autoformatting

650a3ec

Merge remote-tracking branch 'origin/patch-1' into patch-1

88630ce

format fix

c926c06

add bug to what's new

39f394e

TomAugspurger reviewed Aug 28, 2019

provide simple test for original github issue; simplify full output t…

182de89

…est; address py3.5 issues

jbrockmendel reviewed Aug 28, 2019

jbrockmendel reviewed Aug 28, 2019

alexifm added 6 commits August 28, 2019 14:44

update what's new for 1.0

7b5b370

address comments on cleaning up test

c700c1a

parametrize test

5543e0d

fix imports

4ae7db8

import fix

abcfaff

black formatting

85b3b1a

TomAugspurger reviewed Sep 11, 2019

jreback closed this Nov 13, 2019
