Fix groupby().count() for datetime columns #18167
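For context, a minimal reproduction sketched from the PR title and the test added below (data values are illustrative, not taken from the original issue):

```python
import pandas as pd

# Illustrative frame: a grouping key plus a datetime column.
df = pd.DataFrame({
    'x': ['a', 'a', 'b'],
    'y': pd.to_datetime(['2017-05-01', '2017-05-02', '2017-05-03']),
})

# Before this fix, the block-wise count inside groupby mishandled
# datetime columns; with the fix it returns per-group non-null counts.
result = df.groupby('x').count()
print(result)
```

The expected result counts two non-null `y` values for group `a` and one for `b`.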


Merged: 1 commit merged into pandas-dev:master from watercrossing:groupbyCountFix on Nov 8, 2017

Conversation

@watercrossing (Contributor) commented Nov 8, 2017

@watercrossing (Contributor, Author):

In which whatsnew file should the whatsnew entry go?

@@ -195,3 +195,13 @@ def test_ngroup_respects_groupby_order(self):
g.ngroup())
assert_series_equal(Series(df['group_index'].values),
g.cumcount())

def test_count_with_datetime(self):
df = DataFrame({'x': ['a', 'a', 'b'],
Contributor:
add gh issue number as a comment

@jreback (Contributor) left a comment:

you can put in 0.21.1 bug fixes

-        val = ((mask & ~isna(blk.get_values())) for blk in data.blocks)
+        makeNDarray = lambda vals: vals[None, :] if vals.ndim == 1 else vals
+
+        val = ((mask & ~isna(makeNDarray(blk.get_values())))
Contributor:

just use np.atleast_2d
no need for separate mask
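The suggested simplification: `np.atleast_2d` already does what the hand-rolled `makeNDarray` lambda does, prepending an axis to 1-d input and passing 2-d input through unchanged. A quick sketch of the equivalence:

```python
import numpy as np

# The lambda from the diff above, reproduced for comparison.
make_ndarray = lambda vals: vals[None, :] if vals.ndim == 1 else vals

a1 = np.arange(3)                # 1-d, shape (3,)
a2 = np.arange(6).reshape(2, 3)  # already 2-d

# np.atleast_2d matches the lambda on both cases.
assert np.atleast_2d(a1).shape == make_ndarray(a1).shape == (1, 3)
assert np.atleast_2d(a2).shape == make_ndarray(a2).shape == (2, 3)
print(np.atleast_2d(a1))  # [[0 1 2]]
```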

@jreback added the Bug, Dtype Conversions (Unexpected or buggy dtype conversions), and Groupby labels Nov 8, 2017
codecov bot commented Nov 8, 2017

Codecov Report

Merging #18167 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #18167      +/-   ##
==========================================
+ Coverage    91.4%   91.41%   +<.01%     
==========================================
  Files         163      163              
  Lines       50073    50074       +1     
==========================================
+ Hits        45769    45773       +4     
+ Misses       4304     4301       -3
Flag        Coverage         Δ
#multiple   89.21% <100%>    (+0.02%) ⬆️
#single     40.32% <0%>      (-0.07%) ⬇️

Impacted Files                  Coverage         Δ
pandas/core/groupby.py          92.04% <100%>    (ø) ⬆️
pandas/io/gbq.py                25% <0%>         (-58.34%) ⬇️
pandas/core/frame.py            97.8% <0%>       (-0.1%) ⬇️
pandas/plotting/_converter.py   65.2% <0%>       (+1.81%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 93c755e...073f22d.

codecov bot commented Nov 8, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@5350330).
The diff coverage is 100%.


@@            Coverage Diff            @@
##             master   #18167   +/-   ##
=========================================
  Coverage          ?   91.38%           
=========================================
  Files             ?      163           
  Lines             ?    50068           
  Branches          ?        0           
=========================================
  Hits              ?    45754           
  Misses            ?     4314           
  Partials          ?        0
Flag        Coverage         Δ
#multiple   89.19% <100%>    (?)
#single     40.33% <0%>      (?)

Impacted Files           Coverage         Δ
pandas/core/groupby.py   92.04% <100%>    (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5350330...9eb429e.

@@ -195,3 +195,14 @@ def test_ngroup_respects_groupby_order(self):
g.ngroup())
assert_series_equal(Series(df['group_index'].values),
g.cumcount())

def test_count_with_datetime(self):
Contributor:

can u parametrize to count over tz naive (and aware);
also over Timedeltas and Period would be great
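A sketch of what the requested parametrization could look like, assuming pytest and the usual pandas imports (the exact values and the module layout are illustrative, not the PR's final code):

```python
import pytest
from pandas import DataFrame, Period, Timedelta, Timestamp


# Hypothetical parametrization over datetimelike dtypes, per the review
# request: tz-naive and tz-aware timestamps, timedeltas, and periods.
# Each case repeats its first value so groups 'a' and 'b' count 2 and 1.
@pytest.mark.parametrize('values', [
    [Timestamp('2016-05-07 20:09:25'),
     Timestamp('2016-05-07 20:09:25'),
     Timestamp('2016-05-07 20:09:29')],
    [Timestamp('2016-05-07 20:09:25', tz='UTC'),
     Timestamp('2016-05-07 20:09:25', tz='UTC'),
     Timestamp('2016-05-07 20:09:29', tz='UTC')],
    [Timedelta('1h'), Timedelta('1h'), Timedelta('2h')],
    [Period('2016-05', freq='M'), Period('2016-05', freq='M'),
     Period('2016-06', freq='M')],
])
def test_count_with_datetimelike(values):
    df = DataFrame({'x': ['a', 'a', 'b'], 'y': values})
    expected = DataFrame({'y': [2, 1]}, index=['a', 'b'])
    expected.index.name = 'x'
    result = df.groupby('x').count()
    assert result.equals(expected)
```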

Contributor:

change test name to _datetimelike


expected = DataFrame({'y': [2, 1]}, index=['a', 'b'])
expected.index.name = "x"

Contributor:

use parametrize (the decorator)

Contributor Author:

Oh! I looked in the wrong test file for inspiration then.

Timestamp('2016-05-07 20:09:29+00:00')]
_test(d1)
d2 = [Timedelta(x, unit="h") for x in range(1, 4)]
_test(d2)
Contributor:

you need to change Timedelta and Period to work here

Contributor Author:

I am not sure I understand your request.

Contributor:

the examples for timedelta and period are unique and won’t group to the same as the other examples

just change to individually construct them like you did for time stamps
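The reviewer's point, sketched: a comprehension over `range(1, 4)` yields three distinct values, whereas constructing the values individually lets the first two repeat, mirroring the duplicated timestamps in the test data (the values below are illustrative):

```python
from pandas import Period, Timedelta

# A comprehension over range(1, 4) produces three distinct timedeltas ...
distinct = [Timedelta(x, unit='h') for x in range(1, 4)]

# ... while individually constructed values can repeat, matching the
# two duplicated timestamps in the test fixture.
repeated_td = [Timedelta('1h'), Timedelta('1h'), Timedelta('2h')]
repeated_per = [Period('2016-05', freq='M'), Period('2016-05', freq='M'),
                Period('2016-06', freq='M')]

print(len(set(distinct)), len(set(repeated_td)), len(set(repeated_per)))  # 3 2 2
```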

Contributor Author:

but we are not grouping by the timestamps, so it doesn't matter whether they are unique; the original issue fails regardless.

@watercrossing (Contributor, Author):

I don't think the Travis issue is related.

@jreback jreback added this to the 0.21.1 milestone Nov 8, 2017
@jreback jreback merged commit 4054632 into pandas-dev:master Nov 8, 2017
@jreback
Copy link
Contributor

jreback commented Nov 8, 2017

thanks @watercrossing, nice PR!

@watercrossing watercrossing deleted the groupbyCountFix branch November 9, 2017 10:27
watercrossing added a commit to watercrossing/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017
Labels
Bug · Dtype Conversions (Unexpected or buggy dtype conversions) · Groupby
Projects
None yet
2 participants