-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix groupby().count() for datetime columns #18167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In which whatsnew file should the whatsnew entry go? |
@@ -195,3 +195,13 @@ def test_ngroup_respects_groupby_order(self): | |||
g.ngroup()) | |||
assert_series_equal(Series(df['group_index'].values), | |||
g.cumcount()) | |||
|
|||
def test_count_with_datetime(self): | |||
df = DataFrame({'x': ['a', 'a', 'b'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add gh issue number as a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can put in 0.21.1 bug fixes
pandas/core/groupby.py
Outdated
val = ((mask & ~isna(blk.get_values())) for blk in data.blocks) | ||
makeNDarray = lambda vals: vals[None, :] if vals.ndim == 1 else vals | ||
|
||
val = ((mask & ~isna(makeNDarray(blk.get_values()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use np.atleast_2d
no need for separate mask
Codecov Report
@@ Coverage Diff @@
## master #18167 +/- ##
==========================================
+ Coverage 91.4% 91.41% +<.01%
==========================================
Files 163 163
Lines 50073 50074 +1
==========================================
+ Hits 45769 45773 +4
+ Misses 4304 4301 -3
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18167 +/- ##
=========================================
Coverage ? 91.38%
=========================================
Files ? 163
Lines ? 50068
Branches ? 0
=========================================
Hits ? 45754
Misses ? 4314
Partials ? 0
Continue to review full report at Codecov.
|
@@ -195,3 +195,14 @@ def test_ngroup_respects_groupby_order(self): | |||
g.ngroup()) | |||
assert_series_equal(Series(df['group_index'].values), | |||
g.cumcount()) | |||
|
|||
def test_count_with_datetime(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u parametrize to count over tz naive (and aware);
also over Timedeltas and Period would be great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change test name to _datetimelike
|
||
expected = DataFrame({'y': [2, 1]}, index=['a', 'b']) | ||
expected.index.name = "x" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use parametrize (the decorator)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! I looked in the wrong test file for inspiration then.
Timestamp('2016-05-07 20:09:29+00:00')] | ||
_test(d1) | ||
d2 = [Timedelta(x, unit="h") for x in range(1, 4)] | ||
_test(d2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to change Timedelta and Period to work here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand your request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the examples for timedelta and period are unique and won’t group to the same as the other examples
just change to individually construct them like you did for time stamps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we are not grouping by the timestamps, it doesn't matter if they are unique or not; the original issue fails regardless.
af6440a
to
9eb429e
Compare
I don't think the Travis Issue is related. |
thanks @watercrossing, nice PR! |
(cherry picked from commit 4054632)
(cherry picked from commit 4054632)
git diff upstream/master -u -- "*.py" | flake8 --diff