Fix groupby().count() for datetime columns #18167
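For context, a minimal reproduction sketched from the PR title and the test added below (data values are illustrative, not taken from the original issue):

```python
import pandas as pd

# Illustrative frame: a grouping key plus a datetime column.
df = pd.DataFrame({
    'x': ['a', 'a', 'b'],
    'y': pd.to_datetime(['2017-05-01', '2017-05-02', '2017-05-03']),
})

# Before this fix, the block-wise count inside groupby mishandled
# datetime columns; with the fix it returns per-group non-null counts.
result = df.groupby('x').count()
print(result)
```

The expected result counts two non-null `y` values for group `a` and one for `b`.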


Merged: 1 commit merged into pandas-dev:master from watercrossing:groupbyCountFix on Nov 8, 2017

Conversation

@watercrossing (Contributor) commented Nov 8, 2017

@watercrossing (Contributor, Author):

In which whatsnew file should the whatsnew entry go?

@@ -195,3 +195,13 @@ def test_ngroup_respects_groupby_order(self):
g.ngroup())
assert_series_equal(Series(df['group_index'].values),
g.cumcount())

def test_count_with_datetime(self):
df = DataFrame({'x': ['a', 'a', 'b'],
Contributor:
add gh issue number as a comment

@jreback (Contributor) left a comment:

you can put in 0.21.1 bug fixes

-        val = ((mask & ~isna(blk.get_values())) for blk in data.blocks)
+        makeNDarray = lambda vals: vals[None, :] if vals.ndim == 1 else vals
+
+        val = ((mask & ~isna(makeNDarray(blk.get_values())))
Contributor:

just use np.atleast_2d
no need for separate mask
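The suggested simplification: `np.atleast_2d` already does what the hand-rolled `makeNDarray` lambda does, prepending an axis to 1-d input and passing 2-d input through unchanged. A quick sketch of the equivalence:

```python
import numpy as np

# The lambda from the diff above, reproduced for comparison.
make_ndarray = lambda vals: vals[None, :] if vals.ndim == 1 else vals

a1 = np.arange(3)                # 1-d, shape (3,)
a2 = np.arange(6).reshape(2, 3)  # already 2-d

# np.atleast_2d matches the lambda on both cases.
assert np.atleast_2d(a1).shape == make_ndarray(a1).shape == (1, 3)
assert np.atleast_2d(a2).shape == make_ndarray(a2).shape == (2, 3)
print(np.atleast_2d(a1))  # [[0 1 2]]
```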

@jreback added the Bug, Dtype Conversions (Unexpected or buggy dtype conversions), and Groupby labels Nov 8, 2017
codecov bot commented Nov 8, 2017

Codecov Report

Merging #18167 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #18167      +/-   ##
==========================================
+ Coverage    91.4%   91.41%   +<.01%     
==========================================
  Files         163      163              
  Lines       50073    50074       +1     
==========================================
+ Hits        45769    45773       +4     
+ Misses       4304     4301       -3
Flag        Coverage         Δ
#multiple   89.21% <100%>    (+0.02%) ⬆️
#single     40.32% <0%>      (-0.07%) ⬇️

Impacted Files                  Coverage         Δ
pandas/core/groupby.py          92.04% <100%>    (ø) ⬆️
pandas/io/gbq.py                25% <0%>         (-58.34%) ⬇️
pandas/core/frame.py            97.8% <0%>       (-0.1%) ⬇️
pandas/plotting/_converter.py   65.2% <0%>       (+1.81%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 93c755e...073f22d.

codecov bot commented Nov 8, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@5350330).
The diff coverage is 100%.


@@            Coverage Diff            @@
##             master   #18167   +/-   ##
=========================================
  Coverage          ?   91.38%           
=========================================
  Files             ?      163           
  Lines             ?    50068           
  Branches          ?        0           
=========================================
  Hits              ?    45754           
  Misses            ?     4314           
  Partials          ?        0
Flag        Coverage         Δ
#multiple   89.19% <100%>    (?)
#single     40.33% <0%>      (?)

Impacted Files           Coverage         Δ
pandas/core/groupby.py   92.04% <100%>    (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5350330...9eb429e.

@@ -195,3 +195,14 @@ def test_ngroup_respects_groupby_order(self):
g.ngroup())
assert_series_equal(Series(df['group_index'].values),
g.cumcount())

def test_count_with_datetime(self):
Contributor:

can u parametrize to count over tz naive (and aware);
also over Timedeltas and Period would be great
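A sketch of what the requested parametrization could look like, assuming pytest and the usual pandas imports (the exact values and the module layout are illustrative, not the PR's final code):

```python
import pytest
from pandas import DataFrame, Period, Timedelta, Timestamp


# Hypothetical parametrization over datetimelike dtypes, per the review
# request: tz-naive and tz-aware timestamps, timedeltas, and periods.
# Each case repeats its first value so groups 'a' and 'b' count 2 and 1.
@pytest.mark.parametrize('values', [
    [Timestamp('2016-05-07 20:09:25'),
     Timestamp('2016-05-07 20:09:25'),
     Timestamp('2016-05-07 20:09:29')],
    [Timestamp('2016-05-07 20:09:25', tz='UTC'),
     Timestamp('2016-05-07 20:09:25', tz='UTC'),
     Timestamp('2016-05-07 20:09:29', tz='UTC')],
    [Timedelta('1h'), Timedelta('1h'), Timedelta('2h')],
    [Period('2016-05', freq='M'), Period('2016-05', freq='M'),
     Period('2016-06', freq='M')],
])
def test_count_with_datetimelike(values):
    df = DataFrame({'x': ['a', 'a', 'b'], 'y': values})
    expected = DataFrame({'y': [2, 1]}, index=['a', 'b'])
    expected.index.name = 'x'
    result = df.groupby('x').count()
    assert result.equals(expected)
```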

Contributor:

change test name to _datetimelike


expected = DataFrame({'y': [2, 1]}, index=['a', 'b'])
expected.index.name = "x"

Contributor:

use parametrize (the decorator)

Contributor Author:

Oh! I looked in the wrong test file for inspiration then.

Timestamp('2016-05-07 20:09:29+00:00')]
_test(d1)
d2 = [Timedelta(x, unit="h") for x in range(1, 4)]
_test(d2)
Contributor:

you need to change Timedelta and Period to work here

Contributor Author:

I am not sure I understand your request.

Contributor:

the examples for timedelta and period are unique and won’t group to the same as the other examples

just change to individually construct them like you did for time stamps
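The reviewer's point, sketched: a comprehension over `range(1, 4)` yields three distinct values, whereas constructing the values individually lets the first two repeat, mirroring the duplicated timestamps in the test data (the values below are illustrative):

```python
from pandas import Period, Timedelta

# A comprehension over range(1, 4) produces three distinct timedeltas ...
distinct = [Timedelta(x, unit='h') for x in range(1, 4)]

# ... while individually constructed values can repeat, matching the
# two duplicated timestamps in the test fixture.
repeated_td = [Timedelta('1h'), Timedelta('1h'), Timedelta('2h')]
repeated_per = [Period('2016-05', freq='M'), Period('2016-05', freq='M'),
                Period('2016-06', freq='M')]

print(len(set(distinct)), len(set(repeated_td)), len(set(repeated_per)))  # 3 2 2
```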

Contributor Author:

but we are not grouping by the timestamps, so it doesn't matter whether they are unique; the original issue fails regardless.

@watercrossing (Contributor, Author):

I don't think the Travis issue is related.

@jreback jreback added this to the 0.21.1 milestone Nov 8, 2017
@jreback jreback merged commit 4054632 into pandas-dev:master Nov 8, 2017
@jreback
Copy link
Contributor

jreback commented Nov 8, 2017

thanks @watercrossing, nice PR!

@watercrossing watercrossing deleted the groupbyCountFix branch November 9, 2017 10:27
watercrossing added a commit to watercrossing/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017
Labels
Bug · Dtype Conversions (Unexpected or buggy dtype conversions) · Groupby
Projects
None yet
2 participants