BUG: Fix groupby nunique with NaT #17624

Merged 5 commits on Sep 22, 2017

Licht-T (Contributor) commented Sep 22, 2017

jreback (Contributor) left a comment

LGTM modulo some minor comments. Ping on green.

@@ -3688,6 +3688,18 @@ def test_nunique_with_timegrouper(self):
            )['data'].apply(pd.Series.nunique)
        tm.assert_series_equal(result, expected)

    def test_nunique_with_timegrouper_and_nat(self):
        test = pd.DataFrame({

Add the GitHub issue number as a comment.

Move this test to test_timegrouper.py (same directory).

@@ -3177,7 +3177,11 @@ def nunique(self, dropna=True):

        out = np.add.reduceat(inc, idx).astype('int64', copy=False)
        if len(ids):
            res = out if ids[0] != -1 else out[1:]

Can you add a helpful comment here explaining what is happening?
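
As background for the hunk above, the following is a simplified sketch of the counting idea, not the pandas implementation. The function name `nunique_per_group` is hypothetical, and the sketch assumes that `-1` marks rows belonging to no group (such as rows with a `NaT` key). It ignores missing values in the data column and does not emit entries for empty groups.

```python
import numpy as np


def nunique_per_group(ids, values):
    """Count distinct values per group id; rows with id -1 are dropped.

    Hypothetical helper for illustration only.
    """
    ids = np.asarray(ids)
    values = np.asarray(values)
    if len(ids) == 0:
        return np.array([], dtype="int64")

    # Sort by group id first, then by value within each group.
    sorter = np.lexsort((values, ids))
    ids, values = ids[sorter], values[sorter]

    # Positions where a new group starts in the sorted arrays.
    idx = np.r_[0, 1 + np.nonzero(ids[1:] != ids[:-1])[0]]

    # 1 wherever a sorted value differs from its predecessor.
    inc = np.r_[1, values[1:] != values[:-1]]

    # The first row of every group always starts a new distinct value.
    inc[idx] = 1

    # Sum the increments within each group's slice.
    out = np.add.reduceat(inc, idx).astype("int64", copy=False)

    # After sorting, rows with id -1 come first, so their count is out[0];
    # drop it because those rows belong to no group.
    return out if ids[0] != -1 else out[1:]


counts = nunique_per_group([0, -1, 1, 1, 0], [10, 99, 5, 5, 20])
print(counts.tolist())
```

The final slice mirrors the `out if ids[0] != -1 else out[1:]` expression in the diff: because sorting places the `-1` rows first, their count occupies the leading slot and can be discarded.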

@jreback jreback added the Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Datetime (Datetime data dtype) labels Sep 22, 2017

codecov bot commented Sep 22, 2017

Codecov Report

Merging #17624 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17624      +/-   ##
==========================================
- Coverage    91.2%   91.18%   -0.02%     
==========================================
  Files         163      163              
  Lines       49637    49640       +3     
==========================================
- Hits        45269    45263       -6     
- Misses       4368     4377       +9

Flag         Coverage Δ
#multiple    88.97% <100%> (ø) ⬆️
#single      40.18% <0%> (-0.07%) ⬇️

Impacted Files            Coverage Δ
pandas/core/groupby.py    92.23% <100%> (+0.01%) ⬆️
pandas/io/gbq.py          25% <0%> (-58.34%) ⬇️
pandas/core/frame.py      97.77% <0%> (-0.1%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8276a42...800d1ad.

codecov bot commented Sep 22, 2017

Codecov Report

Merging #17624 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17624      +/-   ##
==========================================
- Coverage    91.2%   91.18%   -0.02%     
==========================================
  Files         163      163              
  Lines       49637    49655      +18     
==========================================
+ Hits        45269    45276       +7     
- Misses       4368     4379      +11

Flag         Coverage Δ
#multiple    88.97% <100%> (ø) ⬆️
#single      40.17% <0%> (-0.07%) ⬇️

Impacted Files                  Coverage Δ
pandas/core/groupby.py          92.23% <100%> (+0.01%) ⬆️
pandas/io/gbq.py                25% <0%> (-58.34%) ⬇️
pandas/io/excel.py              80.37% <0%> (-0.18%) ⬇️
pandas/core/frame.py            97.77% <0%> (-0.1%) ⬇️
pandas/core/indexes/range.py    92.83% <0%> (+0.24%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8276a42...29d4cf6.

Licht-T (Contributor, Author) commented Sep 22, 2017

@jreback Thanks for your review. All is now fixed!

@jreback jreback added this to the 0.21.0 milestone Sep 22, 2017
@jreback jreback merged commit f797c1d into pandas-dev:master Sep 22, 2017

jreback (Contributor) commented Sep 22, 2017

thanks @Licht-T nice patch! keep em coming!

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Labels: Datetime (Datetime data dtype), Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Successfully merging this pull request may close these issues.

pd.groupby(pd.TimeGrouper()) mishandles null values in dates
2 participants