-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix bug in window function count should count anything non-null #15196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
this looks right, pls add a note in whatsnew, bug fix section. note that I did try to change this a while back and got some tests failures (which I think you mentioned), so these need to be investigated / fixed. I think the 'floatifying' is slightly different that checking for non-null, but need to drill down and see exactly. |
# GH12541 | ||
df_inf = DataFrame({'x': [1, 2, 3], 'y': [1., 2., np.Inf]}) | ||
df_date = DataFrame({'x': [1, 2, 3], | ||
'y': pd.date_range('20130101',periods=3)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add:
- some NaT in here for the datetimes
- add in a timedelta example,
- float example with np.nan.
- some strings (with np.nan)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do all of these with a single df (with multiple columns), and a single comparision tests (with the results)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/pandas-dev/pandas/pull/15054/files#diff-d1a05a7cf744e0feb0fc6cd7128903a8 is an example of how to create these various types (though need to add some nans)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NaN and NaT won't be counted. Is that right?
Current coverage is 85.98% (diff: 100%)@@ master #15196 diff @@
==========================================
Files 140 140
Lines 51263 51267 +4
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 44076 44084 +8
+ Misses 7187 7183 -4
Partials 0 0
|
@@ -447,7 +447,7 @@ Bug Fixes | |||
|
|||
|
|||
- Bug in ``Series`` constructor when both ``copy=True`` and ``dtype`` arguments are provided (:issue:`15125`) | |||
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`) | |||
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You removed something here by accident I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ops, sorry about that!
'fl_nan': [1., 2., np.NaN], | ||
'str_nan': ['aa', 'bb', np.NaN], | ||
'dt_nat': [pd.Timestamp('20170101'), pd.Timestamp('20170203'), | ||
pd.Timestamp(None)]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to test Periods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure could add periods in here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this?
df = DataFrame(
{...,
'periods': [pd.Period('2012-01'), pd.Period('2012-02'),
pd.Period('2012-03')],
...
'periods_nat': [pd.Period('2012-01'), pd.Period('2012-02'),
pd.Period(None)]},
columns=cols)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
lgtm. ping on green (I will fix the minor typo on merge) |
thanks @mralgos |
closes pandas-dev#12541 Author: Giacomo Ferroni <[email protected]> Author: Giacomo <[email protected]> Closes pandas-dev#15196 from mralgos/gh12541 and squashes the following commits: 65d70eb [Giacomo Ferroni] Added Periods to the test 94084b4 [Giacomo] Merge branch 'master' into gh12541 9621315 [Giacomo Ferroni] Added extra test and updated whatsnew cb84935 [Giacomo Ferroni] pylint checks 26c86a5 [Giacomo Ferroni] Test added for GH12541 ea49e77 [Giacomo Ferroni] Fix for GH12541
git diff upstream/master | flake8 --diff