Skip to content

Fix bug in window function count should count anything non-null #15196

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

mralgos
Copy link
Contributor

@mralgos mralgos commented Jan 23, 2017

@jreback
Copy link
Contributor

jreback commented Jan 23, 2017

this looks right, pls add a note in whatsnew, bug fix section.

note that I did try to change this a while back and got some tests failures (which I think you mentioned), so these need to be investigated / fixed. I think the 'floatifying' is slightly different that checking for non-null, but need to drill down and see exactly.

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 23, 2017
# GH12541
df_inf = DataFrame({'x': [1, 2, 3], 'y': [1., 2., np.Inf]})
df_date = DataFrame({'x': [1, 2, 3],
'y': pd.date_range('20130101',periods=3)})
Copy link
Contributor

@jreback jreback Jan 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add:

  • some NaT in here for the datetimes
  • add in a timedelta example,
  • float example with np.nan.
  • some strings (with np.nan)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do all of these with a single df (with multiple columns), and a single comparision tests (with the results)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/pandas-dev/pandas/pull/15054/files#diff-d1a05a7cf744e0feb0fc6cd7128903a8 is an example of how to create these various types (though need to add some nans)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NaN and NaT won't be counted. Is that right?

@codecov-io
Copy link

codecov-io commented Jan 24, 2017

Current coverage is 85.98% (diff: 100%)

Merging #15196 into master will increase coverage by <.01%

@@             master     #15196   diff @@
==========================================
  Files           140        140          
  Lines         51263      51267     +4   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          44076      44084     +8   
+ Misses         7187       7183     -4   
  Partials          0          0          

Powered by Codecov. Last update be3f2ae...65d70eb

@@ -447,7 +447,7 @@ Bug Fixes


- Bug in ``Series`` constructor when both ``copy=True`` and ``dtype`` arguments are provided (:issue:`15125`)
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed something here by accident I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ops, sorry about that!

'fl_nan': [1., 2., np.NaN],
'str_nan': ['aa', 'bb', np.NaN],
'dt_nat': [pd.Timestamp('20170101'), pd.Timestamp('20170203'),
pd.Timestamp(None)]},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to test Periods?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure could add periods in here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this?

df = DataFrame(
{...,
'periods': [pd.Period('2012-01'), pd.Period('2012-02'),
pd.Period('2012-03')],
...
'periods_nat': [pd.Period('2012-01'), pd.Period('2012-02'),
pd.Period(None)]},
columns=cols)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@jreback jreback added this to the 0.20.0 milestone Jan 24, 2017
@jreback
Copy link
Contributor

jreback commented Jan 24, 2017

lgtm. ping on green (I will fix the minor typo on merge)

@jreback jreback closed this in 3ac83eb Jan 24, 2017
@jreback
Copy link
Contributor

jreback commented Jan 24, 2017

thanks @mralgos

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#12541

Author: Giacomo Ferroni <[email protected]>
Author: Giacomo <[email protected]>

Closes pandas-dev#15196 from mralgos/gh12541 and squashes the following commits:

65d70eb [Giacomo Ferroni] Added Periods to the test
94084b4 [Giacomo] Merge branch 'master' into gh12541
9621315 [Giacomo Ferroni] Added extra test and updated whatsnew
cb84935 [Giacomo Ferroni] pylint checks
26c86a5 [Giacomo Ferroni] Test added for GH12541
ea49e77 [Giacomo Ferroni] Fix for GH12541
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: window function count should count anything non-null
4 participants