Skip to content

BUG: window function count should count anything non-null #12541

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kawochen opened this issue Mar 6, 2016 · 10 comments
Closed

BUG: window function count should count anything non-null #12541

kawochen opened this issue Mar 6, 2016 · 10 comments
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Milestone

Comments

@kawochen
Copy link
Contributor

kawochen commented Mar 6, 2016

Right now np.Inf and non-numeric types are not counted.

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
from pandas import DataFrame

df = DataFrame({'A': pd.date_range('20130101',periods=4), 'B': [0, 1, 2, np.Inf], 'C':['a', 'b', 'c', 'd']})

roll = df.rolling(window=2)
res = roll.count()
print(res)
## partial output A, B don't error
     A    B
0  1.0  1.0
1  2.0  2.0
2  2.0  2.0
3  2.0  1.0
## column C errors

Expected Output

   A  B  C
0  1  1  1
1  2  2  2
2  2  2  2
3  2  2  2

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: a0aaad98e8c567a3a04aef165cdb159fbd98d13f
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-76-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0rc1+77.ga0aaad9.dirty
nose: 1.3.7
pip: 8.0.1
setuptools: 19.4
Cython: 0.23.4
numpy: 1.10.2
scipy: None
statsmodels: None
xarray: None
IPython: 4.0.0
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.10
pymysql: None
psycopg2: None
jinja2: None
boto: None

I think it can be patched, together with #12538, by

--- a/pandas/core/window.py
+++ b/pandas/core/window.py
@@ -509,19 +509,9 @@ class _Rolling_and_Expanding(_Rolling):
         blocks, obj = self._create_blocks(how=None)
         results = []
         for b in blocks:
-
-            if com.needs_i8_conversion(b.values):
-                result = b.notnull().astype(int)
-            else:
-                try:
-                    result = np.isfinite(b).astype(float)
-                except TypeError:
-                    result = np.isfinite(b.astype(float)).astype(float)
-
-                result[pd.isnull(result)] = 0
-
-            result = self._constructor(result, window=window, min_perio
-                                       center=self.center).sum()
+            result = self._constructor(
+                b.notnull(), window=window, min_periods=0, center=self.
+            ).sum().astype(int)

@jreback
Copy link
Contributor

jreback commented Mar 7, 2016

yes that looks right.

anything break if you do that?

@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Mar 7, 2016
@jreback jreback added this to the 0.18.1 milestone Mar 7, 2016
@masongallo
Copy link
Contributor

I was going to help on this during my lunch, but it looks like this causes a bunch of tests to fail in test_window.py since they expect floats. Not sure if the tests should just be coerced to expect int - I can't think of a case where a count should be a float.

@jreback
Copy link
Contributor

jreback commented Mar 7, 2016

@masongallo the tests need to be changed as well, for count only as well. The xref issue #12538 concerns that (but I am going to close in favor of this one)

@kawochen
Copy link
Contributor Author

kawochen commented Mar 7, 2016

For the record, can we state why signed int is preferred over unsigned int?

@jreback
Copy link
Contributor

jreback commented Mar 7, 2016

we could just use uints, but I think the problem is once you cast to uints they never cast back to integers unless you explicity do it. So if you then interoperate with signed integers you get casting.

It would probably work / be ok. uints just aren't supported quite as well (though everything may work).

@jreback
Copy link
Contributor

jreback commented Mar 7, 2016

So for consistency make this return ints.

We can always revisit the counting things -> uints later

@jreback
Copy link
Contributor

jreback commented Apr 8, 2016

hmm, anyone want to do this PR?

@masongallo
Copy link
Contributor

I'm quite busy for the next week, but I'll take a look afterwards if someone doesn't beat me to it

@jreback jreback modified the milestones: 0.18.1, 0.18.2 Apr 26, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Sep 1, 2016
@mralgos
Copy link
Contributor

mralgos commented Jan 23, 2017

Hi guys, I've applied the fix suggested by @kawochen and added a specific test for that (cf. master...mralgos:gh12541).
I haven't had failures on test_window.py, but another weird thing happens at the end of a build test. More info here: https://travis-ci.org/mralgos/pandas/builds/194060134
Any idea? Thanks

@jreback
Copy link
Contributor

jreback commented Jan 23, 2017

@mralgos if you want to put up a PR that would be great

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
closes pandas-dev#12541

Author: Giacomo Ferroni <[email protected]>
Author: Giacomo <[email protected]>

Closes pandas-dev#15196 from mralgos/gh12541 and squashes the following commits:

65d70eb [Giacomo Ferroni] Added Periods to the test
94084b4 [Giacomo] Merge branch 'master' into gh12541
9621315 [Giacomo Ferroni] Added extra test and updated whatsnew
cb84935 [Giacomo Ferroni] pylint checks
26c86a5 [Giacomo Ferroni] Test added for GH12541
ea49e77 [Giacomo Ferroni] Fix for GH12541
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants