Rolling(axis='columns').count() ignores axis= keyword #13503

mrocklin · 2016-06-23T16:15:24Z · edited by jreback

Additional example in #13753

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': range(5), 'y': range(5)})

In [3]: df
Out[3]: 
   x  y
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4

In [4]: df.rolling(2, axis='columns').sum()  # meets expectations
Out[4]: 
    x    y
0 NaN  0.0
1 NaN  2.0
2 NaN  4.0
3 NaN  6.0
4 NaN  8.0

In [5]: df.rolling(2, axis='columns').count()  # appears to be the same as axis='rows'
Out[5]: 
     x    y
0  1.0  1.0
1  2.0  2.0
2  2.0  2.0
3  2.0  2.0
4  2.0  2.0

In [6]: df.rolling(2, axis='rows').count()  # yeah, exactly the same
Out[6]: 
     x    y
0  1.0  1.0
1  2.0  2.0
2  2.0  2.0
3  2.0  2.0
4  2.0  2.0
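The expected column-wise count can be computed without the `axis=` keyword: transpose, roll down the rows, and transpose back. This is a minimal sketch, assuming a reasonably recent pandas in which `Rolling.count()` respects `min_periods`. `min_periods=1` is set explicitly here (an assumption, chosen so the first window reports 1 like the row-wise output above). Avoiding `axis=` also sidesteps its deprecation in newer pandas releases.

```python
import pandas as pd

df = pd.DataFrame({'x': range(5), 'y': range(5)})

# Transposing turns columns into rows, so the default row-wise window
# now slides from 'x' to 'y'; transposing back restores the layout.
col_count = df.T.rolling(2, min_periods=1).count().T

print(col_count)
```

Every row should come out as `x=1.0, y=2.0`. The window ending at `x` holds one value, and the window ending at `y` holds two. This is what `df.rolling(2, axis='columns').count()` was expected to return.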

output of `pd.show_versions()`

In [7]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Arose from tests in dask/dask#1280

The text was updated successfully, but these errors were encountered:

jreback · 2016-06-23T20:44:42Z

Yes, `.count()` is one of the few methods implemented in a special way, so it takes a different code path and does not use the `.axis` argument :<

mrocklin · 2016-06-23T21:05:24Z

Should I consider this a bug or is count(axis=) not supported?

jreback · 2016-06-23T21:08:45Z

No, it's a bug.

jorisvandenbossche · 2016-06-24T15:21:52Z

Actually, using axis=1 seems broken for the other methods (which use _apply) as well if you have different dtypes:

In [49]: df = pd.DataFrame({'x': range(5), 'y': np.arange(0, 1, 0.2)})

In [50]: df
Out[50]:
   x    y
0  0  0.0
1  1  0.2
2  2  0.4
3  3  0.6
4  4  0.8

In [51]: df.rolling(2, axis=1).sum()
Out[51]:
    x   y
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN

This is because `_apply` operates on each block of uniform dtype separately, which is fine for axis=0 but not for axis=1.
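The per-block behaviour can be reproduced outside pandas' internals for illustration. In this sketch, `rolling_sum_per_dtype_block` is a hypothetical helper, not a pandas function, and it rolls across columns by transposing rather than by passing `axis=1`. Each dtype group in this frame contains a single column, so no window ever sees two values. With the default `min_periods` (equal to the window size), every result is therefore NaN, which matches `Out[51]`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(5), 'y': np.arange(0, 1, 0.2)})


def rolling_sum_per_dtype_block(frame, window):
    """Hypothetical re-creation of the per-block pattern: split the columns
    into groups sharing one dtype, roll each group across its own columns,
    then stitch the pieces back together in the original column order."""
    pieces = []
    for dtype in frame.dtypes.unique():
        block = frame.loc[:, frame.dtypes == dtype]
        # Roll across the columns of this block only (via transpose).
        pieces.append(block.T.rolling(window).sum().T)
    return pd.concat(pieces, axis=1)[frame.columns]


buggy = rolling_sum_per_dtype_block(df, 2)
print(buggy)  # all NaN: each single-column block never fills a window of 2
```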

yhaque1213 · 2019-04-06T20:52:50Z

I'm interested in helping out with this! Can I work on it?

yhaque1213 · 2019-04-17T22:32:02Z

@jorisvandenbossche

> Actually, using axis=1 seems broken for the other methods (which use _apply) as well if you have different dtypes:
> [...]
> Because the _apply does this for each block of uniform dtype, which is fine for axis=0, but not for axis=1.

I think this is not an issue with the axis itself but with the way blocks are created in the call to `_to_dict_of_blocks()` in generic.py: blocks are created per homogeneous data type. This splits the original DataFrame by dtype, which causes the `sum()` problem later on.

If the DataFrame has a uniform dtype, a single block containing both columns is created.
If the DataFrame's columns have different dtypes, two separate blocks are created, one per column. When the rolling sum is then computed within each block, no window contains more than one value, so only the leading NaNs that a rolling window produces remain, resulting in the all-NaN output.

It seems like this should be filed as a separate issue, since it has nothing to do with the axis.

jreback · 2019-04-17T22:42:51Z

axis=1 is generally not implemented at all, because unless you have a single dtype, you have mixed types across columns.
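One way around the mixed-dtype problem is to give the frame a single dtype before rolling across columns. This is a minimal sketch, assuming all columns are numeric so that converting them to `float64` is lossless enough for the use case. Transposing then rolls across the original columns without relying on `axis=1`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(5), 'y': np.arange(0, 1, 0.2)})

# Collapse the mixed int64/float64 columns to one dtype so the data
# forms a single homogeneous block before rolling across columns.
uniform = df.astype('float64')

# Transpose so the window slides across the original columns, then
# transpose back to the original orientation.
col_sum = uniform.T.rolling(2).sum().T

print(col_sum)
```

Under these assumptions, `x` stays NaN because its window holds only one value, and `y` holds `x + y` row by row. Non-numeric columns would have to be dropped or converted first, which is the mixed-type limitation described above.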

jreback added Bug, Reshaping, Difficulty Novice labels Jun 23, 2016

jreback added this to the Next Major Release milestone Jun 23, 2016

This was referenced Jul 22, 2016

pd.expanding is incorrectly calculating window size when axis=1 #13753

Closed

BUG: rolling with axis=1 #13757

Closed

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

yhaque1213 mentioned this issue Apr 11, 2019

BUG: rolling.count with axis=1 #26055

Merged

4 tasks

jreback modified the milestones: Contributions Welcome, 0.25.0 Apr 16, 2019

jreback closed this as completed in #26055 Apr 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rolling(axis='columns').count() ignores axis= keyword #13503

Rolling(axis='columns').count() ignores axis= keyword #13503

mrocklin commented Jun 23, 2016 •

edited by jreback

Loading

jreback commented Jun 23, 2016

mrocklin commented Jun 23, 2016

jreback commented Jun 23, 2016

jorisvandenbossche commented Jun 24, 2016

yhaque1213 commented Apr 6, 2019

yhaque1213 commented Apr 17, 2019

jreback commented Apr 17, 2019

Rolling(axis='columns').count() ignores axis= keyword #13503

Rolling(axis='columns').count() ignores axis= keyword #13503

Comments

mrocklin commented Jun 23, 2016 • edited by jreback Loading

Code Sample, a copy-pastable example if possible

output of pd.show_versions()

jreback commented Jun 23, 2016

mrocklin commented Jun 23, 2016

jreback commented Jun 23, 2016

jorisvandenbossche commented Jun 24, 2016

yhaque1213 commented Apr 6, 2019

yhaque1213 commented Apr 17, 2019

jreback commented Apr 17, 2019

mrocklin commented Jun 23, 2016 •

edited by jreback

Loading

output of `pd.show_versions()`