Rolling(axis='columns').count() ignores axis= keyword #13503

Closed
mrocklin opened this issue Jun 23, 2016 · 7 comments · Fixed by #26055
Labels
Bug, good first issue, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Milestone

Comments

@mrocklin
Contributor

mrocklin commented Jun 23, 2016

Additional example in #13753.

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': range(5), 'y': range(5)})

In [3]: df
Out[3]: 
   x  y
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4

In [4]: df.rolling(2, axis='columns').sum()  # meets expectations
Out[4]: 
    x    y
0 NaN  0.0
1 NaN  2.0
2 NaN  4.0
3 NaN  6.0
4 NaN  8.0

In [5]: df.rolling(2, axis='columns').count()  # appears to be the same as axis='rows'
Out[5]: 
     x    y
0  1.0  1.0
1  2.0  2.0
2  2.0  2.0
3  2.0  2.0
4  2.0  2.0

In [6]: df.rolling(2, axis='rows').count()  # yeah, exactly the same
Out[6]: 
     x    y
0  1.0  1.0
1  2.0  2.0
2  2.0  2.0
3  2.0  2.0
4  2.0  2.0

output of pd.show_versions()

In [7]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Arose from tests in dask/dask#1280

@jreback
Contributor

jreback commented Jun 23, 2016

Yes, .count() is one of the few methods that is implemented in a special way, so it takes a different code path and does not use the axis argument :<
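
Until .count() honors axis, one possible workaround is to transpose the frame, roll down the rows, and transpose back. This is only a sketch, not the fix that was eventually merged: the name row_wise_count is made up for illustration, and min_periods=1 is passed explicitly on the assumption that the default for count() is not the same in every pandas version.

```python
import pandas as pd

df = pd.DataFrame({"x": range(5), "y": range(5)})

# Transpose so the original columns become rows, count over a window of 2
# down those rows, then transpose back to the original layout.
# min_periods=1 is explicit so the result does not depend on the
# version-specific default used by count().
row_wise_count = df.T.rolling(2, min_periods=1).count().T
print(row_wise_count)
```

With this approach the window moves across x and y within each row, so column x counts one observation and column y counts two, which is what axis='columns' was expected to give.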

@jreback added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Difficulty Novice labels on Jun 23, 2016
@jreback added this to the Next Major Release milestone on Jun 23, 2016
@mrocklin
Contributor Author

Should I consider this a bug or is count(axis=) not supported?

@jreback
Contributor

jreback commented Jun 23, 2016

No, it's a bug.

@jorisvandenbossche
Member

Actually, using axis=1 seems broken for the other methods (which use _apply) as well if you have different dtypes:

In [49]: df = pd.DataFrame({'x': range(5), 'y': np.arange(0, 1, 0.2)})

In [50]: df
Out[50]:
   x    y
0  0  0.0
1  1  0.2
2  2  0.4
3  3  0.6
4  4  0.8

In [51]: df.rolling(2, axis=1).sum()
Out[51]:
    x   y
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN

Because the _apply does this for each block of uniform dtype, which is fine for axis=0, but not for axis=1.
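
A hedged sketch of how the mixed-dtype case can be sidestepped: cast the frame to a single dtype and roll across columns by transposing, so the window never gets applied to one dtype group at a time. The name row_wise_sum is illustrative, and the cast to float64 is an assumption that suits this example; it may not be appropriate for other data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": range(5), "y": np.arange(0, 1, 0.2)})

# Make the frame homogeneous first, then roll across the original columns
# by transposing, rolling down the rows, and transposing back.
row_wise_sum = df.astype("float64").T.rolling(2).sum().T
print(row_wise_sum)
```

Here column x stays NaN (the window is not yet full) and column y holds x + y for each row, instead of the all-NaN frame shown above.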

@yhaque1213
Contributor

I'm interested in helping out with this! Can I work on it?

@jreback modified the milestones: Contributions Welcome, 0.25.0 (Apr 16, 2019)
@yhaque1213
Contributor

@jorisvandenbossche

Actually, using axis=1 seems broken for the other methods (which use _apply) as well if you have different dtypes:

In [49]: df = pd.DataFrame({'x': range(5), 'y': np.arange(0, 1, 0.2)})

In [50]: df
Out[50]:
   x    y
0  0  0.0
1  1  0.2
2  2  0.4
3  3  0.6
4  4  0.8

In [51]: df.rolling(2, axis=1).sum()
Out[51]:
    x   y
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN

Because the _apply does this for each block of uniform dtype, which is fine for axis=0, but not for axis=1.

I think this is not a problem with the axis but with the way the blocks are created in the call to _to_dict_of_blocks() in generic.py: blocks are grouped by homogeneous data type. This splits the original DataFrame by type, which causes the sum() problem later on.

If the DataFrame is of a uniform type, then it will create one block that includes both columns.
If the DataFrame’s columns are of different types, then it creates two separate blocks, one for each column. When the rolling sum is then calculated per block, each block is only one column wide, so the leading NaNs (the first window - 1 positions) fill the whole result, which produces the output above.
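
The effect described above can be imitated with public API only. The sketch below is an emulation, not pandas internals: it groups columns by dtype by hand (the names groups, pieces, blockwise, and whole are made up for illustration), rolls across the columns of each group separately, and compares that with rolling across the whole frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": range(5), "y": np.arange(0, 1, 0.2)})

# Group column names by dtype, mimicking a split into homogeneous blocks.
groups = {}
for col, dtype in df.dtypes.items():
    groups.setdefault(dtype, []).append(col)

# Roll across the columns of each group on its own (via transpose).
pieces = []
for cols in groups.values():
    block = df[cols]
    pieces.append(block.T.rolling(2).sum().T)

# Reassemble in the original column order.
blockwise = pd.concat(pieces, axis=1)[list(df.columns)]

# For comparison: roll across all columns of the frame at once.
whole = df.T.rolling(2).sum().T

print(blockwise)
print(whole)
```

Each group here is a single column, so a window of 2 never fills and blockwise is entirely NaN, while whole has real values in column y.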

It seems like this should be filed under a different issue, since it does not have anything to do with axis.

@jreback
Copy link
Contributor

jreback commented Apr 17, 2019

axis=1 is generally not implemented at all: unless you have a single dtype, you have mixed types across columns.
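
Given that constraint, a caller may want to check for a single dtype before rolling across columns. A minimal sketch follows; has_single_dtype is a hypothetical helper written for this example, not a pandas function.

```python
import numpy as np
import pandas as pd


def has_single_dtype(frame: pd.DataFrame) -> bool:
    """Return True when every column of `frame` shares one dtype."""
    # Hypothetical helper for illustration; an empty frame counts as uniform.
    return len(set(frame.dtypes)) <= 1


uniform = pd.DataFrame({"x": range(5), "y": range(5)})
mixed = pd.DataFrame({"x": range(5), "y": np.arange(0, 1, 0.2)})

print(has_single_dtype(uniform))
print(has_single_dtype(mixed))
```

The first frame from this thread passes the check and the mixed int/float frame does not, which matches the cases where column-wise rolling did and did not behave as expected.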

5 participants