BUG: exclude object columns in DataFrame resample even if they are numeric #16811

Closed
zsluedem opened this issue Jul 1, 2017 · 7 comments
Labels
Bug; Dtype Conversions (unexpected or buggy dtype conversions); Nuisance Columns (identifying/dropping nuisance columns in reductions, groupby.add, DataFrame.apply); Resample (resample method)

Comments

@zsluedem

zsluedem commented Jul 1, 2017

xref #3087
xref #12537

I have a dataframe like this:

df

2011-01-07 14:51:00 4288
2011-01-07 14:52:00 4262
2011-01-07 14:53:00 2586
2011-01-07 14:54:00 10738
2011-01-07 14:55:00 4448
2011-01-07 14:56:00 10565
2011-01-07 14:57:00 6539
2011-01-07 14:58:00 25077
2011-01-07 14:59:00 6163
2011-01-07 15:00:00 9644
2011-01-10 09:31:00 23382
2011-01-10 09:32:00 10298
2011-01-10 09:33:00 13689
2011-01-10 09:34:00 9793
2011-01-10 09:35:00 9870
2011-01-10 09:36:00 4249
2011-01-10 09:37:00 5298
2011-01-10 09:38:00 3752
2011-01-10 09:39:00 4087
2011-01-10 09:40:00 10728
Name: volume, dtype: object

df.resample('D').sum()

2011-01-07 84310
2011-01-08 0
2011-01-09 0
2011-01-10 95146
Freq: D, Name: volume, dtype: int64

Problem description

When I execute the code above, it produces 0 rather than NaN for days that have no samples.

When I try another data type, float, it produces NaN as expected.
I think the behavior depends on the data type.

Expected Output

2011-01-07 84310
2011-01-08 nan
2011-01-09 nan
2011-01-10 95146
Freq: D, Name: volume, dtype: int64

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.1.1
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Jul 1, 2017

Please show an actual reproducible example with code.

@dsm054
Contributor

dsm054 commented Jul 2, 2017

I'm pretty sure the OP is referring to this effect:

In [26]: s = pd.Series(1, index=pd.date_range("2017-01-05", periods=4, freq="B"))

In [27]: s.resample("D").sum()
Out[27]: 
2017-01-05    1.0
2017-01-06    1.0
2017-01-07    NaN
2017-01-08    NaN
2017-01-09    1.0
2017-01-10    1.0
Freq: D, dtype: float64

In [28]: s.astype(object).resample("D").sum()
Out[28]: 
2017-01-05    1
2017-01-06    1
2017-01-07    0
2017-01-08    0
2017-01-09    1
2017-01-10    1
Freq: D, dtype: int64
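For anyone who lands here wanting NaN for the empty days, here is a minimal sketch of one possible workaround, not a fix for the underlying behavior. It assumes a pandas version where `sum` accepts `min_count` (0.22 or later, as far as I know), and the four values are just a small illustrative subset of the volumes in the report. The idea is to convert the object column to a numeric dtype first and then ask for at least one observation per bin.

```python
import pandas as pd

# Illustrative data: a few of the reported volume values, stored as object dtype.
index = pd.to_datetime([
    "2011-01-07 14:51:00",
    "2011-01-07 14:52:00",
    "2011-01-10 09:31:00",
    "2011-01-10 09:32:00",
])
volume = pd.Series([4288, 4262, 23382, 10298], index=index, dtype=object, name="volume")

# Convert to a real numeric dtype before resampling.
numeric = pd.to_numeric(volume)

# min_count=1 requires at least one observation per bin;
# bins with no observations come out as NaN instead of 0.
daily = numeric.resample("D").sum(min_count=1)
print(daily)
```

With this input, 2011-01-08 and 2011-01-09 have no rows, so those two bins are the ones that end up missing.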

@zsluedem
Author

zsluedem commented Jul 3, 2017

@dsm054 I think you are right. I can't reproduce the example right now.
It happened in the middle of my code, and I am debugging it now.

@jreback
Contributor

jreback commented Jul 5, 2017

So we are resampling on the object columns. This is a bug, since we normally exclude them; the resample should ignore this column, as it does for column B here:

In [26]: df = pd.DataFrame({'A': [1, 2, 3], 'B': list('abc'), 'C':pd.date_range('20130101', periods=3, freq='1min')})

In [27]: df
Out[27]: 
   A  B                   C
0  1  a 2013-01-01 00:00:00
1  2  b 2013-01-01 00:01:00
2  3  c 2013-01-01 00:02:00

In [28]: df.resample('1H', on='C').mean()
Out[28]: 
            A
C            
2013-01-01  2
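If you do not want to depend on which columns the resampler drops, one option is to pick the numeric columns yourself before resampling. This is only a sketch under my own assumptions: it reuses the frame from the example above, uses `select_dtypes` to keep numeric columns, and writes the hourly frequency as `"60min"` because the spelling of the hour alias differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": list("abc"),
    "C": pd.date_range("20130101", periods=3, freq="1min"),
})

# Keep only numeric columns explicitly, so the object column B
# never reaches the resampler.
numeric_only = df.set_index("C").select_dtypes(include="number")

# "60min" is used instead of an hour alias to stay version-neutral.
hourly = numeric_only.resample("60min").mean()
print(hourly)
```

Note that a numeric column stored as object dtype would also be filtered out by `select_dtypes`, so it would need an explicit conversion first (for example with `pd.to_numeric`).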

jreback added the Bug, Difficulty Intermediate, Dtype Conversions, and Resample labels Jul 5, 2017
jreback added this to the Next Major Release milestone Jul 5, 2017
jreback changed the title from "resample with method sum() produce 0 not nan" to "BUG: exclude object columns in DataFrame resample even if they are numeric" Jul 5, 2017
@yiwiz-sai

yiwiz-sai commented Jan 2, 2018

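import pandas as pd

def test(grp):
    # Print each window, then try to join its elements into one string.
    print(grp)
    return ''.join(grp)

ser = pd.Series(list('abcdfefeagg'))
# This is the call that does not work on string data.
print(ser.rolling(window=3).apply(test))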

I cannot get the result I want.
The rolling function does not work on strings.
My pandas version is 0.21.
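As far as I know, `rolling` is meant for numeric data, so a string window has to be built by hand. A minimal sketch of one way to do that, using plain positional slicing instead of `rolling`; the names `window` and `joined` are mine and purely illustrative:

```python
import pandas as pd

ser = pd.Series(list("abcdfefeagg"))
window = 3

# The first window - 1 positions have no full window, so they stay empty.
# Every later position gets the concatenation of the previous `window` items.
joined = pd.Series(
    [None] * (window - 1)
    + ["".join(ser.iloc[i - window + 1 : i + 1]) for i in range(window - 1, len(ser))],
    index=ser.index,
)
print(joined)
```

This is unrelated to the resample behavior discussed above; it only addresses joining strings over a sliding window.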

@ron819

ron819 commented Nov 6, 2018

Any chance this will be fixed for the next release?

jbrockmendel added the Nuisance Columns label Sep 22, 2020
mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel
Member

We no longer drop nuisance columns, closing.


7 participants