BUG: exclude object columns in DataFrame resample even if they are numeric #16811

zsluedem · 2017-07-01T02:07:33Z

I have data like this:

df

2011-01-07 14:51:00 4288
2011-01-07 14:52:00 4262
2011-01-07 14:53:00 2586
2011-01-07 14:54:00 10738
2011-01-07 14:55:00 4448
2011-01-07 14:56:00 10565
2011-01-07 14:57:00 6539
2011-01-07 14:58:00 25077
2011-01-07 14:59:00 6163
2011-01-07 15:00:00 9644
2011-01-10 09:31:00 23382
2011-01-10 09:32:00 10298
2011-01-10 09:33:00 13689
2011-01-10 09:34:00 9793
2011-01-10 09:35:00 9870
2011-01-10 09:36:00 4249
2011-01-10 09:37:00 5298
2011-01-10 09:38:00 3752
2011-01-10 09:39:00 4087
2011-01-10 09:40:00 10728
Name: volume, dtype: object

df.resample('D').sum()

2011-01-07 84310
2011-01-08 0
2011-01-09 0
2011-01-10 95146
Freq: D, Name: volume, dtype: int64

Problem description

When I execute the code above, it produces 0 rather than NaN for the days that have no samples.

When I try the same thing with a float data type, it produces NaN as expected, so I think the problem is related to the data type.
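To check the data-type explanation against the reporter's own numbers, here is a minimal sketch assuming a recent pandas (the names `volume`, `numeric_volume`, and `daily` are illustrative). It rebuilds the object-dtype `volume` series, converts it with `pd.to_numeric`, and resamples by day. It passes `min_count=1` because, as far as I know, pandas 0.22 and later return 0 for the sum of an empty numeric bin by default, and `min_count=1` is what asks for NaN instead.

```python
import pandas as pd

# Rebuild the reporter's minute-level volumes with the dtype shown in the
# issue (dtype: object): ten minutes on 2011-01-07 and ten on 2011-01-10.
index = pd.date_range("2011-01-07 14:51", periods=10, freq="min").append(
    pd.date_range("2011-01-10 09:31", periods=10, freq="min")
)
volume = pd.Series(
    [4288, 4262, 2586, 10738, 4448, 10565, 6539, 25077, 6163, 9644,
     23382, 10298, 13689, 9793, 9870, 4249, 5298, 3752, 4087, 10728],
    index=index,
    name="volume",
    dtype=object,
)

# Convert to a real numeric dtype before aggregating.
numeric_volume = pd.to_numeric(volume)

# min_count=1: a day with no observations gives NaN instead of 0.
daily = numeric_volume.resample("D").sum(min_count=1)
print(daily)
```

With these inputs the daily totals should match the 84310 and 95146 from the report, with NaN for 2011-01-08 and 2011-01-09.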

Expected Output

2011-01-07 84310
2011-01-08 NaN
2011-01-09 NaN
2011-01-10 95146
Freq: D, Name: volume, dtype: float64

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.1.1
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None


jreback · 2017-07-01T08:00:15Z

Please show an actual reproducible example with code.

dsm054 · 2017-07-02T19:33:48Z

I'm pretty sure the OP is referring to this effect:

In [26]: s = pd.Series(1, index=pd.date_range("2017-01-05", periods=4, freq="B"))

In [27]: s.resample("D").sum()
Out[27]: 
2017-01-05    1.0
2017-01-06    1.0
2017-01-07    NaN
2017-01-08    NaN
2017-01-09    1.0
2017-01-10    1.0
Freq: D, dtype: float64

In [28]: s.astype(object).resample("D").sum()
Out[28]: 
2017-01-05    1
2017-01-06    1
2017-01-07    0
2017-01-08    0
2017-01-09    1
2017-01-10    1
Freq: D, dtype: int64
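In current pandas, the `min_count` argument of `Resampler.sum` (added in pandas 0.22, to my knowledge) decides whether an empty bin becomes 0 or NaN. A minimal sketch reusing the series from the session above; in recent versions the weekend bins come out as 0 by default even for the numeric series, unlike the NaN shown in the 2017 output, so the expected assertions below only cover the current behavior:

```python
import pandas as pd

# dsm054's example: four business days (Thu, Fri, Mon, Tue), so the
# Saturday and Sunday bins of a daily resample are empty.
s = pd.Series(1, index=pd.date_range("2017-01-05", periods=4, freq="B"))

# Default min_count=0: an empty bin sums to 0.
default_sum = s.resample("D").sum()

# min_count=1: a bin needs at least one value, otherwise the result is NaN.
nan_sum = s.resample("D").sum(min_count=1)

print(default_sum)
print(nan_sum)
```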

zsluedem · 2017-07-03T01:22:04Z

@dsm054 I think you are right. I can't reproduce the example now; it happened in the middle of my code, but I've debugged it since.

jreback · 2017-07-05T22:27:50Z

So we are resampling on the object columns. This is a bug, since we otherwise exclude them (as in the DataFrame example below, where column B is dropped). This column should be ignored.

In [26]: df = pd.DataFrame({'A': [1, 2, 3], 'B': list('abc'), 'C':pd.date_range('20130101', periods=3, freq='1min')})

In [27]: df
Out[27]: 
   A  B                   C
0  1  a 2013-01-01 00:00:00
1  2  b 2013-01-01 00:01:00
2  3  c 2013-01-01 00:02:00

In [28]: df.resample('1H', on='C').mean()
Out[28]: 
            A
C            
2013-01-01  2
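Per the closing comment, pandas no longer drops nuisance columns silently, so in recent versions the `.mean()` call above raises an error on the string column `B` instead of dropping it. A minimal sketch of two explicit alternatives, assuming a pandas version whose resampler reductions accept `numeric_only` (1.5 or later, to my knowledge); `"1h"` is used because newer pandas deprecates the `"1H"` spelling:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": list("abc"),
    "C": pd.date_range("20130101", periods=3, freq="1min"),
})

# Option 1: choose the columns explicitly; "C" is the resampling key and
# is not aggregated itself.
selected = df[["A", "C"]].resample("1h", on="C").mean()

# Option 2: let the reduction skip non-numeric columns such as "B".
numeric_only = df.resample("1h", on="C").mean(numeric_only=True)

print(selected)
print(numeric_only)
```

Option 1 makes the intent visible at the call site; option 2 is closer to the old implicit behavior but only for numeric dtypes, so an object column holding numbers (the case in this issue's title) would still be skipped.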

yiwiz-sai · 2018-01-02T01:11:13Z

import pandas as pd

def test(grp):
    print(grp)
    return ''.join(grp)

ser = pd.Series(list('abcdfefeagg'))
print(ser.rolling(window=3).apply(test))

I can't get what I want: the rolling function does not work on strings.
My pandas version is 0.21.
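Rolling windows in pandas operate on numeric data, which is why the string `apply` above does not work. One workaround (a suggestion of mine, not a pandas feature for string windows) is to build each fixed-size window from shifted copies of the series and add them, relying on `+` concatenating strings and on missing values propagating as NaN. A minimal sketch, where `window`, `shifted`, and `joined` are illustrative names:

```python
import functools
import operator

import pandas as pd

ser = pd.Series(list("abcdfefeagg"))
window = 3

# Row i of ser.shift(k) holds ser[i - k], so adding the shifted copies
# (oldest first) concatenates the strings of each window ending at row i.
# The first window - 1 rows have no complete window and stay NaN.
shifted = [ser.shift(k) for k in range(window - 1, -1, -1)]
joined = functools.reduce(operator.add, shifted)
print(joined)
```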

ron819 · 2018-11-06T06:12:32Z

Any chance this will be fixed in the next release?

jbrockmendel · 2023-03-27T23:38:50Z

We no longer drop nuisance columns, closing.

jreback added the Bug, Difficulty Intermediate, Dtype Conversions, and Resample labels Jul 5, 2017

jreback added this to the Next Major Release milestone Jul 5, 2017

jreback changed the title from "resample with method sum() produce 0 not nan" to "BUG: exclude object columns in DataFrame resample even if they are numeric" Jul 5, 2017

jbrockmendel removed the Difficulty Intermediate label Oct 21, 2019

jbrockmendel added the Nuisance Columns label Sep 22, 2020

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

jbrockmendel closed this as completed Mar 27, 2023
