min_count is ignored by sum/prod when resampling a PeriodIndex #19974

roedema · 2018-03-02T20:51:22Z

Code Sample

Setup:

import numpy as np
import pandas as pd

index = pd.date_range(start='2018', freq='M', periods=6)
data = np.ones(6)
data[3:6] = np.nan

datetime = pd.Series(data, index)
period = datetime.to_period()

Resampling and summing (min_count=1) the Series with a DatetimeIndex:

datetime
datetime.resample('Q').sum(min_count=1)

2018-01-31   1.0
2018-02-28   1.0
2018-03-31   1.0
2018-04-30   NaN
2018-05-31   NaN
2018-06-30   NaN
Freq: M, dtype: float64

2018-03-31    3.0
2018-06-30    NaN
Freq: Q-DEC, dtype: float64

Resampling and summing (min_count=1) the Series with a PeriodIndex:

period
period.resample('Q').sum(min_count=1)

2018-01    1.0
2018-02    1.0
2018-03    1.0
2018-04    NaN
2018-05    NaN
2018-06    NaN
Freq: M, dtype: float64

2018Q1    3.0
2018Q2    0.0
Freq: Q-DEC, dtype: float64

Problem description

sum() and prod() seem to ignore the min_count argument when called on a resampled Series or DataFrame with a PeriodIndex. In the example above, the all-NaN bin 2018Q2 sums to 0.0 instead of NaN.

Expected Output

I would expect the same result whether using a DatetimeIndex or PeriodIndex. Specifically, in the example above, 2018Q2 should be NaN.
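For comparison, the expected values can also be computed without resample by grouping on the quarter of each period. This is only a sketch: the names `quarters` and `by_quarter` are introduced here for illustration, and whether this route avoids the problem on 0.22.0 was not checked in this thread, so treat that as an assumption.

```python
import numpy as np
import pandas as pd

# Same data as in the Setup above, built directly on a monthly PeriodIndex.
index = pd.period_range(start="2018-01", periods=6, freq="M")
data = np.ones(6)
data[3:6] = np.nan
period = pd.Series(data, index=index)

# Map every month to the quarter that contains it, then aggregate per quarter.
quarters = period.index.asfreq("Q")
by_quarter = period.groupby(quarters).sum(min_count=1)
print(by_quarter)
```

With min_count=1, a quarter that contains only NaN values has fewer than one valid observation, so its sum should be NaN rather than 0.0.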

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


gfyoung · 2018-03-08T16:55:17Z

Indeed, this does look like a violation of the spec. Investigation and a patch are welcome!

mroeschke · 2019-10-20T20:18:35Z

Looks to be fixed on master. Could use a test.

In [3]: period.resample('Q').sum(min_count=1)
   ...:
Out[3]:
2018Q1    3.0
2018Q2    NaN
Freq: Q-DEC, dtype: float64

In [4]: pd.__version__
Out[4]: '0.26.0.dev0+593.g9d45934af'
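A regression check along the lines requested above might look like the following. It is a minimal sketch, not the test that was eventually merged: the function name `check_min_count_on_period_resample` is hypothetical, and the prod expectation is an assumption based on the issue description, since only sum was shown in this thread.

```python
import numpy as np
import pandas as pd


def check_min_count_on_period_resample():
    # Three valid months followed by three all-NaN months, as in the report.
    index = pd.period_range(start="2018-01", periods=6, freq="M")
    data = np.ones(6)
    data[3:6] = np.nan
    period = pd.Series(data, index=index)

    expected_index = pd.period_range(start="2018Q1", periods=2, freq="Q")

    # sum: the all-NaN quarter must be NaN, not 0.0.
    result_sum = period.resample("Q").sum(min_count=1)
    expected_sum = pd.Series([3.0, np.nan], index=expected_index)
    pd.testing.assert_series_equal(result_sum, expected_sum)

    # prod (assumed to follow the same rule): NaN for the all-NaN quarter.
    result_prod = period.resample("Q").prod(min_count=1)
    expected_prod = pd.Series([1.0, np.nan], index=expected_index)
    pd.testing.assert_series_equal(result_prod, expected_prod)

    return result_sum, result_prod


check_min_count_on_period_resample()
```

The index is built with pd.period_range rather than date_range plus to_period so that the sketch does not depend on DatetimeIndex frequency aliases, which is a choice made for this example only.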

roedema changed the title ~~min_count is ignored by sum/prod when resampling~~ min_count is ignored by sum/prod when resampling a PeriodIndex Mar 2, 2018

gfyoung added the Bug, Indexing, and Period labels Mar 8, 2018

mroeschke added the good first issue and Needs Tests labels and removed the Bug, Indexing, and Period labels Oct 20, 2019

ganevgv mentioned this issue Nov 21, 2019 in TST: add tests for period resample sum with min_count #29762 (Closed, 5 tasks)

jreback added this to the 1.0 milestone Nov 22, 2019

jreback modified the milestones: 1.0, Contributions Welcome Jan 1, 2020

mroeschke mentioned this issue Jan 22, 2020 in TST: More regression tests #31196 (Merged, 7 tasks)

simonjayhawkins modified the milestones: Contributions Welcome, 1.1 Jan 22, 2020

WillAyd closed this as completed in #31196 Jan 24, 2020
