Code Sample
Setup:
import numpy as np
import pandas as pd

index = pd.date_range(start='2018', freq='M', periods=6)
data = np.ones(6)
data[3:6] = np.nan
datetime = pd.Series(data, index)
period = datetime.to_period()
Resampling and summing (min_count=1) the Series with a DatetimeIndex:
>>> datetime
2018-01-31    1.0
2018-02-28    1.0
2018-03-31    1.0
2018-04-30    NaN
2018-05-31    NaN
2018-06-30    NaN
Freq: M, dtype: float64
>>> datetime.resample('Q').sum(min_count=1)
2018-03-31    3.0
2018-06-30    NaN
Freq: Q-DEC, dtype: float64
Resampling and summing (min_count=1) the Series with a PeriodIndex:
>>> period
2018-01    1.0
2018-02    1.0
2018-03    1.0
2018-04    NaN
2018-05    NaN
2018-06    NaN
Freq: M, dtype: float64
>>> period.resample('Q').sum(min_count=1)
2018Q1    3.0
2018Q2    0.0
Freq: Q-DEC, dtype: float64
Problem description
sum() and prod() seem to ignore the min_count argument when used on a resampled Series or DataFrame with a PeriodIndex.
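For reference, min_count is the number of non-NA values a group needs before sum() or prod() returns a number instead of NaN. The following is a minimal sketch on a plain Series with no resampling involved. It assumes pandas 0.22 or later, where an all-NaN sum defaults to 0 and an all-NaN product defaults to 1. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Six values with the last three missing, mirroring the issue's setup
# (plain RangeIndex here, since no resampling is involved).
values = pd.Series([1.0, 1.0, 1.0, np.nan, np.nan, np.nan])

first_quarter = values.iloc[:3]   # three non-NA values
second_quarter = values.iloc[3:]  # all NaN

# Default min_count=0: an all-NaN sum is 0 and an all-NaN product is 1.
default_sum = second_quarter.sum()
default_prod = second_quarter.prod()

# min_count=1: at least one non-NA value is required, otherwise NaN.
strict_sum = second_quarter.sum(min_count=1)
strict_prod = second_quarter.prod(min_count=1)

# A group with enough non-NA values is unaffected by min_count.
partial_sum = first_quarter.sum(min_count=1)

print(default_sum, default_prod, strict_sum, strict_prod, partial_sum)
# prints: 0.0 1.0 nan nan 3.0
```

The all-NaN second quarter is the case where the PeriodIndex resample reports 0.0 instead of NaN.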
Expected Output
I would expect the same result whether using a DatetimeIndex or PeriodIndex. Specifically, in the example above, 2018Q2 should be NaN.
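One way to compute the expected PeriodIndex result without going through resample is to map each monthly period to its quarter and group on that key with the same sum(min_count=1). This is a cross-check sketch only. The groupby/asfreq route is a substitute technique, not how resample is implemented. The sketch also rebuilds the setup data with pd.period_range instead of to_period().

```python
import numpy as np
import pandas as pd

# Same data as the setup, built directly on a monthly PeriodIndex.
period = pd.Series(
    [1.0, 1.0, 1.0, np.nan, np.nan, np.nan],
    index=pd.period_range("2018-01", periods=6, freq="M"),
)

# Convert each monthly period to the quarter that contains it.
quarters = period.index.asfreq("Q")

# Group by quarter; min_count=1 turns the all-NaN quarter into NaN.
by_quarter = period.groupby(quarters).sum(min_count=1)
print(by_quarter)
```

The 2018Q2 entry comes out as NaN, which matches the DatetimeIndex result.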
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Indeed, this does look like a violation of the spec. Investigation and a patch are welcome!
Looks to be fixed on master. Could use a test.
In [3]: period.resample('Q').sum(min_count=1)
   ...:
Out[3]:
2018Q1    3.0
2018Q2    NaN
Freq: Q-DEC, dtype: float64

In [4]: pd.__version__
Out[4]: '0.26.0.dev0+593.g9d45934af'
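Below is a possible regression test, written as a hypothetical pytest-style function. The name test_period_resample_sum_min_count is made up and is not an existing pandas test. It builds the issue's data directly on a PeriodIndex and checks that both sum() and prod() honor min_count=1 for the all-NaN quarter. It assumes a pandas version that includes the fix shown above.

```python
import warnings

import numpy as np
import pandas as pd


def test_period_resample_sum_min_count():
    # The issue's data: three ones followed by three NaNs, monthly periods.
    index = pd.period_range("2018-01", periods=6, freq="M")
    period = pd.Series([1.0, 1.0, 1.0, np.nan, np.nan, np.nan], index=index)

    with warnings.catch_warnings():
        # Some newer pandas releases warn that resampling a PeriodIndex is
        # deprecated; that warning is unrelated to what is checked here.
        warnings.simplefilter("ignore", FutureWarning)
        result_sum = period.resample("Q").sum(min_count=1)
        result_prod = period.resample("Q").prod(min_count=1)

    quarters = pd.PeriodIndex(["2018Q1", "2018Q2"], freq="Q")

    # 2018Q2 has no non-NA values, so min_count=1 must give NaN, not 0.0 / 1.0.
    expected_sum = pd.Series([3.0, np.nan], index=quarters)
    expected_prod = pd.Series([1.0, np.nan], index=quarters)

    pd.testing.assert_series_equal(result_sum, expected_sum)
    pd.testing.assert_series_equal(result_prod, expected_prod)


if __name__ == "__main__":
    test_period_resample_sum_min_count()
```

On pandas 0.22.0 the sum assertion should fail, because 2018Q2 comes back as 0.0.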