BUG: resample nunique calculation incorrect #13795

Closed
agtsai-i opened this issue Jul 26, 2016 · 2 comments
Labels
Bug · Duplicate Report (duplicate issue or pull request) · Resample (resample method)

Comments

@agtsai-i

Summary

Passing the string 'nunique' to resample(...).apply produces an incorrect count of unique values per bin. Passing the callable pd.Series.nunique instead gives the correct result.

Code Sample, a copy-pastable example if possible

import pandas as pd
from pandas import Timestamp

# Three observations: one in the 09:00 hour, two in the 16:00 hour
time = [Timestamp('2016-06-28 09:35:35'),
        Timestamp('2016-06-28 16:09:30'),
        Timestamp('2016-06-28 16:46:28')]
data = ['1', '2', '3']
test = pd.DataFrame({'time': time, 'data': data})
# String alias: gives the incorrect output shown below
print(test.set_index('time').resample('H').apply({'data': 'nunique'}))
# Callable: gives the correct output shown below
print(test.set_index('time').resample('H').apply({'data': pd.Series.nunique}))

Expected Output

Output of the 'nunique' call (INCORRECT; the 16:00 bin holds two distinct values):
2016-06-28 09:00:00 1
2016-06-28 10:00:00 0
2016-06-28 11:00:00 0
2016-06-28 12:00:00 0
2016-06-28 13:00:00 0
2016-06-28 14:00:00 0
2016-06-28 15:00:00 0
2016-06-28 16:00:00 1

Output of the pd.Series.nunique call (CORRECT):
2016-06-28 09:00:00 1
2016-06-28 10:00:00 0
2016-06-28 11:00:00 0
2016-06-28 12:00:00 0
2016-06-28 13:00:00 0
2016-06-28 14:00:00 0
2016-06-28 15:00:00 0
2016-06-28 16:00:00 2
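
One way to check which of the two outputs is right is to count distinct values per hour without going through resample at all. The following is a minimal sketch under a few assumptions that are not part of the original report: the variable names are illustrative, the data is held in a Series rather than a DataFrame, and "60min" replaces 'H' only because newer pandas releases deprecate the uppercase alias. Whether the string alias agrees with the cross-check depends on the installed pandas version, so the sketch prints that comparison rather than assuming a result.

```python
import pandas as pd

# Same three observations as in the report, indexed by timestamp.
index = pd.DatetimeIndex(
    ["2016-06-28 09:35:35", "2016-06-28 16:09:30", "2016-06-28 16:46:28"],
    name="time",
)
series = pd.Series(["1", "2", "3"], index=index, name="data")

# The two spellings compared in the report.
by_string = series.resample("60min").agg("nunique")
by_callable = series.resample("60min").apply(pd.Series.nunique)

# Cross-check that does not use resample: floor each timestamp to its
# hour and count distinct values per hour (empty hours are absent here).
cross_check = series.groupby(series.index.floor("60min")).nunique()

# Compare only the hours that actually contain data.
comparison = pd.DataFrame({
    "string_alias": by_string.reindex(cross_check.index),
    "callable": by_callable.reindex(cross_check.index),
    "groupby_floor": cross_check,
})
print(comparison)
print("string alias agrees with cross-check:",
      bool((comparison["string_alias"] == comparison["groupby_floor"]).all()))
```

If the string alias column differs from the other two, the installed version is likely affected, and passing pd.Series.nunique as in the second call of the report is the workaround.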

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.10-17.31.amzn1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.3.1
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

@jreback
Contributor

jreback commented Jul 26, 2016

looks like the same as #13453

@agtsai-i
Author

Wasn't quite sure, so I submitted anyways

sinhrks added the Bug and Resample labels Jul 26, 2016
jreback added the Duplicate Report label Jul 26, 2016
jreback closed this as completed Jul 26, 2016
jorisvandenbossche added this to the No action milestone Jul 26, 2016