Groupby creates empty groups depending on base parameter #25161
Comments
This appears fixed in 0.24.x.
I don't think |
|
I am guessing that choice was made because the intention was to set the "base" relative to the resampling frequency. Feel free to open another issue to discuss the possibility of extending the base argument.
Hi, could I create a test function for this problem? That would be my first issue.
Use resample(), which is what Grouper calls, and assert on the index instead of the result of size().
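A minimal sketch of what such a test could look like, asserting on the resampled index rather than on size(). The data, the bin width, and the variable names are hypothetical and are not taken from the issue's code sample. The sketch also assumes pandas 1.1 or later, where the `offset` argument shifts the bin edges; the issue itself used `base`.

```python
import pandas as pd

# Hypothetical data: 20 one-minute samples starting exactly on a bin edge.
index = pd.date_range("2018-11-26 16:13:00", periods=20, freq="min")
ser = pd.Series(range(20), index=index)

# 10-minute bins shifted by 3 minutes, so edges fall at :03, :13, :23, ...
# `offset` is an assumption here (pandas >= 1.1); the issue used `base`.
result = ser.resample("10min", offset=pd.Timedelta(minutes=3)).sum()

expected_labels = [
    pd.Timestamp("2018-11-26 16:13:00"),
    pd.Timestamp("2018-11-26 16:23:00"),
]

# Assert on the bin labels directly: a trailing empty bin would show up
# as a third label, which a check on the values alone could miss.
assert result.index.tolist() == expected_labels
```

Checking the labels makes the failure mode explicit: an extra bin changes the index even if the aggregated values of the other bins are unchanged.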
Code Sample, a copy-pastable example if possible
Problem description
The code above generates either 6 or 7 groups depending on whether the dataframe starts at '2018-11-26 16:17:43.500000' (case 1) or '2018-11-26 16:17:43.510000' (case 2).
The correct output is clearly the one obtained in case 2. Case 1, instead, creates an empty group at the end of the dataframe. This can cause trouble with groupby.apply() if the applied function does not handle empty dataframes.
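Independently of this bug, time-based grouping can yield empty bins whenever the data has gaps, so a function passed to groupby.apply() may need a guard. The sketch below is only an illustration: the data, the `value` column, and `safe_mean` are hypothetical, and it does not reproduce the trailing empty group described above.

```python
import pandas as pd

# Hypothetical data with a gap, so two of the 10-minute bins have no rows.
index = pd.to_datetime(
    [
        "2018-11-26 16:17:43.510",
        "2018-11-26 16:18:43.510",
        "2018-11-26 16:40:00.000",
    ]
)
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=index)

grouped = df.groupby(pd.Grouper(freq="10min"))

# size() reports empty bins with a count of 0.
sizes = grouped.size()
non_empty = sizes[sizes > 0]


def safe_mean(group):
    """Return the mean of `value`, or NaN when the group has no rows."""
    if group.empty:
        return float("nan")
    return group["value"].mean()


# The guard keeps the applied function from failing on an empty frame.
result = grouped.apply(safe_mean)
```

Filtering on `sizes > 0` is useful when only the populated bins matter; the guard inside the function is the safer choice when the function is reused elsewhere.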
Actual Output
2018-11-26 16:17:43.510 10
2018-11-26 16:27:43.510 0
dtype: int64
Expected Output
2018-11-26 16:17:43.510 10
dtype: int64
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.23.4
pytest: 4.0.2
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None