groupby on 2 categorical columns, when one categorical is based on datetimes, incorrectly returns all NaN dataframe #21390
Comments
This seems to boil down to a problem with reindexing with such a categorical index:
on 0.22.0 works correctly, but on master gives:
cc @toobaz this seems to be related to the new MultiIndexUIntEngine Using the above example (
Sorry Pietro, probably a bit prematurely pointed to that :-), as in the end it is code in the MultiIndexUIntEngine that surfaces another bug. Iterating a MultiIndex (
Good :-) In general, it is unlikely that bugs in the MI engine code are dtype-specific, as it entirely delegates actual lookup to single levels, and only looks for integers (codes).
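To illustrate the point about delegating to single levels, here is a rough sketch using only public pandas API (`MultiIndex.levels`, `MultiIndex.codes`, `Index.get_loc`). It is not the engine's actual implementation, and the index values are made up for the example: each key component is first translated to an integer code by its own level, and only those integers are then matched against the stored codes.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index: strings on one level, datetimes on the other.
mi = pd.MultiIndex.from_product(
    [["a", "b"], pd.to_datetime(["2018-06-01 10:00", "2018-06-01 10:01"])],
    names=["label1", "minute"],
)

key = ("b", pd.Timestamp("2018-06-01 10:01"))

# Step 1: each level resolves its own part of the key to an integer code.
# This is the only dtype-aware step.
level_codes = [level.get_loc(part) for level, part in zip(mi.levels, key)]

# Step 2: the combined lookup only compares integers (codes).
mask = np.ones(len(mi), dtype=bool)
for codes, code in zip(mi.codes, level_codes):
    mask &= np.asarray(codes) == code

position = int(np.flatnonzero(mask)[0])
```

Because the second step never sees strings or timestamps, a dtype-dependent failure would more plausibly originate in how a level (or the code iterating the index) produces the key values than in the integer matching itself.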
PR: #21657
Code Sample, a copy-pastable example if possible
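What follows is a minimal sketch of the scenario, not the reporter's original sample. The column names `label1`, `label2`, and `minute` come from the description below; the data values, the `value` column, and the choice of `sum` as the aggregation are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the issue text.
df = pd.DataFrame({
    "label1": ["a", "a", "b", "b"],
    "label2": ["x", "y", "x", "y"],
    "minute": pd.to_datetime([
        "2018-06-01 10:00", "2018-06-01 10:00",
        "2018-06-01 10:01", "2018-06-01 10:01",
    ]),
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Grouping by a string column and a datetime column.
plain = df.groupby(["label1", "minute"])["value"].sum()

# Convert the grouping columns to categoricals and repeat the same groupby.
cat_df = df.copy()
for col in ["label1", "label2", "minute"]:
    cat_df[col] = cat_df[col].astype("category")

# observed=False keeps every combination of categories in the result.
categorical = cat_df.groupby(["label1", "minute"], observed=False)["value"].sum()

print(plain)
print(categorical)
```

On an affected version, the second result is what the report describes as all NaN; on a version that includes the fix, the observed groups should carry the same totals as the first result.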
Problem description
When grouping by [str, datetime] columns, results are as expected:
After converting label1, label2, and minute to categoricals, that same groupby returns all NaNs:
I only got this bug when grouping on 2 categoricals with one of them being datetime based (order is irrelevant). Grouping by ['label1', 'label2'] and by 'minute' alone each works as expected.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-22-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.0.5
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None