Groupby indices error with datetime categorical #26859
alexifm added a commit to alexifm/pandas that referenced this issue on Jun 14, 2019:
This is to fix a bug reported in pandas-dev#26859
Yeah, that's a bug. I see you've already made a PR, thanks for that.
Looks like this is fixed on master. Could use a test.
take
@mahaoyu are you still working on it?
Is anybody still working on it?
@FivelMttz updated my old tests and got the PR going.
Code Sample
Problem description
This throws an error. You can play around with different choices of columns, but it happens so long as you include 'b' with one of the other columns. 'b' on its own is okay.
Expected Output
Not an error.
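The failure described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the reporter's original sample: the column name 'a', the frame contents, and the use of observed=True (passed only to keep newer pandas releases quiet about the categorical default) are all illustrative. The report describes an error for this kind of call on pandas 0.24.2; on releases where the bug is fixed it is expected to return a dict of positions.

```python
import pandas as pd

# Hypothetical frame: 'a' is a plain string column, 'b' is a
# categorical column whose categories are datetimes.
df = pd.DataFrame(
    {
        "a": list("abc"),
        "b": pd.Series(
            pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01"])
        ).astype("category"),
    }
)

# Grouping on 'b' together with another column is the case the
# report says fails; grouping on 'b' alone is reported to be fine.
indices = df.groupby(["a", "b"], observed=True).indices

for key, positions in indices.items():
    print(key, positions)
```

Swapping the key list for just ["b"] exercises the single-grouper path, which the report says is unaffected.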
Cause
If we inspect BaseGrouper.indices, we see that keys gets passed to get_indexer_dict here:
pandas/pandas/core/groupby/ops.py
Lines 227 to 235 in 430f0fd
get_indexer_dict eventually passes the elements of keys to get_value_at, found here:
pandas/pandas/_libs/util.pxd
Lines 94 to 99 in 2b32e41
The problem is that to build keys, the get_values method is called on each group index (you can see in BaseGrouper.indices how this isn't an issue when there's a single grouper). When grouping on a categorical-datetime column like df['b'], the get_values method on the underlying categorical array is called, and within that method this branch of the if statement is triggered, causing a DatetimeIndex to be returned instead of a numpy array:
pandas/pandas/core/arrays/categorical.py
Line 1504 in 2b32e41
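To make the type mismatch concrete, here is a small sketch that avoids get_values itself (it is not available in recent pandas releases) and instead looks at the categories of a datetime categorical, which are held as a DatetimeIndex rather than a plain numpy array. The sample dates are made up for illustration.

```python
import numpy as np
import pandas as pd

# A categorical built from datetimes: its categories are an Index
# subclass (DatetimeIndex), not a bare numpy array.
cat = pd.Categorical(
    pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01"])
)
categories = cat.categories
print(type(categories).__name__)

# Coercing with numpy yields the ndarray that array-oriented
# helpers generally expect.
as_array = np.asarray(categories)
print(type(as_array).__name__, as_array.dtype)
```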
Solution
Now, the Categorical.get_values docstring does state that an Index object could be returned rather than a numpy array. The simplest thing is to just introduce a line like this before get_indexer_dict:
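The line proposed in the report is not reproduced here, so the following is only a guess at the general idea: coerce every element of keys to a numpy array before handing them on. The helper name coerce_keys_to_arrays and the sample keys are hypothetical, and this is not necessarily the change that went into pandas.

```python
import numpy as np
import pandas as pd


def coerce_keys_to_arrays(keys):
    """Hypothetical helper: turn each key container into a numpy array."""
    return [np.asarray(key) for key in keys]


# Stand-ins for the per-grouper key containers: one plain Index and
# one DatetimeIndex, the type the report identifies as the culprit.
keys = [
    pd.Index(list("abc")),
    pd.DatetimeIndex(["2018-01-01", "2018-02-01", "2018-03-01"]),
]

coerced = coerce_keys_to_arrays(keys)
for arr in coerced:
    print(type(arr).__name__, arr.dtype)
```

np.asarray is used rather than np.array so that inputs that are already arrays are passed through without a copy.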
A pull request for this will be created imminently.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.14.3
scipy: 1.2.1
pyarrow: 0.13.0
xarray: None
IPython: 7.4.0
sphinx: 2.0.0
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.3.1
pandas_gbq: None
pandas_datareader: 0.7.0
gcsfs: None