Grouping-aggregation with first() discards categoricals column #22512
Comments
@zoquda : Thanks for reporting this! When you say, "reportedly", can you confirm if it works in |
@gfyoung Certainly. I've repeated the test with different pandas versions. The issue occurs from pandas version |
Ah, so this looks like a regression then. Patch and PR are welcome!
I'll take a look at this one. Thanks!
another similar example:
yields
I haven't had much time to work on extra stuff lately. I will try to swing back to this soon.
This appears to work on master now. Could use a test
I'll take!
Hi, I've started writing tests for the aggregate first() function using DataFrames with different column types. I was wondering whether additional tests need to be written for other aggregate functions with categorical columns for this issue?
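One possible shape for such a test, as a rough sketch only: the helper name check_agg_keeps_categorical and the small frame are made up for illustration and are not taken from the pandas test suite. It only checks that a non-grouping categorical column survives first() and last() when grouping by two columns.

```python
import pandas as pd


def check_agg_keeps_categorical(func_name):
    """Hypothetical helper: run one groupby aggregation and check column C survives."""
    # Made-up data: A and B are plain grouping keys, C is categorical.
    df = pd.DataFrame(
        {
            "A": [1, 1, 2],
            "B": [10, 10, 20],
            "C": pd.Categorical(["x", "y", "z"]),
        }
    )
    result = getattr(df.groupby(["A", "B"]), func_name)()
    assert "C" in result.columns, f"{func_name}() dropped categorical column C"
    return result


# Only first() and last() are covered here; other aggregations may not be
# defined for unordered categoricals.
for name in ("first", "last"):
    check_agg_keeps_categorical(name)
```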
Code Sample
with the result dataframes looking as follows:
Problem description
A grouping-aggregation operation with first() as aggregation function and multiple columns as by seems to discard categorical columns (see the unexpected_result dataframe in the above code example). Besides being unexpected, this behaviour also seems inconsistent with the other similar grouping-aggregation operations above.
See also this stackoverflow page.
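For illustration, a minimal sketch of the kind of operation described, assuming made-up data: only the name df2 and the call df2.groupby(by=['A', 'B']).first() come from this report, while the column values are hypothetical. On an affected pandas version the categorical column C would be expected to be missing from the result; on a version without the regression it should be present.

```python
import pandas as pd

# Hypothetical data: A and B are grouping keys, C is categorical, D is a plain
# object column for comparison.
df2 = pd.DataFrame(
    {
        "A": [1, 1, 1, 2, 2],
        "B": [100, 100, 200, 100, 100],
        "C": pd.Categorical(["apple", "orange", "mango", "mango", "orange"]),
        "D": ["jupiter", "mercury", "mars", "venus", "venus"],
    }
)

# Group by multiple columns and take the first row of each group.
result = df2.groupby(by=["A", "B"]).first()

# Inspect which columns survived the aggregation and their dtypes.
print(result.columns.tolist())
print(result.dtypes)
```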
Expected Output
The result of df2.groupby(by=['A', 'B']).first() would be expected as:
The older pandas version 0.21.0 reportedly does produce this expected output.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 3.7.2
pip: 10.0.1
setuptools: 40.0.0
Cython: 0.28.5
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.7
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.7
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.7
lxml: 4.2.4
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None