BUG: groupby and agg on read-only array gives ValueError: buffer source array is read-only #36014
@jeet-parekh Can you create a copy-pastable example that doesn't use external links? https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
A couple more things.
This fails: df_new.groupby(["species"])["sepal_length"].sum()
This works: df_new.groupby(["species"])[["sepal_length"]].sum()
@dsaxton, fixed it. I missed the fact that it wasn't copy-pastable. I'll edit the main issue post as well.
Thanks @jeet-parekh. Fails on master as well and looks like a bug to me.
Another temporary workaround is to make a copy:
Then it seems to me that all groupby ops work, whether as a Series or a DataFrame.
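For anyone landing here, the copy workaround might look roughly like the sketch below. The toy data, the numeric species codes, and the way the read-only frame is built are assumptions for illustration; the original example read its data through pyarrow instead.

```python
import numpy as np
import pandas as pd

# Assumed toy data: column 0 is sepal_length, column 1 is a numeric species code.
values = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 1.0]])
values.flags.writeable = False  # simulate a read-only buffer

# copy=False asks pandas to wrap the array without copying where it can,
# so the frame may end up backed by the read-only buffer.
df_new = pd.DataFrame(values, columns=["sepal_length", "species"], copy=False)

# Workaround: a deep copy (the default for DataFrame.copy) allocates
# fresh, writable buffers before grouping.
df_copy = df_new.copy()
result = df_copy.groupby(["species"])["sepal_length"].sum()
print(result)
```

The copy costs extra memory, so it is only a stopgap until the aggregation itself accepts read-only input.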
A reproducer without the use of pyarrow:
It's already failing in 1.0, but not in 0.25. So not a very recent regression, but still a regression compared to 0.25.
I see the same behaviour with @jorisvandenbossche's code. It succeeds for min, max, count, and median aggregations. But fails for sum and mean. Not sure if that's relevant.
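A pyarrow-free check along these lines could be sketched as below. This is not the reproducer posted in the thread; the column names key and val and the toy data are made up, and whether any aggregation actually raises depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column 0 is the grouping key, column 1 the values.
arr = np.array([[0.0, 1.0], [0.0, 2.0], [1.0, 3.0]])
arr.flags.writeable = False  # mark the buffer read-only

# copy=False asks pandas to reuse the read-only buffer where it can.
df = pd.DataFrame(arr, columns=["key", "val"], copy=False)

# Try several aggregations and record which ones raise ValueError.
outcomes = {}
for how in ["min", "max", "sum", "mean"]:
    try:
        df.groupby("key")["val"].agg(how)
        outcomes[how] = "ok"
    except ValueError as exc:
        outcomes[how] = f"ValueError: {exc}"

print(pd.__version__, outcomes)
```

Printing the version next to the outcomes makes it easy to compare runs across pandas releases.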
The direct fix would be to add a pandas/pandas/_libs/groupby.pyx Lines 473 to 477 in b528be6
however, using
I have checked that this issue has not already been reported.
Two variants of this bug have been reported - BUG: pd.read_parquet with pyarrow fails when row number is 0 and contains Pandas extensions type #35436 and BUG: read-only buffer failures in datetime parsing #34857
EDIT: I read those two issues more closely. They don't seem similar, but I'll keep the references here.
I have confirmed this bug exists on the latest version of pandas.
Bug exists in pandas 1.1.1
Code Sample, a copy-pastable example
Problem description
This is the traceback.
In the .agg line that fails, a min, max, median, or count aggregation works, but a sum or mean fails.
Expected Output
I expected the aggregation to succeed without any error.
Output of pd.show_versions()