BUG: ValueError: buffer source array is read-only during groupby #33410
Comments
Here's a reproducer without parquet.
In [29]: cats = np.array([1])
In [30]: cats.flags.writeable = False
In [31]: df = pd.DataFrame({"a": [1], "b": pd.Categorical([1], categories=pd.Index(cats))})
In [32]: df.groupby("b", sort=False)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-32-882468ddac04> in <module>
----> 1 df.groupby("b", sort=False)
~/sandbox/pandas/pandas/core/frame.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed)
5825 group_keys=group_keys,
5826 squeeze=squeeze,
-> 5827 observed=observed,
5828 )
5829
~/sandbox/pandas/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated)
408 sort=sort,
409 observed=observed,
--> 410 mutated=self.mutated,
411 )
412
~/sandbox/pandas/pandas/core/groupby/grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate)
623 in_axis=in_axis,
624 )
--> 625 if not isinstance(gpr, Grouping)
626 else gpr
627 )
~/sandbox/pandas/pandas/core/groupby/grouper.py in __init__(self, index, grouper, obj, name, level, sort, observed, in_axis)
310
311 self.grouper, self.all_grouper = recode_for_groupby(
--> 312 self.grouper, self.sort, observed
313 )
314 categories = self.grouper.categories
~/sandbox/pandas/pandas/core/groupby/categorical.py in recode_for_groupby(c, sort, observed)
69 # including those missing from the data (GH-13179), which .unique()
70 # above dropped
---> 71 cat = cat.add_categories(c.categories[~c.categories.isin(cat.categories)])
72
73 return c.reorder_categories(cat.categories), None
~/sandbox/pandas/pandas/core/indexes/base.py in isin(self, values, level)
4872 if level is not None:
4873 self._validate_index_level(level)
-> 4874 return algos.isin(self, values)
4875
4876 def _get_string_slice(self, key: str_t, use_lhs: bool = True, use_rhs: bool = True):
~/sandbox/pandas/pandas/core/algorithms.py in isin(comps, values)
452 comps = comps.astype(object)
453
--> 454 return f(comps, values)
455
456
~/sandbox/pandas/pandas/_libs/hashtable_func_helper.pxi in pandas._libs.hashtable.ismember_int64()
553 @cython.wraparound(False)
554 @cython.boundscheck(False)
--> 555 def ismember_int64(int64_t[:] arr, int64_t[:] values):
556 """
557 Return boolean of values in arr on an
~/sandbox/pandas/pandas/_libs/hashtable.cpython-37m-darwin.so in View.MemoryView.memoryview_cwrapper()
~/sandbox/pandas/pandas/_libs/hashtable.cpython-37m-darwin.so in View.MemoryView.memoryview.__cinit__()
ValueError: buffer source array is read-only
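The last frame of the traceback points at a buffer that NumPy has flagged read-only. As a minimal sketch of that state, using only NumPy and no pandas internals (the variable names here are illustrative, not taken from pandas), the following shows how the flag surfaces through the buffer protocol and how a copy clears it:

```python
import numpy as np

# Same setup as the reproducer: an int64 array marked read-only.
cats = np.array([1], dtype=np.int64)
cats.flags.writeable = False

# The buffer that NumPy exports for this array is read-only too.
view = memoryview(cats)
is_readonly = view.readonly

# In-place writes are rejected by NumPy itself.
try:
    cats[0] = 2
    write_failed = False
except ValueError:
    write_failed = True

# A copy owns fresh memory and is writable again.
writable_copy = cats.copy()
copy_is_writable = writable_copy.flags.writeable
```

Any consumer that asks for a writable buffer from `cats` will be refused, while `writable_copy` is accepted.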
There's a standard way to get cython to accept readonly arrays (adding
in hashtable_func_helper.pxi.in L 209
Thanks. @erik-hasse are you interested in making a PR with that change and tests?
Sure. I haven't contributed to Pandas before, anything I should read before making the change and writing the test?
Our contributing doc page is apparently broken right now; doc/source/development/contributing.rst should have all the information you need.
The dev docs are still (or again since a few days) working fine: https://pandas.pydata.org/docs/dev/development/contributing.html
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
The above code raises an exception:
This specifically requires the following:
In addition, passing observed=True stops the error from occurring.
I believe this is related to #31710, but they were unable to provide an example for groupby, and the issue remains on 1.0.3.
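The `observed=True` behavior described above can be sketched as follows. This reuses the read-only categories from the reproducer; the aggregation on column `a` is an added illustration and not part of the original report:

```python
import numpy as np
import pandas as pd

# Categories backed by a read-only array, as in the reproducer.
cats = np.array([1])
cats.flags.writeable = False

df = pd.DataFrame(
    {"a": [1], "b": pd.Categorical([1], categories=pd.Index(cats))}
)

# With observed=True only the categories present in the data are used,
# which the report says avoids the error.
grouped = df.groupby("b", sort=False, observed=True)
result = grouped["a"].sum()
```

Note that `observed=True` also changes which groups appear in the result when some categories are unused, so it is not a drop-in replacement for every caller.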
Expected Output
A DataFrameGroupBy object.
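A rough way to check an installed pandas against this expectation is sketched below. The helper name `groupby_readonly_categories` is hypothetical, and `observed=False` is passed explicitly on the assumption that this matches the default in effect when the issue was filed:

```python
import numpy as np
import pandas as pd


def groupby_readonly_categories():
    """Run the reproducer and return the groupby object or the ValueError."""
    cats = np.array([1])
    cats.flags.writeable = False
    df = pd.DataFrame(
        {"a": [1], "b": pd.Categorical([1], categories=pd.Index(cats))}
    )
    try:
        return df.groupby("b", sort=False, observed=False)
    except ValueError as err:
        # Affected versions are reported to end up here.
        return err


outcome = groupby_readonly_categories()
print(pd.__version__, type(outcome).__name__)
```

On an affected version the printed type name would be `ValueError`; otherwise it should be `DataFrameGroupBy`.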
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200309
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.47.0