
MultiIndex.from_product throws on read-only array #15286

Closed
RhysU opened this issue Feb 1, 2017 · 5 comments · Fixed by #30222
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone
1.0

Comments

@RhysU

RhysU commented Feb 1, 2017

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
a = np.array(range(3))
b = ['a', 'b']
# Succeeds, as expected
pd.MultiIndex.from_product([a, b]) 
a.setflags(write=False)
# Raises a ValueError with "buffer source array is read-only"
pd.MultiIndex.from_product([a, b]) 

Problem description

Nothing in the MultiIndex.from_product documentation suggests that the inputs need to be writable. Moreover, it is surprising that a function which should only read its arguments requires write access to them.
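On affected versions, one possible workaround (a sketch, not something proposed in this thread) is to hand pandas a writable array. np.require with the "W" (writeable) requirement returns the input unchanged when it is already writable and a copy otherwise, so the caller's read-only array is left as it is:

```python
import numpy as np
import pandas as pd

a = np.array(range(3))
b = ["a", "b"]
a.setflags(write=False)

# np.require copies only when the requested flag ("W" = writeable) is missing,
# so pandas receives a writable buffer while `a` itself stays read-only.
a_writable = np.require(a, requirements=["W"])
mi = pd.MultiIndex.from_product([a_writable, b])

print(a.flags.writeable, a_writable.flags.writeable)  # False True
print(len(mi))  # 6
```

The cost is one copy of the read-only input, which is usually negligible next to building the product index.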

Expected Output

Not raising.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.17-pv-ts1
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2-ts2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Feb 1, 2017

Yeah, this is actually raising in the factorize step:

In [4]: a = np.array(range(3))
   ...: a.setflags(write=False)
   ...: 

In [5]: pd.factorize(a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-11e8991352c5> in <module>()
----> 1 pd.factorize(a)

/Users/jreback/pandas/pandas/core/algorithms.py in factorize(values, sort, order, na_sentinel, size_hint)
    351     uniques = vec_klass()
    352     check_nulls = not is_integer_dtype(values)
--> 353     labels = table.get_labels(vals, uniques, 0, na_sentinel, check_nulls)
    354 
    355     labels = _ensure_platform_int(labels)

/Users/jreback/pandas/pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_labels (pandas/hashtable.c:15207)()
    848 
    849     @cython.boundscheck(False)
--> 850     def get_labels(self, int64_t[:] values, Int64Vector uniques,
    851                    Py_ssize_t count_prior, Py_ssize_t na_sentinel,
    852                    bint check_null=True):

/Users/jreback/pandas/pandas/hashtable.cpython-35m-darwin.so in View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:40659)()

/Users/jreback/pandas/pandas/hashtable.cpython-35m-darwin.so in View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:36894)()

ValueError: buffer source array is read-only
> /Users/jreback/pandas/stringsource(345)View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:36894)()

You can fix this the same way as here, where we call a signature without memoryviews:
https://github.com/pandas-dev/pandas/blob/master/pandas/src/algos_take_helper.pxi.in#L172

This is a patch that we have done in the past: #10070

and here's the ultimate cause: https://mail.python.org/pipermail/cython-devel/2013-February/003384.html

I would actually like to have a better method of fixing this.
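To illustrate the root cause at the Python level (a sketch; it does not reproduce Cython's internals): a read-only NumPy array exports a read-only buffer, while a Cython argument typed as int64_t[:], like values in get_labels in the traceback above, asks NumPy for a writable buffer. NumPy refuses that request with "buffer source array is read-only", so the call fails during argument conversion, even if the function would only read the data. The same read-only flag is visible from a plain memoryview:

```python
import numpy as np

a = np.array(range(3))
a.setflags(write=False)

# The read-only flag travels with the buffer NumPy exports.
mv = memoryview(a)
print(mv.readonly)  # True

# Any consumer that needs write access is refused. Python's memoryview
# raises TypeError on assignment; Cython's typed-memoryview conversion fails
# earlier, when it requests a writable buffer (the ValueError above).
write_refused = False
try:
    mv[0] = 5
except TypeError:
    write_refused = True
print(write_refused)  # True
```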

@jreback
Contributor

jreback commented Feb 1, 2017

cc @wesm

@wesm
Member

wesm commented Feb 1, 2017

Does this still raise if you use ndarray instead of memoryview?

@jreback
Contributor

jreback commented Feb 1, 2017

No, it works if we use ndarray (in the signature). That's what we did in take: made the signature ndarray, then dispatch inside (to 2 different routines). IIRC it does make a perf difference.
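The ndarray-signature-plus-dispatch approach can be sketched in pure Python. All names here (factorize_sketch, _labels_writable, _labels_readonly, _labels_impl) are hypothetical, and the "writable" routine only mimics the memoryview restriction by raising the same ValueError; the real code is Cython. The entry point accepts any ndarray and branches on values.flags.writeable, so read-only input never reaches the memoryview-typed routine. Copying in the read-only branch is just one simple fallback; it costs an extra pass over the data, which is one way such dispatch can affect performance.

```python
import numpy as np


def _labels_impl(values):
    # Assign each distinct value a code in order of first appearance.
    seen = {}
    labels = np.empty(len(values), dtype=np.intp)
    for i, v in enumerate(values.tolist()):
        labels[i] = seen.setdefault(v, len(seen))
    uniques = np.array(list(seen), dtype=values.dtype)
    return labels, uniques


def _labels_writable(values):
    # Stand-in for a Cython routine typed as int64_t[:], which can only
    # acquire a writable buffer.
    if not values.flags.writeable:
        raise ValueError("buffer source array is read-only")
    return _labels_impl(values)


def _labels_readonly(values):
    # Fallback for read-only input: work on a writable copy, so the
    # caller's array is never required to be writable.
    return _labels_writable(values.copy())


def factorize_sketch(values: np.ndarray):
    # Entry point typed on ndarray: accepts read-only arrays and dispatches.
    if values.flags.writeable:
        return _labels_writable(values)
    return _labels_readonly(values)


a = np.array([3, 1, 3])
a.setflags(write=False)
labels, uniques = factorize_sketch(a)
print(labels, uniques)  # [0 1 0] [3 1]
```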

@mroeschke
Member

This looks to work on master now. Could use a test.

In [250]: a = np.array(range(3))
     ...: b = ['a', 'b']
     ...: # Succeeds, as expected
     ...: pd.MultiIndex.from_product([a, b])
     ...: a.setflags(write=False)
     ...: # Raises a ValueError with "buffer source array is read-only"
     ...: pd.MultiIndex.from_product([a, b])
Out[250]:
MultiIndex([(0, 'a'),
            (0, 'b'),
            (1, 'a'),
            (1, 'b'),
            (2, 'a'),
            (2, 'b')],
           )

In [251]: a = np.array(range(3))
     ...: a.setflags(write=False)
     ...: pd.factorize(a)
Out[251]: (array([0, 1, 2]), array([0, 1, 2]))

In [252]: pd.__version__
Out[252]: '0.26.0.dev0+593.g9d45934af'
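A regression test along the lines requested above might look like the following sketch. The test names are hypothetical and this is not necessarily the test that landed with #30222; it relies only on public API (pd.MultiIndex.from_product, pd.factorize, pd.testing.assert_index_equal):

```python
import numpy as np
import pandas as pd


def test_from_product_readonly():
    # GH 15286: read-only input arrays must not raise
    a = np.array(range(3))
    b = ["a", "b"]
    expected = pd.MultiIndex.from_product([a, b])

    a.setflags(write=False)
    result = pd.MultiIndex.from_product([a, b])
    pd.testing.assert_index_equal(result, expected)


def test_factorize_readonly():
    # The original failure was inside factorize's hashtable, so check it directly too.
    a = np.array(range(3))
    a.setflags(write=False)
    labels, uniques = pd.factorize(a)
    np.testing.assert_array_equal(labels, [0, 1, 2])
    np.testing.assert_array_equal(uniques, [0, 1, 2])
```

Under pytest these are collected automatically; calling them directly also works.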

mroeschke added the good first issue and Needs Tests labels and removed the Bug, Difficulty Intermediate and MultiIndex labels Oct 21, 2019
jreback changed the milestone from Contributions Welcome to 1.0 Dec 12, 2019