MultiIndex.from_product throws on read-only array #15286

RhysU · 2017-02-01T20:10:44Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
a = np.array(range(3))
b = ['a', 'b']
# Succeeds, as expected
pd.MultiIndex.from_product([a, b]) 
a.setflags(write=False)
# Raises a ValueError with "buffer source array is read-only"
pd.MultiIndex.from_product([a, b])

Problem description

Nothing in the MultiIndex.from_product documentation suggests the inputs need to be mutable. Moreover, it is surprising that the arguments are being mutated.

Expected Output

Not raising.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.17-pv-ts1
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2-ts2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: None
pandas_datareader: None

jreback · 2017-02-01T20:22:54Z

yeah this is actually raising here.

In [4]: a = np.array(range(3))
   ...: a.setflags(write=False)
   ...: 

In [5]: pd.factorize(a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-11e8991352c5> in <module>()
----> 1 pd.factorize(a)

/Users/jreback/pandas/pandas/core/algorithms.py in factorize(values, sort, order, na_sentinel, size_hint)
    351     uniques = vec_klass()
    352     check_nulls = not is_integer_dtype(values)
--> 353     labels = table.get_labels(vals, uniques, 0, na_sentinel, check_nulls)
    354 
    355     labels = _ensure_platform_int(labels)

/Users/jreback/pandas/pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_labels (pandas/hashtable.c:15207)()
    848 
    849     @cython.boundscheck(False)
--> 850     def get_labels(self, int64_t[:] values, Int64Vector uniques,
    851                    Py_ssize_t count_prior, Py_ssize_t na_sentinel,
    852                    bint check_null=True):

/Users/jreback/pandas/pandas/hashtable.cpython-35m-darwin.so in View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:40659)()

/Users/jreback/pandas/pandas/hashtable.cpython-35m-darwin.so in View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:36894)()

ValueError: buffer source array is read-only
> /Users/jreback/pandas/stringsource(345)View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:36894)()

You can fix it the same way as here, where we call a signature w/o memoryviews:
https://github.com/pandas-dev/pandas/blob/master/pandas/src/algos_take_helper.pxi.in#L172

This is a patch that we have done in the past: #10070

and here's the ultimate cause: https://mail.python.org/pipermail/cython-devel/2013-February/003384.html

I would actually like to have a better method of fixing this.

jreback · 2017-02-01T20:24:33Z

cc @wesm

wesm · 2017-02-01T21:47:24Z

Does this still raise if you use ndarray instead of memoryview?

jreback · 2017-02-01T21:58:05Z

No, it works if we use ndarray (in the signature). That's what we did in take: made the signature ndarray, then dispatch inside (to 2 different routines). IIRC it does make a perf difference.
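
For anyone hitting this on an affected version (0.19.2 in the report), a caller-side workaround is to hand pandas a writable copy of the read-only array. This is only a minimal sketch: `as_writable` is a hypothetical helper, not part of pandas, and it costs one extra copy of the data.

```python
import numpy as np
import pandas as pd


def as_writable(values):
    """Hypothetical helper (not pandas API): copy read-only ndarrays, pass everything else through."""
    if isinstance(values, np.ndarray) and not values.flags.writeable:
        # ndarray.copy() returns a new array that owns its data and is writable
        return values.copy()
    return values


a = np.array(range(3))
a.setflags(write=False)
b = ["a", "b"]

# The original array stays read-only; only the copy is handed to pandas.
index = pd.MultiIndex.from_product([as_writable(a), b])
```

The copy sidesteps the memoryview signature entirely, so it does not depend on which of the two fixes discussed above ends up in pandas.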

mroeschke · 2019-10-21T00:24:35Z

This looks to work on master now. Could use a test.

In [250]: a = np.array(range(3))
     ...: b = ['a', 'b']
     ...: # Succeeds, as expected
     ...: pd.MultiIndex.from_product([a, b])
     ...: a.setflags(write=False)
     ...: # Raises a ValueError with "buffer source array is read-only"
     ...: pd.MultiIndex.from_product([a, b])
Out[250]:
MultiIndex([(0, 'a'),
            (0, 'b'),
            (1, 'a'),
            (1, 'b'),
            (2, 'a'),
            (2, 'b')],
           )

In [251]: In [4]: a = np.array(range(3))
     ...:    ...: a.setflags(write=False)
     ...:    ...:
     ...:
     ...: In [5]: pd.factorize(a)
Out[251]: (array([0, 1, 2]), array([0, 1, 2]))

In [252]: pd.__version__
Out[252]: '0.26.0.dev0+593.g9d45934af'
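
A regression test along these lines might look like the sketch below. It is an assumption about what such a test could check, based only on the examples in this thread; the test names are made up, and it is not the test that was eventually merged.

```python
import numpy as np
import pandas as pd


def test_from_product_readonly():
    # Build the expected result while the array is still writable.
    a = np.array(range(3))
    b = ["a", "b"]
    expected = pd.MultiIndex.from_product([a, b])

    # The same call must not raise once the array is read-only.
    a.setflags(write=False)
    result = pd.MultiIndex.from_product([a, b])
    assert result.equals(expected)


def test_factorize_readonly():
    # factorize is where the ValueError was actually raised in the traceback above.
    a = np.array(range(3))
    a.setflags(write=False)
    codes, uniques = pd.factorize(a)
    np.testing.assert_array_equal(codes, [0, 1, 2])
    np.testing.assert_array_equal(uniques, [0, 1, 2])
```

Both functions follow pytest naming conventions, so they would be collected automatically if dropped into a test module.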

jreback added Bug Difficulty Intermediate MultiIndex labels Feb 1, 2017

jreback added this to the 0.20.0 milestone Feb 1, 2017

jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

rabernat mentioned this issue Apr 1, 2017

Factorize fails with read-only array #12813

Closed

jreback mentioned this issue Apr 1, 2017

NotImplementedError: > 1 ndim Categorical raised when array is read-only #15860

Closed

mroeschke added good first issue, Needs Tests and removed Bug, Difficulty Intermediate, MultiIndex labels Oct 21, 2019

jbrockmendel mentioned this issue Dec 12, 2019

TST: tests for needs-test issues #30222

Merged

11 tasks

jreback modified the milestones: Contributions Welcome, 1.0 Dec 12, 2019

WillAyd closed this as completed in #30222 Dec 12, 2019
