Msgpack - ValueError: buffer source array is read-only #11880


Closed
jeetjitsu opened this issue Dec 21, 2015 · 5 comments · Fixed by #12013
@jeetjitsu
Contributor

I get a ValueError when processing data with pandas. I followed these steps:

  1. convert to msgpack format with compress flag
  2. subsequently read file into a dataframe
  3. push to sql table with to_sql

On the third step I get ValueError: buffer source array is read-only.

This problem does not arise if I wrap the read_msgpack call inside a pandas.concat.

Example

import pandas as pd
import numpy as np

from sqlalchemy import create_engine

eng = create_engine("sqlite:///:memory:")

df1 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': 'foo'})

df1.to_msgpack('test.msgpack', compress='zlib')
df2 = pd.read_msgpack('test.msgpack')

df2.to_sql('test', eng, if_exists='append', chunksize=1000) # throws value error

df2 = pd.concat([pd.read_msgpack('test.msgpack')])

df2.to_sql('test', eng, if_exists='append', chunksize=1000) # works

This happens with both blosc and zlib compression. While I have found a workaround, this behaviour seems very odd, and for very large files the extra concat adds a small performance hit.
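As a lighter alternative to the concat wrapper, copying the deserialized frame should have the same effect. A minimal sketch of the underlying situation (assumption: the msgpack reader builds its arrays with np.frombuffer over the payload, leaving the blocks read-only; the read_msgpack call is replaced here by a hand-built read-only array):

```python
import numpy as np
import pandas as pd

# Simulate what the msgpack round-trip leaves behind: np.frombuffer over an
# immutable bytes payload yields a read-only array, which then backs the frame.
payload = np.arange(8, dtype='int64').tobytes()
values = np.frombuffer(payload, dtype='int64').reshape(2, 4)
assert not values.flags.writeable

df2 = pd.DataFrame(values)

# Copying materializes fresh, writeable blocks -- the same effect as wrapping
# the read in pd.concat([...]), but more direct:
df2 = df2.copy()
```

With a copy in place, the subsequent df2.to_sql(...) should no longer hit the read-only source buffer.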

edit: @TomAugspurger changed the sql engine to sqlite

@jreback
Contributor

jreback commented Dec 21, 2015

pls pd.show_versions()

@TomAugspurger
Contributor

Replace eng = create_engine("mysql+mysqldb://user:pass@localhost/dbname") with eng = create_engine("sqlite:///:memory:") to make this easier to reproduce (it still raises).

@jeetjitsu
Contributor Author

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IN

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 18.2
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: 1.2.8
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.10
pymysql: None
psycopg2: None
Jinja2: None

@jreback jreback added this to the Next Major Release milestone Dec 26, 2015
@jreback
Contributor

jreback commented Dec 26, 2015

I think we need to tell numpy to take ownership of the data, maybe np.array with copy=False around the np.frombuffer. @shoyer how does one normally do this?

In [9]: df2._data.blocks[0].values.flags
Out[9]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : False
  UPDATEIFCOPY : False

In [10]: df1._data.blocks[0].values.flags
Out[10]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
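For reference, np.frombuffer over an immutable buffer always produces a read-only view, and np.array with copy=False preserves that view (flags included); only an actual copy yields an array that owns writeable memory. A small illustration (plain numpy, not pandas internals):

```python
import numpy as np

buf = np.arange(4, dtype='int64').tobytes()       # bytes are immutable

view = np.frombuffer(buf, dtype='int64')          # zero-copy view
print(view.flags.owndata, view.flags.writeable)   # False False

alias = np.array(view, copy=False)                # still the same buffer
print(alias.flags.writeable)                      # False

owned = np.array(view)                            # default copies the data
print(owned.flags.owndata, owned.flags.writeable) # True True
owned[0] = 99                                     # writes are now allowed
```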

@kawochen
Contributor

df2['E'].a exhibits the bug as well.

@jreback jreback modified the milestones: 0.18.0, Next Major Release Jan 11, 2016