---------------------------------------------------------------------------
ArrowIOError Traceback (most recent call last)
<ipython-input-65-1aeaae9e36a0> in <module>()
4 columns=list(map(str, range(1000))),
5 default_fill_value=0.0)
----> 6 rpd.to_parquet('rpd.pq')
...
ArrowIOError: Column 8 had 4 while previous column had 8
Problem description
SparseDataFrames and parquet should be a match made in data science heaven, because parquet should be able to compress the sparse columns and get big space and IO savings. But the to_parquet method seems to be very unhappy when it gets a sparse dataframe.
You should open an issue on the Arrow tracker for support. The pandas sparse format is somewhat bespoke and not likely to be supported; a more common format such as COO might be.
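As a rough illustration of the COO suggestion, here is a minimal sketch that flattens a mostly-zero frame into (row, col, value) triplets and rebuilds it afterwards. The helper names `to_coo_frame` and `from_coo_frame` are hypothetical and not part of pandas or pyarrow, and the sketch assumes numeric data with a single fill value. It also densifies the frame in memory first, so it only shows the layout and is not a memory-efficient conversion.

```python
import numpy as np
import pandas as pd


def to_coo_frame(df, fill_value=0.0):
    """Hypothetical helper: keep only entries that differ from `fill_value`."""
    # Assumption: the data is numeric. This densifies the frame in memory.
    values = np.asarray(df, dtype=float)
    rows, cols = np.nonzero(values != fill_value)
    return pd.DataFrame({
        "row": np.asarray(df.index)[rows],      # row labels of stored entries
        "col": np.asarray(df.columns)[cols],    # column labels of stored entries
        "value": values[rows, cols],
    })


def from_coo_frame(coo, index, columns, fill_value=0.0):
    """Hypothetical helper: rebuild a dense frame from the triplets."""
    arr = np.full((len(index), len(columns)), fill_value, dtype=float)
    row_pos = pd.Index(index).get_indexer(coo["row"])
    col_pos = pd.Index(columns).get_indexer(coo["col"])
    arr[row_pos, col_pos] = np.asarray(coo["value"], dtype=float)
    return pd.DataFrame(arr, index=index, columns=columns)


# Small stand-in for the frame in the report: string column names, mostly zeros.
df = pd.DataFrame(np.zeros((3, 4)), columns=list(map(str, range(4))))
df.iloc[0, 1] = 2.5
df.iloc[2, 3] = -1.0

coo = to_coo_frame(df)
restored = from_coo_frame(coo, df.index, df.columns)
```

The triplet frame has three ordinary dense columns, so it should be something `to_parquet` can write without any sparse-specific support, at the cost of storing row and column labels explicitly.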
Code Sample, a copy-pastable example if possible
Output of pd.show_versions()
pandas: 0.22.0
pytest: 3.5.0
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: 0.7.1
xarray: None
IPython: 6.3.1
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None