Build empty SparseDataFrame by columns very long compared to by index. #16197

cfrancois7 · 2017-05-02T09:05:00Z

Code Sample, a copy-pastable example if possible

I want to create a sparse matrix with a 4-level MultiIndex and about 340,000 x 340,000 cells.
It is not possible to build it as a dense frame and then convert it to sparse.
So I tried to build it directly as a SparseDataFrame.

n = len(index)
print(n)
>>> 338275
%timeit df0 = pd.SparseDataFrame(index=index)
1000 loops, best of 3: 419 µs per loop

But when I try to construct:

df1 = pd.SparseDataFrame(columns=index)

or

df2 = pd.SparseDataFrame(index=index, columns=index)

A whole night was not enough to build the empty SparseDataFrame.
I don't understand how to build this empty SparseDataFrame in a reasonable time (under 20 minutes with 8 GB of RAM).
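For scale, here is a rough back-of-envelope estimate of why the dense route mentioned above is out of reach. It is only a sketch: it assumes 8-byte float64 cells (the issue does not state the dtype) and ignores index overhead.

```python
# Size of the index reported in the issue.
n = 338275

# Assumption: each cell is a float64 (8 bytes); the issue does not state the dtype.
bytes_per_cell = 8

n_cells = n * n
dense_bytes = n_cells * bytes_per_cell
dense_gib = dense_bytes / 2**30

print(f"{n_cells:,} cells")
print(f"{dense_gib:,.1f} GiB if stored densely")
```

Under that assumption the dense frame would need more than a hundred times the 8 GB of RAM available, so only a representation that stores the non-empty cells is viable.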

Output of `pd.show_versions()`

```
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.19.0
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0

```

jreback · 2017-05-02T11:24:58Z

You can try this (available in 0.20.0, releasing shortly): http://pandas-docs.github.io/pandas-docs-travis/sparse.html#sparsedataframe.

Though I suspect that may not work. pandas sparse is row based, NOT column based, so a frame of your size would blow up memory.

cfrancois7 · 2017-05-02T12:12:06Z

Thank you.
Yes, I have already tried importing a COO sparse matrix, without success.
Indeed, it makes sense that my approach didn't work, because pandas sparse is row based.
I'm changing my approach to use SparseSeries.
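As a minimal sketch of the idea behind that change (not pandas API, and not necessarily what was done in the end), the matrix can be kept as one flat collection of non-zero cells keyed by (row, column) coordinates, which is also the layout a COO matrix uses. The labels and the `set_cell` / `get_cell` helpers below are hypothetical, chosen only for illustration.

```python
# Hypothetical 4-level labels standing in for the real MultiIndex entries.
labels = [
    ("FR", "agriculture", "wheat", "t"),
    ("FR", "industry", "steel", "t"),
    ("DE", "industry", "steel", "t"),
]

# Map each label tuple to an integer position once, up front.
position = {label: i for i, label in enumerate(labels)}

# Coordinate storage: only non-zero cells are kept, keyed by (row, col).
cells = {}

def set_cell(row_label, col_label, value):
    """Store a value; zeros are dropped so memory tracks the non-zero count."""
    key = (position[row_label], position[col_label])
    if value == 0:
        cells.pop(key, None)
    else:
        cells[key] = value

def get_cell(row_label, col_label):
    """Return the stored value, or 0.0 for a cell that was never set."""
    return cells.get((position[row_label], position[col_label]), 0.0)

set_cell(labels[0], labels[1], 2.5)
set_cell(labels[2], labels[0], 1.0)

print(len(cells))
print(get_cell(labels[0], labels[1]))
print(get_cell(labels[1], labels[2]))
```

Memory here grows with the number of stored cells rather than with the square of the index length, and no per-column object is created.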

jreback closed this as completed May 2, 2017

jreback added the Sparse Sparse Data Type label May 2, 2017

jreback added this to the won't fix milestone May 2, 2017

jreback added the Performance Memory or execution speed performance label May 2, 2017

keitakurita pushed a commit to keitakurita/pandas that referenced this issue May 3, 2017

BUG: incorrect handling of scipy.sparse.dok formats (pandas-dev#16197)

20d5b34

keitakurita pushed a commit to keitakurita/pandas that referenced this issue May 3, 2017

BUG: incorrect handling of scipy.sparse.dok formats (pandas-dev#16197)

0ecb2c0

keitakurita pushed a commit to keitakurita/pandas that referenced this issue May 7, 2017

BUG: incorrect handling of scipy.sparse.dok formats (pandas-dev#16197)

a2ced79

keitakurita pushed a commit to keitakurita/pandas that referenced this issue May 8, 2017

BUG: incorrect handling of scipy.sparse.dok formats (pandas-dev#16197)

6d0c545

jreback pushed a commit that referenced this issue May 11, 2017

BUG: incorrect handling of scipy.sparse.dok formats (#16197) (#16191)

1c0b632

pcluo pushed a commit to pcluo/pandas that referenced this issue May 22, 2017

BUG: incorrect handling of scipy.sparse.dok formats (pandas-dev#16197) (pandas-dev#16191)

2eb631b

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue May 29, 2017

BUG: incorrect handling of scipy.sparse.dok formats (pandas-dev#16197) (pandas-dev#16191) (cherry picked from commit 1c0b632)

7b41c7f

TomAugspurger pushed a commit that referenced this issue May 30, 2017

BUG: incorrect handling of scipy.sparse.dok formats (#16197) (#16191)

02be419

(cherry picked from commit 1c0b632)

stangirala pushed a commit to stangirala/pandas that referenced this issue Jun 11, 2017

BUG: incorrect handling of scipy.sparse.dok formats (pandas-dev#16197) (pandas-dev#16191)

3793c31

TomAugspurger modified the milestones: won't fix, No action Jul 6, 2018
