Problem description
The sparse design in Pandas is really very neat. However, I find the sparse documentation to be focused on syntax and structure, while not explaining practical applications of the sparse data structures.
Practical application should include operations on series and dataframes that produce innately sparse results. One such example is pivot (and pivot_table). Pivot induces sparsity by creating elements corresponding to the cartesian product of the new index and column values. It should be able to produce a sparse DataFrame as its output, either using automatic heuristics or by accepting some kind of sparse=bool parameter.
Code Sample
I would like to be able to do:
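A minimal sketch of the kind of result meant here, not an existing pandas feature. It assumes pandas 0.25 or later (for pd.DataFrame.sparse.from_spmatrix, which is newer than the 0.20.1 listed below) plus SciPy. The sparse_pivot helper and the sample data are hypothetical, and the sparse=True keyword shown in the comment is only the proposed spelling:

```python
import pandas as pd
from scipy import sparse


def sparse_pivot(df, index, columns, values):
    """Pivot `df` into a sparse DataFrame without building the dense grid.

    Hypothetical helper; assumes each (index, columns) pair occurs at most once.
    """
    # Encode the row and column labels as integer positions.
    row_codes, row_labels = pd.factorize(df[index], sort=True)
    col_codes, col_labels = pd.factorize(df[columns], sort=True)
    # Only the observed cells are stored; the rest of the cartesian
    # product of index and column values stays implicit.
    matrix = sparse.coo_matrix(
        (df[values].to_numpy(dtype="float64"), (row_codes, col_codes)),
        shape=(len(row_labels), len(col_labels)),
    )
    return pd.DataFrame.sparse.from_spmatrix(
        matrix, index=row_labels, columns=col_labels
    )


long_df = pd.DataFrame({
    "user": ["a", "b", "c"],
    "item": ["x", "y", "z"],
    "rating": [1.0, 2.0, 3.0],
})

# Proposed (hypothetical) spelling, not valid in pandas today:
#   long_df.pivot(index="user", columns="item", values="rating", sparse=True)
wide = sparse_pivot(long_df, "user", "item", "rating")
print(wide.dtypes)
print(wide.sparse.density)
```

One difference from a dense pivot: the unobserved cells here use a fill value of 0 rather than NaN, because that is what from_spmatrix produces.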
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
pandas: 0.20.1
pytest: 3.3.1
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: 1.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
feather: None
matplotlib: 1.5.3
openpyxl: 1.6.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None