ENH: Sparse DataFrame.pivot, pd.pivot_table, Series.unstack #19241

jnothman · 2018-01-15T01:35:17Z

Problem description

The sparse design in Pandas is really very neat. However, I find the sparse documentation to be focused on syntax and structure, while not explaining practical applications of the sparse data structures.

Practical application should include operations on series and dataframes that produce innately sparse results. One such example is pivot (and pivot_table). Pivot induces sparsity by creating elements corresponding to the cartesian product of the new index and column values. It should be able to produce a sparse DataFrame as its output, either using automatic heuristics or by accepting some kind of sparse=bool parameter.

Code Sample

I would like to be able to do:

>>> df = pd.DataFrame({'x': ['a', 'a', 'c', 'a', 'b', 'b'], 'y': ['A', 'B', 'C', 'D', 'A', 'B'], 'val': [5, 6, 7, 8, 9, 10]}).pivot('x', 'y', 'val')
>>> table = df.pivot('x', 'y', 'val', sparse=True)  # or sparse='auto'
>>> table.density
0.5

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

pandas: 0.20.1
pytest: 3.3.1
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: 1.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
feather: None
matplotlib: 1.5.3
openpyxl: 1.6.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

jnothman · 2018-01-15T02:12:58Z

Somehow I managed to not see #14493, which is a duplicate

jnothman changed the title ~~ENH: Sparse pivot~~ ENH: Sparse DataFrame.pivot, pd.pivot_table, Series.unstack Jan 15, 2018

jnothman closed this as completed Jan 15, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Sparse DataFrame.pivot, pd.pivot_table, Series.unstack #19241

ENH: Sparse DataFrame.pivot, pd.pivot_table, Series.unstack #19241

jnothman commented Jan 15, 2018

INSTALLED VERSIONS

jnothman commented Jan 15, 2018

ENH: Sparse DataFrame.pivot, pd.pivot_table, Series.unstack #19241

ENH: Sparse DataFrame.pivot, pd.pivot_table, Series.unstack #19241

Comments

jnothman commented Jan 15, 2018

Problem description

Code Sample

Output of pd.show_versions()

INSTALLED VERSIONS

jnothman commented Jan 15, 2018

Output of `pd.show_versions()`