Skip to content

ENH: Sparse DataFrame.pivot, pd.pivot_table, Series.unstack #19241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jnothman opened this issue Jan 15, 2018 · 1 comment
Closed

ENH: Sparse DataFrame.pivot, pd.pivot_table, Series.unstack #19241

jnothman opened this issue Jan 15, 2018 · 1 comment

Comments

@jnothman
Copy link
Contributor

Problem description

The sparse design in Pandas is really very neat. However, I find the sparse documentation to be focused on syntax and structure, while not explaining practical applications of the sparse data structures.

Practical application should include operations on series and dataframes that produce innately sparse results. One such example is pivot (and pivot_table). Pivot induces sparsity by creating elements corresponding to the cartesian product of the new index and column values. It should be able to produce a sparse DataFrame as its output, either using automatic heuristics or by accepting some kind of sparse=bool parameter.

Code Sample

I would like to be able to do:

>>> df = pd.DataFrame({'x': ['a', 'a', 'c', 'a', 'b', 'b'], 'y': ['A', 'B', 'C', 'D', 'A', 'B'], 'val': [5, 6, 7, 8, 9, 10]}).pivot('x', 'y', 'val')
>>> table = df.pivot('x', 'y', 'val', sparse=True)  # or sparse='auto'
>>> table.density
0.5

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

pandas: 0.20.1
pytest: 3.3.1
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: 1.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
feather: None
matplotlib: 1.5.3
openpyxl: 1.6.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jnothman jnothman changed the title ENH: Sparse pivot ENH: Sparse DataFrame.pivot, pd.pivot_table, Series.unstack Jan 15, 2018
@jnothman
Copy link
Contributor Author

Somehow I managed to not see #14493, which is a duplicate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant