ENH GH20601 raise error when pivot table's number of levels > int32 #20709

Closed
wants to merge 34 commits into from
Changes from 4 commits
Commits
a635140
ENH GH20601 raise an error when the number of levels in a pivot table…
anhqle Apr 16, 2018
ac224f5
TST add a test for pivot table large number of levels causing int32 o…
anhqle Apr 16, 2018
acbc4eb
CLN PEP8 compliance
anhqle Apr 16, 2018
662ce5f
DOC add whatsnew entry
anhqle Apr 16, 2018
804101c
Fix issue 17912 (#20705)
CianciuStyles Apr 16, 2018
1e4e04b
ENH: ExtensionArray.setitem (#19907)
TomAugspurger Apr 16, 2018
8756f55
DEP: Add 'python_requires' to setup.py to drop 3.4 support (#20698)
djhoese Apr 16, 2018
da33359
DOC: Correct documentation to GroupBy.rank (#20708)
gfyoung Apr 16, 2018
4a34497
API: rolling.apply will pass Series to function (#20584)
jreback Apr 16, 2018
6245e8c
TST: add tests for take() on empty arrays (#20582)
jorisvandenbossche Apr 17, 2018
75295e1
CLN: Replacing %s with .format in pandas/core/frame.py (#20461)
AaronCritchley Apr 17, 2018
bb095a6
change the indent for the pydoc of apply() function. (#20715)
zhao-zihao Apr 17, 2018
7ed1f53
PKG: remove pyproject.toml for now (#20718)
jorisvandenbossche Apr 18, 2018
b9f826f
DOC: use apply(raw=True) in docs to silence warning (#20741)
jorisvandenbossche Apr 19, 2018
07739aa
Fix more tests expecting little-endian (#20738)
ginggs Apr 19, 2018
ede11af
DOC: add coverage href to README.md (#20736)
wuhaochen Apr 19, 2018
78fee04
DEPR: Deprecate DatetimeIndex.offset in favor of DatetimeIndex.freq (…
jschendel Apr 19, 2018
3e691a4
ENH: DataFrame.append preserves columns dtype if possible (#19021)
topper-123 Apr 20, 2018
be057a1
DOC: Clean up badges in README (#20749)
wuhaochen Apr 20, 2018
3a2e9e6
BUG: fixes indexing with monotonic decreasing DTI (#19362) (#20677)
mapehe Apr 20, 2018
23bc217
DOC: Various EA docs (#20707)
TomAugspurger Apr 21, 2018
54470f3
BUG: unexpected assign by a single-element list (GH19474) (#20732)
kittoku Apr 21, 2018
669d9b2
Add interpolate to doc string (#20776)
topper-123 Apr 21, 2018
336fba7
TST: #20720
jreback Apr 21, 2018
7e75e4a
Fixed WOM offset when n=0 (#20549)
Apr 21, 2018
0d199e4
BUG: Fix problems in group rank when both nans and infinity are prese…
peterpanmj Apr 21, 2018
8def649
TST: split test_groupby.py (#20781)
jreback Apr 21, 2018
466f90a
ENH GH20601 raise an error when the number of levels in a pivot table…
anhqle Apr 16, 2018
dc982de
TST add a test for pivot table large number of levels causing int32 o…
anhqle Apr 16, 2018
ea53feb
CLN PEP8 compliance
anhqle Apr 16, 2018
50d5e02
DOC add whatsnew entry
anhqle Apr 16, 2018
90b7624
ENH catch the int32 overflow error earlier and in two separate places…
anhqle Apr 22, 2018
8baba4b
CLN git merge clean up
anhqle Apr 22, 2018
2416db1
CLN edit whatsnew entry and remove old code
anhqle Apr 22, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -1177,6 +1177,7 @@ Reshaping
- Bug in :meth:`DataFrame.astype` where column metadata is lost when converting to categorical or a dictionary of dtypes (:issue:`19920`)
- Bug in :func:`cut` and :func:`qcut` where timezone information was dropped (:issue:`19872`)
- Bug in :class:`Series` constructor with a ``dtype=str``, previously raised in some cases (:issue:`19853`)
- Improved error message when the number of levels in a pivot table is too large causing int32 overflow (:issue:`20601`)

Other
^^^^^
2 changes: 2 additions & 0 deletions pandas/core/reshape/reshape.py
@@ -162,6 +162,8 @@ def _make_selectors(self):
        self.full_shape = ngroups, stride

        selector = self.sorted_labels[-1] + stride * comp_index + self.lift
        if np.prod(self.full_shape) > (2 ** 31 - 1):
            raise ValueError('Pivot table is too big, causing int32 overflow')
Member

@jreback: Is it okay to catch it here, or should we try to catch it earlier, as you mentioned before?

Author

I'm happy to make any change, and would love to hear the reasoning for catching it earlier.

Contributor

Ideally, you want to raise as soon as you know this is out of bounds.

        mask = np.zeros(np.prod(self.full_shape), dtype=bool)
        mask.put(selector, True)
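
The review thread above asks for the error to be raised as soon as the shape is known to be out of bounds. One way that could look is sketched below. It is only an illustration: `check_unstack_size` and `INT32_MAX` are hypothetical names, not pandas API, and the helper is assumed to receive the same `ngroups` and `stride` values that `_make_selectors` computes. The cell count is computed with Python integers, which cannot overflow, so the check can run before any selector or mask is built. The int32 array product is included only to show the silent wraparound the check guards against.

```python
import numpy as np

INT32_MAX = 2 ** 31 - 1


def check_unstack_size(ngroups, stride):
    """Hypothetical helper (not pandas API): fail fast on oversized shapes."""
    # Python ints have arbitrary precision, so this product cannot wrap.
    num_cells = int(ngroups) * int(stride)
    if num_cells > INT32_MAX:
        raise ValueError('Pivot table is too big, causing int32 overflow')
    return num_cells


# Shape taken from the test in this PR: 1337600 rows by 3040 columns.
rows = np.array([1337600], dtype=np.int32)
cols = np.array([3040], dtype=np.int32)
# int32 array arithmetic wraps around silently and yields a negative count.
wrapped = int((rows * cols)[0])

try:
    check_unstack_size(1337600, 3040)
    raised = False
except ValueError:
    raised = True
```

Checking the product before `selector` is computed means no large intermediate arrays are allocated when the shape is already known to be too big.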

10 changes: 10 additions & 0 deletions pandas/tests/reshape/test_pivot.py
@@ -1237,6 +1237,16 @@ def test_pivot_string_func_vs_func(self, f, f_numpy):
                               aggfunc=f_numpy)
        tm.assert_frame_equal(result, expected)

    @pytest.mark.slow
    def test_pivot_number_of_levels_larger_than_int32(self):
        # GH 20601
        data = DataFrame({'ind1': list(range(1337600)) * 2,
Contributor

Instead of using list(range(...)), use np.arange (and array ops).

                          'ind2': list(range(3040)) * 2 * 440,
                          'count': [1] * 2 * 1337600})
        with tm.assert_raises_regex(ValueError, 'int32 overflow'):
            data.pivot_table(index='ind1', columns='ind2',
                             values='count', aggfunc='count')


class TestCrosstab(object):
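
The reviewer's suggestion to build the test data with `np.arange` and array operations, rather than `list(range(...))`, could be sketched roughly as follows. This is an assumption about what was meant, not the code that was eventually committed. `make_pivot_columns` and its parameters are hypothetical names, and the example is scaled down so it runs instantly. With `n_rows=1337600, n_cols=3040, repeats=2` it would produce the same column values as the test above.

```python
import numpy as np


def make_pivot_columns(n_rows, n_cols, repeats):
    """Hypothetical helper: build the test columns from arrays, not lists."""
    total = n_rows * repeats
    if total % n_cols != 0:
        raise ValueError('n_rows * repeats must be a multiple of n_cols')
    return {
        # Equivalent to list(range(n_rows)) * repeats
        'ind1': np.tile(np.arange(n_rows), repeats),
        # Equivalent to list(range(n_cols)) * (total // n_cols)
        'ind2': np.tile(np.arange(n_cols), total // n_cols),
        # Equivalent to [1] * total
        'count': np.ones(total, dtype=np.int64),
    }


# Scaled-down shape for illustration only.
small = make_pivot_columns(n_rows=12, n_cols=4, repeats=2)
```

The returned dict can be passed straight to `DataFrame(...)`, and the arrays avoid materialising millions of Python integer objects.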
