DataFrameGroupBy.boxplot crashes if any group contains duplicate index #30772
Comments
Thanks for the report. The problem looks to be a bit deeper than boxplot.
In [21]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
In [22]: df2 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['b', 'b', 'c'])
In [23]: pd.concat([df1, df2], keys=['a', 'b'], axis=1)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-23-6f9592337c0e> in <module>
----> 1 pd.concat([df, df2], keys=['a', 'b'], axis=1)
~/sandbox/pandas/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
282 )
283
--> 284 return op.get_result()
285
286
~/sandbox/pandas/pandas/core/reshape/concat.py in get_result(self)
495
496 new_data = concatenate_block_managers(
--> 497 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
498 )
499 if not self.copy:
~/sandbox/pandas/pandas/core/internals/managers.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
2026 blocks.append(b)
2027
-> 2028 return BlockManager(blocks, axes)
~/sandbox/pandas/pandas/core/internals/managers.py in __init__(self, blocks, axes, do_integrity_check)
138
139 if do_integrity_check:
--> 140 self._verify_integrity()
141
142 self._consolidate_check()
~/sandbox/pandas/pandas/core/internals/managers.py in _verify_integrity(self)
333 for block in self.blocks:
334 if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
--> 335 construction_error(tot_items, block.shape[1:], self.axes)
336 if len(self.items) != tot_items:
337 raise AssertionError(
~/sandbox/pandas/pandas/core/internals/managers.py in construction_error(tot_items, block_shape, axes, e)
1693 if block_shape[0] == 0:
1694 raise ValueError("Empty data passed with indices specified.")
-> 1695 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
1696
1697
ValueError: Shape of passed values is (4, 4), indices imply (3, 4)
I haven't looked to see what the expected output of that concat is. Are you interested in investigating further @xuancong84?
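For anyone who wants to try this outside an IPython session, here is a minimal self-contained sketch of the same reproduction. The helper `try_concat` is hypothetical and only there to capture the exception; the exception type is deliberately not pinned down because it may differ between pandas versions. The second call assumes that discarding the index labels is acceptable, which is only a workaround and not necessarily the expected output of the original concat.

```python
import pandas as pd

# Same frames as in the session above: each one has a duplicated index label.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=["a", "a", "b"])
df2 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=["b", "b", "c"])


def try_concat(frames, keys):
    """Hypothetical helper: return (result, None) on success, (None, exc) on failure."""
    try:
        return pd.concat(frames, keys=keys, axis=1), None
    except Exception as exc:  # the exact exception type may vary by pandas version
        return None, exc


# Column-wise concat has to align the rows on index labels, which are not unique here.
result, error = try_concat([df1, df2], keys=["a", "b"])
print("failed with:" if error is not None else "succeeded:", error if error is not None else result)

# Workaround sketch: replace the labels with a unique 0..n-1 index before concatenating.
aligned, aligned_error = try_concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], keys=["a", "b"]
)
print(aligned)
```

With unique positional labels the rows align one to one, so the result has three rows and a two-level column index built from `keys`.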
Actually, this is a duplicate of #6963. Post over there if you're interested in investigating.
@TomAugspurger I have proposed and posted a solution for #6963, but I am not sure whether that will fix the bug in this post.
DataFrameGroupBy.boxplot crashes if any group contains a duplicate index. See the code below for an illustration: setting crash=True introduces a duplicate index in Group 1 (the second group), which causes boxplot to crash.
The error stack trace looks like the following:
From a practical point of view, people who use boxplot should not have to ensure that the index has no duplicates. A box plot does not depend on the index, so boxplot should work whether or not duplicates exist. Interestingly, DataFrame.boxplot does not crash when the index contains duplicates.
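A possible workaround, sketched under the assumption that the crash comes from aligning the groups on their index labels (as the concat traceback in the comments suggests): give the frame a unique index before grouping. The column names `group` and `value` and the data are made up for illustration and are not the code from the original report.

```python
import pandas as pd

# Hypothetical data: the index label 1 repeats inside group "g1".
df = pd.DataFrame(
    {"group": ["g0", "g0", "g1", "g1"], "value": [1.0, 2.0, 3.0, 4.0]},
    index=[0, 1, 1, 1],
)

# Index labels carry no meaning for a box plot, so replace them with 0..n-1.
unique_df = df.reset_index(drop=True)

# The five-number summary that a box plot draws does not depend on the index:
# it is the same with and without the duplicated labels.
quantiles = [0.0, 0.25, 0.5, 0.75, 1.0]
summary_dup = df.groupby("group")["value"].quantile(quantiles)
summary_unique = unique_df.groupby("group")["value"].quantile(quantiles)
print(summary_dup.equals(summary_unique))

# With matplotlib installed, plotting from the re-indexed frame would look like:
# unique_df.groupby("group").boxplot(subplots=False)
```

The comparison of the two summaries is only there to show that resetting the index changes nothing a box plot would display.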