Prevent UnicodeDecodeError in pivot_table under Py2 #17489

Merged
merged 5 commits into from
Sep 12, 2017
Changes from 3 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -1705,6 +1705,7 @@ Reshaping
 - Bug in ``pd.concat()`` in which concatenating with an empty dataframe with ``join='inner'`` was being improperly handled (:issue:`15328`)
 - Bug with ``sort=True`` in ``DataFrame.join`` and ``pd.merge`` when joining on indexes (:issue:`15582`)
 - Bug in ``DataFrame.nsmallest`` and ``DataFrame.nlargest`` where identical values resulted in duplicated rows (:issue:`15297`)
+- Bug in ``pivot._add_margins`` when ``margins_name`` is Unicode under Py2 (:issue:`13292`)
Contributor

Write this from a user's perspective:

Bug in :func:`pandas.pivot_table` incorrectly raising when passing unicode input for margins keyword


 Numeric
 ^^^^^^^
4 changes: 2 additions & 2 deletions pandas/core/reshape/pivot.py
@@ -9,7 +9,7 @@
 from pandas.core.groupby import Grouper
 from pandas.core.reshape.util import cartesian_product
 from pandas.core.index import Index, _get_objs_combined_axis
-from pandas.compat import range, lrange, zip
+from pandas.compat import range, lrange, zip, u
 from pandas import compat
 import pandas.core.common as com
 from pandas.util._decorators import Appender, Substitution
@@ -145,7 +145,7 @@ def _add_margins(table, data, values, rows, cols, aggfunc,
     if not isinstance(margins_name, compat.string_types):
         raise ValueError('margins_name argument must be a string')

-    msg = 'Conflicting name "{name}" in margins'.format(name=margins_name)
+    msg = u('Conflicting name "{name}" in margins').format(name=margins_name)
Contributor

The new parentheses can be removed.

     for level in table.index.names:
         if margins_name in table.index.get_level_values(level):
             raise ValueError(msg)
6 changes: 6 additions & 0 deletions pandas/tests/reshape/test_pivot.py
@@ -1625,3 +1625,9 @@ def test_isleapyear_deprecate(self):

         with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
             assert isleapyear(2004)
+
+    def test_issue_13292(self):
+        # The below shouldn't raise an exception anymore.
+        frame = pd.DataFrame({'foo': [1, 2, 3]})
+        pd.pivot_table(frame, index=['foo'], aggfunc=len, margins=True,
Contributor

You don't need the comment. Instead, change the test name to something useful, like test_pivot_margins_name_unicode, and put a comment with the issue number.

Contributor

I'd still like to compare this here.

Contributor Author

I figured out how to construct the expected object correctly. Please have a look.

+                       margins_name=u'\u0394\u03bf\u03ba\u03b9\u03bc\u03ae')
Contributor

Get the result with result = pd.pivot_table(....) and compare it to a constructed expected frame using tm.assert_frame_equal.

Contributor Author

@mpenkov, Sep 10, 2017

@jreback Could you please suggest what the constructed expected should look like? I'm a bit new and unsure of what should go there. Here's what I've got so far:

    def test_pivot_margins_name_unicode(self):
        # issue #13292
        greek = u'\u0394\u03bf\u03ba\u03b9\u03bc\u03ae'
        frame = pd.DataFrame({'foo': [1, 2, 3]})
        table = pd.pivot_table(frame, index=['foo'], aggfunc=len, margins=True,
                               margins_name=greek)
        expected = pd.DataFrame({}, columns=['foo'], index=[1, 2, 3, greek])
        tm.assert_frame_equal(table, expected)

This is failing on the last line with DataFrame shape mismatch.

Contributor

Don't pass the {}.

Contributor Author

Thanks for the suggestion. This is what I have now:

expected = pd.DataFrame(columns=['foo'], index=[1, 2, 3, greek])

It's still failing:

E       AssertionError: DataFrame are different
E
E       DataFrame shape mismatch
E       [left]:  (4, 0)
E       [right]: (4, 1)

pandas/util/testing.py:1105: AssertionError

What else could be wrong?
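
One possible reading of the shape message, offered as a sketch and not as the fix the maintainers settled on: assert_frame_equal reports its first argument as [left], so (4, 0) would describe the pivoted table, which here has four index entries and no value columns, while columns=['foo'] gives the expected frame one column. The snippet below compares the shapes of the attempt from the thread and of a column-less candidate. The names attempt, candidate and index are illustrative, and naming the index 'foo' is an assumption about how pivot_table labels the result; the snippet assumes a working pandas installation.

```python
import pandas as pd

greek = u'\u0394\u03bf\u03ba\u03b9\u03bc\u03ae'
frame = pd.DataFrame({'foo': [1, 2, 3]})

# Same call as in the test under review.
table = pd.pivot_table(frame, index=['foo'], aggfunc=len, margins=True,
                       margins_name=greek)

# Attempt from the thread: passing columns=['foo'] creates one (empty) column.
attempt = pd.DataFrame(columns=['foo'], index=[1, 2, 3, greek])

# Assumed alternative: no columns at all, only the four index labels.
# The index name 'foo' is an assumption, not taken from the thread.
index = pd.Index([1, 2, 3, greek], dtype='object', name='foo')
candidate = pd.DataFrame(index=index)

# Print the three shapes side by side to see which expected frame lines up.
print(table.shape, attempt.shape, candidate.shape)
```

If the candidate's shape matches the table's, the remaining differences that tm.assert_frame_equal could still report are the index name and the dtypes, so those are worth checking next.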