
Prevent UnicodeDecodeError in pivot_table under Py2 #17489


Merged: 5 commits merged into pandas-dev:master on Sep 12, 2017

mpenkov
Contributor

@mpenkov mpenkov commented Sep 10, 2017

@gfyoung gfyoung added Error Reporting Incorrect or improved errors from pandas Unicode Unicode strings labels Sep 10, 2017
@gfyoung
Member

gfyoung commented Sep 10, 2017

@mpenkov: This will need a whatsnew entry.

@codecov

codecov bot commented Sep 10, 2017

Codecov Report

Merging #17489 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17489      +/-   ##
==========================================
- Coverage   91.15%   91.14%   -0.02%     
==========================================
  Files         163      163              
  Lines       49534    49534              
==========================================
- Hits        45153    45148       -5     
- Misses       4381     4386       +5

Flag        Coverage Δ
#multiple   88.92% <100%> (ø)      ⬆️
#single     40.22% <50%> (-0.07%)  ⬇️

Impacted Files                  Coverage Δ
pandas/core/reshape/pivot.py    96.35% <100%> (+0.99%)  ⬆️
pandas/io/gbq.py                25% <0%> (-58.34%)      ⬇️
pandas/core/frame.py            97.77% <0%> (-0.05%)    ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 23050dc...e851082.


@mpenkov
Contributor Author

mpenkov commented Sep 10, 2017

@gfyoung I've added the whatsnew entry. Please check.

@@ -145,7 +145,7 @@ def _add_margins(table, data, values, rows, cols, aggfunc,
     if not isinstance(margins_name, compat.string_types):
         raise ValueError('margins_name argument must be a string')
 
-    msg = 'Conflicting name "{name}" in margins'.format(name=margins_name)
+    msg = u('Conflicting name "{name}" in margins').format(name=margins_name)
Contributor

The new parentheses can be removed.
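A minimal stand-alone sketch of the guard this hunk touches, assuming Python 3 (where `str` plays the role of `compat.string_types`). `check_margins_name` and its `index_values` argument are hypothetical stand-ins, not pandas API. The template is written as a `u'...'` literal, which is `u(...)` with the parentheses dropped. On Py2 that keeps the formatted message unicode, so a non-ASCII `margins_name` is not implicitly converted to an ASCII byte string.

```python
def check_margins_name(margins_name, index_values):
    """Hypothetical stand-alone version of the margins_name guard.

    Raises ValueError if margins_name is not a string, or if it already
    appears in index_values (standing in for the pivot table's row labels).
    """
    # Python 3: str stands in for compat.string_types.
    if not isinstance(margins_name, str):
        raise ValueError('margins_name argument must be a string')

    # u'' literal template: on Py2 the formatted message stays unicode, so a
    # non-ASCII margins_name is never implicitly encoded as ASCII.
    msg = u'Conflicting name "{name}" in margins'.format(name=margins_name)
    if margins_name in index_values:
        raise ValueError(msg)


greek = u'\u0394\u03bf\u03ba\u03b9\u03bc\u03ae'

# No clash with the existing row labels: the guard passes silently.
check_margins_name(greek, [1, 2, 3])

# A clash: the error message carries the Greek name intact.
try:
    check_margins_name(greek, [1, 2, 3, greek])
except ValueError as err:
    message = str(err)
```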

def test_issue_13292(self):
# The below shouldn't raise an exception anymore.
frame = pd.DataFrame({'foo': [1, 2, 3]})
pd.pivot_table(frame, index=['foo'], aggfunc=len, margins=True,
Contributor

You don't need the comment. Instead, rename the test to something descriptive, such as test_pivot_margins_name_unicode, and add a comment with the issue number.

# The below shouldn't raise an exception anymore.
frame = pd.DataFrame({'foo': [1, 2, 3]})
pd.pivot_table(frame, index=['foo'], aggfunc=len, margins=True,
margins_name=u'\u0394\u03bf\u03ba\u03b9\u03bc\u03ae')
Contributor

Get result = pd.pivot_table(....) and compare it to a constructed expected frame with tm.assert_frame_equal.
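A minimal sketch of the result/expected pattern, assuming a recent pandas in which `pandas.testing.assert_frame_equal` is the public counterpart of `tm.assert_frame_equal`. The `key`/`val` frame is made-up example data with a value column, chosen so the expected frame is easy to write by hand; `check_dtype=False` is there because the sketch does not rely on whether the margin row keeps an integer dtype.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

greek = u'\u0394\u03bf\u03ba\u03b9\u03bc\u03ae'
frame = pd.DataFrame({'key': ['a', 'a', 'b'], 'val': [1, 2, 3]})

# Result under test: per-key sums plus a margin row named with the Greek string.
result = pd.pivot_table(frame, index='key', values='val', aggfunc='sum',
                        margins=True, margins_name=greek)

# Expected frame built by hand: a -> 1 + 2, b -> 3, margin -> 1 + 2 + 3.
expected = pd.DataFrame({'val': [3, 3, 6]},
                        index=pd.Index(['a', 'b', greek], name='key'))

assert_frame_equal(result, expected, check_dtype=False)
```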

Contributor Author

@mpenkov mpenkov Sep 10, 2017

@jreback Could you please suggest what the constructed expected should look like? I'm a bit new and unsure of what should go there. Here's what I've got so far:

    def test_pivot_margins_name_unicode(self):
        # issue #13292
        greek = u'\u0394\u03bf\u03ba\u03b9\u03bc\u03ae'
        frame = pd.DataFrame({'foo': [1, 2, 3]})
        table = pd.pivot_table(frame, index=['foo'], aggfunc=len, margins=True,
                               margins_name=greek)
        expected = pd.DataFrame({}, columns=['foo'], index=[1, 2, 3, greek])
        tm.assert_frame_equal(table, expected)

This is failing on the last line with a DataFrame shape mismatch.

Contributor

don't pass the {}

Contributor Author

Thanks for the suggestion. This is what I have now:

expected = pd.DataFrame(columns=['foo'], index=[1, 2, 3, greek])

It's still failing:

E       AssertionError: DataFrame are different
E
E       DataFrame shape mismatch
E       [left]:  (4, 0)
E       [right]: (4, 1)

pandas/util/testing.py:1105: AssertionError

What else could be wrong?
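A sketch of where the mismatch comes from, assuming a recent pandas: because `foo` is consumed as the row index, `pivot_table` has no value column left to aggregate, so the left frame has zero columns, while `columns=['foo']` gives the expected frame one. Naming the expected index `foo` is an assumption that mirrors the pivot's row label; it only matters once the shapes agree and `assert_frame_equal` starts comparing index attributes.

```python
import pandas as pd

greek = u'\u0394\u03bf\u03ba\u03b9\u03bc\u03ae'
frame = pd.DataFrame({'foo': [1, 2, 3]})

# 'foo' is used as the row index, so no column is left to aggregate:
# the pivot table has four rows (1, 2, 3 and the margin) and zero columns.
table = pd.pivot_table(frame, index=['foo'], aggfunc=len, margins=True,
                       margins_name=greek)

# columns=['foo'] adds a NaN-filled column that the pivot table does not have.
wrong = pd.DataFrame(columns=['foo'], index=[1, 2, 3, greek])

# Same row labels and no columns: this matches the pivot table's shape.
right = pd.DataFrame(index=pd.Index([1, 2, 3, greek], name='foo'))

print(table.shape, wrong.shape, right.shape)  # (4, 0) (4, 1) (4, 0)
```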

@@ -1705,6 +1705,7 @@ Reshaping
- Bug in ``pd.concat()`` in which concatenating with an empty dataframe with ``join='inner'`` was being improperly handled (:issue:`15328`)
- Bug with ``sort=True`` in ``DataFrame.join`` and ``pd.merge`` when joining on indexes (:issue:`15582`)
- Bug in ``DataFrame.nsmallest`` and ``DataFrame.nlargest`` where identical values resulted in duplicated rows (:issue:`15297`)
- Bug in ``pivot._add_margins`` when ``margins_name`` is Unicode under Py2 (:issue:`13292`)
Contributor

write this from a user perspective

Bug in :func:`pandas.pivot_table` incorrectly raising when passing unicode input for margins keyword

def test_pivot_margins_name_unicode(self):
# issue #13292
frame = pd.DataFrame({'foo': [1, 2, 3]})
pd.pivot_table(frame, index=['foo'], aggfunc=len, margins=True,
Contributor

I'd still like to compare the result against an expected frame here.

Contributor Author

I figured out how to construct the expected object correctly. Please have a look.

@jreback jreback added this to the 0.21.0 milestone Sep 12, 2017
@jreback jreback merged commit d46b027 into pandas-dev:master Sep 12, 2017
@jreback
Contributor

jreback commented Sep 12, 2017

thanks @mpenkov

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Development

Successfully merging this pull request may close these issues.

Exception when adding a unicode margins_name in pivot_table
4 participants