Fix #12529 / Improve to_clipboard for objects containing unicode #12580

Closed
wants to merge 2 commits into from
Conversation

dalito

@dalito dalito commented Mar 9, 2016

Superseded by #14599


I added the whatsnew entry to 0.18.0.

@dalito dalito changed the title from "Fixes #12529 / Improves to_clipboard for objects containing unicode" to "Fix #12529 / Improve to_clipboard for objects containing unicode" Mar 9, 2016
@jreback
Contributor

jreback commented Mar 9, 2016

move to 0.18.1

@jreback
Contributor

jreback commented Mar 10, 2016

use the option display.encoding for the actual encoding

@jreback
Contributor

jreback commented Mar 10, 2016

pls add a test

@dalito
Author

dalito commented Mar 10, 2016

> use the option display.encoding for the actual encoding

pandas.options.display.encoding is utf-8 on my German Win10 or Win7. However, the encoding I have to use to get correct behaviour is "cp1250" (which is what is returned by locale.getdefaultlocale()[1]).

I'll look into adding a test tomorrow.
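
The mismatch described above can be checked with a short sketch. It uses only the standard library and Python 3 syntax (the thread itself concerns Python 2); `locale.getpreferredencoding(False)` stands in for the `locale.getdefaultlocale()[1]` call mentioned in the comment, and the sample character is an assumption chosen for illustration. The pandas side would be read separately with `pandas.options.display.encoding`.

```python
import locale

# Encoding preferred by the OS locale. On the Windows machine described in the
# comment the reported value was "cp1250"; the result varies by system.
preferred = locale.getpreferredencoding(False)

# Illustrative sample (an assumption, not taken from the PR): u with diaeresis.
sample = u"\xfc"

# The same character produces different byte sequences under the two encodings,
# so a consumer expecting one encoding will misread bytes written in the other.
utf8_bytes = sample.encode("utf-8")
cp1250_bytes = sample.encode("cp1250")

print("locale preferred encoding:", preferred)
print("utf-8 bytes :", utf8_bytes)
print("cp1250 bytes:", cp1250_bytes)
```

If the two settings disagree on a given machine, text encoded with one and interpreted with the other will not round-trip, which matches the behaviour reported here.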

@jreback
Contributor

jreback commented Mar 10, 2016

hmm ok

look in the code and see how display.encoding is determined as well

it may be that that is wrong (iow not taking into account the locale) or this is just different

@jreback jreback added the Unicode (Unicode strings) and Error Reporting (Incorrect or improved errors from pandas) labels Mar 10, 2016
@dalito
Author

dalito commented Mar 10, 2016

Before writing a new test I ran the tests in "test_clipboard.py" on the unmodified 0.18.0rc1 installation. Surprisingly they fail:

py-2|C:\dev\Python27_64\Lib\site-packages\pandas\io\tests>nosetests test_clipboard.py
.EE.
======================================================================
ERROR: test_round_trip_frame (pandas.io.tests.test_clipboard.TestClipboard)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\Python27_64\Lib\site-packages\pandas\io\tests\test_clipboard.py", line 81, in test_round_trip_frame
    self.check_round_trip_frame(dt)
  File "C:\dev\Python27_64\Lib\site-packages\pandas\io\tests\test_clipboard.py", line 69, in check_round_trip_frame
    tm.assert_frame_equal(data, result, check_dtype=False)
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 1115, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 999, in assert_series_equal
    check_less_precise, obj='{0}'.format(obj))
  File "pandas\src\testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:3809)
  File "pandas\src\testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2685)
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 866, in raise_assert_detail
    [right]: {3}""".format(obj, message, left, right)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 9: ordinal not in range(128)

======================================================================
ERROR: test_round_trip_frame_sep (pandas.io.tests.test_clipboard.TestClipboard)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\Python27_64\Lib\site-packages\pandas\io\tests\test_clipboard.py", line 73, in test_round_trip_frame_sep
    self.check_round_trip_frame(dt, sep=',')
  File "C:\dev\Python27_64\Lib\site-packages\pandas\io\tests\test_clipboard.py", line 69, in check_round_trip_frame
    tm.assert_frame_equal(data, result, check_dtype=False)
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 1115, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 999, in assert_series_equal
    check_less_precise, obj='{0}'.format(obj))
  File "pandas\src\testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:3809)
  File "pandas\src\testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2685)
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 866, in raise_assert_detail
    [right]: {3}""".format(obj, message, left, right)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 9: ordinal not in range(128)

----------------------------------------------------------------------
Ran 4 tests in 0.328s

FAILED (errors=2)
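
Both errors end in the `.format(...)` call inside `raise_assert_detail`. On Python 2, formatting a byte-string template with a unicode argument implicitly encodes the argument as ASCII, which appears to be what fails here. The sketch below reproduces the same codec failure explicitly in Python 3 syntax and shows that a unicode template avoids it; the template text is a simplified stand-in, not the actual pandas message.

```python
# Value taken from the failing test data: "espanol" with n-tilde (u'\xf1').
text = u"espa\xf1ol"

# Explicit version of the implicit ASCII encode that Python 2 performs when a
# byte-string template is formatted with a unicode argument.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    failed_codec = exc.encoding   # name of the codec that rejected the text
    failed_index = exc.start      # index of the first unencodable character
    print(exc)

# Formatting into a unicode template keeps everything as text, so no implicit
# ASCII encode takes place.
message = u"[left]: {0}".format(text)
print(message)
```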

@dalito
Author

dalito commented Mar 10, 2016

After fixing pandas\util\testing.py, I still get a failure:

py-2|C:\dev\Python27_64\Lib\site-packages\pandas\io\tests>nosetests test_clipboard.py
.FF.
======================================================================
FAIL: test_round_trip_frame (pandas.io.tests.test_clipboard.TestClipboard)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\Python27_64\Lib\site-packages\pandas\io\tests\test_clipboard.py", line 81, in test_round_trip_frame
    self.check_round_trip_frame(dt)
  File "C:\dev\Python27_64\Lib\site-packages\pandas\io\tests\test_clipboard.py", line 69, in check_round_trip_frame
    tm.assert_frame_equal(data, result, check_dtype=False)
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 1115, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 999, in assert_series_equal
    check_less_precise, obj='{0}'.format(obj))
  File "pandas\src\testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:3809)
  File "pandas\src\testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2685)
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 867, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.iloc[:, 1] are different

DataFrame.iloc[:, 1] values are different (50.0 %)
[left]:  [en, espa\xf1ol]
[right]: [en, espa\xc3\xb1ol]

======================================================================
FAIL: test_round_trip_frame_sep (pandas.io.tests.test_clipboard.TestClipboard)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\Python27_64\Lib\site-packages\pandas\io\tests\test_clipboard.py", line 73, in test_round_trip_frame_sep
    self.check_round_trip_frame(dt, sep=',')
  File "C:\dev\Python27_64\Lib\site-packages\pandas\io\tests\test_clipboard.py", line 69, in check_round_trip_frame
    tm.assert_frame_equal(data, result, check_dtype=False)
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 1115, in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 999, in assert_series_equal
    check_less_precise, obj='{0}'.format(obj))
  File "pandas\src\testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:3809)
  File "pandas\src\testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2685)
  File "C:\dev\Python27_64\Lib\site-packages\pandas\util\testing.py", line 867, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.iloc[:, 1] are different

DataFrame.iloc[:, 1] values are different (50.0 %)
[left]:  [en, espa\xf1ol]
[right]: [en, espa\xc3\xb1ol]

----------------------------------------------------------------------
Ran 4 tests in 0.391s

FAILED (failures=2)
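
The `[left]` and `[right]` values differ in a recognisable way: `\xc3\xb1` is the UTF-8 encoding of `\xf1`, so the round trip seems to return UTF-8 bytes that were never decoded back to text. A minimal sketch of that relationship, in Python 3 syntax:

```python
# Text as it was written to the clipboard ([left] in the assertion output).
original = u"espa\xf1ol"

# UTF-8 encodes the single character u'\xf1' as the two bytes 0xC3 0xB1.
encoded = original.encode("utf-8")

# Treating each byte as its own character (latin-1 does exactly that) yields
# the two-character sequence shown on the [right] side.
mojibake = encoded.decode("latin-1")

# Decoding with the encoding that was used for writing restores the text.
recovered = encoded.decode("utf-8")

print(repr(encoded), repr(mojibake), repr(recovered))
```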

@jreback
Contributor

jreback commented Mar 11, 2016

yeah these currently fail on windows py2 (and I have a single/different one failing on macosx), but nothing fails on linux (which is where Travis runs). They all work on py3.

I think we have an issue about that, but if not, can you open one? Something is still not being encoded correctly. Of course, if you can figure it out, that would be great!

@jreback
Contributor

jreback commented May 7, 2016

can you rebase / update

@pijucha
Contributor

pijucha commented Jul 24, 2016

@dalito Are you still working on it?

If I'm not mistaken the errors you're getting are caused by pandas.util.clipboard (pyperclip), which expects unicode input.

This solution: pijucha@e53dcb0 works for me in python2 (both windows and linux). You can test it and include it in your PR if it also works for you.

except UnicodeEncodeError:
    # try again with encoding from locale
    from locale import getdefaultlocale
    obj.to_csv(buf, sep=sep, encoding=getdefaultlocale()[1], **kwargs)
Contributor


I believe using the default locale is not necessary. utf-8 should do the job (plus decoding buf.getvalue() from utf-8). The Windows code of pyperclip (pandas.util.clipboard) would ultimately convert it back to unicode.
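
A rough sketch of the approach suggested in this review comment: write the delimited text as UTF-8 bytes, then decode the buffer contents back to unicode before handing them to the clipboard layer. The function name `to_clipboard_text` and its row-based input are hypothetical stand-ins for the `obj.to_csv(buf, ...)` call in the diff; this is not the code from the linked commit.

```python
import io


def to_clipboard_text(rows, sep="\t", encoding="utf-8"):
    """Hypothetical helper: build clipboard text via an encoded byte buffer."""
    buf = io.BytesIO()
    for row in rows:
        line = sep.join(row) + "\n"
        # Stand-in for to_csv writing encoded bytes into the buffer.
        buf.write(line.encode(encoding))
    # Decode with the same encoding so the clipboard layer receives unicode
    # text rather than raw bytes.
    return buf.getvalue().decode(encoding)


rows = [[u"lang", u"word"], [u"es", u"espa\xf1ol"]]
text = to_clipboard_text(rows)
print(repr(text))
```

Because the same encoding is used for both steps, the result does not depend on the machine's locale, which is the point made in the comment above.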

Author


It has been a while since I looked at this. I just remember that I found some edge cases that still failed. I'll try to find my test cases to see if what you propose works.

Contributor


Thanks. I didn't want to submit a new PR while yours is open. But probably more important is whether this works for your edge cases and on your system.

@jreback
Contributor

jreback commented Sep 9, 2016

can you rebase / update?

@jorisvandenbossche
Member

Closing this PR as this will be handled in #14599. @dalito Thanks for your work on this, and you are certainly welcome to test the other PR.

@jorisvandenbossche jorisvandenbossche added this to the No action milestone Nov 18, 2016
Labels
Error Reporting (Incorrect or improved errors from pandas), Unicode (Unicode strings)
Development

Successfully merging this pull request may close these issues.

to_clipboard is no longer Excel compatible
4 participants