BUG in clipboard (linux, python2) with unicode and separator #13747
Comments
OK. Actually, I saw that but thought it was purely Windows-related. The bug here is an incorrect use of
I get the same on macOS / py2, so your report is probably better here. Actually, we cannot repro this on the builds anyhow, which would be ideal. OK, will re-open and make this an xref issue of that one.
I see now that Apparently, it also solves #12529. So, indeed, these issues are more closely related than I thought.
+1 to fix this when possible. It's the only test that fails for me (OSX) in the codebase. More of an annoyance, but still... I'd also suggest not changing or setting default values for kwargs when to_clipboard is called; it seems confusing at best, and I think functionality is unchanged (proposed fix in jlou2u@01277af, which includes pijucha's change). I haven't done anything with travis before, but looking at .travis.yml it seems that xsel is only added to the Python 3 builds, not the Python 2 builds. I think pandas.util.clipboard will raise an ImportError if it can't find a clipboard utility, and test_clipboard.py will raise nose.SkipTest("no clipboard found") if it can't find a clipboard. That's my best guess at why this can't be reproduced in the builds.
This test is failing for me on OSX (with the latest code).
I was able to change the _copyOSX function (def _copyOSX(text):) in pandas.util.clipboard.py to make the test pass. The test fails because to_clipboard fails for the data frame and falls back to copying the string representation to the clipboard. to_clipboard fails because calling encode on a Python 2 str first decodes it implicitly as ASCII. When the data frame contains non-ASCII characters, the str is already UTF-8 encoded, so reading a non-ASCII byte as ASCII raises a UnicodeDecodeError. By catching the UnicodeDecodeError and passing the string as is (it is already encoded), we can make it work.
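The failure described in this comment can also be avoided by encoding only when the value is still text, rather than catching the exception afterwards. The following is a minimal sketch of that idea, not the change that was merged. The helper name to_clipboard_bytes is hypothetical and does not exist in pandas or Pyperclip.

```python
def to_clipboard_bytes(text, encoding="utf-8"):
    """Return `text` as bytes suitable for writing to a subprocess pipe.

    Hypothetical helper for illustration; not part of pandas or Pyperclip.
    """
    if isinstance(text, bytes):
        # Already encoded (a Python 2 `str` or a Python 3 `bytes`):
        # pass it through untouched instead of calling .encode() again,
        # which on Python 2 would trigger an implicit ASCII decode.
        return text
    # Text type (Python 2 `unicode` or Python 3 `str`): encode exactly once.
    return text.encode(encoding)
```

Because bytes is an alias of str on Python 2, the same check should cover both interpreters without a version switch.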
@aileronajay The thing is the file If I have some time this week I can submit a PR. Unless someone else can do it faster and better. |
created a pull request (containing the same change as your commit pijucha/pandas@e53dcb0 ), fb922d6 |
vendered updated version of Pyperclip

closes pandas-dev#13747
closes pandas-dev#14362
closes pandas-dev#12807
closes pandas-dev#12529

Author: Ajay Saxena <[email protected]>
Author: Ajay Saxena <[email protected]>

Closes pandas-dev#14599 from aileronajay/master and squashes the following commits:

2aafb66 [Ajay Saxena] moved comment inside test and added github issue labels to test
b74fbc1 [Ajay Saxena] ignore lint test for pyperclip files
9db42d8 [Ajay Saxena] whatsnew conflict
1dca292 [Ajay Saxena] conflict resolution
98b61e8 [Ajay Saxena] merge conflict
cedb690 [Ajay Saxena] merge conflict in whats new file
7af95da [Ajay Saxena] merging lastest changes
ac8ae60 [Ajay Saxena] skip clipboard test if clipboard primitives are absent
b03ed56 [Ajay Saxena] changed whatsnew file
c0aafd7 [Ajay Saxena] Merge branch 'test_branch'
9946fb7 [Ajay Saxena] Merge branch 'master' of https://github.com/pandas-dev/pandas into test_branch
ed1375f [Ajay Saxena] Merge branch 'test_branch'
0665fd4 [Ajay Saxena] fixed linting and test case as per code review
d202fd0 [Ajay Saxena] added test for valid encoding, modified setup.py so that pandas/util/clipboard can be found
dd57ae3 [Ajay Saxena] code review changes and read clipboard invalid encoding test
71d58d0 [Ajay Saxena] testing encoding in kwargs to to_clipboard and test case for the same
02f87b0 [Ajay Saxena] removed duplicate files
825bbe2 [Ajay Saxena] all files related to pyperclip are under pandas.util.clipboard
c5a87d8 [Ajay Saxena] Merge branch 'test_branch' of https://github.com/aileronajay/pandas into test_branch
f708c2e [Ajay Saxena] Merge branch 'master' of https://github.com/aileronajay/pandas
d565b1f [Ajay Saxena] updated pyperclip to the latest version
14d94a0 [Ajay Saxena] changed the pandas util clipboard file to return unicode if the python version is 2, else str
66d8ebf [Ajay Saxena] removed the disabled tag for clipboard test so that we can check if they pass after this change
edb8553 [Ajay Saxena] refactored the new unicode test to be in sync with the rest of the file
c83d000 [Ajay Saxena] added test case for unicode round trip
fb922d6 [Ajay Saxena] changes for GH 13747

(cherry picked from commit 4a1a330)
This is probably a known bug but I couldn't find a github issue.
There is a disabled test in test_clipboard.py which fails with the following error
Code Sample, a copy-pastable example if possible
More explicitly (the example from the above test):
Expected Output
output of pd.show_versions()
There are probably 2 issues in the code.
to_clipboard falls back to the to_string method. (In this case, fixing 1 solves the problem. But in general, if something else raises and we fall back here, a separator is ignored.)
I don't know what to do about 2, but 1 seems to be easy.
Part of the code in util.clipboard.py calls subprocess.Popen.communicate(), which operates on byte types (bytes in PY3 and strings in PY2). So, encode/decode are needed only in PY3.
I believe this 6d4fdb0 fixes the problem. But for now I tested only one pair of functions (in KDE) and couldn't possibly test it on OS X.
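To illustrate the point about subprocess.Popen.communicate() working on bytes, here is a small Python 3 sketch of a copy/paste round trip through a child process. It is not the code from the commit referenced above. The names ECHO_CMD and round_trip are assumptions made for this example, and a Python one-liner that echoes stdin stands in for a real clipboard utility such as xsel or pbcopy.

```python
import subprocess
import sys

# Stand-in for a clipboard utility (assumption for this sketch):
# a child Python process that copies its stdin to stdout byte for byte.
ECHO_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]


def round_trip(text, cmd=ECHO_CMD, encoding="utf-8"):
    """Send `text` through `cmd` and return what comes back as text."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() takes and returns bytes, so on Python 3 the text
    # must be encoded on the way in and decoded on the way out.
    out, _ = proc.communicate(input=text.encode(encoding))
    return out.decode(encoding)
```

On Python 2 a plain str is already a byte string, which is why the encode and decode steps would only be needed there for unicode objects.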