Skip to content

BUG in clipboard (linux, python2) with unicode and separator (GH13747) #14599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 26 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
fb922d6
changes for GH 13747
Nov 7, 2016
c83d000
added test case for unicode round trip
Nov 8, 2016
edb8553
refactored the new unicode test to be in sync with the rest of the file
Nov 8, 2016
66d8ebf
removed the disabled tag for clipboard test so that we can check if t…
Nov 8, 2016
14d94a0
changed the pandas util clipboard file to return unicode if the pytho…
aileronajay Nov 11, 2016
d565b1f
updated pyperclip to the latest version
Nov 12, 2016
f708c2e
Merge branch 'master' of https://github.com/aileronajay/pandas
Nov 12, 2016
c5a87d8
Merge branch 'test_branch' of https://github.com/aileronajay/pandas i…
Nov 12, 2016
825bbe2
all files related to pyperclip are under pandas.util.clipboard
Nov 12, 2016
02f87b0
removed duplicate files
Nov 12, 2016
71d58d0
testing encoding in kwargs to to_clipboard and test case for the same
Nov 12, 2016
dd57ae3
code review changes and read clipboard invalid encoding test
Nov 12, 2016
d202fd0
added test for valid encoding, modified setup.py so that pandas/util/…
Nov 12, 2016
0665fd4
fixed linting and test case as per code review
Nov 16, 2016
ed1375f
Merge branch 'test_branch'
Nov 16, 2016
9946fb7
Merge branch 'master' of https://github.com/pandas-dev/pandas into te…
Nov 16, 2016
c0aafd7
Merge branch 'test_branch'
Nov 16, 2016
b03ed56
changed whatsnew file
Nov 16, 2016
ac8ae60
skip clipboard test if clipboard primitives are absent
Nov 17, 2016
7af95da
merging lastest changes
Nov 17, 2016
cedb690
merge conflict in whats new file
Nov 17, 2016
98b61e8
merge conflict
Nov 17, 2016
1dca292
conflict resolution
Nov 17, 2016
9db42d8
whatsnew conflict
Nov 17, 2016
b74fbc1
ignore lint test for pyperclip files
Nov 17, 2016
2aafb66
moved comment inside test and added github issue labels to test
Nov 17, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 7 additions & 3 deletions pandas/io/clipboard.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
""" io on the clipboard """
from pandas import compat, get_option, option_context, DataFrame
from pandas.compat import StringIO
from pandas.compat import StringIO, PY2


def read_clipboard(**kwargs): # pragma: no cover
Expand Down Expand Up @@ -83,8 +83,12 @@ def to_clipboard(obj, excel=None, sep=None, **kwargs): # pragma: no cover
if sep is None:
sep = '\t'
buf = StringIO()
obj.to_csv(buf, sep=sep, **kwargs)
clipboard_set(buf.getvalue())
# clipboard_set (pyperclip) expects unicode
obj.to_csv(buf, sep=sep, encoding='utf-8', **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so encoding is a valid kwarg. So instead put this in to_clipboard(....., encoding=None, **kwargs).

Then check for encoding, if its not None or utf-8 I would raise an error, then pass onto pyperclip. add a test case for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback the changes that i have made earlier (show above) are in the to_clipboard method. I am trying to understand your review comment, are you suggesting that instead of explicitly passing encoding = 'utf-8' to obj.to_csv i should check before hand if the encoding is None or utf-8 and raise an exception if it is not one of them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you still should pass encoding='utf-8' to .to_csv; the issue is that .to_clipboard() takes kwargs which could have encoding as one of them. So you need to explicity check if its invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback oh ok, i get it now, let me make that change

text = buf.getvalue()
if PY2:
text = text.decode('utf-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we always decode here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

text (as returned from buf.getvalue()) is always a string - it's already unicode in Py3 and cannot be decoded.

clipboard_set(text)
return
except:
pass
Expand Down
9 changes: 9 additions & 0 deletions pandas/io/tests/test_clipboard.py
Original file line number Diff line number Diff line change
Expand Up @@ -113,3 +113,12 @@ def test_read_clipboard_infer_excel(self):
exp = pd.read_clipboard()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add tests for other issues that we mentioned (or attach the issue number to an existing tests as appropriate)

Copy link
Contributor Author

@aileronajay aileronajay Nov 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#14362 , says that that upgrading to latest should fix, we already have round trip tests to test this (that have now been enabled)
#12807, existing tests fail,( that now pass)
#12529, i have added unicode test to test this


tm.assert_frame_equal(res, exp)

# unicode round trip test for GH 13747
def test_round_trip_frame_unicode(self):
sep = ','
df = pd.DataFrame({'a':['µasd','Ωœ∑´'], 'b':['øπ∆˚¬','œ∑´®']})
Copy link
Contributor

@pijucha pijucha Nov 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aileronajay
Just a thought: adding this

cls.data['utf8'] = pd.DataFrame({'a': ['µasd', 'Ωœ∑´'], 'b': ['øπ∆˚¬', 'œ∑´®']})

to the method setUpClass above would have the same (or better) effect as the separate test but the code would be neater.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pijucha that makes sense and i thought of doing that initially, but wouldn't that make the code similar to en espanol test that is already present. Though i can make change if that seems to be the way to go :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placing it in setUpClass just allows to run all tests automatically. Whether it's similar to en espanol is another thing (Chinese or cyrillic characters might be less similar but it's probably ok).

df.to_clipboard(excel=None, sep =sep)
result = read_clipboard(sep = sep, index_col = 0)
tm.assert_frame_equal(df, result, check_dtype=False)