-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG in clipboard (linux, python2) with unicode and separator (GH13747) #14599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
fb922d6
c83d000
edb8553
66d8ebf
14d94a0
d565b1f
f708c2e
c5a87d8
825bbe2
02f87b0
71d58d0
dd57ae3
d202fd0
0665fd4
ed1375f
9946fb7
c0aafd7
b03ed56
ac8ae60
7af95da
cedb690
98b61e8
1dca292
9db42d8
b74fbc1
2aafb66
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
""" io on the clipboard """ | ||
from pandas import compat, get_option, option_context, DataFrame | ||
from pandas.compat import StringIO | ||
from pandas.compat import StringIO, PY2 | ||
|
||
|
||
def read_clipboard(**kwargs): # pragma: no cover | ||
|
@@ -83,8 +83,12 @@ def to_clipboard(obj, excel=None, sep=None, **kwargs): # pragma: no cover | |
if sep is None: | ||
sep = '\t' | ||
buf = StringIO() | ||
obj.to_csv(buf, sep=sep, **kwargs) | ||
clipboard_set(buf.getvalue()) | ||
# clipboard_set (pyperclip) expects unicode | ||
obj.to_csv(buf, sep=sep, encoding='utf-8', **kwargs) | ||
text = buf.getvalue() | ||
if PY2: | ||
text = text.decode('utf-8') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. shouldn't we always decode here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
clipboard_set(text) | ||
return | ||
except: | ||
pass | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -113,3 +113,12 @@ def test_read_clipboard_infer_excel(self): | |
exp = pd.read_clipboard() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add tests for other issues that we mentioned (or attach the issue number to an existing tests as appropriate) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
||
tm.assert_frame_equal(res, exp) | ||
|
||
# unicode round trip test for GH 13747 | ||
def test_round_trip_frame_unicode(self): | ||
sep = ',' | ||
df = pd.DataFrame({'a':['µasd','Ωœ∑´'], 'b':['øπ∆˚¬','œ∑´®']}) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @aileronajay cls.data['utf8'] = pd.DataFrame({'a': ['µasd', 'Ωœ∑´'], 'b': ['øπ∆˚¬', 'œ∑´®']}) to the method There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @pijucha that makes sense and i thought of doing that initially, but wouldn't that make the code similar to en espanol test that is already present. Though i can make change if that seems to be the way to go :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Placing it in |
||
df.to_clipboard(excel=None, sep =sep) | ||
result = read_clipboard(sep = sep, index_col = 0) | ||
tm.assert_frame_equal(df, result, check_dtype=False) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so
encoding
is a valid kwarg. So instead put this into_clipboard(....., encoding=None, **kwargs)
.Then check for encoding, if its not
None
orutf-8
I would raise an error, then pass onto pyperclip. add a test case for this.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback the changes that i have made earlier (show above) are in the to_clipboard method. I am trying to understand your review comment, are you suggesting that instead of explicitly passing encoding = 'utf-8' to obj.to_csv i should check before hand if the encoding is None or utf-8 and raise an exception if it is not one of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you still should pass
encoding='utf-8'
to.to_csv
; the issue is that.to_clipboard()
takeskwargs
which could haveencoding
as one of them. So you need to explicity check if its invalid.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback oh ok, i get it now, let me make that change