Skip to content

BUG in clipboard (linux, python2) with unicode and separator (GH13747) #14599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 26 commits into from
Closed
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
fb922d6
changes for GH 13747
Nov 7, 2016
c83d000
added test case for unicode round trip
Nov 8, 2016
edb8553
refactored the new unicode test to be in sync with the rest of the file
Nov 8, 2016
66d8ebf
removed the disabled tag for clipboard test so that we can check if t…
Nov 8, 2016
14d94a0
changed the pandas util clipboard file to return unicode if the pytho…
aileronajay Nov 11, 2016
d565b1f
updated pyperclip to the latest version
Nov 12, 2016
f708c2e
Merge branch 'master' of https://github.com/aileronajay/pandas
Nov 12, 2016
c5a87d8
Merge branch 'test_branch' of https://github.com/aileronajay/pandas i…
Nov 12, 2016
825bbe2
all files related to pyperclip are under pandas.util.clipboard
Nov 12, 2016
02f87b0
removed duplicate files
Nov 12, 2016
71d58d0
testing encoding in kwargs to to_clipboard and test case for the same
Nov 12, 2016
dd57ae3
code review changes and read clipboard invalid encoding test
Nov 12, 2016
d202fd0
added test for valid encoding, modified setup.py so that pandas/util/…
Nov 12, 2016
0665fd4
fixed linting and test case as per code review
Nov 16, 2016
ed1375f
Merge branch 'test_branch'
Nov 16, 2016
9946fb7
Merge branch 'master' of https://github.com/pandas-dev/pandas into te…
Nov 16, 2016
c0aafd7
Merge branch 'test_branch'
Nov 16, 2016
b03ed56
changed whatsnew file
Nov 16, 2016
ac8ae60
skip clipboard test if clipboard primitives are absent
Nov 17, 2016
7af95da
merging lastest changes
Nov 17, 2016
cedb690
merge conflict in whats new file
Nov 17, 2016
98b61e8
merge conflict
Nov 17, 2016
1dca292
conflict resolution
Nov 17, 2016
9db42d8
whatsnew conflict
Nov 17, 2016
b74fbc1
ignore lint test for pyperclip files
Nov 17, 2016
2aafb66
moved comment inside test and added github issue labels to test
Nov 17, 2016
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.2.txt
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,4 @@ Bug Fixes

- compat with ``dateutil==2.6.0`` for testing (:issue:`14621`)
- allow ``nanoseconds`` in ``Timestamp.replace`` kwargs (:issue:`14621`)
- BUG in clipboard (linux, python2) with unicode and separator (:issue:`13747`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list all of the isues this is expected to close

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented this change now

24 changes: 21 additions & 3 deletions pandas/io/clipboard.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
""" io on the clipboard """
from pandas import compat, get_option, option_context, DataFrame
from pandas.compat import StringIO
from pandas.compat import StringIO, PY2


def read_clipboard(sep='\s+', **kwargs): # pragma: no cover
Expand All @@ -18,6 +18,14 @@ def read_clipboard(sep='\s+', **kwargs): # pragma: no cover
-------
parsed : DataFrame
"""
encoding = kwargs.pop('encoding', 'utf-8')

# only utf-8 is valid for passed value because that's what clipboard
# supports
if encoding is not None and encoding.lower().replace('-', '') != 'utf8':
raise NotImplementedError(
'reading from clipboard only supports utf-8 encoding')

from pandas.util.clipboard import clipboard_get
from pandas.io.parsers import read_table
text = clipboard_get()
Expand Down Expand Up @@ -78,6 +86,12 @@ def to_clipboard(obj, excel=None, sep=None, **kwargs): # pragma: no cover
- Windows:
- OS X:
"""
encoding = kwargs.pop('encoding', 'utf-8')

# testing if an invalid encoding is passed to clipboard
if encoding is not None and encoding.lower().replace('-', '') != 'utf8':
raise ValueError('clipboard only supports utf-8 encoding')

from pandas.util.clipboard import clipboard_set
if excel is None:
excel = True
Expand All @@ -87,8 +101,12 @@ def to_clipboard(obj, excel=None, sep=None, **kwargs): # pragma: no cover
if sep is None:
sep = '\t'
buf = StringIO()
obj.to_csv(buf, sep=sep, **kwargs)
clipboard_set(buf.getvalue())
# clipboard_set (pyperclip) expects unicode
obj.to_csv(buf, sep=sep, encoding='utf-8', **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so encoding is a valid kwarg. So instead put this in to_clipboard(....., encoding=None, **kwargs).

Then check for encoding, if its not None or utf-8 I would raise an error, then pass onto pyperclip. add a test case for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback the changes that i have made earlier (show above) are in the to_clipboard method. I am trying to understand your review comment, are you suggesting that instead of explicitly passing encoding = 'utf-8' to obj.to_csv i should check before hand if the encoding is None or utf-8 and raise an exception if it is not one of them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you still should pass encoding='utf-8' to .to_csv; the issue is that .to_clipboard() takes kwargs which could have encoding as one of them. So you need to explicity check if its invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback oh ok, i get it now, let me make that change

text = buf.getvalue()
if PY2:
text = text.decode('utf-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we always decode here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

text (as returned from buf.getvalue()) is always a string - it's already unicode in Py3 and cannot be decoded.

clipboard_set(text)
return
except:
pass
Expand Down
28 changes: 22 additions & 6 deletions pandas/io/tests/test_clipboard.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
from pandas import read_clipboard
from pandas import get_option
from pandas.util import testing as tm
from pandas.util.testing import makeCustomDataframe as mkdf, disabled
from pandas.util.testing import makeCustomDataframe as mkdf


try:
Expand All @@ -18,7 +18,6 @@
raise nose.SkipTest("no clipboard found")


@disabled
class TestClipboard(tm.TestCase):

@classmethod
Expand Down Expand Up @@ -52,20 +51,24 @@ def setUpClass(cls):
# Test for non-ascii text: GH9263
cls.data['nonascii'] = pd.DataFrame({'en': 'in English'.split(),
'es': 'en español'.split()})
# unicode round trip test for GH 13747
cls.data['utf8'] = pd.DataFrame({'a': ['µasd', 'Ωœ∑´'],
'b': ['øπ∆˚¬', 'œ∑´®']})
cls.data_types = list(cls.data.keys())

@classmethod
def tearDownClass(cls):
super(TestClipboard, cls).tearDownClass()
del cls.data_types, cls.data

def check_round_trip_frame(self, data_type, excel=None, sep=None):
def check_round_trip_frame(self, data_type, excel=None, sep=None,
encoding=None):
data = self.data[data_type]
data.to_clipboard(excel=excel, sep=sep)
data.to_clipboard(excel=excel, sep=sep, encoding=encoding)
if sep is not None:
result = read_clipboard(sep=sep, index_col=0)
result = read_clipboard(sep=sep, index_col=0, encoding=encoding)
else:
result = read_clipboard()
result = read_clipboard(encoding=encoding)
tm.assert_frame_equal(data, result, check_dtype=False)

def test_round_trip_frame_sep(self):
Expand Down Expand Up @@ -115,3 +118,16 @@ def test_read_clipboard_infer_excel(self):
exp = pd.read_clipboard()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add tests for other issues that we mentioned (or attach the issue number to an existing tests as appropriate)

Copy link
Contributor Author

@aileronajay aileronajay Nov 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#14362 , says that that upgrading to latest should fix, we already have round trip tests to test this (that have now been enabled)
#12807, existing tests fail,( that now pass)
#12529, i have added unicode test to test this


tm.assert_frame_equal(res, exp)

# test case for testing invalid encoding
def test_invalid_encoding(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put comments inside the test
put the issue number as a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented the suggested change

data = self.data['string']
with tm.assertRaises(ValueError):
data.to_clipboard(encoding='ascii')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test read_clipboard too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have not modified read_clipboard, so that check for encoding is not present in read_clipboard. Should i add the check in read_clipboard and the test backing it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also read_clipboard later on uses the encoding from kwargs**, if i pop the kwargs in read clipboard before this code, will that not cause any issues?

if compat.PY3:
try:
text = compat.bytes_to_str(
text, encoding=(kwargs.get('encoding') or
get_option('display.encoding'))
)
except:
pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right in reading you still need to strip the passed encoding (and read from the clipboard as utc8)

then i guess we could reencode as the encoding (if it's not None)

put a test for this but for now I would just raise NotImplementError if an encoding is passed in and it's not utf8 or none

with tm.assertRaises(NotImplementedError):
pd.read_clipboard(encoding='ascii')

def test_round_trip_valid_encodings(self):
for enc in ['UTF-8', 'utf-8', 'utf8']:
for dt in self.data_types:
self.check_round_trip_frame(dt, encoding=enc)
266 changes: 0 additions & 266 deletions pandas/util/clipboard.py

This file was deleted.

Loading