Skip to content

BUG: to_clipboard fails to format output for Excel #21111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
742aa3b
Fixed copy table to excel
david-liu-brattle-1 May 17, 2018
a8c098d
Unit Test and whatsnew
david-liu-brattle-1 May 18, 2018
1fee38f
Revert "Unit Test and whatsnew"
david-liu-brattle-1 May 18, 2018
5204489
Merge remote-tracking branch 'upstream/master' into fix-excel-clipboard
david-liu-brattle-1 May 18, 2018
fd1d3dd
Unit test for excel clipboard IO and updated whatsnew
david-liu-brattle-1 May 18, 2018
8439dfe
PEP8
david-liu-brattle-1 May 18, 2018
753e239
Test for function default values
david-liu-brattle-1 May 18, 2018
ba4bc36
More robust clipboard tests
david-liu-brattle-1 May 18, 2018
ef8bf54
Test for correct shape when results aren't expected to exactly match
david-liu-brattle-1 May 18, 2018
4d8a1aa
PEP8
david-liu-brattle-1 May 18, 2018
ce02a40
Formatting
david-liu-brattle-1 May 18, 2018
b46159a
Merge branch 'master' into fix-excel-clipboard
david-liu-brattle-1 Jun 6, 2018
f698ed6
Rebase
david-liu-brattle-1 Jun 6, 2018
2b7b891
Typo
david-liu-brattle-1 Jun 6, 2018
009e3e9
Fixed python 27 compatibility
david-liu-brattle-1 Jun 6, 2018
c4cc756
Merge branch 'master' into fix-excel-clipboard
david-liu-brattle-1 Jun 6, 2018
cd4be4b
Style fix
david-liu-brattle-1 Jun 6, 2018
f7bc16f
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
david-liu-brattle-1 Jun 18, 2018
30f5d78
Fix test
david-liu-brattle-1 Jun 18, 2018
e363374
Added warning for excel=False and sep!=None
david-liu-brattle-1 Jun 23, 2018
1a3a6d2
Merge branch 'master' into fix-excel-clipboard
david-liu-brattle-1 Jun 27, 2018
5013d67
Removed xfail, add whatsnew
david-liu-brattle-1 Jun 27, 2018
24a650f
Rebuild
david-liu-brattle-1 Jun 27, 2018
5db662f
Typo fixes
david-liu-brattle-1 Jun 27, 2018
3939bf3
permissions
david-liu-brattle-1 Jun 27, 2018
e50b752
Merge remote-tracking branch 'upstream/master' into david-liu-brattle…
jorisvandenbossche Jun 29, 2018
676a58c
fix warning + small edits
jorisvandenbossche Jun 29, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.23.1.txt
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ Indexing
I/O
^^^

-
-Bug in :func:`DataFrame.to_clipboard` where data sent to clipboard was not properly tab-delimited even when ``excel=True`` (:issue:`21104`)
-

Plotting
Expand Down
13 changes: 9 additions & 4 deletions pandas/io/clipboards.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
""" io on the clipboard """
from pandas import compat, get_option, option_context, DataFrame
from pandas.compat import StringIO, PY2
import warnings


def read_clipboard(sep=r'\s+', **kwargs): # pragma: no cover
Expand Down Expand Up @@ -55,11 +56,14 @@ def read_clipboard(sep=r'\s+', **kwargs): # pragma: no cover

counts = {x.lstrip().count('\t') for x in lines}
if len(lines) > 1 and len(counts) == 1 and counts.pop() != 0:
sep = r'\t'
sep = '\t'

if sep is None and kwargs.get('delim_whitespace') is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments on these cases (including the existing ones).

sep = r'\s+'

if sep == r'\s+' and kwargs.get('engine') is None:
kwargs['engine'] = 'python'

return read_table(StringIO(text), sep=sep, **kwargs)


Expand Down Expand Up @@ -99,7 +103,7 @@ def to_clipboard(obj, excel=True, sep=None, **kwargs): # pragma: no cover
if excel:
try:
if sep is None:
sep = r'\t'
sep = '\t'
buf = StringIO()
# clipboard_set (pyperclip) expects unicode
obj.to_csv(buf, sep=sep, encoding='utf-8', **kwargs)
Expand All @@ -108,8 +112,9 @@ def to_clipboard(obj, excel=True, sep=None, **kwargs): # pragma: no cover
text = text.decode('utf-8')
clipboard_set(text)
return
except:
pass
except TypeError:
warnings.warn('to_clipboard in excel mode requires a single \
character separator. Set "excel=false" or change the separator')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize False


if isinstance(obj, DataFrame):
# str(df) has various unhelpful defaults, like truncation
Expand Down
61 changes: 55 additions & 6 deletions pandas/tests/io/test_clipboard.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
from pandas.util import testing as tm
from pandas.util.testing import makeCustomDataframe as mkdf
from pandas.io.clipboard.exceptions import PyperclipException
from pandas.io.clipboard import clipboard_set
from pandas.io.clipboard import clipboard_set, clipboard_get


try:
Expand Down Expand Up @@ -60,6 +60,9 @@ def setup_class(cls):
# unicode round trip test for GH 13747, GH 12529
cls.data['utf8'] = pd.DataFrame({'a': ['µasd', 'Ωœ∑´'],
'b': ['øπ∆˚¬', 'œ∑´®']})
# Test for quotes and common delimiters in text
cls.data['delim_symbols'] = pd.DataFrame({'a': ['"a,\t"b|c', 'd\tef´'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excel can be very picky about how it handles copying/pasting text that includes delimiters or quote characters. This dataframe caused some of the original clipboard tests to fail until I changed the to_clipboard and read_clipboard functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK - wasn't terribly familiar with this test so seemed strange at first. This whole module could be refactored to use parametrization and match what we do in other modules.

OK to do in a separate change if you are up to it

'b': ['hi\'j', 'k\'\'lm']})
cls.data_types = list(cls.data.keys())

@classmethod
Expand All @@ -69,18 +72,36 @@ def teardown_class(cls):
def check_round_trip_frame(self, data_type, excel=None, sep=None,
encoding=None):
data = self.data[data_type]
data.to_clipboard(excel=excel, sep=sep, encoding=encoding)
if sep is not None:
result = read_clipboard(sep=sep, index_col=0, encoding=encoding)
if excel in [None, True] and sep is not None and len(sep) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly not overly familiar with this code but having a hard time making sense of it - is this possible to parametrize? We typically avoid a lot of conditionals and loops in test scenarios

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean like this one?

@pytest.mark.parametrize('writer_name, writer_kwargs, module', [
('to_csv', {}, 'os'),
('to_excel', {'engine': 'xlwt'}, 'xlwt'),
('to_feather', {}, 'feather'),
('to_html', {}, 'os'),
('to_json', {}, 'os'),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's one example. There's plenty of them in the tests so feel free to poke around

with tm.assert_produces_warning():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically for raising exceptions and warnings we split this off into separate tests

data.to_clipboard(excel=excel, sep=sep, encoding=encoding)
else:
result = read_clipboard(encoding=encoding)
tm.assert_frame_equal(data, result, check_dtype=False)
data.to_clipboard(excel=excel, sep=sep, encoding=encoding)

if excel in [None, True] and sep is not None:
# Expect Excel
result = read_clipboard(sep=sep, index_col=0,
encoding=encoding)
elif excel in [None, True] and sep is None:
# Expect Excel with tabs
result = read_clipboard(sep='\t', index_col=0,
encoding=encoding)
else:
# Expect df.__repr__ format
result = read_clipboard(encoding=encoding)

if excel in [None, True]:
tm.assert_frame_equal(data, result, check_dtype=False)
else:
assert data.to_string() == result.to_string()
assert data.shape == result.shape

def test_round_trip_frame_sep(self):
for dt in self.data_types:
self.check_round_trip_frame(dt, sep=',')
self.check_round_trip_frame(dt, sep=r'\s+')
self.check_round_trip_frame(dt, sep='|')
self.check_round_trip_frame(dt, sep='\t')

def test_round_trip_frame_string(self):
for dt in self.data_types:
Expand Down Expand Up @@ -124,6 +145,34 @@ def test_read_clipboard_infer_excel(self):

tm.assert_frame_equal(res, exp)

def test_excel_clipboard_tabs(self):
for dt in self.data_types:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here - can we replace the loops with parametrization?

for sep in ['\t', None, 'default', ',', '|']:
for excel in [True, None, 'default', False]:
# Function default should be to to produce tab delimited
kwargs = {}
if excel != 'default':
kwargs['excel'] = excel
if sep != 'default':
kwargs['sep'] = sep
data = self.data[dt]
data.to_clipboard(**kwargs)
if sep in ['\t', None, 'default'] and excel is not False:
# Expect tab delimited
result = read_clipboard(sep='\t', index_col=0)
tm.assert_frame_equal(data, result, check_dtype=False)
assert clipboard_get().count('\t') > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of counting the number of tabs can you construct the expected output and match exactly against that?

elif excel is False:
# Expect spaces (ie. df.__repr__() default)
result = read_clipboard(sep=r'\s+')
assert result.to_string() == data.to_string()
assert data.shape == result.shape
else:
# Expect other delimited ',' and '|'
result = read_clipboard(sep=sep, index_col=0)
tm.assert_frame_equal(data, result, check_dtype=False)
assert clipboard_get().count(sep) > 0

def test_invalid_encoding(self):
# test case for testing invalid encoding
data = self.data['string']
Expand Down