-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: to_clipboard fails to format output for Excel #21111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: to_clipboard fails to format output for Excel #21111
Conversation
Reverted a change in e1d5a27
Codecov Report
@@ Coverage Diff @@
## master #21111 +/- ##
==========================================
+ Coverage 91.9% 91.9% +<.01%
==========================================
Files 154 154
Lines 49656 49657 +1
==========================================
+ Hits 45637 45638 +1
Misses 4019 4019
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test to make sure this produces the correct output? Also could use a whatsnew note
Hello @david-liu-brattle-1! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on June 29, 2018 at 09:40 Hours UTC |
pandas/tests/io/test_clipboard.py
Outdated
for dt in self.data_types: | ||
for sep in ['\t', None]: | ||
data = self.data[dt] | ||
data.to_clipboard(excel=True, sep=sep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also test with excel=False
? In <=0.22 it would also be tab delimited
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean to test data.to_clipboard(excel=False, sep=sep)
? I don't think that was (or should be) tab delimited. Do you mean excel=None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, right, I meant to the default df.to_clipboard()
(sep=None, excel=None)
# pandas 0.22
In [72]: df = pd.DataFrame({'a': [1, 3], 'b': [4, 5]})
In [73]: df.to_clipboard()
# <paste>
In [74]: res = """ a b
...: 0 1 4
...: 1 3 5"""
In [75]: res.count('\t')
Out[75]: 6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Should be added
Added additional test df containing common delimiter symbols and quotes. Added warning when attempting to copy excel format but an error is caught Default engine to "python" when reading clipboard with regex delimiter
pandas/tests/io/test_clipboard.py
Outdated
data.to_clipboard(excel=excel, sep=sep, encoding=encoding) | ||
if sep is not None: | ||
result = read_clipboard(sep=sep, index_col=0, encoding=encoding) | ||
if excel in [None, True] and sep is not None and len(sep) > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Admittedly not overly familiar with this code but having a hard time making sense of it - is this possible to parametrize? We typically avoid a lot of conditionals and loops in test scenarios
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean like this one?
pandas/pandas/tests/io/test_common.py
Lines 183 to 188 in e033c06
@pytest.mark.parametrize('writer_name, writer_kwargs, module', [ | |
('to_csv', {}, 'os'), | |
('to_excel', {'engine': 'xlwt'}, 'xlwt'), | |
('to_feather', {}, 'feather'), | |
('to_html', {}, 'os'), | |
('to_json', {}, 'os'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's one example. There's plenty of them in the tests so feel free to poke around
pandas/tests/io/test_clipboard.py
Outdated
@@ -124,6 +145,34 @@ def test_read_clipboard_infer_excel(self): | |||
|
|||
tm.assert_frame_equal(res, exp) | |||
|
|||
def test_excel_clipboard_tabs(self): | |||
for dt in self.data_types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here - can we replace the loops with parametrization?
pandas/tests/io/test_clipboard.py
Outdated
@@ -60,6 +60,9 @@ def setup_class(cls): | |||
# unicode round trip test for GH 13747, GH 12529 | |||
cls.data['utf8'] = pd.DataFrame({'a': ['µasd', 'Ωœ∑´'], | |||
'b': ['øπ∆˚¬', 'œ∑´®']}) | |||
# Test for quotes and common delimiters in text | |||
cls.data['delim_symbols'] = pd.DataFrame({'a': ['"a,\t"b|c', 'd\tef´'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excel can be very picky about how it handles copying/pasting text that includes delimiters or quote characters. This dataframe caused some of the original clipboard tests to fail until I changed the to_clipboard and read_clipboard functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK - wasn't terribly familiar with this test so seemed strange at first. This whole module could be refactored to use parametrization and match what we do in other modules.
OK to do in a separate change if you are up to it
pandas/tests/io/test_clipboard.py
Outdated
# Expect tab delimited | ||
result = read_clipboard(sep='\t', index_col=0) | ||
tm.assert_frame_equal(data, result, check_dtype=False) | ||
assert clipboard_get().count('\t') > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of counting the number of tabs can you construct the expected output and match exactly against that?
pandas/tests/io/test_clipboard.py
Outdated
if sep is not None: | ||
result = read_clipboard(sep=sep, index_col=0, encoding=encoding) | ||
if excel in [None, True] and sep is not None and len(sep) > 1: | ||
with tm.assert_produces_warning(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically for raising exceptions and warnings we split this off into separate tests
pandas/io/clipboards.py
Outdated
pass | ||
except TypeError: | ||
warnings.warn('to_clipboard in excel mode requires a single \ | ||
character separator. Set "excel=false" or change the separator') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah pls re-factor these tests to make it clear what is going on. ideally you can refactor in a commit, then do your changes in the following (or even a separate PR) to refactor first.
@david-liu-brattle-1 can you rebase / update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/io/clipboards.py
Outdated
pass | ||
except TypeError: | ||
warnings.warn('to_clipboard in excel mode requires a single \ | ||
character separator. Set "excel=False" or change the separator') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use implicit line continuation instead of \
?
Like
warnings.warn('to_clipboard in excel mode requires a single '
'character separator. Set "excel=False" or change '
'the separator')
And the failing tests are actually clipboard tests that are failing, so should be addressed. |
…x-excel-clipboard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@david-liu-brattle-1 doesn't this need a test for the actual case that broke? (so the excel=True
case)
@jorisvandenbossche |
ping on green. |
Ah, OK, the other PR is merged now, so then in this PR the |
@jreback |
Looks like file permissions were modified on Should be able to just to reset to the current setup.py to back out the change |
See also - https://stackoverflow.com/questions/1257592/how-do-i-remove-files-saying-old-mode-100755-new-mode-100644-from-unstaged-cha - might have some funky git settings |
@chris-b1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather minor point, other lgtm.
pandas/io/clipboards.py
Outdated
buf = StringIO() | ||
# clipboard_set (pyperclip) expects unicode | ||
obj.to_csv(buf, sep=sep, encoding='utf-8', **kwargs) | ||
text = buf.getvalue() | ||
if PY2: | ||
if compat.PY2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor point but we generally just import PY2 directly (you use it in 2 places)
@WillAyd merge when satisfied |
pandas/io/clipboards.py
Outdated
if len(sep) > 1 and kwargs.get('engine') is None: | ||
kwargs['engine'] = 'python' | ||
elif len(sep) > 1 and kwargs.get('engine') == 'c': | ||
warnings.warn('from_clipboard with regex separator does not work' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the reasoning for going with a warning here instead of an error? Curious what actually happens if this comes up (question maybe applicable to other errors as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good question, we should check what read_csv
does (warning or erroring)
But anyhow, if we change it to an error, that would be for 0.24.0 IMO, so another PR.
pandas/io/clipboards.py
Outdated
text = text.decode('utf-8') | ||
clipboard_set(text) | ||
return | ||
except TypeError: | ||
warnings.warn('to_clipboard in excel mode requires a single ' | ||
'character separator. Set "excel=False" or change ' | ||
'the separator') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@david-liu-brattle-1 I removed the last sentence of this warning, as setting excel=False
does not help as then you get the warning that the separator is ignored. It's simply that to_clipboard
does not support multiple character separator at all.
@david-liu-brattle-1 Thanks a lot! |
(cherry picked from commit dc45fba)
(cherry picked from commit dc45fba)
DataFrame.to_clipboard
has been broken for pasting to excel. Tables are copied with spaces as delimiters instead of tabs (#21104).This issue originated in e1d5a27#diff-3f25860d9237143c1952a1f93c3aae18R102
which I've partially reverted.
By setting the delimiter to
r'\t'
, a 2 character string, obj.to_csv raised an error, but is was caught and passed silently. I reverted the separator to'\t'
.Similar issue in
read_clipboard
also fixed.