Skip to content

BUG: to_clipboard fails to format output for Excel #21111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

david-liu-brattle-1
Copy link
Contributor

@david-liu-brattle-1 david-liu-brattle-1 commented May 17, 2018

DataFrame.to_clipboard has been broken for pasting to excel. Tables are copied with spaces as delimiters instead of tabs (#21104).

This issue originated in e1d5a27#diff-3f25860d9237143c1952a1f93c3aae18R102
which I've partially reverted.

By setting the delimiter to r'\t', a 2 character string, obj.to_csv raised an error, but is was caught and passed silently. I reverted the separator to '\t'.

Similar issue in read_clipboard also fixed.

Reverted a change in e1d5a27
@codecov
Copy link

codecov bot commented May 18, 2018

Codecov Report

Merging #21111 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21111      +/-   ##
==========================================
+ Coverage    91.9%    91.9%   +<.01%     
==========================================
  Files         154      154              
  Lines       49656    49657       +1     
==========================================
+ Hits        45637    45638       +1     
  Misses       4019     4019
Flag Coverage Δ
#multiple 90.28% <100%> (ø) ⬆️
#single 41.9% <100%> (-0.13%) ⬇️
Impacted Files Coverage Δ
pandas/io/clipboards.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b63e81...676a58c. Read the comment docs.

@WillAyd WillAyd added the IO Excel read_excel, to_excel label May 18, 2018
@WillAyd WillAyd added this to the 0.23.1 milestone May 18, 2018
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test to make sure this produces the correct output? Also could use a whatsnew note

@pep8speaks
Copy link

pep8speaks commented May 18, 2018

Hello @david-liu-brattle-1! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 29, 2018 at 09:40 Hours UTC

for dt in self.data_types:
for sep in ['\t', None]:
data = self.data[dt]
data.to_clipboard(excel=True, sep=sep)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also test with excel=False? In <=0.22 it would also be tab delimited

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean to test data.to_clipboard(excel=False, sep=sep)? I don't think that was (or should be) tab delimited. Do you mean excel=None?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, right, I meant to the default df.to_clipboard() (sep=None, excel=None)

# pandas 0.22
In [72]: df = pd.DataFrame({'a': [1, 3], 'b': [4, 5]})

In [73]: df.to_clipboard()

# <paste>
In [74]: res = """	a	b
    ...: 0	1	4
    ...: 1	3	5"""

In [75]: res.count('\t')
Out[75]: 6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Should be added

Added additional test df containing common delimiter symbols and quotes.

Added warning when attempting to copy excel format but an error is
caught

Default engine to "python" when reading clipboard with regex delimiter
data.to_clipboard(excel=excel, sep=sep, encoding=encoding)
if sep is not None:
result = read_clipboard(sep=sep, index_col=0, encoding=encoding)
if excel in [None, True] and sep is not None and len(sep) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly not overly familiar with this code but having a hard time making sense of it - is this possible to parametrize? We typically avoid a lot of conditionals and loops in test scenarios

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean like this one?

@pytest.mark.parametrize('writer_name, writer_kwargs, module', [
('to_csv', {}, 'os'),
('to_excel', {'engine': 'xlwt'}, 'xlwt'),
('to_feather', {}, 'feather'),
('to_html', {}, 'os'),
('to_json', {}, 'os'),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's one example. There's plenty of them in the tests so feel free to poke around

@@ -124,6 +145,34 @@ def test_read_clipboard_infer_excel(self):

tm.assert_frame_equal(res, exp)

def test_excel_clipboard_tabs(self):
for dt in self.data_types:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here - can we replace the loops with parametrization?

@@ -60,6 +60,9 @@ def setup_class(cls):
# unicode round trip test for GH 13747, GH 12529
cls.data['utf8'] = pd.DataFrame({'a': ['µasd', 'Ωœ∑´'],
'b': ['øπ∆˚¬', 'œ∑´®']})
# Test for quotes and common delimiters in text
cls.data['delim_symbols'] = pd.DataFrame({'a': ['"a,\t"b|c', 'd\tef´'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excel can be very picky about how it handles copying/pasting text that includes delimiters or quote characters. This dataframe caused some of the original clipboard tests to fail until I changed the to_clipboard and read_clipboard functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK - wasn't terribly familiar with this test so seemed strange at first. This whole module could be refactored to use parametrization and match what we do in other modules.

OK to do in a separate change if you are up to it

# Expect tab delimited
result = read_clipboard(sep='\t', index_col=0)
tm.assert_frame_equal(data, result, check_dtype=False)
assert clipboard_get().count('\t') > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of counting the number of tabs can you construct the expected output and match exactly against that?

if sep is not None:
result = read_clipboard(sep=sep, index_col=0, encoding=encoding)
if excel in [None, True] and sep is not None and len(sep) > 1:
with tm.assert_produces_warning():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically for raising exceptions and warnings we split this off into separate tests

pass
except TypeError:
warnings.warn('to_clipboard in excel mode requires a single \
character separator. Set "excel=false" or change the separator')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize False

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah pls re-factor these tests to make it clear what is going on. ideally you can refactor in a commit, then do your changes in the following (or even a separate PR) to refactor first.

@jreback
Copy link
Contributor

jreback commented Jun 4, 2018

@david-liu-brattle-1 can you rebase / update

@jreback jreback removed this from the 0.23.1 milestone Jun 4, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.23.1 milestone Jun 6, 2018
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed all the tests? (not fully clear if that was asked by the other reviewers)

@WillAyd @chris-b1 can you take a new look?

pass
except TypeError:
warnings.warn('to_clipboard in excel mode requires a single \
character separator. Set "excel=False" or change the separator')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use implicit line continuation instead of \ ?

Like

            warnings.warn('to_clipboard in excel mode requires a single '
                          'character separator. Set "excel=False" or change '
                          'the separator')

@jorisvandenbossche
Copy link
Member

And the failing tests are actually clipboard tests that are failing, so should be addressed.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david-liu-brattle-1 doesn't this need a test for the actual case that broke? (so the excel=True case)

@david-liu-brattle-1
Copy link
Contributor Author

@jorisvandenbossche
#21163 should have tests for all of the issues fixed in this PR. I think that one will be merged first? In which case adding more tests here would be redundant right?

@jreback
Copy link
Contributor

jreback commented Jun 26, 2018

@david-liu-brattle-1

  • can you rebase on master
  • add a whatsnew 0.23.2, IIRC this is a bug fix (IO section)
  • remove the setup.py that was added.

ping on green.

@jorisvandenbossche
Copy link
Member

#21163 should have tests for all of the issues fixed in this PR. I think that one will be merged first? In which case adding more tests here would be redundant right?

Ah, OK, the other PR is merged now, so then in this PR the xfails should be removed?

@david-liu-brattle-1
Copy link
Contributor Author

@jreback
I think we're all set except for the setup.py file. I'm not sure what exactly is happening with that, from my end it doesn't look like there are any changes to the existing file and I don't think I created that file.

@chris-b1
Copy link
Contributor

Looks like file permissions were modified on setup.py (the 100755 → 100644 in the diff)

Should be able to just to reset to the current setup.py to back out the change git checkout origin/master setup.py

@chris-b1
Copy link
Contributor

@david-liu-brattle-1
Copy link
Contributor Author

@chris-b1
Thanks, that did the trick.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather minor point, other lgtm.

buf = StringIO()
# clipboard_set (pyperclip) expects unicode
obj.to_csv(buf, sep=sep, encoding='utf-8', **kwargs)
text = buf.getvalue()
if PY2:
if compat.PY2:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor point but we generally just import PY2 directly (you use it in 2 places)

@jreback
Copy link
Contributor

jreback commented Jun 28, 2018

@WillAyd merge when satisfied

if len(sep) > 1 and kwargs.get('engine') is None:
kwargs['engine'] = 'python'
elif len(sep) > 1 and kwargs.get('engine') == 'c':
warnings.warn('from_clipboard with regex separator does not work'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reasoning for going with a warning here instead of an error? Curious what actually happens if this comes up (question maybe applicable to other errors as well)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good question, we should check what read_csv does (warning or erroring)
But anyhow, if we change it to an error, that would be for 0.24.0 IMO, so another PR.

text = text.decode('utf-8')
clipboard_set(text)
return
except TypeError:
warnings.warn('to_clipboard in excel mode requires a single '
'character separator. Set "excel=False" or change '
'the separator')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david-liu-brattle-1 I removed the last sentence of this warning, as setting excel=False does not help as then you get the warning that the separator is ignored. It's simply that to_clipboard does not support multiple character separator at all.

@jorisvandenbossche jorisvandenbossche merged commit dc45fba into pandas-dev:master Jun 29, 2018
@jorisvandenbossche
Copy link
Member

@david-liu-brattle-1 Thanks a lot!

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Jul 2, 2018
jorisvandenbossche pushed a commit that referenced this pull request Jul 5, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pandas.DataFrame.to_clipboard with excel option does not parse into columns with 0.23
6 participants