BUG: to_clipboard fails to format output for Excel #21111

david-liu-brattle-1 · 2018-05-17T23:18:20Z

DataFrame.to_clipboard is broken for pasting into Excel: tables are copied with spaces as delimiters instead of tabs (#21104).

This issue originated in e1d5a27#diff-3f25860d9237143c1952a1f93c3aae18R102
which I've partially reverted.

Because the delimiter was set to r'\t', a 2-character string, obj.to_csv raised an error, but it was caught and silently ignored. I reverted the separator to '\t'.
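For illustration, the difference between the two separators can be reproduced with the standard library alone. This is a minimal sketch, assuming that to_csv ultimately hands the delimiter to Python's csv module; try_write is a hypothetical helper written for this example, not part of pandas.

```python
import csv
import io

raw_sep = r'\t'  # raw string: a backslash followed by 't' (two characters)
tab_sep = '\t'   # escape sequence: one tab character


def try_write(sep):
    """Write one row with the given delimiter; return None if csv rejects it."""
    buf = io.StringIO()
    try:
        writer = csv.writer(buf, delimiter=sep)
        writer.writerow(['a', 'b'])
    except TypeError:
        # csv only accepts a 1-character delimiter. Swallowing the error
        # here mirrors the silent "except: pass" described above.
        return None
    return buf.getvalue()


print(len(raw_sep), len(tab_sep))
print(repr(try_write(raw_sep)))
print(repr(try_write(tab_sep)))
```

With the error swallowed, the caller has no way to tell that the tab-delimited path was never taken, which is why the bug only surfaced after pasting into Excel.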

Similar issue in read_clipboard also fixed.

closes pandas.DataFrame.to_clipboard with excel option does not parse into columns with 0.23 #21104

Reverted a change in e1d5a27

codecov · 2018-05-18T00:17:03Z

Codecov Report

Merging #21111 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21111      +/-   ##
==========================================
+ Coverage    91.9%    91.9%   +<.01%     
==========================================
  Files         154      154              
  Lines       49656    49657       +1     
==========================================
+ Hits        45637    45638       +1     
  Misses       4019     4019

Flag	Coverage Δ
#multiple	`90.28% <100%> (ø)`	⬆️
#single	`41.9% <100%> (-0.13%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/clipboards.py	`100% <100%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0b63e81...676a58c.

WillAyd

Can you add a test to make sure this produces the correct output? Also could use a whatsnew note

pep8speaks · 2018-05-18T03:51:33Z

Hello @david-liu-brattle-1! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 29, 2018 at 09:40 Hours UTC

chris-b1 · 2018-05-18T13:59:53Z

pandas/tests/io/test_clipboard.py

+        for dt in self.data_types:
+            for sep in ['\t', None]:
+                data = self.data[dt]
+                data.to_clipboard(excel=True, sep=sep)


Can you also test with excel=False? In <=0.22 it would also be tab delimited

You mean to test data.to_clipboard(excel=False, sep=sep)? I don't think that was (or should be) tab delimited. Do you mean excel=None?

Sorry, right, I meant the default df.to_clipboard() (sep=None, excel=None)

# pandas 0.22
In [72]: df = pd.DataFrame({'a': [1, 3], 'b': [4, 5]})

In [73]: df.to_clipboard()

# <paste>
In [74]: res = """	a	b
    ...: 0	1	4
    ...: 1	3	5"""

In [75]: res.count('\t')
Out[75]: 6

Good point. Should be added

Added additional test df containing common delimiter symbols and quotes.
Added warning when attempting to copy excel format but an error is caught.
Default engine to "python" when reading clipboard with regex delimiter.

WillAyd · 2018-05-19T02:34:13Z

pandas/tests/io/test_clipboard.py

-        data.to_clipboard(excel=excel, sep=sep, encoding=encoding)
-        if sep is not None:
-            result = read_clipboard(sep=sep, index_col=0, encoding=encoding)
+        if excel in [None, True] and sep is not None and len(sep) > 1:


Admittedly not overly familiar with this code but having a hard time making sense of it - is this possible to parametrize? We typically avoid a lot of conditionals and loops in test scenarios

Do you mean like this one?

pandas/pandas/tests/io/test_common.py

Lines 183 to 188 in e033c06

@pytest.mark.parametrize('writer_name, writer_kwargs, module', [
    ('to_csv', {}, 'os'),
    ('to_excel', {'engine': 'xlwt'}, 'xlwt'),
    ('to_feather', {}, 'feather'),
    ('to_html', {}, 'os'),
    ('to_json', {}, 'os'),

Yes that's one example. There's plenty of them in the tests so feel free to poke around
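As a rough sketch of what replacing the nested loops could look like, the example below stacks two parametrize decorators so that pytest generates one test case per (rows, sep) combination. It is not the clipboard test itself: write_rows is a hypothetical stand-in that uses the csv module, since a real clipboard is not available everywhere.

```python
import csv
import io

import pytest


def write_rows(rows, sep):
    """Hypothetical stand-in for the copy step: serialize rows with sep."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=sep, lineterminator='\n').writerows(rows)
    return buf.getvalue()


# Stacked decorators give the cross product: 3 separators x 2 data sets.
@pytest.mark.parametrize('sep', ['\t', ',', '|'])
@pytest.mark.parametrize('rows', [
    [['a', 'b'], ['1', '2']],
    [['x', 'y'], ['p q', 'r']],
])
def test_round_trip(rows, sep):
    text = write_rows(rows, sep)
    result = list(csv.reader(io.StringIO(text), delimiter=sep))
    assert result == rows
```

Each combination is reported as its own test, so a failure names the exact separator and data set instead of stopping the loop at the first bad case.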

WillAyd · 2018-05-19T02:34:33Z

pandas/tests/io/test_clipboard.py

@@ -124,6 +145,34 @@ def test_read_clipboard_infer_excel(self):

        tm.assert_frame_equal(res, exp)

+    def test_excel_clipboard_tabs(self):
+        for dt in self.data_types:


Same thing here - can we replace the loops with parametrization?

WillAyd · 2018-05-19T02:35:06Z

pandas/tests/io/test_clipboard.py

@@ -60,6 +60,9 @@ def setup_class(cls):
        # unicode round trip test for GH 13747, GH 12529
        cls.data['utf8'] = pd.DataFrame({'a': ['µasd', 'Ωœ∑´'],
                                         'b': ['øπ∆˚¬', 'œ∑´®']})
+        # Test for quotes and common delimiters in text
+        cls.data['delim_symbols'] = pd.DataFrame({'a': ['"a,\t"b|c', 'd\tef´'],


What's the point of this?

Excel can be very picky about how it handles copying/pasting text that includes delimiters or quote characters. This dataframe caused some of the original clipboard tests to fail until I changed the to_clipboard and read_clipboard functions.

Ah OK - wasn't terribly familiar with this test so seemed strange at first. This whole module could be refactored to use parametrization and match what we do in other modules.

OK to do in a separate change if you are up to it

WillAyd · 2018-05-19T02:35:43Z

pandas/tests/io/test_clipboard.py

+                        # Expect tab delimited
+                        result = read_clipboard(sep='\t', index_col=0)
+                        tm.assert_frame_equal(data, result, check_dtype=False)
+                        assert clipboard_get().count('\t') > 0


Instead of counting the number of tabs can you construct the expected output and match exactly against that?
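To illustrate why an exact match is stricter, here is a small sketch that builds the tab-delimited text with to_csv (rather than a real clipboard, which is an assumption made so the example runs anywhere) and compares both styles of check. The broken string is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3], 'b': [4, 5]})

# With no path argument, to_csv returns the text as a string.
text = df.to_csv(sep='\t')

# Expected output, line by line; splitlines() sidesteps platform line endings.
expected_lines = ['\ta\tb', '0\t1\t4', '1\t3\t5']

# A made-up, partially broken output that still contains one tab.
broken = '  a  b\n0\t1  4\n1  3  5\n'

weak_ok = text.count('\t') > 0
exact_ok = text.splitlines() == expected_lines
broken_passes_weak = broken.count('\t') > 0
broken_passes_exact = broken.splitlines() == expected_lines
```

The count-based check accepts the broken text, while the exact comparison rejects it.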

WillAyd · 2018-05-19T02:36:25Z

pandas/tests/io/test_clipboard.py

-        if sep is not None:
-            result = read_clipboard(sep=sep, index_col=0, encoding=encoding)
+        if excel in [None, True] and sep is not None and len(sep) > 1:
+            with tm.assert_produces_warning():


Typically for raising exceptions and warnings we split this off into separate tests
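A sketch of that split, using only the standard library: one test covers the normal output, another covers only the warning. to_text is a hypothetical stand-in for the Excel-mode copy path, and its warning message is invented for the example.

```python
import warnings


def to_text(rows, sep):
    """Hypothetical stand-in: join rows with sep, warn on multi-char sep."""
    if len(sep) != 1:
        warnings.warn('excel mode requires a single character separator')
        sep = ' '  # fall back to space-delimited output
    return '\n'.join(sep.join(row) for row in rows)


def test_single_char_sep_output():
    # Happy path only: no warning handling mixed in.
    assert to_text([['a', 'b']], '\t') == 'a\tb'


def test_multi_char_sep_warns():
    # Warning path only: record warnings and inspect them.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        to_text([['a', 'b']], r'\t')
    assert len(caught) == 1
    assert issubclass(caught[0].category, UserWarning)
```

In the pandas test suite the second test would more likely use tm.assert_produces_warning, as in the diff above; catch_warnings is used here only to keep the sketch dependency-free.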

WillAyd · 2018-05-19T02:36:42Z

pandas/io/clipboards.py

-            pass
+        except TypeError:
+            warnings.warn('to_clipboard in excel mode requires a single \
+            character separator. Set "excel=false" or change the separator')


Capitalize False

jreback

yeah pls re-factor these tests to make it clear what is going on. ideally you can refactor in a commit, then do your changes in the following (or even a separate PR) to refactor first.

jreback · 2018-06-04T21:40:11Z

@david-liu-brattle-1 can you rebase / update

jorisvandenbossche

You removed all the tests? (not fully clear if that was asked by the other reviewers)

@WillAyd @chris-b1 can you take a new look?

jorisvandenbossche · 2018-06-06T13:03:29Z

pandas/io/clipboards.py

-            pass
+        except TypeError:
+            warnings.warn('to_clipboard in excel mode requires a single \
+            character separator. Set "excel=False" or change the separator')


Can you use implicit line continuation instead of \ ?

Like

warnings.warn('to_clipboard in excel mode requires a single '
              'character separator. Set "excel=False" or change '
              'the separator')
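The practical difference is easy to miss, so here is a short sketch with a shortened message. A backslash inside a string literal joins the lines but keeps the next line's indentation as part of the string, whereas adjacent literals are concatenated exactly as written. The function names are made up for the example.

```python
def backslash_message():
    # The four spaces of indentation on the next line end up in the string.
    return 'requires a single \
    character separator'


def implicit_message():
    # Adjacent string literals inside parentheses are concatenated as-is.
    return ('requires a single '
            'character separator')


print(repr(backslash_message()))
print(repr(implicit_message()))
```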

jorisvandenbossche · 2018-06-06T13:07:31Z

And the failing tests are actually clipboard tests that are failing, so should be addressed.

jorisvandenbossche

@david-liu-brattle-1 doesn't this need a test for the actual case that broke? (so the excel=True case)

david-liu-brattle-1 · 2018-06-23T19:18:29Z

@jorisvandenbossche
#21163 should have tests for all of the issues fixed in this PR. I think that one will be merged first? In which case adding more tests here would be redundant right?

jreback · 2018-06-26T22:21:44Z

@david-liu-brattle-1

can you rebase on master
add a whatsnew 0.23.2, IIRC this is a bug fix (IO section)
remove the setup.py that was added.

ping on green.

jorisvandenbossche · 2018-06-27T12:14:30Z

#21163 should have tests for all of the issues fixed in this PR. I think that one will be merged first? In which case adding more tests here would be redundant right?

Ah, OK, the other PR is merged now, so then in this PR the xfails should be removed?

david-liu-brattle-1 · 2018-06-27T23:03:05Z

@jreback
I think we're all set except for the setup.py file. I'm not sure what exactly is happening with that, from my end it doesn't look like there are any changes to the existing file and I don't think I created that file.

chris-b1 · 2018-06-27T23:12:33Z

Looks like file permissions were modified on setup.py (the 100755 → 100644 in the diff)

You should be able to just reset to the current setup.py to back out the change:

git checkout origin/master setup.py

chris-b1 · 2018-06-27T23:17:54Z

See also - https://stackoverflow.com/questions/1257592/how-do-i-remove-files-saying-old-mode-100755-new-mode-100644-from-unstaged-cha - might have some funky git settings

david-liu-brattle-1 · 2018-06-27T23:27:28Z

@chris-b1
Thanks, that did the trick.

jreback

rather minor point, otherwise lgtm.

jreback · 2018-06-28T00:26:30Z

pandas/io/clipboards.py

            buf = StringIO()
            # clipboard_set (pyperclip) expects unicode
            obj.to_csv(buf, sep=sep, encoding='utf-8', **kwargs)
            text = buf.getvalue()
-            if PY2:
+            if compat.PY2:


minor point but we generally just import PY2 directly (you use it in 2 places)

jreback · 2018-06-28T00:27:58Z

@WillAyd merge when satisfied

WillAyd · 2018-06-28T01:58:24Z

pandas/io/clipboards.py

+    if len(sep) > 1 and kwargs.get('engine') is None:
+        kwargs['engine'] = 'python'
+    elif len(sep) > 1 and kwargs.get('engine') == 'c':
+        warnings.warn('from_clipboard with regex separator does not work'


What was the reasoning for going with a warning here instead of an error? Curious what actually happens if this comes up (question maybe applicable to other errors as well)

I think that's a good question, we should check what read_csv does (warning or erroring)
But anyhow, if we change it to an error, that would be for 0.24.0 IMO, so another PR.
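For readers following along, the branch under discussion can be sketched as a standalone function. This mirrors only the logic visible in the diff above; resolve_engine and its warning text are hypothetical, and what read_csv itself does in this situation is not shown here.

```python
import warnings


def resolve_engine(sep, engine=None):
    """Pick a parser engine for a clipboard read (illustrative only)."""
    if len(sep) > 1 and engine is None:
        # Multi-character separators are treated as regexes, so default
        # to the Python engine when the caller did not choose one.
        return 'python'
    if len(sep) > 1 and engine == 'c':
        # The caller explicitly asked for the C engine: warn, don't raise.
        warnings.warn('regex separator requested together with the C engine')
    return engine


print(resolve_engine(r'\s+'))
print(resolve_engine('\t'))
```

Switching the warning to an error would change only the second branch, which is why it can be handled in a separate PR.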

jorisvandenbossche · 2018-06-29T09:42:16Z

pandas/io/clipboards.py

                text = text.decode('utf-8')
            clipboard_set(text)
            return
        except TypeError:
            warnings.warn('to_clipboard in excel mode requires a single '
-                          'character separator. Set "excel=False" or change '
-                          'the separator')


@david-liu-brattle-1 I removed the last sentence of this warning, as setting excel=False does not help as then you get the warning that the separator is ignored. It's simply that to_clipboard does not support multiple character separator at all.

jorisvandenbossche · 2018-06-29T12:24:46Z

@david-liu-brattle-1 Thanks a lot!

Fixed copy table to excel

742aa3b

Reverted a change in e1d5a27

WillAyd added the IO Excel read_excel, to_excel label May 18, 2018

WillAyd added this to the 0.23.1 milestone May 18, 2018

WillAyd requested changes May 18, 2018

Unit Test and whatsnew

a8c098d

david-liu-brattle-1 and others added 4 commits May 17, 2018 23:52

Revert "Unit Test and whatsnew"

1fee38f

This reverts commit a8c098d.

Merge remote-tracking branch 'upstream/master' into fix-excel-clipboard

5204489

Unit test for excel clipboard IO and updated whatsnew

fd1d3dd

PEP8

8439dfe

chris-b1 reviewed May 18, 2018

david-liu-brattle-1 added 3 commits May 18, 2018 10:37

Test for function default values

753e239

More robust clipboard tests

ba4bc36

Added additional test df containing common delimiter symbols and quotes.
Added warning when attempting to copy excel format but an error is caught.
Default engine to "python" when reading clipboard with regex delimiter.

Test for correct shape when results aren't expected to exactly match

ef8bf54

chris-b1 added the Needs Backport label May 18, 2018

david-liu-brattle-1 added 2 commits May 18, 2018 14:51

PEP8

4d8a1aa

Formatting

ce02a40

WillAyd requested changes May 19, 2018

david-liu-brattle-1 mentioned this pull request May 19, 2018

CI: Failure on Appveyor Master #21102

Closed

jreback requested changes May 19, 2018

david-liu-brattle-1 mentioned this pull request May 22, 2018

Cleanup clipboard tests #21163

Merged

jreback removed this from the 0.23.1 milestone Jun 4, 2018

david-liu-brattle-1 and others added 3 commits June 5, 2018 22:17

Merge branch 'master' into fix-excel-clipboard

b46159a

Rebase

f698ed6

Typo

2b7b891

jorisvandenbossche added this to the 0.23.1 milestone Jun 6, 2018

jorisvandenbossche reviewed Jun 6, 2018

david-liu-brattle-1 added 2 commits June 18, 2018 16:29

Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…

f7bc16f

…x-excel-clipboard

Fix test

30f5d78

Fix Test

jorisvandenbossche reviewed Jun 20, 2018

Added warning for excel=False and sep!=None

e363374

david-liu-brattle-1 and others added 4 commits June 27, 2018 10:49

Merge branch 'master' into fix-excel-clipboard

1a3a6d2

Removed xfail, add whatsnew

5013d67

Rebuild

24a650f

Typo fixes

5db662f

permissions

3939bf3

jreback approved these changes Jun 28, 2018

WillAyd reviewed Jun 28, 2018

jorisvandenbossche added 2 commits June 29, 2018 11:23

Merge remote-tracking branch 'upstream/master' into david-liu-brattle…

e50b752

…-1-fix-excel-clipboard

fix warning + small edits

676a58c

jorisvandenbossche reviewed Jun 29, 2018

jorisvandenbossche approved these changes Jun 29, 2018

jorisvandenbossche merged commit dc45fba into pandas-dev:master Jun 29, 2018

jorisvandenbossche removed the Needs Backport label Jul 2, 2018

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Jul 2, 2018

BUG: to_clipboard fails to format output for Excel (pandas-dev#21111)

278e4f7

(cherry picked from commit dc45fba)

jorisvandenbossche pushed a commit that referenced this pull request Jul 5, 2018

BUG: to_clipboard fails to format output for Excel (#21111)

06d76e0

(cherry picked from commit dc45fba)

WillAyd mentioned this pull request Jul 10, 2018

to_clipboard does not use the default options #21836

Closed

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

BUG: to_clipboard fails to format output for Excel (pandas-dev#21111)

18959f9
