Skip to content

BUG, COMPAT: Fix multi-char sep + non-utf8 in read_csv for Python 2 #13812

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Jul 27, 2016

Title is self-explanatory. Closes #3404.

@jorisvandenbossche
Copy link
Member

Not sure if so many different ones to test is not overkill, but looks good, thanks

@jorisvandenbossche jorisvandenbossche added Testing pandas testing functions or related to the test suite IO CSV read_csv, to_csv labels Jul 27, 2016
@jorisvandenbossche jorisvandenbossche added this to the 0.19.0 milestone Jul 27, 2016
@gfyoung
Copy link
Member Author

gfyoung commented Jul 28, 2016

@jorisvandenbossche : it was a good thing you asked about a specific test. You caught a Python 2 bug. Turns out when we have a StringIO (the Python 2 BytesIO) of encoded data, we can't wrap it with a TextIOWrapper and pass in the encoding parameter like we do in Python 3 so that we can decode it.

Normally, this would in fact be taken care by CsvReader, BUT that's only for standard delimiters. When we have a multi-char separator, we have to resort to an ad-hoc method of creating a generator, which does zero decoding.

In short, replace the Testing label with the Bug and Compat labels.

@gfyoung gfyoung changed the title TST: Add test for multi-char sep and non-utf8 in read_csv BUG, COMPAT: Fix multi-char sep + non-utf8 in read_csv for Python 2 Jul 28, 2016
@gfyoung gfyoung force-pushed the test-multichar-sep-non-utf8-encode branch from 61a5980 to e085fef Compare July 28, 2016 06:16
@codecov-io
Copy link

codecov-io commented Jul 28, 2016

Current coverage is 85.23% (diff: 50.00%)

Merging #13812 into master will increase coverage by <.01%

@@             master     #13812   diff @@
==========================================
  Files           140        140          
  Lines         50420      50422     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42975      42977     +2   
  Misses         7445       7445          
  Partials          0          0          

Powered by Codecov. Last update cc216ad...e085fef

@gfyoung gfyoung force-pushed the test-multichar-sep-non-utf8-encode branch from e085fef to b317fec Compare July 28, 2016 07:32
@gfyoung
Copy link
Member Author

gfyoung commented Jul 28, 2016

@jorisvandenbossche : Squashed the bug, and Travis is passing here (had to make a doc change after, hence the [ci skip]). Ready to merge if there are no concerns.

@jreback jreback merged commit 28b4b01 into pandas-dev:master Jul 28, 2016
@jreback
Copy link
Contributor

jreback commented Jul 28, 2016

thanks @gfyoung

@gfyoung gfyoung deleted the test-multichar-sep-non-utf8-encode branch July 29, 2016 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants