BUG: Fix encoding for Stata format 118 files #21279

bashtage · 2018-05-31T23:35:47Z

Ensure that Stata 118 files always use utf-8 encoding

closes BUG: read_stata always uses 'utf8' #21244
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
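
As a rough illustration of the rule this PR enforces, the sketch below picks the text encoding from the file's format version instead of from a user argument. It is not the pandas implementation: the helper names `stata_encoding` and `decode_fixed_width` are hypothetical, and the use of latin-1 for formats older than 118 is an assumption made for the example.

```python
def stata_encoding(format_version: int) -> str:
    """Return the encoding implied by the format version.

    Assumption for this sketch: format 118 and later store UTF-8 text,
    while older formats are treated as latin-1.
    """
    return "utf-8" if format_version >= 118 else "latin-1"


def decode_fixed_width(raw: bytes, format_version: int) -> str:
    """Decode a null-padded, fixed-width string field (hypothetical helper)."""
    # Keep only the bytes before the first null terminator.
    payload = raw.split(b"\x00", 1)[0]
    return payload.decode(stata_encoding(format_version))


# The same character is stored differently depending on the format version.
utf8_field = "é".encode("utf-8") + b"\x00\x00"
latin1_field = b"\xe9\x00\x00"
print(decode_fixed_width(utf8_field, 118))
print(decode_fixed_width(latin1_field, 117))
```

Deriving the encoding from the version means a caller cannot ask for latin-1 on a format 118 file and silently get mangled strings.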

codecov · 2018-06-01T02:41:29Z

Codecov Report

Merging #21279 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #21279   +/-   ##
=======================================
  Coverage   91.85%   91.85%           
=======================================
  Files         153      153           
  Lines       49549    49549           
=======================================
  Hits        45512    45512           
  Misses       4037     4037

Flag	Coverage Δ
#multiple	`90.25% <ø> (ø)`	⬆️
#single	`41.87% <ø> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b32fdc4...ac8c2c2.

pep8speaks · 2018-06-01T13:56:51Z

Hello @bashtage! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 05, 2018 at 07:20 Hours UTC

jorisvandenbossche · 2018-06-04T13:08:43Z

doc/source/whatsnew/v0.23.1.txt

@@ -23,7 +23,7 @@ New features
 Deprecations
 ~~~~~~~~~~~~

-
+- :func:`read_stata` and :class:`StataReader` have deprecated the ``encoding`` parameter. Stata files only support a single encoding and so this input has no effect. (:issue:`21244`)


I know you say it didn't have any effect before, but still, we should not introduce deprecation warnings in a bug fix release. So I would either move this to 0.24.0 or split the PR.
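
For context, the kind of deprecation being discussed usually looks something like the sketch below: the keyword stays in the signature, and a `FutureWarning` is raised only when the caller actually passes it. The function `read_table_sketch` is hypothetical and stands in for a reader; it is not the code from this PR.

```python
import warnings


def read_table_sketch(path, encoding=None):
    """Hypothetical reader used only to illustrate the deprecation pattern."""
    if encoding is not None:
        # Warn only when the caller supplied the argument explicitly.
        warnings.warn(
            "The 'encoding' argument is deprecated and has no effect.",
            FutureWarning,
            stacklevel=2,
        )
    # Placeholder for the real parsing work.
    return path


# Callers that never pass encoding see no warning.
read_table_sketch("data.dta")
```

Because existing code that passes `encoding` starts to emit warnings, a change like this is user-visible, which is the reason for keeping it out of a bug fix release.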

bashtage · 2018-06-04T20:47:07Z

I removed the deprecation.


jreback · 2018-06-05T10:29:25Z

@bashtage so the fix looks good. Can you restore the encoding parameter? Then we can merge into 0.23.1 (and you can come back with another PR to deprecate it). Thanks.

bashtage · 2018-06-06T10:38:24Z

I removed the deprecation warning but encoding is now a no-op, which is the correct behavior.

I'd rather not restore the (potentially) broken behavior where users are allowed to set an invalid encoding (e.g. latin-1 for more recent file formats), and this can wait until 0.24 if you think this is too much change for a bug fix release.

jorisvandenbossche · 2018-06-06T12:53:36Z

> I'd rather not restore the (potentially) broken behavior where users are allowed to set an invalid encoding (e.g. latin-1 for more recent file formats), and this can wait until 0.24 if you think this is too much change for a bug fix release.

Yes, we should not remove the keyword for 0.23.1, even when it was a no-op, because that breaks the code of people who inadvertently used it. So I would add it back here, and then open a follow-up PR to deprecate it if you want.

bashtage · 2018-06-06T13:05:14Z

The keyword is still available in `read_stata` and `StataReader`, so code won't break.


jorisvandenbossche · 2018-06-06T13:12:05Z

Ah, sorry, missed that you only removed the internal passing through of the keyword. That's fine then!
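
The arrangement described here can be sketched as follows: the public signature keeps `encoding`, but the value is never forwarded to the code that decodes strings. The names `RecordReader` and `read_records` are hypothetical stand-ins, not pandas' `StataReader` or `read_stata`, and the latin-1 fallback for older formats is an assumption.

```python
class RecordReader:
    """Hypothetical reader whose encoding depends only on the format version."""

    def __init__(self, format_version, encoding=None):
        # `encoding` is accepted so existing calls keep working,
        # but it is deliberately ignored.
        self.format_version = format_version
        # Assumption: 118+ is UTF-8, older formats are latin-1.
        self._encoding = "utf-8" if format_version >= 118 else "latin-1"


def read_records(format_version, encoding=None):
    """Hypothetical top-level function mirroring the reader's signature."""
    # The keyword stays in the signature but is not passed through.
    return RecordReader(format_version)


# Passing a conflicting encoding neither raises nor changes the result.
reader = read_records(118, encoding="latin-1")
print(reader._encoding)
```

Existing calls that pass `encoding` keep running unchanged, which is why this is safe for a bug fix release, while a formal deprecation can follow separately.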

jorisvandenbossche · 2018-06-06T13:15:34Z

@bashtage Thanks!


bashtage force-pushed the stata-uft8-decode branch from bc2d45e to dcbbb32 Compare June 1, 2018 13:56

jschendel added the Unicode (Unicode strings) and IO Stata (read_stata, to_stata) labels Jun 1, 2018

bashtage force-pushed the stata-uft8-decode branch 2 times, most recently from 461dd94 to 4bdb2c4 Compare June 2, 2018 09:51

jorisvandenbossche reviewed Jun 4, 2018


bashtage force-pushed the stata-uft8-decode branch from 4bdb2c4 to c285003 Compare June 4, 2018 20:32

BUG: Fix encoding for Stata format 118 format files

ac8c2c2

Ensure that Stata 118 files always use utf-8 encoding

bashtage force-pushed the stata-uft8-decode branch from c285003 to ac8c2c2 Compare June 5, 2018 07:20

jreback modified the milestone: 0.23.1 Jun 5, 2018

jreback added this to the 0.23.1 milestone Jun 5, 2018

jorisvandenbossche approved these changes Jun 6, 2018


jorisvandenbossche merged commit fbb47d6 into pandas-dev:master Jun 6, 2018

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018

BUG: Fix encoding for Stata format 118 format files (pandas-dev#21279)

97520b2

Ensure that Stata 118 files always use utf-8 encoding

bashtage deleted the stata-uft8-decode branch March 21, 2019 13:27
