Skip to content

BUG: Fix encoding for Stata format 118 files #21279

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 6, 2018

Conversation

bashtage
Copy link
Contributor

Ensure that Stata 118 files always use utf-8 encoding

@codecov
Copy link

codecov bot commented Jun 1, 2018

Codecov Report

Merging #21279 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #21279   +/-   ##
=======================================
  Coverage   91.85%   91.85%           
=======================================
  Files         153      153           
  Lines       49549    49549           
=======================================
  Hits        45512    45512           
  Misses       4037     4037
Flag Coverage Δ
#multiple 90.25% <ø> (ø) ⬆️
#single 41.87% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b32fdc4...ac8c2c2. Read the comment docs.

@bashtage bashtage force-pushed the stata-uft8-decode branch from bc2d45e to dcbbb32 Compare June 1, 2018 13:56
@pep8speaks
Copy link

pep8speaks commented Jun 1, 2018

Hello @bashtage! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 05, 2018 at 07:20 Hours UTC

@jschendel jschendel added Unicode Unicode strings IO Stata read_stata, to_stata labels Jun 1, 2018
@bashtage bashtage force-pushed the stata-uft8-decode branch 2 times, most recently from 461dd94 to 4bdb2c4 Compare June 2, 2018 09:51
@@ -23,7 +23,7 @@ New features
Deprecations
~~~~~~~~~~~~

-
- :func:`read_stata` and :class:`StataReader` have deprecated the ``encoding`` parameter. Stata files only support a single encoding and so this input has no effect. (:issue:`21244`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you say it didn't have any effect before, but still, we should not introduce deprecation warnings in a bug fix release. So would either move this to 0.24.0 or split the PR

@bashtage bashtage force-pushed the stata-uft8-decode branch from 4bdb2c4 to c285003 Compare June 4, 2018 20:32
@bashtage
Copy link
Contributor Author

bashtage commented Jun 4, 2018

I removed the deprecation.

Ensure that Stata 118 files always use utf-8 encoding
@bashtage bashtage force-pushed the stata-uft8-decode branch from c285003 to ac8c2c2 Compare June 5, 2018 07:20
@jreback jreback modified the milestone: 0.23.1 Jun 5, 2018
@jreback
Copy link
Contributor

jreback commented Jun 5, 2018

@bashtage so the fix looks good. can your restore the encoding paramater then can merge into 0.23.1 (and then come back with another PR to deprecate it)? thxs

@jreback jreback added this to the 0.23.1 milestone Jun 5, 2018
@bashtage
Copy link
Contributor Author

bashtage commented Jun 6, 2018

I removed the deprecation warning but encoding is now a no-op, which is the correct behavior.

I'd rather not restore the (potentially) broken behavior where users are allowed to set an invalid encoding (e.g. latin-1 for more recent file formats), and this can wait to 24 if you think this is too much change for a bug fix release.

@jorisvandenbossche
Copy link
Member

I'd rather not restore the (potentially) broken behavior where users are allowed to set an invalid encoding (e.g. latin-1 for more recent file formats), and this can wait to 24 if you think this is too much change for a bug fix release.

Yes, we should not remove the keyword for 0.23.1, even when it was a no-op, because that breaks people code who inadvertently used it. So I would add it back here, but then a next PR to deprecate it if you want.

@bashtage
Copy link
Contributor Author

bashtage commented Jun 6, 2018 via email

@jorisvandenbossche
Copy link
Member

Ah, sorry, missed that you only removed the internal passing through of the keyword. That's fine then!

@jorisvandenbossche jorisvandenbossche merged commit fbb47d6 into pandas-dev:master Jun 6, 2018
@jorisvandenbossche
Copy link
Member

@bashtage Thanks!

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
Ensure that Stata 118 files always use utf-8 encoding
@bashtage bashtage deleted the stata-uft8-decode branch March 21, 2019 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Stata read_stata, to_stata Unicode Unicode strings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: read_stata always uses 'utf8'
5 participants