ENH: Allow poorly formatted stata files to be read #25967
Codecov Report
@@ Coverage Diff @@
## master #25967 +/- ##
==========================================
- Coverage 91.84% 91.83% -0.01%
==========================================
Files 175 175
Lines 52550 52550
==========================================
- Hits 48266 48261 -5
- Misses 4284 4289 +5
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #25967 +/- ##
==========================================
- Coverage 91.84% 91.84% -0.01%
==========================================
Files 175 175
Lines 52550 52550
==========================================
- Hits 48266 48262 -4
- Misses 4284 4288 +4
Continue to review full report at Codecov.
Add a fall back decode path that allows improperly formatted Stata files written in 118 format but using latin-1 encoded strings to be read
closes pandas-dev#25960
3f711fc to ddc806f
has been incorrectly encoded by Stata or some other software. You should verify
the string values returned are correct."""
with pytest.warns(UnicodeWarning, match=msg):
    encoded = read_stata(self.dta_encoding_118)
Use tm.assert_produces_warning. Our implementation also checks stacklevel.
actually this already checks
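For readers unfamiliar with the stacklevel point raised in this thread, here is a minimal standard-library sketch of what such a check can look like. `read_file` and `call_site` are hypothetical stand-ins, not pandas functions, and comparing line numbers is only one assumed way to confirm that a warning is attributed to the caller; it is not a description of how `tm.assert_produces_warning` is implemented.

```python
import warnings


def read_file():
    # Hypothetical reader: stacklevel=2 attributes the warning to the caller.
    warnings.warn("strings may be mis-encoded", UnicodeWarning, stacklevel=2)
    return "data"


def call_site():
    return read_file()


# Record every warning raised while the caller runs.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = call_site()

record = caught[0]
# Category check: roughly what a plain "did it warn?" assertion verifies.
category_ok = issubclass(record.category, UnicodeWarning)
# Stacklevel check: the body of call_site sits one line below its `def`,
# so a correctly attributed warning reports that line number.
points_at_caller = record.lineno == call_site.__code__.co_firstlineno + 1
```

With `stacklevel=1` the recorded line would instead fall inside `read_file`, which is the kind of mistake a stacklevel-aware assertion is meant to catch.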
Refactor decode and null terminate to use file encoding
ddc806f to 27a173d
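As a rough illustration of the refactor named in this commit, the sketch below cuts a NUL-padded string field at its first NUL byte and decodes it with whatever encoding the file declares. The helper name and signature are assumptions made for this example and may not match the code in the PR.

```python
def null_terminate(field: bytes, encoding: str) -> str:
    """Cut a fixed-width field at its first NUL byte, then decode it."""
    end = field.find(b"\0")
    if end != -1:
        field = field[:end]
    return field.decode(encoding)


# A UTF-8 field padded with NUL bytes to a fixed width of 8.
padded = "café".encode("utf-8").ljust(8, b"\0")
name = null_terminate(padded, "utf-8")

# The same text stored as latin-1 decodes once the matching encoding is passed.
legacy = null_terminate(b"caf\xe9\0\0\0\0", "latin-1")
```

Passing the encoding in explicitly, rather than hard-coding it, is what lets one helper serve files with different declared encodings.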
lgtm. @bashtage if you can get the check_stacklevel to work would be great.
I switched to
actually this does check, thanks @bashtage
Add a fall back decode path that allows improperly formatted Stata
files written in 118 format but using latin-1 encoded strings to be
read
closes #25960
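The fallback described above can be sketched roughly as follows: try the encoding the file format calls for, and if that fails, warn and decode as latin-1 instead. This is a simplified illustration under assumptions, not the PR's code; `decode_with_fallback` and the warning text are hypothetical.

```python
import warnings


def decode_with_fallback(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode with the expected encoding, falling back to latin-1."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Hypothetical message; the real warning text lives in the PR.
        warnings.warn(
            "String could not be decoded as {}; using latin-1. "
            "Verify the returned values.".format(encoding),
            UnicodeWarning,
            stacklevel=2,
        )
        # latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("latin-1")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    good = decode_with_fallback("café".encode("utf-8"))   # no warning
    bad = decode_with_fallback("café".encode("latin-1"))  # warns, falls back
```

Because latin-1 never raises on decode, the fallback always returns a string, which is why the warning asks the user to check that the values are the ones intended.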
git diff upstream/master -u -- "*.py" | flake8 --diff