StataReader.variable_labels() does not read variable label correctly for stata datasets saved under Stata 13 using 'save' (but it can read datasets saved using 'saveold') #7816
Comments
docs are here: http://pandas.pydata.org/pandas-docs/stable/io.html#reading-from-stata-format something like:
look inside the |
closing as a usage question |
Hello, I'm sorry if I wasn't clear earlier. I did use the Can you please re-open this issue? It is still not resolved (I am using the latest Pandas master branch). Thanks. |
ok, so this is a feature/bug request then? ok |
cc @bashtage |
Yes it is a bug/feature request. I guess Stata changed something in how they save data files, which means that the Stata reader needs to be updated to accommodate this change. Many thanks! |
@shafiquejamal Would be helpful if you could share a simple example file .dta which produces the problem, as well as a v12 one that works. This looks like it is implemented in the v13 path - although it probably is buggy |
Certainly. I have a couple of .dta files of about 450kb each that I can share (problem dataset I tried dragging them into this comment window, but I'm getting this error at the bottom of this comment window: "Unfortunately, we don't support that file type. Try again with a PNG, GIF, or JPG." How can I share these .dta files with you? Thanks, |
@shafiquejamal put them up on a public dropbox / share site. I think you can do it via gist as well. and post the link here. |
Here is the dropbox link: https://www.dropbox.com/sh/4r0fhspsiwpim5p/AACBaC-lu7TaNPLUQQgU_rt4a So StataReader can handle the file ending in |
The bug, unfortunately, seems to be in Stata. Stata's dta file definition claims that it gives the offset to the start of this segment as one of 14 8-byte values, in . Unfortunately, this value is 0 (0000 0000 0000 0000 in the file) in this file, and is also 0 in one I just saved from Stata 13. The code appears to be a correct implementation of Stata's documented file format, so I'm not sure if this should be "fixed" (which would be to hack around Stata's problem). |
Thanks for looking into this so quickly. I'll see about contacting folks at Stata to see whether they can fix their documentation, which would then justify modifying pandas. To summarize then: the problem is that the offset (to the start of the segment in the dta file that defines the variable labels) should be 1, according to Stata's documentation (in Many thanks, |
I have submitted a patch that works around the difference between the docs and the implementation. The required value is technically unnecessary since it can be computed from other values. |
… files Stata's implementation does not match the online dta file format description. The solution used here is to directly compute the offset rather than reading it from the dta file. If Stata fixes their implementation, the original code can be restored. closes pandas-dev#7816
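The idea of computing the offset rather than trusting the stored value can be sketched as below. This is only an illustration under stated assumptions, not the code from the patch: the tag names, the fixed 33-byte width of each value-label name, and the helper names `computed_labels_offset` and `labels_offset` are all assumptions that do not appear in this thread, and the buffer is synthetic rather than a real .dta file.

```python
# Assumed section tags of the Stata 13 dta layout (not quoted in this thread).
OPEN_NAMES = b"<value_label_names>"
CLOSE_NAMES = b"</value_label_names>"
OPEN_LABELS = b"<variable_labels>"

# Assumed fixed width, in bytes, of each value-label name entry.
NAME_WIDTH = 33


def computed_labels_offset(value_label_names_offset: int, nvar: int) -> int:
    """Derive where the variable-labels section starts from the section before it.

    The previous section is an opening tag, one fixed-width entry per
    variable, and a closing tag, so its end can be found by arithmetic.
    """
    return (
        value_label_names_offset
        + len(OPEN_NAMES)
        + NAME_WIDTH * nvar
        + len(CLOSE_NAMES)
    )


def labels_offset(stored: int, value_label_names_offset: int, nvar: int) -> int:
    """Use the stored offset when it is non-zero, else fall back to arithmetic."""
    if stored != 0:
        return stored
    return computed_labels_offset(value_label_names_offset, nvar)


# Synthetic buffer that mimics only the two neighbouring sections.
nvar = 3
prefix = b"x" * 10
buf = prefix + OPEN_NAMES + bytes(NAME_WIDTH * nvar) + CLOSE_NAMES + OPEN_LABELS
names_offset = len(prefix)

offset = labels_offset(0, names_offset, nvar)
found_tag = buf[offset:offset + len(OPEN_LABELS)]
```

With the stored value at 0, as reported for files written by `save`, `offset` lands on the opening tag of the variable-labels section in this toy buffer. A non-zero stored value is returned unchanged, which matches the note that the original behaviour could be restored if Stata fixes its implementation.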
Thanks! It's working with my datasets. Cheers, |
If I use StataReader to read a Stata dataset saved in Stata 13 using the `save` command, I can get the data but not the variable labels. If, however, I use the `saveold` command in Stata 13, I am able to get the variable labels in Python 3 using `StataReader.variable_labels()`. Can anyone suggest how to accommodate Stata 13? Thanks,
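For reference, a minimal sketch of the call pattern described above. It does not use the datasets from this report: the file is written by pandas itself via `DataFrame.to_stata`, and the column names, labels, and file name are made up for illustration. Whether labels come back from a file written by Stata 13's `save` depends on the pandas version in use.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader

# Hypothetical data and labels, used only to have something to read back.
df = pd.DataFrame({"age": [31, 45], "income": [52000.0, 61000.0]})
labels = {"age": "Age in years", "income": "Annual income"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dta")
    # version=117 requests the dta format introduced with Stata 13.
    df.to_stata(path, write_index=False, variable_labels=labels, version=117)

    with StataReader(path) as reader:
        # Mapping of variable name to variable label.
        read_labels = reader.variable_labels()
        data = reader.read()
```

Here `read_labels` is a dict keyed by variable name, and `data` is the DataFrame holding the values.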