ENH: Add support for reading Stata 7 (non-SE) format dta files #47176

Closed
cmjcharlton opened this issue May 30, 2022 · 6 comments · Fixed by #58044
Labels
Enhancement, IO Stata (read_stata, to_stata)

Comments

@cmjcharlton
Contributor

Is your feature request related to a problem?

If I attempt to read in a dta file saved in Stata 7 format using the read_stata() function I get the following error message:

ValueError: Version of given Stata file is 110. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).

This is unfortunate as this is the version of the format used by default by R's write.dta() function in the foreign package.
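
To confirm which format version a file uses before handing it to read_stata(), the first byte of the file can be inspected. The following is a minimal sketch, assuming the older binary layout (formats before 117), where the header begins with a one-byte release number. The function name `dta_format_version` is hypothetical and is not part of pandas.

```python
def dta_format_version(header: bytes) -> int:
    """Return the dta format version from the leading bytes of a file.

    Assumes the pre-117 binary layout, where byte 0 holds the release
    number (e.g. 110 or 113). Files in format 117 and later start with
    an XML-like "<stata_dta>" tag instead, so they are rejected here
    rather than guessed at.
    """
    if not header:
        raise ValueError("empty header")
    if header.startswith(b"<stata_dta>"):
        raise ValueError("format 117 or later; release is stored as text")
    return header[0]


# Example use with a real file (the path is a placeholder):
#     with open("data.dta", "rb") as fh:
#         print(dta_format_version(fh.read(16)))
```

A file written by write.dta() with its default settings would be expected to report 110 here, matching the version in the error message above.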

Describe the solution you'd like

It would be nice if this version of the data format was supported, at least for reading, in the same manner as the other variants.

API breaking implications

None that I am aware of.

Describe alternatives you've considered

write.dta can also save to versions 6, 8 and 10 of the dta format, all of which pandas supports, so I could manually specify a different version instead. Alternatively, I could save my data using the readstata13 or haven packages, both of which support more recent versions of the Stata dta format.

Another option would be to use a different data format entirely that is also supported by both R and Pandas.

Additional context

It appears that to implement support for this additional variant the following two changes are required:

  1. Edit the line:

         if self.format_version > 108:

     to:

         if self.format_version > 110:

  2. Change the line:

         if self.format_version not in [104, 105, 108, 111, 113, 114, 115]:

     to include 110 in the list of supported variants, i.e.:

         if self.format_version not in [104, 105, 108, 110, 111, 113, 114, 115]:

After making these changes locally my version 7 files appear to load correctly.
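
The second change amounts to widening a membership check. As a standalone illustration of that kind of gate (this is not the pandas source; `SUPPORTED_OLD_VERSIONS` and `check_version` are hypothetical names, and the list is copied from the proposed line above):

```python
# Hypothetical stand-in for the version gate described above.
# The list mirrors the proposed change, with 110 added.
SUPPORTED_OLD_VERSIONS = [104, 105, 108, 110, 111, 113, 114, 115]


def check_version(format_version: int) -> None:
    """Raise ValueError if format_version is not in the supported list."""
    if format_version not in SUPPORTED_OLD_VERSIONS:
        raise ValueError(
            f"Version of given Stata file is {format_version}, "
            "which is not in the supported list."
        )
```

With 110 in the list, `check_version(110)` returns without raising, whereas an unlisted value such as 109 still raises ValueError.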

cmjcharlton added the Enhancement and Needs Triage labels May 30, 2022
jbrockmendel added the IO Stata label Jun 8, 2022
mroeschke removed the Needs Triage label Oct 6, 2022
@jbrockmendel
Member

cc @bashtage

@bashtage
Contributor

Is the 110 format documented? I think we haven't supported 110 because there are no docs.

@cmjcharlton
Contributor Author

The 110 format is documented in the book "Stata programming manual : release 7." (ISBN-10: 1881228525, ISBN-13: 9781881228523) on pages 88-94.

@bashtage
Contributor

bashtage commented Jan 24, 2023

Anyone want to share the relevant pages, which I think should fall under fair use?

I also have it available through a local library, so we'll see what can be done.

@cmjcharlton
Contributor Author

Here's an OCRed version of the text. There may be transcription errors but on the whole the text is very similar to that of other versions of the format, e.g. Stata 8.

stata-dta110.txt

@cmjcharlton
Contributor Author

cmjcharlton commented Mar 9, 2024

After some discussion with Stata technical support I now have documentation for all of the missing format versions. The 103, 105, 108 and 110 documents are all based on the descriptions from the corresponding Stata manuals. The 102, 104 and 111 documents are my best guesses based on available information. I have, however, tested that I am able to use these format descriptions to read and write files that Stata 18 is happy to load.

dta_102.txt
dta_103.txt
dta_104.txt
dta_105.txt
dta_108.txt
dta_110.txt
dta_111.txt

dta-format changes.txt

sample-data.csv
