TST: Match dta format used for Stata test files with that implied by their filenames #58192

Merged

cmjcharlton
Contributor

A few of the Stata test data files were not saved in the format implied by their file names. Because these are binary files and hard to diff, I have attached Stata output detailing their contents below:

dtatests_original.txt

dtatests_updated.txt

The Stata tests already include files in the formats that these were actually saved as, so converting them shouldn't lose any coverage.

After making the change I ran the following Stata script in the test directory, confirming that all the files now match their expected version:

local files : dir "." files "*.dta"
foreach file in `files' {
  dtaversion `file'
}
  (file "s4_educ1.dta" is .dta-format 105 from Stata 5)
  (file "stata-compat-105.dta" is .dta-format 105 from Stata 5)
  (file "stata-compat-108.dta" is .dta-format 108 from Stata 6)
  (file "stata-compat-111.dta" is .dta-format 111 from Stata 7)
  (file "stata-compat-113.dta" is .dta-format 113 from Stata 8 or 9)
  (file "stata-compat-114.dta" is .dta-format 114 from Stata 10 or 11)
  (file "stata-compat-118.dta" is .dta-format 118 from Stata 14, 15, 16, 17, or 18)
  (file "stata-dta-partially-labeled.dta" is .dta-format 118 from Stata 14, 15, 16, 17, or 18)
  (file "stata10_115.dta" is .dta-format 115 from Stata 12)
  (file "stata10_117.dta" is .dta-format 117 from Stata 13)
  (file "stata11_115.dta" is .dta-format 115 from Stata 12)
  (file "stata11_117.dta" is .dta-format 117 from Stata 13)
  (file "stata12_117.dta" is .dta-format 117 from Stata 13)
  (file "stata13_dates.dta" is .dta-format 117 from Stata 13)
  (file "stata14_118.dta" is .dta-format 118 from Stata 14, 15, 16, 17, or 18)
  (file "stata15.dta" is .dta-format 118 from Stata 14, 15, 16, 17, or 18)
  (file "stata16_118.dta" is .dta-format 118 from Stata 14, 15, 16, 17, or 18)
  (file "stata1_114.dta" is .dta-format 114 from Stata 10 or 11)
  (file "stata1_117.dta" is .dta-format 117 from Stata 13)
  (file "stata1_encoding.dta" is .dta-format 114 from Stata 10 or 11)
  (file "stata1_encoding_118.dta" is .dta-format 118 from Stata 14, 15, 16, 17, or 18)
  (file "stata2_113.dta" is .dta-format 113 from Stata 8 or 9)
  (file "stata2_114.dta" is .dta-format 114 from Stata 10 or 11)
  (file "stata2_115.dta" is .dta-format 115 from Stata 12)
  (file "stata2_117.dta" is .dta-format 117 from Stata 13)
  (file "stata3_113.dta" is .dta-format 113 from Stata 8 or 9)
  (file "stata3_114.dta" is .dta-format 114 from Stata 10 or 11)
  (file "stata3_115.dta" is .dta-format 115 from Stata 12)
  (file "stata3_117.dta" is .dta-format 117 from Stata 13)
  (file "stata4_113.dta" is .dta-format 113 from Stata 8 or 9)
  (file "stata4_114.dta" is .dta-format 114 from Stata 10 or 11)
  (file "stata4_115.dta" is .dta-format 115 from Stata 12)
  (file "stata4_117.dta" is .dta-format 117 from Stata 13)
  (file "stata5_113.dta" is .dta-format 113 from Stata 8 or 9)
  (file "stata5_114.dta" is .dta-format 114 from Stata 10 or 11)
  (file "stata5_115.dta" is .dta-format 115 from Stata 12)
  (file "stata5_117.dta" is .dta-format 117 from Stata 13)
  (file "stata6_113.dta" is .dta-format 113 from Stata 8 or 9)
  (file "stata6_114.dta" is .dta-format 114 from Stata 10 or 11)
  (file "stata6_115.dta" is .dta-format 115 from Stata 12)
  (file "stata6_117.dta" is .dta-format 117 from Stata 13)
  (file "stata7_111.dta" is .dta-format 111 from Stata 7)
  (file "stata7_115.dta" is .dta-format 115 from Stata 12)
  (file "stata7_117.dta" is .dta-format 117 from Stata 13)
  (file "stata8_113.dta" is .dta-format 113 from Stata 8 or 9)
  (file "stata8_115.dta" is .dta-format 115 from Stata 12)
  (file "stata8_117.dta" is .dta-format 117 from Stata 13)
  (file "stata9_115.dta" is .dta-format 115 from Stata 12)
  (file "stata9_117.dta" is .dta-format 117 from Stata 13)
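
For readers without Stata, the same check can be roughly approximated in Python. This is only a sketch, not part of the PR: the helper names (`dta_format_version`, `implied_version`, `find_mismatches`) are hypothetical, and it assumes that formats 115 and earlier store the format number in the first byte of the file, that formats 117 and later begin with `<stata_dta><header><release>NNN</release>`, and that the expected version is the trailing three-digit number in the file name. Files without such a suffix (for example `stata15.dta`) are skipped.

```python
import re
from pathlib import Path


def dta_format_version(path):
    """Return the .dta format number stored at the start of the file."""
    with open(path, "rb") as fh:
        head = fh.read(64)
    if not head:
        raise ValueError(f"{path} is empty")
    if head.startswith(b"<stata_dta>"):
        # Assumed layout for formats 117+: an XML-like header with a <release> tag.
        match = re.search(rb"<release>(\d+)</release>", head)
        if match is None:
            raise ValueError(f"{path} has no <release> tag in its header")
        return int(match.group(1))
    # Assumed layout for formats 115 and earlier: first byte is the format number.
    return head[0]


def implied_version(filename):
    """Return the version implied by a trailing _NNN or -NNN, else None."""
    match = re.search(r"[_-](\d{3})\.dta$", filename)
    return int(match.group(1)) if match else None


def find_mismatches(directory):
    """List (file name, implied version, actual version) for files that disagree."""
    mismatches = []
    for path in sorted(Path(directory).glob("*.dta")):
        expected = implied_version(path.name)
        if expected is None:
            continue  # The file name does not encode a version.
        actual = dta_format_version(path)
        if actual != expected:
            mismatches.append((path.name, expected, actual))
    return mismatches
```

Calling `find_mismatches(".")` from the test data directory should return an empty list if every versioned file name agrees with its header, under the assumptions stated above.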

@mroeschke added the Testing (pandas testing functions or related to the test suite) and IO Stata (read_stata, to_stata) labels on Apr 9, 2024
@mroeschke added this to the 3.0 milestone on Apr 9, 2024
@mroeschke merged commit d065d92 into pandas-dev:main on Apr 9, 2024
52 checks passed
@mroeschke
Member

Thanks @cmjcharlton

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
Development

Successfully merging this pull request may close these issues.

TST: Some Stata test datasets do not match the expected version