pandas cannot actually read Stata file format 104 #26667
Comments
cc @bashtage does this make sense? |
By design. AFAIK there is no documentation for this ancient format. |
I should add that there are no dta test files from this era. I think the
earliest is 108, so it is hard to claim compatibility with anything earlier.
I suspect support drifted over time since nothing was checking that the
changes were correct against old files.
|
Sounds good. Would you recommend bumping our minimum supported Stata version? |
We should figure out the oldest dta file available for testing and probably use that as the minimum officially supported version. If someone could provide some 104 files (and a dump of their contents), it would be possible to verify 104 compliance. |
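For anyone collecting old files for testing, a small helper can report which format version a given dta file claims to be. This is only a sketch: `sniff_dta_version` is a hypothetical name, not part of pandas, and it assumes that older files begin with a single byte holding the format version while 117+ files begin with the text `<stata_dta><header><release>` followed by a three-digit release number.

```python
import io


def sniff_dta_version(stream):
    """Return the format version claimed by a dta file.

    `stream` is a binary file object positioned at the start of the file.
    Hypothetical helper; the header layout is an assumption stated above.
    """
    first = stream.read(1)
    if not first:
        raise ValueError("empty file")
    if first == b"<":
        # Assumed layout for 117+: <stata_dta><header><release>NNN</release>
        prefix = b"<stata_dta><header><release>"
        rest = stream.read(len(prefix) - 1 + 3)
        header = first + rest
        if len(header) != len(prefix) + 3 or not header.startswith(prefix):
            raise ValueError("unrecognised dta header")
        return int(header[-3:].decode("ascii"))
    # Assumed layout for older files: the first byte is the version number.
    return first[0]


# Demonstration on in-memory bytes rather than real files.
old_style = io.BytesIO(bytes([104]) + b"\x00" * 10)
new_style = io.BytesIO(b"<stata_dta><header><release>118</release>")
print(sniff_dta_version(old_style), sniff_dta_version(new_style))
```

With a real file, the same function could be called as `sniff_dta_version(open(path, "rb"))`, which would make it easy to sort a directory of candidate test files by version.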
Just looked it up, dta file format 113 comes from 2003, so I can't imagine when a dta 104 file dates to. |
Per Stata, version 4 was released in 1995. |
I found a format 108 file online. pandas can read the numeric values but doesn't handle all of the data correctly. I might see if I can figure out a patch. |
Test against old Stata versions and remove text indicating support for versions which do not work reliably closes pandas-dev#26667
Code Sample, a copy-pastable example if possible
Data usage agreements prevent me from attaching a Stata file of format version 104, so I cannot easily provide a copy/paste example. This is unfortunate as I have several of these files...
Problem description
The stata.py module provides error messages stating that it can only read certain versions, and it explicitly enumerates version 104 both in the _version_error string and in StataReader._read_old_header.
However, StataReader cannot read files with version 104. This is because:
StataReader.__init__ calls _read_header
_read_header calls _read_old_header
_read_old_header calls _get_time_stamp
_get_time_stamp raises a ValueError for any format version that is not greater than 104, which includes 104 itself
None of this behavior can be overridden or otherwise configured as part of instantiating pandas.io.stata.StataReader.
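The call chain above can be modeled in a few lines. This is a simplified stand-in, not the pandas source: `ToyStataReader` and `LenientToyReader` are hypothetical classes, the method bodies are invented for illustration, and only the method names and the "greater than 104" check are taken from the description. It shows why construction fails for a 104 file and why, with no constructor option available, overriding the method in a subclass is the only lever.

```python
class ToyStataReader:
    """Hypothetical model of the call chain; NOT the pandas implementation."""

    def __init__(self, first_byte):
        self._first_byte = first_byte
        self.calls = []          # records the order of internal calls
        self._read_header()      # the header is parsed during construction

    def _read_header(self):
        self.calls.append("_read_header")
        self._read_old_header()

    def _read_old_header(self):
        self.calls.append("_read_old_header")
        self.format_version = self._first_byte[0]
        self.time_stamp = self._get_time_stamp()

    def _get_time_stamp(self):
        self.calls.append("_get_time_stamp")
        if self.format_version > 104:
            return "<timestamp>"  # placeholder for the real timestamp bytes
        # Version 104 falls through to here, so construction never completes.
        raise ValueError("no timestamp handling for this format version")


class LenientToyReader(ToyStataReader):
    """Hypothetical subclass that treats a missing timestamp as empty."""

    def _get_time_stamp(self):
        try:
            return super()._get_time_stamp()
        except ValueError:
            return ""


try:
    ToyStataReader(bytes([104]))
except ValueError as exc:
    print("construction failed:", exc)

print(repr(LenientToyReader(bytes([104])).time_stamp))
```

Because the failure happens inside `__init__`, a caller has no chance to intervene after constructing the object, which is why the toy workaround has to live in a subclass.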
Expected Output
Remove 104 as a supported format and error:
ValueError: Version of given Stata file is not 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), or 118 (Stata 14)
Alternatively, _get_time_stamp could be changed to return "" or a similar value to reflect that Stata did not include a file timestamp in that format version. In my experiments, subclassing StataReader to suppress the error, reading the data file into a DataFrame, and exporting to CSV yields a file equivalent to the output of the corresponding export delimited call from Stata.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-50-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: None
pip: 19.1.1
setuptools: 41.0.1
Cython: None
numpy: 1.16.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None