read_sas does not handle numeric variables stored with fewer than 8 bytes in SAS datasets #21616
Labels
Dtype Conversions
Unexpected or buggy dtype conversions
IO Data
IO issues that don't fit into a more specific label
Numeric Operations
Arithmetic, Comparison, and Logical operations
cc @jreback
troels
added a commit
to troels/pandas
that referenced
this issue
Sep 9, 2018
…as-dev#21616) Memory for numbers in sas7bdat-parsing was not initialized properly to 0. For sas7bdat files with numbers smaller than 8 bytes this made the least significant part of the numbers essentially random. Fix it by initializing memory correctly.
4 tasks
troels
added a commit
to troels/pandas
that referenced
this issue
Sep 11, 2018
…as-dev#21616) Memory for numbers in sas7bdat-parsing was not initialized properly to 0. For sas7bdat files with numbers smaller than 8 bytes this made the least significant part of the numbers essentially random. Fix it by initializing memory correctly.
troels
added a commit
to troels/pandas
that referenced
this issue
Sep 12, 2018
…as-dev#21616) Memory for numbers in sas7bdat-parsing was not initialized properly to 0. For sas7bdat files with numbers smaller than 8 bytes this made the least significant part of the numbers essentially random. Fix it by initializing memory correctly.
jreback
pushed a commit
that referenced
this issue
Sep 15, 2018
Thanks.
aeltanawy
pushed a commit
to aeltanawy/pandas
that referenced
this issue
Sep 20, 2018
Sup3rGeo
pushed a commit
to Sup3rGeo/pandas
that referenced
this issue
Oct 1, 2018
This problem has been reported twice on Stack Overflow, but I cannot find any issue raised here to address it.
https://stackoverflow.com/questions/49059421/pandas-fails-with-correct-data-type-while-reading-a-sas-file
https://stackoverflow.com/questions/51005244/how-can-i-preserve-the-data-type-of-a-column-when-using-pandas-read-sas/51005427#51005427
read_sas() does not correctly handle SAS numeric values that are stored using fewer than the full 8 bytes required for a floating-point number. Instead of padding the truncated values with binary zeros, read_sas appears to fill them out with other, perhaps arbitrary, bytes.
In the first Stack Overflow example, SAS has stored the number 8. In IEEE floating point this is the 8 bytes given by the hex string '40 20 00 00 00 00 00 00'. In the sas7bdat file, SAS stored only the 3 most significant bytes. Instead of padding them with 5 zero bytes before converting, read_sas appears to have padded them with the bytes '06 07 80 FD C1', so the value was read as approximately 8.000046 instead of 8.
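The effect can be reproduced outside pandas. The following is a minimal sketch using only the standard library; the helper name `expand` is made up for illustration and is not part of pandas or its SAS reader. It assumes the big-endian byte order used in the hex strings above.

```python
import struct

# The 3 most significant bytes of the IEEE 754 double 8.0, as SAS stored them
# (big-endian order, matching the hex string in this report).
truncated = bytes.fromhex("402000")


def expand(truncated_bytes, padding):
    """Pad a truncated big-endian double back to 8 bytes and decode it.

    Hypothetical helper for illustration only.
    """
    raw = truncated_bytes + padding
    if len(raw) != 8:
        raise ValueError("padded value must be exactly 8 bytes")
    return struct.unpack(">d", raw)[0]


# Correct behaviour: fill the missing low-order bytes with zeros.
zero_padded = expand(truncated, bytes(5))

# Behaviour described in this report: the low-order bytes hold leftover data.
garbage_padded = expand(truncated, bytes.fromhex("060780FDC1"))

print(zero_padded)
print(garbage_padded)
```

With zero padding the value decodes to exactly 8.0. With the leftover bytes it decodes to a value slightly above 8, which matches the symptom in the first Stack Overflow question.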