read_sas does not handle numeric variables stored with fewer than 8 bytes in SAS datasets #21616

sasutils · 2018-06-24T18:01:28Z

This problem has been reported twice on Stack Overflow, but I cannot find an issue raised here to address it.

https://stackoverflow.com/questions/49059421/pandas-fails-with-correct-data-type-while-reading-a-sas-file
https://stackoverflow.com/questions/51005244/how-can-i-preserve-the-data-type-of-a-column-when-using-pandas-read-sas/51005427#51005427

read_sas() does not correctly handle SAS numeric values that are stored in fewer than the full 8 bytes of a floating-point number. Instead of padding the short values with binary zeros, read_sas appears to fill them out with other, perhaps arbitrary, bytes.

In the first Stack Overflow example, SAS has stored the number 8. In IEEE floating point this is the 8 bytes given by the hex string '40 20 00 00 00 00 00 00'. In the sas7bdat file, SAS stored only the 3 most significant bytes. Instead of padding them with 5 zero bytes before converting, read_sas appears to have padded them with the bytes '06 07 80 FD C1', so the value was read as 8.00046 instead of 8.
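The padding described above can be sketched with the standard `struct` module. This is only an illustration of the expected behaviour, not pandas code: the helper name `widen_truncated_double` and its `byteorder` parameter are hypothetical, and it assumes the file keeps the most significant bytes of each double.

```python
import struct


def widen_truncated_double(stored: bytes, byteorder: str = "big") -> float:
    """Zero-pad a truncated IEEE 754 double to 8 bytes and decode it.

    Hypothetical helper for illustration; not part of pandas.
    """
    if not 1 <= len(stored) <= 8:
        raise ValueError("expected between 1 and 8 bytes")
    pad = bytes(8 - len(stored))  # the missing low-order bytes, all zero
    if byteorder == "big":
        # Most significant bytes come first, so the padding goes at the end.
        return struct.unpack(">d", stored + pad)[0]
    # Little-endian: the most significant bytes come last, so pad in front.
    return struct.unpack("<d", pad + stored)[0]


# The 3 stored bytes from the report, padded with zeros as expected.
correct = widen_truncated_double(bytes.fromhex("40 20 00"))

# The same 3 bytes followed by the 5 stray bytes quoted in the report.
garbage = struct.unpack(">d", bytes.fromhex("40 20 00 06 07 80 FD C1"))[0]

print(correct, garbage)
```

With zero padding the value decodes to exactly 8, while the stray low-order bytes push it slightly above 8.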

gfyoung · 2018-06-25T20:22:47Z

cc @jreback

…as-dev#21616) Memory for numbers in sas7bdat-parsing was not initialized properly to 0. For sas7bdat files with numbers smaller than 8 bytes this made the least significant part of the numbers essentially random. Fix it by initializing memory correctly.
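A minimal sketch of how uninitialized memory can produce this symptom, assuming a parser that copies each short value into an 8-byte scratch buffer before decoding. The function `decode_column` and its parameters are hypothetical and do not reproduce the actual sas7bdat parser; the `0xFF` fill merely stands in for whatever happened to be in memory.

```python
import struct


def decode_column(values, width, clear_buffer):
    """Decode truncated big-endian doubles through an 8-byte scratch buffer.

    Hypothetical illustration only; not the pandas implementation.
    """
    scratch = bytearray(b"\xff" * 8)  # stands in for uninitialized memory
    decoded = []
    for raw in values:
        if clear_buffer:
            scratch[:] = bytes(8)  # the fix: zero the buffer before use
        scratch[:width] = raw  # only the first `width` bytes are overwritten
        decoded.append(struct.unpack(">d", scratch)[0])
    return decoded


stored = [bytes.fromhex("40 20 00")]  # the number 8 stored in 3 bytes
print(decode_column(stored, 3, clear_buffer=False))  # low-order bytes are stale
print(decode_column(stored, 3, clear_buffer=True))   # low-order bytes are zero
```

Only the zeroed buffer yields exactly 8; without clearing, the five trailing bytes keep their previous contents and leak into the least significant part of the number.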


sasutils · 2018-09-15T16:17:53Z

Thanks.


sasutils changed the title ~~read_sas does not handle variables stored with fewer than 8 bytes in SAS datasets~~ read_sas does not handle numeric variables stored with fewer than 8 bytes in SAS datasets Jun 24, 2018

gfyoung added the IO Data, Numeric Operations, and Dtype Conversions labels Jun 25, 2018

troels mentioned this issue Sep 9, 2018

BUG: Make sure that sas7bdat parsers memory is initialized to 0 (#21616) #22651

Merged

4 tasks

jreback added this to the 0.24.0 milestone Sep 15, 2018

jreback closed this as completed in #22651 Sep 15, 2018

jreback pushed a commit that referenced this issue Sep 15, 2018

BUG: Make sure that sas7bdat parsers memory is initialized to 0 (#21616…

307797c

…) (#22651)

aeltanawy pushed a commit to aeltanawy/pandas that referenced this issue Sep 20, 2018

BUG: Make sure that sas7bdat parsers memory is initialized to 0 (pand…

d950096

…as-dev#21616) (pandas-dev#22651)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018

BUG: Make sure that sas7bdat parsers memory is initialized to 0 (pand…

e9721ed

…as-dev#21616) (pandas-dev#22651)
