TypeError with pd.read_sas() (pandas 0.18.0) #12647

Closed
raderaj opened this issue Mar 16, 2016 · 12 comments
Labels
Bug IO SAS SAS: read_sas
Comments

@raderaj

raderaj commented Mar 16, 2016

Code Sample, a copy-pastable example if possible

df = pd.read_sas('infilename.sas7bdat')

Expected Output

to read a sas7bdat file into a pandas data frame.

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.0
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.0
pytz: 2016.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8

boto: 2.39.0

Error seen

TypeError Traceback (most recent call last)
in ()
----> 1 pd.read_sas('mydatainfo.sas7bdat')

C:\Anaconda\lib\site-packages\pandas\io\sas\sasreader.py in read_sas(filepath_or_buffer, format, index, encoding, chunksize, iterator)
52 reader = SAS7BDATReader(filepath_or_buffer, index=index,
53 encoding=encoding,
---> 54 chunksize=chunksize)
55 else:
56 raise ValueError('unknown SAS format')

C:\Anaconda\lib\site-packages\pandas\io\sas\sas7bdat.py in __init__(self, path_or_buf, index, convert_dates, blank_missing, chunksize, encoding)
234 self._path_or_buf = open(self._path_or_buf, 'rb')
235
--> 236 self._get_properties()
237 self._parse_metadata()
238

C:\Anaconda\lib\site-packages\pandas\io\sas\sas7bdat.py in _get_properties(self)
333 self.os_name = buf.rstrip(b'\x00 ').decode()
334 else:
--> 335 buf = self._path_or_buf.read(_os_maker_offset, _os_maker_length)
336 self.os_name = buf.rstrip(b'\x00 ').decode()
337

TypeError: read() takes at most 1 argument (2 given)


Tracking down the error:

It looks like line 335 should either read a single length from the _path_or_buf byte stream, something like:
buf = self._path_or_buf.read(_os_maker_length)
or be replaced with something like this:
buf = self._read_bytes(_os_maker_offset, _os_maker_length)


Note: I tried this under Python 3.5 and got the same error.
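
To illustrate the two options: a file object's read() accepts only a size, so reading at a given position needs an explicit seek first. The sketch below uses an in-memory stream with made-up header bytes. `read_at` is a hypothetical helper written for illustration; it is not the pandas `_read_bytes` implementation, and the byte layout is an assumption.

```python
import io


def read_at(buf, offset, length):
    """Hypothetical helper: seek to `offset`, then read exactly `length` bytes."""
    buf.seek(offset)
    data = buf.read(length)
    if len(data) < length:
        raise ValueError("unexpected end of file")
    return data


# Made-up bytes: 8 bytes of padding, then a padded OS name field (12 bytes).
stream = io.BytesIO(b"HEADER--" + b"W32_7PRO" + b"\x00\x00\x00 ")

# Passing (offset, length) to read() fails, as in the traceback above.
try:
    stream.read(8, 12)
    two_arg_read_failed = False
except TypeError:
    two_arg_read_failed = True

# Seeking first and then reading a single length works.
os_name = read_at(stream, 8, 12).rstrip(b"\x00 ").decode()
print(two_arg_read_failed, os_name)
```

Note that calling read(length) on its own reads from wherever the stream currently is, so the first option is only correct if the stream is already positioned at the offset.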

@jreback
Contributor

jreback commented Mar 16, 2016

can you post a link to the file?

@jreback jreback added the IO SAS SAS: read_sas label Mar 16, 2016
@jreback
Contributor

jreback commented Mar 16, 2016

cc @kshedden

@kshedden
Contributor

Thanks for the report. I think using _read_bytes would resolve this, but we don't have any test files that reach the else clause of this branch. The code here traces back to line 1450 of Jared Hobbs' code linked below (but I introduced the bug when refactoring):

https://bitbucket.org/jaredhobbs/sas7bdat/src/da1faa90d0b15c2c97a2a8eb86c91c58081bdd86/sas7bdat.py?fileviewer=file-view-default

If you could provide us with a test file that reaches the else clause here, or give us a tip on where to find one, that would be great. I am generating all the test files on a Linux/Intel box with a recent version of SAS.

@raderaj
Author

raderaj commented Mar 17, 2016

@kshedden thanks. I found another small file that causes this error, but sas7bdat files are 'unsupported' by GitHub issues. Should I create a branch and upload it there?

@gdementen
Contributor

A fix (with a test case linked in the comments) is in #12658.

@stoffprof

So this means the read_sas() method doesn't work at all on sas7bdat files, right?

@jreback
Contributor

jreback commented Mar 30, 2016

Well, @Itzybitzy, there is a linked issue that will close this; the fix will be part of 0.18.1. IIRC there were issues with some encodings, but you can review the issues if you'd like. Of course, any fixes are welcome as well!

@kshedden
Contributor

The current release works on many sas7bdat files; there are around 20 in the test suite that all work fine. There was a minor bug that will be fixed for 0.18.1.


@stoffprof

Thanks @kshedden. I still don't know much about how GitHub works and I'm not sure where to find the test suite. But I was asking because I was able to get the same error with a very simple data set, generated like this (in SAS):

libname tmp 'c:\temp';  
data tmp.test;
    do i=1 to 100;
        x=rannor(0);
        output;
    end;
run;

I'm wondering if this is causing pd.read_sas() to fail for the same reason, or if perhaps there's another bug somewhere.
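
One way to check whether two files fail for the same reason is to compare where and how the reader raises. The sketch below is a generic, hypothetical helper (`failure_signature` is not part of pandas). It takes any reader callable, so in practice one would pass `pd.read_sas`; a stand-in reader is used here so the example runs without a data file.

```python
import traceback


def failure_signature(reader, path):
    """Return None if reader(path) succeeds, otherwise a tuple of
    (exception type name, message, name of the innermost Python function)."""
    try:
        reader(path)
    except Exception as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        return type(exc).__name__, str(exc), frames[-1].name
    return None


# Stand-in reader for illustration only; it mimics the error from this issue.
def _fake_get_properties():
    raise TypeError("read() takes at most 1 argument (2 given)")


def fake_reader(path):
    _fake_get_properties()


sig = failure_signature(fake_reader, "test.sas7bdat")
print(sig)
```

If two files produce the same signature, they most likely hit the same code path; different signatures suggest a separate bug.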

@kshedden
Contributor

The SAS7BDAT file specification is not public, so we have no idea what issues might come up down the line. The problems I am aware of are due to some very minor differences in the headers, which encode metadata about the file and the machine used to generate the file. We have fixed these in a branch that should become part of 0.18.1.

We are also working on improving performance and hope to have a much faster version of this ready soon.


@jreback
Contributor

jreback commented Mar 31, 2016

@Itzybitzy if you'd like to save that file with SAS and provide a link to it, we could use it as an example (if it's not already covered by the fixes that @kshedden has done).

@stoffprof

@jreback Here's the sas data set. (I saved it as a zip so I could upload it here.)

test.zip

jreback pushed a commit to jreback/pandas that referenced this issue Apr 18, 2016
kshedden added a commit to kshedden/pandas that referenced this issue Apr 21, 2016