TypeError with pd.read_sas() (pandas 0.18.0) #12647
Comments
Can you post a link to the file?
cc @kshedden
Thanks for the report. I think using read_bytes would resolve this. But we don't have any test files that reach the If you could provide us with a test file that reaches
@kshedden thanks. I found another small file that caused this error, but sas7bdat files are 'unsupported' by GitHub issues. Should I create a branch and upload there?
Fix (and test case linked in comments) in #12658
So this means the read_sas() method doesn't work at all on sas7bdat files, right?
Well @Itzybitzy, there is a linked issue which will close this. It will be part of 0.18.1. IIRC there were issues with some encodings, but you can review the issues if you'd like. Of course any fixes are welcome as well!
The current release works on many sas7bdat files, there are around 20 in
Thanks @kshedden. I still don't know much about how GitHub works and I'm not sure where to find the test suite. But I was asking because I was able to get the same error with a very simple data set, generated like this (in SAS):
I'm wondering if this is causing
The SAS7BDAT file specification is not public, so we have no idea what We are also working on improving performance and hope to have a much faster
@Itzybitzy if you'd like to save that file with SAS and provide a link to it, we could use it as an example (if it's not already covered by the fixes that @kshedden has done).
Code Sample, a copy-pastable example if possible
df = pd.read_sas('infilename.sas7bdat')
Expected Output
to read a sas7bdat file into a pandas data frame.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.0
nose: 1.3.7
pip: 8.1.0
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.0
pytz: 2016.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
Error seen
TypeError Traceback (most recent call last)
in ()
----> 1 pd.read_sas('mydatainfo.sas7bdat')
C:\Anaconda\lib\site-packages\pandas\io\sas\sasreader.py in read_sas(filepath_or_buffer, format, index, encoding, chunksize, iterator)
52 reader = SAS7BDATReader(filepath_or_buffer, index=index,
53 encoding=encoding,
---> 54 chunksize=chunksize)
55 else:
56 raise ValueError('unknown SAS format')
C:\Anaconda\lib\site-packages\pandas\io\sas\sas7bdat.py in __init__(self, path_or_buf, index, convert_dates, blank_missing, chunksize, encoding)
234 self._path_or_buf = open(self._path_or_buf, 'rb')
235
--> 236 self._get_properties()
237 self._parse_metadata()
238
C:\Anaconda\lib\site-packages\pandas\io\sas\sas7bdat.py in _get_properties(self)
333 self.os_name = buf.rstrip(b'\x00 ').decode()
334 else:
--> 335 buf = self._path_or_buf.read(_os_maker_offset, _os_maker_length)
336 self.os_name = buf.rstrip(b'\x00 ').decode()
337
TypeError: read() takes at most 1 argument (2 given)
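The failure can be reproduced without a SAS file. Here is a minimal sketch, assuming only that the reader holds an ordinary binary file object: Python's read() accepts a single optional size argument, so passing an offset and a length raises TypeError. The byte values are made up for illustration, and the exact error message varies between Python versions.

```python
import io

# Stand-in for the opened file; the contents are arbitrary illustration bytes.
buf = io.BytesIO(b"0123456789")

try:
    # Same call shape as line 335 in the traceback: read(offset, length).
    buf.read(2, 4)
    raised = False
except TypeError as exc:
    # File-like objects only accept a size, not an (offset, length) pair.
    raised = True
    print("TypeError:", exc)
```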
Tracking down the error:
It looks like line 335 should either read a single length from the _path_or_buf byte stream, something like:
buf = self._path_or_buf.read(_os_maker_length)
or be replaced with something like this:
buf = self._read_bytes(_os_maker_offset, _os_maker_length)
Note: I tried this under Python 3.5 and got the same error.
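The two suggestions may not be equivalent: read(length) reads from wherever the file position currently is, while an offset-aware helper seeks to an absolute position first. A rough sketch of the difference follows. Note that read_at is a hypothetical helper written for this example, not the pandas _read_bytes implementation, and the buffer contents and offsets are made up.

```python
import io


def read_at(fileobj, offset, length):
    """Return `length` bytes starting at absolute `offset` (hypothetical helper)."""
    fileobj.seek(offset)
    data = fileobj.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data


# Illustration bytes only; a real sas7bdat header is laid out differently.
buf = io.BytesIO(b"HEADERos_name\x00\x00\x00")
buf.read(3)  # move the file position, as earlier header reads might

from_position = buf.read(4)        # reads from the current position
from_offset = read_at(buf, 6, 10)  # reads from the absolute offset

# Same cleanup as in the traceback: strip padding, then decode.
os_name = from_offset.rstrip(b"\x00 ").decode()
print(from_position, os_name)
```

Whether the plain read(length) form would return the right bytes depends on where the file position is when line 335 runs, which the traceback alone does not show.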