Reading SAS files from AWS S3 #13939
Comments
Actually, this should already work; have you tried it? It goes through the same file-path handling that read_csv uses.
I have tried it, but unfortunately it doesn't work. I get the following error:
The documentation for pandas.read_sas doesn't mention the ability to read from S3 either.
Output from pd.show_versions():
Hmm, this may require something more, then. cc @kshedden
Switching over to s3fs should handle this, as it does implement
Although I think there's a second bug in
In [31]: with open('test1.sas7bdat', 'rb') as f:
...: pd.read_sas(f)
...:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-31-85bbca72eb65> in <module>()
1 with open('test1.sas7bdat') as f:
----> 2 pd.read_sas(f)
3
/Users/tom.augspurger/Envs/surveyer/lib/python3.5/site-packages/pandas/io/sas/sasreader.py in read_sas(filepath_or_buffer, format, index, encoding, chunksize, iterator)
43 pass
44
---> 45 if format.lower() == 'xport':
46 from pandas.io.sas.sas_xport import XportReader
47 reader = XportReader(filepath_or_buffer, index=index,
AttributeError: 'NoneType' object has no attribute 'lower'
Will make an issue later. Working around that, and reading from S3:
In [33]: from pandas.io.sas.sas7bdat import SAS7BDATReader
In [34]: fs = s3fs.S3FileSystem(anon=False)
In [35]: f = fs.open('s3://<bucket>/test.sas7bdat')
In [36]: SAS7BDATReader(f).read()
Out[36]:
Column1 Column2 Column3 Column4 Column5 Column6 Column7 \
0 0.636 b'pear' 84.0 1965-12-10 0.103 b'apple' 20.0
1 0.283 b'dog' 49.0 1977-03-07 0.398 b'pear' 50.0
2 0.452 b'pear' 35.0 1983-08-15 0.117 b'pear' 70.0
3 0.557 b'dog' 29.0 1974-06-28 0.640 b'pear' 34.0
4 0.138 NaN 55.0 1965-03-18 0.583 b'crocodile' 34.0
5 0.948 b'dog' 33.0 1984-07-15 0.691 b'pear' NaN
6 0.162 b'crocodile' 17.0 1982-06-03 0.002 b'pear' 30.0
7 0.148 b'crocodile' 37.0 1964-10-06 0.411 b'apple' 23.0
8 NaN b'pear' 15.0 1970-01-27 0.102 b'pear' 1.0
9 0.663 b'pear' NaN 1981-03-06 0.086 b'apple' 80.0
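The AttributeError in the traceback comes from calling `.lower()` on a `format` that is still `None` when a buffer, rather than a file name, is passed in. A minimal sketch of the kind of check that would avoid it is below. `resolve_sas_format` is a hypothetical helper written for illustration, not pandas' actual code, and the extension rules are an assumption.

```python
import os


def resolve_sas_format(filepath_or_buffer, format=None):
    """Return 'xport' or 'sas7bdat' for the given input.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if format is not None:
        fmt = format.lower()
        if fmt not in ("xport", "sas7bdat"):
            raise ValueError(f"unknown SAS format: {format!r}")
        return fmt

    if not isinstance(filepath_or_buffer, (str, os.PathLike)):
        # A buffer has no file name to inspect, so fail with a clear
        # message instead of calling .lower() on None.
        raise ValueError("format must be given when passing a buffer")

    # Assumed rule: infer the format from the file extension.
    name = os.fspath(filepath_or_buffer).lower()
    if name.endswith(".xpt"):
        return "xport"
    if name.endswith(".sas7bdat"):
        return "sas7bdat"
    raise ValueError(f"cannot infer SAS format from {name!r}")
```

With a check like this, an open file or an s3fs file object without an explicit `format` produces a readable error rather than an AttributeError.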
@TomAugspurger Are you talking about this s3fs?
@ankitdhingra Yeah, sorry. See #11915 for context. We were having issues with boto2 and are probably going to depend on s3fs for S3-related stuff in the future. That'll make supporting things like this (and writing S3 files) easier. I just haven't had time to finish that pull request. In the meantime, getting a filebuffer object from
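The file-buffer route can also go through the public `pd.read_sas` instead of the internal `SAS7BDATReader`, as long as `format` is passed explicitly so the `None` format problem from the traceback is not hit. This is a sketch under assumptions: `read_sas7bdat_from_s3` is a hypothetical wrapper name, s3fs is installed, and AWS credentials are available to it in the usual way.

```python
def read_sas7bdat_from_s3(url, anon=False):
    """Read a SAS7BDAT file from S3 into a DataFrame via an s3fs file object.

    Hypothetical wrapper for illustration; the name is not from pandas or s3fs.
    """
    if not url.startswith("s3://"):
        raise ValueError(f"expected an s3:// URL, got {url!r}")

    # Imported lazily so the optional dependencies are only needed on use.
    import pandas as pd
    import s3fs

    fs = s3fs.S3FileSystem(anon=anon)
    with fs.open(url, "rb") as f:
        # format is given explicitly because it cannot be inferred
        # from a buffer.
        return pd.read_sas(f, format="sas7bdat")
```

Usage would look like `df = read_sas7bdat_from_s3("s3://<bucket>/test.sas7bdat")`, with the bucket placeholder filled in.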
I think this fully works with fsspec support now, so closing.
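For anyone landing here later, a sketch of what the fsspec-based path looks like: the S3 URL goes straight to `pd.read_sas`, which infers the format from the `.sas7bdat` or `.xpt` extension. This assumes a pandas version with fsspec support, with fsspec and s3fs installed and credentials picked up from the environment. `read_sas_from_s3_url` is a hypothetical wrapper name.

```python
def read_sas_from_s3_url(url):
    """Read a SAS file by passing the s3:// URL directly to pandas.

    Hypothetical wrapper for illustration; assumes fsspec and s3fs
    are installed alongside pandas.
    """
    if not url.startswith("s3://"):
        raise ValueError(f"expected an s3:// URL, got {url!r}")

    # Imported lazily so the optional dependencies are only needed on use.
    import pandas as pd

    # No file object is opened here; pandas resolves the URL itself.
    return pd.read_sas(url)
```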
Currently the read_sas method doesn't support reading SAS7BDAT files from AWS S3 like read_csv. Can it be added?