Cannot write partitioned parquet file to S3 #27596
Can you verify that the filesystem argument is correct? pyarrow may want a FileSystem-type thing.
It sounds like usage of s3fs should largely be replaced with fsspec. Can somebody confirm that is true? I think the fix here is probably some cleanup in io/parquet.py related to that, but there may already be plans in progress?
fsspec is a dependency of s3fs. It provides the backend-agnostic parts to various filesystem-like things. s3fs is still the only relevant dependency for pandas.
@TomAugspurger @cottrell Is this fixed? What's the workaround? Please help.
@getsanjeevdubey I think it's still open. You should write to local disk and upload the files to S3 manually.
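A rough sketch of that manual workaround, assuming s3fs is used for the upload (the bucket, prefix, and column names below are placeholders):

```python
import pandas as pd
import s3fs

df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

# Write the partitioned dataset to local disk first.
df.to_parquet("local_dataset", engine="pyarrow", partition_cols=["key"])

# Then upload the whole directory tree to S3 (placeholder bucket/prefix).
fs = s3fs.S3FileSystem()
fs.put("local_dataset", "my-bucket/some/prefix/dataset", recursive=True)
```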
Writing partitioned parquet to S3 is still an issue with pandas 1.0.1, pyarrow 0.16, and s3fs 0.4. @TomAugspurger @getsanjeevdubey you can work around this by giving pyarrow an S3FileSystem directly, as sketched below. Of course you'll have to special-case this for S3 paths vs. other destinations.
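A sketch of that workaround, bypassing DataFrame.to_parquet and calling pyarrow directly (bucket, prefix, and column names are placeholders):

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import s3fs

df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})
table = pa.Table.from_pandas(df)

# Hand pyarrow an S3FileSystem explicitly instead of letting pandas infer one.
fs = s3fs.S3FileSystem()
pq.write_to_dataset(table, "my-bucket/some/prefix/dataset",
                    partition_cols=["key"], filesystem=fs)
```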
Yep, looks like this is exactly the problem. Should be fixed after #33632 and if the filesystem kwarg is passed.
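If that's right, then on a pandas release that includes #33632 a plain s3:// URL should work again; a minimal sketch (bucket and prefix are placeholders, and S3 credentials are assumed to be configured for s3fs):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

# With fsspec-based IO, pandas hands the S3 filesystem to pyarrow itself,
# so a partitioned write to an s3:// URL should no longer hit _isfilestore.
df.to_parquet("s3://my-bucket/some/prefix/dataset",
              engine="pyarrow", partition_cols=["key"])
```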
Closed by #33632.
Apologies if this is a pyarrow issue.
Code Sample, a copy-pastable example if possible
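The original snippet didn't survive extraction; a reconstruction of the kind of call that triggers the error (bucket, prefix, and column names are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

# Partitioned write straight to S3 via the pyarrow engine.
df.to_parquet("s3://my-bucket/some/prefix/dataset",
              engine="pyarrow", partition_cols=["key"])
```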
Problem description
Fails with
AttributeError: 'NoneType' object has no attribute '_isfilestore'
Expected Output
Expected to see partitioned data show up in S3.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.21.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: None
pip: 19.0.3
setuptools: 41.0.0
Cython: 0.29.7
numpy: 1.16.2
scipy: 1.3.0
pyarrow: 0.14.0
xarray: None
IPython: 7.5.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.3.3
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: 0.3.0
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None