
to_csv fails silently with s3fs #32486


Closed
gpkc opened this issue Mar 6, 2020 · 6 comments · Fixed by #33645
Labels
Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv

Comments

@gpkc

gpkc commented Mar 6, 2020

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.to_csv("s3://super_dumb_s3_bucket/not_real.csv")

Problem description

If you provide an invalid S3 path (e.g. a bucket that you don't have access to), this command fails silently: no exception is raised.

Expected Output

At the very least, an exception should be raised.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.2.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-88-generic
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.0
pytz : 2019.2
dateutil : 2.8.0
pip : 19.0.2
setuptools : 40.8.0
Cython : None
pytest : 3.10.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

@TomAugspurger
Contributor

Hmm... thanks for the report.

I think the issue is a failure to call close in this block:

if is_zip:
    # zipfile doesn't support writing string to archive. uses string
    # buffer to receive csv writing and dump into zip compression
    # file handle. GH21241, GH21118
    f = StringIO()
    close = False
elif hasattr(self.path_or_buf, "write"):
    f = self.path_or_buf
    close = False
else:
    f, handles = get_handle(
        self.path_or_buf,
        self.mode,
        encoding=self.encoding,
        compression=dict(self.compression_args, method=self.compression),
    )
    close = True

try:
    # Note: self.encoding is irrelevant here
    self.writer = csvlib.writer(
        f,
        lineterminator=self.line_terminator,
        delimiter=self.sep,
        quoting=self.quoting,
        doublequote=self.doublequote,
        escapechar=self.escapechar,
        quotechar=self.quotechar,
    )

    self._save()
s3fs only makes the request to validate the credentials when the file is actually closed, which we don't seem to do here.

Are you interested in debugging further?
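A minimal, stdlib-only model of the behavior described in this comment. The class DeferredUploadFile is hypothetical: it stands in for a remote file object that buffers writes locally and only makes its (simulated) request in close(). It is not s3fs's actual implementation; it only shows why a writer that never closes its handle never sees the error.

```python
import csv
import io


class DeferredUploadFile:
    """Hypothetical stand-in for a remote file object: writes are buffered
    locally and the (simulated) request is only made in close()."""

    def __init__(self, authorized: bool) -> None:
        self.authorized = authorized
        self.uploaded = None  # set to the CSV text once the "upload" succeeds
        self.closed = False
        self._buffer = io.StringIO()

    def write(self, text: str) -> int:
        # No remote check here, so writing never fails.
        return self._buffer.write(text)

    def close(self) -> None:
        if self.closed:
            return
        self.closed = True
        # The simulated request happens only now, so this is the first
        # point at which a permission problem can be reported.
        if not self.authorized:
            raise PermissionError("Access Denied")
        self.uploaded = self._buffer.getvalue()

    def __enter__(self) -> "DeferredUploadFile":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # never suppress exceptions


def write_rows(f, rows) -> None:
    csv.writer(f).writerows(rows)


rows = [["A", "B"], [1, 2]]

# Handle is never closed: no exception, and nothing is "uploaded".
leaky = DeferredUploadFile(authorized=False)
write_rows(leaky, rows)
print(leaky.uploaded)  # None

# Handle is closed by the with-block: the failure becomes visible.
strict = DeferredUploadFile(authorized=False)
try:
    with strict:
        write_rows(strict, rows)
except PermissionError as exc:
    print(f"upload failed: {exc}")
```

The only difference between the two cases is whether close() runs, which matches the "missing close" explanation: the write path itself has nothing to report.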

@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Mar 6, 2020
@TomAugspurger TomAugspurger added the IO CSV read_csv, to_csv label Mar 6, 2020
@adamhadani

👍 Just ran into this myself. This is a very dangerous silent failure.

@TomAugspurger
Contributor

See also #32470. Can anyone see if reverting the relevant changes in d6fe194#diff-a37b395bed03f0404dec864a4529c97d (at least for write mode) is doable?

@jreback jreback modified the milestones: Contributions Welcome, 1.1 Apr 20, 2020
@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Apr 20, 2020
@simonjayhawkins simonjayhawkins modified the milestones: 1.1, 1.0.4 May 26, 2020
@ron-labau-inttra

I just ran into this silent failure too. I noticed a difference: when I used only the bucket name in path_or_buf, I received the anonymous access error message, but when I added a "sub directory", I received no message at all and it failed silently.

Received error message:
df.to_csv(path_or_buf="s3://my-bucket/{idx}.csv".format(idx=keyDate),...

Did not receive error message:
df.to_csv(path_or_buf="s3://my-bucket/csv/{idx}.csv".format(idx=keyDate),...

@shlomi-viz

Any updates on this? How can I prevent this silent error? Thanks.

@shlomi-viz

Upgrading to pandas==1.2.4 fixed it for me; now I get the following error:

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateBucket operation: Access Denied
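For pandas versions that predate the fix, one workaround consistent with the explanation earlier in the thread is to open the target yourself and always close it, so that any error raised on close() reaches your code instead of being lost. This is a sketch, not a pandas or s3fs API: write_and_close is a hypothetical helper, and the only assumption about the target is that it is a writable text file object with a close() method.

```python
from contextlib import closing
from typing import IO, Callable


def write_and_close(
    write: Callable[[IO[str]], object],
    opener: Callable[[], IO[str]],
) -> None:
    """Open a file with `opener`, pass it to `write`, then close it.

    `write` is any callable that writes to a file-like object (for example
    DataFrame.to_csv); `opener` returns a fresh writable text file.
    """
    # closing() calls f.close() on exit, even if write() raised, and does not
    # suppress an exception raised by close() itself.
    with closing(opener()) as f:
        write(f)


# Example usage (hypothetical path; assumes an s3fs version whose
# S3FileSystem.open() accepts the text mode "w"):
#   import s3fs
#   fs = s3fs.S3FileSystem()
#   write_and_close(df.to_csv, lambda: fs.open("s3://my-bucket/csv/out.csv", "w"))
```

Because the handle is owned and closed by the caller, a failure that is only detected at close time should propagate as an exception rather than being swallowed.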


7 participants