IO: Fix S3 Error Handling #33645

Merged: 8 commits merged into pandas-dev:master on Apr 21, 2020

Conversation

alimcmaster1
Member

@alimcmaster1 commented Apr 19, 2020

closes #27679
closes #32486

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@alimcmaster1 added the IO CSV (read_csv, to_csv) and IO Parquet (parquet, feather) labels Apr 19, 2020
@@ -62,7 +62,7 @@ def __init__(
         # Extract compression mode as given, if dict
         compression, self.compression_args = get_compression_method(compression)

-        self.path_or_buf, _, _, _ = get_filepath_or_buffer(
+        self.path_or_buf, _, _, self.should_close = get_filepath_or_buffer(
Member Author

@TomAugspurger - your pointer here was exactly the problem. #32486 (comment)
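For context, a minimal sketch of the pattern this change enables (simplified; the wrapper function is illustrative, not the actual CSVFormatter code, and `get_filepath_or_buffer` is as exposed in the pandas version this PR targets):

```python
from pandas.io.common import get_filepath_or_buffer

def write_with_cleanup(df, path):
    # get_filepath_or_buffer returns a 4-tuple; should_close indicates
    # whether it opened the handle itself (e.g. an s3fs file object),
    # making the caller responsible for closing it.
    buf, _, _, should_close = get_filepath_or_buffer(path, mode="w")
    try:
        df.to_csv(buf)
    finally:
        # s3fs defers the actual upload until close(), so closing here
        # is what lets credential/permission errors reach the caller.
        if should_close:
            buf.close()
```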

@@ -92,7 +92,7 @@ def write(
         **kwargs,
     ):
         self.validate_dataframe(df)
-        path, _, _, _ = get_filepath_or_buffer(path, mode="wb")
+        path, _, _, should_close = get_filepath_or_buffer(path, mode="wb")
Member Author

For some reason we only have this logic in read; we should add it to write too.

Contributor

hmm, can you open an issue for this

Member Author

For which bit? The file basically needs to be closed post-write for s3fs to throw as expected. @TomAugspurger mentioned it here: #32486 (comment)
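To illustrate the underlying behaviour: s3fs buffers writes and only performs the real S3 request when the handle is closed, so without an explicit close the error never surfaces. A rough sketch (the bucket name is the fake one used in the tests):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
try:
    # Before this fix the write appeared to succeed, because the s3fs
    # handle was never closed and the upload was never attempted.
    df.to_csv("s3://an_s3_bucket_data_doesnt_exit/not_real.csv")
except Exception as err:
    # With the fix, the s3fs/botocore error (e.g. NoCredentialsError)
    # propagates to the caller.
    print(type(err).__name__)
```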

Contributor

> For some reason we only have this logic in read; we should add it to write too.

your comment here.

@WillAyd added this to the 1.1 milestone Apr 19, 2020
@WillAyd
Member

WillAyd commented Apr 19, 2020

Seems pretty reasonable to me if green. Can you add a whatsnew?

@@ -56,7 +56,15 @@ def open(*args):

     monkeypatch.setattr("gcsfs.GCSFileSystem", MockGCSFileSystem)
     df1.to_csv("gs://test/test.csv", index=True)
-    df2 = read_csv(StringIO(s.getvalue()), parse_dates=["dt"], index_col=0)
Member Author

This no longer works, since the file needs to be closed to validate credentials. #32486 (comment)
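Roughly, the mock now has to capture the written bytes at close time rather than peeking at a live buffer. A simplified sketch of that idea (class and attribute names are illustrative, not the exact test code):

```python
from io import BytesIO

class TrackingBuffer(BytesIO):
    def __init__(self, store, path):
        super().__init__()
        self.store = store
        self.path = path

    def close(self):
        # Persist the payload at close time, mirroring how the real
        # filesystem only commits the data on close().
        self.store[self.path] = self.getvalue()
        super().close()

class MockGCSFileSystem:
    contents = {}  # shared so a later open() can read back the data

    def open(self, path, mode="r", *args):
        if "w" in mode:
            return TrackingBuffer(MockGCSFileSystem.contents, path)
        return BytesIO(MockGCSFileSystem.contents[path])
```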

@@ -86,28 +94,6 @@ def open(self, path, mode="r", *args):
 )


-@td.skip_if_no("gcsfs")
-def test_gcs_get_filepath_or_buffer(monkeypatch):
Member Author

Removed, since test_to_csv_gcs now tests the same functionality and uses an identical monkeypatch.

         tips_df.to_csv("s3://an_s3_bucket_data_doesnt_exit/not_real.csv")

+    def test_write_s3_parquet_fails(self, tips_df):
+        # GH 27679
Contributor

Does this need an importorskip, or do we raise prior to importing the engine?

Member Author

Added an importorskip. Without it, this would raise ImportError if PyArrow or fastparquet isn’t installed.

Contributor

can you add the decorator version instead

Member Author

Have done, and changed the other importorskip calls in this file.
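For reference, the two guard styles discussed above (test bodies elided; `td` is `pandas.util._test_decorators`, and the "pyarrow" requirement here is illustrative):

```python
import pytest
import pandas.util._test_decorators as td

# Inline form: skips at runtime, inside the test body.
def test_write_s3_parquet_fails_inline(tips_df):
    pytest.importorskip("pyarrow")
    ...

# Decorator form, as requested in review: the requirement is visible
# on the test itself and applied at collection time.
@td.skip_if_no("pyarrow")
def test_write_s3_parquet_fails(tips_df):
    ...
```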

@alimcmaster1
Member Author

Thanks for the review @TomAugspurger

Contributor

@jreback left a comment

lgtm. some minor comments. ping on green.

@alimcmaster1
Member Author

green @jreback

@jreback merged commit 6b2dd37 into pandas-dev:master Apr 21, 2020
@jreback
Contributor

jreback commented Apr 21, 2020

thanks @alimcmaster1

@alimcmaster1 deleted the mcmali-csv-s3 branch April 27, 2020 23:47
rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request May 10, 2020
simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request May 14, 2020
simonjayhawkins added a commit that referenced this pull request May 14, 2020
@simonjayhawkins modified the milestones: 1.1, 1.0.4 May 26, 2020
Labels
IO CSV (read_csv, to_csv), IO Parquet (parquet, feather)
Development

Successfully merging this pull request may close these issues.

to_csv fails silently with s3fs
to_parquet swallows NoCredentialsError
5 participants