IO: Fix S3 Error Handling #33645

Merged: 8 commits merged into pandas-dev:master on Apr 21, 2020

Conversation

alimcmaster1
Member

@alimcmaster1 commented Apr 19, 2020

closes #27679
closes #32486

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@alimcmaster1 added the IO CSV (read_csv, to_csv) and IO Parquet (parquet, feather) labels Apr 19, 2020
@@ -62,7 +62,7 @@ def __init__(
         # Extract compression mode as given, if dict
         compression, self.compression_args = get_compression_method(compression)

-        self.path_or_buf, _, _, _ = get_filepath_or_buffer(
+        self.path_or_buf, _, _, self.should_close = get_filepath_or_buffer(
Member Author

@TomAugspurger - your pointer here was exactly the problem. #32486 (comment)
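For context, a minimal sketch of the pattern this change enables (simplified; the wrapper function is illustrative, not the actual CSVFormatter code, and `get_filepath_or_buffer` is as exposed in the pandas version this PR targets):

```python
from pandas.io.common import get_filepath_or_buffer

def write_with_cleanup(df, path):
    # get_filepath_or_buffer returns a 4-tuple; should_close indicates
    # whether it opened the handle itself (e.g. an s3fs file object),
    # making the caller responsible for closing it.
    buf, _, _, should_close = get_filepath_or_buffer(path, mode="w")
    try:
        df.to_csv(buf)
    finally:
        # s3fs defers the actual upload until close(), so closing here
        # is what lets credential/permission errors reach the caller.
        if should_close:
            buf.close()
```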

@@ -92,7 +92,7 @@ def write(
         **kwargs,
     ):
         self.validate_dataframe(df)
-        path, _, _, _ = get_filepath_or_buffer(path, mode="wb")
+        path, _, _, should_close = get_filepath_or_buffer(path, mode="wb")
Member Author

For some reason we only have this logic in read; we should add it to write too.

Contributor

hmm, can you open an issue for this

Member Author

For which bit? The file basically needs to be closed post-write for s3fs to throw as expected. @TomAugspurger mentioned it here: #32486 (comment)
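To illustrate the underlying behaviour: s3fs buffers writes and only performs the real S3 request when the handle is closed, so without an explicit close the error never surfaces. A rough sketch (the bucket name is the fake one used in the tests):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
try:
    # Before this fix the write appeared to succeed, because the s3fs
    # handle was never closed and the upload was never attempted.
    df.to_csv("s3://an_s3_bucket_data_doesnt_exit/not_real.csv")
except Exception as err:
    # With the fix, the s3fs/botocore error (e.g. NoCredentialsError)
    # propagates to the caller.
    print(type(err).__name__)
```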

Contributor

> For some reason we only have this logic in read; we should add it to write too.

your comment here.

@WillAyd added this to the 1.1 milestone Apr 19, 2020
@WillAyd
Member

WillAyd commented Apr 19, 2020

Seems pretty reasonable to me if green. Can you add a whatsnew?

@@ -56,7 +56,15 @@ def open(*args):

     monkeypatch.setattr("gcsfs.GCSFileSystem", MockGCSFileSystem)
     df1.to_csv("gs://test/test.csv", index=True)
-    df2 = read_csv(StringIO(s.getvalue()), parse_dates=["dt"], index_col=0)
Member Author

This no longer works, since the file needs to be closed to validate credentials. #32486 (comment)
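Roughly, the mock now has to capture the written bytes at close time rather than peeking at a live buffer. A simplified sketch of that idea (class and attribute names are illustrative, not the exact test code):

```python
from io import BytesIO

class TrackingBuffer(BytesIO):
    def __init__(self, store, path):
        super().__init__()
        self.store = store
        self.path = path

    def close(self):
        # Persist the payload at close time, mirroring how the real
        # filesystem only commits the data on close().
        self.store[self.path] = self.getvalue()
        super().close()

class MockGCSFileSystem:
    contents = {}  # shared so a later open() can read back the data

    def open(self, path, mode="r", *args):
        if "w" in mode:
            return TrackingBuffer(MockGCSFileSystem.contents, path)
        return BytesIO(MockGCSFileSystem.contents[path])
```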

@@ -86,28 +94,6 @@ def open(self, path, mode="r", *args):
 )


-@td.skip_if_no("gcsfs")
-def test_gcs_get_filepath_or_buffer(monkeypatch):
Member Author

Removed, since test_to_csv_gcs now tests the same functionality and uses an identical monkeypatch.

         tips_df.to_csv("s3://an_s3_bucket_data_doesnt_exit/not_real.csv")

+    def test_write_s3_parquet_fails(self, tips_df):
+        # GH 27679
Contributor

Does this need an importorskip, or do we raise prior to importing the engine?

Member Author

Added an importorskip. Without it, this would raise ImportError if PyArrow or fastparquet isn’t installed.

Contributor

can you add the decorator version instead

Member Author

Have done, and changed the other importorskip calls in this file.
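For reference, the two guard styles discussed above (test bodies elided; `td` is `pandas.util._test_decorators`, and the "pyarrow" requirement here is illustrative):

```python
import pytest
import pandas.util._test_decorators as td

# Inline form: skips at runtime, inside the test body.
def test_write_s3_parquet_fails_inline(tips_df):
    pytest.importorskip("pyarrow")
    ...

# Decorator form, as requested in review: the requirement is visible
# on the test itself and applied at collection time.
@td.skip_if_no("pyarrow")
def test_write_s3_parquet_fails(tips_df):
    ...
```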

@alimcmaster1
Member Author

Thanks for the review @TomAugspurger

Contributor

@jreback left a comment

lgtm. some minor comments. ping on green.

@alimcmaster1
Member Author

green @jreback

@jreback merged commit 6b2dd37 into pandas-dev:master Apr 21, 2020
@jreback
Contributor

jreback commented Apr 21, 2020

thanks @alimcmaster1

@alimcmaster1 deleted the mcmali-csv-s3 branch April 27, 2020 23:47
rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request May 10, 2020
simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request May 14, 2020
simonjayhawkins added a commit that referenced this pull request May 14, 2020
@simonjayhawkins modified the milestones: 1.1, 1.0.4 May 26, 2020
Labels
IO CSV (read_csv, to_csv), IO Parquet (parquet, feather)
Development

Successfully merging this pull request may close these issues.

to_csv fails silently with s3fs
to_parquet swallows NoCredentialsError
5 participants