Skip to content

DEPR/Community: spin off s3 support #45433

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Jan 17, 2022 · 5 comments
Closed

DEPR/Community: spin off s3 support #45433

jbrockmendel opened this issue Jan 17, 2022 · 5 comments
Labels
CI Continuous Integration Deprecate Functionality to remove in pandas IO Network Local or Cloud (AWS, GCS, etc.) IO Issues

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Jan 17, 2022

A disproportionate amount of our CI failures are in s3 tests e.g. test_with_s3_url. Moreover we have 2 xfailed tests with strict=False: test_write_s3_parquet_fails and test_write_s3_csv_fails xref #39155 where AFAICT we have no idea how to fix them.

This suggests we might not be the right team to maintain this feature. Is there a better home for this?

(The other big pain point CI-wise is the test_user_agent tests which cause all the timeouts)

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 17, 2022
@twoertwein
Copy link
Member

Pandas uses fsspec to support s3 and all other kinds of protocols. Instead of testing s3 we could use any(?) other fsspec protocol that seems more reliable.

@jreback
Copy link
Contributor

jreback commented Jan 18, 2022

this sounds like a great idea

IIRC we have a lot of these tests prior to using fssc so certainly can use a non network protocol

@mroeschke
Copy link
Member

mroeschke commented Jan 18, 2022

I think the specific reason why those s3 tests fail is because the s3_resource fixture populates and deletes files from "s3" (moto) after each test, and in a muti-process CI environment this leads to flaky race condition errors. Running these tests in a single process environment would probably fix them: #44584

That being said, +1 to just using fsspec instead of s3f3 & gcsfs

@mroeschke mroeschke added CI Continuous Integration IO Network Local or Cloud (AWS, GCS, etc.) IO Issues Deprecate Functionality to remove in pandas and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 18, 2022
@twoertwein
Copy link
Member

twoertwein commented Jan 18, 2022

If #44584 doesn't work out:

Migrating frequently failing s3 tests to other fsspec protocols sounds like a good idea to me. I think we shouldn't remove all s3/gcfs tests:

  • There seems to be a bit of s3-specific code:
    if filepath_or_buffer.startswith("s3a://"):
  • And it might be good to keep at least four s3 tests: read_csv/to_csv both w/wo compression to make sure that text/binary mode is correctly inferred.

@jbrockmendel
Copy link
Member Author

No momentum for spin-off, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Deprecate Functionality to remove in pandas IO Network Local or Cloud (AWS, GCS, etc.) IO Issues
Projects
None yet
Development

No branches or pull requests

4 participants