Skip to content

CI testing for s3 reads from public buckets on 1.0.x - DO NOT MERGE #34793

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

simonjayhawkins
Copy link
Member

xref #34626

@simonjayhawkins simonjayhawkins added this to the 1.0.5 milestone Jun 15, 2020
@simonjayhawkins
Copy link
Member Author

@jorisvandenbossche this works for me locally. If you can have a look at the failures in Travis and let me know if we're confident that #34626 is fixed on 1.0.x

@jorisvandenbossche
Copy link
Member

Does the test run on one of the azure builds as well?

@jorisvandenbossche
Copy link
Member

cc @TomAugspurger do you have any experience with this? On travis, the S3 public bucket reading is failing with boto / permission errors.

I checked out the 1.0.x branch locally, and can confirm it is working locally.

@TomAugspurger
Copy link
Contributor

Not sure. I notice that the two jobs with failing moto / boto issues have moto==1.3.7. https://travis-ci.org/github/pandas-dev/pandas/jobs/698478225#L1128 has 1.3.14. Perhaps we failed to backport a dependency update?

@jorisvandenbossche
Copy link
Member

But moto shouldn't be used for that test? (unless that is done automatically?)

@TomAugspurger
Copy link
Contributor

Ah I missed that this was for the public bucket. You're correct, I think moto shouldn't be involed here..

@TomAugspurger
Copy link
Contributor

When you're testing locally, do you run the whole file or just the single test? s3fs does cache things, including the S3FSFileSystem constructor, which might cause this.

@simonjayhawkins
Copy link
Member Author

Does the test run on one of the azure builds as well?

i'll create a failing test to make sure

@jorisvandenbossche
Copy link
Member

When you're testing locally, do you run the whole file or just the single test? s3fs does cache things, including the S3FSFileSystem constructor, which might cause this.

I just ran pd.read_csv("s3://nyc-tlc/misc/taxi _zone_lookup.csv") in a console

@jorisvandenbossche
Copy link
Member

But running it with pytest pandas/tests/io/test_s3.py also works

@simonjayhawkins
Copy link
Member Author

if we get the tests passing, does pandas have an amazon account where we can place a test file in a public bucket?

@simonjayhawkins
Copy link
Member Author

hmm travis didn't run.

Does the test run on one of the azure builds as well?

on 3 azure builds, will revert the fail shortly

This reverts commit c7be2cd.
@jorisvandenbossche
Copy link
Member

OK, if it thus was passing on azure, I think we can be confident enough for the 1.0.5 release that this is properly working. And can then figure out the travis failure later when we fix this in master

@simonjayhawkins
Copy link
Member Author

OK, if it thus was passing on azure, I think we can be confident enough for the 1.0.5 release that this is properly working. And can then figure out the travis failure later when we fix this in master

OK closing

@jorisvandenbossche
Copy link
Member

We could also already merge it (but then in the master branch), with an xfail, if you want

@simonjayhawkins
Copy link
Member Author

i'm not sure about the ethics of using someones public bucket for testing

@simonjayhawkins simonjayhawkins deleted the test-for-34626 branch June 15, 2020 17:51
@alimcmaster1
Copy link
Member

alimcmaster1 commented Jun 16, 2020

Ref my comment here: #34626 (comment)

I couldn’t get a test for public buckets working using moto.

I propose we use an Amazon Open Data public s3 bucket to test https://aws.amazon.com/opendata/public-datasets/

We can use the tm.network decorator? Would be helpful to catch future regressions - especially since we are moving to fsspec. Thoughts @jorisvandenbossche @simonjayhawkins

I can open a PR.

@jorisvandenbossche
Copy link
Member

Using an existing public s3 bucket seems fine for this (we should just find a small dataset).
And indeed, we have network tests, so we can just test this for real, without moto, with the network decorator. Thanks @alimcmaster1 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants