Skip to content

[WIP]REGR: Fixed reading from public S3 buckets with credentials #34866

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

TomAugspurger
Copy link
Contributor

Closes #34626

This works in 1.0.4 I think, so no whatsnew.

I'd like to wait for #34266 to get in first. I'll need to update things.

And in the future, I'd like to deprecate this behavior in favor of something like

pd.read_csv("s3://path/to/key", storage_options={"anon": True})

It should be easy to detect the cases when we need to do that. We just don't have the storage_options kwarg yet.

Closes pandas-dev#34626

This works in 1.0.4 I think, so no whatsnew.
@TomAugspurger TomAugspurger added this to the 1.1 milestone Jun 18, 2020
@TomAugspurger TomAugspurger added the IO CSV read_csv, to_csv label Jun 18, 2020
@TomAugspurger TomAugspurger changed the title REGR: Fixed reading from public S3 buckets with credentials [WIP]REGR: Fixed reading from public S3 buckets with credentials Jun 18, 2020
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment otherwise lgtm

@pytest.mark.slow
def test_read_s3_public():
# ensure we can read from a public bucket with credentials
pytest.importorskip("s3fs")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the skip_if_no decorator from pandas/util/_test_decorators here instead?

@alimcmaster1
Copy link
Member

Ahh just realised we are looking at the same thing @TomAugspurger . I think the case #34626 is trying to address is reading without AWS Credentials right? I've put a test together for this: #34877

@jorisvandenbossche
Copy link
Member

The credentials @TomAugspurger is adding here are dummy (non-working) credentials to avoid a moto failure. But: since this is reading from an actual s3, and nothing is being mocked, moto should not be involved here?

@alimcmaster1
Copy link
Member

The credentials @TomAugspurger is adding here are dummy (non-working) credentials to avoid a moto failure. But: since this is reading from an actual s3, and nothing is being mocked, moto should not be involved here?

Agree - think it should be as simple as: https://github.com/pandas-dev/pandas/pull/34877/files#diff-681243e864617f600083a270b329d17dR36 - unless i'm missing something?

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jun 20, 2020 via email

@alimcmaster1
Copy link
Member

@TomAugspurger - my understanding was:

import pandas as pd
res = pd.read_csv("s3://gdelt-open-data/events/1981.csv", nrows=3)
  1. With AWS Credentials in my Env:
    pandas 1.0.3 & 1.0.4 -> Works

  2. Without AWS Credentials in Env
    1.0.3 - Works
    1.0.4 - NoCredentialsError: Unable to locate credentials

I think (2) was the regression in #34626 - hence my PR here

But also think we should merge the case you are testing here ~ Dummy AWS Creds since maybe that tests (1) here?

@TomAugspurger
Copy link
Contributor Author

Sounds good. Can you add the tests from my PR and I'll just close mine?

Regardless, I think we should wait till #34266 is in.

@alimcmaster1
Copy link
Member

alimcmaster1 commented Jun 22, 2020

Sounds good. Can you add the tests from my PR and I'll just close mine?

Regardless, I think we should wait till #34266 is in.

Sure and apologies for the mix up. I've added this test to the PR here: #34877

Agree re waiting for fsspec.

@TomAugspurger
Copy link
Contributor Author

Closing in favor of #34877.

@TomAugspurger TomAugspurger deleted the s3-test branch June 23, 2020 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: s3 reads from public buckets not working
4 participants