ENH: Add ability to specify AWS port through AWS_S3_PORT environment variable #16662

Closed
ivansabik opened this issue Jun 11, 2017 · 2 comments

ivansabik commented Jun 11, 2017

When integrating pandas CSV reading from S3 into local development (in Docker) with containers such as LocalStack or Minio, we need to be able to define a custom port as well as a custom host.

PR #12198 introduces the AWS_S3_HOST environment variable; I propose adding a matching AWS_S3_PORT. Something like:

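import os
import boto

s3_host = os.environ.get('AWS_S3_HOST', 's3.amazonaws.com')
# Environment values are strings, so convert the port to an int when it is set.
s3_port = os.environ.get('AWS_S3_PORT')
s3_port = int(s3_port) if s3_port else None

try:
    conn = boto.connect_s3(host=s3_host, port=s3_port)
except boto.exception.NoAuthHandlerFound:
    # No credentials found: fall back to an anonymous connection.
    conn = boto.connect_s3(host=s3_host, port=s3_port, anon=True)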

This would make it possible to define something like the following in docker-compose.yml, using Minio to serve the CSV files from a local S3 in development and AWS in production:

environment:
  - AWS_ACCESS_KEY_ID=supersecret
  - AWS_SECRET_ACCESS_KEY=supersecret
  - AWS_S3_HOST=s3local
  - AWS_S3_PORT=9000
  - S3_USE_SIGV4=True

This only applies to pandas 0.18.X and 0.19.X, since 0.20.X uses s3fs. I would be willing to submit a PR for this.
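As a minimal sketch of how the two variables could be turned into connection arguments, the helper below reads AWS_S3_HOST and the proposed AWS_S3_PORT from a mapping and returns keyword arguments suitable for a call like boto.connect_s3(**kwargs). The function name boto_s3_kwargs is hypothetical and is not part of pandas or boto; the port is simply left out when the variable is unset.

```python
import os


def boto_s3_kwargs(environ=os.environ):
    """Hypothetical helper: build host/port keyword arguments from the environment."""
    kwargs = {'host': environ.get('AWS_S3_HOST', 's3.amazonaws.com')}
    port = environ.get('AWS_S3_PORT')
    if port:
        # Environment values are strings; convert so the port is numeric.
        kwargs['port'] = int(port)
    return kwargs
```

Passing a plain dict instead of os.environ makes the parsing easy to check without touching the real environment or opening a connection.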

jreback (Contributor) commented Jun 11, 2017

we don't offer backports for any version before the last major one (0.20)

jreback closed this as completed Jun 11, 2017

ivansabik (Author) commented Jun 11, 2017

For the record, I ended up using a workaround with s3fs along with the change introduced in fsspec/s3fs#69:

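import pandas as pd
from s3fs.core import S3FileSystem

# Point the client at the local S3-compatible endpoint instead of AWS.
client_kwargs = {'endpoint_url': 'http://s3:9000'}
s3 = S3FileSystem(anon=False, client_kwargs=client_kwargs)
# Compression is not inferred from an open file handle, so set it explicitly.
with s3.open('s3://bucket/file.csv.gz', mode='rb') as f:
    df = pd.read_csv(f, compression='gzip')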
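The workaround hard-codes the endpoint. A possible variation, sketched here under assumptions, builds the same client_kwargs from the AWS_S3_HOST and AWS_S3_PORT variables so that one code path serves both local development and production. The helper name s3fs_client_kwargs and the scheme parameter (defaulting to plain HTTP, as in the URL above) are hypothetical and not part of s3fs or pandas.

```python
import os


def s3fs_client_kwargs(environ=os.environ, scheme='http'):
    """Hypothetical helper: derive client_kwargs for S3FileSystem from the environment."""
    host = environ.get('AWS_S3_HOST')
    if not host:
        # No custom host configured: pass nothing and keep the default AWS endpoint.
        return {}
    port = environ.get('AWS_S3_PORT')
    netloc = '{}:{}'.format(host, int(port)) if port else host
    return {'endpoint_url': '{}://{}'.format(scheme, netloc)}


# Intended use (requires s3fs, so it is left as a comment here):
# s3 = S3FileSystem(anon=False, client_kwargs=s3fs_client_kwargs())
```

Returning an empty dict when no host is set means production needs no extra configuration, while the docker-compose environment only has to define the host and port.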
