
Add ability to specify AWS s3 host through AWS_S3_HOST environment variable #12198


Closed
wants to merge 1 commit into from

Conversation

mlurie

@mlurie mlurie commented Feb 1, 2016

The AWS S3 host used by read_csv currently defaults to s3.amazonaws.com, and there is no way to change it. This pull request adds a check of the AWS_S3_HOST environment variable for a user-specified host, falling back to the current s3.amazonaws.com default.
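The lookup described above can be sketched as follows (`DEFAULT_S3_HOST` and `get_s3_host` are illustrative names for this sketch, not identifiers from the actual patch):

```python
import os

# Default endpoint used when AWS_S3_HOST is not set (US East region).
DEFAULT_S3_HOST = "s3.amazonaws.com"

def get_s3_host():
    # Prefer a user-specified host from the environment, falling back
    # to the standard default endpoint.
    return os.environ.get("AWS_S3_HOST", DEFAULT_S3_HOST)
```

The resolved host can then be passed to boto when opening the connection, so users in other regions can point pandas at their bucket's regional endpoint.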

This is a simple code change, but it allows users around the globe to use read_csv to read files directly from S3 when the bucket is not located in the US East region.

@TomAugspurger
Contributor

Can this be configured through boto? I think ~/.aws/credentials can have a region key. It'd be nice to use their config system rather than trying to work around in pandas.
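For reference, the boto-style configuration being discussed would look roughly like this (paths and values are placeholders, and as noted below boto did not actually honor a host override here):

```ini
; ~/.aws/credentials (placeholder values)
[default]
aws_access_key_id = YOUR_KEY_ID
aws_secret_access_key = YOUR_SECRET
region = eu-west-1
```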

@mlurie
Author

mlurie commented Feb 1, 2016

Unfortunately, it can't. It can't be set as an option in the .boto configuration either. Other packages have implemented a similar workaround.

@jreback jreback added the IO Data IO issues that don't fit into a more specific label label Feb 1, 2016
@@ -274,14 +274,15 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
import boto
except:
raise ImportError("boto is required to handle s3 files")
-        # Assuming AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
+        # Assuming AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_S3_HOST
Contributor

Can we document this in some doc-strings and/or a separate section in the docs? (It may be too long for the actual doc-strings, so it might be better to make a section on how to do S3 access and just point to it.)

Author

I agree, it would be nice to have S3 documented in a dedicated section. Can that be its own GH issue so that this feature/bug can be pushed through faster?

I am not very familiar with the Pandas doc-string/documentation strategy for the read_* functions, so this may be something for a core contributor to work on.

Contributor

I would like at least a little section documenting this new env variable, so can you add a simple section to the docs (which can be built out later)?
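A minimal docs section along those lines might read as follows (the wording is a sketch, not the text that was actually merged):

```rst
S3 file handling
~~~~~~~~~~~~~~~~

pandas uses ``boto`` to read files directly from S3. By default the host
``s3.amazonaws.com`` is used; buckets outside the US East region can be
reached by setting the ``AWS_S3_HOST`` environment variable to the
appropriate regional endpoint.
```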

@jreback jreback added this to the 0.18.0 milestone Feb 2, 2016
@jreback jreback closed this in 63abbe4 Feb 2, 2016
@jreback
Contributor

jreback commented Feb 2, 2016

@mlurie ok, thanks for this

see #12206 for adding more docs w.r.t. s3 and options that we support
