
Add ability to specify AWS s3 host through AWS_S3_HOST environment variable #12198


Closed
wants to merge 1 commit into from

Conversation

mlurie

@mlurie mlurie commented Feb 1, 2016

The AWS S3 host used by read_csv currently defaults to s3.amazonaws.com, and there is no way to change it. This pull request adds a check of the AWS_S3_HOST environment variable for a user-specified host, falling back to the current s3.amazonaws.com default.
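The lookup described above can be sketched as follows (`DEFAULT_S3_HOST` and `get_s3_host` are illustrative names for this sketch, not identifiers from the actual patch):

```python
import os

# Default endpoint used when AWS_S3_HOST is not set (US East region).
DEFAULT_S3_HOST = "s3.amazonaws.com"

def get_s3_host():
    # Prefer a user-specified host from the environment, falling back
    # to the standard default endpoint.
    return os.environ.get("AWS_S3_HOST", DEFAULT_S3_HOST)
```

The resolved host can then be passed to boto when opening the connection, so users in other regions can point pandas at their bucket's regional endpoint.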

This is a simple code change, but it allows users around the globe to use read_csv to read files directly from S3 when the bucket is not located in the US East region.

@TomAugspurger
Contributor

Can this be configured through boto? I think ~/.aws/credentials can have a region key. It'd be nice to use their config system rather than trying to work around in pandas.
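For reference, the boto-style configuration being discussed would look roughly like this (paths and values are placeholders, and as noted below boto did not actually honor a host override here):

```ini
; ~/.aws/credentials (placeholder values)
[default]
aws_access_key_id = YOUR_KEY_ID
aws_secret_access_key = YOUR_SECRET
region = eu-west-1
```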

@mlurie
Author

mlurie commented Feb 1, 2016

Unfortunately, it can't. It can't be set as an option in the .boto configuration either. Other packages have implemented a similar workaround.

@jreback jreback added the IO Data IO issues that don't fit into a more specific label label Feb 1, 2016
@@ -274,14 +274,15 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
import boto
except:
raise ImportError("boto is required to handle s3 files")
-        # Assuming AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
+        # Assuming AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_S3_HOST
Contributor

Can we document this in some doc-strings and/or a separate section in the docs? (It may be too long for the actual doc-strings, so it might be better to make a section on how to do S3 access and just point to it.)

Author

I agree, it would be nice to have S3 documented in a dedicated section. Can that be its own GH issue so that this feature/bug can be pushed through faster?

I am not very familiar with the Pandas doc-string/documentation strategy for the read_* functions, so this may be something for a core contributor to work on.

Contributor

I would like at least a little section documenting this new env variable, so can you add a simple section to the docs (which can be built out later)?
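A minimal docs section along those lines might read as follows (the wording is a sketch, not the text that was actually merged):

```rst
S3 file handling
~~~~~~~~~~~~~~~~

pandas uses ``boto`` to read files directly from S3. By default the host
``s3.amazonaws.com`` is used; buckets outside the US East region can be
reached by setting the ``AWS_S3_HOST`` environment variable to the
appropriate regional endpoint.
```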

@jreback jreback added this to the 0.18.0 milestone Feb 2, 2016
@jreback jreback closed this in 63abbe4 Feb 2, 2016
@jreback
Contributor

jreback commented Feb 2, 2016

@mlurie ok, thanks for this

see #12206 for adding more docs w.r.t. s3 and options that we support
