AWS_S3_HOST environment variable no longer available #26195
Comments
This is pretty similar to #16662 - I don't think this is something we'd generally offer support for, so using the file system directly as you have in your solution is probably the way to go. Pinging @TomAugspurger in case he has differing thoughts.
+1 for this feature. It would be good to support a customised S3 endpoint, as this is widely used. I have created a pull request to support this.
Closing because, as I gather from the discussion in #29050, this would be out of scope for pandas (please ping me if I've misunderstood and I'll reopen).
You can do this now with storage_options. See how we do it in our tests: pandas/tests/io/conftest.py, lines 36 to 67 (at commit 132e191).
If anyone wants to turn that into a doc example, we'd welcome it.
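For anyone landing here before that doc example exists, here is a minimal sketch of the storage_options approach (available in the pandas readers since roughly pandas 1.2). The bucket name, credentials, and local MinIO-style endpoint below are all hypothetical placeholders:

```python
import pandas as pd

# Read a CSV from an S3-compatible service by pointing s3fs at a
# custom endpoint; pandas forwards storage_options to fsspec/s3fs.
df = pd.read_csv(
    "s3://my-bucket/data.csv",  # hypothetical bucket and key
    storage_options={
        "key": "my-access-key",       # placeholder credentials
        "secret": "my-secret-key",
        "client_kwargs": {"endpoint_url": "http://localhost:9000"},
    },
)
print(df.head())
```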
take |
Hi @TomAugspurger, I just wanted to clarify a few things as I am taking the issue. I was thinking of updating this pandas user_guide section; does this look like the right place to you? Also, is the fix that we set the endpoint URL in the s3so function, or that we explicitly define the environment variables (endpoint_port and endpoint_uri) in the s3_base function? In addition, it is not quite clear to me why the worker_id code is duplicated.
I think that's the right spot to document this. If I understand things correctly, this is documented in s3fs at https://s3fs.readthedocs.io/en/latest/index.html?highlight=host#s3-compatible-storage, so we're likely just summarizing that section and linking to it.
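The gist of that s3fs section, as a hedged sketch rather than a verbatim quote; the endpoint and credentials are placeholders:

```python
import s3fs

# Configure s3fs against an S3-compatible store (e.g. MinIO) by
# overriding the client endpoint; all values below are placeholders.
fs = s3fs.S3FileSystem(
    key="my-access-key",
    secret="my-secret-key",
    client_kwargs={"endpoint_url": "http://localhost:9000"},
)
print(fs.ls("my-bucket"))  # list objects in a hypothetical bucket
```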
In the What's New - Other enhancements section of the 0.18.0 release it says that you can define the environment variable `AWS_S3_HOST`. Therefore, when setting this environment variable together with `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, I expected the following code to work. The script crashes with the exact same error as when I don't specify the variable `AWS_S3_HOST` at all.

I managed to get a working solution by interacting directly with `s3fs`.

It looks like this feature got (accidentally?) removed in https://github.com/pandas-dev/pandas/commit/dc4b0708f36b971f71890bfdf830d9a5dc019c7b#diff-a37b395bed03f0404dec864a4529c97dL94, when pandas switched from `boto` to `s3fs`.

Is it planned to support this environment variable again? I'm sure that many companies, like the one I work for, host their own S3 server (e.g. with MinIO) and don't use Amazon.
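The reporter's working snippet is not reproduced above; as a sketch of what such an `s3fs` workaround typically looks like, with a placeholder endpoint, bucket, and credentials:

```python
import pandas as pd
import s3fs

# Bypass pandas' built-in S3 handling: open the object with s3fs
# directly and hand the file object to read_csv.
fs = s3fs.S3FileSystem(
    key="my-access-key",      # placeholder credentials
    secret="my-secret-key",
    client_kwargs={"endpoint_url": "http://localhost:9000"},  # e.g. MinIO
)
with fs.open("my-bucket/data.csv", "rb") as f:
    df = pd.read_csv(f)
```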