Support customised S3 servers endpoint URL #29050

Closed
wants to merge 5 commits into from
15 changes: 12 additions & 3 deletions pandas/io/s3.py
@@ -1,5 +1,6 @@
 """ s3 support for remote file interactivity """
-from typing import IO, Any, Optional, Tuple
+import os
+from typing import IO, Any, Dict, Optional, Tuple
 from urllib.parse import urlparse as parse_url

 from pandas.compat._optional import import_optional_dependency
@@ -25,7 +26,15 @@ def get_file_and_filesystem(
     if mode is None:
         mode = "rb"

-    fs = s3fs.S3FileSystem(anon=False)
+    # Support customised S3 servers endpoint URL via environment variable
+    # The S3_ENDPOINT should be the complete URL to S3 service following
+    # the format: http(s)://{host}:{port}. If S3_ENDPOINT is undefined, it will
+    # fallback to use the default AWS S3 endpoint as determined by boto3.
+    s3_endpoint = os.environ.get("S3_ENDPOINT")
+    client_kwargs: Optional[Dict[str, str]] = {
Contributor

Wouldn't this be better as a feature added directly to s3fs?

Author

Well, this is an awkward situation. s3fs and boto3 already support a customised endpoint: they accept the endpoint_url parameter via client_kwargs. From their point of view there is no reason to add environment variable support for this parameter, as discussed on this boto3 issue.

The problem with Pandas now is that it doesn't provide a way to pass this parameter over to s3fs. I think adding support using this environment variable will benefit many people using their own s3 servers.
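
As a rough illustration of the mapping being discussed, the sketch below turns an environment variable into the `client_kwargs` dictionary that s3fs forwards to boto3. The helper name `client_kwargs_from_env` is hypothetical and not part of pandas, s3fs, or boto3; `S3_ENDPOINT` is the variable name proposed in this PR, which neither library reads on its own.

```python
import os
from typing import Dict, Mapping, Optional


def client_kwargs_from_env(
    environ: Mapping[str, str] = os.environ,
) -> Optional[Dict[str, str]]:
    """Build client_kwargs from S3_ENDPOINT (hypothetical helper).

    Returns None when the variable is unset or empty, so the caller
    keeps the default endpoint chosen by boto3.
    """
    endpoint = environ.get("S3_ENDPOINT")
    return {"endpoint_url": endpoint} if endpoint else None


if __name__ == "__main__":
    # The URL below is only an example of the http(s)://{host}:{port} format.
    print(client_kwargs_from_env({"S3_ENDPOINT": "http://localhost:9000"}))
    print(client_kwargs_from_env({}))
```

The result would then be handed to `s3fs.S3FileSystem(anon=False, client_kwargs=...)`, which is what the diff in this PR does inline.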

Contributor

Why wouldn't using the environment variable as the default endpoint_url go in s3fs, to benefit even more people?

+        "endpoint_url": s3_endpoint
+    } if s3_endpoint else None
+    fs = s3fs.S3FileSystem(anon=False, client_kwargs=client_kwargs)
     try:
         file = fs.open(_strip_schema(filepath_or_buffer), mode)
     except (FileNotFoundError, NoCredentialsError):
@@ -35,7 +44,7 @@ def get_file_and_filesystem(
         # aren't valid for that bucket.
         # A NoCredentialsError is raised if you don't have creds
         # for that bucket.
-        fs = s3fs.S3FileSystem(anon=True)
+        fs = s3fs.S3FileSystem(anon=True, client_kwargs=client_kwargs)
         file = fs.open(_strip_schema(filepath_or_buffer), mode)
     return file, fs
