[WIP] Add remote file io using fsspec. #33549
import fsspec
scheme = parse_url(filepath_or_buffer).scheme
filesystem = fsspec.filesystem(scheme)
file_obj = filesystem.open(filepath_or_buffer, mode=mode or "rb")
Do we need to pass the encoding through to .open()? If you do, you will potentially also fix #26124 :)
If using filesystem.open directly, I would for now always open binary and use the existing encoding within pandas.
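A minimal sketch of that suggestion, assuming the remote file is always opened in binary mode and decoding happens on the caller's side. `wrap_binary_handle` is a hypothetical helper, not a pandas or fsspec function, and `io.BytesIO` stands in for the object that `filesystem.open(..., mode="rb")` would return.

```python
import io


def wrap_binary_handle(binary_handle, encoding=None):
    """Hypothetical helper: decode a binary handle outside the filesystem layer."""
    if encoding is None:
        # No encoding requested: hand the bytes stream back untouched.
        return binary_handle
    # newline="" keeps line endings as-is so a CSV parser can handle them itself.
    return io.TextIOWrapper(binary_handle, encoding=encoding, newline="")


# Stand-in for a remote file opened with mode="rb" (assumption for illustration).
raw = io.BytesIO("a,b\n1,é\n".encode("latin-1"))
text_handle = wrap_binary_handle(raw, encoding="latin-1")
content = text_handle.read()
```

Keeping the filesystem call binary-only means the encoding logic lives in one place, regardless of which backend served the bytes.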
Have you been able to test this out manually? Do things seem OK?
if is_s3_url(filepath_or_buffer):
    from pandas.io import s3
if is_fsspec_url(filepath_or_buffer):
    import fsspec
This can be import_optional_dependency("fsspec"). Make sure we have a nice error message on the failure.
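For illustration, the general pattern behind an optional import with a readable failure message might look like the sketch below. `import_optional` is a hypothetical stand-in written with the standard library only; it is not the pandas helper, whose exact behaviour and message may differ.

```python
import importlib


def import_optional(name: str, extra: str = ""):
    """Hypothetical stand-in: import a module or fail with an actionable message."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Replace the bare traceback with a hint about how to fix the problem.
        message = (
            f"Missing optional dependency '{name}'. {extra} "
            f"Use pip or conda to install {name}."
        )
        raise ImportError(message) from None
```

A caller would then write something like `fsspec = import_optional("fsspec", extra="Remote file IO needs it.")` at the point where the dependency is first required.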
return s3.get_filepath_or_buffer(
    filepath_or_buffer, encoding=encoding, compression=compression, mode=mode
)
# if is_s3_url(filepath_or_buffer):
You can delete all these.
try:
    from fsspec.registry import known_implementations
    scheme = parse_url(url).scheme
Perhaps split_protocol, or even just "://" in url or "::" in url
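The simpler string check could be sketched as follows. This is only an illustration of the suggestion, not the code that was eventually merged; the function name reuses `is_fsspec_url` from the diff, and the `str` annotation on `url` is an assumption.

```python
def is_fsspec_url(url: str) -> bool:
    """Sketch: treat anything with a protocol or chained-protocol marker as remote."""
    # "://" marks a protocol prefix such as "s3://";
    # "::" marks a chained protocol such as "simplecache::s3://...".
    return "://" in url or "::" in url
```

Unlike a registry lookup, this check does not need fsspec to be importable, so it can run before the optional dependency is loaded.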
from pandas.io import s3
if is_fsspec_url(filepath_or_buffer):
    import fsspec
    scheme = parse_url(filepath_or_buffer).scheme
These three lines can be done with fsspec.open (except for the garbage collection issue), which should take care of encoding and compression too.
@jrderuiter, is there any way I can help here?
ping?
You need to add fsspec to several (but not all) of the CI files and to environment.yaml so that we have testing and proper skipping.
@@ -158,6 +158,23 @@ def urlopen(*args, **kwargs):
    return urllib.request.urlopen(*args, **kwargs)


def is_fsspec_url(url) -> bool:
Can you add a type annotation for url?
fsspec filesystem.
"""

if not isinstance(url, str):
you don’t need this
Closing in favor of #34266.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff