[WIP] Add remote file io using fsspec. #33549

Closed

jrderuiter

@alimcmaster1 added the IO Data label (IO issues that don't fit into a more specific label) on Apr 17, 2020
import fsspec
scheme = parse_url(filepath_or_buffer).scheme
filesystem = fsspec.filesystem(scheme)
file_obj = filesystem.open(filepath_or_buffer, mode=mode or "rb")
Member

Do we need to pass the encoding through to .open()? If you do, you may also fix #26124 :)

Contributor

If using filesystem.open directly, I would, for now, always open in binary mode and use the existing encoding handling within pandas.
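A minimal sketch of that suggestion, assuming the backend opener behaves like an fsspec filesystem's open method (open(path, "rb") returns a binary file object): the file is always opened in binary mode and the caller's encoding is applied on the pandas side with io.TextIOWrapper. The helper open_text_via_binary and the fake_open stand-in for a remote store are hypothetical, chosen so the sketch runs without fsspec installed.

```python
import io
from typing import IO, Callable


def open_text_via_binary(
    open_binary: Callable[[str, str], IO[bytes]],
    path: str,
    encoding: str = "utf-8",
) -> io.TextIOWrapper:
    """Open path in binary mode and decode it on the pandas side.

    open_binary is assumed to behave like an fsspec filesystem's open,
    i.e. open_binary(path, "rb") returns a binary file-like object.
    """
    binary = open_binary(path, "rb")  # never ask the backend for text mode
    # Decoding happens here, using the encoding pandas was given.
    return io.TextIOWrapper(binary, encoding=encoding)


# Hypothetical stand-in for a remote store, so no fsspec is needed.
_fake_store = {
    "s3://bucket/data.csv": "name,city\nJos\u00e9,M\u00e1laga\n".encode("latin-1"),
}


def fake_open(path: str, mode: str) -> IO[bytes]:
    if mode != "rb":
        raise ValueError("this sketch only supports binary reads")
    return io.BytesIO(_fake_store[path])


with open_text_via_binary(fake_open, "s3://bucket/data.csv", encoding="latin-1") as f:
    text = f.read()
```

With a real backend, fsspec.filesystem(scheme).open would be passed in place of fake_open; the decoding step stays identical, which is the point of keeping encoding handling inside pandas.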

Contributor

@TomAugspurger left a comment

Have you been able to test this out manually? Do things seem OK?

if is_s3_url(filepath_or_buffer):
from pandas.io import s3
if is_fsspec_url(filepath_or_buffer):
import fsspec
Contributor

This can use import_optional_dependency("fsspec"). Make sure we raise a clear error message if the import fails.
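pandas already ships a helper for this (import_optional_dependency in pandas.compat._optional), which also handles minimum versions. The sketch below is a simplified, hypothetical stand-in (import_optional is not a pandas name) that shows only the core idea: try the import and, on failure, re-raise an ImportError that tells the user what is missing and how to install it.

```python
import importlib
from types import ModuleType


def import_optional(name: str, extra: str = "") -> ModuleType:
    """Import an optional dependency or raise a helpful ImportError.

    Hypothetical simplified helper; pandas' real import_optional_dependency
    additionally checks minimum versions.
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        msg = f"Missing optional dependency '{name}'."
        if extra:
            msg += f" {extra}"
        msg += f" Use pip or conda to install {name}."
        # Chain the original error so the root cause stays visible.
        raise ImportError(msg) from err
```

Hypothetical usage at the call site: fsspec = import_optional("fsspec", extra="fsspec is required to read remote paths such as S3 or GCS URLs.").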

return s3.get_filepath_or_buffer(
filepath_or_buffer, encoding=encoding, compression=compression, mode=mode
)
# if is_s3_url(filepath_or_buffer):
Contributor

You can delete all these.


try:
from fsspec.registry import known_implementations
scheme = parse_url(url).scheme
Contributor

Perhaps use split_protocol, or even just check "://" in url or "::" in url.
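A minimal sketch of the simpler variant suggested here, with url typed as str: a string is treated as an fsspec URL if it contains a protocol separator ("://") or fsspec's chained-URL separator ("::", as in simplecache::s3://bucket/key). fsspec.core.split_protocol could be used instead when the protocol itself is also needed. The sketch does not special-case http(s) URLs; whether those should keep going through urlopen is left open here as an assumption.

```python
def is_fsspec_url(url: str) -> bool:
    """Heuristic check for strings that should be routed to fsspec.

    "://" marks a protocol (s3://, gcs://, ...); "::" marks an fsspec
    chained URL. Local paths, including Windows drive paths, match neither.
    """
    return "://" in url or "::" in url
```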

from pandas.io import s3
if is_fsspec_url(filepath_or_buffer):
import fsspec
scheme = parse_url(filepath_or_buffer).scheme
Contributor

These three lines could be replaced by fsspec.open (apart from the garbage-collection issue), which should take care of encoding and compression too.
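For reference, fsspec.open(urlpath, mode="rt", compression="infer", encoding=...) returns an OpenFile which, when entered as a context manager, yields a decompressed, decoded text stream. The stdlib-only sketch below mimics that layering (raw bytes, then decompression chosen by file suffix, then decoding) so the behaviour can be checked without fsspec. The suffix table and the helper name open_layered are assumptions for illustration, not fsspec internals.

```python
import bz2
import gzip
import io
from typing import IO, Callable, Dict

# Assumed suffix -> decompressor mapping, in the spirit of compression="infer".
_DECOMPRESSORS: Dict[str, Callable[[IO[bytes]], IO[bytes]]] = {
    ".gz": lambda f: gzip.GzipFile(fileobj=f, mode="rb"),
    ".bz2": lambda f: bz2.BZ2File(f, mode="rb"),
}


def open_layered(path: str, raw: IO[bytes], encoding: str = "utf-8") -> io.TextIOWrapper:
    """Stack decompression and decoding on top of a raw binary stream."""
    for suffix, wrap in _DECOMPRESSORS.items():
        if path.endswith(suffix):
            raw = wrap(raw)  # decompress transparently based on the suffix
            break
    return io.TextIOWrapper(raw, encoding=encoding)  # decode as the last layer


payload = gzip.compress("a,b\n1,\u00e9\n".encode("utf-8"))
with open_layered("s3://bucket/data.csv.gz", io.BytesIO(payload)) as f:
    text = f.read()
```

Keeping decoding as the outermost layer matches the earlier suggestion to open in binary and apply the encoding last.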

@martindurant
Contributor

@jrderuiter, is there any way in which I can help here?

@martindurant
Contributor

ping?

Contributor

@jreback left a comment

You need to add fsspec to environment.yaml and to several (but not all) of the CI files, so that the new code is tested and the tests are properly skipped where fsspec is not installed.
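A minimal sketch of the "proper skipping" part: tests that need fsspec run in CI environments that install it and are skipped elsewhere. In a pytest suite the usual idiom is pytest.importorskip("fsspec"); the sketch uses the stdlib unittest equivalent to stay self-contained, and writes to fsspec's in-memory filesystem (the memory:// protocol) so no cloud credentials are needed. The class, test, and path names are hypothetical.

```python
import importlib.util
import unittest

# True only in environments (e.g. the CI jobs listing fsspec) where it is installed.
HAS_FSSPEC = importlib.util.find_spec("fsspec") is not None


@unittest.skipUnless(HAS_FSSPEC, "fsspec not installed")
class TestFsspecRoundTrip(unittest.TestCase):
    def test_memory_roundtrip(self) -> None:
        import fsspec  # imported inside the test so the module loads without fsspec

        with fsspec.open("memory://pandas-sketch/data.csv", "wb") as f:
            f.write(b"a,b\n1,2\n")
        with fsspec.open("memory://pandas-sketch/data.csv", "rb") as f:
            self.assertEqual(f.read(), b"a,b\n1,2\n")


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFsspecRoundTrip)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In environments without fsspec the test is reported as skipped rather than failed, which is the behaviour the CI split is meant to exercise.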

@@ -158,6 +158,23 @@ def urlopen(*args, **kwargs):
return urllib.request.urlopen(*args, **kwargs)


def is_fsspec_url(url) -> bool:
Contributor

Can you add a type annotation for url?

fsspec filesystem.
"""

if not isinstance(url, str):
Contributor

you don’t need this

@TomAugspurger
Contributor

Closing in favor of #34266.

Labels
IO Data IO issues that don't fit into a more specific label
Development

ENH: Use fsspec for reading/writing from/to S3, GCS, Azure Blob, etc.
5 participants