-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Storage options #35381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storage options #35381
Conversation
This PR is near complete. Question: the docstring for the kwarg should be added, but it would be repeated verbatim in every place; is this the right thing to do? (cc @TomAugspurger ) |
Yeah I think it needs to be duplicated. We do share some sharing between some readers, but not all. It'll be easier to just add the same lines to each, and add an example in one place (perhaps read_csv). |
pandas/core/frame.py
Outdated
buf: Optional[IO[str]] = None, | ||
mode: Optional[str] = None, | ||
buf: Optional[Union[IO[str], str]] = None, | ||
mode: Optional[str] = "wt", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this need to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mode=None doesn't actually make sense. This is for writing, and most open commands (including python builtin) would default to "rt" or "rb".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if changing this, could also remove Optional from type annotation.
pandas/io/common.py
Outdated
@@ -234,6 +236,8 @@ def get_filepath_or_buffer( | |||
).open() | |||
|
|||
return file_obj, encoding, compression, True | |||
elif storage_options: | |||
raise ValueError("storage_options passed with non-fsspec URL") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the "URL" here actually a user-provided buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Local file or HTTP URL; but I guess an already open file instance would end up here too, worth checking.
(fails against numpy dev with "ValueError: invalid array_struct" while reading stata) |
Added text to io.rst |
I disagree with "file-like objects must be iterable" in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC we've gone back and forth on exactly what is required. I don't especially care what is required, but I'd prefer to be consistent in what we accept across readers unless there's a good reason not to be. Hopefully @gfyoung can comment here. |
isort is complaining, but passes for me locally. Can someone tell me what's wrong?
code: @pytest.fixture()
def fsspectest():
pytest.importorskip("fsspec")
from fsspec.implementations.memory import MemoryFileSystem
from fsspec import register_implementation
from fsspec.registry import _registry as registry |
Perhaps check that the version of isort matches? Also some warnings in io.rst: https://github.com/pandas-dev/pandas/pull/35381/checks?check_run_id=954318397#step:6:2333. |
@TomAugspurger you were right, newer isort solved it; also fixed the RST link. |
ping |
My only outstanding question is #35381 (comment). |
@gfyoung has not yet responded, and I would rather not require iterable files, nor change the utils code that other things might be depending on. I suppose I'm stuck for now! This particular case is only for stata, and the previous code didn't use (note that IOBase does guarantee iterability, but I'm not certain that all file-like implementations do - not that I can imagine why they wouldn't) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks reasonable a few comments
@@ -3126,6 +3156,16 @@ def to_csv( | |||
|
|||
.. versionadded:: 1.1.0 | |||
|
|||
storage_options : dict, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if there is a way to share doc-strings components for all of these i/o methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not aware of a way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i think we can do this with our shared docs infra, but out of scope for now
pandas/io/formats/csvs.py
Outdated
@@ -54,6 +54,7 @@ def __init__( | |||
doublequote: bool = True, | |||
escapechar: Optional[str] = None, | |||
decimal=".", | |||
storage_options: Optional[Dict[str, Any]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seeing as we are doing this in lots of places I would add it
@@ -1931,7 +1937,9 @@ def read_stata( | |||
|
|||
|
|||
def _open_file_binary_write( | |||
fname: FilePathOrBuffer, compression: Union[str, Mapping[str, str], None], | |||
fname: FilePathOrBuffer, | |||
compression: Union[str, Mapping[str, str], None], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can also add Compression to typing.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't me, but the styling; compression as a kwarg has different types in different places (most often str, but also Optional[Union[str, Mapping[str, Any]]]
, which is similar), so I don't think there's a useful common type to be made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm interesting, we ought to fix that. can you create an issue about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain it's wrong - sometimes the args is required, but could be None, other times it's not required, but the default value can be None or str (e.g., "infer").
Mainly defining the StorageOptions type
@@ -21,7 +21,8 @@ to pass a dictionary of parameters to the storage backend. This allows, for | |||
example, for passing credentials to S3 and GCS storage. The details of what | |||
parameters can be passed to which backends can be found in the documentation | |||
of the individual storage backends (detailed from the fsspec docs for | |||
`builtin implementations`_ and linked to `external ones`_). | |||
`builtin implementations`_ and linked to `external ones`_). See |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these referenced in io.rst (more important that they are there), ok if they are here as well (but not really necessary)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I phrased it a bit differently (one general link, one specific instead of two specific) - I'll make the two places more similar.
lint issue |
@jreback : fixed I think. Even linting the docs... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's to avoid horizontal scroll bars in the code snippets in the HTML rendered version.
Thanks! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff