
Storage options #35381


Merged: 33 commits, Aug 10, 2020
10 changes: 8 additions & 2 deletions doc/source/user_guide/io.rst
@@ -1649,8 +1649,10 @@ options include:
Specifying any of the above options will produce a ``ParserWarning`` unless the
python engine is selected explicitly using ``engine='python'``.

Reading remote files
''''''''''''''''''''
.. _io.remote:

Reading/writing remote files
''''''''''''''''''''''''''''

You can pass in a URL to read or write remote files to many of Pandas' IO
functions - the following example shows reading a CSV file:
@@ -1686,6 +1688,8 @@ You can also pass parameters directly to the backend driver. For example,
if you do *not* have S3 credentials, you can still access public data by
specifying an anonymous connection, such as

.. versionadded:: 1.2.0

.. code-block:: python

pd.read_csv("s3://ncei-wcsd-archive/data/processed/SH1305/18kHz/SaKe2013"
@@ -1696,6 +1700,8 @@ specifying an anonymous connection, such as
archives, local caching of files, and more. To locally cache the above
example, you would modify the call to

.. code-block:: python

pd.read_csv("simplecache::s3://ncei-wcsd-archive/data/processed/SH1305/18kHz/SaKe2013"
"-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv",
storage_options={"s3": {"anon": True}})
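The nested form ``{"s3": {"anon": True}}`` used above exists because of fsspec URL chaining: with a wrapped URL such as ``simplecache::s3://...``, options must be routed to the right protocol. A minimal sketch (``split_chain`` is invented for illustration, a simplified version of what fsspec does):

```python
def split_chain(url):
    # fsspec chains protocols with "::"; everything before the last
    # segment is a wrapper (e.g. "simplecache"), the last segment is
    # the target URL whose scheme names the storage backend.
    *wrappers, target = url.split("::")
    protocol = target.split("://", 1)[0]
    return wrappers, protocol

wrappers, protocol = split_chain(
    "simplecache::s3://ncei-wcsd-archive/data.csv"
)
# wrappers -> ["simplecache"], protocol -> "s3": the {"s3": {...}}
# dict is delivered to the s3 layer, not to the cache wrapper.
```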
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v1.2.0.rst
@@ -21,7 +21,8 @@ to pass a dictionary of parameters to the storage backend. This allows, for
example, for passing credentials to S3 and GCS storage. The details of what
parameters can be passed to which backends can be found in the documentation
of the individual storage backends (detailed from the fsspec docs for
`builtin implementations`_ and linked to `external ones`_).
`builtin implementations`_ and linked to `external ones`_). See
Comment (Contributor): are these referenced in io.rst (more important that they are there), ok if they are here as well (but not really necessary)

Comment (Contributor, author): I phrased it a bit differently (one general link, one specific instead of two specific) - I'll make the two places more similar.
Section :ref:`io.remote`.

.. _builtin implementations: https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations
.. _external ones: https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations
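The whatsnew entry above describes the one design point of this PR: every supporting reader/writer gains the same ``storage_options`` keyword, and the dict is handed to the filesystem backend unchanged. A minimal sketch of that contract (``fake_reader`` is invented for illustration; real backends are fsspec implementations such as s3fs or gcsfs):

```python
def fake_reader(path, storage_options=None):
    # pandas does not interpret the dict; it only forwards it to the
    # backend, which is why valid keys are backend-specific.
    forwarded = dict(storage_options or {})
    return path, forwarded

path, forwarded = fake_reader(
    "gcs://bucket/file.csv", storage_options={"token": "anon"}
)
```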
3 changes: 3 additions & 0 deletions pandas/_typing.py
@@ -106,3 +106,6 @@
List[AggFuncTypeBase],
Dict[Label, Union[AggFuncTypeBase, List[AggFuncTypeBase]]],
]

# for arbitrary kwargs passed during reading/writing files
StorageOptions = Optional[Dict[str, Any]]
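The new alias keeps the many signatures below in sync. A self-contained sketch of how it reads at a call site (``describe`` is a hypothetical consumer, not pandas code):

```python
from typing import Any, Dict, Optional

# The alias added in this PR; Optional because most callers omit it.
StorageOptions = Optional[Dict[str, Any]]

def describe(path: str, storage_options: StorageOptions = None) -> str:
    # Treat None and an empty dict identically, as the IO code does.
    opts = storage_options or {}
    return f"{path}: {len(opts)} backend option(s)"
```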
7 changes: 4 additions & 3 deletions pandas/core/frame.py
@@ -55,6 +55,7 @@
Label,
Level,
Renamer,
StorageOptions,
ValueKeyFunc,
)
from pandas.compat import PY37
@@ -2056,7 +2057,7 @@ def to_stata(
version: Optional[int] = 114,
convert_strl: Optional[Sequence[Label]] = None,
compression: Union[str, Mapping[str, str], None] = "infer",
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
) -> None:
"""
Export DataFrame object to Stata dta format.
@@ -2259,7 +2260,7 @@ def to_markdown(
buf: Optional[Union[IO[str], str]] = None,
mode: str = "wt",
index: bool = True,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
**kwargs,
) -> Optional[str]:
if "showindex" in kwargs:
@@ -2295,7 +2296,7 @@ def to_parquet(
compression: Optional[str] = "snappy",
index: Optional[bool] = None,
partition_cols: Optional[List[str]] = None,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
**kwargs,
) -> None:
"""
7 changes: 4 additions & 3 deletions pandas/core/generic.py
@@ -40,6 +40,7 @@
Label,
Level,
Renamer,
StorageOptions,
TimedeltaConvertibleTypes,
TimestampConvertibleTypes,
ValueKeyFunc,
@@ -2042,7 +2043,7 @@ def to_json(
compression: Optional[str] = "infer",
index: bool_t = True,
indent: Optional[int] = None,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
) -> Optional[str]:
"""
Convert the object to a JSON string.
@@ -2629,7 +2630,7 @@ def to_pickle(
path,
compression: Optional[str] = "infer",
protocol: int = pickle.HIGHEST_PROTOCOL,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
) -> None:
"""
Pickle (serialize) object to file.
@@ -3044,7 +3045,7 @@ def to_csv(
escapechar: Optional[str] = None,
decimal: Optional[str] = ".",
errors: str = "strict",
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
) -> Optional[str]:
r"""
Write object to a comma-separated values (csv) file.
4 changes: 2 additions & 2 deletions pandas/core/series.py
@@ -9,7 +9,6 @@
TYPE_CHECKING,
Any,
Callable,
Dict,
Iterable,
List,
Optional,
@@ -32,6 +31,7 @@
FrameOrSeriesUnion,
IndexKeyFunc,
Label,
StorageOptions,
ValueKeyFunc,
)
from pandas.compat.numpy import function as nv
@@ -1425,7 +1425,7 @@ def to_markdown(
buf: Optional[IO[str]] = None,
mode: str = "wt",
index: bool = True,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
**kwargs,
) -> Optional[str]:
"""
4 changes: 2 additions & 2 deletions pandas/io/common.py
@@ -29,7 +29,7 @@
)
import zipfile

from pandas._typing import FilePathOrBuffer
from pandas._typing import FilePathOrBuffer, StorageOptions
from pandas.compat import _get_lzma_file, _import_lzma
from pandas.compat._optional import import_optional_dependency

@@ -162,7 +162,7 @@ def get_filepath_or_buffer(
encoding: Optional[str] = None,
compression: Optional[str] = None,
mode: Optional[str] = None,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
):
"""
If the filepath_or_buffer is a url, translate and return the buffer.
6 changes: 3 additions & 3 deletions pandas/io/formats/csvs.py
@@ -5,13 +5,13 @@
import csv as csvlib
from io import StringIO, TextIOWrapper
import os
from typing import Any, Dict, Hashable, List, Mapping, Optional, Sequence, Union
from typing import Hashable, List, Mapping, Optional, Sequence, Union
import warnings

import numpy as np

from pandas._libs import writers as libwriters
from pandas._typing import FilePathOrBuffer
from pandas._typing import FilePathOrBuffer, StorageOptions

from pandas.core.dtypes.generic import (
ABCDatetimeIndex,
@@ -53,7 +53,7 @@ def __init__(
doublequote: bool = True,
escapechar: Optional[str] = None,
decimal=".",
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
):
self.obj = obj

8 changes: 4 additions & 4 deletions pandas/io/json/_json.py
@@ -3,13 +3,13 @@
from io import BytesIO, StringIO
from itertools import islice
import os
from typing import Any, Callable, Dict, Optional, Type
from typing import Any, Callable, Optional, Type

import numpy as np

import pandas._libs.json as json
from pandas._libs.tslibs import iNaT
from pandas._typing import JSONSerializable
from pandas._typing import JSONSerializable, StorageOptions
from pandas.errors import AbstractMethodError
from pandas.util._decorators import deprecate_kwarg, deprecate_nonkeyword_arguments

@@ -44,7 +44,7 @@ def to_json(
compression: Optional[str] = "infer",
index: bool = True,
indent: int = 0,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
):

if not index and orient not in ["split", "table"]:
@@ -371,7 +371,7 @@ def read_json(
chunksize: Optional[int] = None,
compression="infer",
nrows: Optional[int] = None,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
):
"""
Convert a JSON string to pandas object.
20 changes: 6 additions & 14 deletions pandas/io/parquet.py
@@ -3,7 +3,7 @@
from typing import Any, AnyStr, Dict, List, Optional
from warnings import catch_warnings

from pandas._typing import FilePathOrBuffer
from pandas._typing import FilePathOrBuffer, StorageOptions
from pandas.compat._optional import import_optional_dependency
from pandas.errors import AbstractMethodError

@@ -89,7 +89,7 @@ def write(
path: FilePathOrBuffer[AnyStr],
compression: Optional[str] = "snappy",
index: Optional[bool] = None,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
partition_cols: Optional[List[str]] = None,
**kwargs,
):
@@ -128,11 +128,7 @@ def write(
self.api.parquet.write_table(table, path, compression=compression, **kwargs)

def read(
self,
path,
columns=None,
storage_options: Optional[Dict[str, Any]] = None,
**kwargs,
self, path, columns=None, storage_options: StorageOptions = None, **kwargs,
):
if is_fsspec_url(path) and "filesystem" not in kwargs:
import_optional_dependency("fsspec")
@@ -178,7 +174,7 @@ def write(
compression="snappy",
index=None,
partition_cols=None,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
**kwargs,
):
self.validate_dataframe(df)
@@ -222,11 +218,7 @@ def write(
)

def read(
self,
path,
columns=None,
storage_options: Optional[Dict[str, Any]] = None,
**kwargs,
self, path, columns=None, storage_options: StorageOptions = None, **kwargs,
):
if is_fsspec_url(path):
fsspec = import_optional_dependency("fsspec")
@@ -248,7 +240,7 @@ def to_parquet(
engine: str = "auto",
compression: Optional[str] = "snappy",
index: Optional[bool] = None,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
partition_cols: Optional[List[str]] = None,
**kwargs,
):
8 changes: 4 additions & 4 deletions pandas/io/pickle.py
@@ -1,9 +1,9 @@
""" pickle compat """
import pickle
from typing import Any, Dict, Optional
from typing import Any, Optional
import warnings

from pandas._typing import FilePathOrBuffer
from pandas._typing import FilePathOrBuffer, StorageOptions
from pandas.compat import pickle_compat as pc

from pandas.io.common import get_filepath_or_buffer, get_handle
@@ -14,7 +14,7 @@ def to_pickle(
filepath_or_buffer: FilePathOrBuffer,
compression: Optional[str] = "infer",
protocol: int = pickle.HIGHEST_PROTOCOL,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
):
"""
Pickle (serialize) object to file.
@@ -113,7 +113,7 @@ def to_pickle(
def read_pickle(
filepath_or_buffer: FilePathOrBuffer,
compression: Optional[str] = "infer",
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
):
"""
Load pickled pandas object (or any object) from file.
14 changes: 7 additions & 7 deletions pandas/io/stata.py
@@ -35,7 +35,7 @@

from pandas._libs.lib import infer_dtype
from pandas._libs.writers import max_len_string_array
from pandas._typing import FilePathOrBuffer, Label
from pandas._typing import FilePathOrBuffer, Label, StorageOptions
from pandas.util._decorators import Appender

from pandas.core.dtypes.common import (
@@ -1035,7 +1035,7 @@ def __init__(
columns: Optional[Sequence[str]] = None,
order_categoricals: bool = True,
chunksize: Optional[int] = None,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
):
super().__init__()
self.col_sizes: List[int] = []
@@ -1910,7 +1910,7 @@ def read_stata(
order_categoricals: bool = True,
chunksize: Optional[int] = None,
iterator: bool = False,
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
) -> Union[DataFrame, StataReader]:

reader = StataReader(
@@ -1939,7 +1939,7 @@ def _open_file_binary_write(
def _open_file_binary_write(
fname: FilePathOrBuffer,
compression: Union[str, Mapping[str, str], None],
Comment (Contributor): can also add Compression to _typing.py

Comment (Contributor, author): This wasn't me, but the styling; compression as a kwarg has different types in different places (most often str, but also Optional[Union[str, Mapping[str, Any]]], which is similar), so I don't think there's a useful common type to be made.

Comment (Contributor): hmm interesting, we ought to fix that. can you create an issue about this?

Comment (Contributor, author): I'm not certain it's wrong - sometimes the arg is required but can be None; other times it's not required, and the default value can be None or str (e.g., "infer").

storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
) -> Tuple[BinaryIO, bool, Optional[Union[str, Mapping[str, str]]]]:
"""
Open a binary file or no-op if file-like.
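For the alias floated in the review thread above, a hedged sketch of what a shared ``CompressionOptions`` type could look like. This PR does not add it, and ``resolve_compression`` is invented here to show why the union is wide: the kwarg accepts a method name, an options mapping, ``None``, or the sentinel ``"infer"``:

```python
from collections.abc import Mapping as AbcMapping
from typing import Mapping, Optional, Union

# Hypothetical alias (NOT part of this PR), mirroring the widest
# signature seen in the diff: Union[str, Mapping[str, str], None].
CompressionOptions = Optional[Union[str, Mapping[str, str]]]

def resolve_compression(path: str, compression: CompressionOptions = "infer"):
    # Mimics the common "infer from extension" convention.
    if compression == "infer":
        return "gzip" if path.endswith(".gz") else None
    if isinstance(compression, AbcMapping):
        return compression.get("method")
    return compression
```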
@@ -2238,7 +2238,7 @@ def __init__(
data_label: Optional[str] = None,
variable_labels: Optional[Dict[Label, str]] = None,
compression: Union[str, Mapping[str, str], None] = "infer",
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
):
super().__init__()
self._convert_dates = {} if convert_dates is None else convert_dates
@@ -3121,7 +3121,7 @@ def __init__(
variable_labels: Optional[Dict[Label, str]] = None,
convert_strl: Optional[Sequence[Label]] = None,
compression: Union[str, Mapping[str, str], None] = "infer",
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
):
# Copy to new list since convert_strl might be modified later
self._convert_strl: List[Label] = []
@@ -3526,7 +3526,7 @@ def __init__(
convert_strl: Optional[Sequence[Label]] = None,
version: Optional[int] = None,
compression: Union[str, Mapping[str, str], None] = "infer",
storage_options: Optional[Dict[str, Any]] = None,
storage_options: StorageOptions = None,
):
if version is None:
version = 118 if data.shape[1] <= 32767 else 119