Initial pandas.typing Module #25884

Merged 19 commits on Mar 30, 2019
Changes from 12 commits

12 changes: 12 additions & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -18,6 +18,18 @@ What's New in 0.25.0 (April XX, 2019)
These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog
including other versions of pandas.

Enhancements
~~~~~~~~~~~~

.. _whatsnew_0250.enhancements.typing:

Type Hints and ``pandas.typing``

Review comment (Contributor):

I would remove this entirely. If it's private, it is not to be relied upon.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In accordance with :pep:`484` pandas has introduced Type Hints and a new ``pandas.typing`` module containing aliases for idiomatic pandas types into the code base. We will be continually adding annotations to the code base to improve readability, reduce code maintenance and proactively identify bugs.

Review comment (Member):

Do we make this public?
(Which means maintaining compatibility.) Maybe not, at least initially?

Review comment (Member Author):

Yeah, I don't see a downside to making it private. Can do that in the next iteration.

Review comment (Member Author):

OK, just made this private and updated the documentation. Let me know what you think.


`MyPy <http://mypy-lang.org>`__ has been configured as part of our CI to perform compile-time type checking.


.. _whatsnew_0250.enhancements.other:

3 changes: 2 additions & 1 deletion pandas/io/gcs.py
@@ -12,5 +12,6 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
         mode = 'rb'

     fs = gcsfs.GCSFileSystem()
-    filepath_or_buffer = fs.open(filepath_or_buffer, mode)
+    filepath_or_buffer = fs.open(
+        filepath_or_buffer, mode)  # type: gcsfs.GCSFile
     return filepath_or_buffer, None, compression, True
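The `# type: gcsfs.GCSFile` comment uses the PEP 484 comment syntax to annotate an assignment. The minimal sketch below shows the two equivalent ways of writing that annotation. It assumes the comment form was chosen because pandas still supported Python 3.5 at the time, since the inline `name: Type = value` form from PEP 526 needs Python 3.6 or newer. `open_buffer` and `open_buffer_pep526` are hypothetical helpers. They use `io.BytesIO` in place of a remote file object so the sketch runs without gcsfs or s3fs.

```python
import io
from typing import IO


def open_buffer(data: bytes) -> IO[bytes]:
    # Hypothetical stand-in for fs.open(): an in-memory buffer replaces the
    # gcsfs/s3fs file object so this sketch runs without either package.
    # PEP 484 type comment: read by mypy, ignored at runtime, and valid
    # syntax on Python 3.5.
    handle = io.BytesIO(data)  # type: IO[bytes]
    return handle


def open_buffer_pep526(data: bytes) -> IO[bytes]:
    # The same annotation written as a PEP 526 variable annotation,
    # which requires Python 3.6 or newer.
    handle: IO[bytes] = io.BytesIO(data)
    return handle
```

Both functions behave identically at runtime. Only a static checker such as mypy reads either annotation.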
23 changes: 16 additions & 7 deletions pandas/io/parsers.py
@@ -37,6 +37,7 @@
     Index, MultiIndex, RangeIndex, ensure_index_from_sequences)
 from pandas.core.series import Series
 from pandas.core.tools import datetimes as tools
+from pandas.typing import FilePathOrBuffer

 from pandas.io.common import (
     _NA_VALUES, BaseIterator, UnicodeReader, UTF8Recoder, _get_handle,
@@ -400,7 +401,7 @@ def _validate_names(names):
     return names


-def _read(filepath_or_buffer, kwds):
+def _read(filepath_or_buffer: FilePathOrBuffer, kwds):
     """Generic reader of line files."""
     encoding = kwds.get('encoding', None)
     if encoding is not None:
@@ -409,7 +410,12 @@ def _read(filepath_or_buffer, kwds):

     compression = kwds.get('compression', 'infer')
     compression = _infer_compression(filepath_or_buffer, compression)
-    filepath_or_buffer, _, compression, should_close = get_filepath_or_buffer(
+
+    # TODO: get_filepath_or_buffer could return
+    # Union[FilePathOrBuffer, s3fs.S3File, gcsfs.GCSFile]
+    # though mypy handling of conditional imports is difficult.
+    # See https://github.com/python/mypy/issues/1297
+    fp_or_buf, _, compression, should_close = get_filepath_or_buffer(
Review comment (Member Author):

It's mentioned in the comments, but I changed the variable name here from filepath_or_buffer to fp_or_buf to intentionally NOT shadow the parameter from the signature.

As the comment notes, this local variable could potentially introduce new types for S3 and GCP, and I don't think there is a great way with typing to statically analyze conditional imports like those just yet. So, IMO, assigning the return value of this function to a separate variable is a clearer delimitation.

         filepath_or_buffer, encoding, compression)
     kwds['compression'] = compression

@@ -426,7 +432,7 @@ def _read(filepath_or_buffer, kwds):
     _validate_names(kwds.get("names", None))

     # Create the parser.
-    parser = TextFileReader(filepath_or_buffer, **kwds)
+    parser = TextFileReader(fp_or_buf, **kwds)

     if chunksize or iterator:
         return parser
@@ -438,7 +444,7 @@ def _read(filepath_or_buffer, kwds):

     if should_close:
         try:
-            filepath_or_buffer.close()
+            fp_or_buf.close()
         except ValueError:
             pass

@@ -533,7 +539,7 @@ def _make_parser_function(name, default_sep=','):
     else:
         sep = default_sep

-    def parser_f(filepath_or_buffer,
+    def parser_f(filepath_or_buffer: FilePathOrBuffer,
                  sep=sep,
                  delimiter=None,

@@ -725,8 +731,11 @@ def parser_f(filepath_or_buffer,
 )(read_table)


-def read_fwf(filepath_or_buffer, colspecs='infer', widths=None,
-             infer_nrows=100, **kwds):
+def read_fwf(filepath_or_buffer: FilePathOrBuffer,
+             colspecs='infer',
+             widths=None,
+             infer_nrows=100,
+             **kwds):

     r"""
     Read a table of fixed-width formatted lines into DataFrame.
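The minimal sketch below reproduces the pattern in `_read` outside pandas. It assumes a local copy of the alias and uses two hypothetical helpers:

- `resolve_source` stands in for `get_filepath_or_buffer`. It opens paths, passes buffers through, and reports whether the caller should close the handle.
- `read_lines` stands in for `_read`.

Binding the resolved handle to `fp_or_buf` leaves the annotated parameter untouched, as the review comment describes.

```python
from pathlib import Path
from typing import IO, AnyStr, List, Tuple, Union

# Local copy of the alias from pandas/typing.py in this diff.
FilePathOrBuffer = Union[str, Path, IO[AnyStr]]


def resolve_source(filepath_or_buffer: FilePathOrBuffer) -> Tuple[IO[str], bool]:
    # Hypothetical stand-in for get_filepath_or_buffer(): open paths,
    # pass buffers through, and report whether the caller must close.
    if isinstance(filepath_or_buffer, (str, Path)):
        return open(filepath_or_buffer, "r", encoding="utf-8"), True
    return filepath_or_buffer, False


def read_lines(filepath_or_buffer: FilePathOrBuffer) -> List[str]:
    # Bind the handle to a new name (fp_or_buf) instead of shadowing the
    # annotated parameter, mirroring the rename in _read().
    fp_or_buf, should_close = resolve_source(filepath_or_buffer)
    try:
        return [line.rstrip("\n") for line in fp_or_buf]
    finally:
        # Only close handles this function opened itself.
        if should_close:
            fp_or_buf.close()
```

A caller-owned `io.StringIO` passed to `read_lines` is still open afterwards. A path, by contrast, is opened and closed inside the function. This is the role `should_close` plays in `_read`.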
3 changes: 2 additions & 1 deletion pandas/io/s3.py
@@ -36,5 +36,6 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
         # A NoCredentialsError is raised if you don't have creds
         # for that bucket.
         fs = s3fs.S3FileSystem(anon=True)
-        filepath_or_buffer = fs.open(_strip_schema(filepath_or_buffer), mode)
+        filepath_or_buffer = fs.open(
+            _strip_schema(filepath_or_buffer), mode)  # type: s3fs.S3File
     return filepath_or_buffer, None, compression, True
2 changes: 1 addition & 1 deletion pandas/tests/api/test_api.py
@@ -27,7 +27,7 @@ class TestPDApi(Base):

     # top-level sub-packages
     lib = ['api', 'arrays', 'compat', 'core', 'errors', 'pandas',
-           'plotting', 'test', 'testing', 'tseries',
+           'plotting', 'test', 'testing', 'tseries', 'typing',
            'util', 'options', 'io']

     # these are already deprecated; awaiting removal
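In this version of the PR, `parsers.py` runs `from pandas.typing import FilePathOrBuffer`. Python's import system then binds `typing` as an attribute of the `pandas` package, which is presumably why `'typing'` had to be added to the expected top-level names.

The minimal sketch below illustrates that mechanism and a namespace check in the spirit of `TestPDApi`. It uses the standard library's `xml` package as a stand-in for `pandas`. `public_names` is a hypothetical helper, not the pandas test utility.

```python
import importlib
import xml
from typing import Iterable, List, Optional


def public_names(namespace: object,
                 ignored: Optional[Iterable[str]] = None) -> List[str]:
    # Hypothetical helper: sorted non-dunder attribute names of a module,
    # minus any names the caller chooses to ignore.
    names = {name for name in dir(namespace) if not name.startswith("__")}
    return sorted(names - set(ignored or ()))


# Importing a submodule binds it as an attribute of its parent package,
# the same mechanism that makes ``typing`` appear in dir(pandas).
importlib.import_module("xml.dom")
```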
4 changes: 4 additions & 0 deletions pandas/typing.py
@@ -0,0 +1,4 @@
from pathlib import Path
from typing import IO, AnyStr, Union

FilePathOrBuffer = Union[str, Path, IO[AnyStr]]
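`FilePathOrBuffer` is an ordinary `typing.Union`, so it can be reused in any annotation and inspected at runtime. Because it contains the `AnyStr` type variable, the alias is also generic: `FilePathOrBuffer[str]` narrows the buffer member to text streams.

The minimal sketch below demonstrates both points. It assumes a local copy of the alias. `source_kind` and `TextPathOrBuffer` are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import IO, AnyStr, Union, get_type_hints

# Local copy of the alias defined in pandas/typing.py above.
FilePathOrBuffer = Union[str, Path, IO[AnyStr]]

# AnyStr makes the alias generic; subscripting it fixes the buffer type.
TextPathOrBuffer = FilePathOrBuffer[str]


def source_kind(src: FilePathOrBuffer) -> str:
    # Hypothetical helper: the Union documents what is accepted, but
    # runtime code still has to branch on the concrete type.
    if isinstance(src, (str, Path)):
        return "path"
    return "buffer"


# The annotation is available for introspection at runtime.
SRC_HINT = get_type_hints(source_kind)["src"]
```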