Revert fastparquet nullable dtype support #42954

Merged
merged 10 commits into from
Aug 10, 2021
1 change: 0 additions & 1 deletion doc/source/whatsnew/v1.3.2.rst
@@ -44,7 +44,6 @@ Bug fixes

 Other
 ~~~~~
-- :meth:`pandas.read_parquet` now supports reading nullable dtypes with ``fastparquet`` versions above 0.7.1.
 -

 .. ---------------------------------------------------------------------------
27 changes: 10 additions & 17 deletions pandas/io/parquet.py
@@ -309,20 +309,16 @@ def write(
     def read(
         self, path, columns=None, storage_options: StorageOptions = None, **kwargs
     ):
-        parquet_kwargs = {}
+        parquet_kwargs: dict[str, Any] = {}
         use_nullable_dtypes = kwargs.pop("use_nullable_dtypes", False)
-        # Technically works with 0.7.0, but was incorrect
-        # so lets just require 0.7.1
         if Version(self.api.__version__) >= Version("0.7.1"):
-            # Need to set even for use_nullable_dtypes = False,
-            # since our defaults differ
-            parquet_kwargs["pandas_nulls"] = use_nullable_dtypes
-        else:
-            if use_nullable_dtypes:
-                raise ValueError(
-                    "The 'use_nullable_dtypes' argument is not supported for the "
-                    "fastparquet engine for fastparquet versions less than 0.7.1"
-                )
+            # We are disabling nullable dtypes for fastparquet pending discussion
+            parquet_kwargs["pandas_nulls"] = False
+        if use_nullable_dtypes:
+            raise ValueError(
+                "The 'use_nullable_dtypes' argument is not supported for the "
+                "fastparquet engine"
+            )
         path = stringify_path(path)
         handles = None
         if is_fsspec_url(path):
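
The net effect of this hunk can be sketched outside pandas as a small standalone function. This is a minimal illustration, not the pandas implementation: `fastparquet_read_kwargs` and `_release_tuple` are hypothetical names, and version parsing is simplified to plain dotted integers instead of the `Version` class used in the diff.

```python
from __future__ import annotations

from typing import Any


def _release_tuple(version: str) -> tuple[int, ...]:
    """Hypothetical helper: turn '0.7.1' into (0, 7, 1).

    Simplified on purpose; it only handles plain dotted integers.
    """
    return tuple(int(part) for part in version.split("."))


def fastparquet_read_kwargs(
    fastparquet_version: str, use_nullable_dtypes: bool = False
) -> dict[str, Any]:
    """Hypothetical stand-in for the keyword handling in the patched read()."""
    parquet_kwargs: dict[str, Any] = {}
    if _release_tuple(fastparquet_version) >= (0, 7, 1):
        # As in the diff, pandas_nulls is only passed for fastparquet 0.7.1
        # or newer, and it is now always False.
        parquet_kwargs["pandas_nulls"] = False
    if use_nullable_dtypes:
        # After this PR the option is rejected for every fastparquet version.
        raise ValueError(
            "The 'use_nullable_dtypes' argument is not supported for the "
            "fastparquet engine"
        )
    return parquet_kwargs


print(fastparquet_read_kwargs("0.7.1"))
print(fastparquet_read_kwargs("0.6.3"))
```

The version check and the `use_nullable_dtypes` check are now independent: the first only decides whether `pandas_nulls` is passed at all, the second raises regardless of the installed version.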
@@ -478,18 +474,15 @@ def read_parquet(

     use_nullable_dtypes : bool, default False
         If True, use dtypes that use ``pd.NA`` as missing value indicator
-        for the resulting DataFrame.
+        for the resulting DataFrame. (only applicable for the ``pyarrow``
+        engine)
         As new dtypes are added that support ``pd.NA`` in the future, the
         output with this option will change to use those dtypes.
         Note: this is an experimental option, and behaviour (e.g. additional
         support dtypes) may change without notice.

         .. versionadded:: 1.2.0

-        .. versionchanged:: 1.3.2
-            ``use_nullable_dtypes`` now works with the the ``fastparquet`` engine
-            if ``fastparquet`` is version 0.7.1 or higher.
-
     **kwargs
         Any additional kwargs are passed to the engine.
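
For callers, the updated docstring means that `use_nullable_dtypes=True` is only expected to succeed with the `pyarrow` engine. One possible way to cope, sketched here under assumptions, is to retry without the option when the engine rejects it. `read_with_nullable_fallback` is a hypothetical helper, and `reader` stands in for `pandas.read_parquet` so that the sketch runs without pandas installed; `fake_reader` is likewise illustrative only and performs no real I/O.

```python
from typing import Any, Callable


def read_with_nullable_fallback(
    reader: Callable[..., Any], path: str, engine: str = "auto"
) -> Any:
    """Hypothetical helper: prefer nullable dtypes, fall back if rejected."""
    try:
        return reader(path, engine=engine, use_nullable_dtypes=True)
    except ValueError as err:
        # Only swallow the specific rejection shown in the diff; re-raise
        # any other ValueError untouched.
        if "use_nullable_dtypes" not in str(err):
            raise
        return reader(path, engine=engine)


def fake_reader(
    path: str, engine: str = "auto", use_nullable_dtypes: bool = False
) -> str:
    """Illustrative stand-in that mimics the engine check from the diff."""
    if engine == "fastparquet" and use_nullable_dtypes:
        raise ValueError(
            "The 'use_nullable_dtypes' argument is not supported for the "
            "fastparquet engine"
        )
    return f"{engine}:nullable={use_nullable_dtypes}"


print(read_with_nullable_fallback(fake_reader, "data.parquet", engine="fastparquet"))
print(read_with_nullable_fallback(fake_reader, "data.parquet", engine="pyarrow"))
```

In real use, `pandas.read_parquet` would be passed as `reader`. Matching on the error message is fragile, so this is best read as an illustration of the behaviour change rather than a recommended pattern.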

11 changes: 6 additions & 5 deletions pandas/tests/io/test_parquet.py
@@ -600,18 +600,18 @@ def test_use_nullable_dtypes(self, engine):
         import pyarrow.parquet as pq

         if engine == "fastparquet":
-            pytest.importorskip(
-                "fastparquet",
-                "0.7.1",
-                reason="fastparquet must be 0.7.1 or higher for nullable dtype support",
-            )
+            # We are manually disabling fastparquet's
+            # nullable dtype support pending discussion
+            pytest.skip("Fastparquet nullable dtype support is disabled")

         table = pyarrow.table(
             {
                 "a": pyarrow.array([1, 2, 3, None], "int64"),
                 "b": pyarrow.array([1, 2, 3, None], "uint8"),
                 "c": pyarrow.array(["a", "b", "c", None]),
                 "d": pyarrow.array([True, False, True, None]),
+                # Test that nullable dtypes used even in absence of nulls
+                "e": pyarrow.array([1, 2, 3, 4], "int64"),
             }
         )
         with tm.ensure_clean() as path:
@@ -627,6 +627,7 @@ def test_use_nullable_dtypes(self, engine):
                 "b": pd.array([1, 2, 3, None], dtype="UInt8"),
                 "c": pd.array(["a", "b", "c", None], dtype="string"),
                 "d": pd.array([True, False, True, None], dtype="boolean"),
+                "e": pd.array([1, 2, 3, 4], dtype="Int64"),
             }
         )
         if engine == "fastparquet":