Skip to content

Fix read parquet import error message #33361

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 8, 2020
25 changes: 13 additions & 12 deletions pandas/io/parquet.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,20 +18,23 @@ def get_engine(engine: str) -> "BaseImpl":

if engine == "auto":
# try engines in this order
try:
return PyArrowImpl()
except ImportError:
pass
engine_classes = [PyArrowImpl, FastParquetImpl]

try:
return FastParquetImpl()
except ImportError:
pass
error_msgs = ""
for engine_class in engine_classes:
try:
return engine_class()
except ImportError as err:
error_msgs += "\n - " + str(err)

raise ImportError(
"Unable to find a usable engine; "
"tried using: 'pyarrow', 'fastparquet'.\n"
"pyarrow or fastparquet is required for parquet support"
"A suitable version of "
"pyarrow or fastparquet is required for parquet "
"support.\n"
"Trying to import the above resulted in these errors:"
f"{error_msgs}"
)

if engine == "pyarrow":
Expand Down Expand Up @@ -105,9 +108,7 @@ def write(
**kwargs,
)
else:
self.api.parquet.write_table(
table, path, compression=compression, **kwargs,
)
self.api.parquet.write_table(table, path, compression=compression, **kwargs)

def read(self, path, columns=None, **kwargs):
path, _, _, should_close = get_filepath_or_buffer(path)
Expand Down
44 changes: 44 additions & 0 deletions pandas/tests/io/test_parquet.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
except ImportError:
_HAVE_FASTPARQUET = False


pytestmark = pytest.mark.filterwarnings(
"ignore:RangeIndex.* is deprecated:DeprecationWarning"
)
Expand Down Expand Up @@ -223,6 +224,49 @@ def test_options_get_engine(fp, pa):
assert isinstance(get_engine("fastparquet"), FastParquetImpl)


def test_get_engine_auto_error_message():
# Expect different error messages from get_engine(engine="auto")
# if engines aren't installed vs. are installed but bad version
from pandas.compat._optional import VERSIONS

# Do we have engines installed, but a bad version of them?
pa_min_ver = VERSIONS.get("pyarrow")
fp_min_ver = VERSIONS.get("fastparquet")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you're repeating the same twice for pyarrow and fastparquet (which makes sense). But it'd probably be worth to just implement things once, and parametrize (with pytest).

You can search for @pytest.mark.parametrize, and you'll find lots of examples of parametrized tests. The idea is that the test will receive a set of variables for each of pyarrow and fastparquet, and pytest will execute it twice with each set of variables.

Copy link
Contributor Author

@jfcorbett jfcorbett Apr 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought about parametrizing, but in this case it's tricky: the error message will only ever show up if both pyarrow and fastparquet are inadequately installed (i.e. not installed, or bad version installed). This is embodied in the conditional:

if not have_usable_pa and not have_usable_fp:

The and'ed conditions can't be decoupled. So we'd still need both the lines of code you highlight above.

The only thing that could be de-duplicated with parametrization, is the contents of the if block. But even that doesn't feel quite right; it's just two aspects of the same situation.

Maybe the best thing to do if we want to be absolutely strict, is to take all the boolean flagging that is currently in these two sections

    # Do we have engines installed, but a bad version of them?
[...]
    # Do we have usable engines installed?

and move it outside of the test function, to module level. And use pytest.mark.skipif (or one of those wacky fixtures that inject pytest.mark.skip) to only run the test when we expect an error message.

At this point, I'm going to ask: how important is it that we do it this way? Because I'm slowly running out of steam for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is kind of overkill , but ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, didn't realize you need to know about both versions at the same time.

I think you could write this in a very compact way, if we create the _HAVE_USABLE_PYARROW... variables at the beginning of the file, like _HAVE_PYARROW. Then the parametrization would be trivial. But not that important.

Thanks for the work on this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I'm happy to come back to it later if someone sees this as adding value. Though I tend to agree with @jreback that we're bordering overkill. Anyhoo, I'm glad to have this closed before everything drops out of mental cache over Easter.
Cheers all for the good input!!

have_pa_bad_version = (
False
if not _HAVE_PYARROW
else LooseVersion(pyarrow.__version__) < LooseVersion(pa_min_ver)
)
have_fp_bad_version = (
False
if not _HAVE_FASTPARQUET
else LooseVersion(fastparquet.__version__) < LooseVersion(fp_min_ver)
)
# Do we have usable engines installed?
have_usable_pa = _HAVE_PYARROW and not have_pa_bad_version
have_usable_fp = _HAVE_FASTPARQUET and not have_fp_bad_version

if not have_usable_pa and not have_usable_fp:
# No usable engines found.
if have_pa_bad_version:
match = f"Pandas requires version .{pa_min_ver}. or newer of .pyarrow."
with pytest.raises(ImportError, match=match):
get_engine("auto")
else:
match = "Missing optional dependency .pyarrow."
with pytest.raises(ImportError, match=match):
get_engine("auto")

if have_fp_bad_version:
match = f"Pandas requires version .{fp_min_ver}. or newer of .fastparquet."
with pytest.raises(ImportError, match=match):
get_engine("auto")
else:
match = "Missing optional dependency .fastparquet."
with pytest.raises(ImportError, match=match):
get_engine("auto")


def test_cross_engine_pa_fp(df_cross_compat, pa, fp):
# cross-compat with differing reading/writing engines

Expand Down