TST: add test to read empty array #43459
nakatomotoi commented on Sep 8, 2021
- closes BUG: New param [use_nullable_dtypes] of pd.read_parquet() can't handle empty parquet file #41241
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
ping when updated and green
Can you add back the original test you added for all types? It doesn't need to include the nullable types, since they are covered in the other test that uses use_nullable_dtypes.
pandas/tests/io/test_parquet.py
Outdated
@@ -931,6 +931,18 @@ def test_read_parquet_manager(self, pa, using_array_manager):
        else:
            assert isinstance(result._mgr, pd.core.internals.BlockManager)

    @pytest.mark.parametrize(
        "dtype", ["Int64", "UInt8", "boolean", "object", "datetime64[ns, UTC]"]
can you add more types here, float, int, period[D], category, Float64, string
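The requested dtype list can be sketched outside the test suite as follows. This is a minimal illustration, assuming a pandas version that accepts all of these dtype strings; `DTYPES` and `empty_frames` are hypothetical names that do not appear in the PR.

```python
import pandas as pd

# Dtype names taken from the review thread (original five plus the six requested).
DTYPES = [
    "Int64", "UInt8", "boolean", "object", "datetime64[ns, UTC]",
    "float", "int", "period[D]", "category", "Float64", "string",
]


def empty_frames(dtypes=DTYPES):
    """Build one single-column, zero-row DataFrame per dtype name."""
    return {
        name: pd.DataFrame({"value": pd.array([], dtype=name)})
        for name in dtypes
    }


frames = empty_frames()
for name, df in frames.items():
    # Show which dtype pandas actually assigned to the empty column.
    print(f"{name!r} -> {df['value'].dtype}")
```

Each of these frames is the kind of input the parametrized test would hand to the round-trip helper.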
@jreback
I get an AssertionError when I add "category". Is "category" necessary as a test case?
self = <pandas.tests.io.test_parquet.TestParquetPyArrow object at 0x7f020ec01640>, pa = 'pyarrow', dtype = 'category'

    @pytest.mark.parametrize(
        "dtype",
        [
            "Int64",
            "UInt8",
            "boolean",
            "object",
            "datetime64[ns, UTC]",
            "float",
            "int",
            "period[D]",
            "category",
            "Float64",
            "string",
        ],
    )
    def test_read_empty_array(self, pa, dtype):
        # GH #41241
        df = pd.DataFrame(
            {
                "value": pd.array([], dtype=dtype),
            }
        )
>       check_round_trip(df, pa)

pandas/tests/io/test_parquet.py:957:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/tests/io/test_parquet.py:221: in check_round_trip
    compare(repeat)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
repeat = 2

    def compare(repeat):
        for _ in range(repeat):
            df.to_parquet(path, **write_kwargs)
            with catch_warnings(record=True):
                actual = read_parquet(path, **read_kwargs)

>           tm.assert_frame_equal(
                expected,
                actual,
                check_names=check_names,
                check_like=check_like,
                check_dtype=check_dtype,
            )
E           AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="value") are different
E
E           Attribute "dtype" are different
E           [left]:  CategoricalDtype(categories=[], ordered=False)
E           [right]: object

pandas/tests/io/test_parquet.py:211: AssertionError
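The dtype mismatch in the traceback can be reproduced without parquet at all. The sketch below is only an illustration: `actual` is a hand-built stand-in for the object-dtype frame the traceback reports on the right-hand side, and `dtype_mismatch` is a hypothetical helper, not part of pandas or the PR.

```python
import pandas as pd

# Left: what the test wrote, an empty categorical with no categories.
expected = pd.DataFrame({"value": pd.array([], dtype="category")})
# Right: a stand-in for what the traceback says came back (plain object dtype).
actual = pd.DataFrame({"value": pd.array([], dtype="object")})


def dtype_mismatch(left, right):
    """Return the assertion message if the frames differ, else None."""
    try:
        pd.testing.assert_frame_equal(left, right)
    except AssertionError as exc:
        return str(exc)
    return None


message = dtype_mismatch(expected, actual)
print(message)
```

Both frames have zero rows, so the comparison fails on the column's dtype attribute alone, which matches the failure reported above.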
pandas/tests/io/test_parquet.py
Outdated
                "value": pd.array([], dtype=dtype),
            }
        )
        check_round_trip(df, pa)
need to add read_kwargs={'use_nullable_dtypes': True}
pandas/tests/io/test_parquet.py
Outdated
            }
        )
        check_round_trip(df, pa)
move this test near here:
    def test_use_nullable_dtypes(self, engine):
        import pyarrow.parquet as pq

        if engine == "fastparquet":
            # We are manually disabling fastparquet's
            # nullable dtype support pending discussion
            pytest.skip("Fastparquet nullable dtype support is disabled")
and in fact structure it similarly
@jreback
you can merge master and see if that fixes it, but sometimes this does time out unrelated to your change
thanks @nakatomotoi very nice. if you would like to do a PR with an empty category, that would be great as well (likely just need to construct it properly)
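One possible reading of "construct it properly" is to declare the categories explicitly instead of relying on `dtype="category"`, which yields an empty category set. The sketch below shows only the construction; the labels are hypothetical, and whether such a frame survives a parquet round trip is not verified here.

```python
import pandas as pd

# Hypothetical category labels, chosen only for illustration.
categories = ["a", "b"]

# An empty categorical that still declares its categories.
empty_cat = pd.Categorical([], categories=categories)
df = pd.DataFrame({"value": empty_cat})

# The column has zero rows but a non-empty set of categories.
print(df["value"].dtype)
```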