DOC/TST: Update the parquet (pyarrow >= 0.15) docs and tests regarding Categorical support #28018
@@ -174,6 +174,7 @@ Categorical
 - Added test to assert the :func:`fillna` raises the correct ValueError message when the value isn't a value from categories (:issue:`13628`)
 - Bug in :meth:`Categorical.astype` where ``NaN`` values were handled incorrectly when casting to int (:issue:`28406`)
 - :meth:`Categorical.searchsorted` and :meth:`CategoricalIndex.searchsorted` now work on unordered categoricals also (:issue:`21667`)
+- Added test to assert roundtripping to parquet with :func:`DataFrame.to_parquet` or :func:`read_parquet` will preserve Categorical dtypes for string types (:issue:`27955`)

Review comment: this note is not really needed, but ok

 -

@@ -167,6 +167,7 @@ def compare(repeat):
             df.to_parquet(path, **write_kwargs)
             with catch_warnings(record=True):
                 actual = read_parquet(path, **read_kwargs)
+
             tm.assert_frame_equal(expected, actual, check_names=check_names)

     if path is None:

@@ -461,11 +462,26 @@ def test_unsupported(self, pa):
     def test_categorical(self, pa):

         # supported in >= 0.7.0
-        df = pd.DataFrame({"a": pd.Categorical(list("abc"))})
+        df = pd.DataFrame()
+        df["a"] = pd.Categorical(list("abcdef"))

-        # de-serialized as object
-        expected = df.assign(a=df.a.astype(object))
-        check_round_trip(df, pa, expected=expected)
+        # test for null, out-of-order values, and unobserved category
+        df["b"] = pd.Categorical(
+            ["bar", "foo", "foo", "bar", None, "bar"],
+            dtype=pd.CategoricalDtype(["foo", "bar", "baz"]),
+        )
+
+        # test for ordered flag
+        df["c"] = pd.Categorical(
+            ["a", "b", "c", "a", "c", "b"], categories=["b", "c", "d"], ordered=True
+        )
+
+        if LooseVersion(pyarrow.__version__) >= LooseVersion("0.15.0"):

Review comment: @jorisvandenbossche this isn't released yet, right? Should we wait to merge until 0.15.0 is released?

Review comment: Yes, this isn't released yet. I can assure that it runs locally for me on Arrow master (if I change the version check to

Review comment: Since Arrow 0.15.0 has been released, can we merge this now?

+            check_round_trip(df, pa)
+        else:
+            # de-serialized as object for pyarrow < 0.15
+            expected = df.astype(object)
+            check_round_trip(df, pa, expected=expected)

     def test_s3_roundtrip(self, df_compat, s3_resource, pa):
         # GH #19134

Review comment: can we add an ordered categorical as well?
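
For anyone wanting to try the behaviour discussed in this PR outside the test suite, here is a minimal sketch. It rebuilds the three categorical columns from `test_categorical` and, if `pyarrow` is importable, writes and reads them through an in-memory buffer. The buffer, the variable names `roundtripped` and `legacy_expected`, and the `try`/`except` guard are assumptions made for illustration; the actual test goes through the `check_round_trip` helper instead.

```python
import io

import pandas as pd

# Plain categorical column, as in the test.
df = pd.DataFrame({"a": pd.Categorical(list("abcdef"))})

# Null value, out-of-order values, and an unobserved category ("baz").
df["b"] = pd.Categorical(
    ["bar", "foo", "foo", "bar", None, "bar"],
    dtype=pd.CategoricalDtype(["foo", "bar", "baz"]),
)

# Ordered flag; "a" is not among the categories, so those two entries become NaN.
df["c"] = pd.Categorical(
    ["a", "b", "c", "a", "c", "b"], categories=["b", "c", "d"], ordered=True
)

# What the test expects for pyarrow < 0.15: every column read back as object.
legacy_expected = df.astype(object)

try:
    import pyarrow  # noqa: F401
except ImportError:
    # Hypothetical fallback for this sketch only: skip the round trip.
    roundtripped = None
else:
    buf = io.BytesIO()
    df.to_parquet(buf, engine="pyarrow")
    buf.seek(0)
    roundtripped = pd.read_parquet(buf, engine="pyarrow")

if roundtripped is not None:
    # With pyarrow >= 0.15 the dtypes are expected to stay categorical.
    print(roundtripped.dtypes)
```

Comparing `roundtripped` with `df` (newer pyarrow) or with `legacy_expected` (older pyarrow) mirrors the two branches of the version check in the diff.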