DOC/TST: Update the parquet (pyarrow >= 0.15) docs and tests regarding Categorical support #28018
@@ -88,7 +88,7 @@ Categorical
^^^^^^^^^^^

- Added test to assert the :func:`fillna` raises the correct ValueError message when the value isn't a value from categories (:issue:`13628`)
-
- Added test to assert roundtripping to parquet with :func:`to_parquet` or :func:`read_parquet` will preserve Categorical dtypes for string types (:issue:`27955`)
-

Review comment on the parquet line: DataFrame.to_parquet
@@ -1,5 +1,6 @@
 """ test parquet compat """
 import datetime
+from distutils.version import LooseVersion
 import os
 from warnings import catch_warnings
@@ -166,6 +167,7 @@ def compare(repeat):
             df.to_parquet(path, **write_kwargs)
             with catch_warnings(record=True):
                 actual = read_parquet(path, **read_kwargs)
+
             tm.assert_frame_equal(expected, actual, check_names=check_names)
 
     if path is None:
@@ -453,9 +455,12 @@ def test_categorical(self, pa):
         # supported in >= 0.7.0
         df = pd.DataFrame({"a": pd.Categorical(list("abc"))})
 
-        # de-serialized as object
-        expected = df.assign(a=df.a.astype(object))
-        check_round_trip(df, pa, expected=expected)
+        if LooseVersion(pyarrow.__version__) >= LooseVersion("0.15.0"):
+            check_round_trip(df, pa)
+        else:
+            # de-serialized as object for pyarrow < 0.15
+            expected = df.assign(a=df.a.astype(object))
+            check_round_trip(df, pa, expected=expected)
 
     def test_s3_roundtrip(self, df_compat, s3_resource, pa):
         # GH #19134

Review thread on the LooseVersion check:
- "@jorisvandenbossche this isn't released yet, right? Should we wait to merge until 0.15.0 is released?"
- "Yes, this isn't released yet. I can assure that it runs locally for me on Arrow master (if I change the version check to"
- "Since Arrow 0.15.0 has been released, can we merge this now?"
Review comment: Is this true with both engines?
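The version gate in `test_categorical` picks the expected dtype from the installed pyarrow version. A standalone sketch of that decision, using plain integer tuples instead of `distutils.version.LooseVersion` (`distutils` was removed from the standard library in Python 3.12). `release_tuple` and `expected_categorical_dtype` are hypothetical helpers for illustration, not pandas API, and the parsing is a simplification: pre-release suffixes such as `rc1` or `.dev123` are dropped, so `0.15.0rc1` counts as 0.15.0.

```python
import re
from typing import Tuple


def release_tuple(version: str, width: int = 3) -> Tuple[int, ...]:
    """Turn "0.15.1" into (0, 15, 1); "0.15" is padded to (0, 15, 0).

    Only leading numeric components are kept, so suffixes such as ".dev123"
    or "rc1" are ignored (a simplification of LooseVersion's ordering).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at a non-numeric piece such as "dev123"
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop after a piece with a suffix such as "0rc1"
    parts += [0] * (width - len(parts))  # pad short versions like "0.15"
    return tuple(parts)


def expected_categorical_dtype(pyarrow_version: str) -> str:
    """Dtype the round trip in test_categorical should yield for column "a"."""
    if release_tuple(pyarrow_version) >= (0, 15, 0):
        return "category"  # pyarrow >= 0.15: Categorical dtype is preserved
    return "object"  # pyarrow < 0.15: de-serialized as object


for v in ["0.14.1", "0.15.0", "0.15.1", "1.0.1"]:
    print(v, expected_categorical_dtype(v))
```

The padding matters because Python compares tuples lexicographically: without it, `(0, 15)` would sort below `(0, 15, 0)` and a version string like "0.15" would wrongly select the object-dtype branch.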