TST: Fix test_parquet failures for pyarrow 1.0 #35814


Merged

merged 4 commits into pandas-dev:master on Aug 25, 2020

Conversation

alimcmaster1
Member

cc @martindurant -> looks like this is a pyarrow 1.0.0 compat issue (read_table uses the new API) - https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html

I noticed the partition cols are cast from int64 -> int32. Is that expected pyarrow behaviour? Looking at the write_table docs for version 1.0/2.0, it suggests it is: https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table

Fix tests for pyarrow 1.0.0

Revert "Add new core members"

This reverts commit 7ef7c12
@alimcmaster1 alimcmaster1 added Testing pandas testing functions or related to the test suite IO Parquet parquet, feather labels Aug 20, 2020
@martindurant
Contributor

Might be best to ask @jorisvandenbossche about the intended behaviour

@jorisvandenbossche
Member

@alimcmaster1 thanks for looking into this!

Yes, so with pyarrow 1.0 and using the new datasets implementation, we kept the default of category type if your partition field is string, but not for integers (and indeed the default is now int32, not int64)

So I suppose our other roundtrip tests are using string partition fields (since this test is the only one that is failing)

@jorisvandenbossche
Member

So I suppose our other roundtrip tests are using string partition fields (since this test is the only one that is failing)

Actually, it doesn't seem we have other full roundtrip tests that use partitioning (only tests checking that it is written correctly)

expected_df = df_compat.copy()

# read_table uses the new Arrow Datasets API since pyarrow 1.0.0
# Previously, pyarrow partitioned columns became 'category' dtypes
Member

so only for integer columns did the behaviour change; for string columns it still uses category type

@alimcmaster1
Member Author

@alimcmaster1 thanks for looking into this!

Yes, so with pyarrow 1.0 and using the new datasets implementation, we kept the default of category type if your partition field is string, but not for integers (and indeed the default is now int32, not int64)

So I suppose our other roundtrip tests are using string partition fields (since this test is the only one that is failing)

Gotcha makes sense - thanks for the info!

@alimcmaster1 alimcmaster1 changed the title WIP TST: Fix test_parquet failures for pyarrow 1.0 TST: Fix test_parquet failures for pyarrow 1.0 Aug 20, 2020
@jreback jreback added this to the 1.2 milestone Aug 21, 2020
@jreback
Contributor

jreback commented Aug 21, 2020

@alimcmaster1 can you rebase? I marked this for 1.1.2, which I think is OK (not sure if it is failing on that branch).

@jreback jreback modified the milestones: 1.2, 1.1.2 Aug 21, 2020
@jreback
Contributor

jreback commented Aug 24, 2020

@jorisvandenbossche lgtm.

@jorisvandenbossche jorisvandenbossche merged commit d3d74c5 into pandas-dev:master Aug 25, 2020
@jorisvandenbossche
Member

@meeseeksdev backport to 1.1.x

@lumberbot-app

lumberbot-app bot commented Aug 25, 2020

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
$ git checkout 1.1.x
$ git pull
  2. Cherry-pick the first parent of this PR's merge commit on top of the older branch:
$ git cherry-pick -m1 d3d74c590e2578988a2be48d786ddafa89f91454
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
$ git commit -am 'Backport PR #35814: TST: Fix test_parquet failures for pyarrow 1.0'
  4. Push to a named branch:
$ git push YOURFORK 1.1.x:auto-backport-of-pr-35814-on-1.1.x
  5. Create a PR against branch 1.1.x. I would have named this PR:

"Backport PR #35814 on branch 1.1.x"

And apply the correct labels and milestones.

Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instructions are inaccurate, feel free to suggest an improvement.
