Skip to content

CI: Test pyarrow nightly instead of intermediate versions #52211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Mar 30, 2023

Conversation

mroeschke
Copy link
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@mroeschke mroeschke added CI Continuous Integration Arrow pyarrow functionality labels Mar 25, 2023
@mroeschke mroeschke marked this pull request as ready for review March 29, 2023 21:40
@mroeschke mroeschke requested a review from lithomas1 March 30, 2023 00:23
@mroeschke mroeschke added this to the 2.0 milestone Mar 30, 2023
- pytest-xdist>=2.2.0
- hypothesis>=6.34.2
- pytest-asyncio>=0.17.0
- boto3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is boto3 a required dep for pyarrow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think so. I think i accidentally copied this from another deps file

@@ -92,22 +92,18 @@ def _get_path_or_handle(
if fs is not None:
pa_fs = import_optional_dependency("pyarrow.fs", errors="ignore")
fsspec = import_optional_dependency("fsspec", errors="ignore")
if pa_fs is None and fsspec is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build dependency setup uncovered a bug in the recent pyarrow Filesystem implementation not raising a ValueError consistently if pyarrow was install but fastparquet wasn't

@mroeschke mroeschke merged commit 4a2c06c into pandas-dev:main Mar 30, 2023
@mroeschke mroeschke deleted the ci/pyarrow branch March 30, 2023 23:52
@lumberbot-app
Copy link

lumberbot-app bot commented Mar 30, 2023

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
git checkout 2.0.x
git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
git cherry-pick -x -m1 4a2c06c8a5e4b12f7850b834eb10f1fa1f302f92
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
git commit -am 'Backport PR #52211: CI: Test pyarrow nightly instead of intermediate versions'
  1. Push to a named branch:
git push YOURFORK 2.0.x:auto-backport-of-pr-52211-on-2.0.x
  1. Create a PR against branch 2.0.x, I would have named this PR:

"Backport PR #52211 on branch 2.0.x (CI: Test pyarrow nightly instead of intermediate versions)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

mroeschke added a commit to mroeschke/pandas that referenced this pull request Mar 30, 2023
mroeschke added a commit that referenced this pull request Mar 31, 2023
…of intermediate versions) (#52313)

Backport PR #52211: CI: Test pyarrow nightly instead of intermediate versions
@mroeschke mroeschke mentioned this pull request Mar 31, 2023
5 tasks
topper-123 pushed a commit to topper-123/pandas that referenced this pull request Apr 1, 2023
…#52211)

* CI: Test pyarrow nightly instead of intermediate versions

* Change format

* Pin, remove hardcoded channel

* Try pip

* Fix some tests

* Address more tests

* Fix test condition

* Fix another condidition

* Cleanup name

* Remove boto3
@jorisvandenbossche
Copy link
Member

Do we have an issue to add back testing with multiple versions of pyarrow? I understand this PR reduced the CI load, but now we have several supported versions of pyarrow that we are not testing at all.

And it actually seems that in practice, something else in the environment is causing a constraint on older pyarrow, so all our standard builds are using pyarrow 9.0.0. Which means we are not even testing the latest released pyarrow version at the moment?

We have standard env files for python 3.9, 3.10 and 3.11 at the moment, we could easily test three different pyarrow versions in those? (xref #51308 (comment))

@jorisvandenbossche
Copy link
Member

Testing our actions-311.yaml env file locally, and then trying to upgrade pyarrow causes a downgrade of pandas-gbq, so the bump of the minimum version of this might have caused to test with an older version of pyarrow.

@mroeschke
Copy link
Member Author

We don't have an open issue to discuss adding back testing with multiple version, but it would be good to have one to discuss. Yeah maybe we could have jobs that tests the required dependencies + pyarrow 8, 9, 10, ...

@lithomas1
Copy link
Member

Do we have an issue to add back testing with multiple versions of pyarrow? I understand this PR reduced the CI load, but now we have several supported versions of pyarrow that we are not testing at all.

Are there issues with pyarrow versions in the middle of our support range breaking?

FWIW, for numpy we only test the min, the latest, and numpy dev, and it seems to work fine.

Although the build matrix has been reduced by fusing jobs together, the individual jobs themselves take longer now (~1hr per job), so I'm still a little worried about the runners clogging up (We support a wide range of pyarrow rn, which is pyarrow 7-12 I think).

And it actually seems that in practice, something else in the environment is causing a constraint on older pyarrow, so all our standard builds are using pyarrow 9.0.0. Which means we are not even testing the latest released pyarrow version at the moment?

This seems like an issue.

We could try adding a "clean" job without any dependencies, that would have pyarrow in it (this depends on PDEP-10 being accepted).

In this case though, we are probably just pulling a very old version of pandas-gbq (mamba bug?).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality CI Continuous Integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants