CI: Test pyarrow nightly instead of intermediate versions #52211

Merged 18 commits on Mar 30, 2023
Changes from 15 commits
11 changes: 0 additions & 11 deletions .github/actions/setup-conda/action.yml
@@ -9,20 +9,9 @@ inputs:
extra-specs:
description: Extra packages to install
required: false
pyarrow-version:
description: If set, overrides the PyArrow version in the Conda environment to the given string.
required: false
runs:
using: composite
steps:
- name: Set Arrow version in ${{ inputs.environment-file }} to ${{ inputs.pyarrow-version }}
run: |
grep -q ' - pyarrow' ${{ inputs.environment-file }}
sed -i"" -e "s/ - pyarrow/ - pyarrow=${{ inputs.pyarrow-version }}/" ${{ inputs.environment-file }}
cat ${{ inputs.environment-file }}
shell: bash
if: ${{ inputs.pyarrow-version }}

- name: Install ${{ inputs.environment-file }}
uses: mamba-org/provision-with-micromamba@v12
with:
1 change: 0 additions & 1 deletion .github/workflows/macos-windows.yml
@@ -52,7 +52,6 @@ jobs:
uses: ./.github/actions/setup-conda
with:
environment-file: ci/deps/${{ matrix.env_file }}
pyarrow-version: ${{ matrix.os == 'macos-latest' && '9' || '' }}

- name: Build Pandas
uses: ./.github/actions/build_pandas
22 changes: 5 additions & 17 deletions .github/workflows/ubuntu.yml
@@ -28,7 +28,6 @@ jobs:
env_file: [actions-38.yaml, actions-39.yaml, actions-310.yaml, actions-311.yaml]
# Prevent the include jobs from overriding other jobs
pattern: [""]
pyarrow_version: ["8", "9", "10"]
include:
- name: "Downstream Compat"
env_file: actions-38-downstream_compat.yaml
@@ -76,21 +75,11 @@ jobs:
# TODO(cython3): Re-enable once next-beta(after beta 1) comes out
# There are some warnings failing the build with -werror
pandas_ci: "0"
exclude:
- env_file: actions-38.yaml
pyarrow_version: "8"
- env_file: actions-38.yaml
pyarrow_version: "9"
- env_file: actions-39.yaml
pyarrow_version: "8"
- env_file: actions-39.yaml
pyarrow_version: "9"
- env_file: actions-310.yaml
pyarrow_version: "8"
- env_file: actions-310.yaml
pyarrow_version: "9"
- name: "Pyarrow Nightly"
env_file: actions-311-pyarrownightly.yaml
pattern: "not slow and not network and not single_cpu"
fail-fast: false
name: ${{ matrix.name || format('{0} pyarrow={1} {2}', matrix.env_file, matrix.pyarrow_version, matrix.pattern) }}
name: ${{ matrix.name || matrix.env_file }}
env:
ENV_FILE: ci/deps/${{ matrix.env_file }}
PATTERN: ${{ matrix.pattern }}
@@ -108,7 +97,7 @@ jobs:
COVERAGE: ${{ !contains(matrix.env_file, 'pypy') }}
concurrency:
# https://github.community/t/concurrecy-not-work-for-push/183068/7
group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-${{ matrix.env_file }}-${{ matrix.pattern }}-${{ matrix.pyarrow_version || '' }}-${{ matrix.extra_apt || '' }}-${{ matrix.pandas_data_manager || '' }}
group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-${{ matrix.env_file }}-${{ matrix.pattern }}-${{ matrix.extra_apt || '' }}-${{ matrix.pandas_data_manager || '' }}
cancel-in-progress: true

services:
@@ -167,7 +156,6 @@ jobs:
uses: ./.github/actions/setup-conda
with:
environment-file: ${{ env.ENV_FILE }}
pyarrow-version: ${{ matrix.pyarrow_version }}

- name: Build Pandas
id: build
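To make the effect of dropping the `pyarrow_version` axis concrete, the job counts can be worked out with a small Python sketch. It assumes that the matrix is the cross product of its axes minus every combination matching an `exclude` entry; `expand_matrix` is a hypothetical helper written for this illustration, and `include` jobs (such as the new "Pyarrow Nightly" entry) are left out of the count.

```python
from itertools import product

ENV_FILES = [
    "actions-38.yaml",
    "actions-39.yaml",
    "actions-310.yaml",
    "actions-311.yaml",
]
OLD_PYARROW_VERSIONS = ["8", "9", "10"]

# The exclude entries deleted in this diff: pyarrow 8 and 9 were dropped
# for every env file except actions-311.yaml.
OLD_EXCLUDES = {
    (env_file, version)
    for env_file in ENV_FILES[:3]
    for version in ("8", "9")
}


def expand_matrix(env_files, versions, excludes):
    """Hypothetical helper: cross product of both axes minus excluded pairs."""
    return [pair for pair in product(env_files, versions) if pair not in excludes]


old_jobs = expand_matrix(ENV_FILES, OLD_PYARROW_VERSIONS, OLD_EXCLUDES)
new_jobs = list(ENV_FILES)  # without the version axis: one job per env file

print(len(old_jobs), len(new_jobs))  # prints "6 4"
```

Under that assumption the old matrix produced six regular jobs (pyarrow 10 on every env file, plus 8 and 9 on Python 3.11 only), while the new one produces four, with the nightly build added separately through `include`.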
2 changes: 1 addition & 1 deletion ci/deps/actions-310.yaml
@@ -41,7 +41,7 @@ dependencies:
- psycopg2>=2.8.6
- pymysql>=1.0.2
- pytables>=3.6.1
- pyarrow
- pyarrow>=7.0.0
- pyreadstat>=1.1.2
- python-snappy>=0.6.0
- pyxlsb>=1.0.8
30 changes: 30 additions & 0 deletions ci/deps/actions-311-pyarrownightly.yaml
@@ -0,0 +1,30 @@
name: pandas-dev
channels:
- conda-forge
dependencies:
- python=3.11

# build dependencies
- versioneer[toml]
- cython>=0.29.33

# test dependencies
- pytest>=7.0.0
- pytest-cov
- pytest-xdist>=2.2.0
- hypothesis>=6.34.2
- pytest-asyncio>=0.17.0
- boto3
Review comment from Member: Is boto3 a required dep for pyarrow?

Reply from Member Author: Don't think so. I think I accidentally copied this from another deps file.

# required dependencies
- python-dateutil
- numpy
- pytz
- pip

- pip:
- "tzdata>=2022.1"
- "--extra-index-url https://pypi.fury.io/arrow-nightlies/"
- "--prefer-binary"
- "--pre"
- "pyarrow"
2 changes: 1 addition & 1 deletion ci/deps/actions-311.yaml
@@ -41,7 +41,7 @@ dependencies:
- psycopg2>=2.8.6
- pymysql>=1.0.2
# - pytables>=3.8.0 # first version that supports 3.11
- pyarrow
- pyarrow>=7.0.0
- pyreadstat>=1.1.2
- python-snappy>=0.6.0
- pyxlsb>=1.0.8
2 changes: 1 addition & 1 deletion ci/deps/actions-38-downstream_compat.yaml
@@ -39,7 +39,7 @@ dependencies:
- openpyxl<3.1.1, >=3.0.7
- odfpy>=1.4.1
- psycopg2>=2.8.6
- pyarrow
- pyarrow>=7.0.0
- pymysql>=1.0.2
- pyreadstat>=1.1.2
- pytables>=3.6.1
2 changes: 1 addition & 1 deletion ci/deps/actions-38.yaml
@@ -39,7 +39,7 @@ dependencies:
- odfpy>=1.4.1
- pandas-gbq>=0.15.0
- psycopg2>=2.8.6
- pyarrow
- pyarrow>=7.0.0
- pymysql>=1.0.2
- pyreadstat>=1.1.2
- pytables>=3.6.1
2 changes: 1 addition & 1 deletion ci/deps/actions-39.yaml
@@ -40,7 +40,7 @@ dependencies:
- pandas-gbq>=0.15.0
- psycopg2>=2.8.6
- pymysql>=1.0.2
- pyarrow
- pyarrow>=7.0.0
- pyreadstat>=1.1.2
- pytables>=3.6.1
- python-snappy>=0.6.0
2 changes: 1 addition & 1 deletion ci/deps/circle-38-arm64.yaml
@@ -39,7 +39,7 @@ dependencies:
- odfpy>=1.4.1
- pandas-gbq>=0.15.0
- psycopg2>=2.8.6
- pyarrow
- pyarrow>=7.0.0
- pymysql>=1.0.2
# Not provided on ARM
#- pyreadstat
2 changes: 1 addition & 1 deletion environment.yml
@@ -42,7 +42,7 @@ dependencies:
- odfpy>=1.4.1
- py
- psycopg2>=2.8.6
- pyarrow
- pyarrow>=7.0.0
- pymysql>=1.0.2
- pyreadstat>=1.1.2
- pytables>=3.6.1
20 changes: 8 additions & 12 deletions pandas/io/parquet.py
@@ -92,22 +92,18 @@ def _get_path_or_handle(
if fs is not None:
pa_fs = import_optional_dependency("pyarrow.fs", errors="ignore")
fsspec = import_optional_dependency("fsspec", errors="ignore")
if pa_fs is None and fsspec is None:
Review comment from Member: Why did this change?

Reply from Member Author: The build dependency setup uncovered a bug in the recent pyarrow Filesystem implementation: it did not consistently raise a ValueError if pyarrow was installed but fastparquet wasn't.

raise ValueError(
f"filesystem must be a pyarrow or fsspec FileSystem, "
f"not a {type(fs).__name__}"
)
elif (pa_fs is not None and not isinstance(fs, pa_fs.FileSystem)) and (
fsspec is not None and not isinstance(fs, fsspec.spec.AbstractFileSystem)
):
if pa_fs is not None and isinstance(fs, pa_fs.FileSystem):
if storage_options:
raise NotImplementedError(
"storage_options not supported with a pyarrow FileSystem."
)
elif fsspec is not None and isinstance(fs, fsspec.spec.AbstractFileSystem):
pass
else:
raise ValueError(
f"filesystem must be a pyarrow or fsspec FileSystem, "
f"not a {type(fs).__name__}"
)
elif pa_fs is not None and isinstance(fs, pa_fs.FileSystem) and storage_options:
raise NotImplementedError(
"storage_options not supported with a pyarrow FileSystem."
)
if is_fsspec_url(path_or_handle) and fs is None:
if storage_options is None:
pa = import_optional_dependency("pyarrow")
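The inconsistency mentioned in the review thread above can be reproduced outside pandas with a minimal sketch. It assumes the problematic case is one where only one of the two optional imports succeeds; `PyArrowFS` and `FsspecFS` are hypothetical stand-ins for `pyarrow.fs.FileSystem` and `fsspec.spec.AbstractFileSystem`, a class argument of `None` models a library that is not installed, and `check_old`, `check_new`, and `outcome` are illustrative names that mirror the two branch structures in the diff rather than pandas functions.

```python
class PyArrowFS:
    """Hypothetical stand-in for pyarrow.fs.FileSystem."""


class FsspecFS:
    """Hypothetical stand-in for fsspec.spec.AbstractFileSystem."""


def _type_error(fs):
    return ValueError(
        f"filesystem must be a pyarrow or fsspec FileSystem, "
        f"not a {type(fs).__name__}"
    )


def check_old(fs, storage_options, pa_cls, fsspec_cls):
    # Branch structure before this PR.
    if pa_cls is None and fsspec_cls is None:
        raise _type_error(fs)
    elif (pa_cls is not None and not isinstance(fs, pa_cls)) and (
        fsspec_cls is not None and not isinstance(fs, fsspec_cls)
    ):
        raise _type_error(fs)
    elif pa_cls is not None and isinstance(fs, pa_cls) and storage_options:
        raise NotImplementedError(
            "storage_options not supported with a pyarrow FileSystem."
        )


def check_new(fs, storage_options, pa_cls, fsspec_cls):
    # Branch structure after this PR: accept known types, reject the rest.
    if pa_cls is not None and isinstance(fs, pa_cls):
        if storage_options:
            raise NotImplementedError(
                "storage_options not supported with a pyarrow FileSystem."
            )
    elif fsspec_cls is not None and isinstance(fs, fsspec_cls):
        pass
    else:
        raise _type_error(fs)


def outcome(check, fs, storage_options, pa_cls, fsspec_cls):
    """Return the exception name raised by `check`, or "accepted"."""
    try:
        check(fs, storage_options, pa_cls, fsspec_cls)
    except (ValueError, NotImplementedError) as exc:
        return type(exc).__name__
    return "accepted"


bad_fs = object()  # neither a pyarrow nor an fsspec filesystem

# Only the pyarrow stand-in is "installed" (fsspec_cls=None).
print(outcome(check_old, bad_fs, None, PyArrowFS, None))  # accepted
print(outcome(check_new, bad_fs, None, PyArrowFS, None))  # ValueError
```

In the old structure the second condition requires both libraries to be importable before it can reject anything, so an invalid object slips through when one import is missing. The rewritten structure rejects everything that is not positively identified as a supported filesystem.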
6 changes: 3 additions & 3 deletions pandas/tests/arrays/string_/test_string.py
@@ -12,6 +12,7 @@
import pandas as pd
import pandas._testing as tm
from pandas.core.arrays.string_arrow import ArrowStringArray
from pandas.util.version import Version


@pytest.fixture
@@ -406,15 +407,14 @@ def test_fillna_args(dtype, request):
arr.fillna(value=1)


@td.skip_if_no("pyarrow")
def test_arrow_array(dtype):
# protocol added in 0.15.0
import pyarrow as pa
pa = pytest.importorskip("pyarrow")

data = pd.array(["a", "b", "c"], dtype=dtype)
arr = pa.array(data)
expected = pa.array(list(data), type=pa.string(), from_pandas=True)
if dtype.storage == "pyarrow":
if dtype.storage == "pyarrow" and Version(pa.__version__) <= Version("11.0.0"):
expected = pa.chunked_array(expected)

assert arr.equals(expected)
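The version gate in this test lets one expectation hold for released pyarrow and another for the nightly build. A standard-library sketch of the same idea follows. The test itself uses `pandas.util.version.Version`; the `release_tuple` and `expects_chunked_array` helpers below are hypothetical simplifications that compare only the numeric release segment, and the nightly version string is made up for illustration.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Hypothetical helper: leading numeric release segment as a tuple.

    Suffixes such as ".dev309" are ignored, which is a simplification
    compared with a full version parser.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def expects_chunked_array(pyarrow_version: str) -> bool:
    """Mirror the gate in the test: chunked array expected up to 11.0.0."""
    return release_tuple(pyarrow_version) <= (11, 0, 0)


print(expects_chunked_array("11.0.0"))         # True
print(expects_chunked_array("12.0.0.dev309"))  # False (made-up nightly version)
```

Gating on the version rather than on the presence of pyarrow keeps the same test meaningful on both the minimum supported release and the nightly wheel.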
3 changes: 3 additions & 0 deletions pandas/tests/io/test_parquet.py
@@ -1019,7 +1019,10 @@ def test_read_dtype_backend_pyarrow_config_index(self, pa):
{"a": [1, 2]}, index=pd.Index([3, 4], name="test"), dtype="int64[pyarrow]"
)
expected = df.copy()
import pyarrow

if Version(pyarrow.__version__) > Version("11.0.0"):
expected.index = expected.index.astype("int64[pyarrow]")
check_round_trip(
df,
engine=pa,
2 changes: 1 addition & 1 deletion pandas/tests/util/test_show_versions.py
@@ -65,7 +65,7 @@ def test_show_versions_console(capsys):
assert re.search(r"numpy\s*:\s[0-9]+\..*\n", result)

# check optional dependency
assert re.search(r"pyarrow\s*:\s([0-9\.]+|None)\n", result)
assert re.search(r"pyarrow\s*:\s([0-9]+.*|None)\n", result)


def test_json_output_match(capsys, tmpdir):
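The regex change in `test_show_versions_console` matters once a nightly build is installed, because its version string is no longer made up of digits and dots only. The sketch below runs both patterns from the diff against sample output lines; the lines themselves, including the nightly version number, are made up for illustration and are not actual `show_versions` output.

```python
import re

OLD_PATTERN = r"pyarrow\s*:\s([0-9\.]+|None)\n"
NEW_PATTERN = r"pyarrow\s*:\s([0-9]+.*|None)\n"

# Made-up sample lines in the "name : version" layout the test searches for.
release_line = "pyarrow          : 11.0.0\n"
nightly_line = "pyarrow          : 12.0.0.dev309\n"
missing_line = "pyarrow          : None\n"

for line in (release_line, nightly_line, missing_line):
    old_ok = re.search(OLD_PATTERN, line) is not None
    new_ok = re.search(NEW_PATTERN, line) is not None
    print(f"{line.strip()!r}: old={old_ok} new={new_ok}")
```

Only the nightly-style line distinguishes the two patterns: the old character class stops at the first letter of the suffix and can never reach the newline, while the new pattern accepts anything that starts with a digit.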
2 changes: 1 addition & 1 deletion requirements-dev.txt
@@ -31,7 +31,7 @@ openpyxl<3.1.1, >=3.0.7
odfpy>=1.4.1
py
psycopg2-binary>=2.8.6
pyarrow
pyarrow>=7.0.0
pymysql>=1.0.2
pyreadstat>=1.1.2
tables>=3.6.1
2 changes: 1 addition & 1 deletion scripts/validate_min_versions_in_sync.py
@@ -37,7 +37,7 @@
YAML_PATH = pathlib.Path("ci/deps")
ENV_PATH = pathlib.Path("environment.yml")
EXCLUDE_DEPS = {"tzdata", "blosc"}
EXCLUSION_LIST = frozenset(["python=3.8[build=*_pypy]", "pyarrow"])
EXCLUSION_LIST = frozenset(["python=3.8[build=*_pypy]"])
# pandas package is not available
# in pre-commit environment
sys.path.append("pandas/compat")