Skip to content

CI: unpin pyarrow, fix failing test #51175

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 10, 2023
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/actions/setup-conda/action.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ runs:
- name: Set Arrow version in ${{ inputs.environment-file }} to ${{ inputs.pyarrow-version }}
run: |
grep -q ' - pyarrow' ${{ inputs.environment-file }}
sed -i"" -e "s/ - pyarrow<11/ - pyarrow=${{ inputs.pyarrow-version }}/" ${{ inputs.environment-file }}
sed -i"" -e "s/ - pyarrow/ - pyarrow=${{ inputs.pyarrow-version }}/" ${{ inputs.environment-file }}
cat ${{ inputs.environment-file }}
shell: bash
if: ${{ inputs.pyarrow-version }}
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-310.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ dependencies:
- psycopg2
- pymysql
- pytables
- pyarrow<11
- pyarrow
- pyreadstat
- python-snappy
- pyxlsb
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-311.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ dependencies:
- psycopg2
- pymysql
# - pytables>=3.8.0 # first version that supports 3.11
- pyarrow<11
- pyarrow
- pyreadstat
- python-snappy
- pyxlsb
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-38-downstream_compat.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ dependencies:
- openpyxl
- odfpy
- psycopg2
- pyarrow<11
- pyarrow
- pymysql
- pyreadstat
- pytables
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-38.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ dependencies:
- odfpy
- pandas-gbq
- psycopg2
- pyarrow<11
- pyarrow
- pymysql
- pyreadstat
- pytables
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-39.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ dependencies:
- pandas-gbq
- psycopg2
- pymysql
- pyarrow<11
- pyarrow
- pyreadstat
- pytables
- python-snappy
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/circle-38-arm64.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ dependencies:
- odfpy
- pandas-gbq
- psycopg2
- pyarrow<11
- pyarrow
- pymysql
# Not provided on ARM
#- pyreadstat
Expand Down
2 changes: 1 addition & 1 deletion environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ dependencies:
- odfpy
- py
- psycopg2
- pyarrow<11
- pyarrow
- pymysql
- pyreadstat
- pytables
Expand Down
2 changes: 2 additions & 0 deletions pandas/compat/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@
pa_version_under7p0,
pa_version_under8p0,
pa_version_under9p0,
pa_version_under11p0,
)


Expand Down Expand Up @@ -161,6 +162,7 @@ def get_lzma_file() -> type[pandas.compat.compressors.LZMAFile]:
"pa_version_under7p0",
"pa_version_under8p0",
"pa_version_under9p0",
"pa_version_under11p0",
"IS64",
"PY39",
"PY310",
Expand Down
2 changes: 2 additions & 0 deletions pandas/compat/pyarrow.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,11 @@
pa_version_under8p0 = _palv < Version("8.0.0")
pa_version_under9p0 = _palv < Version("9.0.0")
pa_version_under10p0 = _palv < Version("10.0.0")
pa_version_under11p0 = _palv < Version("11.0.0")
except ImportError:
pa_version_under6p0 = True
pa_version_under7p0 = True
pa_version_under8p0 = True
pa_version_under9p0 = True
pa_version_under10p0 = True
pa_version_under11p0 = True
14 changes: 12 additions & 2 deletions pandas/core/arrays/arrow/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -258,9 +258,19 @@ def _from_sequence_of_strings(

scalars = to_datetime(strings, errors="raise").date
elif pa.types.is_duration(pa_type):
from pandas.core.tools.timedeltas import to_timedelta
try:
# GH51175: test_from_sequence_of_strings_pa_array
# attempt to parse as int64 reflecting pyarrow's
# duration to string casting behavior
if isinstance(strings, (pa.Array, pa.ChunkedArray)):
scalars = strings
else:
scalars = pa.array(strings, type=pa.string(), from_pandas=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO (slightly) I think we should still prefer to use to_timedelta first if a non-pyarrow scalars was passed over the cast

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using to_timedelta first would be incorrect in some cases. For instance, if passed "1000 seconds" as

ArrowExtensionArray._from_sequence_of_strings(["1000"], dtype=pa.duration("s"))

pd.to_timedelta(["1000"]) is interpreted as 1000 ns (to_timedelta does not allow specifying unit with a string input and assumes nanoseconds).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, the example provided above errors on main due to assuming ns units:

import pyarrow as pa

ArrowExtensionArray._from_sequence_of_strings(["1000"], dtype=pa.duration("s"))

# ArrowInvalid: Casting from duration[ns] to duration[s] would lose data: 1000

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW there's a corner case with iNaT that pyarrow will handle differently from to_timedelta

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point re: NaT. I made some changes which now call to_timedelta first and only attempts the int conversion when the pyarrow unit is non-nano. It respects the null mask from to_timedelta so I think the NaT edge case is handled. I added a test for it.

It all feels a bit messy, but I don't see another option ATM. Open to suggestions.

scalars = scalars.cast(pa.int64())
except pa.ArrowInvalid:
from pandas.core.tools.timedeltas import to_timedelta

scalars = to_timedelta(strings, errors="raise")
scalars = to_timedelta(strings, errors="raise")
elif pa.types.is_time(pa_type):
from pandas.core.tools.times import to_time

Expand Down
4 changes: 3 additions & 1 deletion pandas/core/tools/timedeltas.py
Original file line number Diff line number Diff line change
Expand Up @@ -240,7 +240,9 @@ def _convert_listlike(
# returning arg (errors == "ignore"), and where the input is a
# generator, we return a useful list-like instead of a
# used-up generator
arg = np.array(list(arg), dtype=object)
if not hasattr(arg, "__array__"):
arg = list(arg)
arg = np.array(arg, dtype=object)

try:
td64arr = sequence_to_td64ns(arg, unit=unit, errors=errors, copy=False)[0]
Expand Down
3 changes: 2 additions & 1 deletion pandas/tests/extension/test_arrow.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
pa_version_under7p0,
pa_version_under8p0,
pa_version_under9p0,
pa_version_under11p0,
)
from pandas.errors import PerformanceWarning

Expand Down Expand Up @@ -295,7 +296,7 @@ def test_from_sequence_of_strings_pa_array(self, data, request):
reason="Nanosecond time parsing not supported.",
)
)
elif pa.types.is_duration(pa_dtype):
elif pa_version_under11p0 and pa.types.is_duration(pa_dtype):
request.node.add_marker(
pytest.mark.xfail(
raises=pa.ArrowNotImplementedError,
Expand Down
2 changes: 1 addition & 1 deletion requirements-dev.txt
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ openpyxl
odfpy
py
psycopg2-binary
pyarrow<11
pyarrow
pymysql
pyreadstat
tables
Expand Down