BUG: avoid attribute error with pyarrow >=0.16.0 and <1.0.0 #38803

ADraginda · 2020-12-30T10:47:20Z

Problem: The minimum pyarrow version listed as an optional dependency is 0.15.1. Pyarrow added the compute module (imported in the change here) in 0.16.0, but the attributes pandas accesses in that module are not available until 1.0.0, so anyone using pyarrow in [0.16.0, 1.0.0) will get an AttributeError.

This PR adds a try/except around accessing the attributes so that they are only accessed if available (i.e. pyarrow >=1.0.0).

closes BUG: pandas 1.2.0 and Pyarrow [0.16.0, 1.0.0) are incompatible for some column types #38801
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
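
A rough sketch of the guard described above, not the actual patch. The helper name `build_cmp_funcs` is hypothetical, and stand-in objects replace `pyarrow.compute` so the snippet runs without pyarrow installed. Only the `le` and `ge` entries are visible in the diff quoted later in this thread; the other four entries are assumed to follow the same pattern.

```python
import operator
from types import SimpleNamespace


def build_cmp_funcs(pc):
    """Map operator names to comparison kernels, or return {} if any is missing.

    `pc` stands in for the pyarrow compute module (hypothetical helper).
    """
    try:
        return {
            "eq": pc.equal,
            "ne": pc.not_equal,
            "lt": pc.less,
            "gt": pc.greater,
            "le": pc.less_equal,
            "ge": pc.greater_equal,
        }
    except AttributeError:
        # The module exists but lacks the kernels, which is the situation
        # reported for pyarrow >=0.16.0 and <1.0.0.
        return {}


# Stand-in for a compute module that has no comparison kernels yet.
old_pc = SimpleNamespace()

# Stand-in for a compute module that provides all six kernels.
new_pc = SimpleNamespace(
    equal=operator.eq,
    not_equal=operator.ne,
    less=operator.lt,
    greater=operator.gt,
    less_equal=operator.le,
    greater_equal=operator.ge,
)
```

With this shape, importing the module never raises; code that needs the mapping can check whether it is empty before using it.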

jreback

we should just raise the min for string_arrow to 1.0 rather than do this

we do have multiple builds testing these versions of pyarrow, what exactly is failing to cause this change?

cc @jorisvandenbossche @xhochy @simonjayhawkins

simonjayhawkins · 2020-12-30T13:51:43Z

Thanks @ADraginda for the report and the PR.

To have testing for this and prevent future issues, we should pin one of the CI environments to the >=0.16.0 and <1.0.0 range.

indeed, creation of the ARROW_CMP_FUNCS was moved to the module level, #35259 (comment), but was initially guarded with a version check (in object creation).

the try/except is OK, or we could duplicate the version check or refactor the version check out to avoid duplication.
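
For comparison, the version-check alternative could look roughly like the following. This is a sketch under assumptions: `parse_version` and `supports_compute_kernels` are hypothetical names, and the version string is passed in as an argument rather than read from an installed pyarrow.

```python
import re


def parse_version(version):
    """Return (major, minor, patch) from the leading digits of a version string."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # Missing components (e.g. "1.0") are treated as zero.
    return tuple(int(part) if part else 0 for part in match.groups())


def supports_compute_kernels(version, minimum=(1, 0, 0)):
    """True if `version` is at least `minimum` (default 1.0.0, per this thread)."""
    return parse_version(version) >= minimum
```

Factoring the check into one helper is what would avoid duplicating it between the module level and object creation.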

jorisvandenbossche · 2020-12-30T13:53:02Z

@ADraginda Thanks for the PR! Your analysis sounds correct, and this seems a good fix for it.

Now, I am also curious how you ran into it, because in principle we should only import this file (at least in the current version of pandas) if you are explicitly importing it, since it's not yet being used internally or publicly exposed.

(but your fix is useful anyway, because once we start exposing it publicly, it shouldn't error like that when pyarrow 0.16 is installed)

simonjayhawkins · 2020-12-30T13:56:57Z

Now, I am also curious how you ran into it, because in principle we should only import this file (at least in the current version of pandas) if you are explicitly importing it

we have special casing in pandas/core/arrays/base.py that imports ArrowStringDtype

jorisvandenbossche · 2020-12-30T13:58:33Z

Ah, indeed, I forgot about that one (so runtime, I was only thinking about on import)
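
To illustrate the import-time versus runtime distinction, here is a toy sketch that is not pandas code. The module name `fragile_mod_example` and its contents are made up for the demonstration. Its top-level code touches an attribute that does not exist, and because it is only imported inside a function, the error surfaces when that function is called rather than when the surrounding program starts.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module whose top-level code reads a missing attribute,
# standing in for a module-level mapping built from a too-old dependency.
FRAGILE_SOURCE = "import math\nFUNCS = {'le': math.no_such_function}\n"


def load_fragile_feature():
    """Import the fragile module lazily, as a function-level import would."""
    return importlib.import_module("fragile_mod_example")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp, "fragile_mod_example.py").write_text(FRAGILE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Nothing has failed up to here: the module body has not run yet.
            try:
                load_fragile_feature()
            except AttributeError as exc:
                return f"failed at call time: {type(exc).__name__}"
            return "no failure"
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fragile_mod_example", None)
```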

jorisvandenbossche · 2020-12-30T14:03:11Z

To have testing for this and prevent future issues, we should pin one of the CI environments to the >=0.16.0 and <1.0.0 range.

The actions-37-locale.yaml now has >=0.17 (which currently resolves to 1.0.1, but that version is already pinned elsewhere), so maybe we can change this to a pin to 0.17 (=0.17):

pandas/ci/deps/actions-37-locale.yaml

Line 33 in 94810d1

- pyarrow>=0.17

jreback · 2020-12-30T14:07:11Z

pandas/core/arrays/string_arrow.py

-            "le": pc.less_equal,
-            "ge": pc.greater_equal,
-        }
+        # pyarrow 0.16.0 adds a compute module (thus the above compute import


rather than a try/except, why don't we simply check the version at this point? this nested try/except is very messy. or, way better, just put a min of 1.0.0 on this functionality.

We discussed that on the original PR. The version check already happens elsewhere when actually trying to create an array (and already uses a minimum version of 1.0.0), so it's not needed to do a version check here.

I remember, and obviously this is fragile. so this check needs to happen here as well.

still -1 on this soln. i would just check that we are >= 1.0 and call it a day.

ADraginda · 2020-12-31T09:02:03Z

@jorisvandenbossche Thanks for the review! I added the pin from >= to ==

@simonjayhawkins I'd prefer to punt on a refactor in favor of passing off to someone who knows this code better if that's ok.

@jreback I'm hoping to avoid a 1.0.0 minimum version if at all possible.
You can reproduce with pyarrow 0.17.1 (that's what I'm on, but it breaks for other versions too) with the following example. It seems to be related to merging two tables, one on a column of type string and the other of type object.

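import pandas as pd

# One row: a string column 'i' and an integer column 'j'
example_dict = {'i': {0: 'foo'}, 'j': {0: 1}}

# foo: 'i' uses the nullable string dtype, 'j' the nullable Int64 dtype
foo = pd.DataFrame(example_dict)
foo['i'] = foo['i'].astype(pd.StringDtype.name)
foo['j'] = foo['j'].astype(pd.Int64Dtype.name)

# bar: 'i' stays object dtype, only 'j' is converted
bar = pd.DataFrame(example_dict)
bar['j'] = bar['j'].astype(pd.Int64Dtype.name)

# Merge a string column against an object column (the reported failure)
foo.merge(bar)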

jorisvandenbossche · 2020-12-31T09:10:44Z

Hmm, apparently my suggestion of pinning to 0.17 in that environment was not as simple as that, as it gives an unsolvable environment. Will check locally what might work.

ci/deps/actions-37-locale.yaml

simonjayhawkins · 2020-12-31T10:51:31Z

@jorisvandenbossche

=========================== short test summary info ============================
FAILED pandas/tests/io/test_user_agent.py::test_server_and_default_headers[ParquetPyArrowUserAgentResponder-read_parquet-34268-pyarrow]
FAILED pandas/tests/io/test_user_agent.py::test_server_and_custom_headers[JSONUserAgentResponder-read_json-34264-None]
FAILED pandas/tests/io/test_user_agent.py::test_server_and_custom_headers[ParquetPyArrowUserAgentResponder-read_parquet-34270-pyarrow]
FAILED pandas/tests/io/test_user_agent.py::test_server_and_custom_headers[GzippedJSONUserAgentResponder-read_json-34266-None]
= 4 failed, 136679 passed, 14311 skipped, 1054 xfailed, 63 xpassed in 1148.83s (0:19:08) =

jorisvandenbossche · 2020-12-31T11:53:55Z

Hmm the build we changed gives a different failure

FAILED pandas/tests/io/test_parquet.py::TestParquetPyArrow::test_filter_row_groups

I think that test can be skipped for pyarrow 0.17, it's complaining about file-like object not yet being implemented.

jorisvandenbossche · 2020-12-31T11:55:58Z

I think the other failures on Azure are unrelated (it's not impacted by this PR), probably a flaky test that can be addressed elsewhere. Let's restart the build

jreback · 2020-12-31T19:49:56Z

@jorisvandenbossche Thanks for the review! I added the pin from >= to ==

@simonjayhawkins I'd prefer to punt on a refactor in favor of passing off to someone who knows this code better if that's ok.

@jreback I'm hoping to avoid a 1.0.0 minimum version if at all possible.
You can reproduce with pyarrow 0.17.1 (that's what I'm on, but it breaks for other versions too) with the following example. It seems to be related to merging two tables, one on a column of type string and the other of type object.
import pandas as pd

example_dict = {'i': {0: 'foo'}, 'j': {0: 1}}

foo = pd.DataFrame(example_dict)
foo['i'] = foo['i'].astype(pd.StringDtype.name)
foo['j'] = foo['j'].astype(pd.Int64Dtype.name)

bar = pd.DataFrame(example_dict)
bar['j'] = bar['j'].astype(pd.Int64Dtype.name)

foo.merge(bar)

ok pls add this as a test.

I don't know why anyone is trying to use StringArray before 1.0 but we need a test for this in any event.

put in tests/reshape/merge somewhere

ADraginda · 2020-12-31T22:13:08Z

@jreback good point. The root of it all is actually just this call:

from pandas.core.arrays.string_arrow import ArrowStringDtype

So a test for the merge fn feels a bit incorrect for a unit test to me. Moreover, this import is already in the tests so is covered I think?

Tests: I fixed a failing test that had the wrong minimum version of pyarrow for what the test wanted to accomplish (test_filter_row_groups in test_parquet)

ADraginda · 2020-12-31T23:10:59Z

pandas/tests/io/test_parquet.py

@@ -896,7 +896,7 @@ def test_timezone_aware_index(self, pa, timezone_aware_date_list):
        # this use-case sets the resolution to 1 minute
        check_round_trip(df, pa, check_dtype=False)

-    @td.skip_if_no("pyarrow", min_version="0.17")
+    @td.skip_if_no("pyarrow", min_version="1.0.0")


this mode of read_parquet was not available until 1.0.0
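
The idea behind raising `min_version` on the skip decorator can be sketched with the standard library's `unittest`. This is an analogy, not the pandas `td.skip_if_no` implementation: `skip_if_older` is a hypothetical helper, and the installed version is hard-coded as a tuple for the demonstration.

```python
import unittest


def skip_if_older(installed, minimum):
    """Return a decorator that skips a test when `installed` < `minimum`."""
    reason = "requires version >= " + ".".join(map(str, minimum))
    return unittest.skipIf(installed < minimum, reason)


class ExampleTests(unittest.TestCase):
    # Pretend 0.17.1 is installed: the feature under test needs 1.0.0.
    @skip_if_older((0, 17, 1), (1, 0, 0))
    def test_needs_newer_version(self):
        self.fail("should have been skipped")

    # Pretend 1.0.1 is installed: the test runs normally.
    @skip_if_older((1, 0, 1), (1, 0, 0))
    def test_runs(self):
        self.assertTrue(True)


def run_example():
    """Run both tests and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

If the minimum is set too low, as it was here with 0.17, the test runs against a version that lacks the feature and fails instead of being skipped.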

pandas/core/arrays/string_arrow.py

jreback · 2021-01-01T20:56:09Z

pandas/core/arrays/string_arrow.py

-            "le": pc.less_equal,
-            "ge": pc.greater_equal,
-        }
+        # pyarrow 0.16.0 adds a compute module (thus the above compute import


still -1 on this soln. i would just check that we are >= 1.0 and call it a day.

jorisvandenbossche · 2021-01-01T21:01:23Z

So a test for the merge fn feels a bit incorrect for a unit test to me. Moreover, this import is already in the tests so is covered I think?

Yes, this is already covered by the StringArray astype tests. We just didn't catch it because we didn't have any CI build with this specific version of pyarrow. So no need to add an additional test.

jreback

thanks @ADraginda can you add a whatsnew note for 1.2.1

can you put in bug fixes I/O section. ping on green.

jorisvandenbossche

Thanks for the updates!

simonjayhawkins · 2021-01-05T13:05:27Z

@meeseeksdev backport 1.2.x

simonjayhawkins · 2021-01-05T13:09:20Z

can you put in bug fixes I/O section. ping on green.

for recent patch releases we have tended not to split into sub-sections, just regressions, bug-fixes and maybe other.

jorisvandenbossche · 2021-01-05T13:23:44Z

(yep, since there were already others, I thought we have to clean it up anyway, so not worth doing another round of changes here)

jreback requested changes Dec 30, 2020

jreback added the Dependencies, ExtensionArray, and Strings labels Dec 30, 2020

jorisvandenbossche added this to the 1.2.1 milestone Dec 30, 2020

jorisvandenbossche approved these changes Dec 30, 2020

jreback reviewed Dec 30, 2020

ADraginda force-pushed the pyarrow_fix branch from 4ddfa93 to 71c4cf1 December 31, 2020 07:48

jorisvandenbossche reviewed Dec 31, 2020 (thread on ci/deps/actions-37-locale.yaml: outdated, resolved)

ADraginda force-pushed the pyarrow_fix branch from a376809 to 15f9833 December 31, 2020 22:12

ADraginda commented Dec 31, 2020

jreback reviewed Dec 31, 2020 (thread on pandas/core/arrays/string_arrow.py: outdated, resolved)

jreback requested changes Jan 1, 2021

ADraginda force-pushed the pyarrow_fix branch from 15f9833 to b9a0795 January 2, 2021 06:53

ADraginda requested a review from jreback January 4, 2021 20:15

jreback approved these changes Jan 5, 2021

jreback requested changes Jan 5, 2021

Commit dcc2d8a: BUG: GH38801 avoid attribute error with pyarrow >=0.16.0 and <1.0.0

update pyarrow Update ci/deps/actions-37-locale.yaml Co-authored-by: Joris Van den Bossche <[email protected]> fixup fixup fixup

ADraginda force-pushed the pyarrow_fix branch from b9a0795 to dcc2d8a January 5, 2021 06:39

jorisvandenbossche approved these changes Jan 5, 2021

jorisvandenbossche changed the title from "BUG: GH38801 avoid attribute error with pyarrow >=0.16.0 and <1.0.0" to "BUG: avoid attribute error with pyarrow >=0.16.0 and <1.0.0" Jan 5, 2021

jorisvandenbossche merged commit 6168f99 into pandas-dev:master Jan 5, 2021

meeseeksmachine mentioned this pull request Jan 5, 2021: Backport PR #38803 on branch 1.2.x (BUG: avoid attribute error with pyarrow >=0.16.0 and <1.0.0) #38971 (Merged)

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 5, 2021: 95b8fd8, Backport PR pandas-dev#38803: BUG: avoid attribute error with pyarrow >=0.16.0 and <1.0.0

jreback pushed a commit that referenced this pull request Jan 5, 2021: 6cdb4e7, Backport PR #38803: BUG: avoid attribute error with pyarrow >=0.16.0 and <1.0.0 (#38971) Co-authored-by: Ada Draginda <[email protected]>

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021: 25c9565, BUG: GH38801 avoid attribute error with pyarrow >=0.16.0 and <1.0.0 (pandas-dev#38803) Co-authored-by: Joris Van den Bossche <[email protected]>
