Skip to content

REF: ArrowEA _data->_pa_array #50987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Mar 10, 2023
Merged

Conversation

jbrockmendel
Copy link
Member

Totally subjective: I prefer a distinctive/informative name. This makes it easy to e.g. grep for all the places where we access this attribute.

@mroeschke mroeschke added Refactor Internal refactoring of code Arrow pyarrow functionality labels Jan 26, 2023
@jbrockmendel
Copy link
Member Author

@mroeschke if you're not on board we can close this, otherwise ill rebase

@mroeschke
Copy link
Member

Curious, would this make it harder for ArrowExtensionArray to try to use masked array methods since it calls self._data a lot?

@jbrockmendel
Copy link
Member Author

jbrockmendel commented Feb 2, 2023

Curious, would this make it harder for ArrowExtensionArray to try to use masked array methods since it calls self._data a lot?

It would if we did that, but they don't share any code...

Also the ._data attributes are different types, which i would think is exactly the kind of thing its nice to have clear names for

@mroeschke
Copy link
Member

I was thinking of, for example, utilizing the base masked engine work in the short term to have better indexing support which I think utilizes ._data.

But overall it's not that big of a deal, I am fine changing it to _pa_array

@jbrockmendel
Copy link
Member Author

@mroeschke any objection to merging before more merge conflicts pile up?

@simonjayhawkins
Copy link
Member

But overall it's not that big of a deal, I am fine changing it to _pa_array

yep. Go ahead and change if you feel it's an improvement.

@jbrockmendel
Copy link
Member Author

no objections but also no enthusiasm. im on the fence about self-merging.

@jbrockmendel
Copy link
Member Author

Just fixed another merge conflict. im planning to merge on green unless someone objects.

@simonjayhawkins simonjayhawkins added this to the 2.1 milestone Mar 9, 2023
@simonjayhawkins simonjayhawkins merged commit 5d04432 into pandas-dev:main Mar 10, 2023
@simonjayhawkins
Copy link
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the ref-pa-name branch March 10, 2023 15:17
return state

def __setstate__(self, state) -> None:
state["_data"] = pa.chunked_array(state["_data"])
state["_pa_array"] = pa.chunked_array(state["_data"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel I guess not changing the _data key here and above as well was done because of backwards compatibility? This caused a bug, see dask/dask#10036

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah, looks like we'll need to check for both _data and _pa_array until older pickles are dropped

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have time to take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants