Skip to content

BUG: interchange categorical_column_to_series() should not accept only PandasColumn #52763

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 19, 2023

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Apr 18, 2023

cc @AlenkaF


I've added this to the 2.0.1 whatsnew, and it's a trivial 1-line change (which was only there to begin with to help type checkers, it doesn't have any runtime purpose) and it unlocks a fair bit

@MarcoGorelli MarcoGorelli added the Interchange Dataframe Interchange Protocol label Apr 18, 2023
@@ -74,6 +74,22 @@ def test_categorical_dtype(data):
tm.assert_frame_equal(df, from_dataframe(df.__dataframe__()))


@td.skip_if_no("pyarrow")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use pytest.importorskip instead?

@mroeschke mroeschke added this to the 2.0.1 milestone Apr 18, 2023
@AlenkaF
Copy link

AlenkaF commented Apr 19, 2023

Thank you @MarcoGorelli! LGTM +1

@mroeschke
Copy link
Member

Just a merge conflict

@mroeschke mroeschke merged commit 2750ee2 into pandas-dev:main Apr 19, 2023
@mroeschke
Copy link
Member

Thanks @MarcoGorelli

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Apr 19, 2023
…series() should not accept only PandasColumn
mroeschke pushed a commit that referenced this pull request Apr 20, 2023
…mn_to_series() should not accept only PandasColumn) (#52793)

Backport PR #52763: BUG: interchange categorical_column_to_series() should not accept only PandasColumn

Co-authored-by: Marco Edward Gorelli <[email protected]>
# for mypy/pyright
assert isinstance(cat_column, PandasColumn), "categories must be a PandasColumn"
categories = np.array(cat_column._col)
if hasattr(cat_column, "_col"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for using this attribute? (that also seems like relying on and exposing some internal detail)
Haven't looked in detail, but my understanding is that this object is just a protocol Column object, and we already have all the code to convert that to a pandas object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, this was already there - looks like it was introduced in https://github.com/pandas-dev/pandas/pull/47886/files

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know it was already there, but IMO the "proper" fix for the reported issue is to stop relying on this detail.
Because the the test you added only "happens" to work because pyarrow's Column object also uses _col to store the original array; if we would rename that internal property (and I would say it's actually a confusing name in hindsight, since it's an array ;)), the test would start failing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, agreed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Interchange Dataframe Interchange Protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: interchange categorical_column_to_series() should not accept only PandasColumn
5 participants