ExtensionArray.map #23179

Closed
TomAugspurger opened this issue Oct 16, 2018 · 7 comments
Labels
Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@TomAugspurger
Contributor

Both Categorical and SparseArray found implementing a .map method useful. This allows them to efficiently apply a function / mapping to the categories / sp_values, rather than every element of an array. We dispatch to it internally in https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L3379-L3380

So, we need to either

  1. Add it to the interface
  2. Hard-code checks for categorical or sparse dtype there.

Do people have a preference? Right now I'm leaning toward 2. Or are there other array types that would have a similar efficiency gain to Categorical or Sparse?
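To make the efficiency argument concrete, here is a minimal pure-Python sketch of mapping over categories instead of elements. `ToyCategorical` and `shout` are hypothetical names invented for illustration, not pandas classes, and the sketch assumes the mapped function is one-to-one so the mapped categories stay unique.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List


@dataclass
class ToyCategorical:
    """Hypothetical stand-in for a categorical array: categories plus integer codes."""

    categories: List[Hashable]
    codes: List[int]  # codes[i] indexes into categories; -1 marks a missing value

    def map(self, func: Callable) -> "ToyCategorical":
        # Call func once per category rather than once per element.
        new_categories = [func(c) for c in self.categories]
        return ToyCategorical(new_categories, list(self.codes))

    def to_list(self) -> list:
        # Expand codes back into values; missing values become None.
        return [None if code == -1 else self.categories[code] for code in self.codes]


calls = []


def shout(value):
    calls.append(value)  # record each invocation so the call count is visible
    return value.upper()


arr = ToyCategorical(categories=["a", "b"], codes=[0, 1, 0, 0, -1, 1])
mapped = arr.map(shout)
```

The array has six elements, but `shout` runs only twice, once per category. The same idea would apply to a sparse array's stored values.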

@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Oct 16, 2018
@jreback
Contributor

jreback commented Oct 16, 2018

-1 on hard coding things

expanding the interface is the way forward here

@TomAugspurger
Contributor Author

TomAugspurger commented Oct 16, 2018 via email

@jbrockmendel
Member

Trying to de-duplicate is_extension_array_dtype and is_extension_type, I'm finding that the lack of EA.map is a blocker for using is_extension_array_dtype in all cases.

@TomAugspurger
Contributor Author

TomAugspurger commented Nov 7, 2019 via email

@rhshadrach
Member

rhshadrach commented May 4, 2021

Ran into this in #39941, where map is used for categorical and sparse in apply. There, it results in different dtype behavior than for other EAs. It seems to me that map only makes sense when the result of any UDF can remain in the same dtype (which I think is true for categorical and sparse). How would one implement map for e.g. Int64 where the mapper is lambda x: 3.2 or lambda x: "a"?

Edit: I just found that datetime64 also implements map, and it does not have the property I mentioned.

@jbrockmendel
Member

> any UDF can remain in the same dtype (which I think is true for categorical and sparse)

Not exactly. For sparse you can make the return dtype always be sparse, but we can come up with UDFs whose result must have a different sparse dtype. For Categorical you could pass your result to type(self)._from_sequence(result, dtype=self.dtype) and that will usually work, but that's because it will just set any non-fitting element to NaN.
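A rough pure-Python sketch of both caveats, under stated assumptions: `coerce_to_categories` and `sparse_map` are hypothetical helpers written for illustration, not pandas functions. They only imitate the behavior described above, namely coercion back to fixed categories and mapping the stored values and fill value of a sparse array.

```python
import math


def coerce_to_categories(values, categories):
    """Keep values that are known categories; replace everything else with NaN."""
    allowed = set(categories)
    return [v if v in allowed else math.nan for v in values]


def sparse_map(sp_values, fill_value, func):
    """Map only the stored values and the fill value of a toy sparse array."""
    return [func(v) for v in sp_values], func(fill_value)


# Categorical case: the UDF produces values outside the original categories,
# so forcing the result back into the original dtype silently loses them.
categories = ["a", "b"]
data = ["a", "b", "a"]
mapped = [s + "!" for s in data]
coerced = coerce_to_categories(mapped, categories)

# Sparse case: the result can stay sparse, but the value type changes from
# int to str, so the sparse subtype of the result must differ from the input's.
new_sp_values, new_fill = sparse_map([1, 2], 0, lambda x: str(x))
```

Here every element of `coerced` ends up as NaN because none of the mapped strings is an existing category, which is why "usually works" is weaker than it sounds.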

@topper-123
Contributor

Closed. ExtensionArray.map was added in #51809.
