ExtensionArray.map #23179

TomAugspurger · 2018-10-16T11:21:04Z

Both Categorical and SparseArray found implementing a .map method useful. This allows them to efficiently apply a function / mapping to the categories / sp_values, rather than every element of an array. We dispatch to it internally in https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L3379-L3380

So, we need to either:

1. Add it to the interface, or
2. Hard-code checks for categorical or sparse dtype there.

Do people have a preference? Right now I'm leaning toward 2. Or are there other array types that would have a similar efficiency gain to Categorical or Sparse?
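
To make the efficiency argument concrete, here is a minimal pure-Python sketch of the idea. It is not pandas' actual implementation: `ToyCategorical` and its `map` method are hypothetical names introduced only for illustration. The point it shows is that the function runs once per category rather than once per element.

```python
from typing import Callable, Hashable, List


class ToyCategorical:
    """Hypothetical stand-in for a categorical array: integer codes into a list of categories."""

    def __init__(self, values: List[Hashable]) -> None:
        # Unique categories, in order of first appearance.
        self.categories = list(dict.fromkeys(values))
        lookup = {cat: i for i, cat in enumerate(self.categories)}
        # One integer code per element, pointing into self.categories.
        self.codes = [lookup[v] for v in values]

    def map(self, func: Callable) -> list:
        # Apply func once per category, then expand the results through the codes.
        mapped = [func(cat) for cat in self.categories]
        return [mapped[code] for code in self.codes]


calls = []


def shout(value: str) -> str:
    calls.append(value)  # record every invocation of the mapper
    return value.upper()


arr = ToyCategorical(["a", "b", "a", "a", "b", "a"])
result = arr.map(shout)
```

In this sketch `shout` is invoked twice (once per category) even though the array holds six elements; a sparse array could presumably apply the same trick to its stored values.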

jreback · 2018-10-16T12:13:32Z

-1 on hard coding things

expanding the interface is the way forward here

TomAugspurger · 2018-10-16T12:23:50Z

-1 on expanding things needlessly though. I’d rather wait for a compelling use case to come along.

jbrockmendel · 2019-11-07T00:31:49Z

Trying to de-duplicate is_extension_array_dtype and is_extension_type, I'm finding that the lack of EA.map is a blocker for using is_extension_array_dtype in all cases.

TomAugspurger · 2019-11-07T01:27:58Z

FWIW, I don't think deduplicating is_extension_array_dtype and is_extension_type is important enough to warrant adding a new method to the API.

rhshadrach · 2021-05-04T03:38:28Z

Ran into this in #39941, where map is used for categorical and sparse in apply. Here, it results in different dtype behavior than for other EAs. But it seems to me that map only makes sense when the result of any UDF can remain in the same dtype (which I think is true for categorical and sparse). How would one implement map for e.g. Int64 where the mapper is lambda x: 3.2 or lambda x: "a"?

Edit: I just found datetime64 also implements map which does not have the property I mentioned.

jbrockmendel · 2021-05-04T23:26:48Z

> any UDF can remain in the same dtype (which I think is true for categorical and sparse)

Not exactly. For sparse you can make the return dtype always be sparse, but we can come up with UDFs whose result must have a different sparse dtype. For Categorical you could pass your result to type(self)._from_sequence(result, dtype=self.dtype) and that will usually work, but that's because it will just set any non-fitting element to NaN.
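
A small pure-Python sketch of the Categorical caveat described above. This is an illustration under stated assumptions, not pandas code: `coerce_to_categories` is a hypothetical helper, and `None` stands in for NaN. It shows that forcing mapped results back into the original categories "works" only by blanking out anything that no longer fits.

```python
from typing import Hashable, List, Optional


def coerce_to_categories(
    values: List[Hashable], categories: List[Hashable]
) -> List[Optional[Hashable]]:
    """Hypothetical helper: keep values that are valid categories, replace the rest with None."""
    allowed = set(categories)
    return [v if v in allowed else None for v in values]


categories = ["a", "b"]
data = ["a", "b", "a"]

# A mapper whose outputs are still valid categories survives the round trip.
swap = {"a": "b", "b": "a"}
swapped = coerce_to_categories([swap[v] for v in data], categories)

# A mapper whose outputs fall outside the categories is silently turned into missing values.
shifted = coerce_to_categories([v.upper() for v in data], categories)
```

Here `swapped` keeps every element, while `shifted` ends up entirely missing, which is why "stay in the same dtype" is not a safe default for an arbitrary UDF.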

topper-123 · 2023-03-14T18:29:31Z

Closed. ExtensionArray.map was added in #51809.

TomAugspurger added the ExtensionArray label (Extending pandas with custom dtypes or arrays) Oct 16, 2018

mroeschke mentioned this issue Aug 23, 2019

Cannot map Timestamp.isocalendar on tz-aware timestamp series, gives ValueError: MultiIndex has no single backing array #28092

Closed

jbrockmendel mentioned this issue Nov 7, 2019

DEPR: is_extension_type #29457

Merged

jbrockmendel mentioned this issue Mar 10, 2020

EA: revisit interface #32586

Closed

mroeschke added the Enhancement label Jun 23, 2021

MichaelTiemannOSC mentioned this issue Jan 4, 2023

Support for PintArray-preserving .map() function? hgrecco/pint-pandas#161

Closed

jbrockmendel mentioned this issue Mar 7, 2023

REF: Add ExtensionArray.map #51809

Merged

topper-123 closed this as completed Mar 14, 2023