[QST] What should ExtensionDtype.type return? #35291

Closed
shwina opened this issue Jul 15, 2020 · 2 comments
Labels: ExtensionArray (extending pandas with custom dtypes or arrays), Usage Question

Comments

@shwina (Contributor) commented Jul 15, 2020

Greetings, Pandas devs! cuDF is building out additional dtypes such as cudf.CategoricalDtype and cudf.ListDtype based on pd.ExtensionDtype, and this is one question that came up.

The documentation states:

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

However, I note that pd.CategoricalDtype for instance does not adhere to this:

In [47]: import pandas as pd

In [48]: a = pd.Series(['a', 'b'], dtype='category')

In [49]: type(a[0])
Out[49]: str

In [50]: type(a.array[0])
Out[50]: str

In [51]: isinstance(a.array, pd.api.extensions.ExtensionArray)
Out[51]: True

In [52]: isinstance(a.dtype, pd.api.extensions.ExtensionDtype)
Out[52]: True
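The documented contract can be checked mechanically. This is a minimal sketch, assuming a recent pandas release; `first_scalar_matches_dtype_type` is a hypothetical helper written for this example, not a pandas function. It indexes into an array until it finds a non-NA element and tests that element against `dtype.type`. The nullable integer and string arrays satisfy the contract; the categorical array does not, because `CategoricalDtype.type` is a placeholder class rather than the type of the stored values.

```python
import pandas as pd


def first_scalar_matches_dtype_type(arr) -> bool:
    """Hypothetical helper: does arr[i] satisfy the ExtensionDtype.type contract?

    Only the first non-NA element is checked; NA values are exempt,
    as the pandas documentation allows.
    """
    for i in range(len(arr)):
        value = arr[i]  # scalar __getitem__, the case the contract describes
        if not pd.isna(value):
            return isinstance(value, arr.dtype.type)
    return True  # empty or all-NA: nothing to check


examples = {
    "Int64": pd.array([1, None], dtype="Int64"),
    "string": pd.array(["a", None], dtype="string"),
    "category": pd.array(["a", "b"], dtype="category"),
}
for name, arr in examples.items():
    print(name, arr.dtype.type, first_scalar_matches_dtype_type(arr))
```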

On the other hand, NumPy defines dtype.type somewhat differently:

The type object used to instantiate a scalar of this data-type.
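For comparison, NumPy's definition can be seen directly in a short sketch: `dtype.type` is a class that can be called to build a scalar, and indexing an array of that dtype returns an instance of exactly that class.

```python
import numpy as np

dt = np.dtype("int64")

# dtype.type is the scalar class itself, so calling it builds a scalar
x = dt.type(5)
print(type(x))  # <class 'numpy.int64'>

# Indexing an ndarray of that dtype returns an instance of the same class
arr = np.array([1, 2], dtype=dt)
print(type(arr[0]) is dt.type)  # True
```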

Would love any insights as to the appropriate return value of .type.

@shwina added the Needs Triage and Usage Question labels Jul 15, 2020
@shwina changed the title from "[QST] What shoud ExtensionDtype.type return?" to "[QST] What should ExtensionDtype.type return?" Jul 15, 2020
@TomAugspurger (Contributor) commented Jul 15, 2020

The CategoricalDtype.type issue is discussed a bit at #22938 (comment). Categorical is a bit hard since it can hold anything (including no categories, for backwards compatibility).
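A short sketch (assuming a recent pandas) of why no single scalar class fits categoricals: every `CategoricalDtype` shares one `.type`, whether its categories hold strings, integers, or have not been specified at all.

```python
import pandas as pd

str_cat = pd.Series(["a", "b"], dtype="category")
int_cat = pd.Series([1, 2], dtype="category")
unspecified = pd.CategoricalDtype()  # no categories at all

# The stored values have unrelated Python types...
print(type(str_cat.array[0]), type(int_cat.array[0]))

# ...yet all three dtypes report the same placeholder class
print(str_cat.dtype.type is int_cat.dtype.type is unspecified.type)  # True
print(unspecified.categories)  # None
```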

IIRC you're using pyarrow for ListDtype, so I would expect something like pyarrow's ListValue or some cudf wrapper around it.

In [10]: a = pa.array([[1, 2]])

In [11]: type(a[0])
Out[11]: pyarrow.lib.ListValue
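Following that advice, a custom dtype points `.type` at the class its arrays return from scalar indexing, in the NumPy sense. The sketch below is hypothetical: `ListElement` and `HypotheticalListDtype` are illustrative names, not cuDF or pyarrow API, and the array class is left out. A pyarrow-backed implementation would instead use pyarrow's list scalar class (`ListValue` in the session above; newer pyarrow releases name it `ListScalar`) or a wrapper around it.

```python
from pandas.api.extensions import ExtensionDtype


class ListElement:
    """Hypothetical scalar returned by arr[i] for a list-valued array."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return isinstance(other, ListElement) and self.values == other.values

    def __repr__(self):
        return f"ListElement({self.values!r})"


class HypotheticalListDtype(ExtensionDtype):
    """Sketch of a dtype whose .type follows NumPy's meaning of dtype.type."""

    name = "hypothetical_list"
    type = ListElement  # the class whose instances scalar indexing returns

    @classmethod
    def construct_array_type(cls):
        # A real implementation returns its ExtensionArray subclass here.
        raise NotImplementedError("array class omitted from this sketch")


dtype = HypotheticalListDtype()
element = dtype.type([1, 2])  # .type can build a scalar, as in NumPy
print(dtype, isinstance(element, dtype.type))
```

The point of the design is that `isinstance(arr[i], arr.dtype.type)` holds for every non-NA element, which is the contract quoted from the pandas documentation.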

@TomAugspurger added the ExtensionArray label and removed the Needs Triage label Jul 15, 2020
@shwina (Contributor, Author) commented Jul 16, 2020

Thanks, this answers my question. It appears that CategoricalDtype is the exception and not the rule.

@shwina closed this as completed Jul 16, 2020