REF: share astype code in MaskedArray #38490

jbrockmendel · 2020-12-15T04:36:05Z

cc @jorisvandenbossche I don't know if this is the optimal way to do this, but it seems like there is a lot of shareable code here.

jorisvandenbossche

Thanks! Yes, this is certainly one of the methods on the list of "to share" for masked arrays

pandas/core/arrays/boolean.py

jorisvandenbossche · 2020-12-15T08:26:56Z

pandas/core/arrays/masked.py

@@ -229,6 +231,30 @@ def to_numpy(
            data = self._data.astype(dtype, copy=copy)
        return data

+    def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
+        dtype = pandas_dtype(dtype)


If we know this method is only called from the subclasses, this line is not needed (each of the subclass methods already does it as well). The type annotation also indicates we already have a dtype object.

> The type annotation also indicates we already have a dtype object

Heads up: "Dtype" includes strings; "DtypeObj" means an actual dtype object.

Ah, yes. Can you then update the annotation to DtypeObj? Or actually, to ExtensionDtype? Or will mypy complain about that because the methods in the subclasses have a less strict annotation?

BTW, I would still remove this line
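For readers following the annotation discussion, here is a minimal sketch of what the normalization step does, using only the public `pandas.api.types.pandas_dtype` helper. The variable names are illustrative and not taken from the PR.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import pandas_dtype

# A string alias (allowed by the looser "Dtype" annotation) is resolved
# to a dtype object.
from_string = pandas_dtype("Int64")

# An existing dtype object (what "DtypeObj" describes) is accepted as well.
from_object = pandas_dtype(pd.Int64Dtype())

# NumPy aliases resolve to numpy dtype objects.
numpy_result = pandas_dtype("float64")
```

If every caller has already normalized the dtype, repeating the call is redundant, which is the point made in the review comment above.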

jorisvandenbossche · 2020-12-15T08:28:58Z

pandas/core/arrays/masked.py

+    def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
+        dtype = pandas_dtype(dtype)
+
+        if is_dtype_equal(dtype, self.dtype):


Here we could in theory use "if self.dtype == dtype" directly, since we know we have an extension dtype, and those should never raise when being compared. That would avoid yet another set of checks, though I am not sure it would ever be significant.

I'm fine with it either way. This branch was broken off from one that was trying to push some astype boilerplate into a decorator, so it used the is_dtype_equal version.
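A small sketch comparing the two spellings discussed here, assuming a recent pandas where `pandas.api.types.is_dtype_equal` is available. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_dtype_equal

int64 = pd.Int64Dtype()

# Direct comparison on an extension dtype: returns a bool, does not raise.
direct_same = int64 == pd.Int64Dtype()
direct_other = int64 == pd.BooleanDtype()
direct_numpy = int64 == np.dtype("int64")

# The helper gives the same answers for these inputs, at the cost of
# some extra normalization work inside the function.
helper_same = is_dtype_equal(int64, pd.Int64Dtype())
helper_other = is_dtype_equal(int64, pd.BooleanDtype())
```

Both forms agree on these inputs; the direct comparison only skips the extra checks, which matches the "not sure if it would ever be significant" caveat.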

jorisvandenbossche · 2020-12-15T08:34:22Z

pandas/core/arrays/masked.py

+            data = self._data.astype(dtype.numpy_dtype, copy=copy)
+            # mask is copied depending on whether the data was copied, and
+            # not directly depending on the `copy` keyword
+            mask = self._mask if data is self._data else self._mask.copy()


I wrote this code myself in the past, but I am not actually sure now this is needed. AFAIK the self._data.astype (numpy's method) will always return a copy unless dtype.numpy_dtype is the exact same type as the data. But that case should already be covered by the if is_dtype_equal(dtype, self.dtype) above.

(now since this is copying existing code, fine to leave this for another issue)

Makes sense, and I think you're right about the numpy behavior.
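The NumPy behavior under discussion can be checked with a short sketch. The arrays below are stand-ins for `self._data` and `self._mask`, not the actual pandas objects.

```python
import numpy as np

data = np.array([1, 2, 3], dtype="int64")
mask = np.array([False, True, False])

# Same dtype with copy=False: NumPy can hand back the input array itself.
same = data.astype("int64", copy=False)

# Different dtype: a new buffer is allocated even though copy=False.
converted = data.astype("float64", copy=False)

# The pattern from the diff: copy the mask only if the data was copied.
mask_for_same = mask if same is data else mask.copy()
mask_for_converted = mask if converted is data else mask.copy()
```

Under this reading, the "data is self._data" branch can only trigger when the target numpy dtype equals the current one, which is the case the earlier equal-dtype check is meant to catch.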

jorisvandenbossche · 2020-12-15T08:37:03Z

pandas/core/arrays/masked.py

+            cls = dtype.construct_array_type()
+            return cls._from_sequence(self, dtype=dtype, copy=copy)
+
+        return super().astype(dtype, copy=copy)


this is in theory not reachable?

not with the existing subclasses. this still seemed like the idiomatic thing to do

I would personally remove it then, if it's not reachable at the moment.
Then you can also remove the if isinstance(dtype, ExtensionDtype): from the previous block.

I'm not wild about this method relying on the current behavior of the existing subclasses. Better to conform to the standard patterns, even if it means a couple of extra checks that turn out to be unnecessary.

Could also call it _astype instead of calling super(), to make it clearer it is a shared helper method, and not an actual (full) base implementation
(and then the method could also be properly typed)
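To illustrate the trade-off in this thread, here is a toy sketch of a shared base method that handles the common case and raises `NotImplementedError` for anything it does not cover, as the later "super -> NotImplementedError" commit title suggests. The classes `SharedCastBase`, `IntBox`, and `FloatBox` are hypothetical and are not pandas code.

```python
class SharedCastBase:
    """Toy stand-in for a shared base class (hypothetical, not pandas)."""

    kind = "base"

    def __init__(self, values):
        self.values = list(values)

    def astype(self, kind, copy=True):
        # Shared fast path: the target matches the current kind.
        if kind == self.kind:
            return type(self)(self.values) if copy else self
        # Anything else is the subclass's job. Fail loudly instead of
        # delegating to a generic fallback that is not expected to run.
        raise NotImplementedError("subclass must implement this conversion")


class FloatBox(SharedCastBase):
    kind = "float"


class IntBox(SharedCastBase):
    kind = "int"

    def astype(self, kind, copy=True):
        # Subclass-specific conversion first, shared logic otherwise.
        if kind == "float":
            return FloatBox(float(v) for v in self.values)
        return super().astype(kind, copy=copy)


box = IntBox([1, 2])
as_float = box.astype("float")
no_copy = box.astype("int", copy=False)
with_copy = box.astype("int")
```

Raising in the base keeps the contract explicit: subclasses own the conversions the shared code does not handle, and an unhandled target surfaces as an error rather than passing silently through a fallback.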

jreback

lgtm, merge if ok @jorisvandenbossche

jreback · 2020-12-16T00:49:33Z

> Thanks! Yes, this is certainly one of the methods on the list of "to share" for masked arrays

also pls link to the issue & check the box

jbrockmendel · 2020-12-19T19:41:18Z

updated per request + green

REF: share astype code in MaskedArray

1efca14

jorisvandenbossche reviewed Dec 15, 2020

jorisvandenbossche added the "NA - MaskedArrays" (Related to pd.NA and nullable extension arrays) and "Refactor" (Internal refactoring of code) labels Dec 15, 2020

mypy fixup

8ad627b

jreback approved these changes Dec 16, 2020

jreback added this to the 1.3 milestone Dec 16, 2020

super -> NotImplementedError

856ec2c

jreback merged commit 8a2942c into pandas-dev:master Dec 21, 2020

jbrockmendel deleted the ref-masked-astype branch December 22, 2020 00:11

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021

REF: share astype code in MaskedArray (pandas-dev#38490)

b1220ac
