Skip to content

REF: share astype code in MaskedArray #38490

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 21, 2020

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

cc @jorisvandenbossche i dont know if this is the optimal way to do this, but it seems like there is a lot of share-ability here.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yes, this is certainly one of the methods on the list of "to share" for masked arrays

@@ -229,6 +231,30 @@ def to_numpy(
data = self._data.astype(dtype, copy=copy)
return data

def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
dtype = pandas_dtype(dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we know this method is only called from the subclasses, this line is not needed (each of the subclass methods already does it as well. The type annotation also indicates we already have a dtype object)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type annotation also indicates we already have a dtype object)

heads up "Dtype" includes string; "DtypeObj" means a dtype object

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. Can you then update the annotation to DtypeObj? Or actually, to ExtensionDtype? Or will mypy complain about that because the methods in the subclasses have a less strict annotation?

BTW, I would still remove this line

def astype(self, dtype: Dtype, copy: bool = True) -> ArrayLike:
dtype = pandas_dtype(dtype)

if is_dtype_equal(dtype, self.dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here could in theory actually use if self.dtype == dtype since we know we have an extension dtype, and those should never raise when being compared (avoiding yet another set of checks, not sure if it would ever be significant, though).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im fine with it either way. this branch was broken off from one that was trying to push some astype boilerplate into a decorator so used the is_dtype_equal version

data = self._data.astype(dtype.numpy_dtype, copy=copy)
# mask is copied depending on whether the data was copied, and
# not directly depending on the `copy` keyword
mask = self._mask if data is self._data else self._mask.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this code myself in the past, but I am not actually sure now this is needed. AFAIK the self._data.astype (numpy's method) will always return a copy unless dtype.numpy_dtype is the exact same type as the data. But that case should already be covered by the if is_dtype_equal(dtype, self.dtype) above.

(now since this is copying existing code, fine to leave this for another issue)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, and i think you're right about the numpy behavior

cls = dtype.construct_array_type()
return cls._from_sequence(self, dtype=dtype, copy=copy)

return super().astype(dtype, copy=copy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is in theory not reachable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not with the existing subclasses. this still seemed like the idiomatic thing to do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally remove it then, if it's not reachable at the moment.
Then you can also remove the if isinstance(dtype, ExtensionDtype): from the previous block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not wild about this method relying on the current behavior of the existing subclasses. better to conform to the standard patterns, even if it means a couple of extra checks that turn out to be unnecessary

Copy link
Member

@jorisvandenbossche jorisvandenbossche Dec 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also call it _astype instead of calling super(), to make it clearer it is a shared helper method, and not an actual (full) base implementation
(and then the method could also be properly typed)

@jorisvandenbossche jorisvandenbossche added NA - MaskedArrays Related to pd.NA and nullable extension arrays Refactor Internal refactoring of code labels Dec 15, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgrm merge if ok @jorisvandenbossche

@jreback jreback added this to the 1.3 milestone Dec 16, 2020
@jreback
Copy link
Contributor

jreback commented Dec 16, 2020

Thanks! Yes, this is certainly one of the methods on the list of "to share" for masked arrays

also pls link to the issue & check the box

@jbrockmendel
Copy link
Member Author

updated per request + green

@jreback jreback merged commit 8a2942c into pandas-dev:master Dec 21, 2020
@jbrockmendel jbrockmendel deleted the ref-masked-astype branch December 22, 2020 00:11
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NA - MaskedArrays Related to pd.NA and nullable extension arrays Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants