Skip to content

REF: move values_for_json to EA #53680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 25, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -1443,6 +1443,10 @@ def _reduce(self, name: str, *, skipna: bool = True, **kwargs):
# Non-Optimized Default Methods; in the case of the private methods here,
# these are not guaranteed to be stable across pandas versions.

def _values_for_json(self) -> np.ndarray:
# TODO: document!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah would be good to have this before merging

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd thoughts on what would be useful here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guessing this should just be an object dtype array where any value is JSON serializable (ex: string, number, list, dict)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the status quo returns non-object ndarrays in many cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yea but that is also because the JSON serializer has extra logic to handle primitives for those types (eg: int64 backing for datetime). Is the idea here that general EAs need to define this or is this scoped to the internally managed EAs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real motivation is to try to move EA-specific logic from Blocks to EAs. I'm kind of hoping that we can stubmle onto a solution to #31917 along the way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. For that issue @lithomas1 might be interested in implementing the serializer.

As far as this PR is concerned I am ok with it either way. Generally not sure it should end up in the EA in the long run, but doesn't hurt to move around today either

return np.asarray(self)

def _hash_pandas_object(
self, *, encoding: str, hash_key: str, categorize: bool
) -> npt.NDArray[np.uint64]:
Expand Down
6 changes: 6 additions & 0 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -2186,6 +2186,12 @@ def _with_freq(self, freq) -> Self:
# --------------------------------------------------------------
# ExtensionArray Interface

def _values_for_json(self) -> np.ndarray:
# Small performance bump vs the base class which calls np.asarray(self)
if isinstance(self.dtype, np.dtype):
return self._ndarray
return super()._values_for_json()

def factorize(
self,
use_na_sentinel: bool = True,
Expand Down
16 changes: 0 additions & 16 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -1602,9 +1602,6 @@ def get_values(self, dtype: DtypeObj | None = None) -> np.ndarray:
"""
raise AbstractMethodError(self)

def values_for_json(self) -> np.ndarray:
raise AbstractMethodError(self)


class EABackedBlock(Block):
"""
Expand Down Expand Up @@ -1835,9 +1832,6 @@ def get_values(self, dtype: DtypeObj | None = None) -> np.ndarray:
# TODO(EA2D): reshape not needed with 2D EAs
return np.asarray(values).reshape(self.shape)

def values_for_json(self) -> np.ndarray:
return np.asarray(self.values)

def interpolate(
self,
*,
Expand Down Expand Up @@ -2145,9 +2139,6 @@ def get_values(self, dtype: DtypeObj | None = None) -> np.ndarray:
return self.values.astype(_dtype_obj)
return self.values

def values_for_json(self) -> np.ndarray:
return self.values

@cache_readonly
def is_numeric(self) -> bool: # type: ignore[override]
dtype = self.values.dtype
Expand Down Expand Up @@ -2245,9 +2236,6 @@ class DatetimeLikeBlock(NDArrayBackedExtensionBlock):
is_numeric = False
values: DatetimeArray | TimedeltaArray

def values_for_json(self) -> np.ndarray:
return self.values._ndarray

def interpolate(
self,
*,
Expand Down Expand Up @@ -2304,10 +2292,6 @@ class DatetimeTZBlock(DatetimeLikeBlock):
_validate_ndim = True
_can_consolidate = False

# Don't use values_for_json from DatetimeLikeBlock since it is
# an invalid optimization here(drop the tz)
values_for_json = NDArrayBackedExtensionBlock.values_for_json


# -----------------------------------------------------------------
# Constructor Helpers
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/internals/managers.py
Original file line number Diff line number Diff line change
Expand Up @@ -1110,7 +1110,7 @@ def column_arrays(self) -> list[np.ndarray]:

for blk in self.blocks:
mgr_locs = blk._mgr_locs
values = blk.values_for_json()
values = blk.array_values._values_for_json()
if values.ndim == 1:
# TODO(EA2D): special casing not needed with 2D EAs
result[mgr_locs[0]] = values
Expand Down