POC: pass date_unit to values_for_json #54198

Closed
20 changes: 19 additions & 1 deletion pandas/_libs/src/vendored/ujson/python/objToJSON.c
@@ -720,7 +720,25 @@ void PdBlock_iterBegin(JSOBJ _obj, JSONTypeContext *tc) {
return;
}

arrays = get_sub_attr(obj, "_mgr", "column_arrays");
NPY_DATETIMEUNIT dunit = ((PyObjectEncoder *)tc)->datetimeUnit;
char *date_unit;
if (dunit == NPY_FR_s) {
date_unit = "s";
Member

What you want to do here is create a Python object from the literal string, so PyUnicode_FromString("s").

Member

Ignore this comment if you change L730 to PyObject_CallMethod; in that case you can just declare date_unit as a char * and call it a day.

} else if (dunit == NPY_FR_ms) {
date_unit = "ms";
} else if (dunit == NPY_FR_us) {
date_unit = "us";
} else if (dunit == NPY_FR_ns) {
date_unit = "ns";
}

PyObject *mgr = PyObject_GetAttrString(obj, "_mgr");
Member

Some code paths might reach this point with an object that has no _mgr attribute. In that case PyObject_GetAttrString returns NULL, which could explain the segfaults.

So you will want to check whether the result is NULL and return with an appropriate error message.

Member Author

Will update, but we only get here with a DataFrame.

if (mgr == NULL) {
return;
}
arrays = PyObject_CallMethod(mgr, "column_arrays", "%s", date_unit);
Py_DECREF(mgr);

if (!arrays) {
Member

Once you are done with mgr, call Py_DECREF(mgr) to avoid a memory leak.

GET_TC(tc)->iterNext = NpyArr_iterNextNone;
return;
2 changes: 1 addition & 1 deletion pandas/core/arrays/base.py
@@ -1736,7 +1736,7 @@ def _reduce(
# Non-Optimized Default Methods; in the case of the private methods here,
# these are not guaranteed to be stable across pandas versions.

def _values_for_json(self) -> np.ndarray:
def _values_for_json(self, date_unit: str) -> np.ndarray:
"""
Specify how to render our entries in to_json.

22 changes: 18 additions & 4 deletions pandas/core/arrays/datetimelike.py
@@ -2202,11 +2202,25 @@ def _with_freq(self, freq) -> Self:
# --------------------------------------------------------------
# ExtensionArray Interface

def _values_for_json(self) -> np.ndarray:
def _values_for_json(self, date_unit: str) -> np.ndarray:
# Small performance bump vs the base class which calls np.asarray(self)
if isinstance(self.dtype, np.dtype):
return self._ndarray
return super()._values_for_json()
if self.dtype.kind == "M":
if date_unit == "s":
return self.strftime("%Y-%M-%D %h-%m-%s")
elif date_unit == "ms":
raise NotImplementedError
elif date_unit == "us":
return self.strftime("%Y-%M-%D %h-%m-%s.%f")
elif date_unit == "ns":
return self.astype(str)
else:
raise NotImplementedError
elif self.dtype.kind == "m":
raise NotImplementedError
else:
return super()._values_for_json(date_unit)

return self._ndarray

def factorize(
self,
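Note on the strftime patterns in the hunk above: in standard strftime notation `%m` is the month, `%d` the day, `%H` the hour, `%M` the minute and `%S` the second, so the directive letters in this POC look swapped. The following is a minimal sketch of a unit-dependent renderer using only the standard library. The helper name `render_timestamp`, the separators, and the truncation used for `"ms"` are assumptions made for illustration; they are not part of this PR or of pandas, and the POC itself raises NotImplementedError for `"ms"`.

```python
from datetime import datetime

# Hypothetical lookup table, not pandas code: date_unit -> strftime pattern.
_FORMATS = {
    "s": "%Y-%m-%d %H:%M:%S",
    "us": "%Y-%m-%d %H:%M:%S.%f",
}


def render_timestamp(ts: datetime, date_unit: str) -> str:
    """Render ts at the resolution named by date_unit (illustrative only)."""
    if date_unit == "ms":
        # %f always produces six digits, so drop the last three for milliseconds.
        return ts.strftime(_FORMATS["us"])[:-3]
    if date_unit in _FORMATS:
        return ts.strftime(_FORMATS[date_unit])
    # datetime has no nanosecond field, so "ns" is out of scope for this sketch.
    raise NotImplementedError(f"unsupported date_unit: {date_unit!r}")


ts = datetime(2023, 7, 19, 14, 5, 9, 123456)
print(render_timestamp(ts, "s"))   # 2023-07-19 14:05:09
print(render_timestamp(ts, "ms"))  # 2023-07-19 14:05:09.123
print(render_timestamp(ts, "us"))  # 2023-07-19 14:05:09.123456
```

Keeping the patterns in one table makes an unsupported unit fail loudly in a single place instead of falling through a chain of branches.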
3 changes: 1 addition & 2 deletions pandas/core/internals/array_manager.py
@@ -691,8 +691,7 @@ def iget_values(self, i: int) -> ArrayLike:
"""
return self.arrays[i]

@property
def column_arrays(self) -> list[ArrayLike]:
def column_arrays(self, date_unit) -> list[np.ndarray]:
"""
Used in the JSON C code to access column arrays.
"""
5 changes: 2 additions & 3 deletions pandas/core/internals/managers.py
@@ -993,8 +993,7 @@ def iget_values(self, i: int) -> ArrayLike:
values = block.iget(self.blklocs[i])
return values

@property
def column_arrays(self) -> list[np.ndarray]:
def column_arrays(self, date_unit) -> list[np.ndarray]:
"""
Used in the JSON C code to access column arrays.
This optimizes compared to using `iget_values` by converting each
@@ -1008,7 +1007,7 @@ def column_arrays(self) -> list[np.ndarray]:

for blk in self.blocks:
mgr_locs = blk._mgr_locs
values = blk.array_values._values_for_json()
values = blk.array_values._values_for_json(date_unit)
if values.ndim == 1:
# TODO(EA2D): special casing not needed with 2D EAs
result[mgr_locs[0]] = values
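Note on the `column_arrays` change: dropping `@property` turns an attribute read into a method call, which is why the C side moves from an attribute lookup (`get_sub_attr`) to `PyObject_CallMethod`. The toy class below sketches the difference; `ToyManager` and `column_arrays_old` are hypothetical names used only for illustration, not pandas code.

```python
class ToyManager:
    """Hypothetical stand-in for a block manager; not pandas code."""

    def __init__(self, columns):
        self._columns = list(columns)

    @property
    def column_arrays_old(self):
        # Old shape: plain attribute access, so no argument can be passed.
        return list(self._columns)

    def column_arrays(self, date_unit):
        # New shape: a regular method, so the caller can thread date_unit through.
        return [(date_unit, col) for col in self._columns]


mgr = ToyManager(["a", "b"])
print(mgr.column_arrays_old)    # ['a', 'b']
print(mgr.column_arrays("ms"))  # [('ms', 'a'), ('ms', 'b')]
```

One consequence worth checking: any remaining caller that still reads `mgr.column_arrays` without calling it now receives a bound method rather than a list.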