Skip to content

BUG: to_json failing on PyPy #40525

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 23, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.4.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ Fixed regressions
~~~~~~~~~~~~~~~~~

- Fixed regression in :meth:`DataFrame.sum` when ``min_count`` greater than the :class:`DataFrame` shape was passed resulted in a ``ValueError`` (:issue:`39738`)
- Fixed regression in :meth:`DataFrame.to_json` raising ``AttributeError`` when run on PyPy (:issue:`39837`)
-

.. ---------------------------------------------------------------------------
Expand Down
10 changes: 5 additions & 5 deletions pandas/_libs/src/ujson/python/objToJSON.c
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,7 @@ typedef struct __PyObjectEncoder {
enum PANDAS_FORMAT { SPLIT, RECORDS, INDEX, COLUMNS, VALUES };

int PdBlock_iterNext(JSOBJ, JSONTypeContext *);
static Py_ssize_t get_attr_length(PyObject *obj, char *attr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than forward declaring this can you just swap the position the of get_attr_length and is_simple_frame functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will make that change


void *initObjToJSON(void) {
PyObject *mod_pandas;
Expand Down Expand Up @@ -273,14 +274,13 @@ static PyObject *get_sub_attr(PyObject *obj, char *attr, char *subAttr) {
}

static int is_simple_frame(PyObject *obj) {
PyObject *check = get_sub_attr(obj, "_mgr", "is_mixed_type");
int ret = (check == Py_False);

if (!check) {
PyObject *mgr = PyObject_GetAttrString(obj, "_mgr");
if (!mgr) {
return 0;
}
int ret = (get_attr_length(mgr, "blocks") <= 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not really sure about this change. The problem with how this code is currently set up is that when is_simple_frame is True then we will just serialize whatever .values is for that DataFrame. However, the rules for extension arrays are a bit different.

Do we have test coverage for DataFrames that only contain 1 block of an extension array? If not I'd be worried this could change things that we aren't testing currently

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My motivation here was to replace the is_mixed_type property the same way it was handled in #40525 (so theoretically this should revert behavior to pre #40525).

Will look into your question and see if that case is covered

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't see any coverage, have added some simple tests for the one ExtensionBlock case (which match 1.1.5 behavior). Let me know if this is what you had in mind!


Py_DECREF(check);
Py_DECREF(mgr);
return ret;
}

Expand Down