BUG: Fixed DataFrame.__repr__ Type Error when column dtype is np.record (GH 48526) #48637

Merged
merged 17 commits into from
Apr 21, 2023
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.6.0.rst
@@ -150,7 +150,7 @@ Conversion
^^^^^^^^^^
- Bug in constructing :class:`Series` with ``int64`` dtype from a string list raising instead of casting (:issue:`44923`)
- Bug in :meth:`DataFrame.eval` incorrectly raising an ``AttributeError`` when there are negative values in function call (:issue:`46471`)
--
+- Bug in :meth:`DataFrame.__repr__` incorrectly raising a ``TypeError`` when the dtype of a column is ``np.record`` (:issue:`48526`)

Strings
^^^^^^^
2 changes: 1 addition & 1 deletion pandas/core/dtypes/missing.py
@@ -292,7 +292,7 @@ def _isna_array(values: ArrayLike, inf_as_na: bool = False):
# "Union[ndarray[Any, Any], ExtensionArraySupportsAnyAll]", variable has
# type "ndarray[Any, dtype[bool_]]")
result = values.isna() # type: ignore[assignment]
-    elif is_string_or_object_np_dtype(values.dtype):
+    elif is_string_or_object_np_dtype(values.dtype) or isinstance(values, np.recarray):
Member

Hmm, it appears this function doesn't deal with output formatting. Why is the fix specifically needed here?

Contributor Author

I put it there because when the values are an np.recarray, the line result = np.isnan(values) in core/dtypes/missing.py is executed. np.isnan cannot be evaluated on an np.recarray, so it raises the TypeError described in the issue.
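
The failure described above can be reproduced with NumPy alone. This is a minimal sketch, not code from the PR: the field names and values are illustrative, and the record array is built directly with np.rec.fromarrays rather than through DataFrame.to_records().

```python
import numpy as np

# Illustrative record array with two string fields (names are assumptions,
# chosen to mirror the columns used in the PR's test).
rec = np.rec.fromarrays([["a", "c"], ["b", "d"]], names=["left", "right"])

try:
    # np.isnan is a ufunc with no loop for structured (record) dtypes.
    np.isnan(rec)
    raised = False
except TypeError as err:
    raised = True
    print(f"np.isnan failed on a record array: {err}")
```

Any code path that falls through to np.isnan for such values will therefore fail, regardless of where it is called from.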

Member

This still looks like a band-aid fix. pandas doesn't have great support for np.recarrays, and it doesn't look like _isna_string_dtype was specifically meant to deal with them.

Contributor Author

Okay, then I'll have a look at what can be done.

Contributor Author

@mroeschke
Since _isna_string_dtype is not meant to deal with np.recarray, I have introduced a separate function that handles it. Can you have a look and tell me if this is okay? Thanks.
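
For readers following the thread, one way such a dedicated helper could look is sketched below. This is a hypothetical illustration, not the function added in the PR: the name isna_records_sketch, the rule "a record counts as missing if any field is None or NaN", and the restriction to one-dimensional input are all assumptions.

```python
import math

import numpy as np


def isna_records_sketch(values: np.recarray) -> np.ndarray:
    """Hypothetical helper: True where a record has a None or NaN field.

    Assumes a one-dimensional record array.
    """
    result = np.zeros(values.shape, dtype=bool)
    for i, record in enumerate(values):
        # np.record.tolist() returns the fields as a tuple of Python objects.
        fields = record.tolist()
        result[i] = any(
            f is None or (isinstance(f, float) and math.isnan(f)) for f in fields
        )
    return result


# Illustrative data: the second record has a NaN in its numeric field.
rec = np.rec.fromarrays([[1.0, float("nan")], ["x", "y"]], names=["num", "txt"])
mask = isna_records_sketch(rec)
print(mask)
```

Checking each record field by field avoids calling np.isnan on the structured dtype as a whole, which is the call that raised in the first place.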

result = _isna_string_dtype(values, inf_as_na=inf_as_na)
elif needs_i8_conversion(dtype):
# this is the NaT pattern
8 changes: 8 additions & 0 deletions pandas/tests/frame/methods/test_to_records.py
@@ -508,3 +508,11 @@ def test_to_records_datetimeindex_with_tz(self, tz):

# both converted to UTC, so they are equal
tm.assert_numpy_array_equal(result, expected)

+    def test_to_records_no_typeerror_in_repr(self):
+        # GH 48526
+        df = DataFrame([["a", "b"], ["c", "d"], ["e", "f"]], columns=["left", "right"])
+        df["record"] = df[["left", "right"]].to_records()
+
+        with tm.assert_produces_warning(False):
+            print(df)