-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fixed DataFrame.__repr__ Type Error when column dtype is np.record (GH 48526) #48637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/dtypes/missing.py
Outdated
@@ -292,7 +292,7 @@ def _isna_array(values: ArrayLike, inf_as_na: bool = False): | |||
# "Union[ndarray[Any, Any], ExtensionArraySupportsAnyAll]", variable has | |||
# type "ndarray[Any, dtype[bool_]]") | |||
result = values.isna() # type: ignore[assignment] | |||
elif is_string_or_object_np_dtype(values.dtype): | |||
elif is_string_or_object_np_dtype(values.dtype) or isinstance(values, np.recarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm appears this function doesn't deal with output formatting? Why is the fixed specifically needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put it there because when the dtype is a np.recarray, then result = np.isnan(values)
in core/dtypes/missing.py
will be executed and np.isnan cannot be evaluated on a np.recarray, it will throw the error which was also described in the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix still appears like a band-aid fix. pandas doesn't have great support for np.recarray
s and it doesn't look like _isna_string_dtype
was specifically meant to deal with them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, then I have a look what one can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke
Since _isna_string_dtype
is not meant to deal with np.recarray
, I have introduced a function that deals with that, can you have a look and tell me if this is okay? Thanks.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
still interested |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
@mroeschke |
Sure, reopened this pull request |
@mroeschke |
pandas/core/dtypes/missing.py
Outdated
def _isna_recarray_dtype(values: np.recarray, inf_as_na: bool) -> npt.NDArray[np.bool_]: | ||
result = np.zeros(values.shape, dtype=bool) | ||
for i, record in enumerate(values): | ||
result[i] = libmissing.isnaobj(np.array(record), inf_as_na=inf_as_na) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think isnaobj
is well defined for a record. Do all elements in the record need to be na-like? Does the record itself need to be na-like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, isnaobj doesn't work on records.
I added in my latest commit 2 more tests to see if this is what you would expect from it and also refined the _isna_recarray_dtype method and made it return False if any value in the record is nan.
Can you have a look if this is reasonable? If you wish that I add tests for this function, I can add them, but could you tell me where I should place these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke What is your opinion on this?
pandas/core/dtypes/missing.py
Outdated
result = np.zeros(values.shape, dtype=bool) | ||
for i, record in enumerate(values): | ||
does_record_contain_nan = np.zeros(len(record), dtype=bool) | ||
if inf_as_na: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of if inf_as_na
everywhere could you create a checker
function initially like
if inf_as_na:
is_missing = lambda x: np.isnan(x) or np.isinf(x)
else:
is_missing = lambda x: np.isnan(x)
And use that in the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke I rewrote the function to increase readbility. What do you think about the change?
result = repr(df) | ||
assert result == expected | ||
|
||
def test_to_records_with_na_record_value(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have tests for inf as well here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke I added 2 tests for inf, is the expected result what you would also expect?
pandas/tests/frame/test_repr_info.py
Outdated
|
||
def test_to_records_with_inf_as_na_record(self): | ||
# GH 48526 | ||
options.mode.use_inf_as_na = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the with option_context(...)
context manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/tests/frame/test_repr_info.py
Outdated
|
||
def test_to_records_with_inf_record(self): | ||
# GH 48526 | ||
options.mode.use_inf_as_na = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
doc/source/whatsnew/v2.0.0.rst
Outdated
@@ -1229,6 +1229,7 @@ Conversion | |||
- Bug in :meth:`Series.to_numpy` converting to NumPy array before applying ``na_value`` (:issue:`48951`) | |||
- Bug in :meth:`DataFrame.astype` not copying data when converting to pyarrow dtype (:issue:`50984`) | |||
- Bug in :func:`to_datetime` was not respecting ``exact`` argument when ``format`` was an ISO8601 format (:issue:`12649`) | |||
- Bug in :meth:`DataFrame.__repr__` incorrectly raising a ``TypeError`` when the dtype of a column is ``np.record`` (:issue:`48526`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to 2.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, done
pandas/core/dtypes/missing.py
Outdated
result = np.zeros(values.shape, dtype=bool) | ||
for i, record in enumerate(values): | ||
does_record_contain_nan = np.zeros(len(record), dtype=bool) | ||
for j, element in enumerate(record): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just call isnan
and/or isinf
on record
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke I used isnan_all for the record and I rewrote a little bit the isinf part but I cannot use isinf
directly on the record since it will throw a TypeError
when the record value is a string. But I hope the way I rewrote the method still improved on the previous version.
Thanks for sticking with this @RaphSku |
…rd (GH 48526) (pandas-dev#48637) * BUG: Fixed repr type error when column in np.record * Reallocated the test to proposed destination and test the repr of the frame * Assert has gone missing due to merge conflict * Added new function that handles np.recarrays in _isna_array * Addes 2 more tests + refined _isna_recarray_dtype + formatting fixed * Reworked isna_recarray_dtype method + Added test for inf value * Pre-commit fix * Fixing typing * Fixed formatting * Fixed pyright error --------- Co-authored-by: RaphSku <[email protected]>
doc/source/whatsnew/v1.6.0.rst
file if fixing a bug or adding a new feature.