BUG: Fixed DataFrame.__repr__ Type Error when column dtype is np.record (GH 48526) #48637

RaphSku · 2022-09-19T10:03:21Z

closes BUG: TypeError when printing dataframe with numpy.record dtype column #48526
Tests added and passed
All code checks passed
Added an entry in the latest doc/source/whatsnew/v1.6.0.rst file if fixing a bug or adding a new feature.

pandas/tests/frame/methods/test_to_records.py

mroeschke · 2022-09-19T19:11:30Z

pandas/core/dtypes/missing.py

@@ -292,7 +292,7 @@ def _isna_array(values: ArrayLike, inf_as_na: bool = False):
            # "Union[ndarray[Any, Any], ExtensionArraySupportsAnyAll]", variable has
            # type "ndarray[Any, dtype[bool_]]")
            result = values.isna()  # type: ignore[assignment]
-    elif is_string_or_object_np_dtype(values.dtype):
+    elif is_string_or_object_np_dtype(values.dtype) or isinstance(values, np.recarray):


Hmm, it appears this function doesn't deal with output formatting? Why is the fix specifically needed here?

I put it there because, when values is an np.recarray, result = np.isnan(values) in core/dtypes/missing.py is executed. np.isnan cannot be evaluated on an np.recarray, so it throws the TypeError that was described in the issue.
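The failure mode described in this reply can be reproduced with NumPy alone. The following is a minimal sketch, assuming a small structured array viewed as np.recarray; the field names x and y are made up for illustration, and this is not the pandas code path itself.

```python
import numpy as np

# Hypothetical two-field record array; the field names are illustrative only.
values = np.array(
    [(1.0, 2), (np.nan, 3)],
    dtype=[("x", "f8"), ("y", "i8")],
).view(np.recarray)

try:
    np.isnan(values)
    raised = False
except TypeError as exc:
    # The isnan ufunc has no loop for structured (record) dtypes.
    raised = True
    message = str(exc)

print(raised)
```

Because the ufunc rejects the whole array, any fix has to decide missingness record by record rather than delegating to np.isnan.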

This still appears to be a band-aid fix. pandas doesn't have great support for np.recarrays, and it doesn't look like _isna_string_dtype was specifically meant to deal with them.

Okay, then I'll have a look at what can be done.

@mroeschke
Since _isna_string_dtype is not meant to deal with np.recarray, I have introduced a function that handles it. Can you have a look and tell me if this is okay? Thanks.

github-actions · 2022-10-21T00:11:06Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

RaphSku · 2022-10-22T11:40:30Z

still interested

mroeschke · 2022-12-06T17:46:07Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

RaphSku · 2023-03-10T11:54:11Z

@mroeschke
Can we re-open the pull request? I merged the main branch, and your first and second review comments have already been addressed: I moved the test to pandas/tests/frame/test_repr_info.py and also check against repr(df). I replied to your last review comment but never got an answer on whether my explanation was sufficient.
Thank you in advance.

mroeschke · 2023-03-10T18:09:57Z

Sure, reopened this pull request

RaphSku · 2023-03-10T23:16:06Z

@mroeschke
It passes the CI tests, but there is some connection problem with codecov:
"['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 503 - upstream connect error or disconnect/reset before headers. reset reason: connection failure"

mroeschke · 2023-03-13T18:10:24Z

pandas/core/dtypes/missing.py

+def _isna_recarray_dtype(values: np.recarray, inf_as_na: bool) -> npt.NDArray[np.bool_]:
+    result = np.zeros(values.shape, dtype=bool)
+    for i, record in enumerate(values):
+        result[i] = libmissing.isnaobj(np.array(record), inf_as_na=inf_as_na)


I don't think isnaobj is well defined for a record. Do all elements in the record need to be na-like? Does the record itself need to be na-like?
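The two readings raised in this question give different answers for the same data. The sketch below is only an illustration under stated assumptions: element_is_nan is a hypothetical helper, the field names a and b are made up, and neither definition is claimed to be the one pandas adopted.

```python
import math

import numpy as np


def element_is_nan(x):
    """Hypothetical helper: True only for float NaN, False for everything else."""
    return isinstance(x, float) and math.isnan(x)


# Hypothetical records: no NaN, one NaN, all NaN.
records = np.array(
    [(1.0, 2.0), (np.nan, 2.0), (np.nan, np.nan)],
    dtype=[("a", "f8"), ("b", "f8")],
).view(np.recarray)

# Reading 1: a record is missing if ANY of its elements is NaN.
any_missing = [any(element_is_nan(x) for x in rec.tolist()) for rec in records]

# Reading 2: a record is missing only if ALL of its elements are NaN.
all_missing = [all(element_is_nan(x) for x in rec.tolist()) for rec in records]

print(any_missing)
print(all_missing)
```

The second record is where the two readings disagree, which is why the definition needs to be pinned down by tests.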

Yes, you are right, isnaobj doesn't work on records.
In my latest commit I added 2 more tests to check whether the behaviour matches what you would expect. I also refined the _isna_recarray_dtype method and made it return False if any value in the record is nan.
Can you check whether this is reasonable? If you would like tests for this function, I can add them, but could you tell me where to place them?

@mroeschke What is your opinion on this?

mroeschke · 2023-04-12T16:30:18Z

pandas/core/dtypes/missing.py

+    result = np.zeros(values.shape, dtype=bool)
+    for i, record in enumerate(values):
+        does_record_contain_nan = np.zeros(len(record), dtype=bool)
+        if inf_as_na:


Instead of checking if inf_as_na everywhere, could you create a checker function up front, like:

if inf_as_na:
    is_missing = lambda x: np.isnan(x) or np.isinf(x)
else:
    is_missing = lambda x: np.isnan(x)

And use that in the loop?
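As a rough sketch of this suggestion, the branch on inf_as_na can be taken once and the resulting callable reused inside the loop. make_is_missing is a hypothetical name introduced here for illustration, and the sketch assumes numeric elements only (np.isnan and np.isinf raise TypeError on strings).

```python
import numpy as np


def make_is_missing(inf_as_na: bool):
    """Hypothetical factory: choose the per-element check once, outside the loop."""
    if inf_as_na:
        return lambda x: bool(np.isnan(x) or np.isinf(x))
    return lambda x: bool(np.isnan(x))


elements = [1.0, np.nan, np.inf]

# inf is not treated as missing.
is_missing = make_is_missing(inf_as_na=False)
nan_only = [is_missing(x) for x in elements]

# inf is treated as missing.
is_missing = make_is_missing(inf_as_na=True)
nan_or_inf = [is_missing(x) for x in elements]

print(nan_only, nan_or_inf)
```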

@mroeschke I rewrote the function to increase readability. What do you think about the change?

mroeschke · 2023-04-12T16:31:17Z

pandas/tests/frame/test_repr_info.py

+        result = repr(df)
+        assert result == expected
+
+    def test_to_records_with_na_record_value(self):


Would be good to have tests for inf as well here

@mroeschke I added 2 tests for inf. Is the expected result what you would also expect?

mroeschke · 2023-04-17T17:20:35Z

pandas/tests/frame/test_repr_info.py

+
+    def test_to_records_with_inf_as_na_record(self):
+        # GH 48526
+        options.mode.use_inf_as_na = True


Please use the with option_context(...) context manager
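The reason for this request is that assigning to an option directly leaves it changed for every test that runs afterwards, whereas the context manager restores the previous value on exit. The sketch below shows that behaviour; it uses display.max_rows as a stand-in option, which is an assumption made here because mode.use_inf_as_na may not be available in every pandas version.

```python
import pandas as pd

# Remember the current value so the restore can be checked afterwards.
before = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", 5):
    # The override is only visible inside the with block.
    inside = pd.get_option("display.max_rows")

# On exit the previous value is restored, even if the block had raised.
after = pd.get_option("display.max_rows")

print(inside, after == before)
```

In the tests under review, the same pattern would wrap the repr(df) call with the option being tested instead of display.max_rows.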

mroeschke · 2023-04-17T17:20:40Z

pandas/tests/frame/test_repr_info.py

+
+    def test_to_records_with_inf_record(self):
+        # GH 48526
+        options.mode.use_inf_as_na = False


mroeschke · 2023-04-17T17:22:43Z

doc/source/whatsnew/v2.0.0.rst

@@ -1229,6 +1229,7 @@ Conversion
 - Bug in :meth:`Series.to_numpy` converting to NumPy array before applying ``na_value`` (:issue:`48951`)
 - Bug in :meth:`DataFrame.astype` not copying data when converting to pyarrow dtype (:issue:`50984`)
 - Bug in :func:`to_datetime` was not respecting ``exact`` argument when ``format`` was an ISO8601 format (:issue:`12649`)
+- Bug in :meth:`DataFrame.__repr__` incorrectly raising a ``TypeError`` when the dtype of a column is ``np.record`` (:issue:`48526`)


Can you move this to 2.1?

Of course, done

mroeschke · 2023-04-17T17:23:33Z

pandas/core/dtypes/missing.py

+    result = np.zeros(values.shape, dtype=bool)
+    for i, record in enumerate(values):
+        does_record_contain_nan = np.zeros(len(record), dtype=bool)
+        for j, element in enumerate(record):


Can you just call isnan and/or isinf on record?

@mroeschke I used isnan_all for the record and slightly rewrote the isinf part, but I cannot use isinf directly on the record since it throws a TypeError when a record value is a string. I hope the way I rewrote the method still improves on the previous version.
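The limitation mentioned in this reply can be shown in isolation. The following is a minimal sketch, not the code from the PR: has_inf is a hypothetical helper that checks elements one at a time and skips those np.isinf cannot handle.

```python
import numpy as np

# np.isinf has no loop for string input, so it raises TypeError.
try:
    np.isinf("text")
    string_raises = False
except TypeError:
    string_raises = True


def has_inf(record_values):
    """Hypothetical helper: True if any element is +inf or -inf.

    Elements that np.isinf cannot evaluate (for example strings) are skipped.
    """
    for element in record_values:
        try:
            if np.isinf(element):
                return True
        except TypeError:
            continue
    return False


print(string_raises, has_inf((1.0, "a")), has_inf((np.inf, "a")))
```

Checking element by element is slower than a single vectorised call, but it tolerates records that mix numeric and string fields.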

mroeschke · 2023-04-21T17:28:05Z

Thanks for sticking with this @RaphSku

…rd (GH 48526) (pandas-dev#48637)

* BUG: Fixed repr type error when column in np.record
* Reallocated the test to proposed destination and test the repr of the frame
* Assert has gone missing due to merge conflict
* Added new function that handles np.recarrays in _isna_array
* Addes 2 more tests + refined _isna_recarray_dtype + formatting fixed
* Reworked isna_recarray_dtype method + Added test for inf value
* Pre-commit fix
* Fixing typing
* Fixed formatting
* Fixed pyright error

---------

Co-authored-by: RaphSku <[email protected]>

BUG: Fixed repr type error when column in np.record

bd811e2

RaphSku changed the title ~~BUG: Fixed DataFrame.__repr__ Type Error when column dtype is np.record~~ BUG: Fixed DataFrame.__repr__ Type Error when column dtype is np.record (GH 48526) Sep 19, 2022

mroeschke reviewed Sep 19, 2022

pandas/tests/frame/methods/test_to_records.py (outdated, resolved)

mroeschke reviewed Sep 19, 2022

pandas/tests/frame/methods/test_to_records.py (outdated, resolved)

mroeschke reviewed Sep 19, 2022

mroeschke added the Output-Formatting (__repr__ of pandas objects, to_string) and Nested Data (data where the values are collections: lists, sets, dicts, objects, etc.) labels Sep 19, 2022

RaphSku added 2 commits September 20, 2022 10:50

Reallocated the test to proposed destination and test the repr of the frame

a454c0f

Merge branch 'main' of https://github.com/pandas-dev/pandas into frame_repr_bugfix

34109e3

github-actions bot added the Stale label Oct 21, 2022

Merged main branch and switched to doc/v2.0.0.rst

8204a58

RaphSku requested a review from mroeschke October 22, 2022 11:29

RaphSku closed this Oct 22, 2022

RaphSku reopened this Oct 22, 2022

stefan-jansen mentioned this pull request Oct 31, 2022

BUG: .unstack() with recarray column raises TypeError since 1.4.0 #49388


Merged main branch + resolved conflict

25a9ab1

mroeschke closed this Dec 6, 2022

Merged main

2ecf9a3

Assert has gone missing due to merge conflict

dd837ce

mroeschke reopened this Mar 10, 2023

Merge branch 'main' of https://github.com/pandas-dev/pandas into frame_repr_bugfix

076814f

Added new function that handles np.recarrays in _isna_array

3a15ab1

mroeschke reviewed Mar 13, 2023

Addes 2 more tests + refined _isna_recarray_dtype + formatting fixed

961ebf0

mroeschke reviewed Apr 12, 2023

RaphSku added 4 commits April 14, 2023 20:47

Reworked isna_recarray_dtype method + Added test for inf value

4fb9443

Pre-commit fix

923d29d

Fixing typing

fdcdb57

Updating branch with upstream main branch

5c4c29f

mroeschke reviewed Apr 17, 2023

RaphSku added 3 commits April 19, 2023 21:29

Updated branch with upstream main branch

8275116

Fixed formatting

396172c

Fixed pyright error

145699f

mroeschke approved these changes Apr 21, 2023

mroeschke added this to the 2.1 milestone Apr 21, 2023

mroeschke merged commit c3f0aac into pandas-dev:main Apr 21, 2023

RaphSku deleted the frame_repr_bugfix branch April 21, 2023 18:20
