BUG: repr of inf values with use_inf_as_na #55483

Merged
merged 2 commits into from
Oct 11, 2023
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.2.0.rst
@@ -394,6 +394,7 @@ Other
 ^^^^^
 - Bug in :func:`cut` incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (:issue:`54964`)
 - Bug in :meth:`DataFrame.apply` where passing ``raw=True`` ignored ``args`` passed to the applied function (:issue:`55009`)
+- Bug in rendering ``inf`` values inside a :class:`DataFrame` with the ``use_inf_as_na`` option enabled (:issue:`55483`)
 - Bug in rendering a :class:`Series` with a :class:`MultiIndex` when one of the index level's names is 0 not having that name displayed (:issue:`55415`)
 -

1 change: 0 additions & 1 deletion pandas/_libs/missing.pyi
@@ -14,4 +14,3 @@ def isneginf_scalar(val: object) -> bool: ...
 def checknull(val: object, inf_as_na: bool = ...) -> bool: ...
 def isnaobj(arr: np.ndarray, inf_as_na: bool = ...) -> npt.NDArray[np.bool_]: ...
 def is_numeric_na(values: np.ndarray) -> npt.NDArray[np.bool_]: ...
-def is_float_nan(values: np.ndarray) -> npt.NDArray[np.bool_]: ...
25 changes: 0 additions & 25 deletions pandas/_libs/missing.pyx
@@ -255,31 +255,6 @@ cdef bint checknull_with_nat_and_na(object obj):
     return checknull_with_nat(obj) or obj is C_NA


-@cython.wraparound(False)
-@cython.boundscheck(False)
-def is_float_nan(values: ndarray) -> ndarray:
-    """
-    True for elements which correspond to a float nan
-
-    Returns
-    -------
-    ndarray[bool]
-    """
-    cdef:
-        ndarray[uint8_t] result
-        Py_ssize_t i, N
-        object val
-
-    N = len(values)
-    result = np.zeros(N, dtype=np.uint8)
-
-    for i in range(N):
-        val = values[i]
-        if util.is_nan(val):
-            result[i] = True
-    return result.view(bool)
-
-
 @cython.wraparound(False)
 @cython.boundscheck(False)
 def is_numeric_na(values: ndarray) -> ndarray:
13 changes: 2 additions & 11 deletions pandas/core/indexes/base.py
@@ -37,7 +37,6 @@
     is_datetime_array,
     no_default,
 )
-from pandas._libs.missing import is_float_nan
 from pandas._libs.tslibs import (
     IncompatibleFrequency,
     OutOfBoundsDatetime,
@@ -1390,16 +1389,8 @@ def _format_with_header(self, *, header: list[str_t], na_rep: str_t) -> list[str

         if is_object_dtype(values.dtype) or is_string_dtype(values.dtype):
             values = np.asarray(values)
-            values = lib.maybe_convert_objects(values, safe=True)
-
-            result = [pprint_thing(x, escape_chars=("\t", "\r", "\n")) for x in values]
-
-            # could have nans
-            mask = is_float_nan(values)
-            if mask.any():
-                result_arr = np.array(result)
-                result_arr[mask] = na_rep
-                result = result_arr.tolist()
+            # TODO: why do we need different justify for these cases?
+            result = trim_front(format_array(values, None, justify="all"))
         else:
             result = trim_front(format_array(values, None, justify="left"))
         return header + result
22 changes: 10 additions & 12 deletions pandas/io/formats/format.py
@@ -1215,18 +1215,16 @@ def _format_strings(self) -> list[str]:

         def _format(x):
             if self.na_rep is not None and is_scalar(x) and isna(x):
-                try:
-                    # try block for np.isnat specifically
-                    # determine na_rep if x is None or NaT-like
-                    if x is None:
-                        return "None"
-                    elif x is NA:
-                        return str(NA)
-                    elif x is NaT or np.isnat(x):
-                        return "NaT"
-                except (TypeError, ValueError):
-                    # np.isnat only handles datetime or timedelta objects
-                    pass
+                if x is None:
+                    return "None"
+                elif x is NA:
+                    return str(NA)
+                elif lib.is_float(x) and np.isinf(x):
+                    # TODO(3.0): this will be unreachable when use_inf_as_na
+                    # deprecation is enforced
+                    return str(x)
+                elif x is NaT or isinstance(x, (np.datetime64, np.timedelta64)):
+                    return "NaT"
                 return self.na_rep
             elif isinstance(x, PandasObject):
                 return str(x)
2 changes: 1 addition & 1 deletion pandas/tests/frame/test_repr_info.py
@@ -411,7 +411,7 @@ def test_to_records_with_na_record(self):
     def test_to_records_with_inf_as_na_record(self):
         # GH 48526
         expected = """ NaN inf record
-0 NaN b [0, inf, b]
+0 inf b [0, inf, b]
Member

If use_inf_as_na is enabled, shouldn't inf still be rendered as NaN?

Member Author

No, we should still render the "inf" accurately. Notice on L413 that we already do this when the inf is in the columns.

 1 NaN NaN [1, nan, nan]
 2 e f [2, e, f]"""
         msg = "use_inf_as_na option is deprecated"
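
For readers following the review thread, the branch order in the patched `_format` helper can be illustrated with a minimal, stdlib-only sketch. This is not the pandas implementation: `render_missing` and its `na_rep` default are hypothetical names, and `math.isinf` stands in for the `lib.is_float(x) and np.isinf(x)` check. It assumes the caller has already classified the value as missing, which is what happens to `inf` when `use_inf_as_na` is enabled.

```python
import math


def render_missing(x, na_rep="NaN"):
    """Render a value that has already been classified as missing.

    Hypothetical stand-in for the NA branch of `_format`; not pandas code.
    """
    if x is None:
        return "None"
    if isinstance(x, float) and math.isinf(x):
        # Infinity is shown as itself, even though it is treated as missing.
        return str(x)
    # Everything else (for example a float nan) falls back to the placeholder.
    return na_rep


rendered = [
    render_missing(v)
    for v in (None, float("inf"), float("-inf"), float("nan"))
]
print(rendered)
```

Checking for infinity before falling back to `na_rep` is what keeps the cell text consistent with the `inf` column label in the test's expected output.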