Skip to content

BUG: float-like string, trailing 0 truncation #38759

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 29, 2020

Conversation

mzeitlin11
Copy link
Member

@mzeitlin11 mzeitlin11 added Regression Functionality that used to work in a prior pandas version Output-Formatting __repr__ of pandas objects, to_string labels Dec 28, 2020
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me, just some minor test/doc requests.

@@ -15,6 +15,7 @@ including other versions of pandas.
Fixed regressions
~~~~~~~~~~~~~~~~~
- The deprecated attributes ``_AXIS_NAMES`` and ``_AXIS_NUMBERS`` of :class:`DataFrame` and :class:`Series` will no longer show up in ``dir`` or ``inspect.getmembers`` calls (:issue:`38740`)
- Bug in float-like strings having trailing 0's truncated (:issue:`38708`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add in "repr of" and "after the decimal" here for clarity that it is just a bug in regards to presentation (which is still important!).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are not floats, rather strings in an object dtype

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -237,6 +237,18 @@ def test_series_repr_nat(self):
)
assert result == expected

def test_series_repr_float_like_object_no_truncate(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you parametrize/combine these two tests using the parameters data and expected, and add in cases where np.nan/None values are present, including data = [np.nan] and data = [None].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -15,6 +15,7 @@ including other versions of pandas.
Fixed regressions
~~~~~~~~~~~~~~~~~
- The deprecated attributes ``_AXIS_NAMES`` and ``_AXIS_NUMBERS`` of :class:`DataFrame` and :class:`Series` will no longer show up in ``dir`` or ``inspect.getmembers`` calls (:issue:`38740`)
- Bug in float-like strings having trailing 0's truncated (:issue:`38708`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are not floats, rather strings in an object dtype

@@ -1310,7 +1310,9 @@ def _format(x):
tpl = " {v}"
fmt_values.append(tpl.format(v=_format(v)))

fmt_values = _trim_zeros_float(str_floats=fmt_values, decimal=".")
fmt_values = _trim_zeros_float(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a larger issue to address first. these are not floats at all. these are object dtypes that are reprering. why are they hitting this path is the issue. we likley do not want to touch the formatters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are likley getting inferred as floats and that is why the formatter is picked. this is incorrect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the GenericArrayFormatter - seems reasonable for object type data. I think hitting _trim_zeros_float in GenericArrayFormatter makes sense because of a case like

s = pd.Series([1.20, "1.00"])
repr(s)

which should hit _trim_zeros_float so that 1.20 is truncated to 1.2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, sure but a string should not be truncated. that's where the issue is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see the issue here.

instead of _trim_zeros_float being a function that accepts a ndarray/list of strings, just have it process a single value, then call it when needed in the loop on L1299.

this makes the code much simpler. i am not entirely sure why it was done this way.

Copy link
Member Author

@mzeitlin11 mzeitlin11 Dec 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason why it was done that way is that floats are formatted with fixed-width, e.g.

s = pd.Series([1.230, 1.20000])
repr(s)

gives

0    1.23
1    1.20
dtype: float64

_trim_zeros_float was designed for usage in FloatArrayFormatter (so it needs the whole list to keep fixed-width, since it only truncates if all values can be truncated). The usage of this function in GenericArrayFormatter is what caused the regression. I think then it makes sense to keep _trim_zeros_float as is for that original purpose, and GenericArrayFormatter can use a much simpler solution for truncation since it does not have the same fixed width requirements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok that sounds fine, tests are pretty sensitive here so you can refactor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the help here, this is much cleaner!

],
)
def test_repr_str_float_truncation(self, data, expected):
series = Series(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number here

@@ -237,6 +237,24 @@ def test_series_repr_nat(self):
)
assert result == expected

@pytest.mark.parametrize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests go in pandas/tests/io/formats/test_format.py (as that is where all of the formatting tests are)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put after this test test_float_trim_zeros

@jreback jreback added this to the 1.2.1 milestone Dec 29, 2020
@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

also pls rebase

@mzeitlin11
Copy link
Member Author

Moved tests, added issue number, fixed conflict

@jreback jreback merged commit 7f912a4 into pandas-dev:master Dec 29, 2020
@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

thanks @mzeitlin11

@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

@meeseeksdev backport 1.2.x

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Dec 29, 2020
@mzeitlin11 mzeitlin11 deleted the bug/float_like_str_repr branch December 29, 2020 03:20
gfyoung pushed a commit that referenced this pull request Dec 29, 2020
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
* BUG: float-like string, trailing 0 truncation

* Don't use _trim_zeros_float
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Output-Formatting __repr__ of pandas objects, to_string Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REGR: repr of stringified floats (e.g. "3.50") drop ending "0"
3 participants