-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: float-like string, trailing 0 truncation #38759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: float-like string, trailing 0 truncation #38759
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me, just some minor test/doc requests.
doc/source/whatsnew/v1.2.1.rst
Outdated
@@ -15,6 +15,7 @@ including other versions of pandas. | |||
Fixed regressions | |||
~~~~~~~~~~~~~~~~~ | |||
- The deprecated attributes ``_AXIS_NAMES`` and ``_AXIS_NUMBERS`` of :class:`DataFrame` and :class:`Series` will no longer show up in ``dir`` or ``inspect.getmembers`` calls (:issue:`38740`) | |||
- Bug in float-like strings having trailing 0's truncated (:issue:`38708`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add in "repr of" and "after the decimal" here for clarity that it is just a bug in regards to presentation (which is still important!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are not floats, rather strings in an object
dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/tests/series/test_repr.py
Outdated
@@ -237,6 +237,18 @@ def test_series_repr_nat(self): | |||
) | |||
assert result == expected | |||
|
|||
def test_series_repr_float_like_object_no_truncate(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you parametrize/combine these two tests using the parameters data
and expected
, and add in cases where np.nan/None values are present, including data = [np.nan]
and data = [None]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
doc/source/whatsnew/v1.2.1.rst
Outdated
@@ -15,6 +15,7 @@ including other versions of pandas. | |||
Fixed regressions | |||
~~~~~~~~~~~~~~~~~ | |||
- The deprecated attributes ``_AXIS_NAMES`` and ``_AXIS_NUMBERS`` of :class:`DataFrame` and :class:`Series` will no longer show up in ``dir`` or ``inspect.getmembers`` calls (:issue:`38740`) | |||
- Bug in float-like strings having trailing 0's truncated (:issue:`38708`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are not floats, rather strings in an object
dtype
pandas/io/formats/format.py
Outdated
@@ -1310,7 +1310,9 @@ def _format(x): | |||
tpl = " {v}" | |||
fmt_values.append(tpl.format(v=_format(v))) | |||
|
|||
fmt_values = _trim_zeros_float(str_floats=fmt_values, decimal=".") | |||
fmt_values = _trim_zeros_float( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a larger issue to address first. these are not floats at all. these are object dtypes that are reprering. why are they hitting this path is the issue. we likley do not want to touch the formatters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are likley getting inferred as floats and that is why the formatter is picked. this is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the GenericArrayFormatter
- seems reasonable for object type data. I think hitting _trim_zeros_float
in GenericArrayFormatter
makes sense because of a case like
s = pd.Series([1.20, "1.00"])
repr(s)
which should hit _trim_zeros_float
so that 1.20 is truncated to 1.2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, sure but a string should not be truncated. that's where the issue is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see the issue here.
instead of _trim_zeros_float being a function that accepts a ndarray/list of strings, just have it process a single value, then call it when needed in the loop on L1299.
this makes the code much simpler. i am not entirely sure why it was done this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason why it was done that way is that floats are formatted with fixed-width, e.g.
s = pd.Series([1.230, 1.20000])
repr(s)
gives
0 1.23
1 1.20
dtype: float64
_trim_zeros_float
was designed for usage in FloatArrayFormatter
(so it needs the whole list to keep fixed-width, since it only truncates if all values can be truncated). The usage of this function in GenericArrayFormatter
is what caused the regression. I think then it makes sense to keep _trim_zeros_float
as is for that original purpose, and GenericArrayFormatter
can use a much simpler solution for truncation since it does not have the same fixed width requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok that sounds fine, tests are pretty sensitive here so you can refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the help here, this is much cleaner!
pandas/tests/series/test_repr.py
Outdated
], | ||
) | ||
def test_repr_str_float_truncation(self, data, expected): | ||
series = Series(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number here
pandas/tests/series/test_repr.py
Outdated
@@ -237,6 +237,24 @@ def test_series_repr_nat(self): | |||
) | |||
assert result == expected | |||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests go in pandas/tests/io/formats/test_format.py (as that is where all of the formatting tests are)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put after this test test_float_trim_zeros
also pls rebase |
Moved tests, added issue number, fixed conflict |
thanks @mzeitlin11 |
@meeseeksdev backport 1.2.x |
…8769) Co-authored-by: mzeitlin11 <[email protected]>
* BUG: float-like string, trailing 0 truncation * Don't use _trim_zeros_float
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff