BUG/CLN: Clean float / complex string formatting #36799
Something is wrong then. _trim_zeros_complex was 'fixed' in #25745, so why is test_to_string_complex_float_formatting not failing? Has there been some other refactor since?
My guess is that it passes because there are no zeros to trim in the test case. Instead of trimming zeros, it's as though zeros are being added. I would expect both the real and imaginary parts to be formatted the way floats are (below is master):
[ins] In [4]: s = pd.Series([0.000, 1.000])
...: print(s)
...:
...: s = pd.Series([0.000+1.000j, 1.000+1.000j])
...: print(s)
...:
0 0.0
1 1.0
dtype: float64
0 0.000000+1.000000j
1 1.000000+1.000000j
dtype: complex128

Should I open an issue about this, and either close this PR or turn it into a bug fix?
Can you run some asv benchmarks? I'm not sure this code path is hit much, but it might be.
pandas/io/formats/format.py
 ) -> List[str]:
     """
     Trims zeros, leaving just one before the decimal points if need be.
     """
     trimmed = str_floats
 
     def _is_number(x):
-        return x != na_rep and not x.endswith("inf")
+        return re.match(fr"\s*-?[0-9]+(\{decimal}[0-9]*)?", x) is not None
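To see why the old check misfires, the two versions of _is_number from the diff can be compared on a few inputs. This is a standalone sketch: the names is_number_old and is_number_new are hypothetical, and na_rep and decimal are passed as plain parameters here rather than captured from the enclosing function.

```python
import re


def is_number_old(x: str, na_rep: str = "NaN") -> bool:
    # Old check: anything that is not the NA marker and does not end in "inf"
    return x != na_rep and not x.endswith("inf")


def is_number_new(x: str, decimal: str = ".") -> bool:
    # New check: optional leading whitespace, optional minus sign, digits,
    # then optionally the decimal separator followed by more digits
    return re.match(fr"\s*-?[0-9]+(\{decimal}[0-9]*)?", x) is not None


for value in [" 1.500", "-2", "NaN", "-inf", "foo"]:
    print(f"{value!r:>8}  old={is_number_old(value)!s:<5}  new={is_number_new(value)}")
```

The old version accepts "foo", while the regex version rejects it. Note that re.match anchors only at the start of the string, so the new check still accepts something like "1foo"; re.fullmatch would be stricter.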
Can you compile this regex once and store it on the class or in a variable?
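One way to act on this suggestion, sketched under the assumption that the decimal separator is only known at call time: build the pattern once per call with re.compile, outside the per-element helper, and use re.escape so any separator is matched literally. The helper names make_number_pattern and numeric_mask are hypothetical. Python's re module also caches recently compiled patterns internally, so the speedup from precompiling is likely modest.

```python
import re
from typing import List


def make_number_pattern(decimal: str = ".") -> "re.Pattern[str]":
    # re.escape makes the separator match literally, whatever character it is
    return re.compile(rf"\s*-?[0-9]+({re.escape(decimal)}[0-9]*)?")


def numeric_mask(values: List[str], decimal: str = ".") -> List[bool]:
    # Compile once for the whole list instead of once per element
    pattern = make_number_pattern(decimal)
    return [pattern.match(v) is not None for v in values]


print(numeric_mask(["1,5", "foo", " -2"], decimal=","))
```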
I ran the IO asv benchmarks a few times and the results aren't consistent. There may be something going on with io.csv / io.hdf, but it's hard to say.
pandas/io/formats/format.py
max_length = max(lengths)
padded = [
    s[: -((k - 1) // 2 + 1)]  # real part
    + (max_length - k) // 2 * "0"
    + s[-((k - 1) // 2 + 1) : -((k - 1) // 2)]  # + / -
    + s[-((k - 1) // 2) : -1]  # imaginary part
    + (max_length - k) // 2 * "0"
    + s[-1]
    for s, k in zip(complex_strings, lengths)
]
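Restating the list comprehension above as a standalone function makes its symmetry assumption explicit. This sketch assumes each trimmed string has the layout <space or minus><real part><+ or -><imaginary part>j, with the real and imaginary parts of equal length (the symmetry questioned in the review). The name pad_complex_strings and the sample inputs are illustrative, not taken from the PR.

```python
from typing import List


def pad_complex_strings(trimmed: List[str]) -> List[str]:
    # Assumes every string looks like "<sign><real><op><imag>j" where the
    # leading sign is a space or "-", and len(real) == len(imag).
    lengths = [len(s) for s in trimmed]
    max_length = max(lengths)
    padded = []
    for s, k in zip(trimmed, lengths):
        half = (k - 1) // 2  # length of the imaginary part plus "j"
        fill = (max_length - k) // 2 * "0"  # zeros to append to each part
        real = s[: -(half + 1)]  # sign character plus real part
        op = s[-(half + 1) : -half]  # "+" or "-"
        imag = s[-half:-1]  # imaginary part without the trailing "j"
        padded.append(real + fill + op + imag + fill + s[-1])
    return padded


print(pad_complex_strings([" 1.0+1.0j", " 1.05+1.00j", "-2.5-0.5j"]))
```

The midpoint arithmetic only finds the "+"/"-" correctly when both parts have the same length; an input such as " 10.0+1.0j" would be split in the wrong place.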
Would it be safer to split the real and imaginary parts on + / - and then split each part into integer and fractional parts on the dot? That way you would not need to rely on the symmetry of the original string.
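A minimal sketch of the split-based approach, assuming every string uses "." as the decimal separator, has at least one fractional digit in each part, and contains no exponent, nan, or inf. Because each part is parsed explicitly, it does not depend on symmetry, and it picks one common number of fractional digits for the whole column, as float formatting does. The regex _COMPLEX_RE and the function trim_complex_strings are hypothetical.

```python
import re
from typing import List

# Hypothetical pattern: optional leading space or minus, real integer and
# fractional digits, "+" or "-", imaginary integer and fractional digits, "j".
_COMPLEX_RE = re.compile(r"(\s?-?\d+)\.(\d+)([+-])(\d+)\.(\d+)j")


def trim_complex_strings(strings: List[str]) -> List[str]:
    parts = []
    for s in strings:
        m = _COMPLEX_RE.fullmatch(s)
        if m is None:
            raise ValueError(f"unsupported complex string: {s!r}")
        parts.append(m.groups())
    if not parts:
        return []
    # Smallest number of fractional digits that drops no nonzero digit in
    # any real or imaginary part, but never fewer than one.
    width = max(1, max(len(p[i].rstrip("0")) for p in parts for i in (1, 4)))
    return [
        f"{re_int}.{re_frac[:width].ljust(width, '0')}"
        f"{op}{im_int}.{im_frac[:width].ljust(width, '0')}j"
        for re_int, re_frac, op, im_int, im_frac in parts
    ]


print(trim_complex_strings([" 1.000000+1.000000j", " 1.050000+1.000000j"]))
```

Unlike the padding step, this handles a real part with more integer digits than the imaginary part, for example " 10.500000-2.000000j".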
You mean trim zeros after splitting into fractional and non-fractional parts? I think the trimming has to be done while the decimal point is still there. (I realize this helper is very confusing, and there's likely a better way to do this.)
Right, I mean trimming zeros after splitting into fractional and non-fractional parts. Since a dot always splits a float string, IMHO there is no risk of introducing a bug (even if there is no dot at all).
But then wouldn't you have to keep track of which parts are fractional and trim only those?
However, this part doesn't do any actual trimming; it corrects for the fact that the previous function now trims "too much." (It trims the real and imaginary parts of each complex number independently of the other numbers, so they aren't aligned afterwards. Rather than rewrite the other function, I found it easier to do this post-processing.)
LGTM. Can you add a whatsnew note and ping on green?
@jreback Added the note, and CI is green.
def test_to_string_complex_number_trims_zeros():
    s = pd.Series([1.000000 + 1.000000j, 1.0 + 1.0j, 1.05 + 1.0j])
Why should these have two decimal places and not one, like ordinary floats? Where did you get the expected output from?
There's a 1.05 in the last element, so the expected output matches what happens for floats:
[ins] In [2]: s = pd.Series([1.0, 1.0000, 1.05])
[ins] In [3]: s
Out[3]:
0 1.00
1 1.00
2 1.05
dtype: float64
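The float behavior shown here can be approximated by a simple rule: while every string in the column contains the decimal point and ends in "0", drop one trailing character from all of them, then restore a single "0" after a bare decimal point. This is a simplified illustration of that rule, not the pandas implementation; trim_common_zeros is a hypothetical name, and it assumes every entry is a fixed-precision numeric string.

```python
from typing import List


def trim_common_zeros(values: List[str], decimal: str = ".") -> List[str]:
    trimmed = list(values)
    # Strip one trailing zero from every string while all of them have a
    # decimal point and end in "0", so the column stays aligned.
    while trimmed and all(decimal in x and x.endswith("0") for x in trimmed):
        trimmed = [x[:-1] for x in trimmed]
    # Put back a single zero after a bare decimal point ("1." -> "1.0").
    return [x + "0" if x.endswith(decimal) else x for x in trimmed]


print(trim_common_zeros(["1.000000", "1.000000", "1.050000"]))
```

Because trimming stops as soon as any entry ends in a nonzero digit, the 1.05 keeps two decimal places for the whole column.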
Thought we had merged this. Thanks @dsaxton!
Noticed while working on another bug. The _is_number helper here is wrong and can cause incorrect results given that this code path is hit by arbitrary strings (e.g., it thinks "foo" is a number). Also the _trim_zeros_complex helper apparently does nothing: